8. Boltz-2

Boltz-2 (paper, code) is a biomolecular foundation model that jointly models complex structures and binding affinities. Its authors report that it is the first deep learning model to approach the accuracy of physics-based free-energy perturbation (FEP) methods while running about 1000x faster.

Why Use Boltz-2?

Structure + Affinity: Predict both binding pose and binding strength
Drug discovery ready: Affinity predictions useful for hit-to-lead optimization
Multi-modal: Handles proteins, nucleic acids, small molecules, covalent modifications
Speed: Reported to be about 1000x faster than FEP methods for affinity prediction

Related Tools: For structure prediction only, see Chai-1. For protein-ligand docking, see PLACER; for protein-protein docking, see DiffDock-PP.

Resource Requirements

Resource	Minimum	Recommended	Notes
GPU RAM	16 GB	32+ GB	Scales with complex size
CPU RAM	16 GB	32 GB	For preprocessing
Disk Space	5 GB	10 GB	Model weights
Python	3.9+	3.11	Required
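To see where a node falls against the table above, the short Python sketch below reads each visible GPU's total memory through PyTorch (installed with Boltz) and compares it with the 16 GB minimum and 32 GB recommendation. The thresholds come from the table; the helper names are hypothetical, and memory is counted in decimal gigabytes (10^9 bytes) so that a nominal 16 GB card is not flagged as below the minimum.

```python
# Hypothetical helper: compare visible GPUs against the Resource Requirements table.
MIN_GB = 16          # "Minimum" column
RECOMMENDED_GB = 32  # "Recommended" column


def classify_gpu_memory(total_gb: float) -> str:
    """Map a GPU memory size (decimal GB) onto the table's tiers."""
    if total_gb < MIN_GB:
        return "below minimum"
    if total_gb < RECOMMENDED_GB:
        return "minimum"
    return "recommended"


def report() -> None:
    """Print each visible CUDA device with its memory tier."""
    try:
        import torch  # installed as a Boltz dependency
    except ImportError:
        print("PyTorch not found: activate the boltz environment first.")
        return
    if not torch.cuda.is_available():
        print("No CUDA GPU visible; CPU runs will be much slower.")
        return
    for idx in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(idx)
        total_gb = props.total_memory / 1e9  # bytes -> decimal GB
        print(f"GPU {idx}: {props.name}, {total_gb:.1f} GB -> {classify_gpu_memory(total_gb)}")


if __name__ == "__main__":
    report()
```

Complex size still dominates memory use, so a "recommended" result does not guarantee that very large assemblies fit.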

Preparation


Prerequisites:

Completed HPC Setup guide
Conda/Mamba installed
CUDA-capable GPU (recommended) or CPU

Important: Install Boltz in a fresh Python environment to avoid dependency conflicts.

Installation


Create a fresh environment:

mamba create -n boltz python=3.11
mamba activate boltz

Install Boltz with CUDA support:

pip install "boltz[cuda]" -U

For CPU-only or non-CUDA GPUs:

pip install boltz -U

Alternative: Install from GitHub (for latest updates):

git clone https://github.com/jwohlwend/boltz.git
cd boltz
pip install -e ".[cuda]"
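Before running a full prediction, a quick Python check can confirm that the package landed in the active environment and that PyTorch sees a GPU. This is a minimal sketch using importlib.metadata plus an optional torch import; it does not load model weights, and the function name is hypothetical.

```python
from importlib import metadata


def check_install() -> dict:
    """Report the installed boltz/torch versions and whether CUDA is usable."""
    status = {"boltz_version": None, "torch_version": None, "cuda_available": False}
    try:
        status["boltz_version"] = metadata.version("boltz")
    except metadata.PackageNotFoundError:
        pass  # boltz is not installed in this environment
    try:
        import torch
        status["torch_version"] = torch.__version__
        status["cuda_available"] = torch.cuda.is_available()
    except ImportError:
        pass  # torch missing: the pip install step did not complete
    return status


if __name__ == "__main__":
    for key, value in check_install().items():
        print(f"{key}: {value}")
```

A boltz_version of None means the install went into a different environment; cuda_available False on a GPU node usually points to a CPU-only install or a driver mismatch.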

Testing the Installation


Create a test YAML file test_input.yaml:

version: 1
sequences:
  - protein:
      id: [A, B]
      sequence: MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG

Run prediction:

boltz predict test_input.yaml --use_msa_server

Success indicators:

Command completes without errors
Output directory contains:
- Predicted structure files (CIF format)
- Confidence scores

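Expected runtime: 1-3 minutes for this small test once the model weights are cached. The first run also downloads the weights (to ~/.boltz by default) and waits on the MSA server, so it can take noticeably longer.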
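The success indicators above can be checked automatically. This sketch assumes the default layout, where boltz predict test_input.yaml writes to boltz_results_test_input/ with .cif structures and confidence_*.json files somewhere under its predictions/ folder; it searches recursively so per-input subfolders don't matter. The function name is hypothetical.

```python
import sys
from pathlib import Path


def check_results(results_dir: Path) -> dict:
    """Report whether results_dir holds at least one CIF and one confidence JSON."""
    predictions = results_dir / "predictions"
    if not predictions.is_dir():
        return {"ok": False, "structures": [], "confidence": []}
    cif_files = sorted(predictions.rglob("*.cif"))
    confidence_files = sorted(predictions.rglob("confidence_*.json"))
    return {
        "ok": bool(cif_files) and bool(confidence_files),
        "structures": [p.name for p in cif_files],
        "confidence": [p.name for p in confidence_files],
    }


if __name__ == "__main__":
    # Default folder name assumes `boltz predict test_input.yaml` was run here.
    target = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("boltz_results_test_input")
    result = check_results(target)
    print("PASS" if result["ok"] else "FAIL", result)
```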

HPC Job Script

#!/bin/bash
#SBATCH --job-name=boltz
#SBATCH --partition=gpu
#SBATCH --gpus=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
#SBATCH --time=04:00:00
#SBATCH --output=%x_%j.out

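module load cuda/12.1  # site-specific; adjust to your cluster's module names

# Uncomment if 'mamba activate' reports that the shell is not initialized:
# source ~/.bashrc
mamba activate boltz

# Run prediction (--use_msa_server needs outbound internet from the compute node)
boltz predict my_complex.yaml --use_msa_server --out_dir results/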

Usage Examples

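Structure prediction only, with MSAs supplied in the YAML (each protein needs an msa: entry pointing to a precomputed MSA, or msa: empty for single-sequence mode, which lowers accuracy):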

boltz predict structure.yaml

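With the MSA server (more accurate than single-sequence mode; by default Boltz queries the ColabFold MMseqs2 server, so your sequences are sent to an external service):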

boltz predict input.yaml --use_msa_server

With affinity prediction:

# input.yaml
version: 1
sequences:
  - protein:
      id: A
      sequence: MKTVRQERLK...
  - ligand:
      id: L
      smiles: "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O"
properties:
  - affinity:
      binder: L

boltz predict input.yaml --use_msa_server
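For screening several ligands against one target, it is convenient to write one YAML per ligand into a folder and pass that folder to boltz predict, which accepts a directory of input files as well as a single file. The Python sketch below assumes the affinity block names the ligand chain with a binder key, reuses the SMILES from the examples in this guide, and uses a hypothetical helper name and output folder.

```python
from pathlib import Path

# Template for one protein-ligand affinity job; chain IDs A and L are arbitrary.
TEMPLATE = """version: 1
sequences:
  - protein:
      id: A
      sequence: {sequence}
  - ligand:
      id: L
      smiles: '{smiles}'
properties:
  - affinity:
      binder: L
"""


def write_affinity_inputs(sequence: str, ligands: dict[str, str], out_dir: Path) -> list[Path]:
    """Write <name>.yaml for each ligand and return the paths in input order."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, smiles in ligands.items():
        # YAML single-quoted scalars escape a literal quote by doubling it.
        text = TEMPLATE.format(sequence=sequence, smiles=smiles.replace("'", "''"))
        path = out_dir / f"{name}.yaml"
        path.write_text(text)
        written.append(path)
    return written


if __name__ == "__main__":
    ligands = {
        "ibuprofen": "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O",
        "paracetamol": "CC(=O)NC1=CC=C(O)C=C1",
    }
    # Placeholder target: the test sequence from this guide; substitute your own.
    target = "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG"
    for path in write_affinity_inputs(target, ligands, Path("screen_inputs")):
        print(path)
```

Afterwards, boltz predict screen_inputs/ --use_msa_server processes every file in the folder. Single quotes are used for SMILES because double-quoted YAML treats backslashes as escapes, which would break stereo SMILES such as F/C=C\F.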

Input Format (YAML)

Boltz uses YAML files to describe biomolecules:

Simple protein:

version: 1
sequences:
  - protein:
      id: A
      sequence: MKTVRQERLKSIVRILERSKEPVSG...

Protein-ligand complex:

version: 1
sequences:
  - protein:
      id: A
      sequence: MKTVRQERLK...
  - ligand:
      id: L
      smiles: "CCO"

Protein complex (homodimer):

version: 1
sequences:
  - protein:
      id: [A, B]  # Same sequence for both chains
      sequence: MKTVRQERLK...

With affinity prediction:

version: 1
sequences:
  - protein:
      id: A
      sequence: MKTVRQERLK...
  - ligand:
      id: L
      smiles: "CC(=O)NC1=CC=C(O)C=C1"
properties:
  - affinity:
      binder: L  # chain ID of the ligand whose affinity is predicted

See prediction documentation for full format details.

Binding Affinity Predictions

Boltz-2 provides two affinity metrics:

Metric	Range / Units	Use Case
`affinity_probability_binary`	0-1	Hit discovery: probability that the ligand is a binder
`affinity_pred_value`	log10(IC50), IC50 in μM	Lead optimization: compare binding strengths

Interpretation:

affinity_probability_binary: Higher = more likely to bind
affinity_pred_value: Lower = stronger binding (lower IC50)
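Because affinity_pred_value is log10(IC50) with IC50 in μM, it converts to more familiar units with simple arithmetic: IC50 = 10^y μM and pIC50 = 6 - y. The sketch below also gives a rough binding free energy by treating IC50 as a dissociation constant at 298.15 K (ΔG ≈ -RT ln 10 · pIC50 ≈ -1.364 · pIC50 kcal/mol); that approximation is an assumption layered on top of Boltz's output, and the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ic50_um(y: float) -> float:
    """IC50 in uM from affinity_pred_value (log10 of IC50 in uM)."""
    return 10 ** y


def pic50(y: float) -> float:
    """pIC50 = -log10(IC50 in M) = 6 - y."""
    return 6.0 - y


def approx_delta_g(y: float, temperature_k: float = 298.15) -> float:
    """Rough binding free energy in kcal/mol, treating IC50 as Kd (more negative = tighter)."""
    return -R_KCAL * temperature_k * math.log(10) * pic50(y)


if __name__ == "__main__":
    for y in (-2.0, 0.0, 1.5):
        print(f"y={y:+.1f}: IC50={ic50_um(y):.3g} uM, pIC50={pic50(y):.2f}, "
              f"dG~{approx_delta_g(y):.2f} kcal/mol")
```

For example, y = 0 corresponds to IC50 = 1 μM, pIC50 = 6, and ΔG ≈ -8.2 kcal/mol; each unit decrease in y is a tenfold tighter predicted IC50.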

MSA Server Authentication

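For self-hosted or private MSA servers that require authentication, point Boltz at the server with --msa_server_url and supply credentials. Recent releases accept basic-auth credentials (--msa_server_username and --msa_server_password, or the BOLTZ_MSA_USERNAME and BOLTZ_MSA_PASSWORD environment variables) or an API key (--api_key_header and --api_key_value). Run boltz predict --help to confirm which options your installed version supports.

export BOLTZ_MSA_USERNAME="your_username"
export BOLTZ_MSA_PASSWORD="your_password"
boltz predict input.yaml --use_msa_server --msa_server_url https://your-msa-server.example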

Understanding the Output

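Output directory structure (created inside --out_dir, which defaults to the current directory):

boltz_results_<input>/
├── msa/                                     # Generated MSAs (if using the server)
├── processed/                               # Preprocessed inputs
└── predictions/
    └── <input>/
        ├── <input>_model_0.cif              # Predicted structure
        ├── confidence_<input>_model_0.json  # Confidence scores
        ├── plddt_<input>_model_0.npz        # Per-residue pLDDT
        └── affinity_<input>.json            # Affinity predictions (if requested)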

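Confidence metrics:

pLDDT: Per-residue confidence in the local structure
pTM: Predicted TM-score for the overall structure
ipTM (interface pTM): Confidence in the relative placement of chains in a complex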
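To pull the headline numbers out of a finished run, the sketch below reads the confidence JSON and, when present, the affinity JSON for one input. It assumes the default file names (confidence_<name>_model_0.json and affinity_<name>.json inside predictions/<name>/) and key names such as ptm, iptm, confidence_score, and complex_plddt; treat these as assumptions to check against your own files, since missing keys simply come back as None. The script name in the comment is hypothetical.

```python
import json
import sys
from pathlib import Path

# Key names are assumptions based on recent Boltz releases; absent keys become None.
CONFIDENCE_KEYS = ("confidence_score", "ptm", "iptm", "complex_plddt")
AFFINITY_KEYS = ("affinity_pred_value", "affinity_probability_binary")


def summarize(pred_dir: Path, name: str, model: int = 0) -> dict:
    """Collect confidence (and, if present, affinity) values for one prediction."""
    conf_path = pred_dir / f"confidence_{name}_model_{model}.json"
    conf = json.loads(conf_path.read_text())
    summary = {key: conf.get(key) for key in CONFIDENCE_KEYS}
    aff_path = pred_dir / f"affinity_{name}.json"
    if aff_path.exists():  # written only when the YAML requests affinity
        aff = json.loads(aff_path.read_text())
        summary.update({key: aff.get(key) for key in AFFINITY_KEYS})
    return summary


if __name__ == "__main__":
    # Usage (hypothetical script name):
    #   python summarize_boltz.py boltz_results_input/predictions/input input
    if len(sys.argv) == 3:
        print(summarize(Path(sys.argv[1]), sys.argv[2]))
```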

Performance Comparison

Method	Speed	Affinity Accuracy
FEP (physics-based)	Hours-days	Gold standard
Boltz-2	Seconds-minutes	Approaches FEP (per the authors' benchmarks)
Traditional docking	Seconds	Lower accuracy

Troubleshooting

Common Issues

Installation issues:

Use a fresh environment
Try removing [cuda] if CUDA issues arise
Verify CUDA version compatibility

“MSA server error”:

Check network connectivity
Verify authentication token if required
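Test without the server by supplying precomputed MSAs or setting msa: empty for each protein; simply dropping --use_msa_server fails if any protein lacks an MSA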

Out of memory:

Request a GPU with more memory
Reduce complex size
Run small tests on CPU with --accelerator cpu

Slow without GPU:

CPU mode is functional but significantly slower
Always use GPU for production runs

YAML parsing errors:

Check YAML syntax (indentation matters)
Ensure SMILES strings are quoted
Verify sequence format
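Many of these YAML errors can be caught before a job is queued. The sketch below is not Boltz's own validator: it checks only the fields used in this guide (one entity per list item, string chain IDs without duplicates, one-letter protein sequences, exactly one of smiles or ccd per ligand, and an affinity binder that names a ligand chain). Unquoted SMILES that start with [ are parsed as YAML lists, which is one failure it flags. It loads files with PyYAML, which Boltz relies on to read its inputs, only when run on the command line.

```python
import re
import sys

SEQ_RE = re.compile(r"[A-Z]+")  # one-letter codes only: no spaces, digits, or "..."


def validate(doc) -> list[str]:
    """Return human-readable problems found in a parsed Boltz input (empty if none)."""
    entries = doc.get("sequences") if isinstance(doc, dict) else None
    if not isinstance(entries, list) or not entries:
        return ["'sequences' must be a non-empty list"]
    errors, seen, ligand_ids = [], set(), set()
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict) or len(entry) != 1:
            errors.append(f"sequences[{i}]: expected one entity (protein, ligand, ...)")
            continue
        kind, body = next(iter(entry.items()))
        if not isinstance(body, dict):
            errors.append(f"sequences[{i}]: '{kind}' has no fields (check indentation)")
            continue
        ids = body.get("id")
        ids = ids if isinstance(ids, list) else [ids]
        for chain in ids:
            if not isinstance(chain, str):
                errors.append(f"sequences[{i}]: chain id {chain!r} is not a string")
            elif chain in seen:
                errors.append(f"sequences[{i}]: duplicate chain id {chain}")
            else:
                seen.add(chain)
        if kind == "protein":
            seq = body.get("sequence")
            if not isinstance(seq, str) or not SEQ_RE.fullmatch(seq):
                errors.append(f"sequences[{i}]: protein sequence must be uppercase one-letter codes")
        elif kind == "ligand":
            if ("smiles" in body) == ("ccd" in body):
                errors.append(f"sequences[{i}]: ligand needs exactly one of smiles or ccd")
            elif "smiles" in body and not isinstance(body["smiles"], str):
                errors.append(f"sequences[{i}]: SMILES parsed as {type(body['smiles']).__name__}; quote it")
            ligand_ids.update(c for c in ids if isinstance(c, str))
    for prop in doc.get("properties") or []:
        if prop == "affinity":  # bare "- affinity" entry with no binder
            errors.append("affinity must be a mapping with a 'binder' key")
        elif isinstance(prop, dict) and "affinity" in prop:
            aff = prop["affinity"]
            binder = aff.get("binder") if isinstance(aff, dict) else None
            if binder not in ligand_ids:
                errors.append(f"affinity binder {binder!r} is not a ligand chain id")
    return errors


if __name__ == "__main__" and len(sys.argv) > 1:
    import yaml  # PyYAML

    for path in sys.argv[1:]:
        with open(path) as handle:
            problems = validate(yaml.safe_load(handle))
        print(path, "OK" if not problems else problems)
```

Sequences copied with an elided "..." fail the one-letter check, which is intentional: placeholders must be replaced with the full sequence before running.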