8. Boltz-2
Boltz-2 (paper, code) is a biomolecular foundation model that jointly models complex structures and binding affinities. According to its authors, it is the first deep learning model to approach the accuracy of physics-based free-energy perturbation (FEP) methods while running roughly 1000x faster.
Why Use Boltz-2?
- Structure + Affinity: Predict both binding pose and binding strength
- Drug discovery ready: Affinity predictions useful for hit-to-lead optimization
- Multi-modal: Handles proteins, nucleic acids, small molecules, covalent modifications
- Speed: 1000x faster than FEP methods for affinity prediction
Related Tools: For structure prediction only, see Chai-1. For protein-ligand docking, see PLACER or DiffDock-PP.
Resource Requirements
| Resource | Minimum | Recommended | Notes |
|---|---|---|---|
| GPU RAM | 16 GB | 32+ GB | Scales with complex size |
| CPU RAM | 16 GB | 32 GB | For preprocessing |
| Disk Space | 5 GB | 10 GB | Model weights |
| Python | 3.10+ | 3.11 | Required |
Preparation
Prerequisites:
- Completed HPC Setup guide
- Conda/Mamba installed
- CUDA-capable GPU (recommended) or CPU
Important: Install Boltz in a fresh Python environment to avoid dependency conflicts.
Installation
- Create a fresh environment:
mamba create -n boltz python=3.11
mamba activate boltz
- Install Boltz with CUDA support (the quotes keep shells such as zsh from expanding the brackets):
pip install "boltz[cuda]" -U
For CPU-only or non-CUDA GPUs:
pip install boltz -U
Alternative: Install from GitHub (for latest updates):
git clone https://github.com/jwohlwend/boltz.git
cd boltz
pip install -e ".[cuda]"
Testing the Installation
Create a test YAML file test_input.yaml:
version: 1
sequences:
  - protein:
      id: [A, B]
      sequence: MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG
Run prediction:
boltz predict test_input.yaml --use_msa_server
Success indicators:
- Command completes without errors
- Output directory contains:
  - Predicted structure files (CIF format)
  - Confidence scores
Expected runtime: 1-3 minutes for this small test.
HPC Job Script
#!/bin/bash
#SBATCH --job-name=boltz
#SBATCH --partition=gpu
#SBATCH --gpus=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
#SBATCH --time=04:00:00
#SBATCH --output=%x_%j.out
module load cuda/12.1
# Uncomment if mamba is not initialized in batch shells on your cluster:
# source ~/.bashrc
mamba activate boltz
# Run prediction
boltz predict my_complex.yaml --use_msa_server --out_dir results/
Usage Examples
Structure prediction only (without the server, protein entries need a precomputed MSA given in the YAML msa field):
boltz predict structure.yaml
With MSA server (higher accuracy):
boltz predict input.yaml --use_msa_server
With affinity prediction:
# input.yaml
version: 1
sequences:
  - protein:
      id: A
      sequence: MKTVRQERLK...
  - ligand:
      id: L
      smiles: "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O"
properties:
  - affinity:
      binder: L
Then run:
boltz predict input.yaml
Input Format (YAML)
Boltz uses YAML files to describe biomolecules:
Simple protein:
version: 1
sequences:
  - protein:
      id: A
      sequence: MKTVRQERLKSIVRILERSKEPVSG...
Protein-ligand complex:
version: 1
sequences:
  - protein:
      id: A
      sequence: MKTVRQERLK...
  - ligand:
      id: L
      smiles: "CCO"
Protein complex (homodimer):
version: 1
sequences:
  - protein:
      id: [A, B] # Same sequence for both chains
      sequence: MKTVRQERLK...
With affinity prediction:
version: 1
sequences:
  - protein:
      id: A
      sequence: MKTVRQERLK...
  - ligand:
      id: L
      smiles: "CC(=O)NC1=CC=C(O)C=C1"
properties:
  - affinity:
      binder: L # Chain ID of the ligand whose affinity is predicted
See prediction documentation for full format details.
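When screening many ligands against one target, writing these files by hand gets error-prone. The following is a minimal sketch of generating the input text programmatically. The function name `build_boltz_yaml` is hypothetical, and the layout of the `properties` block (a top-level list with a `binder` chain ID) is an assumption based on the format shown above, so check it against the prediction documentation before relying on it.

```python
import json


def build_boltz_yaml(protein_seq, ligand_smiles, protein_id="A", ligand_id="L",
                     with_affinity=True):
    """Return Boltz input YAML text for one protein chain and one ligand."""
    lines = [
        "version: 1",
        "sequences:",
        "  - protein:",
        f"      id: {protein_id}",
        f"      sequence: {protein_seq}",
        "  - ligand:",
        f"      id: {ligand_id}",
        # json.dumps emits a double-quoted string, which is also a valid YAML
        # scalar, so SMILES characters such as '#', '[' or '\' stay intact.
        f"      smiles: {json.dumps(ligand_smiles)}",
    ]
    if with_affinity:
        # Assumption: affinity is requested through a top-level `properties`
        # list whose `binder` names the ligand chain.
        lines += ["properties:", "  - affinity:", f"      binder: {ligand_id}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_boltz_yaml("MKTVRQERLK", "CC(=O)NC1=CC=C(O)C=C1"))
```

Plain string formatting is used here to avoid an extra dependency; a YAML library would work equally well if one is already installed in the environment.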
Binding Affinity Predictions
Boltz-2 provides two affinity metrics:
| Metric | Range | Use Case |
|---|---|---|
| affinity_probability_binary | 0-1 | Hit discovery: probability that the ligand is a binder |
| affinity_pred_value | log10(IC50), with IC50 in μM | Lead optimization: compare binding strengths |
Interpretation:
- affinity_probability_binary: Higher = more likely to bind
- affinity_pred_value: Lower = stronger binding (lower IC50)
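Because affinity_pred_value is on a log scale, it can help to convert it back to concentration units when comparing compounds. The sketch below assumes only the definition given in the table, log10(IC50) with IC50 in μM; the function names are illustrative and not part of Boltz.

```python
def ic50_from_log10_um(affinity_pred_value):
    """Convert a log10(IC50 in uM) prediction to IC50 in uM and nM, plus pIC50."""
    ic50_um = 10.0 ** affinity_pred_value
    return {
        "ic50_uM": ic50_um,
        "ic50_nM": ic50_um * 1000.0,
        # pIC50 = -log10(IC50 in M); 1 uM = 1e-6 M, hence the offset of 6.
        "pIC50": 6.0 - affinity_pred_value,
    }


def rank_by_affinity(predictions):
    """Sort (name, affinity_pred_value) pairs from strongest to weakest predicted binder."""
    # Lower log10(IC50) means stronger predicted binding, so sort ascending.
    return sorted(predictions, key=lambda item: item[1])


if __name__ == "__main__":
    print(ic50_from_log10_um(-1.0))
    print(rank_by_affinity([("cmpd_a", 0.5), ("cmpd_b", -1.2), ("cmpd_c", 2.0)]))
```

A predicted value of 0 corresponds to an IC50 of 1 μM (pIC50 of 6), and each unit decrease is a tenfold gain in predicted potency.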
MSA Server Authentication
For servers requiring authentication:
export BOLTZ_MSA_TOKEN="your_token_here"
boltz predict input.yaml --use_msa_server
Understanding the Output
Output directory structure:
boltz_results_<input>/
├── predictions/
│ ├── model_0.cif # Predicted structure
│ └── confidence.json # Confidence scores
├── msa/ # Generated MSAs (if using server)
└── affinity/ # Affinity predictions (if requested)
Confidence metrics:
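- pLDDT: Per-residue local confidence
- pTM: Predicted TM-score (confidence in the overall fold)
- ipTM (interface pTM): Confidence in the interfaces between chains, relevant for complexes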
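To compare confidence across many runs, the JSON files can be collected with a few lines of Python. This is a sketch under stated assumptions: it searches recursively for any file matching `confidence*.json`, since the exact file names and nesting may differ between Boltz versions, and the metric keys `ptm`, `iptm`, and `complex_plddt` are assumed names that should be checked against a real output file.

```python
import json
from pathlib import Path


def collect_confidence(results_dir):
    """Return {relative file path: parsed JSON dict} for every confidence*.json found."""
    root = Path(results_dir)
    found = {}
    # rglob searches recursively, so nested per-input subfolders are covered too.
    for path in sorted(root.rglob("confidence*.json")):
        with path.open() as handle:
            found[str(path.relative_to(root))] = json.load(handle)
    return found


def summarize(scores, keys=("ptm", "iptm", "complex_plddt")):
    """Pick the requested metrics, skipping any key the file does not contain."""
    # The default key names are assumptions; adjust them to match your files.
    return {key: scores[key] for key in keys if key in scores}


if __name__ == "__main__":
    for name, scores in collect_confidence("boltz_results_input").items():
        print(name, summarize(scores))
```

Missing keys are skipped rather than raising an error, so the same script keeps working if a given run does not report every metric.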
Performance Comparison
| Method | Speed | Affinity Accuracy |
|---|---|---|
| FEP (physics-based) | Hours-days | Gold standard |
| Boltz-2 | Seconds-minutes | Approaches FEP |
| Traditional docking | Seconds | Lower accuracy |
Troubleshooting
Installation issues:
- Use a fresh environment
- Try removing [cuda] if CUDA issues arise
- Verify CUDA version compatibility
“MSA server error”:
- Check network connectivity
- Verify authentication token if required
- Try without --use_msa_server for testing
Out of memory:
- Request more GPU memory
- Reduce complex size
- Try CPU-only mode for testing
Slow without GPU:
- CPU mode is functional but significantly slower
- Always use GPU for production runs
YAML parsing errors:
- Check YAML syntax (indentation matters)
- Ensure SMILES strings are quoted
- Verify sequence format
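A quick check of protein sequences before submitting a job can catch copy-paste problems such as stray whitespace or leftover "..." placeholders. The sketch below is not part of Boltz; the helper name is hypothetical, and the accepted alphabet (the 20 standard amino acids plus X for unknown residues, uppercase only) is an assumption about what the model expects.

```python
# Assumed alphabet: 20 standard amino acids plus X for unknown residues.
ALLOWED = set("ACDEFGHIKLMNPQRSTVWYX")


def check_protein_sequence(seq):
    """Return a list of problems found in a protein sequence (empty list if none)."""
    if not seq:
        return ["sequence is empty"]
    problems = []
    if any(ch.isspace() for ch in seq):
        problems.append("sequence contains whitespace")
    compact = "".join(seq.split())
    if compact != compact.upper():
        problems.append("sequence contains lowercase letters")
    bad = sorted(set(compact.upper()) - ALLOWED)
    if bad:
        problems.append("unexpected characters: " + "".join(bad))
    return problems


if __name__ == "__main__":
    for candidate in ["MKTVRQERLK", "MKTVRQERLK...", "mktv rqer"]:
        print(repr(candidate), check_protein_sequence(candidate))
```

Returning a list of messages instead of raising on the first problem makes it easy to report every issue in a batch of inputs at once.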