8. Boltz-2

Boltz-2 (paper, code) is a biomolecular foundation model that jointly models complex structures and binding affinities. According to its authors, it is the first deep learning model to approach the accuracy of physics-based free-energy perturbation (FEP) methods while running roughly 1000x faster.

Why Use Boltz-2?

  • Structure + Affinity: Predict both binding pose and binding strength
  • Drug discovery ready: Affinity predictions useful for hit-to-lead optimization
  • Multi-modal: Handles proteins, nucleic acids, small molecules, covalent modifications
  • Speed: 1000x faster than FEP methods for affinity prediction

Related Tools: For structure prediction only, see Chai-1. For protein-ligand docking, see PLACER or DiffDock-PP.

Resource Requirements

Resource     Minimum   Recommended   Notes
GPU RAM      16 GB     32+ GB        Scales with complex size
CPU RAM      16 GB     32 GB         For preprocessing
Disk Space   5 GB      10 GB         Model weights
Python       3.9+      3.11          Required

Preparation

Prerequisites:

  • Completed HPC Setup guide
  • Conda/Mamba installed
  • CUDA-capable GPU (recommended) or CPU

Important: Install Boltz in a fresh Python environment to avoid dependency conflicts.

Installation

  1. Create a fresh environment:
mamba create -n boltz python=3.11
mamba activate boltz
  2. Install Boltz with CUDA support (the quotes keep shells such as zsh from expanding the brackets):
pip install "boltz[cuda]" -U

For CPU-only or non-CUDA GPUs:

pip install boltz -U

Alternative: Install from GitHub (for latest updates):

git clone https://github.com/jwohlwend/boltz.git
cd boltz
pip install -e ".[cuda]"

Testing the Installation

Create a test YAML file test_input.yaml:

version: 1
sequences:
  - protein:
      id: [A, B]
      sequence: MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG

Run prediction:

boltz predict test_input.yaml --use_msa_server

Success indicators:

  • Command completes without errors
  • Output directory contains:
    • Predicted structure files (CIF format)
    • Confidence scores

Expected runtime: 1-3 minutes for this small test.
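
The success indicators above can also be checked programmatically, which is convenient at the end of a batch job. This is a minimal sketch that searches the output directory recursively, so it does not depend on the exact folder layout. The function name check_boltz_outputs is hypothetical, and matching confidence files by the word "confidence" in the file name is an assumption.

```python
from pathlib import Path


def check_boltz_outputs(out_dir):
    """Report which structure and confidence files exist under out_dir.

    Hypothetical helper: looks for any *.cif file and any *.json file whose
    name contains "confidence", wherever they sit below out_dir.
    """
    root = Path(out_dir)
    structures = sorted(root.rglob("*.cif"))
    confidence = sorted(p for p in root.rglob("*.json") if "confidence" in p.name)
    return {
        "structures": structures,
        "confidence": confidence,
        # The run counts as successful only if both kinds of file are present.
        "ok": bool(structures) and bool(confidence),
    }
```

For the test above, calling check_boltz_outputs("boltz_results_test_input") and inspecting the "ok" entry gives a quick pass/fail signal.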

HPC Job Script

#!/bin/bash
#SBATCH --job-name=boltz
#SBATCH --partition=gpu
#SBATCH --gpus=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
#SBATCH --time=04:00:00
#SBATCH --output=%x_%j.out

module load cuda/12.1

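# If "mamba activate" fails inside the batch job, uncomment the next line
# so the job loads your shell initialization first.
# source ~/.bashrc
mamba activate boltz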

# Run prediction
boltz predict my_complex.yaml --use_msa_server --out_dir results/

Usage Examples

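Structure prediction only (each protein chain in the YAML must then supply its own MSA through the msa field, or msa: empty for single-sequence mode):

boltz predict structure.yaml

With automatic MSA generation through the MSA server (usually more accurate than single-sequence mode):

boltz predict input.yaml --use_msa_server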

With affinity prediction:

# input.yaml
version: 1
sequences:
  - protein:
      id: A
      sequence: MKTVRQERLK...
  - ligand:
      id: L
      smiles: "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O"
properties:
  - affinity:
      binder: L  # chain ID of the ligand whose affinity is predicted

Then run:

boltz predict input.yaml --use_msa_server

Input Format (YAML)

Boltz uses YAML files to describe biomolecules:

Simple protein:

version: 1
sequences:
  - protein:
      id: A
      sequence: MKTVRQERLKSIVRILERSKEPVSG...

Protein-ligand complex:

version: 1
sequences:
  - protein:
      id: A
      sequence: MKTVRQERLK...
  - ligand:
      id: L
      smiles: "CCO"

Protein complex (homodimer):

version: 1
sequences:
  - protein:
      id: [A, B]  # Same sequence for both chains
      sequence: MKTVRQERLK...

With affinity prediction:

version: 1
sequences:
  - protein:
      id: A
      sequence: MKTVRQERLK...
  - ligand:
      id: L
      smiles: "CC(=O)NC1=CC=C(O)C=C1"
properties:
  - affinity:
      binder: L  # chain ID of the ligand whose affinity is predicted

See prediction documentation for full format details.
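
When screening several ligands against one protein, writing each YAML file by hand invites indentation and quoting mistakes. The following is a minimal sketch of generating the protein-ligand layout shown above from Python. The function name build_boltz_yaml and its parameters are hypothetical helpers for illustration, not part of the Boltz package, and it only covers the fields used in this section.

```python
import json


def build_boltz_yaml(protein_seq, smiles, protein_ids=("A",), ligand_id="L"):
    """Return YAML text for one protein (one or more identical chains) plus one ligand.

    Hypothetical helper: it mirrors the examples in this guide and is not a Boltz API.
    """
    # Strip whitespace/newlines that often sneak in when pasting FASTA text.
    seq = "".join(protein_seq.split()).upper()

    # A single chain is written as "A"; identical chains as "[A, B]".
    if len(protein_ids) == 1:
        id_field = protein_ids[0]
    else:
        id_field = "[" + ", ".join(protein_ids) + "]"

    lines = [
        "version: 1",
        "sequences:",
        "  - protein:",
        f"      id: {id_field}",
        f"      sequence: {seq}",
        "  - ligand:",
        f"      id: {ligand_id}",
        # json.dumps yields a double-quoted string with escaped backslashes,
        # which is also a valid YAML double-quoted scalar.
        f"      smiles: {json.dumps(smiles)}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print one generated input for ethanol against a short toy sequence.
    print(build_boltz_yaml("MKTVRQERLK", "CCO"))
```

Writing the returned text to one file per ligand (for example with pathlib.Path.write_text) gives a set of inputs that can be passed to boltz predict one at a time.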

Binding Affinity Predictions

Boltz-2 provides two affinity metrics:

Metric                        Range / Units             Use Case
affinity_probability_binary   0-1                       Hit discovery: probability that the ligand is a binder
affinity_pred_value           log10(IC50), IC50 in μM   Lead optimization: compare binding strengths

Interpretation:

  • affinity_probability_binary: Higher = more likely to bind
  • affinity_pred_value: Lower = stronger binding (lower IC50)
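
Because affinity_pred_value is the base-10 logarithm of an IC50 expressed in μM, it can be converted into more familiar units with simple arithmetic. The sketch below shows the conversion; the function names are hypothetical and not part of Boltz.

```python
def ic50_micromolar(affinity_pred_value):
    """Convert log10(IC50 in uM) back to an IC50 in uM."""
    return 10.0 ** affinity_pred_value


def ic50_nanomolar(affinity_pred_value):
    """Same IC50 expressed in nM (1 uM = 1000 nM)."""
    return ic50_micromolar(affinity_pred_value) * 1000.0


def pic50(affinity_pred_value):
    """pIC50 = -log10(IC50 in M).

    Since IC50[M] = IC50[uM] * 1e-6, this simplifies to 6 - log10(IC50[uM]).
    """
    return 6.0 - affinity_pred_value


if __name__ == "__main__":
    for value in (0.0, -1.5, -3.0):
        print(value, ic50_micromolar(value), ic50_nanomolar(value), pic50(value))
```

For orientation, a predicted value of 0 corresponds to an IC50 of 1 μM (pIC50 of 6), and a value of -3 corresponds to 1 nM (pIC50 of 9).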

MSA Server Authentication

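For MSA servers that require authentication, Boltz reads credentials from environment variables (equivalent command-line options also exist; check boltz predict --help for your installed version).

Basic authentication:

export BOLTZ_MSA_USERNAME="your_username"
export BOLTZ_MSA_PASSWORD="your_password"
boltz predict input.yaml --use_msa_server

API key authentication:

export MSA_API_KEY_VALUE="your_token_here"
boltz predict input.yaml --use_msa_server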

Understanding the Output

Output directory structure:

boltz_results_<input>/
├── predictions/
│   └── <input>/
│       ├── <input>_model_0.cif              # Predicted structure
│       ├── confidence_<input>_model_0.json  # Confidence scores
│       └── affinity_<input>.json            # Affinity predictions (if requested)
├── msa/                                     # Generated MSAs (if using server)
└── processed/                               # Preprocessed inputs

Confidence metrics:

  • pLDDT: Per-residue confidence
  • pTM: Predicted TM-score
  • ipTM (interface pTM): Confidence in the predicted interfaces between chains; relevant for complexes
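
When a run produces several models, it helps to collect the confidence files into one ranked list. The sketch below searches an output directory recursively for confidence JSON files and sorts them. The key names confidence_score, ptm, and iptm are assumptions about the JSON contents and should be checked against an actual output file; load_confidence is a hypothetical helper, not a Boltz function.

```python
import json
from pathlib import Path


def load_confidence(out_dir):
    """Collect confidence JSON files under out_dir, best confidence_score first.

    Assumes each file is a JSON object that may contain the keys
    "confidence_score", "ptm", and "iptm"; missing keys become None.
    """
    rows = []
    for path in sorted(Path(out_dir).rglob("confidence*.json")):
        with path.open() as handle:
            data = json.load(handle)
        rows.append(
            {
                "file": path.name,
                "confidence_score": data.get("confidence_score"),
                "ptm": data.get("ptm"),
                "iptm": data.get("iptm"),
            }
        )
    # Highest score first; entries without a score go to the end.
    rows.sort(
        key=lambda r: (
            r["confidence_score"] is None,
            -(r["confidence_score"] or 0.0),
        )
    )
    return rows
```

Calling load_confidence on the results directory of a finished run returns one dictionary per model, so the first entry is the model to inspect first.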

Performance Comparison

Method                Speed             Affinity Accuracy
FEP (physics-based)   Hours-days        Gold standard
Boltz-2               Seconds-minutes   Approaches FEP
Traditional docking   Seconds           Lower accuracy

Troubleshooting

Warning: Common Issues

Installation issues:

  • Use a fresh environment
  • Try removing [cuda] if CUDA issues arise
  • Verify CUDA version compatibility

“MSA server error”:

  • Check network connectivity
  • Verify authentication token if required
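  • Try without --use_msa_server for testing (protein chains then need an msa entry in the YAML: a precomputed MSA file, or msa: empty for single-sequence mode)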

Out of memory:

  • Request more GPU memory
  • Reduce complex size
  • Try CPU-only mode for testing

Slow without GPU:

  • CPU mode is functional but significantly slower
  • Always use GPU for production runs

YAML parsing errors:

  • Check YAML syntax (indentation matters)
  • Ensure SMILES strings are quoted
  • Verify sequence format
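
Two of the checks above are easy to automate before submitting a job. The sketch below flags characters outside the 20 standard one-letter amino acid codes and finds lines indented with tabs, which YAML does not allow. Both function names are hypothetical, and restricting sequences to the 20 canonical letters is an assumption; Boltz may accept additional symbols.

```python
# The 20 standard amino acids in one-letter code (assumed allowed alphabet).
VALID_AA = set("ACDEFGHIKLMNPQRSTVWY")


def find_invalid_residues(sequence):
    """Return (1-based position, character) for every unexpected character."""
    return [
        (pos, ch)
        for pos, ch in enumerate(sequence, start=1)
        if ch not in VALID_AA
    ]


def find_tab_indented_lines(yaml_text):
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    bad = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        leading = line[: len(line) - len(line.lstrip())]
        if "\t" in leading:
            bad.append(number)
    return bad
```

An empty list from both functions means the sequence and indentation pass these two checks; it does not guarantee that the file is otherwise valid.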