7. Chai-1
Chai-1 (paper, code) is a multi-modal foundation model for molecular structure prediction that achieves state-of-the-art performance across diverse benchmarks. Chai-1 enables unified prediction of proteins, small molecules, DNA, RNA, glycosylations, and more.
Why Use Chai-1?
- Multi-modal: Predict proteins, nucleic acids, small molecules, and modifications in one model
- State-of-the-art: Top performance on structure prediction benchmarks
- Flexible inputs: Handles complex multi-component assemblies
- Experimental restraints: Can incorporate known distance constraints
Related Tools: For protein-only predictions, see ESMFold (faster) or LocalColabFold (MSA-based). For binding affinity predictions, see Boltz-2.
Resource Requirements
| Resource | Minimum | Recommended | Notes |
|---|---|---|---|
| GPU RAM | 24 GB | 80 GB | A100 80GB or H100 ideal |
| CPU RAM | 32 GB | 64 GB | For preprocessing |
| Disk Space | 10 GB | 20 GB | Model weights |
| Python | 3.10+ | 3.11 | Required |
GPU Compatibility: Requires bfloat16 support. Compatible GPUs include:
- A100, H100, L40S (recommended)
- A10, A30, RTX 4090 (works)
- Older GPUs may not support bfloat16
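As a rough pre-check before requesting a node, the list above can be approximated by CUDA compute capability: NVIDIA GPUs from the Ampere generation onward (compute capability 8.0 or higher) support bfloat16 natively. The sketch below encodes that rule. The function name and the small lookup table are illustrative and not part of Chai-1.

```python
# Illustrative helper (not part of chai_lab): native bfloat16 support
# corresponds to NVIDIA compute capability 8.0 (Ampere) or newer.
KNOWN_CAPABILITIES = {
    "V100": (7, 0),
    "T4": (7, 5),
    "A100": (8, 0),
    "A30": (8, 0),
    "A10": (8, 6),
    "L40S": (8, 9),
    "RTX 4090": (8, 9),
    "H100": (9, 0),
}


def has_native_bf16(major: int, minor: int = 0) -> bool:
    """Return True if the compute capability is 8.0 or higher."""
    return (major, minor) >= (8, 0)


if __name__ == "__main__":
    for name, cap in KNOWN_CAPABILITIES.items():
        status = "ok" if has_native_bf16(*cap) else "no native bfloat16"
        print(f"{name:10s} sm_{cap[0]}{cap[1]}: {status}")
```

On a node with PyTorch installed, `torch.cuda.get_device_capability()` returns the `(major, minor)` tuple to pass in.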
Preparation
Prerequisites:
- Completed HPC Setup guide
- GPU with bfloat16 support
- Python 3.10+
Verify bfloat16 support:
import torch
print(torch.cuda.is_bf16_supported())  # Should print True
Installation
- Create a conda environment:
mamba create -n chailab python=3.11
mamba activate chailab
- Install Chai-1:
pip install chai_lab==0.6.1
Expected download: ~5-10 GB of model weights (downloaded on first run).
Alternative: Latest development version:
pip install git+https://github.com/chaidiscovery/chai-lab.git
Testing the Installation
Create a test FASTA file test.fasta:
>protein|name=example
MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG
Run prediction:
chai-lab fold test.fasta output_folder/
Success indicators:
- Command completes without errors
- output_folder/ contains:
  - pred.model_idx_0.cif - Predicted structure
  - scores.model_idx_0.npz - Confidence scores
Expected runtime: 2-5 minutes for first run (includes model download), ~30 seconds for subsequent runs.
Note: By default, this generates 5 sample predictions using embeddings without MSAs.
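The success check above can be scripted. The following is a minimal sketch that assumes the file naming shown in this section (`pred.model_idx_N.cif`, `scores.model_idx_N.npz`) and one pair of files per sample. The function name is hypothetical.

```python
from pathlib import Path


def missing_outputs(output_dir: str, num_samples: int = 5) -> list[str]:
    """Return the expected output files that are absent from output_dir."""
    root = Path(output_dir)
    expected = []
    for idx in range(num_samples):
        expected.append(f"pred.model_idx_{idx}.cif")
        expected.append(f"scores.model_idx_{idx}.npz")
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_outputs("output_folder", num_samples=5)
    print("all outputs present" if not missing else f"missing: {missing}")
```

An empty list means every expected file was written. Anything else lists the files to look for in the job log.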
HPC Job Script
#!/bin/bash
#SBATCH --job-name=chai
#SBATCH --partition=gpu
#SBATCH --gpus=a100:1
#SBATCH --cpus-per-task=8
#SBATCH --mem=64G
#SBATCH --time=04:00:00
#SBATCH --output=%x_%j.out
module load cuda/12.1
# source ~/.bashrc
mamba activate chailab
# Set custom download directory (avoid filling home)
export CHAI_DOWNLOADS_DIR=/scratch/$USER/chai_models
# Run prediction with MSAs
chai-lab fold --use-msa-server --use-templates-server \
my_complex.fasta \
predictions/
Usage Examples
Basic prediction (no MSAs, fast):
chai-lab fold input.fasta output/
With MSAs (higher accuracy, uses ColabFold server):
chai-lab fold --use-msa-server --use-templates-server input.fasta output/
Using internal MSA server (if your HPC has one):
chai-lab fold --use-msa-server \
--msa-server-url "https://internal.colabserver.edu" \
input.fasta output/
More trunk recycles (slower; this changes refinement effort, not the number of samples):
chai-lab fold --num-trunk-recycles 5 --num-diffn-timesteps 200 \
input.fasta output/
Input Format
Chai-1 uses a modified FASTA format with entity type headers:
Protein:
>protein|name=my_protein
MKTVRQERLKSIVRILERSKEPVSG...
Ligand (SMILES):
>ligand|name=my_drug
CC(C)CC1=CC=C(C=C1)C(C)C(=O)O
DNA:
>dna|name=promoter
ATGCATGCATGCATGC
RNA:
>rna|name=aptamer
AUGCAUGCAUGCAUGC
Protein complex (multiple chains):
>protein|name=chain_A
MKTVRQERLK...
>protein|name=chain_B
MVKLTAEGSE...
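For multi-component jobs it can be easier to generate the input file than to write it by hand. The sketch below assembles records in the `>type|name=...` header format shown above. The helper names are hypothetical, and the set of entity types is limited to the four illustrated in this section.

```python
from pathlib import Path

# Entity types illustrated in this guide; Chai-1 may accept others.
ENTITY_TYPES = {"protein", "ligand", "dna", "rna"}


def fasta_record(entity_type: str, name: str, sequence: str) -> str:
    """Format one entity as a Chai-1 style FASTA record."""
    if entity_type not in ENTITY_TYPES:
        raise ValueError(f"unsupported entity type: {entity_type!r}")
    return f">{entity_type}|name={name}\n{sequence.strip()}\n"


def write_fasta(path: str, entities: list[tuple[str, str, str]]) -> str:
    """Write (entity_type, name, sequence) tuples to path and return the text."""
    text = "".join(fasta_record(*entity) for entity in entities)
    Path(path).write_text(text)
    return text


if __name__ == "__main__":
    # Protein sequence and ligand SMILES are the examples used in this guide.
    print(write_fasta("complex.fasta", [
        ("protein", "example",
         "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG"),
        ("ligand", "my_drug", "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O"),
    ]))
```

Validating the entity type up front catches typos in headers before a GPU job is queued.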
Python API
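from pathlib import Path

from chai_lab.chai1 import run_inference

# Paths are passed as pathlib.Path objects, matching the type hints in chai_lab.
# The output directory should be empty or not yet exist.
results = run_inference(
    fasta_file=Path("input.fasta"),
    output_dir=Path("output/"),
    num_trunk_recycles=3,
    num_diffn_timesteps=200,
    seed=42,
)
See examples/predict_structure.py in the repository for more details.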
Advanced Features
Custom Templates:
chai-lab fold --custom-template template.cif input.fasta output/
Experimental Restraints: Specify inter-chain contacts:
# See: github.com/chaidiscovery/chai-lab/tree/main/examples/restraints
Covalent Bonds: Specify covalent modifications:
# See: github.com/chaidiscovery/chai-lab/tree/main/examples/covalent_bonds
Understanding the Output
| File | Description |
|---|---|
| pred.model_idx_N.cif | Predicted structure (mmCIF format) |
| scores.model_idx_N.npz | Confidence scores |
| msa_*.a3m | Generated MSAs (if using MSA server) |
Confidence metrics (in scores file):
- pLDDT: Per-residue confidence
- pTM: Predicted TM-score
- pAE: Predicted aligned error
- interface scores: For multi-chain predictions
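The array names stored in a scores file can differ between chai_lab versions, so it is safer to list what a given file contains than to assume key names. A minimal sketch using NumPy follows; the function name is hypothetical.

```python
import numpy as np


def summarize_scores(npz_path: str) -> dict:
    """Map each array name in a scores .npz file to its value or shape.

    Single-element arrays are unwrapped to Python scalars; larger arrays
    are reported by shape only.
    """
    summary = {}
    with np.load(npz_path) as data:
        for key in data.files:
            arr = data[key]
            summary[key] = arr.item() if arr.size == 1 else arr.shape
    return summary


if __name__ == "__main__":
    for name, value in summarize_scores("output/scores.model_idx_0.npz").items():
        print(f"{name}: {value}")
```

Running this on each `scores.model_idx_N.npz` file shows which metrics are available for comparing samples.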
Web Server
For quick tests without installation: lab.chaidiscovery.com
Troubleshooting
“bfloat16 not supported”:
- Your GPU doesn’t support bfloat16
- Try a newer GPU (A100, H100, RTX 4090)
- Older GPUs (V100, etc.) may not work
Out of memory:
- Request GPU with more memory
- Reduce --num-diffn-timesteps
- For very large complexes, split into smaller units
Model download location:
# Set before running
export CHAI_DOWNLOADS_DIR=/scratch/$USER/chai_models
MSA server rate limits:
- The public ColabFold MMseqs2 server is a shared resource
- For batch jobs, space out requests
- Consider setting up a local MSA server for high-throughput
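One way to space out requests is to run predictions sequentially with a pause between jobs. The sketch below shells out to the `chai-lab fold` command used throughout this guide. The helper names and the 60-second default delay are arbitrary assumptions, not a documented limit of the ColabFold server.

```python
import subprocess
import time


def fold_command(fasta: str, output_dir: str, use_msa_server: bool = True) -> list[str]:
    """Build the chai-lab fold command line for one input file."""
    cmd = ["chai-lab", "fold"]
    if use_msa_server:
        cmd.append("--use-msa-server")
    cmd += [str(fasta), str(output_dir)]
    return cmd


def run_batch(jobs, delay_seconds: float = 60.0,
              runner=subprocess.run, sleep=time.sleep) -> list[list[str]]:
    """Run (fasta, output_dir) jobs one at a time, pausing between them."""
    commands = []
    for index, (fasta, output_dir) in enumerate(jobs):
        if index > 0:
            sleep(delay_seconds)  # wait before the next MSA server request
        cmd = fold_command(fasta, output_dir)
        runner(cmd, check=True)  # raises CalledProcessError on failure
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    run_batch([("complex_1.fasta", "out_1/"), ("complex_2.fasta", "out_2/")])
```

The `runner` and `sleep` parameters exist so the loop can be dry-run without a GPU or a real delay.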
Slow first run:
- First run downloads ~5-10 GB of model weights
- Subsequent runs are much faster
- Set CHAI_DOWNLOADS_DIR to avoid re-downloading
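When using the Python API rather than the CLI, the same variable can be set from within the script. This sketch assumes chai_lab reads `CHAI_DOWNLOADS_DIR` when it is imported, so the helper (a hypothetical name) should be called before any `chai_lab` import.

```python
import os
from pathlib import Path


def set_chai_downloads_dir(directory: str) -> Path:
    """Create the directory if needed and export CHAI_DOWNLOADS_DIR."""
    path = Path(directory).expanduser()
    path.mkdir(parents=True, exist_ok=True)
    os.environ["CHAI_DOWNLOADS_DIR"] = str(path)
    return path


if __name__ == "__main__":
    # Assumption: call this before importing chai_lab.
    user = os.environ.get("USER", "unknown")
    print(set_chai_downloads_dir(f"/scratch/{user}/chai_models"))
```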