7. Chai-1

Chai-1 (paper, code) is a multi-modal foundation model for molecular structure prediction that achieves state-of-the-art performance across diverse benchmarks. Chai-1 enables unified prediction of proteins, small molecules, DNA, RNA, glycosylations, and more.

Why Use Chai-1?

Multi-modal: Predict proteins, nucleic acids, small molecules, and modifications in one model
State-of-the-art: Top performance on structure prediction benchmarks
Flexible inputs: Handles complex multi-component assemblies
Experimental restraints: Can incorporate known distance constraints

Related Tools: For protein-only predictions, see ESMFold (faster) or LocalColabFold (MSA-based). For binding affinity predictions, see Boltz-2.

Resource Requirements

Resource	Minimum	Recommended	Notes
GPU RAM	24 GB	80 GB	A100 80GB or H100 ideal
CPU RAM	32 GB	64 GB	For preprocessing
Disk Space	10 GB	20 GB	Model weights
Python	3.10+	3.11	Required

GPU Compatibility: Requires bfloat16 support. Compatible GPUs include:

A100, H100, L40S (recommended)
A10, A30, RTX 4090 (works)
Older GPUs may not support bfloat16

Preparation

Prerequisites:

Completed HPC Setup guide
GPU with bfloat16 support
Python 3.10+

Verify bfloat16 support:

import torch
print(torch.cuda.is_bf16_supported())  # Should print True
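The one-liner above can be extended into a small pre-flight check that also compares GPU memory against the 24 GB minimum from the resource table. This is a sketch under stated assumptions. The compute-capability threshold of 8.0 reflects the fact that native bfloat16 arrived with Ampere-class GPUs (A100, A10, A30). Some recent PyTorch versions can report `is_bf16_supported()` as True when bfloat16 is only emulated, so the sketch reports both values. The 5% memory slack is our own choice, because marketed "24 GB" cards report slightly less usable memory.

```python
from __future__ import annotations

# Pre-flight GPU check for Chai-1 (a sketch; thresholds follow this guide's
# resource table and can be adjusted for your site).
MIN_GPU_GB = 24          # "Minimum" GPU RAM row, in decimal GB
MIN_CAPABILITY = (8, 0)  # sm_80 (Ampere) and newer have native bfloat16
SLACK = 0.95             # assumption: tolerate rounded marketing capacities


def gpu_preflight(min_gb: float = MIN_GPU_GB) -> dict:
    """Report whether GPU 0 looks suitable; never raises if torch/CUDA is missing."""
    report = {"torch": False, "cuda": False, "bf16": False, "gpu_gb": 0.0, "ok": False}
    try:
        import torch
    except ImportError:
        return report  # torch is not installed in this environment
    report["torch"] = True
    if not torch.cuda.is_available():
        return report  # no visible GPU, e.g. when run on a login node
    report["cuda"] = True
    props = torch.cuda.get_device_properties(0)
    report["name"] = props.name
    report["gpu_gb"] = props.total_memory / 1e9
    # Native bfloat16 support is decided by compute capability, not just the flag
    report["bf16"] = torch.cuda.get_device_capability(0) >= MIN_CAPABILITY
    report["bf16_reported"] = bool(torch.cuda.is_bf16_supported())
    report["ok"] = report["bf16"] and report["gpu_gb"] >= SLACK * min_gb
    return report


if __name__ == "__main__":
    print(gpu_preflight())
```

Run it inside a GPU job rather than on a login node; there it reports `"cuda": False` and `"ok": False`.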

Installation

Create a conda environment:

mamba create -n chailab python=3.11
mamba activate chailab

Install Chai-1:

pip install chai_lab==0.6.1

Note: the ~5-10 GB of model weights are not part of the pip package; they are downloaded automatically on the first prediction run.

Alternative: Latest development version:

pip install git+https://github.com/chaidiscovery/chai-lab.git

Testing the Installation

Create a test FASTA file test.fasta:

>protein|name=example
MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG

Run prediction:

chai-lab fold test.fasta output_folder/

Success indicators:

Command completes without errors
output_folder/ contains one pair of files per sample N (0-4 by default):
- pred.model_idx_N.cif - Predicted structure (mmCIF)
- scores.model_idx_N.npz - Confidence scores

Expected runtime: 2-5 minutes for first run (includes model download), ~30 seconds for subsequent runs.

Note: By default, this generates 5 samples using ESM language-model embeddings and no MSAs.
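To confirm that a run produced every expected file, a short check like the one below can help. It is a minimal sketch that assumes the naming pattern listed under the success indicators and the default of 5 samples; pass a different `num_samples` if you changed it. The script name `check_outputs.py` used in the usage line is hypothetical.

```python
from __future__ import annotations

import sys
from pathlib import Path


def missing_outputs(out_dir: str | Path, num_samples: int = 5) -> list[str]:
    """List the expected Chai-1 output files that are not present in out_dir."""
    out = Path(out_dir)
    expected = []
    for i in range(num_samples):
        # One structure file and one score file per sample index
        expected.append(f"pred.model_idx_{i}.cif")
        expected.append(f"scores.model_idx_{i}.npz")
    return [name for name in expected if not (out / name).is_file()]


if __name__ == "__main__":
    folder = sys.argv[1] if len(sys.argv) > 1 else "output_folder"
    missing = missing_outputs(folder)
    print("Missing: " + ", ".join(missing) if missing else "All expected outputs present.")
```

Usage: `python check_outputs.py output_folder/`.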

HPC Job Script

#!/bin/bash
#SBATCH --job-name=chai
#SBATCH --partition=gpu
#SBATCH --gpus=a100:1
#SBATCH --cpus-per-task=8
#SBATCH --mem=64G
#SBATCH --time=04:00:00
#SBATCH --output=%x_%j.out

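module load cuda/12.1

# Batch jobs run a non-interactive shell that does not read ~/.bashrc,
# so load it here (assumes your conda/mamba init block lives there)
source ~/.bashrc
mamba activate chailab

# Set custom download directory (avoid filling home)
export CHAI_DOWNLOADS_DIR=/scratch/$USER/chai_models
mkdir -p "$CHAI_DOWNLOADS_DIR"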

# Run prediction with MSAs
chai-lab fold --use-msa-server --use-templates-server \
    my_complex.fasta \
    predictions/

Usage Examples

Basic prediction (no MSAs, fast):

chai-lab fold input.fasta output/

With MSAs and templates (generally higher accuracy; uses the ColabFold server):

chai-lab fold --use-msa-server --use-templates-server input.fasta output/

Using internal MSA server (if your HPC has one):

chai-lab fold --use-msa-server \
    --msa-server-url "https://internal.colabserver.edu" \
    input.fasta output/

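Generate more samples (default is 5):

chai-lab fold --num-diffn-samples 10 input.fasta output/

Increase trunk recycles (default 3) at the cost of longer runtime; 200 diffusion timesteps is already the default:

chai-lab fold --num-trunk-recycles 5 --num-diffn-timesteps 200 \
    input.fasta output/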

Input Format

Chai-1 uses a modified FASTA format with entity type headers:

Protein:

>protein|name=my_protein
MKTVRQERLKSIVRILERSKEPVSG...

Ligand (SMILES):

>ligand|name=my_drug
CC(C)CC1=CC=C(C=C1)C(C)C(=O)O

DNA:

>dna|name=promoter
ATGCATGCATGCATGC

RNA:

>rna|name=aptamer
AUGCAUGCAUGCAUGC

Protein complex (multiple chains):

>protein|name=chain_A
MKTVRQERLK...
>protein|name=chain_B
MVKLTAEGSE...
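When inputs are assembled programmatically, a small helper can emit these entity-typed headers consistently. This is a sketch, not part of chai-lab: the function name `chai_fasta` is hypothetical, the accepted types are only the four shown in the examples above, and rejecting duplicate names is our own precaution rather than a documented Chai-1 rule. Sequences are upper-cased except for ligands, because SMILES strings are case-sensitive (lowercase marks aromatic atoms).

```python
from __future__ import annotations

# Entity types used in this guide's examples. Chai-1 supports more (e.g. glycans);
# check the chai-lab README for their header names before extending this set.
ENTITY_TYPES = {"protein", "ligand", "dna", "rna"}


def chai_fasta(records: list[tuple[str, str, str]]) -> str:
    """Build Chai-1 FASTA text from (entity_type, name, sequence_or_smiles) tuples."""
    lines: list[str] = []
    seen: set[str] = set()
    for entity, name, seq in records:
        if entity not in ENTITY_TYPES:
            raise ValueError(f"unsupported entity type: {entity!r}")
        if name in seen:  # our own precaution, not a documented Chai-1 rule
            raise ValueError(f"duplicate record name: {name!r}")
        seen.add(name)
        seq = seq.strip()
        if entity != "ligand":
            seq = seq.upper()  # SMILES is case-sensitive, so ligands are left as-is
        lines.append(f">{entity}|name={name}")
        lines.append(seq)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = chai_fasta([
        ("protein", "chain_A", "MKTVRQERLK"),
        ("ligand", "my_drug", "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O"),
    ])
    print(text, end="")
```

Write the returned string to disk (for example with `pathlib.Path.write_text`) and pass that file to `chai-lab fold`.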

Python API

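from pathlib import Path

from chai_lab.chai1 import run_inference

# Paths are passed as pathlib.Path objects, as in the repository's example script
candidates = run_inference(
    fasta_file=Path("input.fasta"),
    output_dir=Path("output"),
    num_trunk_recycles=3,
    num_diffn_timesteps=200,
    seed=42,
    device="cuda:0",
    use_esm_embeddings=True,
)

# Paths of the predicted mmCIF files
print(candidates.cif_paths)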

See examples/predict_structure.py in the repository for more details.

Advanced Features

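Custom Templates: --use-templates-server fetches templates automatically. Recent chai-lab releases can also read a precomputed template-hits file (--template-hits-path); template options have changed between versions, so confirm them for your install:

chai-lab fold --help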

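Experimental Restraints: Specify inter-chain contacts or pocket restraints in a CSV file and pass it with --constraint-path:

chai-lab fold --constraint-path restraints.csv input.fasta output/
# CSV format: github.com/chaidiscovery/chai-lab/tree/main/examples/restraints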

Covalent Bonds: Specify covalent modifications:

# See: github.com/chaidiscovery/chai-lab/tree/main/examples/covalent_bonds

Understanding the Output

File	Description
`pred.model_idx_N.cif`	Predicted structure (mmCIF format)
`scores.model_idx_N.npz`	Confidence scores
`msa_*.a3m`	Generated MSAs (if using MSA server)

Confidence metrics (in scores file):

pLDDT: Per-residue confidence
pTM: Predicted TM-score
pAE: Predicted aligned error
interface scores: For multi-chain predictions
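To pick the best of the generated samples, the score files can be ranked with NumPy. This is a sketch under one explicit assumption: that each `scores.model_idx_N.npz` contains an `aggregate_score` entry and that higher is better, as in recent chai-lab releases. If your version differs, print `np.load(path).files` to see which keys it actually writes and change `SCORE_KEY`.

```python
from __future__ import annotations

import re
from pathlib import Path

import numpy as np

# ASSUMPTION: the score key name and "higher is better" follow recent chai-lab releases
SCORE_KEY = "aggregate_score"
PATTERN = re.compile(r"scores\.model_idx_(\d+)\.npz$")


def rank_samples(out_dir: str | Path, key: str = SCORE_KEY) -> list[tuple[int, float]]:
    """Return (sample_index, score) pairs sorted from best to worst."""
    ranked = []
    for path in Path(out_dir).glob("scores.model_idx_*.npz"):
        match = PATTERN.search(path.name)
        if match is None:
            continue
        with np.load(path) as data:
            # Works whether the stored value is a scalar or a length-1 array
            score = float(np.asarray(data[key]).reshape(-1)[0])
        ranked.append((int(match.group(1)), score))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    folder = Path("output_folder")
    if folder.is_dir():
        for idx, score in rank_samples(folder):
            print(f"pred.model_idx_{idx}.cif  {SCORE_KEY}={score:.3f}")
```

The sample index from the first pair points at the matching `pred.model_idx_N.cif` structure.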

Web Server

For quick tests without installation: lab.chaidiscovery.com

Troubleshooting

Common Issues

“bfloat16 not supported”:

Your GPU doesn’t support bfloat16
Try a newer GPU (A100, H100, RTX 4090)
Older GPUs (V100, etc.) may not work

Out of memory:

Request GPU with more memory
Try fewer samples (--num-diffn-samples); lowering --num-diffn-timesteps mainly shortens runtime rather than reducing memory
For very large complexes, split into smaller units

Model download location:

# Set before running
export CHAI_DOWNLOADS_DIR=/scratch/$USER/chai_models

MSA server rate limits:

The public ColabFold MMseqs2 server is a shared resource
For batch jobs, space out requests
Consider setting up a local MSA server for high-throughput
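One way to space out requests is to run inputs one after another with a pause between them. This is a minimal sketch that uses only the `chai-lab fold --use-msa-server` invocation shown earlier. `PAUSE_SECONDS` is an arbitrary illustrative value, not a documented ColabFold limit, and the input file names are hypothetical. It defaults to a dry run that only prints the commands.

```python
from __future__ import annotations

import subprocess
import time
from pathlib import Path

PAUSE_SECONDS = 60  # illustrative spacing between MSA-server jobs, not an official limit


def build_commands(fastas: list[str], out_root: str) -> list[list[str]]:
    """One `chai-lab fold --use-msa-server` command per FASTA, each with its own output dir."""
    root = Path(out_root)
    return [
        ["chai-lab", "fold", "--use-msa-server", str(f), str(root / Path(f).stem)]
        for f in fastas
    ]


def run_spaced(commands: list[list[str]], pause: float = PAUSE_SECONDS, dry_run: bool = True) -> None:
    """Run commands sequentially, sleeping between them; dry_run only prints them."""
    for i, cmd in enumerate(commands):
        print(" ".join(cmd))
        if dry_run:
            continue
        subprocess.run(cmd, check=True)  # stop at the first failing prediction
        if i < len(commands) - 1:
            time.sleep(pause)


if __name__ == "__main__":
    cmds = build_commands(["complex_a.fasta", "complex_b.fasta"], "predictions")
    run_spaced(cmds)  # pass dry_run=False to actually run the predictions
```

Giving each input its own output directory keeps the per-sample files from different complexes separate.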

Slow first run:

First run downloads ~5-10 GB of model weights
Subsequent runs are much faster
Set CHAI_DOWNLOADS_DIR to avoid re-downloading