7. Chai-1

Chai-1 (paper, code) is a multi-modal foundation model for molecular structure prediction that achieves state-of-the-art performance across diverse benchmarks. Chai-1 enables unified prediction of proteins, small molecules, DNA, RNA, glycosylations, and more.

Why Use Chai-1?

  • Multi-modal: Predict proteins, nucleic acids, small molecules, and modifications in one model
  • State-of-the-art: Top performance on structure prediction benchmarks
  • Flexible inputs: Handles complex multi-component assemblies
  • Experimental restraints: Can incorporate known distance constraints

Related Tools: For protein-only predictions, see ESMFold (faster) or LocalColabFold (MSA-based). For binding affinity predictions, see Boltz-2.

Resource Requirements

Resource     Minimum   Recommended   Notes
GPU RAM      24 GB     80 GB         A100 80GB or H100 ideal
CPU RAM      32 GB     64 GB         For preprocessing
Disk Space   10 GB     20 GB         Model weights
Python       3.10+     3.11          Required

GPU Compatibility: Requires bfloat16 support. Compatible GPUs include:

  • A100, H100, L40S (recommended)
  • A10, A30, RTX 4090 (works)
  • Older GPUs may not support bfloat16

Preparation

Prerequisites:

  • Completed HPC Setup guide
  • GPU with bfloat16 support
  • Python 3.10+

Verify bfloat16 support:

import torch
print(torch.cuda.is_bf16_supported())  # Should print True
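
On a login node, or in an environment where PyTorch is not yet installed, the one-line check above does not say why it failed. The following is a minimal sketch that separates the three cases (no PyTorch, no visible CUDA device, no bfloat16). The function name `bf16_status` is illustrative and not part of chai_lab or PyTorch.

```python
def bf16_status() -> str:
    """Return a short description of bfloat16 readiness on this node."""
    try:
        import torch  # imported lazily so the check also runs where PyTorch is absent
    except ImportError:
        return "PyTorch is not installed in this environment"
    if not torch.cuda.is_available():
        return "no CUDA device visible (are you on a GPU node?)"
    device_name = torch.cuda.get_device_name(0)
    if not torch.cuda.is_bf16_supported():
        return f"{device_name} does not support bfloat16"
    return f"{device_name} supports bfloat16"


if __name__ == "__main__":
    print(bf16_status())
```

Run it inside an interactive GPU session rather than on a login node, since the result depends on the GPU that the job actually sees.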

Installation

  1. Create a conda environment:
mamba create -n chailab python=3.11
mamba activate chailab
  2. Install Chai-1:
pip install chai_lab==0.6.1

Expected download: ~5-10 GB of model weights (downloaded on first run).

Alternative: Latest development version:

pip install git+https://github.com/chaidiscovery/chai-lab.git

Testing the Installation

Create a test FASTA file test.fasta:

>protein|name=example
MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG

Run prediction:

chai-lab fold test.fasta output_folder/

Success indicators:

  • Command completes without errors
  • output_folder/ contains:
    • pred.model_idx_0.cif - Predicted structure
    • scores.model_idx_0.npz - Confidence scores

Expected runtime: 2-5 minutes for first run (includes model download), ~30 seconds for subsequent runs.

Note: By default, this generates 5 sample predictions using embeddings without MSAs.
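
Since the default run produces five samples, the success check can be scripted. The sketch below assumes that every sample follows the same `pred.model_idx_N.cif` / `scores.model_idx_N.npz` naming as sample 0. The helper name `missing_outputs` is hypothetical.

```python
from pathlib import Path


def missing_outputs(output_dir: str, num_samples: int = 5) -> list[str]:
    """List expected per-sample files that are absent from output_dir."""
    folder = Path(output_dir)
    expected = []
    for idx in range(num_samples):
        # Assumed naming: one structure file and one scores file per sample index
        expected.append(f"pred.model_idx_{idx}.cif")
        expected.append(f"scores.model_idx_{idx}.npz")
    return [name for name in expected if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_outputs("output_folder")
    print("all outputs present" if not missing else f"missing: {missing}")
```

An empty list means every expected file was written. A non-empty list names the files to look for in the job log.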

HPC Job Script

#!/bin/bash
#SBATCH --job-name=chai
#SBATCH --partition=gpu
#SBATCH --gpus=a100:1
#SBATCH --cpus-per-task=8
#SBATCH --mem=64G
#SBATCH --time=04:00:00
#SBATCH --output=%x_%j.out

module load cuda/12.1

# Uncomment if mamba is not initialized automatically in batch shells on your cluster
# source ~/.bashrc
mamba activate chailab

# Set custom download directory (avoid filling home)
export CHAI_DOWNLOADS_DIR=/scratch/$USER/chai_models

# Run prediction with MSAs
chai-lab fold --use-msa-server --use-templates-server \
    my_complex.fasta \
    predictions/

Usage Examples

Basic prediction (no MSAs, fast):

chai-lab fold input.fasta output/

With MSAs (higher accuracy, uses ColabFold server):

chai-lab fold --use-msa-server --use-templates-server input.fasta output/

Using internal MSA server (if your HPC has one):

chai-lab fold --use-msa-server \
    --msa-server-url "https://internal.colabserver.edu" \
    input.fasta output/

Adjust sampling effort (trunk recycles and diffusion timesteps):

chai-lab fold --num-trunk-recycles 5 --num-diffn-timesteps 200 \
    input.fasta output/

Input Format

Chai-1 uses a modified FASTA format with entity type headers:

Protein:

>protein|name=my_protein
MKTVRQERLKSIVRILERSKEPVSG...

Ligand (SMILES):

>ligand|name=my_drug
CC(C)CC1=CC=C(C=C1)C(C)C(=O)O

DNA:

>dna|name=promoter
ATGCATGCATGCATGC

RNA:

>rna|name=aptamer
AUGCAUGCAUGCAUGC

Protein complex (multiple chains):

>protein|name=chain_A
MKTVRQERLK...
>protein|name=chain_B
MVKLTAEGSE...
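
For complexes with many components, writing the headers by hand is error-prone. The following sketch renders a list of entities into the header format shown above. It only knows the four entity types listed in this section, and the function name `format_entities` and its validation rules are illustrative rather than part of chai_lab.

```python
ALLOWED_KINDS = {"protein", "ligand", "dna", "rna"}  # entity types shown in this section


def format_entities(entities: list[tuple[str, str, str]]) -> str:
    """Render (kind, name, sequence_or_smiles) triples as Chai-1 style FASTA text."""
    lines = []
    for kind, name, payload in entities:
        if kind not in ALLOWED_KINDS:
            raise ValueError(f"unsupported entity type: {kind!r}")
        if not payload or any(ch.isspace() for ch in payload):
            raise ValueError(f"empty or whitespace-containing record for {name!r}")
        lines.append(f">{kind}|name={name}")
        lines.append(payload)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = format_entities([
        ("protein", "chain_A", "MKTVRQERLKSIVRILERSK"),
        ("ligand", "my_drug", "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O"),
    ])
    print(text, end="")
```

The returned string can be written to a file and passed to `chai-lab fold` as the input FASTA.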

Python API

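from pathlib import Path

from chai_lab.chai1 import run_inference

# Pass pathlib.Path objects rather than plain strings for the input and output locations
results = run_inference(
    fasta_file=Path("input.fasta"),
    output_dir=Path("output/"),
    num_trunk_recycles=3,
    num_diffn_timesteps=200,
    seed=42,
)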

See examples/predict_structure.py in the repository for more details.

Advanced Features

Custom Templates:

chai-lab fold --custom-template template.cif input.fasta output/

Experimental Restraints: Specify inter-chain contacts:

# See: github.com/chaidiscovery/chai-lab/tree/main/examples/restraints

Covalent Bonds: Specify covalent modifications:

# See: github.com/chaidiscovery/chai-lab/tree/main/examples/covalent_bonds

Understanding the Output

File                      Description
pred.model_idx_N.cif      Predicted structure (mmCIF format)
scores.model_idx_N.npz    Confidence scores
msa_*.a3m                 Generated MSAs (if using MSA server)

Confidence metrics (in scores file):

  • pLDDT: Per-residue confidence
  • pTM: Predicted TM-score
  • pAE: Predicted aligned error
  • interface scores: For multi-chain predictions
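
Rather than assuming specific key names, it can help to inspect a scores file first. An `.npz` file is a zip archive with one `.npy` member per array, so the sketch below lists the stored array names with the standard library alone. The function name `list_score_arrays` is hypothetical.

```python
import zipfile


def list_score_arrays(npz_path: str) -> list[str]:
    """Return the names of the arrays stored in an .npz archive."""
    with zipfile.ZipFile(npz_path) as archive:
        # Each array is stored as a member named "<key>.npy"
        return sorted(
            name[: -len(".npy")]
            for name in archive.namelist()
            if name.endswith(".npy")
        )
```

Once the names are known, `numpy.load(npz_path)[name]` returns the corresponding values.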

Web Server

For quick tests without installation: lab.chaidiscovery.com

Troubleshooting

Warning: Common Issues

“bfloat16 not supported”:

  • Your GPU doesn’t support bfloat16
  • Try a newer GPU (A100, H100, RTX 4090)
  • Older GPUs (V100, etc.) may not work

Out of memory:

  • Request GPU with more memory
  • Reduce --num-diffn-timesteps
  • For very large complexes, split into smaller units

Model download location:

# Set before running
export CHAI_DOWNLOADS_DIR=/scratch/$USER/chai_models
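
When using the Python API instead of the shell, the same variable can be set from Python. The sketch below creates the directory, checks free space against the 10 GB minimum from the resource table, and exports `CHAI_DOWNLOADS_DIR`. The helper name `prepare_downloads_dir` and the free-space check are illustrative additions, not part of chai_lab.

```python
import os
import shutil
from pathlib import Path


def prepare_downloads_dir(path: str, min_free_gb: float = 10.0) -> Path:
    """Create the weights directory, check free space, and export CHAI_DOWNLOADS_DIR."""
    target = Path(path).expanduser()
    target.mkdir(parents=True, exist_ok=True)
    free_gb = shutil.disk_usage(target).free / 1e9
    if free_gb < min_free_gb:
        raise RuntimeError(
            f"only {free_gb:.1f} GB free in {target}, need {min_free_gb} GB"
        )
    os.environ["CHAI_DOWNLOADS_DIR"] = str(target)
    return target
```

Calling it before `import chai_lab` is the safe order, in case the package reads the variable at import time (an assumption worth verifying for your installed version).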

MSA server rate limits:

  • The public ColabFold MMseqs2 server is a shared resource
  • For batch jobs, space out requests
  • Consider setting up a local MSA server for high-throughput
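
One way to space out requests in a batch is to run predictions sequentially with a pause between them. The sketch below builds one `chai-lab fold` command per FASTA file, using only the flags shown in this guide, and gives each input its own output folder. The function names and the 60-second delay are placeholders, not recommendations from the server operators.

```python
import shlex
import subprocess
import time
from pathlib import Path


def build_fold_commands(fasta_files: list[str], out_root: str) -> list[list[str]]:
    """One `chai-lab fold` command per FASTA file, each with its own output folder."""
    commands = []
    for fasta in fasta_files:
        out_dir = Path(out_root) / Path(fasta).stem
        commands.append(
            ["chai-lab", "fold", "--use-msa-server", "--use-templates-server",
             fasta, str(out_dir)]
        )
    return commands


def run_spaced(commands: list[list[str]], delay_s: float = 60.0) -> None:
    """Run commands one after another, pausing between them to spread MSA requests."""
    for i, cmd in enumerate(commands):
        print("running:", shlex.join(cmd))
        subprocess.run(cmd, check=True)  # stop the batch on the first failure
        if i < len(commands) - 1:
            time.sleep(delay_s)
```

For example, `run_spaced(build_fold_commands(["a.fasta", "b.fasta"], "predictions"))` writes results to `predictions/a` and `predictions/b`.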

Slow first run:

  • First run downloads ~5-10 GB of model weights
  • Subsequent runs are much faster
  • Set CHAI_DOWNLOADS_DIR to a persistent directory so the weights are not downloaded again