2. LocalColabFold

LocalColabFold (https://github.com/YoshitakaMo/localcolabfold) is a local installation of ColabFold, which provides an efficient implementation of AlphaFold2 protein structure prediction. ColabFold pairs fast MMseqs2-based MSA generation with AlphaFold2’s structure prediction. This makes it considerably faster than the original AlphaFold2 pipeline, mainly because the MMseqs2 search replaces the much slower jackhmmer and HHblits database searches.

Why Use LocalColabFold?

High-throughput predictions: Run batch jobs without Colab time limits
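Reduced internet dependency: Structure prediction runs locally; by default the MSA is still fetched from the public ColabFold MMseqs2 server, so fully offline runs need local databases or pre-computed MSAs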
HPC integration: Leverage your cluster’s GPUs for faster predictions
MSA flexibility: Use pre-computed MSAs or generate them on-the-fly

Related Tools: For structure prediction without MSAs, see ESMFold. For multi-modal complexes, see Chai-1 or Boltz-2.

Resource Requirements

Resource	Minimum	Recommended	Notes
GPU RAM	16 GB	40+ GB	A100 recommended for proteins >500 aa
CPU RAM	32 GB	64 GB	MSA generation is memory-intensive
Disk Space	15 GB	100+ GB	Model weights + optional databases
CUDA	11.1+	12.1+	Requirements change between releases; check the LocalColabFold README

Preparation

Prerequisites

Completed HPC Setup guide
Access to a GPU node for testing
~15 GB disk space for installation

Verify your environment

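nvidia-smi          # Check that a GPU is visible (the header also shows the highest CUDA version the driver supports)
nvcc --version      # Check the CUDA toolkit version (on many clusters this requires loading a CUDA module first)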
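The two checks above can be folded into a small preflight script that also confirms the roughly 15 GB of free disk space listed under Prerequisites. This is a minimal sketch: the helper names (has_cmd, has_free_gb, preflight) are hypothetical, and the 15 GB threshold simply mirrors the figure in this guide.

```shell
#!/usr/bin/env bash
# Preflight sketch (hypothetical helpers): report whether GPU/CUDA tools are
# on PATH and whether the install location has enough free space.

# has_cmd NAME: succeed if NAME resolves to a command on PATH
has_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# has_free_gb DIR GB: succeed if the filesystem holding DIR has >= GB free
has_free_gb() {
  # df -Pk prints POSIX-format output in 1024-byte blocks; field 4 is "Available"
  avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
  [ -n "$avail_kb" ] && [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
}

# preflight [DIR]: print one status line per check (DIR defaults to ".")
preflight() {
  install_dir=${1:-.}
  for tool in nvidia-smi nvcc; do
    if has_cmd "$tool"; then
      echo "found:   $tool"
    else
      echo "missing: $tool (try a GPU node, or load your site's CUDA module)"
    fi
  done
  # 15 GB mirrors the installation estimate in this guide
  if has_free_gb "$install_dir" 15; then
    echo "disk:    >= 15 GB free in $install_dir"
  else
    echo "disk:    < 15 GB free in $install_dir"
  fi
}

# Usage: preflight "$HOME/apps"
```

Run it on the GPU node you plan to use; a "missing" line for nvcc is not fatal on its own, but it means the toolkit version above could not be checked.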

Installation

Navigate to the directory where you want to install LocalColabFold (e.g., your scratch directory or apps folder), then download the installation script:

wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/install_colabbatch_linux.sh

Make the script executable and run it:

chmod +x install_colabbatch_linux.sh
./install_colabbatch_linux.sh

This creates a localcolabfold directory containing:
- A conda environment (colabfold-conda) that provides the colabfold_batch command
- ColabFold and all dependencies
- Model weights (~10-15 GB, downloaded automatically)

Expected install time: 15-30 minutes depending on network speed.

Add the environment's bin directory to your PATH (append this line to ~/.bashrc to make it permanent):

export PATH="/path/to/your/localcolabfold/colabfold-conda/bin:$PATH"
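Appending the export line by hand each time you re-run setup can leave duplicate PATH entries in ~/.bashrc. A small sketch that adds the line only if it is not already present; add_path_line is a hypothetical helper, and the directory argument is the same placeholder path used above.

```shell
#!/usr/bin/env bash
# add_path_line RCFILE DIR (hypothetical helper): append
#   export PATH="DIR:$PATH"
# to RCFILE unless that exact line is already there.
add_path_line() {
  rc=$1
  line="export PATH=\"$2:\$PATH\""
  touch "$rc"
  # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
  if ! grep -qxF "$line" "$rc"; then
    printf '%s\n' "$line" >> "$rc"
  fi
}

# Usage (placeholder path; substitute your own install location):
# add_path_line ~/.bashrc /path/to/your/localcolabfold/colabfold-conda/bin
```

Open a new shell (or run `source ~/.bashrc`) afterwards so the change takes effect.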

Testing the Installation

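Make sure the ColabFold environment is on your PATH (the export line from the installation step), then confirm that the command resolves:

export PATH="/path/to/your/localcolabfold/colabfold-conda/bin:$PATH"
which colabfold_batch   # Should print a path ending in colabfold-conda/bin/colabfold_batch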

Create a directory for testing and a test FASTA file:

mkdir -p tests
echo ">test_protein
MKFLKFSLLTAVLLSVVFAFSSCGDDDDTYPYDVPDYAGTCGDDDDTYPYDVPDYA" > tests/test.fasta

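Run the prediction. By default, colabfold_batch sends the sequence to the public ColabFold MMseqs2 server to build the MSA, so the node needs outbound internet access: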

colabfold_batch tests/test.fasta tests/test_output/

Success indicators:

Command completes without errors
tests/test_output/ directory contains:
- test_protein_unrelaxed_rank_001_*.pdb (top-ranked predicted structure; a *_relaxed_* file is written only when relaxation is enabled with --amber)
- test_protein_scores_rank_001_*.json (confidence scores)
- test_protein_coverage.png (MSA coverage plot)

Expected runtime: 2-5 minutes for this small test protein.
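The success indicators can also be checked from a script, for example at the end of a batch job. This sketch assumes the default file naming listed above (the suffix after rank_001_ varies with model type and seed, hence the wildcards); check_colabfold_output is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# check_colabfold_output DIR NAME (hypothetical helper)
# Succeeds if DIR holds a rank-1 structure and a rank-1 scores file for NAME.
# Assumed patterns:
#   NAME_unrelaxed_rank_001_*.pdb  (or NAME_relaxed_rank_001_*.pdb with --amber)
#   NAME_scores_rank_001_*.json
check_colabfold_output() {
  dir=$1
  name=$2
  pdb=""
  json=""
  # An unmatched glob stays literal, so test each candidate with -e
  for f in "$dir/${name}"_unrelaxed_rank_001_*.pdb "$dir/${name}"_relaxed_rank_001_*.pdb; do
    [ -e "$f" ] && { pdb=$f; break; }
  done
  for f in "$dir/${name}"_scores_rank_001_*.json; do
    [ -e "$f" ] && { json=$f; break; }
  done
  if [ -n "$pdb" ] && [ -n "$json" ]; then
    echo "OK: $pdb"
    return 0
  fi
  echo "missing rank-1 structure or scores for $name in $dir" >&2
  return 1
}

# Usage: check_colabfold_output tests/test_output test_protein
```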

Verify GPU is being used:

# In another terminal while prediction runs:
nvidia-smi
# Look for a python process using GPU memory

HPC Job Script

#!/bin/bash
#SBATCH --job-name=colabfold
#SBATCH --partition=gpu
#SBATCH --gpus=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=64G
#SBATCH --time=04:00:00
#SBATCH --output=%x_%j.out

# Put the ColabFold environment on PATH
export PATH="/path/to/localcolabfold/colabfold-conda/bin:$PATH"

# Optional: Use shared database location
export COLABFOLD_DOWNLOAD_DIR=/shared/colabfold_db

# Run prediction
colabfold_batch input.fasta output_dir/
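colabfold_batch also accepts a directory of FASTA files as input, so one job can work through many sequences in turn. To spread sequences across several GPUs instead, one common pattern is a Slurm job array with one FASTA file per task. A minimal sketch of the splitting step, assuming each record's ID (the first word of its header) is unique and contains no "/"; split_fasta is a hypothetical helper, and the array-job lines in the comments are an illustration to adapt to your cluster.

```shell
#!/usr/bin/env bash
# split_fasta IN.fasta OUTDIR (hypothetical helper)
# Writes each record of a multi-sequence FASTA to OUTDIR/<id>.fasta,
# where <id> is the first word of the header without the leading ">".
split_fasta() {
  mkdir -p "$2"
  awk -v outdir="$2" '
    /^>/ {
      if (out != "") close(out)
      id = substr($1, 2)
      out = outdir "/" id ".fasta"
    }
    out != "" { print > out }
  ' "$1"
}

# Illustrative array-job usage (bash; one task per record, adjust the range):
#   #SBATCH --array=0-9
#   files=(split/*.fasta)
#   colabfold_batch "${files[$SLURM_ARRAY_TASK_ID]}" "output_dir/task_${SLURM_ARRAY_TASK_ID}/"
```

Lines before the first header are ignored, and each output file keeps the full original header line.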

Usage Examples

Basic prediction:

colabfold_batch sequences.fasta predictions/

With a custom MMseqs2 server (if your HPC hosts one; the URL is a placeholder):

colabfold_batch --host-url "https://internal.server.edu" sequences.fasta predictions/

Multimer prediction (protein complexes):

# Separate chains with : in the FASTA file
# >complex
# SEQUENCEA:SEQUENCEB
colabfold_batch complex.fasta complex_output/
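The complex format is a single FASTA record whose sequence line lists each chain separated by ":" (repeat a sequence to model a homo-oligomer). A small sketch that writes such a record from individual chain sequences; make_complex_fasta is a hypothetical helper, and the sequences in the usage line are placeholders.

```shell
#!/usr/bin/env bash
# make_complex_fasta NAME SEQ1 SEQ2 [SEQ3 ...] (hypothetical helper)
# Prints one FASTA record: a header line, then all chains joined with ":".
make_complex_fasta() {
  name=$1
  shift
  if [ "$#" -lt 2 ]; then
    echo "need at least two chain sequences" >&2
    return 1
  fi
  printf '>%s\n' "$name"
  # "$*" joins the remaining arguments with the first character of IFS;
  # the subshell keeps the IFS change local
  (IFS=:; printf '%s\n' "$*")
}

# Usage (placeholder sequences):
# make_complex_fasta my_dimer MKVLAAGG PEPTIDESEQ > complex.fasta
```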

Batch with templates:

colabfold_batch --templates sequences.fasta predictions/

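Faster runs (fewer recycles and models shorten runtime but do not substantially lower peak GPU memory):

colabfold_batch --num-recycle 3 --num-models 1 large_protein.fasta output/

Amber relaxation of the predicted structures (adds a post-processing step; it does not reduce memory use):

colabfold_batch --amber sequences.fasta predictions/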

Understanding the Output

File	Description
`*_relaxed_rank_001_*.pdb`	Best predicted structure, Amber-relaxed (written only when --amber is used)
`*_unrelaxed_rank_001_*.pdb`	Best predicted structure before relaxation (always written)
`*_scores_rank_001_*.json`	Per-residue pLDDT, PAE matrix, and pTM (plus ipTM for complexes)
`*_coverage.png`	MSA coverage visualization
`*_pae.png`	Predicted Aligned Error heatmap

Confidence scores:

pLDDT (per-residue): >90 high confidence, 70-90 confident, 50-70 low, <50 very low
pTM (overall): >0.8 high confidence for whole structure
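PAE (pairwise, in Å): Expected position error of one residue when the structure is aligned on another; lower is better, and low PAE between domains indicates confident relative placement of those domains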
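ColabFold, like AlphaFold2, writes each residue's pLDDT into the B-factor column of its PDB files, so a quick summary needs only the structure file. A minimal awk-based sketch, assuming standard fixed-column PDB ATOM records (B-factor in columns 61-66) and one C-alpha atom per residue; mean_plddt is a hypothetical helper, and the band boundaries are an assumption: exactly 90 counts as confident, and exactly 70 or 50 counts toward the higher band.

```shell
#!/usr/bin/env bash
# mean_plddt FILE.pdb (hypothetical helper)
# Averages the B-factor (pLDDT) over C-alpha atoms and counts residues per band.
mean_plddt() {
  awk '
    substr($0, 1, 6) == "ATOM  " && substr($0, 13, 4) == " CA " {
      b = substr($0, 61, 6) + 0
      n++
      sum += b
      if (b > 90) hi++
      else if (b >= 70) conf++
      else if (b >= 50) lo++
      else vlo++
    }
    END {
      if (n == 0) { print "no CA atoms found" > "/dev/stderr"; exit 1 }
      printf "mean_pLDDT=%.2f n=%d high=%d confident=%d low=%d very_low=%d\n",
             sum / n, n, hi + 0, conf + 0, lo + 0, vlo + 0
    }' "$1"
}

# Usage: mean_plddt tests/test_output/test_protein_unrelaxed_rank_001_*.pdb
```

A per-structure mean hides local detail; disordered termini can pull it down even when the core is well predicted, so use the band counts (or the pLDDT plot) alongside it.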

Troubleshooting

Common Issues

“CUDA out of memory”:

Request a GPU with more memory (e.g., #SBATCH --gpus=a100:1; GPU type names depend on your cluster)
Note that --amber does not reduce peak memory; it adds a relaxation step after prediction
For very large proteins (>1000 aa), use Chai-1 or Boltz-2 instead

MSA generation is slow:

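The default mode already uses the remote ColabFold MMseqs2 server, which is shared; for large batches, consider a local MMseqs2 setup (colabfold_search)
Pre-compute MSAs for frequently used sequences and pass the .a3m files (or a directory of them) to colabfold_batch as input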
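Reusing MSAs can be as simple as keeping the .a3m files from earlier runs in one cache directory and passing that directory to colabfold_batch. This sketch assumes each earlier run left its MSA as <name>.a3m in the output directory (check your own output folder before relying on this); collect_msas and the msa_cache directory name are hypothetical.

```shell
#!/usr/bin/env bash
# collect_msas DEST SRC_DIR... (hypothetical helper)
# Copies *.a3m files from earlier output directories into DEST, skipping
# names already cached so existing MSAs are never overwritten.
collect_msas() {
  dest=$1
  shift
  mkdir -p "$dest"
  copied=0
  for src in "$@"; do
    for f in "$src"/*.a3m; do
      [ -e "$f" ] || continue   # the glob matched nothing
      target="$dest/$(basename "$f")"
      if [ ! -e "$target" ]; then
        cp "$f" "$target"
        copied=$((copied + 1))
      fi
    done
  done
  echo "cached $copied new MSA file(s) in $dest"
}

# Usage sketch: cache MSAs from an earlier run, then predict from them
# collect_msas msa_cache predictions/
# colabfold_batch msa_cache/ predictions_rerun/
```

Keeping the first copy of each name means a later run with a different MSA setting will not silently replace an MSA you have already validated.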

Database location filling home directory:

# Set in ~/.bashrc before running
export COLABFOLD_DOWNLOAD_DIR=/scratch/$USER/colabfold_db

Model weights download fails:

Check network connectivity
Manually download from: https://storage.googleapis.com/alphafold/
Place in ~/.cache/colabfold/params/

GPU not being used (slow prediction):

# Verify that JAX (which ColabFold uses) sees the GPU; run with the ColabFold environment's python
python -c "import jax; print(jax.default_backend())"
# Should print: gpu