2. LocalColabFold

LocalColabFold (code) is a local installation of ColabFold, which provides an efficient implementation of AlphaFold2 protein structure prediction. ColabFold combines fast MSA generation from MMseqs2 with AlphaFold2’s structure prediction capabilities, making it significantly faster than the original AlphaFold2 implementation.

Why Use LocalColabFold?

  • High-throughput predictions: Run batch jobs without Colab time limits
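  • No Colab dependency: Structure prediction runs on your own hardware after setup; MSA generation still queries the public MMseqs2 server by default unless you supply pre-computed MSAs or your own server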
  • HPC integration: Leverage your cluster’s GPUs for faster predictions
  • MSA flexibility: Use pre-computed MSAs or generate them on-the-fly

Related Tools: For structure prediction without MSAs, see ESMFold. For multi-modal complexes, see Chai-1 or Boltz-2.

Resource Requirements

Resource     Minimum   Recommended   Notes
GPU RAM      16 GB     40+ GB        A100 recommended for proteins >500 aa
CPU RAM      32 GB     64 GB         MSA generation is memory-intensive
Disk Space   15 GB     100+ GB       Model weights + optional databases
CUDA         11.1+     12.1+         Check compatibility

Preparation

Important: Prerequisites
  • Completed HPC Setup guide
  • Access to a GPU node for testing
  • ~15 GB disk space for installation
Note: Verify your environment
nvidia-smi          # Check GPU is available
nvcc --version      # Check CUDA version
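
The two checks above can be wrapped in a small pre-flight helper so a missing tool is reported clearly instead of failing midway through the install. This is a minimal sketch, not part of LocalColabFold; the function name check_tools is hypothetical, and on many clusters nvcc only appears after loading a site-specific CUDA module.

```shell
# Report whether each named command is on PATH.
# Returns non-zero if at least one command is missing.
# (check_tools is a hypothetical helper, not part of LocalColabFold.)
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_tools nvidia-smi nvcc wget prints one line per tool and exits non-zero if any of them is unavailable on the current node.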

Installation

  1. Download the installation script: Navigate to the directory where you want to install LocalColabFold (e.g., your scratch directory or apps folder).
wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/install_colabbatch_linux.sh
  2. Make the script executable and run it:
chmod +x install_colabbatch_linux.sh
./install_colabbatch_linux.sh

This creates a localcolabfold directory containing:

  • A conda environment (colabfold-conda) that provides the colabfold_batch command
  • ColabFold and all dependencies
  • Model weights (~10-15 GB, downloaded automatically)

Expected install time: 15-30 minutes depending on network speed.

  3. Add the environment to your PATH (add to ~/.bashrc for permanent access):
export PATH="/path/to/your/localcolabfold/colabfold-conda/bin:$PATH"
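
If you script this step, it helps to avoid appending the same export line every time the setup is re-run. The sketch below adds the line only when it is not already present; the function name add_to_path_once is hypothetical, and the install path in the usage note is a placeholder.

```shell
# Append an 'export PATH="DIR:$PATH"' line to RCFILE unless that exact
# line already exists. (add_to_path_once is a hypothetical helper.)
add_to_path_once() {
    dir=$1
    rcfile=$2
    line="export PATH=\"$dir:\$PATH\""
    touch "$rcfile"
    # -x matches whole lines, -F treats the line as a fixed string
    if ! grep -qxF -- "$line" "$rcfile"; then
        printf '%s\n' "$line" >> "$rcfile"
    fi
}
```

A call such as add_to_path_once /path/to/your/localcolabfold/colabfold-conda/bin ~/.bashrc is safe to repeat; open a new shell or source ~/.bashrc afterwards for the change to take effect.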

Testing the Installation

  1. Activate the ColabFold environment:
source localcolabfold/colabfold-conda/bin/activate
  2. Create a directory for testing and a test FASTA file:
mkdir -p tests
echo ">test_protein
MKFLKFSLLTAVLLSVVFAFSSCGDDDDTYPYDVPDYAGTCGDDDDTYPYDVPDYA" > tests/test.fasta
  3. Run prediction:
colabfold_batch tests/test.fasta tests/test_output/

Success indicators:

  • Command completes without errors
  • tests/test_output/ directory contains:
    • test_protein_unrelaxed_rank_001_*.pdb (predicted structure; a relaxed_rank_001 file appears only when --amber is used)
    • test_protein_scores_rank_001_*.json (confidence scores)
    • test_protein_coverage.png (MSA coverage plot)
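
The success indicators above can also be checked by a script, which is handy once predictions run unattended in batch jobs. This is a minimal sketch: check_outputs is a hypothetical helper, it accepts either a relaxed or an unrelaxed rank_001 structure, and it assumes the output directory and job name contain no spaces.

```shell
# Check that OUTDIR contains the top-ranked structure, its scores file,
# and the coverage plot for job NAME. Returns non-zero if any is missing.
# (check_outputs is a hypothetical helper, not part of ColabFold.)
check_outputs() {
    outdir=$1
    name=$2
    status=0
    for pattern in "${name}_*rank_001_*.pdb" \
                   "${name}_scores_rank_001_*.json" \
                   "${name}_coverage.png"; do
        # Expand the glob; an unmatched pattern stays literal, so -e fails
        set -- "$outdir"/$pattern
        if [ -e "$1" ]; then
            echo "found: $pattern"
        else
            echo "missing: $pattern"
            status=1
        fi
    done
    return "$status"
}
```

For the test run above, check_outputs tests/test_output test_protein should report all three patterns as found.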

Expected runtime: 2-5 minutes for this small test protein.

Verify GPU is being used:

# In another terminal while prediction runs:
nvidia-smi
# Look for python process using GPU memory

HPC Job Script

#!/bin/bash
#SBATCH --job-name=colabfold
#SBATCH --partition=gpu
#SBATCH --gpus=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=64G
#SBATCH --time=04:00:00
#SBATCH --output=%x_%j.out

# Activate environment
source /path/to/localcolabfold/colabfold-conda/bin/activate

# Optional: Use shared database location
export COLABFOLD_DOWNLOAD_DIR=/shared/colabfold_db

# Run prediction
colabfold_batch input.fasta output_dir/
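
For high-throughput work, one option is to split a multi-sequence FASTA into one file per record so that each record can run as its own job (for example, as tasks of a Slurm job array). The sketch below only does the splitting; split_fasta and the seq_NNNN.fasta naming are hypothetical choices, not ColabFold conventions.

```shell
# Split a multi-record FASTA file into OUTDIR/seq_0001.fasta,
# OUTDIR/seq_0002.fasta, ... with one record per file.
# (split_fasta and the file naming are hypothetical, for illustration.)
split_fasta() {
    input=$1
    outdir=$2
    mkdir -p "$outdir"
    awk -v dir="$outdir" '
        /^>/ {
            # A header line starts a new record: switch output files
            if (out) close(out)
            n++
            out = sprintf("%s/seq_%04d.fasta", dir, n)
        }
        # Lines before the first header are skipped
        out { print > out }
    ' "$input"
}
```

After split_fasta input.fasta split/, an array task could pick its input with printf 'split/seq_%04d.fasta' "$SLURM_ARRAY_TASK_ID" and pass that path to colabfold_batch. Running one job on the whole file remains simpler when the batch fits within a single time limit.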

Usage Examples

Basic prediction:

colabfold_batch sequences.fasta predictions/

With custom MSA server (if your HPC has one):

colabfold_batch --host-url "https://internal.server.edu" sequences.fasta predictions/

Multimer prediction (protein complexes):

# Separate chains with : in the FASTA file
# >complex
# SEQUENCEA:SEQUENCEB
colabfold_batch complex.fasta complex_output/
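
Writing the colon-separated record by hand is error-prone for complexes with many chains. The following sketch builds the record from a name and a list of chain sequences; make_complex_fasta is a hypothetical helper and the sequences in the usage note are placeholders.

```shell
# Print a FASTA record whose sequence line joins all chains with ':'.
# Usage: make_complex_fasta NAME CHAIN1 [CHAIN2 ...]
# (make_complex_fasta is a hypothetical helper, not part of ColabFold.)
make_complex_fasta() {
    name=$1
    shift
    printf '>%s\n' "$name"
    first=1
    for seq in "$@"; do
        # Emit the ':' separator before every chain except the first
        if [ "$first" -eq 1 ]; then
            first=0
        else
            printf ':'
        fi
        printf '%s' "$seq"
    done
    printf '\n'
}
```

For example, make_complex_fasta complex SEQUENCEA SEQUENCEB > complex.fasta reproduces the record shown above. For a homo-oligomer, pass the same sequence once per copy.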

Batch with templates:

colabfold_batch --templates sequences.fasta predictions/

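Reduce memory usage (for large proteins) by limiting MSA depth:

# Fewer MSA sequences lower GPU memory use, possibly at some cost in accuracy
colabfold_batch --max-msa 256:512 --num-recycle 3 large_protein.fasta output/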

Understanding the Output

File                          Description
*_relaxed_rank_001_*.pdb      Best predicted structure after Amber relaxation (written only with --amber)
*_unrelaxed_rank_001_*.pdb    Best prediction before relaxation
*_scores_rank_001_*.json      pLDDT and pTM scores
*_coverage.png                MSA coverage visualization
*_pae.png                     Predicted Aligned Error heatmap

Confidence scores:

  • pLDDT (per-residue): >90 high confidence, 70-90 confident, 50-70 low, <50 very low
  • pTM (overall): >0.8 high confidence for whole structure
  • PAE (pairwise): Lower is better, indicates domain organization confidence
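
AlphaFold2-style PDB files store the per-residue pLDDT in the B-factor column, so a quick summary can be computed without opening the JSON. The sketch below averages pLDDT over C-alpha atoms and labels the result with the bands listed above. Applying the per-residue bands to a mean is a simplification, and mean_plddt is a hypothetical helper.

```shell
# Print the mean pLDDT over CA atoms of a PDB file, plus a confidence band.
# pLDDT is read from the B-factor field (columns 61-66 of ATOM records).
# (mean_plddt is a hypothetical helper, not part of ColabFold.)
mean_plddt() {
    awk '
        substr($0, 1, 4) == "ATOM" && substr($0, 13, 4) == " CA " {
            sum += substr($0, 61, 6)
            n++
        }
        END {
            if (n == 0) { print "no CA atoms found"; exit 1 }
            mean = sum / n
            if      (mean > 90)  band = "high"
            else if (mean >= 70) band = "confident"
            else if (mean >= 50) band = "low"
            else                 band = "very low"
            printf "%.1f %s\n", mean, band
        }
    ' "$1"
}
```

A mean can hide a well-folded core next to a disordered tail, so treat this as a first filter for ranking many predictions and inspect per-residue values and the PAE plot before drawing conclusions.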

Troubleshooting

Warning: Common Issues

“CUDA out of memory”:

  • Request GPU with more memory (#SBATCH --gpus=a100:1)
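  • Limit MSA depth with --max-msa (e.g., 256:512); note that --amber adds a relaxation step and does not reduce memory
  • Let JAX spill into host memory by setting TF_FORCE_UNIFIED_MEMORY=1 and XLA_PYTHON_CLIENT_MEM_FRACTION=4.0 (slower, but can avoid some out-of-memory failures)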
  • For very large proteins (>1000 aa), use Chai-1 or Boltz-2 instead

MSA generation is slow:

  • Use the MMseqs2 server option for faster MSA generation
  • Pre-compute MSAs for frequently used sequences

Database location filling home directory:

# Set in ~/.bashrc before running
export COLABFOLD_DOWNLOAD_DIR=/scratch/$USER/colabfold_db

Model weights download fails:

  • Check network connectivity
  • Manually download from: https://storage.googleapis.com/alphafold/
  • Place in ~/.cache/colabfold/params/

GPU not being used (slow prediction):

# Verify that JAX (the backend ColabFold uses) detects the GPU
python -c "import jax; print(jax.devices())"
# Should list a GPU/CUDA device rather than only a CPU device