2. LocalColabFold
LocalColabFold is a local installation of ColabFold, which provides an efficient implementation of AlphaFold2 protein structure prediction. ColabFold combines fast MSA generation from MMseqs2 with AlphaFold2’s structure prediction, making it significantly faster than the original AlphaFold2 pipeline.
Why Use LocalColabFold?
- High-throughput predictions: Run batch jobs without Colab time limits
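- Local computation: Structure prediction runs on your own hardware after setup (by default, MSAs are still fetched from the remote MMseqs2 server unless you supply pre-computed MSAs)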
- HPC integration: Leverage your cluster’s GPUs for faster predictions
- MSA flexibility: Use pre-computed MSAs or generate them on-the-fly
Related Tools: For structure prediction without MSAs, see ESMFold. For multi-modal complexes, see Chai-1 or Boltz-2.
Resource Requirements
| Resource | Minimum | Recommended | Notes |
|---|---|---|---|
| GPU RAM | 16 GB | 40+ GB | A100 recommended for proteins >500 aa |
| CPU RAM | 32 GB | 64 GB | MSA generation is memory-intensive |
| Disk Space | 15 GB | 100+ GB | Model weights + optional databases |
| CUDA | 11.1+ | 12.1+ | Check compatibility |
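The requirements in the table can be checked before installing. The following is a minimal sketch, not part of LocalColabFold: `free_gb` and `has_free_gb` are hypothetical helper names, `INSTALL_DIR` is an assumed variable, and the 15 GB threshold is taken from the Minimum column above. The GPU query relies on the `--query-gpu` option of `nvidia-smi` and is skipped when no NVIDIA driver is present.

```shell
#!/usr/bin/env bash
# Preflight sketch: helper names are illustrative, not part of LocalColabFold.

# Print the free space (whole GB) on the filesystem that holds directory $1.
free_gb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1048576) }'
}

# Succeed if directory $1 has at least $2 GB free.
has_free_gb() {
    [ "$(free_gb "$1")" -ge "$2" ]
}

install_dir=${INSTALL_DIR:-.}   # assumed install location; override as needed
if has_free_gb "$install_dir" 15; then
    echo "Disk: OK ($(free_gb "$install_dir") GB free in $install_dir)"
else
    echo "Disk: less than 15 GB free in $install_dir"
fi

# Report name and total memory per GPU when an NVIDIA driver is available.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
else
    echo "GPU: nvidia-smi not found (run this on a GPU node)"
fi
```

Run it on the GPU node you plan to use, since login nodes often have no GPU and a different scratch filesystem.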
Preparation
- Completed HPC Setup guide
- Access to a GPU node for testing
- ~15 GB disk space for installation
nvidia-smi # Check GPU is available
nvcc --version # Check CUDA version
Installation
- Download the installation script: Navigate to the directory where you want to install LocalColabFold (e.g., your scratch directory or apps folder).
wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/install_colabbatch_linux.sh
- Make the script executable and run it:
chmod +x install_colabbatch_linux.sh
./install_colabbatch_linux.sh
This creates a localcolabfold directory containing:
- A conda environment (colabfold_batch)
- ColabFold and all dependencies
- Model weights (~10-15 GB, downloaded automatically)
Expected install time: 15-30 minutes depending on network speed.
- Add the environment to your PATH (add to ~/.bashrc for permanent access):
export PATH="/path/to/your/localcolabfold/colabfold-conda/bin:$PATH"
Testing the Installation
- Activate the ColabFold environment:
source localcolabfold/colabfold-conda/bin/activate
- Create a directory for testing and a test FASTA file:
mkdir -p tests
echo ">test_protein
MKFLKFSLLTAVLLSVVFAFSSCGDDDDTYPYDVPDYAGTCGDDDDTYPYDVPDYA" > tests/test.fasta
- Run prediction:
colabfold_batch tests/test.fasta tests/test_output/
Success indicators:
- Command completes without errors
- tests/test_output/ directory contains:
  - test_protein_unrelaxed_rank_001_*.pdb (predicted structure; a relaxed_ file is written only when --amber is used)
  - test_protein_scores_rank_001_*.json (confidence scores)
  - test_protein_coverage.png (MSA coverage plot)
Expected runtime: 2-5 minutes for this small test protein.
Verify GPU is being used:
# In another terminal while prediction runs:
nvidia-smi
# Look for python process using GPU memory
HPC Job Script
#!/bin/bash
#SBATCH --job-name=colabfold
#SBATCH --partition=gpu
#SBATCH --gpus=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=64G
#SBATCH --time=04:00:00
#SBATCH --output=%x_%j.out
# Activate environment
source /path/to/localcolabfold/colabfold-conda/bin/activate
# Optional: Use shared database location
export COLABFOLD_DOWNLOAD_DIR=/shared/colabfold_db
# Run prediction
colabfold_batch input.fasta output_dir/
Usage Examples
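For many input files, the job script above can be adapted into a Slurm array job in which each task predicts one FASTA file. The sketch below assumes a hypothetical inputs/ directory of .fasta files and a hypothetical helper named `nth_fasta`; the `--array` range and all paths are placeholders to adjust. Slurm sets `SLURM_ARRAY_TASK_ID` for each array task.

```shell
#!/bin/bash
#SBATCH --job-name=colabfold_array
#SBATCH --partition=gpu
#SBATCH --gpus=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=64G
#SBATCH --time=04:00:00
#SBATCH --array=0-9              # placeholder: one task per input file
#SBATCH --output=%x_%A_%a.out    # %A = array job ID, %a = task index

# Print the FASTA file at zero-based position $2 in directory $1
# (files are taken in the shell's sorted glob order).
nth_fasta() {
    local dir=$1 index=$2
    set -- "$dir"/*.fasta
    [ -e "$1" ] || return 1                      # no FASTA files found
    [ "$index" -ge 0 ] && [ "$index" -lt "$#" ] || return 1
    shift "$index"
    printf '%s\n' "$1"
}

# Only run inside an array task, where Slurm sets SLURM_ARRAY_TASK_ID.
if [ -n "${SLURM_ARRAY_TASK_ID:-}" ]; then
    source /path/to/localcolabfold/colabfold-conda/bin/activate
    fasta=$(nth_fasta inputs "$SLURM_ARRAY_TASK_ID") || exit 1
    colabfold_batch "$fasta" "predictions/$(basename "$fasta" .fasta)/"
fi
```

Giving each task its own output directory keeps results from different inputs apart and makes it easy to resubmit only the indices that failed.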
Basic prediction:
colabfold_batch sequences.fasta predictions/
With custom MSA server (if your HPC has one):
colabfold_batch --host-url "https://internal.server.edu" sequences.fasta predictions/
Multimer prediction (protein complexes):
# Separate chains with : in the FASTA file
# >complex
# SEQUENCEA:SEQUENCEB
colabfold_batch complex.fasta complex_output/
Batch with templates:
colabfold_batch --templates sequences.fasta predictions/
Relaxed prediction with a fixed recycle count (for large proteins; --amber adds a relaxation step and does not itself lower GPU memory):
colabfold_batch --amber --num-recycle 3 large_protein.fasta output/
Understanding the Output
| File | Description |
|---|---|
| `*_relaxed_rank_001_*.pdb` | Best predicted structure (Amber-relaxed) |
| `*_unrelaxed_rank_001_*.pdb` | Best prediction before relaxation |
| `*_scores_rank_001_*.json` | pLDDT and pTM scores |
| `*_coverage.png` | MSA coverage visualization |
| `*_pae.png` | Predicted Aligned Error heatmap |
Confidence scores:
- pLDDT (per-residue): >90 high confidence, 70-90 confident, 50-70 low, <50 very low
- pTM (overall): >0.8 high confidence for whole structure
- PAE (pairwise): Lower is better, indicates domain organization confidence
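The pLDDT bands above can be applied to a predicted structure directly. AlphaFold-style PDB files, including ColabFold output, store per-residue pLDDT in the B-factor column (columns 61-66), so a mean over the C-alpha atoms gives a quick summary. This is a minimal sketch: `mean_plddt` and `plddt_band` are hypothetical helper names, the predictions/ path is a placeholder, and the band boundaries follow the list above.

```shell
#!/usr/bin/env bash
# Sketch: helper names are illustrative, not part of ColabFold.

# Print the mean pLDDT over C-alpha atoms of PDB file $1.
# Fixed PDB columns: 13-16 atom name, 61-66 B-factor (pLDDT here).
mean_plddt() {
    awk 'substr($0, 1, 4) == "ATOM" && substr($0, 13, 4) == " CA " {
             sum += substr($0, 61, 6); n++
         }
         END { if (n == 0) exit 1; printf "%.2f\n", sum / n }' "$1"
}

# Map a pLDDT value $1 to the confidence band used in this guide.
plddt_band() {
    awk -v v="$1" 'BEGIN {
        if (v > 90)       print "high"
        else if (v >= 70) print "confident"
        else if (v >= 50) print "low"
        else              print "very low"
    }'
}

# Example (placeholder path): summarize each top-ranked model that exists.
for pdb in predictions/*_rank_001_*.pdb; do
    [ -e "$pdb" ] || continue
    score=$(mean_plddt "$pdb") && echo "$pdb: $score ($(plddt_band "$score"))"
done
```

A single mean hides local variation, so a model with a high average can still contain low-confidence loops or linkers; the per-residue values and the PAE plot remain the better guide for domain-level decisions.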
Troubleshooting
“CUDA out of memory”:
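- Request a GPU with more memory (#SBATCH --gpus=a100:1)
- Reduce MSA depth with --max-msa (format max-seq:max-extra-seq) to lower peak memory; --amber only adds a relaxation step and does not reduce prediction memory
- For very large proteins (>1000 aa), use Chai-1 or Boltz-2 instead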
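To see which inputs fall near the length thresholds mentioned above, sequence lengths can be tallied before a job is submitted. This is a minimal sketch assuming a hypothetical helper named `fasta_lengths` and a placeholder file sequences.fasta; for complexes, the : chain separator is excluded so the value is the total residue count.

```shell
#!/usr/bin/env bash
# Sketch: fasta_lengths is an illustrative helper, not part of ColabFold.

# Print "<name><TAB><residue count>" for every record in FASTA file $1.
# Chain separators (:) and whitespace are not counted.
fasta_lengths() {
    awk '/^>/ {
             if (seen) print name "\t" len
             seen = 1; name = substr($1, 2); len = 0; next
         }
         { gsub(/[: \t\r]/, ""); len += length($0) }
         END { if (seen) print name "\t" len }' "$1"
}

# Example: flag records longer than 1000 residues in a placeholder input file.
if [ -f sequences.fasta ]; then
    fasta_lengths sequences.fasta |
        awk -F '\t' '$2 > 1000 { print $1 " is " $2 " aa" }'
fi
```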
MSA generation is slow:
- Use the MMseqs2 server option for faster MSA generation
- Pre-compute MSAs for frequently used sequences
Database location filling home directory:
# Set in ~/.bashrc before running
export COLABFOLD_DOWNLOAD_DIR=/scratch/$USER/colabfold_db
Model weights download fails:
- Check network connectivity
- Manually download from: https://storage.googleapis.com/alphafold/
- Place in ~/.cache/colabfold/params/
GPU not being used (slow prediction):
# Verify that JAX (the backend ColabFold runs on) detects the GPU
python -c "import jax; print(jax.devices())"
# Should list a CUDA/GPU device rather than only a CPU device