Monday: Tool Installation
Overview
Monday is dedicated to installing the essential ML-based protein design and structure prediction tools on your HPC cluster. Each module guides you through installing a specific tool, from structure predictors like LocalColabFold and ESMFold to design tools like LigandMPNN and BindCraft.
By the end of the day, you should be able to:
- Request GPU resources and navigate a module-based HPC environment (CUDA, conda, containers).
- Install and smoke-test each of the core structure prediction tools (LocalColabFold, ESMFold, Chai-1, Boltz-2) on your cluster.
- Install and smoke-test the core design and docking tools (LigandMPNN, RFdiffusion2, DiffDock-PP, PLACER, BindCraft).
- Explain, at a high level, what each tool takes as input, produces as output, and when you’d reach for it.
- Troubleshoot the most common install failures (CUDA mismatch, missing weights, environment conflicts) well enough to keep moving.
Pre-work
Before starting the main modules, complete these pre-work assignments to ensure your local environment is ready.
| # | Module | Description | Status |
|---|---|---|---|
| P1 | Environment & GitHub | Set up conda, Git, and GitHub | Required |
| P2 | PyMOL & VS Code | Install visualization and coding tools | Required |
| P3 | Python Refresher | Refresh Python skills for bioinformatics | Recommended |
Getting Started
Before installing individual tools, complete the HPC Setup module to ensure your environment is properly configured:
| # | Module | Description | Status |
|---|---|---|---|
| 1 | Common HPC Setup | CUDA, Conda, containers, and environment setup | Start Here |
Tool Installation Modules
| # | Tool | Description | Status |
|---|---|---|---|
| 2 | LocalColabFold | Fast AlphaFold2 structure prediction | Required |
| 3 | LigandMPNN | Context-aware protein sequence design | Required |
| 4 | RFdiffusion2 | Atom-level active site scaffolding | Required |
| 5 | ESMFold | Single-sequence structure prediction | Required |
| 6 | OpenFold | Open-source AlphaFold2 reproduction | Optional |
| 7 | Chai-1 | Multi-modal biomolecular structure prediction | Required |
| 8 | Boltz-2 | Structure + binding affinity prediction | Required |
| 9 | DiffDock-PP | Protein-protein docking | Required |
| 10 | PLACER | Protein-ligand conformational ensemble prediction | Required |
| 11 | BindCraft | End-to-end binder design pipeline | Required |
| 12 | ESM3 | Multimodal protein generation | Optional |
| 13 | RFdiffusion All Atom | All-atom protein design (predecessor to RFdiffusion2) | Optional |
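Because later modules need to know where each tool was installed, it can help to record every install as you finish it. The following is a minimal sketch of one way to do that with a small JSON manifest. The function names (`record_install`, `untested`), the manifest layout, and the example paths and environment names are all hypothetical and not part of any of the tools above.

```python
import json
import tempfile
from pathlib import Path


def record_install(manifest_path, tool, install_dir, conda_env, smoke_tested=False):
    """Add or update one tool entry in a JSON manifest and return all entries."""
    manifest_path = Path(manifest_path)
    if manifest_path.exists():
        entries = json.loads(manifest_path.read_text())
    else:
        entries = {}
    entries[tool] = {
        "install_dir": str(install_dir),
        "conda_env": conda_env,
        "smoke_tested": smoke_tested,
    }
    manifest_path.write_text(json.dumps(entries, indent=2, sort_keys=True))
    return entries


def untested(entries):
    """Names of tools that are recorded but have not passed a smoke test yet."""
    return sorted(name for name, info in entries.items() if not info["smoke_tested"])


if __name__ == "__main__":
    # Demo in a throwaway directory; in practice, keep the manifest somewhere
    # persistent (for example, your project directory).
    with tempfile.TemporaryDirectory() as tmp:
        manifest = Path(tmp) / "tools.json"
        # Hypothetical paths and environment names, for illustration only.
        record_install(manifest, "LocalColabFold", "/scratch/you/localcolabfold", "colabfold", True)
        entries = record_install(manifest, "LigandMPNN", "/scratch/you/LigandMPNN", "ligandmpnn")
        print("Still to smoke-test:", untested(entries))
```

A plain JSON file is used here so the manifest stays readable from both Python and shell scripts.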
Tool Categories
Structure Prediction
| Tool | Input | Output | Speed | Best For |
|---|---|---|---|---|
| LocalColabFold | Sequence + MSA | Structure | Medium | High-accuracy single proteins |
| ESMFold | Sequence only | Structure | Fast | Quick predictions, no MSA needed |
| OpenFold | Sequence + MSA | Structure | Medium | Research, custom training |
| Chai-1 | Multi-modal | Complex structures | Medium | Proteins + ligands + nucleic acids |
| Boltz-2 | Multi-modal | Structure + affinity | Medium | Drug discovery |
Protein Design
| Tool | Input | Output | Best For |
|---|---|---|---|
| LigandMPNN | Backbone | Sequence | Sequence design with ligand context |
| RFdiffusion2 | Constraints | Backbone | Active site scaffolding |
| BindCraft | Target structure | Binder designs | End-to-end binder design |
Docking
| Tool | Input | Output | Best For |
|---|---|---|---|
| DiffDock-PP | Two proteins | Docked complex | Protein-protein docking |
| PLACER | Protein + ligand | Ensemble poses | Protein-ligand docking |
Tips for Success
- Start with HPC Setup: Complete Module 1 before installing any tools.
- Use separate conda environments: Each tool should have its own environment to avoid dependency conflicts.
- Check GPU availability: Most tools require GPU access; make sure you can request GPU nodes on your cluster.
- Note your paths: Keep track of where you install each tool; you’ll need these paths later.
- Test each installation: Don’t move on until you’ve verified each tool works.
- Check shared resources: Your HPC may already have databases (AlphaFold, ColabFold) installed.
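Several of the tips above (conda available, GPU visible) can be checked with a short preflight script before you start a module. This is a minimal sketch, assuming an NVIDIA GPU whose driver provides the `nvidia-smi` command; the helper names `find_tools` and `list_gpus` are illustrative only.

```python
import shutil
import subprocess


def find_tools(names):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def list_gpus(timeout_s=30):
    """Return the GPU lines printed by `nvidia-smi -L`, or [] if none are visible."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"],
            capture_output=True,
            text=True,
            check=False,
            timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    # Each visible device is listed on a line that starts with "GPU <index>:".
    return [line.strip() for line in result.stdout.splitlines() if line.startswith("GPU ")]


if __name__ == "__main__":
    for name, path in find_tools(["conda", "git", "nvidia-smi"]).items():
        print(f"{name:12s} {path or 'NOT FOUND'}")
    gpus = list_gpus()
    print(f"{len(gpus)} GPU(s) visible")
    for gpu in gpus:
        print("  " + gpu)
```

Login nodes often have no GPU attached, so a result of zero GPUs there is expected; run the script inside a GPU job allocation to confirm that your request actually granted a device.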
Resource Overview
Approximate requirements across all tools:
| Resource | Approximate Requirement |
|---|---|
| Disk Space | 50-100 GB (tools only), 2+ TB (with databases) |
| GPU RAM | 16-80 GB depending on task |
| System (CPU) RAM | 32-64 GB |
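The disk figure in the table can be checked against your install location before you begin. The sketch below uses Python's standard `shutil.disk_usage`; the 100 GB threshold is taken from the upper end of the tools-only estimate above, and the choice of the home directory as install root is only a placeholder assumption.

```python
import shutil
from pathlib import Path

GB = 10 ** 9  # decimal gigabytes, matching the units in the table


def free_gb(path):
    """Free space, in GB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / GB


def has_room(path, required_gb):
    """True if the filesystem holding `path` has at least `required_gb` free."""
    return free_gb(path) >= required_gb


if __name__ == "__main__":
    install_root = Path.home()  # placeholder: point this at your scratch or project directory
    required_gb = 100  # upper end of the tools-only estimate
    available = free_gb(install_root)
    status = "OK" if has_room(install_root, required_gb) else "LOW"
    print(f"{install_root}: {available:.0f} GB free, {required_gb} GB wanted [{status}]")
```

Note that this reports free space on the whole filesystem, not your personal quota. Many clusters enforce per-user quotas that are much smaller, so check your site's quota command as well.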
Getting Help
If you encounter issues:
- Check the tool’s official documentation (linked in each module)
- Search for existing GitHub issues on the tool’s repository
- Report an issue on the bootcamp site (GitHub account required)