4. RFdiffusion2
RFdiffusion2 (paper, code) is a protein design model capable of atom-level active site scaffolding. It extends the original RFdiffusion to enable precise control over protein-ligand interactions at the atomic level.
Why Use RFdiffusion2?
- Atomic-level control: Design proteins with precise active site geometries
- Ligand scaffolding: Build proteins around small molecules with atomic accuracy
- Motif grafting: Incorporate functional motifs into new scaffolds
- Flexible backbone design: Generate novel folds with specific functional constraints
Related Tools: Use with LigandMPNN for sequence design after backbone generation. For the earlier version without atomic control, see RFdiffusion All Atom (Optional).
Resource Requirements
| Resource | Minimum | Recommended | Notes |
|---|---|---|---|
| GPU RAM | 16 GB | 32+ GB | A100 for larger designs |
| CPU RAM | 16 GB | 32 GB | Container-based execution |
| Disk Space | 10 GB | 20 GB | Container + weights |
| Container | Apptainer/Singularity | Required | No native Docker on HPC |
Preparation
Prerequisites:
- Completed HPC Setup guide
- Apptainer/Singularity available on your cluster
- CUDA-capable GPU
Verify your environment:
module load apptainer # or: module load singularity
apptainer --version
nvidia-smi
Important: RFdiffusion2 runs inside a container. Most academic HPC clusters do not support Docker for security reasons; use Apptainer/Singularity instead.
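The checks above can be bundled into a small preflight function. This is a minimal sketch: the names `find_container_runtime` and `preflight` are illustrative and not part of RFdiffusion2, and it assumes the runtime is already on `PATH` (for example after `module load apptainer`).

```shell
#!/usr/bin/env bash
# Preflight sketch: helper names are hypothetical, not part of RFdiffusion2.

# Print the first container runtime found on PATH; return 1 if neither exists.
find_container_runtime() {
  local candidate
  for candidate in apptainer singularity; do
    if command -v "$candidate" >/dev/null 2>&1; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Report the runtime and whether nvidia-smi is visible on this node.
preflight() {
  local runtime
  if ! runtime=$(find_container_runtime); then
    echo "ERROR: neither apptainer nor singularity found; try 'module load apptainer'" >&2
    return 1
  fi
  echo "Container runtime: $runtime"
  if command -v nvidia-smi >/dev/null 2>&1; then
    echo "GPU tooling: nvidia-smi found"
  else
    echo "WARNING: nvidia-smi not found; are you on a GPU node?" >&2
  fi
}
```

Calling `preflight` from inside an allocated GPU job is usually more informative than calling it on a login node, which on many clusters has no GPU.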
Installation
- Clone the repository:
git clone https://github.com/RosettaCommons/RFdiffusion2.git
cd RFdiffusion2
- Add the repo to your PYTHONPATH (add to ~/.bashrc):
export PYTHONPATH="/path/to/your/RFdiffusion2:$PYTHONPATH"
- Download the model weights and container:
python setup.py
Expected download: ~5-10 GB (container + weights). This can take 30+ minutes.
If the download is interrupted:
python setup.py overwrite
- Verify Apptainer/Singularity is available:
module load apptainer
# or: module load singularity
The downloaded .sif file in rf_diffusion/exec/ is the Singularity container.
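Editing ~/.bashrc by hand tends to leave duplicate lines when the setup is repeated. The following sketch appends the PYTHONPATH export only if it is not already present. The helper name `add_repo_to_pythonpath` is hypothetical, and the rc file defaults to ~/.bashrc as in the step above.

```shell
#!/usr/bin/env bash
# Sketch only: `add_repo_to_pythonpath` is a hypothetical helper name.

# Append an export line for REPO_DIR to RC_FILE unless that exact line already exists.
add_repo_to_pythonpath() {
  local repo_dir="$1"
  local rc_file="${2:-$HOME/.bashrc}"
  # \$PYTHONPATH stays literal so it is expanded when the rc file is sourced.
  local line="export PYTHONPATH=\"$repo_dir:\$PYTHONPATH\""
  touch "$rc_file"
  # -F: fixed string, -x: match the whole line, -q: no output
  if ! grep -Fxq -- "$line" "$rc_file"; then
    printf '%s\n' "$line" >> "$rc_file"
  fi
}
```

From inside the cloned repository, `add_repo_to_pythonpath "$(pwd)"` followed by `source ~/.bashrc` should make the change visible in the current shell.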
Testing the Installation
Run a demo case:
apptainer exec --nv rf_diffusion/exec/bakerlab_rf_diffusion_aa.sif \
rf_diffusion/benchmark/pipeline.py \
--config-name=open_source_demo \
sweep.benchmarks=active_site_unindexed_atomic_partial_ligand
Note: Omit the --nv flag if running without a GPU (this will be very slow).
Success indicators:
- Command completes without errors
- Output directory created at pipeline_outputs/<timestamp>_open_source_demo/
- Output directory contains PDB files with designed structures
Expected runtime: 5-15 minutes on GPU, 30+ minutes on CPU.
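The success indicators can be checked from the shell. This is a minimal sketch with hypothetical helper names. It assumes the <timestamp> prefix sorts lexicographically (for example a YYYY-MM-DD style stamp), which this guide does not confirm, so treat the "latest" result with care.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical. Assumes the <timestamp> prefix
# sorts lexicographically, so the last glob match is the newest run.

# Print the last run directory for CONFIG under ROOT, in sorted glob order.
latest_run_dir() {
  local root="$1" config="$2" dir latest=""
  for dir in "$root"/*_"$config"/; do
    [ -d "$dir" ] && latest="$dir"
  done
  [ -n "$latest" ] || return 1
  printf '%s\n' "${latest%/}"
}

# Count .pdb files anywhere below DIR.
count_pdbs() {
  find "$1" -type f -name '*.pdb' | wc -l | tr -d ' '
}
```

For the demo run, `run_dir=$(latest_run_dir pipeline_outputs open_source_demo) && count_pdbs "$run_dir"` should print a non-zero number if designs were written.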
HPC Job Script
#!/bin/bash
#SBATCH --job-name=rfdiff2
#SBATCH --partition=gpu
#SBATCH --gpus=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=32G
#SBATCH --time=04:00:00
#SBATCH --output=%x_%j.out
module load apptainer
module load cuda/12.1
cd /path/to/RFdiffusion2
apptainer exec --nv rf_diffusion/exec/bakerlab_rf_diffusion_aa.sif \
rf_diffusion/benchmark/pipeline.py \
--config-name=open_source_demo
Usage Examples
Basic backbone design:
apptainer exec --nv rf_diffusion/exec/bakerlab_rf_diffusion_aa.sif \
rf_diffusion/benchmark/pipeline.py \
--config-name=my_config
With custom output directory:
apptainer exec --nv rf_diffusion/exec/bakerlab_rf_diffusion_aa.sif \
rf_diffusion/benchmark/pipeline.py \
--config-name=open_source_demo \
sweep.output_dir=/path/to/output
Multiple design benchmarks:
apptainer exec --nv rf_diffusion/exec/bakerlab_rf_diffusion_aa.sif \
rf_diffusion/benchmark/pipeline.py \
--config-name=open_source_demo \
sweep.benchmarks="[benchmark1,benchmark2]"
Docker to Apptainer Translation
The official documentation may show Docker commands. Here’s how to translate:
| Docker Command | Apptainer Equivalent |
|---|---|
| docker run --gpus all image | apptainer exec --nv image.sif |
| docker run -v /path:/path | apptainer exec --bind /path:/path |
| -it (interactive) | apptainer shell --nv |
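The mapping in the table can be sketched as a small flag translator. The function `docker_flags_to_apptainer` is a hypothetical helper that handles only the flags listed above and rejects anything else. It builds a plain string, so paths containing spaces are not supported.

```shell
#!/usr/bin/env bash
# Sketch only: `docker_flags_to_apptainer` is a hypothetical helper covering
# just the flags in the table above. Paths containing spaces are not handled.

docker_flags_to_apptainer() {
  local out=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --gpus)
        # "--gpus all" becomes "--nv"; the value ("all") is dropped.
        [ "$#" -ge 2 ] || return 1
        out="$out --nv"
        shift 2
        ;;
      -v|--volume)
        # "-v host:container" becomes "--bind host:container".
        [ "$#" -ge 2 ] || return 1
        out="$out --bind $2"
        shift 2
        ;;
      -it|-i|-t)
        # Interactive use maps to the `apptainer shell` subcommand, not a flag.
        shift
        ;;
      *)
        echo "unhandled Docker flag: $1" >&2
        return 1
        ;;
    esac
  done
  # Strip the leading space left by the first append.
  printf '%s\n' "${out# }"
}
```

For instance, `docker_flags_to_apptainer --gpus all -v /data:/data` prints `--nv --bind /data:/data`, which can then be placed after `apptainer exec`.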
Example conversion:
# Docker (won't work on HPC):
docker run --gpus all -v $(pwd):/workspace rfdiffusion:latest python script.py
# Apptainer (works on HPC):
apptainer exec --nv --bind $(pwd):/workspace rfdiffusion.sif python script.py
Understanding the Output
Output structure:
pipeline_outputs/
└── <timestamp>_<config_name>/
├── designs/
│ ├── design_0.pdb # Designed backbone
│ ├── design_1.pdb
│ └── ...
├── logs/
│ └── run.log # Execution log
└── config.yaml # Configuration used
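A quick way to sanity-check the designs in this layout is to count residues per PDB file. The sketch below uses hypothetical helper names and relies only on the fixed-width PDB format, where the record name occupies columns 1-6 and the atom name columns 13-16. It counts one alpha carbon per residue, which should hold for designed backbones without alternate locations.

```shell
#!/usr/bin/env bash
# Sketch only: `count_residues` and `summarize_designs` are hypothetical helpers.

# Count alpha-carbon ATOM records (one per residue in a plain backbone file).
# PDB is fixed-width: record name in columns 1-6, atom name in columns 13-16.
count_residues() {
  awk 'substr($0, 1, 6) == "ATOM  " && substr($0, 13, 4) == " CA " { n++ }
       END { print n + 0 }' "$1"
}

# Print "<file><TAB><residue count>" for every design in RUN_DIR/designs.
summarize_designs() {
  local pdb
  for pdb in "$1"/designs/*.pdb; do
    [ -f "$pdb" ] || continue
    printf '%s\t%s\n' "$(basename "$pdb")" "$(count_residues "$pdb")"
  done
}
```

Running `summarize_designs pipeline_outputs/<timestamp>_<config_name>` with the real directory name gives one line per design, which makes truncated or empty files easy to spot.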
Configuration System
RFdiffusion2 uses Hydra for configuration. Key config options:
| Parameter | Description |
|---|---|
| sweep.benchmarks | Which design task(s) to run |
| sweep.output_dir | Output directory |
| diffuser.T | Number of diffusion timesteps |
| inference.num_designs | Number of designs to generate |
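Hydra overrides are passed as key=value arguments, and list values use square brackets. Because brackets are also shell glob characters, the override should be quoted. The sketch below builds the sweep.benchmarks override from a list of names; `benchmarks_override` is a hypothetical helper, and the benchmark names you pass must exist in your configuration.

```shell
#!/usr/bin/env bash
# Sketch only: `benchmarks_override` is a hypothetical helper name.

# Join benchmark names into a Hydra list override: sweep.benchmarks=[a,b,c]
benchmarks_override() {
  [ "$#" -gt 0 ] || { echo "usage: benchmarks_override NAME..." >&2; return 1; }
  local joined="" name
  for name in "$@"; do
    # Prefix a comma only when something has already been collected.
    joined="${joined:+$joined,}$name"
  done
  printf 'sweep.benchmarks=[%s]\n' "$joined"
}
```

The result can be passed as the final argument of the pipeline command, quoted as `"$(benchmarks_override benchmark1 benchmark2)"` so the shell does not try to expand the brackets.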
Troubleshooting
“No GPU available” / extremely slow:
- Ensure the --nv flag is included
- Verify GPU allocation: nvidia-smi
- Load CUDA module: module load cuda/12.1
Container permission errors:
chmod +x rf_diffusion/exec/bakerlab_rf_diffusion_aa.sif
“FileNotFoundError” for weights:
- Re-run python setup.py to ensure all files are downloaded
- Check that the rf_diffusion/weights/ directory exists
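Both checks can be scripted. The following sketch verifies that a .sif file exists under rf_diffusion/exec/ and that rf_diffusion/weights/ exists and is non-empty. `check_install` is a hypothetical helper; it uses only the two paths named in this guide and makes no assumption about the individual files inside them.

```shell
#!/usr/bin/env bash
# Sketch only: `check_install` is a hypothetical helper. The two paths come from
# this guide; the exact files inside them are not assumed.

check_install() {
  local repo="${1:-.}" status=0 sif found=""
  for sif in "$repo"/rf_diffusion/exec/*.sif; do
    [ -f "$sif" ] && found="$sif"
  done
  if [ -n "$found" ]; then
    echo "container: $found"
  else
    echo "MISSING: no .sif file in $repo/rf_diffusion/exec/" >&2
    status=1
  fi
  # A weights directory that exists but is empty suggests an interrupted download.
  if [ -d "$repo/rf_diffusion/weights" ] && [ -n "$(ls -A "$repo/rf_diffusion/weights")" ]; then
    echo "weights: $repo/rf_diffusion/weights"
  else
    echo "MISSING: $repo/rf_diffusion/weights is absent or empty" >&2
    status=1
  fi
  return "$status"
}
```

If `check_install /path/to/RFdiffusion2` reports anything missing, re-running the setup script as described above is the first thing to try.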
Container not found:
- Provide the full path to the .sif file
- Or run from the RFdiffusion2 directory
Setup script hangs during download:
- Large files may take 30+ minutes
- Check network connectivity
- If interrupted, run python setup.py overwrite
Module not found errors inside container:
- Ensure PYTHONPATH is set correctly
- Container may need --bind for additional paths