4. RFdiffusion2

RFdiffusion2 is a protein design model capable of atom-level active site scaffolding. It extends the original RFdiffusion to give precise, atom-level control over protein-ligand interactions.

Why Use RFdiffusion2?

  • Atomic-level control: Design proteins with precise active site geometries
  • Ligand scaffolding: Build proteins around small molecules with atomic accuracy
  • Motif grafting: Incorporate functional motifs into new scaffolds
  • Flexible backbone design: Generate novel folds with specific functional constraints

Related Tools: Use with LigandMPNN for sequence design after backbone generation. For the earlier version without atomic control, see RFdiffusion All Atom (Optional).

Resource Requirements

| Resource | Minimum | Recommended | Notes |
|---|---|---|---|
| GPU RAM | 16 GB | 32+ GB | A100 for larger designs |
| CPU RAM | 16 GB | 32 GB | Container-based execution |
| Disk Space | 10 GB | 20 GB | Container + weights |
| Container | Apptainer/Singularity | Required | No native Docker on HPC |

Preparation

Prerequisites:

  • Completed HPC Setup guide
  • Apptainer/Singularity available on your cluster
  • CUDA-capable GPU

Verify your environment:

module load apptainer    # or: module load singularity
apptainer --version
nvidia-smi
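
The checks above can be wrapped in two small helpers so that later scripts work whether the cluster provides Apptainer or Singularity. This is a minimal sketch: the function names find_container_runtime and gpu_visible are hypothetical, and it assumes the relevant module has already been loaded so the binaries are on PATH.

```shell
# Print the first container runtime found on PATH: apptainer, then singularity.
# Returns non-zero if neither is available.
find_container_runtime() {
    local candidate
    for candidate in apptainer singularity; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    echo "no container runtime found; try 'module load apptainer'" >&2
    return 1
}

# Succeed only if nvidia-smi exists and can list at least one GPU.
gpu_visible() {
    command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1
}
```

For example, RUNTIME=$(find_container_runtime) stores the runtime name for reuse, and gpu_visible || echo "no GPU visible" flags a login node or a job without a GPU allocation.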

Important: RFdiffusion2 runs inside a container. Most academic HPC clusters do not support Docker for security reasons; use Apptainer/Singularity instead.

Installation

  1. Clone the repository:
git clone https://github.com/RosettaCommons/RFdiffusion2.git
cd RFdiffusion2
  2. Add the repo to your PYTHONPATH (add this line to ~/.bashrc):
export PYTHONPATH="/path/to/your/RFdiffusion2:$PYTHONPATH"
  3. Download the model weights and container:
python setup.py

Expected download: ~5-10 GB (container + weights). This can take 30+ minutes.

If the download is interrupted:

python setup.py overwrite
  4. Verify Apptainer/Singularity is available:
module load apptainer
# or: module load singularity

The downloaded .sif file in rf_diffusion/exec/ is the Singularity container.
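
To confirm that the download actually produced a container image, a short check such as the following can help. It is a sketch only: list_sif_images is a hypothetical helper, and it assumes the image lands in rf_diffusion/exec/ as described above.

```shell
# Print the path of each .sif container image in the given directory
# (default: rf_diffusion/exec). Returns non-zero when no image is found,
# for example after an interrupted download.
list_sif_images() {
    local exec_dir="${1:-rf_diffusion/exec}"
    local found=1
    local image
    for image in "$exec_dir"/*.sif; do
        [ -f "$image" ] || continue
        echo "$image"
        found=0
    done
    return "$found"
}
```

Run from the repository root, list_sif_images || echo "no container image yet" reports whether the download step needs to be repeated.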

Testing the Installation

Run a demo case:

apptainer exec --nv rf_diffusion/exec/bakerlab_rf_diffusion_aa.sif \
    rf_diffusion/benchmark/pipeline.py \
    --config-name=open_source_demo \
    sweep.benchmarks=active_site_unindexed_atomic_partial_ligand

Note: Omit the --nv flag if running without a GPU (expect this to be very slow).

Success indicators:

  • Command completes without errors
  • Output directory created at pipeline_outputs/<timestamp>_open_source_demo/
  • Contains PDB files with designed structures

Expected runtime: 5-15 minutes on GPU, 30+ minutes on CPU.
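
The success indicators above can be checked from the shell. The sketch below assumes outputs are written under pipeline_outputs/ as described; latest_run_dir and count_pdbs are hypothetical helper names, not part of RFdiffusion2.

```shell
# Print the most recently modified run directory under the output root
# (default: pipeline_outputs). Prints nothing if there are no runs yet.
latest_run_dir() {
    local root="${1:-pipeline_outputs}"
    ls -1dt "$root"/*/ 2>/dev/null | head -n 1
}

# Count the .pdb files anywhere below a run directory.
count_pdbs() {
    find "$1" -type f -name '*.pdb' | wc -l | tr -d ' '
}
```

After the demo finishes, count_pdbs "$(latest_run_dir)" should print a number greater than zero; a result of 0 suggests the run stopped before writing any designs.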

HPC Job Script

#!/bin/bash
#SBATCH --job-name=rfdiff2
#SBATCH --partition=gpu
#SBATCH --gpus=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=32G
#SBATCH --time=04:00:00
#SBATCH --output=%x_%j.out

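# Module names and versions are site-specific; check `module avail` on your cluster.
module load apptainer
module load cuda/12.1

# Run from the repository root so the relative paths below resolve.
cd /path/to/RFdiffusion2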

apptainer exec --nv rf_diffusion/exec/bakerlab_rf_diffusion_aa.sif \
    rf_diffusion/benchmark/pipeline.py \
    --config-name=open_source_demo

Usage Examples

Basic backbone design:

apptainer exec --nv rf_diffusion/exec/bakerlab_rf_diffusion_aa.sif \
    rf_diffusion/benchmark/pipeline.py \
    --config-name=my_config

With custom output directory:

apptainer exec --nv rf_diffusion/exec/bakerlab_rf_diffusion_aa.sif \
    rf_diffusion/benchmark/pipeline.py \
    --config-name=open_source_demo \
    sweep.output_dir=/path/to/output

Multiple design benchmarks:

apptainer exec --nv rf_diffusion/exec/bakerlab_rf_diffusion_aa.sif \
    rf_diffusion/benchmark/pipeline.py \
    --config-name=open_source_demo \
    sweep.benchmarks="[benchmark1,benchmark2]"
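
When the benchmark names come from a script rather than being typed by hand, the bracketed list can be assembled programmatically. This sketch only builds the string; hydra_list is a hypothetical helper, and benchmark1 and benchmark2 remain placeholders as in the command above.

```shell
# Join arguments into Hydra list syntax: a b c -> [a,b,c]
hydra_list() {
    local IFS=,
    echo "[$*]"
}
```

It can then be used as sweep.benchmarks="$(hydra_list benchmark1 benchmark2)". Keeping the double quotes stops the shell from treating the square brackets as a glob pattern.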

Docker to Apptainer Translation

The official documentation may show Docker commands. Here’s how to translate:

| Docker Command | Apptainer Equivalent |
|---|---|
| docker run --gpus all image | apptainer exec --nv image.sif |
| docker run -v /path:/path | apptainer exec --bind /path:/path |
| -it (interactive) | apptainer shell --nv |

Example conversion:

# Docker (typically unavailable on HPC):
docker run --gpus all -v "$(pwd)":/workspace rfdiffusion:latest python script.py

# Apptainer (works on HPC); quoting $(pwd) keeps paths with spaces intact:
apptainer exec --nv --bind "$(pwd)":/workspace rfdiffusion.sif python script.py

Understanding the Output

Output structure:

pipeline_outputs/
└── <timestamp>_<config_name>/
    ├── designs/
    │   ├── design_0.pdb    # Designed backbone
    │   ├── design_1.pdb
    │   └── ...
    ├── logs/
    │   └── run.log         # Execution log
    └── config.yaml         # Configuration used
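
A quick sanity check on the designs is to count residues per file. The sketch below counts C-alpha ATOM records using the fixed PDB column layout (record name in columns 1-6, atom name in columns 13-16). The helper names are hypothetical, and the search is recursive so it does not depend on the exact subdirectory layout shown above.

```shell
# Count residues in a PDB file by counting C-alpha ATOM records.
# HETATM records (ligands, ions) are ignored.
count_residues() {
    awk 'substr($0, 1, 4) == "ATOM" && substr($0, 13, 4) == " CA " { n++ }
         END { print n + 0 }' "$1"
}

# Print "<file> <residue count>" for every .pdb file below a run directory.
summarize_designs() {
    find "$1" -type f -name '*.pdb' | sort | while IFS= read -r pdb; do
        echo "$pdb $(count_residues "$pdb")"
    done
}
```

For example, summarize_designs pipeline_outputs/<timestamp>_<config_name> lists each design with its length, which makes truncated or empty files easy to spot.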

Configuration System

RFdiffusion2 uses Hydra for configuration. Key config options:

| Parameter | Description |
|---|---|
| sweep.benchmarks | Which design task(s) to run |
| sweep.output_dir | Output directory |
| diffuser.T | Number of diffusion timesteps |
| inference.num_designs | Number of designs to generate |
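
Hydra overrides are passed as key=value arguments after the config name. As a rough sketch, the helper below prints the full command without running it, so it can be reviewed first. print_pipeline_cmd is a hypothetical name, the values in the usage note are illustrative only, and whether a given key is accepted depends on the config being used.

```shell
# Print (without running) a pipeline command with Hydra key=value
# overrides appended after the config name.
print_pipeline_cmd() {
    local config_name="$1"
    shift
    printf 'apptainer exec --nv %s %s --config-name=%s' \
        "rf_diffusion/exec/bakerlab_rf_diffusion_aa.sif" \
        "rf_diffusion/benchmark/pipeline.py" \
        "$config_name"
    local override
    for override in "$@"; do
        printf ' %s' "$override"
    done
    printf '\n'
}
```

For instance, print_pipeline_cmd open_source_demo inference.num_designs=5 sweep.output_dir=/path/to/output prints a single command line that can be pasted into a job script.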

Troubleshooting

Warning: Common Issues

“No GPU available” / extremely slow:

  • Ensure --nv flag is included
  • Verify GPU allocation: nvidia-smi
  • Load CUDA module: module load cuda/12.1

Container permission errors:

chmod +x rf_diffusion/exec/bakerlab_rf_diffusion_aa.sif

“FileNotFoundError” for weights:

  • Re-run python setup.py to ensure all files were downloaded
  • Check that the rf_diffusion/weights/ directory exists

Container not found:

  • Provide full path to .sif file
  • Or run from the RFdiffusion2 directory

Setup script hangs during download:

  • Large files may take 30+ minutes
  • Check network connectivity
  • If interrupted, run python setup.py overwrite

Module not found errors inside container:

  • Ensure PYTHONPATH is set correctly
  • The container may need --bind to access additional host paths
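
To check the PYTHONPATH point above, a helper like the following tests whether the repository directory is one of the colon-separated entries. in_pythonpath is a hypothetical name, and the comparison is an exact string match, so a trailing slash or a symlinked path will not match.

```shell
# Succeed if the given directory is one of the colon-separated
# entries in PYTHONPATH (exact string match).
in_pythonpath() {
    case ":${PYTHONPATH:-}:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, in_pythonpath /path/to/your/RFdiffusion2 || echo "repo missing from PYTHONPATH" points to a missing or mistyped export line in ~/.bashrc.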