9. DiffDock-PP

DiffDock-PP (paper, code) is a diffusion generative model built on a graph neural network. It is trained to denoise rigid-body transformations (rotations and translations) and uses them to predict how two rigid protein subunits dock.

Why Use DiffDock-PP?

Fast protein-protein docking: Predicts binding orientations without an expensive exhaustive search over poses
Validation tool: Orthogonally validate structure predictions from other methods
Ensemble predictions: Generate multiple docking poses for uncertainty estimation
Rigid-body docking: Efficient for cases where backbone flexibility is minimal

Related Tools: For protein-ligand docking, including flexible ligand binding with ensemble generation, see PLACER. For structure prediction of complexes, see Chai-1 or Boltz-2.

Resource Requirements

Resource	Minimum	Recommended	Notes
GPU RAM	8 GB	16 GB	Scales with protein size
CPU RAM	8 GB	16 GB	For preprocessing
Disk Space	2 GB	5 GB	Model weights
CUDA	11.6	11.6-11.7	Specific version required

Preparation

Prerequisites:

Completed HPC Setup guide
Conda/Mamba installed
CUDA 11.6 or 11.7 available

Check CUDA availability:

module avail cuda
# Look for cuda/11.6 or cuda/11.7

Installation

Clone the DiffDock-PP repository:

git clone https://github.com/ketatam/DiffDock-PP.git
cd DiffDock-PP

Create a new environment:

mamba create -n diffdock_pp python=3.9
mamba activate diffdock_pp

Install PyTorch with CUDA 11.6:

mamba install pytorch=1.13.0 pytorch-cuda=11.6 -c pytorch -c nvidia

Why CUDA 11.6? DiffDock-PP was developed and tested with this version. PyG's compiled extensions are built against specific PyTorch and CUDA combinations, so other versions may fail to install or import.

Install PyTorch Geometric (PyG) packages:

mamba install pytorch-scatter pytorch-sparse pytorch-cluster pytorch-spline-conv pyg -c pyg

Install remaining dependencies:

mamba install mkl=2024.0 "numpy<2.0" dill tqdm pyyaml pandas biopandas scikit-learn biopython e3nn wandb tensorboard tensorboardX matplotlib

Why numpy<2.0? NumPy 2.0 changed its C ABI and removed several deprecated APIs, so packages built against NumPy 1.x (including PyTorch 1.13) can fail at import time. Keeping NumPy below 2.0 avoids this.
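To confirm that the version pins took effect, the installed versions can be read from package metadata. The following is a minimal sketch, not part of DiffDock-PP: the helper names (`major_version`, `check_pins`, `installed_versions`) are made up for illustration, and it assumes the conda packages register themselves under the distribution names `numpy` and `torch`.

```python
from importlib import metadata


def major_version(version):
    """Return the leading integer of a version string such as '1.26.4'."""
    return int(version.split(".")[0])


def check_pins(versions):
    """Return a list of problems found in a {name: version-or-None} mapping."""
    problems = []
    numpy_version = versions.get("numpy")
    if numpy_version is None:
        problems.append("numpy is not installed")
    elif major_version(numpy_version) >= 2:
        problems.append(f"numpy {numpy_version} found, expected <2.0")
    torch_version = versions.get("torch")
    if torch_version is None:
        problems.append("torch is not installed")
    elif not torch_version.startswith("1.13"):
        problems.append(f"torch {torch_version} found, expected 1.13.x")
    return problems


def installed_versions(names=("numpy", "torch")):
    """Look up installed versions; packages that are missing map to None."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for line in check_pins(installed_versions()) or ["pins look consistent"]:
        print(line)
```

Run it inside the activated diffdock_pp environment. Any line other than "pins look consistent" names a package to reinstall.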

Testing the Installation

Create required directories:

mkdir -p storage

Run the test script on the DB5 benchmark:

bash src/db5_inference.sh

Success indicators:

Command completes without errors
Output folder visualization/epoch-0/ is created
Directory contains PDB files of docked complexes (multiple .pdb files)

Expected runtime: 5-15 minutes depending on GPU.

Verify output:

ls visualization/epoch-0/*.pdb | wc -l
# Should show multiple PDB files
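The same check can be scripted, which is convenient at the end of a batch job. This is a small sketch under the assumption that results land in visualization/epoch-0/ as described above; `count_pdb_files` is an illustrative helper, not something shipped with the repository.

```python
from pathlib import Path


def count_pdb_files(output_dir="visualization/epoch-0"):
    """Count .pdb files directly inside output_dir (0 if the directory is missing)."""
    directory = Path(output_dir)
    if not directory.is_dir():
        return 0
    return sum(1 for path in directory.glob("*.pdb") if path.is_file())


if __name__ == "__main__":
    print(f"{count_pdb_files()} PDB files found")
```

A count of zero means either the run failed or the output was written somewhere else.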

HPC Job Script

#!/bin/bash
#SBATCH --job-name=diffdock_pp
#SBATCH --partition=gpu
#SBATCH --gpus=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --time=02:00:00
#SBATCH --output=%x_%j.out

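module load cuda/11.6

# mamba must be initialized in the batch shell. If activation fails,
# uncomment the next line (assumes ~/.bashrc contains the mamba init block).
# source ~/.bashrc
mamba activate diffdock_pp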

cd /path/to/DiffDock-PP

# Create output directory
mkdir -p storage

# Run inference
bash src/db5_inference.sh

Usage Examples

Run DB5 benchmark (default test):

bash src/db5_inference.sh

Custom docking (requires understanding the codebase):

DiffDock-PP requires input data in a specific format. For custom proteins:

Prepare receptor and ligand PDB files
Create data configuration files
Run inference script

See the repository documentation for detailed input format requirements.
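Before building the configuration files, it can help to confirm that the receptor and ligand files actually contain the chains you expect. The sketch below only reads the fixed-column ATOM records of the PDB format. It does not produce DiffDock-PP's input format, and the helper names and the file names in the usage note are placeholders.

```python
from pathlib import Path


def summarize_pdb(pdb_text):
    """Count distinct residues per chain using the fixed-column ATOM records."""
    residues_by_chain = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        chain_id = line[21]        # column 22: chain identifier
        residue_key = line[22:27]  # columns 23-27: residue number and insertion code
        residues_by_chain.setdefault(chain_id, set()).add(residue_key)
    return {chain: len(keys) for chain, keys in residues_by_chain.items()}


def check_input(path):
    """Return the per-chain summary, or raise ValueError if no ATOM records exist."""
    summary = summarize_pdb(Path(path).read_text())
    if not summary:
        raise ValueError(f"{path}: no ATOM records found")
    return summary
```

For example, `check_input("receptor.pdb")` and `check_input("ligand.pdb")` (hypothetical file names) each return a mapping such as chain ID to residue count, which makes a missing or unexpectedly short chain easy to spot.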

Understanding the Output

Output structure:

visualization/
└── epoch-0/
    ├── complex_1_pose_0.pdb    # Docking pose 1
    ├── complex_1_pose_1.pdb    # Docking pose 2
    ├── complex_1_pose_2.pdb    # Docking pose 3
    └── ...                      # More poses

Each PDB file contains both protein chains in their predicted relative orientation.

The poses for a given complex are alternative docking predictions. Compare poses to assess uncertainty.
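One simple way to compare poses is the pairwise C-alpha RMSD between them. The sketch below is an illustration only: it performs no superposition, so it assumes all poses of a complex are written in the same reference frame with identical atom ordering, which you should verify for your own output. The function names are not from the DiffDock-PP codebase.

```python
import math


def ca_coordinates(pdb_text):
    """Extract (x, y, z) for every C-alpha ATOM record, in file order."""
    coords = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and len(line) >= 54 and line[12:16].strip() == "CA":
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and the same length")
    squared = sum(
        (a - b) ** 2
        for point_a, point_b in zip(coords_a, coords_b)
        for a, b in zip(point_a, point_b)
    )
    return math.sqrt(squared / len(coords_a))


def pairwise_rmsd(pose_texts):
    """Return {(i, j): rmsd} for every pair i < j of pose file contents."""
    coords = [ca_coordinates(text) for text in pose_texts]
    return {
        (i, j): rmsd(coords[i], coords[j])
        for i in range(len(coords))
        for j in range(i + 1, len(coords))
    }
```

Pass the text of each pose file for one complex to `pairwise_rmsd`. Small values across all pairs indicate that the poses agree, while a wide spread indicates that the model is uncertain about the binding orientation.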

Use Cases

Protein-Protein Docking: Predict binding orientations between protein chains
Complex Validation: Validate predicted protein-protein interfaces from other methods
Ensemble Generation: Generate multiple docking poses to capture uncertainty
Benchmarking: Compare against other docking methods

When to Use DiffDock-PP vs Other Tools

Tool	Best For
DiffDock-PP	Rigid protein-protein docking
PLACER	Protein-ligand docking with conformational sampling
Chai-1/Boltz-2	Ab initio complex structure prediction
BindCraft	De novo binder design

Troubleshooting

Common Issues

PyG installation fails:

Ensure CUDA toolkit is loaded before installing
Install PyTorch first, then PyG packages

Verify versions match:

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"

CUDA version mismatch:

Check your system’s CUDA version:

nvcc --version

This should match the pytorch-cuda version (11.6). If not:

module load cuda/11.6
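The comparison can also be automated by parsing the release number that nvcc reports. This is a sketch with illustrative helper names; the accepted versions default to the 11.6 and 11.7 listed in the prerequisites, and it assumes nvcc prints its usual "release X.Y" line.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(nvcc_output):
    """Return the 'major.minor' release from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def toolkit_matches(nvcc_output, expected=("11.6", "11.7")):
    """True if the release reported by nvcc is one of the expected versions."""
    return parse_nvcc_release(nvcc_output) in expected


def installed_nvcc_output():
    """Run `nvcc --version` if nvcc is on PATH; return its stdout, else None."""
    if shutil.which("nvcc") is None:
        return None
    completed = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    return completed.stdout
```

If `installed_nvcc_output()` returns None, no CUDA module is loaded in the current shell. Otherwise pass its result to `toolkit_matches` to see whether the loaded toolkit is one of the supported versions.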

“ModuleNotFoundError” for PyG components:

Install all PyG packages together:

mamba install pytorch-scatter pytorch-sparse pytorch-cluster pytorch-spline-conv pyg -c pyg
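To see which PyG component is missing before reinstalling, you can ask Python whether each module can be located. The sketch below uses the import names that correspond to the conda packages above (for example, pytorch-scatter is imported as `torch_scatter`); `missing_modules` is an illustrative helper.

```python
from importlib import util

# Import names for the conda packages installed above.
PYG_MODULES = (
    "torch_scatter",
    "torch_sparse",
    "torch_cluster",
    "torch_spline_conv",
    "torch_geometric",
)


def missing_modules(names=PYG_MODULES):
    """Return the module names that cannot be located in this environment."""
    return [name for name in names if util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```

This only checks that each module can be found. A module that is present but was built for a different PyTorch or CUDA version can still fail when it is actually imported.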

NumPy errors:

Ensure numpy<2.0 is installed
Don’t upgrade NumPy even if prompted

GPU not detected:

# Verify CUDA is available to PyTorch
python -c "import torch; print(torch.cuda.is_available())"
# Should print: True

Empty output directory:

Check for error messages in terminal output
Verify input files exist and are formatted correctly
Ensure storage/ directory was created