9. DiffDock-PP

DiffDock-PP (paper, code) is a graph neural network trained to denoise rigid-body transformations (rotations and translations). It uses this to predict the docking orientation between two rigid protein subunits.

Why Use DiffDock-PP?

  • Fast protein-protein docking: Predicts binding orientations without expensive sampling
  • Validation tool: Orthogonally validate structure predictions from other methods
  • Ensemble predictions: Generate multiple docking poses for uncertainty estimation
  • Rigid-body docking: Efficient for cases where backbone flexibility is minimal

Related Tools: For protein-ligand docking, including flexible ligand binding with ensemble generation, see PLACER. For structure prediction of complexes, see Chai-1 or Boltz-2.

Resource Requirements

Resource     Minimum   Recommended   Notes
GPU RAM      8 GB      16 GB         Scales with protein size
CPU RAM      8 GB      16 GB         For preprocessing
Disk Space   2 GB      5 GB          Model weights
CUDA         11.6      11.6-11.7     Specific version required

Preparation

Prerequisites:

  • Completed HPC Setup guide
  • Conda/Mamba installed
  • CUDA 11.6 or 11.7 available

Check CUDA availability:

module avail cuda
# Look for cuda/11.6 or cuda/11.7

Installation

  1. Clone the DiffDock-PP repository:
git clone https://github.com/ketatam/DiffDock-PP.git
cd DiffDock-PP
  2. Create a new environment:
mamba create -n diffdock_pp python=3.9
mamba activate diffdock_pp
  3. Install PyTorch with CUDA 11.6:
mamba install pytorch=1.13.0 pytorch-cuda=11.6 -c pytorch -c nvidia

Why CUDA 11.6? DiffDock-PP was developed and tested with this version. Using different versions may cause compatibility issues with PyG.

  4. Install PyTorch Geometric (PyG) packages:
mamba install pytorch-scatter pytorch-sparse pytorch-cluster pytorch-spline-conv pyg -c pyg
  5. Install remaining dependencies:
mamba install mkl=2024.0 "numpy<2.0" dill tqdm pyyaml pandas biopandas scikit-learn biopython e3nn wandb tensorboard tensorboardX matplotlib

Why numpy<2.0? NumPy 2.0 introduced breaking changes that affect many scientific packages. Keeping NumPy below 2.0 ensures compatibility.
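
Once the installation steps are done, a quick import check can confirm that the environment is usable before running inference. The following is a minimal sketch and is not part of DiffDock-PP: the function names `check_environment` and `numpy_is_pre_2` are hypothetical, and the module list assumes the usual import names for the packages installed above (for example `torch_geometric` for pyg and `Bio` for biopython).

```python
import importlib

# Import names assumed for the packages installed above (not taken from the
# DiffDock-PP repository); adjust the list if your environment differs.
REQUIRED_MODULES = [
    "torch", "torch_geometric", "torch_scatter", "torch_sparse",
    "torch_cluster", "torch_spline_conv", "numpy", "e3nn", "Bio",
]


def numpy_is_pre_2(version):
    """Return True if a NumPy version string such as '1.26.4' is below 2.0."""
    return int(version.split(".")[0]) < 2


def check_environment(modules=REQUIRED_MODULES):
    """Try to import each module and map its name to 'ok' or an error message."""
    report = {}
    for name in modules:
        try:
            importlib.import_module(name)
            report[name] = "ok"
        except Exception as exc:  # ImportError or a binary mismatch at load time
            report[name] = f"FAILED: {exc}"
    return report
```

Inside the activated environment, print the result of `check_environment()` and look for any entry that is not `ok`. `numpy_is_pre_2(numpy.__version__)` should return `True` for the pinned NumPy.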

Testing the Installation

  1. Create required directories:
mkdir storage
  2. Run the test script on the DB5 benchmark:
bash src/db5_inference.sh

Success indicators:

  • Command completes without errors
  • Output folder visualization/epoch-0/ is created
  • Directory contains PDB files of docked complexes (multiple .pdb files)

Expected runtime: 5-15 minutes depending on GPU.

Verify output:

ls visualization/epoch-0/*.pdb | wc -l
# Should print a count greater than 0
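
A file count alone does not reveal files that were created but never written to. As a hedged sketch (the helper name `summarize_pdb_outputs` is hypothetical and not part of DiffDock-PP), the check can also flag empty files:

```python
from pathlib import Path


def summarize_pdb_outputs(output_dir):
    """Return (number of .pdb files, sorted names of empty .pdb files)."""
    pdb_files = sorted(Path(output_dir).glob("*.pdb"))
    empty = [p.name for p in pdb_files if p.stat().st_size == 0]
    return len(pdb_files), empty
```

For the test run above, `summarize_pdb_outputs("visualization/epoch-0")` should report a count greater than zero and an empty list of empty files.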

HPC Job Script

#!/bin/bash
#SBATCH --job-name=diffdock_pp
#SBATCH --partition=gpu
#SBATCH --gpus=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --time=02:00:00
#SBATCH --output=%x_%j.out

module load cuda/11.6

# Uncomment if mamba is not initialized in batch jobs on your cluster
# source ~/.bashrc
mamba activate diffdock_pp

cd /path/to/DiffDock-PP

# Create output directory
mkdir -p storage

# Run inference
bash src/db5_inference.sh

Usage Examples

Run DB5 benchmark (default test):

bash src/db5_inference.sh

Custom docking (requires understanding the codebase):

DiffDock-PP requires input data in a specific format. For custom proteins:

  1. Prepare receptor and ligand PDB files
  2. Create data configuration files
  3. Run inference script

See the repository documentation for detailed input format requirements.
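
For step 1, one common starting point is a single PDB file that holds both partners under different chain IDs. The sketch below splits such a file by chain. It assumes you already know which chains belong to the receptor and which to the ligand, and the function name `split_pdb_by_chain` is hypothetical. It does not produce the DiffDock-PP data configuration, and whether the repository expects separate files or a particular naming scheme must be checked in its documentation.

```python
def split_pdb_by_chain(pdb_text, receptor_chains, ligand_chains):
    """Split PDB text into (receptor_text, ligand_text) by chain ID.

    The chain ID is read from column 22 of ATOM/HETATM records, as defined
    by the PDB format. Records from other chains are dropped.
    """
    receptor, ligand = [], []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")) or len(line) < 22:
            continue
        chain_id = line[21]
        if chain_id in receptor_chains:
            receptor.append(line)
        elif chain_id in ligand_chains:
            ligand.append(line)
    # Terminate each part with END so downstream parsers see a complete file.
    receptor_text = "\n".join(receptor + ["END"]) + "\n"
    ligand_text = "\n".join(ligand + ["END"]) + "\n"
    return receptor_text, ligand_text
```

The two returned strings can be written out with `pathlib.Path(...).write_text(...)`. Any output file names you choose are placeholders, not names required by DiffDock-PP.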

Understanding the Output

Output structure:

visualization/
└── epoch-0/
    ├── complex_1_pose_0.pdb    # Docking pose 1
    ├── complex_1_pose_1.pdb    # Docking pose 2
    ├── complex_1_pose_2.pdb    # Docking pose 3
    └── ...                      # More poses

Each PDB file contains both protein chains in their predicted relative orientation.

  • Multiple poses for the same complex represent alternative docking predictions
  • Compare the poses for a complex to assess uncertainty
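
One way to compare poses is a pairwise RMSD over C-alpha atoms. The sketch below is an illustration, not a DiffDock-PP utility, and all function names are hypothetical. It assumes every pose of a complex lists the same atoms in the same order and keeps the receptor in the same coordinate frame, so no superposition is performed. If that does not hold for your output, align the receptor first.

```python
import math
from itertools import combinations


def read_ca_coords(pdb_text):
    """Return (x, y, z) tuples for all C-alpha atoms in PDB-formatted text."""
    coords = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append(
                (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            )
    return coords


def rmsd(coords_a, coords_b):
    """RMSD between two equally long coordinate lists, without superposition."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and equally long")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


def pairwise_rmsd(poses):
    """Map each pair of pose names to their RMSD; poses is {name: coords}."""
    return {
        (a, b): rmsd(poses[a], poses[b])
        for a, b in combinations(sorted(poses), 2)
    }
```

To use it, read each pose file of one complex into `read_ca_coords`, collect the results in a dictionary keyed by file name, and pass that to `pairwise_rmsd`. Small values across all pairs mean the poses largely agree, while large values point to distinct binding modes.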

Use Cases

  • Protein-Protein Docking: Predict binding orientations between protein chains
  • Complex Validation: Validate predicted protein-protein interfaces from other methods
  • Ensemble Generation: Generate multiple docking poses to capture uncertainty
  • Benchmarking: Compare against other docking methods

When to Use DiffDock-PP vs Other Tools

Tool             Best For
DiffDock-PP      Rigid protein-protein docking
PLACER           Protein-ligand docking with conformational sampling
Chai-1/Boltz-2   Ab initio complex structure prediction
BindCraft        De novo binder design

Troubleshooting

Warning: Common Issues

PyG installation fails:

  • Ensure CUDA toolkit is loaded before installing

  • Install PyTorch first, then PyG packages

  • Verify versions match:

    python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"

CUDA version mismatch:

Check your system’s CUDA version:

nvcc --version

This should match the pytorch-cuda version (11.6). If not:

module load cuda/11.6
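
The comparison can also be scripted, for instance at the top of a job. This is a minimal sketch with hypothetical helper names. It relies only on the `release X.Y` text that `nvcc --version` prints and takes the expected version (11.6 in this guide) as a parameter.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Extract the 'major.minor' release from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def cuda_matches(nvcc_output, expected="11.6"):
    """Return True if the toolkit release reported by nvcc equals `expected`."""
    return parse_nvcc_release(nvcc_output) == expected
```

The input string can be captured with `subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout`. If `cuda_matches` returns `False`, load the matching module before retrying.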

“ModuleNotFoundError” for PyG components:

  • Install all PyG packages together:

    mamba install pytorch-scatter pytorch-sparse pytorch-cluster pytorch-spline-conv pyg -c pyg

NumPy errors:

  • Ensure numpy<2.0 is installed
  • Don’t upgrade NumPy even if prompted

GPU not detected:

# Verify CUDA is available to PyTorch
python -c "import torch; print(torch.cuda.is_available())"
# Should print: True
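
When the one-liner prints `False`, a few more details usually narrow down the cause. The sketch below gathers them in one place. The function name `describe_gpu` is hypothetical, and it reads the `CUDA_VISIBLE_DEVICES` environment variable because schedulers commonly use it to restrict which GPUs a job can see.

```python
import os


def describe_gpu():
    """Return a one-line summary of what PyTorch can see on this node."""
    visible = os.environ.get("CUDA_VISIBLE_DEVICES", "<unset>")
    try:
        import torch
    except Exception as exc:  # missing package or a broken binary install
        return f"PyTorch not importable ({exc}); CUDA_VISIBLE_DEVICES={visible}"
    if not torch.cuda.is_available():
        return (
            f"PyTorch {torch.__version__} (built for CUDA {torch.version.cuda}) "
            f"sees no GPU; CUDA_VISIBLE_DEVICES={visible}"
        )
    names = [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]
    return f"PyTorch {torch.__version__} sees {len(names)} GPU(s): {', '.join(names)}"
```

Run `print(describe_gpu())` inside a GPU job rather than on a login node, since login nodes on some clusters have no GPU attached.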

Empty output directory:

  • Check for error messages in terminal output
  • Verify input files exist and are formatted correctly
  • Ensure storage/ directory was created