9. DiffDock-PP
DiffDock-PP (paper, code) is a diffusion-based generative model, built on a graph neural network, that learns to denoise rigid-body transformations (rotations and translations) in order to predict how two rigid protein subunits dock.
Why Use DiffDock-PP?
- Fast protein-protein docking: Predicts binding orientations without expensive sampling
- Validation tool: Orthogonally validate structure predictions from other methods
- Ensemble predictions: Generate multiple docking poses for uncertainty estimation
- Rigid-body docking: Efficient for cases where backbone flexibility is minimal
Related Tools: For protein-ligand docking, including flexible ligand binding with ensemble generation, see PLACER. For structure prediction of complexes, see Chai-1 or Boltz-2.
Resource Requirements
| Resource | Minimum | Recommended | Notes |
|---|---|---|---|
| GPU RAM | 8 GB | 16 GB | Scales with protein size |
| CPU RAM | 8 GB | 16 GB | For preprocessing |
| Disk Space | 2 GB | 5 GB | Model weights |
| CUDA | 11.6 | 11.6-11.7 | Specific version required |
Preparation
Prerequisites:
- Completed HPC Setup guide
- Conda/Mamba installed
- CUDA 11.6 or 11.7 available
Check CUDA availability:
module avail cuda
# Look for cuda/11.6 or cuda/11.7
Installation
- Clone the DiffDock-PP repository:
git clone https://github.com/ketatam/DiffDock-PP.git
cd DiffDock-PP
- Create a new environment:
mamba create -n diffdock_pp python=3.9
mamba activate diffdock_pp
- Install PyTorch with CUDA 11.6:
mamba install pytorch=1.13.0 pytorch-cuda=11.6 -c pytorch -c nvidia
Why CUDA 11.6? DiffDock-PP was developed and tested with this version. Using different versions may cause compatibility issues with PyG.
- Install PyTorch Geometric (PyG) packages:
mamba install pytorch-scatter pytorch-sparse pytorch-cluster pytorch-spline-conv pyg -c pyg
- Install remaining dependencies:
mamba install mkl=2024.0 "numpy<2.0" dill tqdm pyyaml pandas biopandas scikit-learn biopython e3nn wandb tensorboard tensorboardX matplotlib
Why numpy<2.0? NumPy 2.0 introduced breaking changes that affect many scientific packages. Keeping NumPy below 2.0 ensures compatibility.
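After the install steps, it can help to confirm that the NumPy pin actually held, since a later package install may silently upgrade it. The following is a minimal sketch using only the standard library; the helper names (`installed_version`, `major_version`, `numpy_pin_ok`) are illustrative and not part of DiffDock-PP.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def major_version(version: str) -> int:
    """Extract the leading major version number, e.g. '1.26.4' -> 1."""
    return int(version.split(".")[0])


def numpy_pin_ok(version: Optional[str]) -> bool:
    """True when NumPy is installed and its major version is below 2."""
    return version is not None and major_version(version) < 2


if __name__ == "__main__":
    numpy_version = installed_version("numpy")
    print("numpy:", numpy_version or "not installed")
    print("numpy<2.0 satisfied:", numpy_pin_ok(numpy_version))
```

Run it inside the activated diffdock_pp environment; if the last line prints False, reinstall with the "numpy<2.0" constraint.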
Testing the Installation
- Create required directories:
mkdir storage
- Run the test script on the DB5 benchmark:
bash src/db5_inference.sh
Success indicators:
- Command completes without errors
- Output folder visualization/epoch-0/ is created
- Directory contains PDB files of docked complexes (multiple .pdb files)
Expected runtime: 5-15 minutes depending on GPU.
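The success indicators above can also be checked programmatically. This is a sketch rather than an official check: it assumes the default output folder visualization/epoch-0/ named above, and it treats a file as a plausible docked complex if it contains at least one ATOM record. The function name `docked_pdb_files` is hypothetical.

```python
from pathlib import Path
from typing import List


def docked_pdb_files(output_dir: str = "visualization/epoch-0") -> List[Path]:
    """Return .pdb files in output_dir that contain at least one ATOM record."""
    directory = Path(output_dir)
    if not directory.is_dir():
        return []
    found = []
    for path in sorted(directory.glob("*.pdb")):
        with path.open() as handle:
            # Empty or truncated files have no ATOM lines and are skipped.
            if any(line.startswith("ATOM") for line in handle):
                found.append(path)
    return found


if __name__ == "__main__":
    files = docked_pdb_files()
    if files:
        print(f"Found {len(files)} PDB files with coordinates, e.g. {files[0].name}")
    else:
        print("No usable PDB files found; check the terminal output for errors.")
```

Unlike a plain file count, this also catches the case where the run created files but wrote no coordinates into them.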
Verify output:
ls visualization/epoch-0/*.pdb | wc -l
# Should show multiple PDB files
HPC Job Script
#!/bin/bash
#SBATCH --job-name=diffdock_pp
#SBATCH --partition=gpu
#SBATCH --gpus=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --time=02:00:00
#SBATCH --output=%x_%j.out
module load cuda/11.6
# source ~/.bashrc  # uncomment if mamba is not initialized in batch shells
mamba activate diffdock_pp
cd /path/to/DiffDock-PP
# Create output directory
mkdir -p storage
# Run inference
bash src/db5_inference.sh
Usage Examples
Run DB5 benchmark (default test):
bash src/db5_inference.sh
Custom docking (requires understanding the codebase):
DiffDock-PP requires input data in a specific format. For custom proteins:
- Prepare receptor and ligand PDB files
- Create data configuration files
- Run inference script
See the repository documentation for detailed input format requirements.
Understanding the Output
Output structure:
visualization/
└── epoch-0/
├── complex_1_pose_0.pdb # Docking pose 1
├── complex_1_pose_1.pdb # Docking pose 2
├── complex_1_pose_2.pdb # Docking pose 3
└── ... # More poses
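Each PDB file contains both protein chains in their predicted relative orientation.
- Multiple poses for the same complex represent different docking predictions
- Compare the poses to assess uncertainty: similar poses suggest a consistent prediction, widely differing poses suggest an uncertain one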
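One simple way to compare poses is the root-mean-square deviation (RMSD) between their C-alpha atoms. The sketch below is an illustration, not DiffDock-PP's own evaluation code. It assumes that all poses of a complex list the same atoms in the same order, and it computes the RMSD without superposition, so the result is only meaningful if the poses share a common reference frame (for example, if one partner is held fixed). Both assumptions should be verified against the actual output files.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float, float]


def ca_coordinates(pdb_text: str) -> List[Coord]:
    """Extract C-alpha coordinates from the ATOM records of a PDB file's text."""
    coords = []
    for line in pdb_text.splitlines():
        # Fixed-column PDB format: atom name in columns 13-16, x/y/z in 31-54.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append(
                (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            )
    return coords


def rmsd(a: List[Coord], b: List[Coord]) -> float:
    """RMSD between two equally long coordinate lists, without superposition."""
    if not a or len(a) != len(b):
        raise ValueError("coordinate lists must be non-empty and equally long")
    total = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(total / len(a))


if __name__ == "__main__":
    from pathlib import Path

    files = sorted(Path("visualization/epoch-0").glob("*.pdb"))
    poses = [ca_coordinates(f.read_text()) for f in files]
    for i in range(len(files)):
        for j in range(i + 1, len(files)):
            # Pairs with differing atom counts cannot be compared and are skipped.
            if poses[i] and len(poses[i]) == len(poses[j]):
                value = rmsd(poses[i], poses[j])
                print(f"{files[i].name} vs {files[j].name}: {value:.2f} Å")
```

The script compares every pair of files with matching atom counts, so restrict the glob pattern to a single complex if the folder holds several.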
Use Cases
- Protein-Protein Docking: Predict binding orientations between protein chains
- Complex Validation: Validate predicted protein-protein interfaces from other methods
- Ensemble Generation: Generate multiple docking poses to capture uncertainty
- Benchmarking: Compare against other docking methods
When to Use DiffDock-PP vs Other Tools
| Tool | Best For |
|---|---|
| DiffDock-PP | Rigid protein-protein docking |
| PLACER | Protein-ligand docking with conformational sampling |
| Chai-1/Boltz-2 | Ab initio complex structure prediction |
| BindCraft | De novo binder design |
Troubleshooting
PyG installation fails:
- Ensure the CUDA toolkit is loaded before installing
- Install PyTorch first, then the PyG packages
Verify versions match:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
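The one-liner above prints the PyTorch version and GPU visibility, but not the CUDA version PyTorch was built against. A slightly fuller check might look like the sketch below; `cuda_build_matches` is a hypothetical helper, and 11.6 is the version this guide installs.

```python
from typing import Optional


def cuda_build_matches(build_version: Optional[str], expected: str = "11.6") -> bool:
    """Compare major.minor of PyTorch's CUDA build version with the expected one."""
    if build_version is None:  # CPU-only builds report no CUDA version
        return False
    return build_version.split(".")[:2] == expected.split(".")[:2]


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
    else:
        print("PyTorch:", torch.__version__)
        print("Built for CUDA:", torch.version.cuda)
        print("GPU visible:", torch.cuda.is_available())
        print("Matches CUDA 11.6:", cuda_build_matches(torch.version.cuda))
```

If the build version is reported as None, a CPU-only PyTorch was installed and the PyG packages will not find a matching CUDA build.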
CUDA version mismatch:
Check your system’s CUDA version:
nvcc --version
This should match the pytorch-cuda version (11.6). If not:
module load cuda/11.6
“ModuleNotFoundError” for PyG components:
Install all PyG packages together:
mamba install pytorch-scatter pytorch-sparse pytorch-cluster pytorch-spline-conv pyg -c pyg
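To see which PyG component is actually missing before reinstalling, the import names can be probed without importing them. This is a minimal sketch; the module names listed are the import names that correspond to the conda packages above, and `missing_modules` is an illustrative helper.

```python
import importlib.util
from typing import Iterable, List

# Import names provided by the pytorch-scatter, pytorch-sparse,
# pytorch-cluster, pytorch-spline-conv and pyg packages.
PYG_MODULES = (
    "torch_scatter",
    "torch_sparse",
    "torch_cluster",
    "torch_spline_conv",
    "torch_geometric",
)


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(PYG_MODULES)
    if missing:
        print("Missing PyG modules:", ", ".join(missing))
    else:
        print("All PyG modules found.")
```

A module that is found can still fail at import time if it was built for a different CUDA or PyTorch version, so this only rules out the "not installed" case.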
NumPy errors:
- Ensure numpy<2.0 is installed
- Don’t upgrade NumPy even if prompted
GPU not detected:
# Verify CUDA is available to PyTorch
python -c "import torch; print(torch.cuda.is_available())"
# Should print: True
Empty output directory:
- Check for error messages in terminal output
- Verify input files exist and are formatted correctly
- Ensure the storage/ directory was created