3. LigandMPNN
LigandMPNN (paper, code) is a deep learning model for context-aware protein sequence design. It extends ProteinMPNN to handle small molecules, metal ions, and other non-protein components in protein design tasks.
Why Use LigandMPNN?
- Ligand-aware design: Design sequences that account for bound cofactors, substrates, or drug molecules
- Context preservation: Maintain interactions with metals, DNA, RNA, or other molecules
- Side chain packing: Evaluate and optimize side chain conformations
- Flexible residue control: Fix, bias, or vary specific positions
Related Tools: Use with RFdiffusion2 for backbone design, or BindCraft for complete binder design pipelines.
Resource Requirements
| Resource | Minimum | Recommended | Notes |
|---|---|---|---|
| GPU RAM | 4 GB | 16 GB | Scales with protein size |
| CPU RAM | 8 GB | 16 GB | CPU-only is viable but slower |
| Disk Space | 2 GB | 5 GB | Model weights |
| Python | 3.9+ | 3.11 | Required |
Preparation
Prerequisites:
- Completed HPC Setup guide
- Conda/Mamba installed
- Git installed
Verify your environment:
python --version # Should be 3.9+
nvcc --version # For GPU support (optional)
Installation
- Clone the LigandMPNN repository:
git clone https://github.com/dauparas/LigandMPNN.git
cd LigandMPNN
- Download the model parameters:
Note: This step requires internet access. If your compute node doesn't have internet, run this on a login node.
bash get_model_params.sh "./model_params"
Expected download: ~500 MB of model weights.
- Create a new conda environment:
mamba create -n ligandmpnn_env python=3.11
mamba activate ligandmpnn_env
- Install dependencies:
pip install -r requirements.txt
This installs PyTorch, NumPy, and ProDy for PDB file handling.
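Before running a design, it can help to confirm that the dependencies are importable from the active environment. The following is a minimal sketch that assumes the import names `torch`, `numpy`, and `prody`; the helper name `find_missing` is illustrative and not part of LigandMPNN.

```python
import importlib.util


def find_missing(modules):
    """Return the names in `modules` that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for PyTorch, NumPy, and ProDy.
    missing = find_missing(["torch", "numpy", "prody"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All expected packages are importable.")
```

Run it with the `ligandmpnn_env` environment active; any package it reports as missing likely means the `pip install` step did not finish.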
Testing the Installation
Run a test design on the provided example structure:
python run.py \
--seed 111 \
--pdb_path "./inputs/1BC8.pdb" \
--out_folder "./outputs/test_output"
Success indicators:
- Command completes without errors
- Output folder contains:
  - `seqs/1BC8.fa`: Designed sequences in FASTA format
  - `backbones/1BC8.pdb`: Input backbone (for reference)
  - `packed/1BC8_1.pdb`: Structure with designed side chains
Expected runtime: <1 minute on GPU, ~5 minutes on CPU.
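The success indicators above can also be checked with a short script. This is a sketch only: the relative paths are copied from the list above, and the helper name `missing_outputs` is hypothetical.

```python
from pathlib import Path

# Relative paths copied from the success indicators above.
EXPECTED_FILES = [
    "seqs/1BC8.fa",
    "backbones/1BC8.pdb",
    "packed/1BC8_1.pdb",
]


def missing_outputs(out_folder, expected=EXPECTED_FILES):
    """Return the expected files that are absent or empty under out_folder."""
    root = Path(out_folder)
    missing = []
    for rel in expected:
        path = root / rel
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(rel)
    return missing


if __name__ == "__main__":
    problems = missing_outputs("./outputs/test_output")
    print("OK" if not problems else f"Missing or empty: {problems}")
```

If your run writes different file names, adjust `EXPECTED_FILES` to match what appears in the output folder.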
HPC Job Script
#!/bin/bash
#SBATCH --job-name=ligandmpnn
#SBATCH --partition=gpu
#SBATCH --gpus=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --time=02:00:00
#SBATCH --output=%x_%j.out
module load cuda/12.1
# source ~/.bashrc # Optional: Source shell profile if needed
mamba activate ligandmpnn_env
cd /path/to/LigandMPNN
python run.py \
--model_type "ligand_mpnn" \
--seed 111 \
--pdb_path "./inputs/my_protein.pdb" \
--out_folder "./outputs/my_design" \
--number_of_batches 10
Usage Examples
Basic protein design (no ligand):
python run.py \
--pdb_path "protein.pdb" \
--out_folder "output/"
Design with ligand context:
python run.py \
--model_type "ligand_mpnn" \
--pdb_path "protein_ligand.pdb" \
--out_folder "output/"
Fix specific residues (keep them unchanged):
python run.py \
--pdb_path "protein.pdb" \
--fixed_residues "A10 A20 A30" \
--out_folder "output/"
Design only specific positions:
python run.py \
--pdb_path "protein.pdb" \
--redesigned_residues "A50 A51 A52 A53" \
--out_folder "output/"
Batch processing multiple structures:
# Create a JSON file listing inputs; only the keys (PDB paths) are read
echo '{"input1.pdb": "", "input2.pdb": ""}' > input_list.json
python run.py \
--pdb_path_multi "input_list.json" \
--out_folder "batch_output/"
With temperature control (higher = more diverse):
python run.py \
--pdb_path "protein.pdb" \
--temperature 0.2 \
--out_folder "output/"
Key Parameters
| Parameter | Description | Default |
|---|---|---|
| `--model_type` | Model variant: `protein_mpnn`, `ligand_mpnn`, `soluble_mpnn`, etc. | `protein_mpnn` |
| `--temperature` | Sampling temperature (0.1-1.0). Lower = more conservative | 0.1 |
| `--number_of_batches` | Number of batches to run; total sequences = batches × batch size | 1 |
| `--batch_size` | Sequences per batch | 1 |
| `--fixed_residues` | Space-separated residues to keep unchanged | None |
| `--redesigned_residues` | Only design these residues | All |
| `--bias_AA` | Bias toward specific amino acids | None |
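When scripting many runs, the parameters above can be assembled programmatically. The sketch below uses only flags listed in the table; the helper names `build_command` and `total_sequences` are illustrative, and the sequence count assumes each batch yields `batch_size` sequences.

```python
import shlex


def build_command(pdb_path, out_folder, **options):
    """Assemble a run.py argument list from the parameters in the table above.

    Keyword names map directly to flags, e.g. temperature=0.2 -> --temperature 0.2.
    List values (such as residue IDs) are joined with spaces into one argument.
    """
    cmd = ["python", "run.py", "--pdb_path", pdb_path, "--out_folder", out_folder]
    for name, value in options.items():
        if isinstance(value, (list, tuple)):
            value = " ".join(value)
        cmd += [f"--{name}", str(value)]
    return cmd


def total_sequences(number_of_batches=1, batch_size=1):
    """Total designs requested, assuming batch_size sequences per batch."""
    return number_of_batches * batch_size


if __name__ == "__main__":
    cmd = build_command(
        "protein.pdb",
        "output/",
        model_type="ligand_mpnn",
        temperature=0.2,
        number_of_batches=10,
        batch_size=4,
        fixed_residues=["A10", "A20", "A30"],
    )
    print(shlex.join(cmd))  # shell-quoted command line for inspection
    print("Sequences requested:", total_sequences(10, 4))
```

The returned list can be passed to `subprocess.run` as-is, which avoids shell quoting problems with the space-separated residue lists.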
Model Types
| Model | Use Case |
|---|---|
| `protein_mpnn` | Standard protein sequence design |
| `ligand_mpnn` | Design with small molecule context |
| `soluble_mpnn` | Bias toward soluble sequences |
| `global_label_membrane_mpnn` | Membrane protein design |
| `per_residue_label_membrane_mpnn` | Fine-grained membrane design |
Understanding the Output
Output directory structure:
output/
├── seqs/
│   └── protein.fa # Designed sequences
├── backbones/
│   └── protein.pdb # Input structure
└── packed/
    ├── protein_1.pdb # Design 1 with side chains
    └── protein_2.pdb # Design 2 with side chains
FASTA output format:
>protein, score=1.234, seq_recovery=0.456
MVKLTAEGSE...
- `score`: Negative log-likelihood (lower = better fit to backbone)
- `seq_recovery`: Fraction matching native sequence (if provided)
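To rank designs, the header fields can be pulled out with a few lines of Python. This is a sketch that assumes headers follow the comma-separated `key=value` layout shown above; the function names are illustrative.

```python
def parse_header(line):
    """Split a header such as '>protein, score=1.234, seq_recovery=0.456'.

    Returns (name, fields), where fields maps each key to a float when the
    value is numeric and to the raw string otherwise.
    """
    parts = [p.strip() for p in line.lstrip(">").split(",")]
    name, fields = parts[0], {}
    for part in parts[1:]:
        key, sep, value = part.partition("=")
        if not sep:
            continue  # skip anything that is not key=value
        try:
            fields[key.strip()] = float(value)
        except ValueError:
            fields[key.strip()] = value.strip()
    return name, fields


def read_designs(fasta_text):
    """Return a list of (name, fields, sequence) tuples from FASTA text."""
    designs, header, seq = [], None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                designs.append((*parse_header(header), "".join(seq)))
            header, seq = line, []
        elif line:
            seq.append(line)
    if header is not None:
        designs.append((*parse_header(header), "".join(seq)))
    return designs


if __name__ == "__main__":
    example = ">protein, score=1.234, seq_recovery=0.456\nMVKLTAEGSE\n"
    # Sort so the lowest (best) score comes first.
    ranked = sorted(read_designs(example), key=lambda d: d[1].get("score", float("inf")))
    for name, fields, sequence in ranked:
        print(name, fields, sequence)
```

Sorting by `score` in ascending order puts the best-fitting designs first, since lower values indicate a better fit to the backbone.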
Troubleshooting
"RuntimeError: CUDA out of memory":
- Use the CPU instead: drop the CUDA module load and run without a GPU
- Reduce `--batch_size`
- LigandMPNN is efficient; usually not memory-limited
PDB parsing errors:
- Ensure PDB has proper formatting
- Remove alternate conformations: keep only "A" conformers
- Check that ligand has proper atom naming
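Removing alternate conformations can be done with a small filter. The sketch below relies on the PDB format placing the alternate location indicator in column 17 of ATOM and HETATM records; the function name and the command-line usage are illustrative.

```python
import sys


def keep_primary_altloc(pdb_lines, keep=("", "A")):
    """Yield PDB lines, dropping ATOM/HETATM records with other altloc IDs.

    Column 17 of a PDB coordinate record holds the alternate location
    indicator; blank means the atom has a single conformation.
    """
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            altloc = line[16:17].strip()
            if altloc not in keep:
                continue  # e.g. a "B" conformer
        yield line


if __name__ == "__main__":
    # Usage (file names are placeholders): python strip_altloc.py in.pdb out.pdb
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
            dst.writelines(keep_primary_altloc(src))
```

Kept "A" conformers retain their original altloc character and occupancy values; this filter only removes the other conformers.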
Ligand not recognized:
- Ensure ligand is in the PDB file with HETATM records
- Run with `--model_type "ligand_mpnn"`; the default `protein_mpnn` model does not use ligand context
- Check that ligand coordinates are reasonable
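A quick way to confirm that the ligand is present as HETATM records is to list the hetero residues in the file. This sketch reads the residue name from columns 18-20 and the chain ID from column 22, as defined by the PDB format; the function name and the choice to skip water (`HOH`) are assumptions.

```python
from collections import Counter


def hetatm_residues(pdb_lines, skip=("HOH",)):
    """Count HETATM atoms per (residue name, chain ID), skipping waters."""
    counts = Counter()
    for line in pdb_lines:
        if line.startswith("HETATM"):
            resname = line[17:20].strip()
            chain = line[21:22].strip()
            if resname not in skip:
                counts[(resname, chain)] += 1
    return counts


if __name__ == "__main__":
    # "protein_ligand.pdb" matches the file name used in the examples above.
    with open("protein_ligand.pdb") as fh:
        for (resname, chain), n_atoms in sorted(hetatm_residues(fh).items()):
            print(f"{resname} chain {chain or '-'}: {n_atoms} atoms")
```

If the ligand's residue name does not appear in the output, it is probably stored as ATOM records or missing from the file.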
Low sequence diversity:
- Increase `--temperature` (e.g., 0.2 or 0.3)
- Increase `--number_of_batches`
- Use different random seeds
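To judge whether these changes actually help, diversity can be measured directly on the designed sequences. The sketch below reports the number of unique sequences and the mean pairwise identity; it assumes all designs share one length, as they do for a fixed backbone, and the function names are illustrative.

```python
from itertools import combinations


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def diversity_summary(sequences):
    """Return (number of unique sequences, mean pairwise identity)."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        return len(set(sequences)), 1.0  # a single design has nothing to compare
    mean_identity = sum(identity(a, b) for a, b in pairs) / len(pairs)
    return len(set(sequences)), mean_identity


if __name__ == "__main__":
    designs = ["MVKLTAEGSE", "MVKLSAEGTE", "MIKLTAEGSE"]
    unique, mean_id = diversity_summary(designs)
    print(f"{unique} unique sequences, mean pairwise identity {mean_id:.2f}")
```

A mean pairwise identity close to 1.0 suggests the designs are nearly identical and a higher temperature may be worth trying.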
Side chain clashes in output:
- This is expected; downstream relaxation is recommended
- Use PyRosetta or Rosetta FastRelax
- Or validate with your structure prediction tool of choice