1. PyMOL and Visual Studio Code
This module introduces two essential tools for computational structural biology: PyMOL for molecular visualization and Visual Studio Code (VSCode) for code development and remote computing.
Part 1: PyMOL - Molecular Visualization
What is PyMOL?
PyMOL is the industry-standard molecular visualization software used by structural biologists, computational chemists, and protein engineers worldwide. It’s built on Python under the hood, which means:
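- Scriptable: PyMOL commands can also be called from Python through the cmd module (for example, cmd.fetch("1GFL")), so you can script complex visualization workflows
- Extensible: Python scripts and plugins can add custom commands and functionality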
- Reproducible: Scripts ensure your figures can be regenerated exactly
Installation:
- Download from: pymol.org
- Conda installation:
conda install -c schrodinger -c conda-forge pymol
The Conda installation is recommended for integration with Python workflows.
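PyMOL commands can be saved to a plain-text .pml script and replayed, so one way to make a figure reproducible from a Python workflow is to generate that script programmatically. The sketch below is a minimal illustration with a hypothetical helper, build_figure_script, whose default size and dpi are assumptions chosen to match the figure settings used later in this module. It emits only commands covered here (fetch, hide, show, select, bg_color, ray, png) and does not need PyMOL installed to run.

```python
def build_figure_script(
    pdb_id: str,
    highlight_resn: str,
    out_png: str,
    width_px: int = 2400,
    height_px: int = 2400,
    dpi: int = 300,
) -> str:
    """Return the text of a PyMOL .pml script (hypothetical helper).

    The script fetches a structure, shows it as cartoon, highlights one
    residue name as sticks, ray-traces at the requested size and saves a PNG.
    """
    if min(width_px, height_px, dpi) <= 0:
        raise ValueError("width_px, height_px and dpi must be positive")
    commands = [
        f"fetch {pdb_id}",
        "hide everything",
        "show cartoon",
        f"select highlight, resn {highlight_resn}",
        "show sticks, highlight",
        "bg_color white",
        f"ray {width_px}, {height_px}",
        f"png {out_png}, dpi={dpi}",
    ]
    return "\n".join(commands) + "\n"


if __name__ == "__main__":
    # Print the script; redirect it to a file to keep it next to the figure.
    print(build_figure_script("1GFL", "CRO", "gfp_figure.png"), end="")
```

For example, redirect the output to gfp_figure.pml, then run it inside PyMOL with @gfp_figure.pml or without the GUI with pymol -cq gfp_figure.pml. Keeping the generated script under version control lets you regenerate the same image later.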
Why PyMOL Matters for AI/ML Protein Work
In the context of this bootcamp, PyMOL serves several critical functions:
- Diagnosing AI predictions: When AlphaFold2, ESMFold, or other tools generate structures, you need to visually assess whether the predictions are reasonable
- Evaluating designs: Tools like RFdiffusion and LigandMPNN produce protein designs—PyMOL lets you inspect binding pockets, interfaces, and overall fold quality
- Experimental planning: Before doing mutagenesis or other experiments, you can examine hydrogen bond networks, identify key residues, and plan mutations in silico
- Publication figures: High-quality structure figures are essential for papers and presentations
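Inspecting a binding pocket or picking key residues usually begins with a distance query, which PyMOL writes as a selection such as byres all within 5 of organic. To show what that query computes, here is a minimal pure-Python sketch, not PyMOL's implementation. It reads fixed-column ATOM/HETATM records from PDB-format text and returns every protein residue with at least one atom within the cutoff of a ligand atom. It assumes that "ligand" means any HETATM record other than water (HOH), which is a simplification of PyMOL's organic selector; the function names are hypothetical.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    record: str                       # "ATOM" or "HETATM"
    name: str                         # atom name, e.g. "CA"
    resn: str                         # residue name, e.g. "ALA"
    chain: str                        # chain identifier
    resi: int                         # residue number
    xyz: tuple[float, float, float]   # coordinates in Å


def parse_pdb_atoms(pdb_text: str) -> list[Atom]:
    """Read ATOM/HETATM records using the fixed-column PDB layout."""
    atoms = []
    for line in pdb_text.splitlines():
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue  # skip HEADER, REMARK, TER, END and other records
        atoms.append(Atom(
            record=record,
            name=line[12:16].strip(),
            resn=line[17:20].strip(),
            chain=line[21].strip(),
            resi=int(line[22:26]),
            xyz=(float(line[30:38]), float(line[38:46]), float(line[46:54])),
        ))
    return atoms


def residues_near_ligand(atoms: list[Atom], cutoff: float = 5.0) -> list[tuple[str, int, str]]:
    """Protein residues with any atom within `cutoff` Å of a ligand atom.

    Assumption: "ligand" = every HETATM that is not water (HOH).
    As with byres, a residue counts if any one of its atoms is close enough.
    """
    ligand_xyz = [a.xyz for a in atoms if a.record == "HETATM" and a.resn != "HOH"]
    hits = set()
    for atom in atoms:
        if atom.record == "ATOM" and any(
            math.dist(atom.xyz, lig) <= cutoff for lig in ligand_xyz
        ):
            hits.add((atom.chain, atom.resi, atom.resn))
    return sorted(hits)
```

The pairwise loop is brute force, which is fine for one ligand and a single protein; a spatial index (for example a grid or k-d tree) would be the usual choice for large systems.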
Interactive Selections
One of PyMOL’s most powerful features is its selection system. You can select atoms, residues, chains, or entire objects using:
Click selections:
- Click on any atom to select it
- Shift+click to add to selection
- Ctrl+click to remove from selection
Selection algebra (in the command line):
# Select chain A
select chainA, chain A
# Select residues 50-100
select loop, resi 50-100
# Select all alpha carbons
select cas, name CA
# Select residues within 5 Angstroms of a ligand
select binding_site, byres all within 5 of organic
Representations (“Show As”)
Different representations reveal different aspects of structure:
| Representation | Best For |
|---|---|
| Cartoon | Overall fold, secondary structure |
| Sticks | Active sites, ligand interactions |
| Surface | Binding pockets, electrostatics |
| Spheres | Space-filling, VDW radii |
| Lines | Quick overview, large structures |
| Ribbon | Simplified backbone trace |
Example workflow:
# Load a structure
fetch 1GFL
# Show cartoon for overall structure
show cartoon
# Show sticks for the chromophore
select chromophore, resn CRO
show sticks, chromophore
# Hide sticks on everything except the chromophore
hide sticks, not chromophore
Sequence and Structure Connection
PyMOL can display the sequence of each loaded object alongside the 3D view (Display → Sequence, or set seq_view, 1). This is useful because:
- Click on sequence: Highlights that residue in the 3D view
- Drag across sequence: Selects a range of residues
- Colored by secondary structure: Helices, sheets, and loops are color-coded
This sequence-structure connection is essential when you’re comparing a predicted structure to what you know about the protein’s function.
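PyMOL's sequence display lists the residues present in the loaded model, which is what you compare against the sequence you expected. To make that residue-to-letter mapping concrete, the sketch below (not PyMOL's code; the function name is hypothetical) rebuilds a one-letter sequence per chain from the CA atoms of PDB-format text. It assumes that nonstandard residue names map to X and reads only ATOM records, so modified residues stored as HETATM, such as GFP's CRO chromophore, are skipped.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def sequences_from_pdb(pdb_text: str) -> dict[str, str]:
    """Return {chain: one-letter sequence} built from CA atoms in ATOM records."""
    letters: dict[str, list[str]] = {}
    seen: set[tuple[str, str]] = set()
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        chain, res_id = line[21], line[22:27]  # residue number + insertion code
        if (chain, res_id) in seen:  # alternate location of a CA already counted
            continue
        seen.add((chain, res_id))
        resn = line[17:20].strip()
        letters.setdefault(chain, []).append(THREE_TO_ONE.get(resn, "X"))
    return {chain: "".join(seq) for chain, seq in letters.items()}
```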
Creating Publication-Quality Figures
PyMOL excels at creating beautiful figures. Key settings to know:
Coloring options:
# Color by chain
util.cbc # Color by chain (automatic)
# Color by secondary structure
color red, ss h # Helices red
color yellow, ss s # Sheets yellow
color green, ss l+'' # Loops green
# Rainbow coloring (N-terminus to C-terminus)
spectrum count, rainbow
# Color by B-factor (temperature or pLDDT)
spectrum b, blue_white_red, minimum=50, maximum=100
Transparency for depth:
# Make cartoon semi-transparent
set cartoon_transparency, 0.5, object_name
# Pro tip: Set most of structure to 0.85 transparency,
# then set region of interest to 0 for highlighting
set cartoon_transparency, 0.85, all
# active_site must be a selection you have already defined (e.g. with select)
set cartoon_transparency, 0, active_site
Saving figures:
# Set background to white (for publications)
bg_color white
# Ray-trace for high quality
ray 2400, 2400
# Save as PNG
png my_figure.png, dpi=300
Useful Commands Reference
| Command | Description |
|---|---|
| fetch XXXX | Download structure from PDB |
| load file.pdb | Load local file |
| align obj1, obj2 | Superimpose structures |
| rms_cur obj1, obj2 | Calculate RMSD of the current coordinates (no fitting) |
| center selection | Center view on selection |
| zoom selection | Zoom to fit selection |
| orient | Orient the view along the molecule's principal axes |
| set grid_mode, 1 | View multiple objects side-by-side |
- PyMOL Wiki: pymolwiki.org - Comprehensive command reference
- Tutorial PDF: PyRosetta Workshop PyMOL Tutorial
Resources
- PyMOL Tutorial Slides (PDF)
- Aesthetic Script (Python) - Script to recreate RFdiffusion-style figure aesthetics
Using the Aesthetic Script:
- Download aesthetic.py
- In PyMOL: File → Run Script… and select the file
- Or from the PyMOL command line: run /path/to/aesthetic.py
- New commands available: get_colors and get_lighting
Part 2: Visual Studio Code
Why VSCode?
Visual Studio Code is one of the most popular IDEs (Integrated Development Environments) for coding, and it’s particularly powerful for computational biology work because:
- Remote development: SSH directly into HPC clusters and edit code as if it were local
- Extensions: Thousands of extensions for Python, Jupyter, Git, and more
- Customizable: Tailor the interface to your workflow
- Free: Cross-platform, with an open-source core (Code - OSS)
Getting Started
Download: code.visualstudio.com
After installation, the interface has several key areas:
- Explorer (left sidebar): File browser
- Editor (center): Where you write code
- Terminal (bottom): Integrated command line
- Extensions (left sidebar icon): Install add-ons
- Source Control (left sidebar): Git integration
Learn more: VSCode Introductory Videos
SSH into HPC Clusters
This is perhaps the most valuable feature for our bootcamp. Instead of editing files on the cluster with vim/nano, you can:
- Connect VSCode to your HPC account
- Browse files graphically
- Edit with full IDE features (syntax highlighting, autocomplete, etc.)
- Run terminals on the cluster
Setup Steps:
- Install the Remote - SSH extension in VSCode
- Click the green >< icon in the bottom-left corner
- Select “Connect to Host…” → “Add New SSH Host…”
- Enter your SSH command: ssh username@cluster.university.edu
- Follow prompts to authenticate
For easier connections, add your cluster to ~/.ssh/config:
Host mycluster
HostName cluster.university.edu
User yourusername
IdentityFile ~/.ssh/id_rsa
Then you can just connect to “mycluster” in VSCode.
Viewing Proteins Remotely with Mol*
When working on an HPC cluster, you often want to view protein structures without downloading them to your local machine. VSCode has a Mol* extension that provides an interactive 3D protein viewer.
Setup:
- Install the Mol* extension in VSCode
- Right-click any .pdb file → “Open with Mol*”
Features:
- Color by chain, secondary structure, hydrophobicity, B-factor
- Calculate RMSD between structures
- Measure distances and angles
- Toggle between chains/models
- Create snapshots and animations
Documentation: Mol* Viewer Docs
Jupyter Notebooks in VSCode
Jupyter notebooks (.ipynb files) are excellent for:
- Exploratory analysis: Test code in small chunks
- Data visualization: Plot results inline
- Documentation: Mix code with markdown explanations
- Sharing: Notebooks show both code and output
VSCode + Jupyter:
- Install the Jupyter extension
- Create a new notebook: code -r filename.ipynb
- Select a Python kernel (your conda environment)
- Run cells with Shift+Enter
This also works when you are connected to a cluster over SSH: the notebook kernel runs on the cluster, so a notebook running on a GPU node can use the cluster's GPUs through VSCode.
Documentation: Jupyter.org Docs
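After selecting a kernel, it is worth confirming that the notebook is running where you expect. The minimal sketch below uses only the Python standard library; the helper name describe_kernel is hypothetical. The hostname tells you whether the kernel is on your laptop or on a cluster node, and sys.prefix shows which Python environment (for example, your conda environment) is active.

```python
import platform
import socket
import sys


def describe_kernel() -> dict[str, str]:
    """Report where and with which Python this kernel is running."""
    return {
        "hostname": socket.gethostname(),     # laptop vs. cluster node
        "python": platform.python_version(),  # interpreter version
        "executable": sys.executable,         # interpreter binary in use
        "env_prefix": sys.prefix,             # root of the active environment
    }


if __name__ == "__main__":
    for key, value in describe_kernel().items():
        print(f"{key:>10}: {value}")
```

In a notebook cell, evaluating describe_kernel() displays the same dictionary.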
Useful Extensions
| Extension | Purpose |
|---|---|
| Remote - SSH | Connect to HPC clusters |
| Python | Python language support |
| Jupyter | Notebook support |
| Mol* | Protein structure viewer |
| autoDocstring | Generate docstrings automatically |
| GitLens | Enhanced Git integration |
Tips for Remote Work
Tmux - Terminal Multiplexer:
When running long jobs on a cluster, tmux lets you:
- Detach from a session and reconnect later
- Keep processes running even if your connection drops
- Split your terminal into multiple panes
Learn more: Introduction to Tmux
Basic tmux workflow:
# Start a new session
tmux new -s mysession
# Detach (Ctrl+b, then d)
# List sessions
tmux ls
# Reattach
tmux attach -t mysession
Hands-On Exercise
Part 1: PyMOL Basics
Goal: Get comfortable with PyMOL navigation and basic commands.
Download and install PyMOL (or use the version on your cluster/classroom computers)
Load your first structure:
fetch 1GFL
This downloads Green Fluorescent Protein (GFP) directly from the PDB.
Practice navigation:
- Rotate the structure (left-click drag)
- Zoom in and out (scroll wheel or right-click drag)
- Translate/pan (middle-click drag)
- Try orient to reset the view
Explore representations:
# Try each of these:
show cartoon
show sticks
show surface
show spheres
# Reset to just cartoon
hide everything
show cartoon
Work with selections:
# Select the chromophore (the part that glows!)
select chromophore, resn CRO
# Show it as sticks
show sticks, chromophore
# Color it differently
color yellow, chromophore
Color by B-factor (in predicted structures this column stores pLDDT confidence; in a crystal structure like 1GFL it holds temperature factors):
spectrum b, blue_white_red, minimum=0, maximum=100
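For AlphaFold2 models, the B-factor column holds per-residue pLDDT on a 0-100 scale, and AlphaFold's documentation groups it into bands: above 90 very high, 70 to 90 confident, 50 to 70 low, below 50 very low. For crystal structures such as 1GFL the same column holds temperature factors, where high values indicate flexibility or disorder instead, so the banding below is only meaningful for predicted models. This is a minimal sketch with hypothetical function names; treating exact boundary values (e.g. 90 or 50) as the lower band is a choice made here.

```python
def ca_bfactors(pdb_text: str) -> list[tuple[str, int, float]]:
    """Return (chain, resi, B-factor) for each CA atom in ATOM records."""
    values = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append((line[21], int(line[22:26]), float(line[60:66])))
    return values


def plddt_band(score: float) -> str:
    """Map a pLDDT score (0-100) to AlphaFold's confidence bands."""
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"


def summarize_bands(pdb_text: str) -> dict[str, int]:
    """Count residues (one CA each) per confidence band."""
    counts = {"very high": 0, "confident": 0, "low": 0, "very low": 0}
    for _, _, score in ca_bfactors(pdb_text):
        counts[plddt_band(score)] += 1
    return counts
```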
Part 2: Measurements and Analysis
Use the Measurement Wizard:
- Go to Wizard → Measurement
- Click on two atoms to measure their distance
- Try measuring a hydrogen bond (donor-acceptor distances are typically ~2.8-3.2 Å)
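The Measurement Wizard reports straight-line distances between atom centers. The sketch below reproduces that calculation and uses it to flag candidate hydrogen bonds between nitrogen and oxygen atoms whose distance falls in the 2.8-3.2 Å range quoted above. It is deliberately crude: it ignores hydrogen positions and angles, which real hydrogen-bond analysis also considers, and it does not exclude pairs within the same residue. The (label, element, coordinates) tuple layout and the function names are hypothetical.

```python
import math
from itertools import combinations

# (label, element, (x, y, z)); label is free-form, e.g. "A/SER 65/OG"
LabeledAtom = tuple[str, str, tuple[float, float, float]]


def distance(a: LabeledAtom, b: LabeledAtom) -> float:
    """Euclidean distance between two atom centers, in Å."""
    return math.dist(a[2], b[2])


def hbond_candidates(
    atoms: list[LabeledAtom], lo: float = 2.8, hi: float = 3.2
) -> list[tuple[str, str, float]]:
    """(label1, label2, distance) for N/O pairs whose distance lies in [lo, hi]."""
    polar = [a for a in atoms if a[1] in ("N", "O")]
    hits = []
    for a, b in combinations(polar, 2):
        d = distance(a, b)
        if lo <= d <= hi:
            hits.append((a[0], b[0], round(d, 2)))
    return hits
```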
Calculate RMSD between objects:
# Fetch another GFP structure
fetch 1EMA
# Align them
align 1EMA, 1GFL
# The RMSD will be printed in the console
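The align command pairs atoms by sequence, superimposes the structures, and by default repeats the fit while rejecting outlier atoms, so the RMSD it prints may cover fewer atoms than the whole protein. rms_cur, by contrast, measures RMSD on the current coordinates without moving anything. The underlying formula, RMSD = sqrt((1/N) Σ |x_i - y_i|²) over N paired atoms, is sketched below in plain Python; the function name is hypothetical, and atom pairing and any superposition are assumed to have been done beforehand.

```python
import math
from typing import Sequence

Coord = Sequence[float]  # (x, y, z) in Å


def rmsd(coords_a: Sequence[Coord], coords_b: Sequence[Coord]) -> float:
    """RMSD between two equally long lists of paired 3D coordinates (no fitting)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))
```

Because no fitting is done, translating one copy of a structure by 1 Å along x gives an RMSD of 1 Å even though the shapes are identical; that is the situation align avoids by superimposing first.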
Part 3: Create a Publication Figure
Set up for a nice figure:
# White background
bg_color white
# Show cartoon representation
hide everything
show cartoon
# Color by chain
util.cbc
# Or try rainbow coloring
spectrum count, rainbow
Highlight the chromophore:
show sticks, chromophore
color yellow, chromophore
Ray-trace and save:
ray 1200, 1200
png my_first_pymol_figure.png, dpi=300
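The pixel size passed to ray and the dpi passed to png together fix the printed size: inches = pixels / dpi. So ray 1200, 1200 saved at 300 dpi prints at 4 x 4 inches, while ray 2400, 2400 at 300 dpi gives 8 x 8 inches. When a journal specifies a figure width, you can work backwards. The helpers below are a hypothetical sketch of that arithmetic; the 3.5-inch width in the usage line is only an example, not a requirement of any particular journal.

```python
def ray_size(width_in: float, height_in: float, dpi: int = 300) -> tuple[int, int]:
    """Pixel dimensions to pass to `ray` for a figure of the given size in inches."""
    if width_in <= 0 or height_in <= 0 or dpi <= 0:
        raise ValueError("sizes and dpi must be positive")
    return round(width_in * dpi), round(height_in * dpi)


def printed_size(width_px: int, height_px: int, dpi: int = 300) -> tuple[float, float]:
    """Printed size in inches of an image rendered at width_px x height_px."""
    return width_px / dpi, height_px / dpi


if __name__ == "__main__":
    w, h = ray_size(3.5, 3.5)  # example single-column width
    print(f"ray {w}, {h}")     # prints "ray 1050, 1050"
```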
Part 4: Try the Aesthetic Script (Optional)
If you have time, use the provided aesthetic.py script:
- Download aesthetic.py
- In PyMOL: run /path/to/aesthetic.py
- Try the new commands: get_colors and get_lighting
- Create a publication-quality render in the RFdiffusion style!
Questions to Consider
- What does the GFP chromophore look like? How is it positioned within the protein?
- When you colored by B-factor, which regions had high vs. low values? What might this mean?
- How similar were the two GFP structures (1GFL and 1EMA)? What was the RMSD?
- Why is it useful to show both cartoon (overall structure) and sticks (specific residues) simultaneously?
VSCode Setup Exercise
If you have access to an HPC cluster:
- Install the Remote-SSH extension in VSCode
- Connect to your cluster using the steps described above
- Navigate to your home directory and create a test file
- Open a terminal within VSCode (Terminal → New Terminal)
- Verify you can run commands on the cluster through VSCode
This setup will be essential for running AlphaFold2 and other prediction tools in the upcoming exercises!