4. Reflection & Capstone Introduction

Before diving into the capstone project, take time to reflect on what you’ve learned and prepare your approach. This self-guided activity helps you consolidate your knowledge and plan your binder design project.

Live Workshop Session

🎥 Live workshop recording — Protein design roundtable discussion

📊 View slide deck

Part 1: Reflecting on Your Learning

Take 10 to 15 minutes to think through these questions. Consider writing your answers in a notebook or document; putting them in writing will help solidify your understanding and identify areas to revisit.

Structure Prediction

Reflection Questions

What surprised you most about how AlphaFold2 or ESMFold works?
When would you choose ESMFold over AlphaFold2? What are the trade-offs?
How do you interpret pLDDT scores? What score range makes you confident in a prediction?
What does the PAE matrix tell you that pLDDT alone doesn’t?
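As a starting point for the pLDDT question, here is a minimal sketch that sorts per-residue pLDDT values into the confidence bands used by the AlphaFold Protein Structure Database (above 90 very high, 70 to 90 confident, 50 to 70 low, below 50 very low). The function names are hypothetical, the handling of exact boundary values is a simplification, and the scores are assumed to be on a 0-100 scale (rescale first if your tool reports 0-1).

```python
from statistics import mean

# Band thresholds follow the AlphaFold Protein Structure Database colour scheme.
# Treating a score exactly on a boundary as the higher band is a simplification.
BANDS = [
    (90.0, "very high"),
    (70.0, "confident"),
    (50.0, "low"),
    (0.0, "very low"),
]


def plddt_band(score):
    """Return the confidence band label for one pLDDT value (0-100 scale)."""
    for threshold, label in BANDS:
        if score >= threshold:
            return label
    raise ValueError(f"pLDDT must be non-negative, got {score}")


def summarize_plddt(scores):
    """Return the mean pLDDT and the fraction of residues in each band."""
    if not scores:
        raise ValueError("scores must not be empty")
    counts = {label: 0 for _, label in BANDS}
    for score in scores:
        counts[plddt_band(score)] += 1
    total = len(scores)
    return {
        "mean": mean(scores),
        "fractions": {label: n / total for label, n in counts.items()},
    }
```

A mean alone can hide a disordered loop or terminus, which is why the per-band fractions are reported alongside it. Neither number says anything about how two chains are placed relative to each other; that is the information the PAE matrix adds.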

Protein Design

Reflection Questions

How does RFdiffusion “know” what shape to generate? What’s the role of the noise schedule?
Why do we need sequence design (LigandMPNN) after backbone generation? Why can’t RFdiffusion output sequences directly?
What’s the difference between unconditional and conditional generation in RFdiffusion?
How do hotspot residues guide the design process?
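Hotspots and the binder length range are passed to RFdiffusion as command-line overrides. The sketch below only builds those two strings, following the pattern shown in the binder design example of the RFdiffusion README (`ppi.hotspot_res=[A59,A83,A91]` and `contigmap.contigs=[A1-150/0 70-100]`). The helper names are hypothetical, and you should check the syntax against the version you have installed.

```python
def format_hotspots(chain, residue_numbers):
    """Build a hotspot override string such as ppi.hotspot_res=[A59,A83,A91]."""
    if not residue_numbers:
        raise ValueError("at least one hotspot residue is required")
    # Sort and de-duplicate so the same selection always yields the same string.
    items = ",".join(f"{chain}{number}" for number in sorted(set(residue_numbers)))
    return f"ppi.hotspot_res=[{items}]"


def format_contig(chain, target_start, target_end, binder_min, binder_max):
    """Build a contig override: keep the target range, then add a new binder chain.

    The "/0 " token marks a chain break between the target and the binder.
    """
    if target_start > target_end or binder_min > binder_max:
        raise ValueError("ranges must be given as (low, high)")
    return (
        f"contigmap.contigs=[{chain}{target_start}-{target_end}/0 "
        f"{binder_min}-{binder_max}]"
    )
```

For example, `format_hotspots("A", [83, 59, 91])` gives `ppi.hotspot_res=[A59,A83,A91]`. Residue numbers refer to the numbering in the input PDB file, so it is worth confirming them in PyMOL before you run a job.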

Computational Concepts

Reflection Questions

Why are GPUs faster than CPUs for these ML tools?
When might a CPU actually be faster than a GPU?
What’s the benefit of vectorization in numerical computing?
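To make the vectorization question concrete, here is a small sketch, assuming NumPy is installed, that computes a pairwise distance matrix for a set of 3D coordinates (for example, Cα atoms) twice: once with explicit Python loops and once with a single broadcast expression. The function names and the random coordinates are illustrative only.

```python
import numpy as np


def pairwise_distances_loop(coords):
    """Distance matrix with explicit Python loops (one pair at a time)."""
    n = len(coords)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            diff = coords[i] - coords[j]
            dist[i, j] = np.sqrt(np.sum(diff * diff))
    return dist


def pairwise_distances_vectorized(coords):
    """Same matrix using broadcasting: all pairs in one array expression."""
    # (n, 1, 3) - (1, n, 3) broadcasts to an (n, n, 3) array of differences.
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Illustrative input: 50 random points standing in for atom coordinates.
rng = np.random.default_rng(0)
coords = rng.normal(size=(50, 3))
assert np.allclose(
    pairwise_distances_loop(coords), pairwise_distances_vectorized(coords)
)
```

Both functions return the same matrix. The vectorized version moves the inner loops into compiled code, at the cost of building an intermediate (n, n, 3) array in memory. Timing the two with `timeit` on larger inputs is a quick way to see the difference for yourself.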

Connecting the Tools

Reflection Questions

Sketch out the typical workflow for designing a protein binder. What tools do you use at each step?
Where in the pipeline would you use structure prediction to analyze the target, and where would you use it to validate your designs?
What would you do if your designed binder had low pLDDT when you predicted its structure?

Part 2: Introduction to the Capstone Project

The capstone project brings together everything you’ve learned. You’ll design a de novo protein binder for a target of your choice.

What is Binder Design?

Binder design is formulated as: “Given a target protein (and optionally a specific epitope), design a smaller protein capable of binding the target.”

This is one of the most important problems in computational protein design, with applications in:

Therapeutics: Blocking disease-related protein interactions
Diagnostics: Creating detection reagents
Research tools: Probing protein function
Synthetic biology: Building new biological circuits

The Design Pipeline

```mermaid
flowchart LR
    Target[Target Selection] --> Structure[Structure Analysis]
    Structure --> Backbone[Backbone Generation<br/>RFdiffusion]
    Backbone --> Sequence[Sequence Design<br/>LigandMPNN]
    Sequence --> Validation[Structure Prediction<br/>AlphaFold2/ESMFold]
    Validation --> Refinement[Iterate & Refine]

    style Backbone fill:#f9f,stroke:#333,stroke-width:2px
    style Sequence fill:#bbf,stroke:#333,stroke-width:2px
    style Validation fill:#bfb,stroke:#333,stroke-width:2px
```

Available Targets

You’ll choose from one of these curated targets, each representing a different challenge:

Category	Targets	Challenge
Immune Checkpoint	PD-L1, IFNAR2, IL-7Rα	Well-studied interfaces, therapeutic relevance
Allergen	Bet v 1	Smaller target, antibody-like design
Enzymes	TrkA, TEM-1, GM2 Activator, Beta-Glucosidase	Diverse binding sites, varied complexity

Choosing Your Target

Consider what interests you most:

Therapeutic relevance? → PD-L1, IFNAR2, IL-7Rα
Smaller, focused project? → Bet v 1
Enzyme inhibition? → TEM-1, Beta-Glucosidase
Receptor binding? → TrkA

Part 3: Planning Your Approach

Before starting the capstone, develop a plan. Answer these questions to guide your work.

Target Selection

Planning Questions

Which target interests you most? Why?
What PDB structure will you use? Have you looked at it in PyMOL?
What residues define the binding interface? (Check the target’s detail page)
Are there any potential challenges with your target (flexible regions, glycosylation sites, etc.)?

Design Strategy

Planning Questions

What size binder will you aim for? (Typical range: 50-100 residues)
Will you use hotspot conditioning in RFdiffusion? Which residues?
How many designs will you generate at each step?
What metrics will you use to filter designs? (pLDDT, PAE, interface contacts, etc.)
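One way to turn the filtering question into something concrete is to write the filter down before you generate any designs. The sketch below is a minimal example, not a recommended protocol: the `DesignMetrics` record, the function names, and the default thresholds (mean pLDDT of at least 80, mean interface PAE of at most 10 Å) are all illustrative assumptions that you should replace with values justified for your target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignMetrics:
    """Hypothetical per-design summary collected after structure prediction."""
    name: str
    mean_plddt: float      # binder chain, 0-100 scale
    interface_pae: float   # mean PAE between target and binder residues, in angstroms


def passes(design, min_plddt=80.0, max_interface_pae=10.0):
    """Return True if a design meets both thresholds (illustrative defaults)."""
    return design.mean_plddt >= min_plddt and design.interface_pae <= max_interface_pae


def rank_designs(designs, **thresholds):
    """Drop designs that fail the filter, then sort the best candidates first.

    Lower interface PAE ranks higher; ties are broken by higher mean pLDDT.
    """
    kept = [d for d in designs if passes(d, **thresholds)]
    return sorted(kept, key=lambda d: (d.interface_pae, -d.mean_plddt))
```

Recording how many designs enter and leave each filter also answers the "how many designs" question over time: if almost nothing passes, generate more backbones or revisit your hotspots before loosening the thresholds.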

Documentation Plan

Planning Questions

How will you keep track of your experiments? (Lab notebook, Jupyter notebook, README, etc.)
What format will you use to present your work? (Report, slides, video, GitHub repo)
What commands and settings will you record? (Think about reproducibility)
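For the reproducibility question, one lightweight option is to append a record to a log file every time you run a tool. The sketch below uses only the Python standard library and writes one JSON object per line. The field names and the `log_run` and `read_runs` helpers are hypothetical, so adapt them to whatever you decide to track.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_run(log_path, tool, command, settings, notes=""):
    """Append one run record to a JSON Lines file and return the record."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool": tool,
        "command": command,      # the exact command line you ran
        "settings": settings,    # any parameters worth searching on later
        "notes": notes,
    }
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def read_runs(log_path):
    """Load every record from the log, oldest first."""
    with Path(log_path).open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Because the file is append-only and each line is self-contained, it is easy to keep under version control next to your notebook or README, and `read_runs` gives you the full history as a list of dictionaries when you write up your results.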

Part 4: Getting Started Checklist

Before moving to the capstone, make sure you can answer “yes” to these:

I’ve reviewed the tools from Monday and know how to run them
I understand the structure prediction outputs (pLDDT, PAE)
I know how RFdiffusion and LigandMPNN work together
I’ve chosen a target (or narrowed it down to 2-3 options)
I’ve looked at my target structure in PyMOL
I have a plan for documenting my work

Ready to Start?

Head to the Capstone Project page to begin! Each target has a dedicated page with specific guidance, PDB information, and strategy tips.

Additional Resources

If you want to deepen your understanding before starting the capstone:

Papers

RFdiffusion paper - Watson et al., Nature 2023
AlphaFold2 paper - Jumper et al., Nature 2021
BindCraft preprint - Pacesa et al., bioRxiv 2024

Communities

RosettaCommons Forums
OpenFold Issues
Protein design Twitter/X community (#proteindesign)
