This tutorial will guide you through the complete workflow for designing cyclic peptides using AI tools. We will first generte cyclic backbones with RFDiffusion, then design the backbone sequence design with ProteinMPNN, and finally predict and validate the peptide structure with AlphaFold2.
We begin our design process by generating cyclic peptide backbones
using RFDiffusion, which has been trained to understand protein geometry
and can generate realistic backbone conformations. For cyclic peptide
generation with RFDiffusion, peptides of sizes 10-14 are generally good
sizes to use, though you can experiment with smaller or larger peptides
as well. For an individual peptide, the RFDiffusion CPU code generally
takes less than half a minute per structureon the molgraph machines, so
fell free to increase the num_designs parameter. The
benefit of staying at or under 14 amino acids is that you can re-predict
the peptide structure using the orthogonal Rosetta
simple_cyc_pep_predict application.
To generate the cyclic peptide backones, activate the RFDiffusion
environment using conda activate SE3nv and create the
output directory with mkdir -p cyclic_gen. Then run the
following command:
~/rosetta_workshop/RFdiffusion/scripts/run_inference.py \
'contigmap.contigs=[10-10]' \
inference.output_prefix=./cyclic_gen/macrocycle \
inference.num_designs=5 \
inference.cyclic=True \
inference.cyc_chains='a'
This command generates five different 10-amino acid cyclic peptide
backbones, with the cyclic constraint applied to chain ‘a’. The contigs
parameter [10-10] specifies that we want peptides of
exactly 10 residues, but you can modify this to generate different sizes
such as [8-8], [12-12], or even ranges like
[10-14].
Workshop participants should feel free to change the peptide size in
the contigs or design more peptides by increasing the
num_designs parameter. The generated backbones will be
saved to the cyclic_gen directory. When examining these
structures, look for backbones that display clear secondary structure
elements such as beta sheets or alpha helical character, as these are
more likely to lead to successful designs for the next steps.
Now, we will use ProteinMPNN do design your favorite peptide backbone from step 1. When making this selection, prioritize peptides with more internal hydrogen bonds and, those that have more beta sheet or alpha helical character, as these structural features are more likely to be successful for generating a stable peptide with this pipeline.
To begin sequence design, activate the ProteinMPNN environment with
conda activate ligandmpnn_env and create the output
directory using mkdir -p mpnn_design. Then run ProteinMPNN
with the following command:
python ~/rosetta_workshop/LigandMPNN/run.py \
--checkpoint_protein_mpnn ~/rosetta_workshop/LigandMPNN/model_params/proteinmpnn_v_48_020.pt \
--checkpoint_path_sc ~/rosetta_workshop/LigandMPNN/model_params/ligandmpnn_sc_v_32_002_16.pt \
--pdb_path cyclic_gen/macrocycle_1.pdb \
--out_folder mpnn_design/ \
--model_type "protein_mpnn" \
--number_of_batches 1 \
--batch_size 10 \
--pack_side_chains 1 \
--number_of_packs_per_design 1
Remember to update the --pdb_path parameter to point to
your chosen backbone structure from the cyclic_gen directory. This
command will generate 10 different sequence designs for your chosen
backbone, with optimized side chain conformations. The
--pack_side_chains 1 option ensures that ProteinMPNN not
only designs the sequence but also optimizes the side chain rotamers for
better packing. The packed designs can be visualized using
pymol mpnn_design/packed/*.pdb, where you can examine how
well the designed sequences fill the backbone structure. The designed
sequences themselves can be found by examining the contents with
cat mpnn_design/seqs/macrocycle_*.fa, which contains the
amino acid sequences in FASTA format.
The final step in our workflow involves using AlphaFold2 via ColabDesign to predict the structure of our designed cyclic peptide. This structure prediction step will validate whether AlphaFold thinks our design will fold as intended. To predict the structure, we will use AlphaFold with a cyclic peptide offset via a Python script that uses ColabDesign to predict the structure of the designed cyclic peptide.
Before running the AlphaFold prediction, set the JAX platforms to CPU
using export JAX_PLATFORMS=cpu as the molgraph machines to
not have a GPU availiable. Extract your chosen designed sequence from
the ProteinMPNN output by examining the sequences in the
mpnn_design/seqs directory. Once you’ve selected a promising sequence,
run the AlphaFold prediction using the provided script and the
colabdesign python from sbgrid:
python.colabdesign af_cyc_pred.py KILPGVISEG -o af_output --model_num 1 \
--params_path /programs/x86_64-linux/colabdesign/1.1.2/
Replace “KILPGVISEG” with your actual designed sequence. The script automatically applies cyclic peptide constraints by connecting the N-terminus to the C-terminus, ensuring that AlphaFold considers the cyclic nature of the peptide during structure prediction.
The resulting AlphaFold predicted peptide can be visualized at the
default location using pymol af_output/cyclic_peptide.pdb.
In PyMOL, the spectrum b command will color the peptide by
B-factor, which corresponds to AlphaFold’s confidence scores. The range
of B-factors is printed to the screen. The script also generates a
results file containing key metrics including pLDDT (overall
confidence), pTM (template modeling score), and PAE (predicted aligned
error), which help assess the quality and reliability of the prediction.
For peptides, pLDDT is generally the best method to assess quality and
pLDDT scores above 0.9 (this pLDDT is rescaled be be between 0 and 1)
are considered good. Because peptides are so small, the pTM is not an
accurate representation of a peptides stability.
This workflow can easily be expanded to designing peptide binders either de novo or that stabilize a known binding motif. In this use case, you may want to generate 10,000+ starting backbones for sequence design. After designing the peptide sequence and predicting its structure in complex with the target, iteratively redesigning and repredicting the peptide-target complex can optimize the peptide-target interactions into a design minima. The ColabDesign github (https://github.com/sokrypton/ColabDesign) has a designability_test.py script that can serve as a template for designing and validating peptide binders.
Alternatively, structure prediction methods based on AlphaFold3 such as boltz and RosettaFold3 can be used to validate peptide-protein complexes.