# Non-canonical amino acid (NCAA) parameterization using Rosetta

When designing peptides, the twenty canonical amino acids can be too restrictive a vocabulary for building tight, specific binders to a receptor.  A framework is therefore needed for modeling amino acids with arbitrary sidechain chemistry, i.e., non-canonical amino acids (NCAAs).  To include NCAAs in Rosetta design simulations, you must first parameterize them, which involves listing atoms and bonds and their respective types, recording the initial geometry, assigning rotamers, etc.  Most of this process is handled automatically by the `molfile_to_params_polymer.py` script, but rotamer assignment remains a challenge, for which Rosetta offers multiple solutions.  This tutorial details how to use each of these tools in combination with `molfile_to_params_polymer.py` to go from a molfile (.sdf) of an NCAA to a parameter file (.params) that is usable in Rosetta design simulations.

## 1. Using existing canonical parameters for non-canonicals ##

The simplest way of assigning rotamers to an NCAA is to take a set of rotamers that already exists and attach it to your NCAA.  However, this requires that your NCAA be quite similar to an existing canonical AA.  For this step, we will parameterize 3,4,5-trifluorophenylalanine (abbreviated in this tutorial as TFF), which closely resembles phenylalanine.  First, make a directory for your params files and `cd` into that directory.

    mkdir NCAA_params
    cd NCAA_params

The molfile for TFF is located at `../input_files/TFF.sdf`.  There are a few things to note about this input that are required for proper parameterization by Rosetta.  First, notice that the backbone is in dipeptide form: each end of the amino acid is extended to a methyl group representing the C-alpha atom of the neighboring amino acid.  This is necessary for Rosetta to understand how to connect this AA to adjacent AAs.  The file also contains a set of lines that tell Rosetta which atoms correspond to the various backbone atoms, which atoms connect to the upper and lower AAs in the sequence, and properties such as charge, aromaticity, and chirality.  Because of these instructions, all we have to run to parameterize this NCAA is:

    python <RosettaDir>/main/source/scripts/python/public/molfile_to_params_polymer.py \
        --clobber --polymer --no-pdb --name TFF --use-parent-rotamers PHE \
        -i ../input_files/TFF.sdf

Note the use of `--use-parent-rotamers` in this command, as this is what establishes which canonical AA's rotamers you want to use.  If you look at the generated parameter file with `cat TFF.params`, you should see a line that says `ROTAMER_AA PHE`, indicating that phenylalanine's rotamers are being used for this AA.
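
For scripted pipelines, the same check can be done without reading the file by eye.  The following is a minimal sketch, not part of Rosetta: the helper name `rotamer_parent` is hypothetical, and the two-line excerpt only stands in for a real params file, which contains many more records.

```python
def rotamer_parent(params_text):
    """Return the residue named on the ROTAMER_AA line, or None if absent."""
    for line in params_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "ROTAMER_AA":
            return fields[1]
    return None


# Illustrative excerpt only; a real TFF.params has many more lines.
example = "NAME TFF\nROTAMER_AA PHE\n"
print(rotamer_parent(example))  # prints PHE
```

On a real file, this would be called as `rotamer_parent(open("TFF.params").read())`; a return value of `None` means no parent rotamers were assigned.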

## 2. Rigorous rotamer calculation using MakeRotLib ##

In 2012, [Renfrew et al.](https://doi.org/10.1371%2Fjournal.pone.0032637) developed MakeRotLib, which generates NCAA rotamers by minimizing a systematically iterated set of initial conformations using a hybrid Rosetta/CHARMM energy function.  This protocol remains the most rigorous calculation of NCAA rotamers available in Rosetta, but because of that rigor, its runtime is not suitable for large libraries of NCAAs.  In addition, the runtime scales exponentially with the number of chi angles and the protocol supports at most 4 chi angles, so it is also not suitable for highly flexible sidechains.  Finally, MakeRotLib cannot handle anything other than monosubstituted alpha amino acids, so more exotic amino acid structures cannot be processed.  To demonstrate MakeRotLib, you will parameterize an amino acid with an ethyl group as its sidechain (abbreviated as EAA in this tutorial).  As in the previous case, `EAA.sdf` is in dipeptide form and contains the instructions Rosetta needs to parameterize the molecule.  Run `molfile_to_params_polymer.py` to get the parameter file:

    python <RosettaDir>/main/source/scripts/python/public/molfile_to_params_polymer.py \
        --clobber --polymer --no-pdb --name EAA -i ../input_files/EAA.sdf

However, since we omitted `--use-parent-rotamers`, this parameter file is not ready for use in Rosetta yet.  To run MakeRotLib and generate the rotamers, you will need an options file, supplied at `../input_files/EAA_makerotlib_options.in`.  This file specifies the angle ranges over which MakeRotLib should iterate, the number of chi angles in the sidechain, and initial guesses for how many chi angle bins there are and where they lie.  For this tutorial, the options file considers all phi and psi angle values at increments of 10 degrees and every value of the single chi angle at 30 degree increments, and it assumes there are 3 chi rotamer bins spaced 120 degrees apart (which is reasonable, since the canonicals generally show this pattern as well).  Now to run MakeRotLib:

    <RosettaDir>/main/source/bin/MakeRotLib.default.linuxgccrelease \
        -extra_res_fa ./EAA.params -score:weights mm_std \
        -options_file ../input_files/EAA_makerotlib_options.in
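
To get a feel for the size of this job, the sampling grid described for the options file can be tallied with a short sketch.  It assumes a full 360-degree scan of every angle, the same step for every chi angle, and one starting conformation per combination; the function name `makerotlib_grid` is hypothetical and is not part of MakeRotLib.

```python
def makerotlib_grid(bb_step=10, chi_step=30, n_chi=1):
    """Count phi/psi bins and starting chi combinations for a full 360-degree scan.

    Assumes every angle is scanned over a full turn and that all chi angles
    share the same step size.
    """
    bb_values = 360 // bb_step          # values sampled for phi (and for psi)
    chi_values = 360 // chi_step        # values sampled for each chi angle
    backbone_bins = bb_values ** 2      # one bin per phi/psi pair
    chi_starts = chi_values ** n_chi    # grows exponentially with n_chi
    return backbone_bins, chi_starts, backbone_bins * chi_starts


for n_chi in (1, 2, 3, 4):
    print(n_chi, makerotlib_grid(n_chi=n_chi))
```

Under these assumptions the tutorial's single-chi case comes to 1296 phi/psi bins with 12 starting chi values each (15552 starting conformations), and each additional chi angle multiplies the total by another factor of 12, which is the exponential scaling mentioned earlier.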

The MakeRotLib run should take a few minutes, after which you should have a large number of `EAA_*` files in your current directory.  These files include logs from running MakeRotLib as well as a `.rotlib` file for each pair of phi/psi angle values.  The final objective is to consolidate all of the `.rotlib` files into a single `.rotlib` and add a reference to it in the parameter file:

    # Note: >> appends, so delete any existing EAA.rotlib before re-running this loop
    for i in $(seq -170 10 180); do
        for j in $(seq -170 10 180); do
            cat EAA_180_${i}_${j}_180.rotlib >> EAA.rotlib
        done
    done
    echo "NCAA_ROTLIB_PATH $PWD/EAA.rotlib"  >> EAA.params
    echo "NCAA_ROTLIB_NUM_ROTAMER_BINS 1 3" >> EAA.params
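
Because `cat` only prints an error and moves on when a file is missing, it can be worth confirming that every expected per-bin file was produced before trusting the merged library.  The sketch below rebuilds the file names used in the loop above and reports any that are absent; both function names are hypothetical helpers, not Rosetta tools.

```python
from pathlib import Path


def expected_rotlib_names(name="EAA", start=-170, stop=180, step=10):
    """List the per-bin file names in the same order as the shell loop."""
    angles = range(start, stop + 1, step)
    return [f"{name}_180_{i}_{j}_180.rotlib" for i in angles for j in angles]


def missing_rotlibs(directory=".", name="EAA"):
    """Return the expected per-bin files that are not present in directory."""
    base = Path(directory)
    return [n for n in expected_rotlib_names(name) if not (base / n).is_file()]


missing = missing_rotlibs(".")
print(f"{len(missing)} of {len(expected_rotlib_names())} expected files are missing")
```

Run it from the directory holding the MakeRotLib output; a count of zero means all 36 x 36 bins were written.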

Open the `EAA.rotlib` and `EAA.params` files with a text editor and check that there are no obvious errors.  If everything looks fine, you can remove the temporary files:

    rm -rf EAA_*

Now that `EAA.params` has been assigned rotamers, it is ready to use in Rosetta.  Note that the `NCAA_ROTLIB_PATH` is hardcoded, so if you use the example .params file in `output_files`, you will need to change this path to match your environment.
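
Updating the hardcoded path can be scripted.  This is a minimal sketch that rewrites only the `NCAA_ROTLIB_PATH` record and leaves every other line untouched; the helper name `repoint_rotlib` and the example paths are hypothetical.

```python
def repoint_rotlib(params_text, new_path):
    """Return params text with the NCAA_ROTLIB_PATH record pointing at new_path."""
    out = []
    for line in params_text.splitlines():
        # Match on the first whitespace-separated field only.
        if line.split()[:1] == ["NCAA_ROTLIB_PATH"]:
            line = f"NCAA_ROTLIB_PATH {new_path}"
        out.append(line)
    return "\n".join(out) + "\n"


# Illustrative excerpt with made-up paths.
example = "NAME EAA\nNCAA_ROTLIB_PATH /old/location/EAA.rotlib\n"
print(repoint_rotlib(example, "/new/location/EAA.rotlib"), end="")
```

To apply it to a real file, read `EAA.params` into a string, pass it through `repoint_rotlib` with the absolute path of your `EAA.rotlib`, and write the result back.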

## 3. Small molecule conformers as NCAA rotamer libraries (FakeRotLib) ##

While MakeRotLib is the most accurate method for rotamer construction in Rosetta, it does not apply in many contexts, as discussed above.  To address some of these shortcomings, we treat the NCAA as a small molecule and define its rotamers as the low-energy conformers of that "small molecule".  This idea is implemented in the `fake_rotlib.py` script, which uses RDKit to generate conformations of the NCAA, scores them with the UFF force field, and uses the N lowest-energy conformations in the parameter file as "PDB rotamers".

The previous methods define the _distribution_ of rotamers and score a given conformation according to its position in that distribution, whereas PDB rotamers store a set of acceptable conformations, and Rosetta draws randomly from these when modeling the residue.  Since PDB rotamers don't have to fit the distribution parameters accepted by Rosetta, they can accommodate pretty much any NCAA.  On the other hand, PDB rotamers inherently discretize the conformational space, are not compatible with some movers, and generally require more compute time and memory during modeling.

To allow both types of rotamer libraries to be built, `fake_rotlib.py` can also generate a rotamer distribution file (`.rotlib`) from the PDB rotamers (as long as the NCAA has four or fewer chi angles).  In addition to rotamer modeling, `fake_rotlib.py` automates a few other steps of the process, including dipeptide capping, writing the params instructions, and running `molfile_to_params_polymer.py`.  For more information on FakeRotLib, see its [publication](https://doi.org/10.1021/acs.jcim.5c01030).
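
The conformer-ranking idea can be sketched in a few lines.  This is not the code in `fake_rotlib.py`: it is an illustration that assumes a SMILES string as input (the script itself takes an `.sdf`), and the function names `n_lowest` and `uff_conformer_energies` are hypothetical.  It uses RDKit's `EmbedMultipleConfs` and `UFFOptimizeMoleculeConfs`, and RDKit is imported inside the function so that the selection helper runs without it.

```python
import heapq


def n_lowest(energies, n):
    """Return the indices of the n lowest energies, lowest first."""
    return heapq.nsmallest(n, range(len(energies)), key=energies.__getitem__)


def uff_conformer_energies(smiles, num_confs=50, seed=0):
    """Embed conformers with RDKit and return (mol, UFF energies after minimization)."""
    # Imported here so n_lowest() above can be used without RDKit installed.
    from rdkit import Chem
    from rdkit.Chem import AllChem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    mol = Chem.AddHs(mol)
    AllChem.EmbedMultipleConfs(mol, numConfs=num_confs, randomSeed=seed)
    # One (not_converged, energy) pair per conformer, in conformer order.
    results = AllChem.UFFOptimizeMoleculeConfs(mol, maxIters=500)
    return mol, [energy for _, energy in results]


# Selection step on made-up energies; no RDKit needed for this line.
print(n_lowest([3.2, 1.1, 7.5, 0.4], 2))  # prints [3, 1]
```

With RDKit available, `mol, energies = uff_conformer_energies(smiles)` followed by `n_lowest(energies, 10)` gives the positions of the ten lowest-energy conformers in `mol.GetConformers()`.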

As a demonstration of `fake_rotlib.py`, we parameterize another phenylalanine derivative (with an attached bis(2-chloroethyl)amine group), abbreviated as MFF in this tutorial.  Since this molecule has far more chi angles than any other we've parameterized so far, we will use PDB rotamers in lieu of a `.rotlib` file.  `fake_rotlib.py` depends on RDKit to generate conformers, so we need a Python environment with it installed.  Since RDKit is already installed in our base Python interpreter, simply run:

    python <RosettaDir>/main/source/scripts/python/public/fake_rotlib.py \
        --input ../input_files/MFF.sdf --dip -n 100
    mv ../input_files/MFF* ./

Note that the `--dip` flag is used here because the input `MFF.sdf` is already in dipeptide form and has parameterization instructions pre-generated.  If the instructions need to be generated, run without this flag and ensure that the input is NOT in dipeptide form (either a neutral or a zwitterionic backbone is acceptable).  The command generates two important files: `MFF.params` and the PDB rotamers file it references, `MFF_rotamer.pdb`.  `MFF_rotamer.sdf` is also produced, but it is only an intermediate file used as input to `molfile_to_params_polymer.py`.  As long as `MFF_rotamer.pdb` remains in the same directory as `MFF.params`, the parameter file is ready to be used in Rosetta simulations.
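
Since the params file only works while its rotamer file sits beside it, a quick check after moving files around can save a failed run.  The sketch below assumes the reference is stored on a `PDB_ROTAMERS` record, as in Rosetta ligand params files; confirm the keyword against your own `MFF.params`.  The helper name `referenced_rotamer_file` is hypothetical.

```python
from pathlib import Path


def referenced_rotamer_file(params_text, params_dir=".", keyword="PDB_ROTAMERS"):
    """Return the path of the rotamer file named in the params text, or None.

    The keyword is an assumption; adjust it if your params file uses another record.
    """
    for line in params_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == keyword:
            # The file name is resolved relative to the params file's directory.
            return Path(params_dir) / fields[1]
    return None


# Illustrative excerpt only; a real MFF.params has many more lines.
example = "NAME MFF\nPDB_ROTAMERS MFF_rotamer.pdb\n"
path = referenced_rotamer_file(example, params_dir=".")
print(path, "exists" if path.is_file() else "is missing")
```

For a real check, pass the contents of `MFF.params` and the directory that contains it.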

