Quantitative structure-activity relationship (QSAR) models require a 3D conformation of each molecule to describe the spatial arrangement of atomic properties, such as hydrogen bond donors, that may be important for binding to the target protein. We expect QSAR models to make the best predictions when given the conformation in which the molecule interacts with the active site of the protein. However, predicting protein structure and determining the ligand-protein interface remain computationally prohibitive for virtual high-throughput screening. We are therefore investigating the use of conformational ensembles to represent individual small molecules when training and predicting with QSAR models.

To generate a conformational ensemble, we first build a library of common scaffolds, together with conformers of those scaffolds whose 3D structures are known. Common scaffolds are discovered by finding the largest common substructure between every pair of molecules in the Cambridge Structural Database (CSD). Scaffolds with too many degrees of freedom relative to the number of times they occur in the CSD are eliminated. We then generate arbitrarily large conformational ensembles by repeatedly selecting an arbitrary conformer from the library and setting the geometry of the largest common substructure between that conformer and the target molecule to match the chosen conformer. We will train QSAR models on ensembles generated in this way and compare their performance with models that use the lowest-energy conformations found by commercial software packages such as CORINA.
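The two library-driven steps, pruning scaffolds and sampling an ensemble, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's implementation: the names `Scaffold`, `filter_scaffolds`, `sample_ensemble` and the threshold `min_count_per_dof` are hypothetical, the rotatable-bond count is an assumed stand-in for "degrees of freedom", and the common-substructure atom mapping is taken as precomputed (the search itself is not shown). Target atoms outside the matched substructure are left unplaced.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Coord = Tuple[float, float, float]


@dataclass
class Scaffold:
    name: str
    n_rotatable_bonds: int         # assumed stand-in for "degrees of freedom"
    csd_count: int                 # number of CSD structures containing the scaffold
    conformers: List[List[Coord]]  # observed 3D coordinates, one list per conformer


def filter_scaffolds(scaffolds: List[Scaffold], min_count_per_dof: int = 10) -> List[Scaffold]:
    """Keep scaffolds observed often enough in the CSD for their flexibility.

    Hypothetical rule: require min_count_per_dof occurrences per rotatable
    bond; a rigid scaffold (0 rotatable bonds) needs a single occurrence.
    """
    return [
        s for s in scaffolds
        if s.csd_count >= max(1, min_count_per_dof * s.n_rotatable_bonds)
    ]


def sample_ensemble(
    n_target_atoms: int,
    matches: List[Tuple[Scaffold, Dict[int, int]]],
    n_conformers: int,
    seed: int = 0,
) -> List[List[Optional[Coord]]]:
    """Build an ensemble by repeatedly copying library geometry onto the target.

    Each match pairs a scaffold with a precomputed common-substructure mapping
    {scaffold atom index: target atom index}. Only matched target atoms get
    coordinates; the rest stay None and would still need to be built.
    """
    rng = random.Random(seed)
    # Pool every conformer of every matching scaffold so each is equally likely.
    pool = [(conf, mapping) for scaffold, mapping in matches for conf in scaffold.conformers]
    if not pool:
        raise ValueError("no library conformer shares a substructure with the target")
    ensemble = []
    for _ in range(n_conformers):
        conf, mapping = rng.choice(pool)
        coords: List[Optional[Coord]] = [None] * n_target_atoms
        # Impose the chosen conformer's geometry on the matched target atoms.
        for scaffold_atom, target_atom in mapping.items():
            coords[target_atom] = conf[scaffold_atom]
        ensemble.append(coords)
    return ensemble


# Toy data: a rigid three-atom fragment with two observed conformers, and a
# flexible chain seen too rarely in the CSD for its four rotatable bonds.
ring = Scaffold("ring", 0, 3, [
    [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0), (2.1, 1.2, 0.0)],
    [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0), (2.1, -1.2, 0.0)],
])
chain = Scaffold("chain", 4, 12, [[(float(i), 0.0, 0.0) for i in range(5)]])
library = filter_scaffolds([ring, chain])  # keeps only "ring" (12 < 4 * 10)
ensemble = sample_ensemble(5, [(ring, {0: 2, 1: 3, 2: 4})], n_conformers=4)
```

Pooling all conformers before sampling means a scaffold with many observed conformers contributes proportionally more ensemble members; weighting per scaffold instead would be an equally reasonable design choice.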
Alumni Project Members: Jeff Mendenhall, Sandeep Kothiwale, Edward W. Lowe Jr
