Researchers Kazi Kabir, Nasrin Akhter, and Amarda Shehu study the structural dynamics that come from repeated Molecular Dynamics simulations.
by Yash Arora
Few methods currently exist for observing the different structures that molecules such as peptides or proteins assume. Molecular Dynamics (MD) simulations make it easier to understand how these molecules form tertiary structures, but many rounds of simulation are needed to get an accurate picture of the structures a molecule can adopt. A new study by Kazi Kabir, Nasrin Akhter, and Amarda Shehu, researchers at George Mason University, extracts the structural dynamics from these repeated MD simulations by finding the relation between the stable and semi-stable structures of a protein molecule and by examining the molecule's energy landscape, or distribution of energy levels.
First, Markov state models (MSMs) are used to integrate the different MD simulations. A Markov model predicts the next state from the current state alone, without reference to any states before it. Here, the MSM takes as input the structures that the molecule adopted at different timesteps and the amount of time between one structure and the next. To put a structure into a form that a program can work with, the Cartesian coordinates of the structure are taken. Another common representation uses the angles at which the parts of the structure are oriented in each state.
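To make the Markov idea concrete, here is a minimal sketch of how transition probabilities can be estimated by counting. It assumes each structure has already been mapped to an integer state label; the trajectory `dtraj`, the function name, and the `lag` parameter are hypothetical and chosen for illustration. This is a plain counting estimate, not the estimator PyEMMA or the study uses.

```python
def estimate_transition_matrix(dtraj, n_states, lag=1):
    """Estimate transition probabilities by counting jumps `lag` steps apart."""
    counts = [[0] * n_states for _ in range(n_states)]
    for i in range(len(dtraj) - lag):
        # Only the current state dtraj[i] matters for the jump: the Markov property
        counts[dtraj[i]][dtraj[i + lag]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A state with no observed outgoing jumps keeps an all-zero row
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Hypothetical trajectory: each integer labels the structural state at one timestep
dtraj = [0, 0, 1, 0, 1, 1, 2, 1, 0, 0]
T = estimate_transition_matrix(dtraj, n_states=3, lag=1)
for row in T:
    print(row)
```

Each row of `T` gives the estimated probabilities of moving from one state to every other state after `lag` timesteps. Combining several simulations amounts to adding their counts before normalizing.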
To complete the MSM, linear transformation techniques are applied and the molecular structures are grouped, or clustered. This clustering of structures is known as state space discretization. The research group uses the energy landscape described above to group the structures. Finally, the MSM is tested to check that it is valid to use.
The group used a Python library called PyEMMA to build the MSM and evaluate different methods of state space discretization. The first, the Community Detection Method, grouped structures based on how close they were to one another. This was done by creating a nearest-neighbor graph (nngraph), which captured the distance between structures as they moved across the landscape. Distance was measured using root-mean-square deviation (RMSD), and two structures shared an edge in the graph if the distance between them was below a certain threshold.
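The graph construction described above can be sketched in a few lines. This is an illustration under stated assumptions, not the study's implementation: the structures are hypothetical two-atom coordinate lists, the RMSD is computed without first superimposing the structures, and connected components stand in for a real community detection algorithm, which would usually split the graph more finely.

```python
import math


def rmsd(a, b):
    """RMSD between two structures given as equal-length lists of (x, y, z).

    Assumes the structures are already superimposed; no alignment is done.
    """
    sq = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(sq / len(a))


def build_nn_graph(structures, threshold):
    """Adjacency sets: i and j share an edge when their RMSD is below threshold."""
    graph = {i: set() for i in range(len(structures))}
    for i in range(len(structures)):
        for j in range(i + 1, len(structures)):
            if rmsd(structures[i], structures[j]) < threshold:
                graph[i].add(j)
                graph[j].add(i)
    return graph


def connected_components(graph):
    """Simplified stand-in for community detection: group linked vertices."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            group.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        groups.append(sorted(group))
    return groups


# Hypothetical structures: two near the origin, two shifted far along x
structures = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)],
    [(5.0, 0.0, 0.0), (6.0, 0.0, 0.0)],
    [(5.0, 0.2, 0.0), (6.0, 0.2, 0.0)],
]
graph = build_nn_graph(structures, threshold=0.5)
groups = connected_components(graph)
print(groups)
```

The threshold controls how dense the graph is: a larger value links more structures and merges groups, while a smaller value leaves more structures isolated.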
The group's second method used the energy landscape of the structures. Certain structures (namely stable and semi-stable structures) created basins in the landscape that could be used to cluster structures. The group again used nngraphs, but this time based them on the energy landscape rather than on RMSD. Using this nngraph, the group built MSMs for testing.
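One way to picture basin-based clustering is steepest descent on a graph: each structure repeatedly steps to its lowest-energy neighbor until it reaches a local minimum, and all structures that end at the same minimum form one basin. The sketch below assumes this construction; the article does not say exactly how the group identified basins, and the graph, energies, and function name are hypothetical.

```python
def assign_basins(graph, energies):
    """Map every vertex to the local energy minimum reached by steepest descent."""

    def descend(node):
        while True:
            # Lowest-energy choice among the node and its neighbors;
            # ties are broken by vertex index so the walk always terminates
            best = min([node, *graph[node]], key=lambda n: (energies[n], n))
            if best == node:
                return node  # no lower neighbor: this is a local minimum
            node = best

    return {node: descend(node) for node in graph}


# Hypothetical chain of five structures with two energy minima (vertices 0 and 3)
graph = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
energies = [1.0, 3.0, 2.0, 0.5, 2.5]
basins = assign_basins(graph, energies)
print(basins)
```

Structures that map to the same minimum would then be treated as a single state when building the MSM.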
The group tested their findings by comparing their methods of state space discretization against the k-means clustering method built into PyEMMA. Both models beat the k-means clustering model, and the basin model did so in only 700 steps. The group therefore concludes that the basin model appears to be a better choice than PyEMMA's built-in clustering methods for examining the structural dynamics of biological molecules.