Validating clustering techniques applied to simulations of protein folding

While such clustering analyses may be acceptable for qualitatively visualizing MD trajectories, using them to count the structural transitions present in the trajectories and to perform free energy calculations such as in, may lead to serious artifacts. Furthermore, partitions generated by clustering are generally validated by visual inspection of the structures returned as cluster centers. Since little is known about protein dynamics en route to folding, visual inspection may not be a reliable way to validate clustering techniques applied to MD simulations of protein folding. Various rigorous cluster validation methods that take inter-cluster relationships into account have been developed in the field of bioinformatics. Nevertheless, it can be quite difficult to choose a necessary and sufficient set of validation techniques for MD trajectories without prior knowledge of the structural processes underlying folding. An additional goal of MD simulations of folding is to find collective coordinates, and clustering does not lend itself to such analysis. There is clearly a need to go beyond clustering to analyze MD folding trajectories. In this paper, we report the application of data reduction methods to the analysis of villin headpiece folding trajectories. Our methods can be used to reduce any large MD trajectory and extract its salient features.

The most widely used technique for obtaining collective coordinates from folding trajectories and experiments is principal component analysis (PCA). However, apart from its other well-known drawbacks, PCA cannot achieve sufficient data compression when the data are nonlinearly correlated. Our trajectories reside in a high-dimensional space, since every snapshot contains information about all atomic coordinates.
However, not all coordinates are important to folding. Many are likely to be nonlinearly correlated, so if the trajectories are viewed in the correct coordinate space, they may lie in a lower-dimensional space. Extracting a correct reduced basis has been the goal of a variety of dimensionality reduction methods.
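The limitation of PCA with nonlinearly correlated data can be illustrated with a toy example. This is a minimal sketch assuming only NumPy, and it does not reproduce the analysis of the villin headpiece trajectories. Two synthetic "trajectories" are each driven by a single parameter `t`:

- a straight line in 3-D, where the coordinates are linearly correlated;
- a helix, where they are nonlinearly correlated.

The helper names `pca_explained_variance` and `components_needed` and the 95% variance threshold are illustrative choices, not part of the original work.

```python
import numpy as np


def pca_explained_variance(X):
    """Fraction of total variance captured by each principal component of X (rows = snapshots)."""
    Xc = X - X.mean(axis=0)                  # center each coordinate
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, in descending order
    var = s ** 2                             # proportional to the variance along each PC
    return var / var.sum()


def components_needed(ratios, threshold=0.95):
    """Smallest number of PCs whose cumulative explained variance reaches `threshold`."""
    k = int(np.searchsorted(np.cumsum(ratios), threshold)) + 1
    return min(k, len(ratios))               # guard against round-off at the top end


# Hypothetical toy data: one intrinsic degree of freedom t, embedded in 3-D.
t = np.linspace(0.0, 4.0 * np.pi, 2000)

# Nonlinear embedding: a helix (coordinates related through cos/sin of t).
helix = np.column_stack([np.cos(t), np.sin(t), 0.1 * t])

# Linear embedding: a straight line (coordinates are linear functions of t).
line = np.column_stack([t, 2.0 * t, -0.5 * t])

print("PCs needed for the line: ", components_needed(pca_explained_variance(line)))
print("PCs needed for the helix:", components_needed(pca_explained_variance(helix)))
```

For the line, a single principal component captures essentially all of the variance. For the helix, all three components are needed to reach the 95% threshold, even though both curves are parametrized by one variable. A nonlinear dimensionality reduction method would be needed to recover the single underlying coordinate.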
