50. Protein folding under confinement: a role for solvent
D. Lucent, V. Vishal, V. S. Pande. Proceedings of the National Academy of Sciences (2007)
SUMMARY: When proteins fold inside a cell, they are frequently subjected to varying degrees of spatial confinement. Specifically, misfolded or unfolded proteins can be encapsulated inside a helper molecule called a chaperonin, which helps proteins fold inside cells. Here we investigate how confinement affects protein folding using a simple model: a fast-folding mini-protein confined to a nanopore. We find that if we confine the protein but allow the surrounding water molecules to pass freely in and out of the nanopore, the protein is more likely to reach the folded state. On the other hand, if we make the nanopore water-tight, the protein is less likely to fold: it is pushed into a small non-native globule. This suggests that when thinking about folding inside a confined space (like a chaperonin), it is important to remember that both protein and water are confined, and that this confined water can have an effect on protein folding.
ABSTRACT: Although most experimental and theoretical studies of protein folding involve proteins in vitro, the effects of spatial confinement may complicate protein folding in vivo. In this study, we examine the folding dynamics of villin (a small fast-folding protein) with explicit solvent confined to an inert nanopore. We have calculated the probability of folding before unfolding (Pfold) under various confinement regimes. Using Pfold correlation techniques, we observed two competing effects. Confining protein alone promotes folding by destabilizing the unfolded state. In contrast, confining both protein and solvent gives rise to a solvent-mediated effect that destabilizes the native state. When both protein and solvent are confined we see unfolding to a compact unfolded state different from the unfolded state seen in bulk. Thus, we demonstrate that the confinement of solvent has a significant impact on protein kinetics and thermodynamics. We conclude with a discussion of the implications of these results for folding in confined environments such as the chaperonin cavity in vivo.
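EXAMPLE (illustrative sketch, not taken from the paper): Pfold for a given conformation is the fraction of independent trial trajectories started there that reach the folded state before the unfolded state. The toy below replaces molecular dynamics with an unbiased random walk on a one-dimensional coordinate; the function name, the end-state definitions, and the step rule are all assumptions made for illustration.

```python
import random


def estimate_pfold(start, n_states, n_shots=2000, seed=0):
    """Fraction of trial trajectories from `start` that fold before unfolding."""
    rng = random.Random(seed)
    unfolded, folded = 0, n_states - 1      # absorbing end states (toy assumption)
    hits = 0
    for _ in range(n_shots):
        x = start
        while x not in (unfolded, folded):
            x += rng.choice((-1, 1))        # unbiased step along a 1D coordinate
        if x == folded:
            hits += 1
    return hits / n_shots


if __name__ == "__main__":
    for s in (2, 5, 8):
        print(s, estimate_pfold(s, n_states=11))
```

For this toy walk the exact answer is start / (n_states - 1), so the estimates can be checked directly. In a real study each "shot" would be a short simulation, and the folded and unfolded states would need careful structural definitions.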
49. Automatic State Decomposition Algorithm.
J. Chodera, N. Singhal, V. S. Pande, K. Dill, and W. Swope. Journal of Chemical Physics, (2007)
SUMMARY: To split calculations into pieces that can run on Folding@Home and then combine the results so that the whole network acts like a single, extremely fast computer, we need special algorithms. We are constantly working to improve these methods, and this paper presents our current state of the art.
ABSTRACT: To meet the challenge of modeling the conformational dynamics of biological macromolecules over long timescales, much recent effort has been devoted to constructing stochastic kinetic models, often in the form of discrete-state Markov models, from short molecular dynamics simulations. To construct useful models that faithfully represent dynamics at the timescales of interest, it is necessary to decompose configuration space into a set of kinetically metastable states. Previous attempts to define these states have relied upon either prior knowledge of the slow degrees of freedom or on the application of conformational clustering techniques which assume that conformationally distinct clusters are also kinetically distinct. Here, we present a first version of an automatic algorithm for the discovery of kinetically metastable states that is generally applicable to solvated macromolecules. Given molecular dynamics trajectories initiated from a well-defined starting distribution, the algorithm discovers long-lived, kinetically metastable states through successive iterations of partitioning and aggregating conformation space into kinetically related regions. We apply this method to three peptides in explicit solvent—terminally blocked alanine, the engineered 12-residue beta-hairpin trpzip2, and the 21-residue helical Fs peptide — to assess its ability to generate physically meaningful states and faithful kinetic models.
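EXAMPLE (illustrative sketch, not the paper's algorithm): once every simulation frame has been assigned to a state, a discrete-state Markov model can be estimated by counting transitions at a chosen lag time and normalizing each row. The sketch below shows only that counting step; discovering the metastable states themselves, which is the subject of the paper, is not shown. The function name and the integer state labels are assumptions.

```python
def transition_matrix(trajectories, n_states, lag=1):
    """Row-normalized transition probabilities estimated from state sequences."""
    counts = [[0] * n_states for _ in range(n_states)]
    for traj in trajectories:
        # Pair each frame with the frame `lag` steps later.
        for a, b in zip(traj[:-lag], traj[lag:]):
            counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # States that were never left get an all-zero row in this sketch.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


if __name__ == "__main__":
    trajs = [[0, 0, 1, 1, 0], [1, 1, 1, 0]]
    for row in transition_matrix(trajs, n_states=2):
        print(row)
```

Counting within each trajectory separately matters: transitions are never counted across the boundary between two independent simulations.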
48. Storage@home: Petascale Distributed Storage.
A. Beberg and V. S. Pande. IPDPS (2007)
SUMMARY: This work will be presented at the IPDPS/PCGRID meeting at the end of March.
47. Predicting structure and dynamics of loosely-ordered protein complexes: influenza hemagglutinin fusion peptide.
P. Kasson and V. S. Pande. PSB, (2006)
SUMMARY: We have been applying Folding@Home to study the nature of key proteins involved in how flu (the influenza virus) gains access into host cells. This paper reflects our first work in this direction.
46. A Bayesian Update Method for Adaptive Weighted Sampling.
S. Park and V. S. Pande. Physical Review E (2006)
SUMMARY: We've developed a new way to do a particular type of calculation (for protein thermodynamics) on Folding@Home. This paper lays out how this works and gives some demonstrations.
45. Local structure formation in simulations of two small proteins.
Guha Jayachandran, V. Vishal, Angel E. García and V. S. Pande. Journal of Structural Biology, (2006)
ABSTRACT: Massively parallel all-atom, explicit solvent molecular dynamics simulations were used to explore the formation and existence of local structure in two small alpha-helical proteins, the villin headpiece and the helical fragment B of protein A. We report on the existence of transient helices and combinations of helices in the unfolded ensemble, and on the order of formation of helices, which appears to largely agree with previous experimental results. Transient local structure is observed even in the absence of overall native structure. We also calculate sets of residue-residue pairs that are statistically predictive of the formation of given local structures in our simulations.
44. Kinetic Definition of Protein Folding Transition State Ensembles and Reaction Coordinates.
C. Snow and V. S. Pande. Biophysical Journal, (2006)
ABSTRACT: Using distributed molecular dynamics simulations we located 4 distinct folding transitions for a 39 residue beta-beta-alpha-beta protein fold. We introduce and subsequently determine the transmission probability, Ptrans, of 500 conformations along each free energy barrier at room temperature, and determined which conformations were transition state ensemble members (Ptrans ≈ 0.5). We ran similar simulations at 82 °C, determined the change in Ptrans with temperature for all 2,000 conformations, and observed Hammond behavior directly using Ptrans correlation. The polymer temperature increase only slightly perturbed the transition probabilities. We propose that diffusion along Ptrans may provide the configurational diffusion rate at the top of the barrier. Specifically, given a transition state conformation x0 with estimated Ptrans = 0.5, we selected a large set of subsequent conformations from independent trajectories, each exactly a small time δt after x0 (250 ps). Then we calculated Ptrans for each of the new trial conformations. The P(Ptrans|δt=250 ps) distribution reflects diffusion along an ideal kinetic reaction coordinate. This approach provides a novel perspective on the nature of a protein folding transition, and provides a framework for quantitative study of activated relaxation kinetics.
43. Parallelized Over Parts Computation of Absolute Binding Free Energy with Docking and Molecular Dynamics.
Guha Jayachandran, M. R. Shirts, S. Park, and V. S. Pande. Journal of Chemical Physics, (2006)
ABSTRACT: We present a technique for biomolecular free energy calculations that exploits highly parallelized sampling to significantly reduce the time to results. The technique combines free energies for multiple, nonoverlapping configurational macrostates and is naturally suited to distributed computing. We describe a methodology that uses this technique with docking, molecular dynamics, and free energy perturbation to compute absolute free energies of binding quickly compared to previous methods. The method does not require a priori knowledge of the binding pose as long as the docking technique used can generate reasonable binding modes. We demonstrate the method on the protein FKBP12 and eight of its inhibitors.
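EXAMPLE (illustrative sketch, not the paper's code): if the macrostates do not overlap and together cover the configurations of interest, their partition functions add, so the per-macrostate free energies F_i combine as F = -kT ln(sum_i exp(-F_i / kT)). The helper below evaluates this with a numerically stable log-sum-exp. The function name and the default kT (about 0.593 kcal/mol, near 298 K) are assumptions.

```python
import math


def combine_free_energies(free_energies, kT=0.593):
    """F_total = -kT * ln(sum_i exp(-F_i / kT)), computed stably."""
    f_min = min(free_energies)
    # Factor out the lowest free energy so the exponentials cannot overflow.
    total = sum(math.exp(-(f - f_min) / kT) for f in free_energies)
    return f_min - kT * math.log(total)


if __name__ == "__main__":
    # Two hypothetical binding modes with free energies in kcal/mol.
    print(combine_free_energies([-8.0, -6.5]))
```

The combined value is always at or below the lowest individual free energy, and it is dominated by that lowest macrostate when the others lie several kT higher. Because each F_i can be computed independently, the per-macrostate calculations map naturally onto separate machines.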
42. Folding Simulations of the Villin Headpiece in All-Atom Detail.
Guha Jayachandran, V. Vishal, and V. S. Pande. Journal of Chemical Physics (2006)
SUMMARY: We have developed a new method that greatly extends Folding@Home's ability to simulate long timescales. This new method, the Markovian state model (MSM), will be applied to essentially all new Folding@Home projects. This paper demonstrates MSMs applied to a challenging target -- the villin headpiece.
ABSTRACT: We report on the use of large-scale distributed computing simulation and novel analysis techniques for examining the dynamics of a small protein. Matters addressed include folding rate, very long timescale kinetics, ensemble properties, and interaction with water. The target system for the study, the villin headpiece, has been of great interest to experimentalists and theorists both. Sampling totaled nearly 500 µs—the most extensive published to date for a system of villin’s size in explicit solvent with all atom detail—and was in the form of tens of thousands of independent molecular dynamics trajectories, each several tens of nanoseconds in length. We report on kinetics sensitivity analyses that, using a set of short simulations, probed the role of water in villin’s folding and sensitivity to the simulation’s electrostatics treatment. By constructing Markovian state models from the collected data, we were able to propagate dynamics to times far beyond those directly simulated and to rapidly compute mean first passage times, long time kinetics (tens of microseconds), and evolution of ensemble property distributions over long times, otherwise currently impossible. We also tested our MSM by using it to predict the structure of villin de novo.
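EXAMPLE (illustrative sketch, not the paper's code): given a Markovian state model as a row-stochastic transition matrix T, mean first passage times to a target state satisfy m_target = 0 and m_i = 1 + sum over j != target of T[i][j] * m_j, in units of the lag time. The sketch solves this by simple fixed-point iteration, which is an assumption made to keep the example dependency-free; a linear solver would normally be used. It also assumes the target is reachable from every state.

```python
def mean_first_passage_times(T, target, n_iter=5000):
    """Mean number of steps to first reach `target` from each state."""
    n = len(T)
    m = [0.0] * n
    for _ in range(n_iter):
        # One sweep of m_i = 1 + sum_{j != target} T[i][j] * m_j.
        m = [0.0 if i == target else
             1.0 + sum(T[i][j] * m[j] for j in range(n) if j != target)
             for i in range(n)]
    return m


if __name__ == "__main__":
    # Hypothetical 3-state model: unfolded (0), intermediate (1), folded (2).
    T = [[0.50, 0.50, 0.00],
         [0.25, 0.50, 0.25],
         [0.00, 0.00, 1.00]]
    print(mean_first_passage_times(T, target=2))
```

For this small matrix the equations can be solved by hand (8 steps from state 0 and 6 from state 1), which gives a quick check on the iteration. Multiplying by the lag time converts steps into physical time.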
41. Ensemble molecular dynamics yields submillisecond kinetics and intermediates of membrane fusion
P. Kasson, N. Kelley, N. Singhal, M. Vrjlic, A. Brunger, and V. S. Pande. Proceedings of the National Academy of Sciences, USA
SUMMARY: These first results describe work we've been doing to study membrane fusion, the process by which two lipid membranes become one. This process is critical to proper functioning of the cell and also phenomena such as neurotransmission and infection by many viruses. We are seeking to understand how membrane fusion works so that we can eventually manipulate it. We hope such an understanding will lead to the development of new and more effective drugs to combat viral infection and treat neurologic diseases.
ABSTRACT: Lipid membrane fusion is critical to cellular transport and signaling processes such as constitutive secretion, neurotransmitter release, and infection by enveloped viruses. Here, we introduce a powerful computational methodology for simulating membrane fusion from a starting configuration designed to approximate activated prefusion assemblies from neuronal and viral fusion, producing results on a time scale and degree of mechanistic detail not previously possible to our knowledge. We use an approach to the long time scale simulation of fusion by constructing a Markovian state model with large-scale distributed computing, yielding an understanding of fusion mechanisms on time scales previously impossible to simulate to our knowledge. Our simulation data suggest a branched pathway for fusion, in which a common stalk-like intermediate can either rapidly form a fusion pore or remain in a metastable hemifused state that slowly forms fully fused vesicles. This branched reaction pathway provides a mechanistic explanation both for the biphasic fusion kinetics and the stable hemifused intermediates previously observed experimentally. Our distributed computing and Markovian state model approaches provide sufficient sampling to detect rare transitions, a systematic process for analyzing reaction pathways, and the ability to develop quantitative approximations of reaction kinetics for fusion.
40. Electric Fields at the Active Site of an Enzyme: Direct Comparison of Experiment with Theory.
Ian T. Suydam, Christopher D. Snow, Vijay S. Pande, Steven G. Boxer.
Science (2006)
SUMMARY: The ability to quantitatively predict electric fields in proteins has remained a great challenge. In this paper, we combine new experimental methods with new theoretical methods made possible by Folding@Home distributed computing to greatly push the boundary of what one could previously predict. In particular, we see that a single structure is insufficient to make accurate predictions, suggesting that the ensemble approaches inherent to Folding@Home may be important in predicting electrostatics in proteins.
ABSTRACT: The electric fields produced in folded proteins influence nearly every aspect of protein function. We present a vibrational spectroscopy technique that measures changes in electric field at a specific site of a protein as shifts in frequency (Stark shifts) of a calibrated nitrile vibration. A nitrile-containing inhibitor is used to deliver a unique probe vibration to the active site of human aldose reductase, and the response of the nitrile stretch frequency is measured for a series of mutations in the enzyme active site. These shifts yield quantitative information on electric fields that can be directly compared with electrostatics calculations. We show that extensive molecular dynamics simulations and ensemble averaging are required to reproduce the observed changes in field.
39. A novel approach for computational alanine scanning: application to the p53 oligomerization domain.
L.T. Chong, W. C. Swope, J. W. Pitera, and V. S. Pande.
Journal of Molecular Biology (2006)
SUMMARY: Roughly half of all known cancers involve a mutation in a single protein: p53. p53 serves to protect us from getting cancer; when p53 fails, cancer often results. We have developed a new method for predicting how mutations in p53 would affect the protein. This new method is naturally suited for distributed computing and can predict several mutations found to date.
ABSTRACT: We have developed a novel computational alanine scanning approach that involves analysis of ensemble unfolding kinetics at high temperature to identify residues that are critical for the stability of a given protein. This approach has been applied to dimerization of the oligomerization domain (residues 326–355) of tumor suppressor p53. As validated by experimental results, our approach has reasonable success in identifying deleterious mutations, including mutations that have been linked to cancer. We discuss a method for determining the effect of mutations on the location of the dimerization transition state.
38. Validation of Markov state models using Shannon's entropy.
S. Park and V. S. Pande.
Journal of Chemical Physics (2006)
SUMMARY: Markov State Models (MSMs) have become a major part of how Folding@Home calculations are performed. In particular, the MSM technique is at the heart of how one can divide complex calculations like protein folding or lipid vesicle dynamics across 10,000 to 100,000 CPUs -- i.e. how distributed computing can tackle complex problems. This paper presents a new way to test the validity of the MSMs we generate, to make sure that the models are suitable and self-consistent.
ABSTRACT: Markov state models are kinetic models built from the dynamics of molecular simulation trajectories by grouping similar configurations into states and examining the transition probabilities between states. Here we present a procedure for validating the underlying Markov assumption in Markov state models based on information theory using Shannon's entropy. This entropy method is applied to a simple system and is compared with the previous eigenvalue method. The entropy method also provides a way to identify states that are least Markovian, which can then be divided into finer states to improve the model.
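EXAMPLE (generic illustration, not the procedure from the paper): one simple information-theoretic check of the Markov assumption compares how uncertain the next state is given only the current state with how uncertain it is given the current and previous states. For a truly first-order Markov sequence the two conditional Shannon entropies agree, so their difference is near zero. The function names and the plug-in (count-based) entropy estimator are assumptions.

```python
import math
from collections import Counter


def conditional_entropy(pairs):
    """Plug-in estimate of H(Y | X) in bits from (x, y) observations."""
    joint = Counter(pairs)
    marginal = Counter(x for x, _ in pairs)
    n = len(pairs)
    h = 0.0
    for (x, _), c in joint.items():
        h -= (c / n) * math.log2(c / marginal[x])
    return h


def memory_gap(traj):
    """H(next | current) - H(next | previous, current); near 0 if Markovian."""
    triples = list(zip(traj, traj[1:], traj[2:]))
    first_order = conditional_entropy([(b, c) for _, b, c in triples])
    second_order = conditional_entropy([((a, b), c) for a, b, c in triples])
    return first_order - second_order


if __name__ == "__main__":
    print(memory_gap([0, 1] * 200))         # next state fixed by current state
    print(memory_gap([0, 0, 1, 1] * 100))   # next state needs two-step memory
```

With finite data the plug-in estimate of this gap is never negative and tends to be biased upward, so in practice it would need to be compared against the spread expected from sampling noise.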
37. On the role of chemical detail in simulating protein folding kinetics.
Young Min Rhee and Vijay S. Pande.
Chemical Physics (2006)
SUMMARY: How important are local chemical features of proteins during the folding process? We assess protein folding models with varying degrees of chemical detail to gain an understanding of how they perform relative to some of today's most sophisticated models.
ABSTRACT: Is an all-atom representation for protein and solvent necessary for simulating protein folding kinetics or can simpler models reproduce the results of more complex models? This question is relevant not just for simulation methodology, but also for the general understanding of the chemical details relevant for protein dynamics. With recent advances in computational methodology, it is now possible to simulate the folding kinetics of small proteins in all-atom detail. Therefore, with both detailed and simplified models of folding in hand, the outstanding questions are what the differences in these models are for the description of protein folding dynamics, and how we can quantitatively compare the folding mechanisms found in the models. To address the outstanding problem of how to determine the differences between folding mechanism in a sensitive and quantitative manner, we suggest a new method to quantify the non-linear correlation in folding commitment probability (Pfold) values. We use this method to probe the differences between a wide range of models for folding simulations, ranging from coarse grained Go models to all-atom models with implicit or explicit solvation. While the differences between less-detailed models (Go and implicit solvation models) and explicit solvation models are large, the differences within various explicit solvation models appear to be small, suggesting that the discrete nature of water may play a role in folding kinetics.
36. Nanotube confinement denatures protein helices.
Eric J. Sorin and Vijay S. Pande.
JACS (2006)
ABSTRACT: In striking contrast to simple polymer physics theory, which does not account for solvent effects, we find that physical confinement of solvated biopolymers decreases solvent entropy, which in turn leads to a reduction in the organized structural content of the polymer. Since our theory is based on a fundamental property of water-protein statistical mechanics, we expect it to have broad implications in many biological and material science contexts.
35. The solvation interface is a determining factor in peptide conformational preferences.
Eric J. Sorin, Young Min Rhee, Michael R. Shirts, and Vijay S. Pande.
Journal of Molecular Biology (2006)
SUMMARY: How complicated is a helix, and how is the complexity of helical structure affected by the solvent? Here we show, through a novel "computational hydrophobic titration" experiment, that many features of helices can be rationalized and/or explained by considering the interactions along the peptide-solvent interface.
TECHNICAL ABSTRACT: The 21-residue polyalanine-based Fs peptide was studied using thousands of long, explicit solvent, atomistic molecular dynamics simulations which reached equilibrium at the ensemble level. Peptide conformational preference as a function of hydrophobicity was examined using a spectrum of explicit solvent models, and the peptide length dependence of the hydrophilic and hydrophobic components of solvent-accessible surface area for several ideal conformational types was also considered. Our results demonstrate how the character of the solvation interface induces several conformational preferences, including a decrease in mean helical content with increased hydrophilicity, which occurs predominantly through reduced nucleation tendency and, to a lesser extent, destabilization of helical propagation. Interestingly, an opposing effect occurs through increased propensity for 3₁₀-helix conformations, as well as increased polyproline structure. Our observations provide a framework for understanding previous reports of conformational preferences in polyalanine-based peptides including (i) terminal 3₁₀-helix prominence, (ii) low π-helix propensity, (iii) increased polyproline conformations in short and unfolded peptides, and (iv) membrane helix stability in the presence and absence of water. These observations lend physical insight into the role of water in peptide conformational equilibria at the atomic level, and expand our view of the complexity of even the most "simple" of biopolymers. Whereas previous studies have focused predominantly on hydrophobic effects with respect to tertiary structure, this report highlights the need for consideration of such effects on the secondary structural level.
34. Can conformational change be described by only a few normal modes?
Paula Petrone and Vijay S. Pande.
Biophysical Journal (2005)
SUMMARY: In allosteric regulation, protein activity is altered when ligand binding (or unbinding) causes changes in the protein conformation. Little is known about which aspects of the protein architecture are responsible for allosteric regulation; however, most of these changes involve collective displacements of atoms (domain and hinge-bending motions), which are likely to occur on the microsecond timescale.
Normal mode analysis (NMA) decouples the complex motions and fluctuations of proteins into a linear combination of orthogonal basis vectors, each representing an independent concerted harmonic motion with a characteristic frequency. In principle, it would be a natural basis in which to represent conformational change that involves collective motions of atoms. This paper addresses the limitations of NMA, namely how many normal modes are necessary to achieve a certain degree of accuracy in the representation.
TECHNICAL ABSTRACT: We suggest a simple method to assess how many normal modes are needed to map a conformational change. By projecting the conformational change onto a subspace of the normal mode vectors and, using RMSD as a test of accuracy, we find that the first 20 modes only contribute 50% or less of the total conformational change in four test cases (myosin, calmodulin, NtrC, and hemoglobin). In some allosteric systems, like the molecular switch NtrC, the conformational change is localized to a limited number of residues. We find that many more modes are necessary to accurately map this collective displacement. In addition, the normal mode “spectra” can provide useful information about the details of the conformational change, especially when comparing structures with different bound ligands, in this case, calmodulin. Indeed, this approach presents normal mode analysis as a useful basis in which to capture the mechanism of conformational change, and shows that the number of normal modes needed to capture the essential collective motions of atoms should be chosen according to the required accuracy.
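EXAMPLE (illustrative sketch, not the paper's code): projecting a conformational change onto the first k normal modes amounts to taking the dot product of the displacement vector with each mode and summing the squared overlaps. The sketch below reports the fraction of the squared displacement captured, assuming the modes are orthonormal and ordered from lowest to highest frequency; the function name is hypothetical. The paper itself uses RMSD as the accuracy test; for orthonormal modes the residual RMSD follows from the uncaptured fraction.

```python
def captured_fraction(displacement, modes, k):
    """Fraction of |displacement|^2 lying in the span of the first k modes."""
    norm_sq = sum(d * d for d in displacement)
    captured = 0.0
    for mode in modes[:k]:
        # Overlap of the conformational change with this (unit-length) mode.
        overlap = sum(d * m for d, m in zip(displacement, mode))
        captured += overlap * overlap
    return captured / norm_sq


if __name__ == "__main__":
    # Toy 3-coordinate system whose "modes" are simply the coordinate axes.
    modes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    change = [3.0, 4.0, 0.0]
    for k in (1, 2, 3):
        print(k, captured_fraction(change, modes, k))
```

In a real calculation the displacement would be the mass-weighted, superimposed coordinate difference between the two structures, with 3N components for N atoms, and the modes would come from diagonalizing the Hessian of one end state.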
33. How large is an alpha-helix in solution? Studies of the radii of gyration of helical peptides by SAXS and MD.
Bojan Zagrovic, Guha Jayachandran, Ian S. Millett, Sebastian Doniach and Vijay S. Pande.
Journal of Molecular Biology (2005)
SUMMARY: Direct comparisons are made between Folding@Home simulations and experimental measurements (SAXS) to determine molecular size of helical peptides of varying length, revealing the compact nature of such helical peptides.
TECHNICAL ABSTRACT: Using synchrotron radiation and the small-angle X-ray scattering technique we have measured the radii of gyration of a series of alanine-based α-helix-forming peptides of the composition Ace-(AAKAA)n-GY-NH2, n = 2–7, in aqueous solvent at 10 °C. In contrast to other techniques typically used to study α-helices in isolation (such as nuclear magnetic resonance and circular dichroism), small-angle X-ray scattering reports on the global structure of a molecule and, as such, provides complementary information to these other, more sequence-local measuring techniques. The radii of gyration that we measure are, except for the 12-mer, lower than the radii of gyration of ideal α-helices or helices with frayed ends of the equivalent sequence-length. For example, the measured radius of gyration of the 37-mer is 14.2 Å, which is to be compared with the radius of gyration of an ideal 37-mer α-helix of 17.6 Å. Attempts are made to analyze the origin of this discrepancy in terms of the analytical Zimm–Bragg–Nagai (ZBN) theory, as well as distributed computing explicit solvent molecular dynamics simulations using two variants of the AMBER force-field. The ZBN theory, which treats helices as cylinders connected by random walk segments, predicts markedly larger radii of gyration than those measured. This is true even when the persistence length of the random walk parts is taken to be extremely short (about one residue). Similarly, the molecular dynamics simulations, at the level of sampling available to us, give inaccurate values of the radii of gyration of the molecules (by overestimating them by around 25% for longer peptides) and/or their helical content. We conclude that even at the short sequences examined here (≤37 amino acid residues), these α-helical peptides behave as fluctuating semi-broken rods rather than straight cylinders with frayed ends.
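EXAMPLE (illustrative sketch, not the paper's code): the radius of gyration is the root-mean-square distance of the atoms from their centroid. The sketch below computes it with equal weights for a bare alpha-carbon trace of an ideal α-helix, using textbook helix geometry (about 2.3 Å radius, 1.5 Å rise and 100° turn per residue). Those parameters, the equal weighting, and the function names are assumptions, and a Cα-only trace is not expected to reproduce the 17.6 Å ideal-helix value quoted above, which comes from the authors' own model.

```python
import math


def radius_of_gyration(coords):
    """Equal-weight radius of gyration of a list of (x, y, z) points."""
    n = len(coords)
    center = [sum(c[k] for c in coords) / n for k in range(3)]
    mean_sq = sum(sum((c[k] - center[k]) ** 2 for k in range(3))
                  for c in coords) / n
    return math.sqrt(mean_sq)


def ideal_helix_trace(n_residues, radius=2.3, rise=1.5, turn_deg=100.0):
    """Alpha-carbon positions (in Angstroms) of a straight ideal helix."""
    coords = []
    for i in range(n_residues):
        angle = math.radians(turn_deg * i)
        coords.append((radius * math.cos(angle),
                       radius * math.sin(angle),
                       rise * i))
    return coords


if __name__ == "__main__":
    for n in (12, 22, 37):
        print(n, round(radius_of_gyration(ideal_helix_trace(n)), 1))
```

For a straight rod the value grows almost linearly with length, which is why a measured radius of gyration well below the ideal-helix value points to bent or partly broken helices.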
32. Error Analysis in Markovian State Models for protein folding.
Nina Singhal and Vijay S. Pande.
Journal of Chemical Physics (2005)
SUMMARY: We validate the new Markovian State Model (MSM) for describing protein dynamics, and show how to efficiently calculate how accurate these models are. We also describe how to start new FAH simulations to best improve the accuracy of the model.
TECHNICAL ABSTRACT: In previous work, we described a Markovian state model (MSM) for analyzing molecular-dynamics trajectories, which involved grouping conformations into states and estimating the transition probabilities between states. In this paper, we analyze the errors in this model caused by finite sampling. We give different methods with various approximations to determine the precision of the reported mean first passage times. These approximations are validated on an 87-state toy Markovian system. In addition, we propose an efficient and practical sampling algorithm that uses these error calculations to build an MSM that has the same precision in mean first passage time values but requires an order of magnitude fewer samples. We also show how these methods can be scaled to large systems using sparse matrix methods.
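EXAMPLE (illustrative sketch, not necessarily one of the paper's methods): a common way to see how finite sampling limits the precision of a mean first passage time is to resample the transition matrix from the observed counts and recompute the passage time for each resample. Here each row is drawn from a Dirichlet distribution with parameters count + 1 (a uniform prior, which is an assumption), built from gamma variates. All function names are hypothetical.

```python
import random
import statistics


def sample_transition_matrix(counts, rng, prior=1.0):
    """Draw each row from a Dirichlet(count + prior) distribution."""
    matrix = []
    for row in counts:
        draws = [rng.gammavariate(c + prior, 1.0) for c in row]
        total = sum(draws)
        matrix.append([d / total for d in draws])
    return matrix


def mfpt(T, start, target, n_iter=1000):
    """Mean first passage time (in steps) by fixed-point iteration."""
    n = len(T)
    m = [0.0] * n
    for _ in range(n_iter):
        m = [0.0 if i == target else
             1.0 + sum(T[i][j] * m[j] for j in range(n) if j != target)
             for i in range(n)]
    return m[start]


def mfpt_uncertainty(counts, start, target, n_samples=400, seed=0):
    """Mean and standard deviation of the MFPT over resampled matrices."""
    rng = random.Random(seed)
    values = [mfpt(sample_transition_matrix(counts, rng), start, target)
              for _ in range(n_samples)]
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    counts = [[900, 100], [200, 800]]       # hypothetical observed transitions
    print(mfpt_uncertainty(counts, start=0, target=1))
```

Collecting more transitions out of a state narrows its row's distribution, so a spread like this can indicate which states would benefit most from additional simulations.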
31. Direct calculation of the binding free energies of FKBP ligands using the Fujitsu BioServer massively parallel computer.
Hideaki Fujitani, Yoshiaki Tanida, Masakatsu Ito, Guha Jayachandran, Christopher D. Snow, Michael R. Shirts, Eric J. Sorin, and Vijay S. Pande.
Journal of Chemical Physics (2005)
SUMMARY: Drug design calculations are generally very difficult. Here we show that calculations made previously on the Folding@Home network are possible on a much smaller supercomputer system without loss of numerical precision.
TECHNICAL ABSTRACT: Direct calculations of the absolute binding free energies for eight FKBP ligands were performed using the Fujitsu BioServer massively parallel computer. Using the latest version of the general AMBER force field (GAFF) for ligand model parameters and the Bennett acceptance ratio for computing free energy differences, we obtained an excellent linear fit between the calculated and experimental binding free energies. The RMS error from a linear fit is 0.4 kcal/mol for eight ligand complexes. In comparison with a previous study of the binding energies of these same eight ligand complexes, these results suggest that the use of improved model parameters can lead to more predictive binding estimates, and that these estimates can be obtained with significantly less computer time than previously thought. These findings make such direct methods more attractive for use in rational drug design.
30. A New Set of Molecular Mechanics Parameters for Hydroxyproline and Its Use in Molecular Dynamics Simulations of Collagen-Like Peptides.
Sanghyun Park, Randall J. Radmer, Teri E. Klein, and Vijay S. Pande.
Journal of Computational Chemistry (2005)
SUMMARY: Simulation of the collagen triple helix has been given less attention than more common protein "folds." Here we present newly derived parameters for such simulations that give better agreement with experimental data, thereby offering insight into the stability of the triple helix structure.
TECHNICAL ABSTRACT: Recently, the importance of proline ring pucker conformations in collagen has been suggested in the context of hydroxylation of prolines. The previous molecular mechanics parameters for hydroxyproline, however, do not reproduce the correct pucker preference. We have developed a new set of parameters that reproduces the correct pucker preference. Our molecular dynamics simulations of proline and hydroxyproline monomers as well as collagen-like peptides, using the new parameters, support the theory that the role of hydroxylation in collagen is to stabilize the triple helix by adjusting to the right pucker conformation (and thus the right φ angle) in the Y position.
29. Comparison of efficiency and bias of free energies computed by exponential averaging, the Bennett acceptance ratio, and thermodynamic integration.
Michael R. Shirts & Vijay S. Pande.
Journal of Chemical Physics (2005)
SUMMARY: We test new methods for free energy calculations -- relevant for our computational drug design methodology. We find that the BAR method we previously investigated is significantly better than methods commonly employed. We have already gotten a lot of positive feedback about this work from others in the field, as they have been starting to use the results of this work to improve their calculations as well.
TECHNICAL ABSTRACT: Recent work has demonstrated the Bennett acceptance ratio method is the best asymptotically unbiased method for determining the equilibrium free energy between two end states given work distributions collected from either equilibrium or non-equilibrium data. However, it is still not clear what the practical advantage of this acceptance ratio method is over other common methods in atomistic simulations. In this study, we first review theoretical estimates of the bias and variance of exponential averaging (EXP), thermodynamic integration (TI), and the Bennett acceptance ratio (BAR). In the process, we present a new simple scheme for computing the variance and bias of many estimators, and demonstrate the connections between BAR and the weighted histogram analysis method. Next, a series of analytically solvable toy problems is examined to shed more light on the relative performance in terms of the bias and efficiency of these three methods. Interestingly, it is impossible to conclusively identify a “best” method for calculating the free energy, as each of the three methods performs more efficiently than the others in at least one situation examined in these toy problems. Finally, sample problems of the insertion/deletion of both a Lennard-Jones particle and a much larger molecule in TIP3P water are examined by these three methods. In all tests of atomistic systems, free energies obtained with BAR have significantly lower bias and smaller variance than when using EXP or TI, especially when the overlap in phase space between end states is small. For example, BAR can extract as much information from multiple fast, far-from-equilibrium simulations as from fewer simulations near equilibrium, which EXP cannot. Although TI and sometimes even EXP can be somewhat more efficient in idealized toy problems, in the realistic atomistic situations tested in this paper, BAR is significantly more efficient than all other methods.
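EXAMPLE (illustrative sketch, not the paper's code): the two work-based estimators can be written in a few lines. EXP uses forward work values only, dF = -(1/beta) ln <exp(-beta W)>. BAR uses forward and reverse work values and, for equal sample counts, finds the dF at which the sum over forward samples of f(beta (W - dF)) equals the sum over reverse samples of f(beta (W + dF)), where f(x) = 1 / (1 + exp(x)). The equal-count restriction, the bisection solver, the Gaussian toy data, and all names below are assumptions made for illustration.

```python
import math
import random


def exp_estimate(work_forward, beta=1.0):
    """Exponential averaging of forward work values."""
    w_min = min(work_forward)
    mean_boltz = sum(math.exp(-beta * (w - w_min))
                     for w in work_forward) / len(work_forward)
    return w_min - math.log(mean_boltz) / beta


def fermi(x):
    """1 / (1 + exp(x)), written to avoid overflow for large |x|."""
    if x > 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


def bar_estimate(work_forward, work_reverse, beta=1.0, n_bisect=60):
    """Bennett acceptance ratio for equal numbers of forward/reverse samples."""
    if len(work_forward) != len(work_reverse):
        raise ValueError("this sketch assumes equal sample counts")

    def imbalance(df):
        # Increases monotonically with df and is zero at the BAR estimate.
        return (sum(fermi(beta * (w - df)) for w in work_forward)
                - sum(fermi(beta * (w + df)) for w in work_reverse))

    candidates = list(work_forward) + [-w for w in work_reverse]
    lo = min(candidates) - 50.0 / beta
    hi = max(candidates) + 50.0 / beta
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if imbalance(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def toy_work_samples(delta_f, sigma, n, seed=0, beta=1.0):
    """Gaussian forward/reverse work values consistent with a given delta_f."""
    rng = random.Random(seed)
    shift = 0.5 * beta * sigma ** 2         # mean dissipated work
    forward = [rng.gauss(delta_f + shift, sigma) for _ in range(n)]
    reverse = [rng.gauss(-delta_f + shift, sigma) for _ in range(n)]
    return forward, reverse


if __name__ == "__main__":
    fwd, rev = toy_work_samples(delta_f=2.0, sigma=1.0, n=2000)
    print("EXP:", exp_estimate(fwd))
    print("BAR:", bar_estimate(fwd, rev))
```

Energies here are in units of kT (beta = 1). Increasing sigma in the toy data widens the work distributions and reduces their overlap, which is the regime where the abstract reports the largest advantage for BAR.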
28. Solvation free energies of amino acid side chain analogs for common molecular mechanics water models.
Michael R. Shirts & Vijay S. Pande.
Journal of Chemical Physics (2005)
SUMMARY: This paper is a test of our methods for free energy calculation -- critical to our computational drug design methodology. We achieve a higher level of accuracy and precision than before. Moreover, our recent research in computational efficiency of free energy methods allows us to
perform simulations on a local cluster that previously required large scale distributed computing,
performing four times as much computational work in approximately a tenth of the computer time as a similar study a year ago.
TECHNICAL ABSTRACT: Quantitative free energy computation involves both using a model that is sufficiently faithful to the experimental system under study (accuracy) and establishing statistically meaningful measures of the uncertainties resulting from finite sampling (precision). In order to examine the accuracy of a range of common water models used for protein simulation for their solute/solvent properties, we calculate the free energy of hydration of 15 amino acid side chain analogs derived from the
OPLS-AA parameter set with the TIP3P, TIP4P, SPC, SPC/E, TIP3P-MOD, and TIP4P-Ew water models. We achieve a high degree of statistical precision in our simulations, obtaining uncertainties for the free energy of hydration of 0.02–0.06 kcal/mol, equivalent to that obtained in experimental hydration free energy measurements of the same molecules. We find that TIP3P-MOD, a model
designed to give improved free energy of hydration for methane, gives uniformly the closest match
to experiment; we also find that the ability to accurately model pure water properties does not necessarily predict ability to predict solute/solvent behavior. We also evaluate the free energies of a
number of novel modifications of TIP3P designed as a proof of concept that it is possible to obtain
much better solute/solvent free energetic behavior without substantially negatively affecting pure
water properties. We decrease the average error to zero while reducing the rms error below that of
any of the published water models, with measured liquid water properties remaining almost constant
with respect to our perturbations. This demonstrates there is still both room for improvement within
current fixed-charge biomolecular force fields and significant parameter flexibility to make these
improvements. Recent research in computational efficiency of free energy methods allows us to
perform simulations on a local cluster that previously required large scale distributed computing,
performing four times as much computational work in approximately a tenth of the computer time as a similar study a year ago.
27. Foldamer dynamics expressed via Markov state models. I. Explicit solvent molecular-dynamics simulations in acetonitrile, chloroform, methanol, and water.
Sidney Elmer, Sanghyun Park, & Vijay S. Pande.
Journal of Chemical Physics (2005)
SUMMARY: Here, we lay out some of the first applications of a new method for future FAH calculations. This new method, Markovian State Models (MSM), allows FAH to solve some important limitations of previous methods. Since these limitations are most relevant for larger and more complex systems than what has been done in FAH so far, this does not affect the work in the past. However, it lays the foundation for FAH to tackle even more complex and challenging problems.
TECHNICAL ABSTRACT: In this article, we analyze the folding dynamics of an all-atom model of a polyphenylacetylene (pPA) 12-mer in explicit solvent for four common organic and aqueous solvents: acetonitrile, chloroform, methanol, and water. The solvent quality has a dramatic effect on the time scales on which pPA 12-mers fold. Acetonitrile was found to manifest ideal folding conditions as suggested by optimal folding times on the order of ~100–200 ns, depending on temperature. In contrast,
chloroform and water were observed to hinder the folding of the pPA 12-mer due to extreme solvation conditions relative to acetonitrile; chloroform denatures the oligomer, whereas water promotes aggregation and traps. The pPA 12-mer in a pure methanol solution folded in ~400 ns at 300 K, compared to the experimental 12-mer folding time of ~160 ns measured in a 1:1 v/v THF/methanol solution. To draw these conclusions, analysis techniques based on Markov state models are applied to multiple short independent trajectories to extrapolate the long-time scale dynamics of the 12-mer in each respective solvent. We review the theory of
Markov chains and derive a method to impose detailed balance on a transition probability matrix computed from simulation data.
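As an illustration of what imposing detailed balance on a transition matrix can look like, the sketch below counts transitions in short discrete trajectories, symmetrizes the count matrix, and row-normalizes it. Symmetrizing counts is one simple, commonly used approach and is an assumption here; the paper derives its own method, which may differ. The function names and the toy trajectories are hypothetical, and every state is assumed to be visited at least once.

```python
def count_matrix(trajs, n_states, lag=1):
    """Count transitions i -> j observed at the given lag time (in frames)."""
    counts = [[0.0] * n_states for _ in range(n_states)]
    for traj in trajs:
        for a, b in zip(traj[:-lag], traj[lag:]):
            counts[a][b] += 1.0
    return counts

def reversible_transition_matrix(counts):
    """Symmetrize counts, then row-normalize; returns (T, stationary pi)."""
    n = len(counts)
    sym = [[0.5 * (counts[i][j] + counts[j][i]) for j in range(n)]
           for i in range(n)]
    row_sums = [sum(row) for row in sym]
    total = sum(row_sums)
    t = [[sym[i][j] / row_sums[i] for j in range(n)] for i in range(n)]
    pi = [r / total for r in row_sums]
    return t, pi

# Two short, independent toy trajectories over three states (assumed data).
toy_trajs = [[0, 0, 1, 2, 2, 1, 0], [2, 1, 1, 0, 0, 1, 2]]
t_rev, pi_rev = reversible_transition_matrix(count_matrix(toy_trajs, 3))
```

By construction, `pi_rev[i] * t_rev[i][j]` equals `pi_rev[j] * t_rev[j][i]` for every pair of states, which is the detailed-balance condition.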
26. Foldamer dynamics expressed via Markov state models. II. State space decomposition
Sidney Elmer, Sanghyun Park, & Vijay S. Pande.
Journal of Chemical Physics (2005)
SUMMARY: Here, we lay out some new methodology for simulation for future FAH calculations. This new method, Markovian State Models (MSM), allows FAH to solve some important limitations of previous methods. Since these limitations are most relevant for larger and more complex systems than what has been done in FAH so far, this does not affect the work in the past. However, it lays the foundation for FAH to tackle even more complex and challenging problems.
TECHNICAL ABSTRACT: The structural landscape of poly-phenylacetylene (pPA), otherwise known as m-phenylene ethynylene oligomers, has been shown to consist of a very diverse set of conformations, including helices, turns, and knots. Defining a state space decomposition to classify these conformations into easily identifiable states is an important step in understanding the dynamics in relation to Markov state models. We define the state decomposition of pPA oligomers in terms of the sequence of
discretized dihedral angles between adjacent phenyl rings along the oligomer backbone. Furthermore, we derive in mathematical detail an approach to further reduce the number of states by grouping symmetrically equivalent states into a single parent state. A more challenging problem requires a formal definition for knotted states in the structural landscape. Assuming that the oligomer chain can only cross the ideal helix path once, we propose a technique to define a knotted state derived from a helical state determined by the position along the helical nucleus where the chain crosses the ideal helix path. Several examples of helical states and knotted states from the pPA 12-mer illustrate the principles outlined in this article.
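The idea of labeling a conformation by its sequence of discretized dihedral angles, and then merging symmetry-equivalent labels into one parent state, can be sketched as follows. This is only an illustration: the three equal-width bins and the end-to-end reversal symmetry are assumptions chosen for the example, not the decomposition or symmetry operations defined in the paper.

```python
def discretize(dihedrals_deg, n_bins=3):
    """Map each dihedral angle (degrees) to a bin index; return a state tuple."""
    width = 360.0 / n_bins
    # The trailing modulo guards against an angle wrapping to exactly 360.0.
    return tuple(int((angle % 360.0) // width) % n_bins
                 for angle in dihedrals_deg)

def parent_state(state):
    """Canonical label shared by a state and its end-to-end reversal."""
    return min(state, state[::-1])

# Hypothetical dihedral sequences for a short oligomer, read from either end.
forward = discretize([10.0, 130.0, -110.0])
backward = discretize([-110.0, 130.0, 10.0])
print(forward, backward, parent_state(forward) == parent_state(backward))
```

Grouping by a canonical representative keeps the bookkeeping simple: any two microstates with the same `parent_state` label are counted as one state when building the model.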
25. Unusual compactness of a polyproline type II structure.
Bojan Zagrovic, Jan Lipfert, Eric J. Sorin, Ian S. Millett, Wilfred F. van Gunsteren,
Sebastian Doniach & Vijay S. Pande.
Proceedings of the National Academy of Sciences (2005)
SUMMARY:
This study probes the structural character of a small peptide using experiment and
simulation. It highlights the differences between global and local structural information,
suggesting a new model for PPII conformational character, which is thought to be dominant in the unfolded state of
proteins.
TECHNICAL ABSTRACT:
Polyproline type II (PPII) helix has emerged recently as the dominant
paradigm for describing the conformation of unfolded
polypeptides. However, most experimental observables used to
characterize unfolded proteins typically provide only short-range,
sequence-local structural information that is both time- and ensemble-
averaged, giving limited detail about the long-range structure
of the chain. Here, we report a study of a long-range property:
the radius of gyration of an alanine-based peptide, Ace-(diaminobutyric
acid)2-(Ala)7-(ornithine)2-NH2. This molecule has
previously been studied as a model for the unfolded state of proteins
under folding conditions and is believed to adopt a PPII fold based
on short-range techniques such as NMR and CD. By using synchrotron
radiation and small-angle x-ray scattering, we have determined
the radius of gyration of this peptide to be 7.4 (±0.5) Å, which
is significantly less than the value expected from an ideal PPII helix
in solution (13.1 Å). To further study this contradiction, we have
used molecular dynamics simulations using six variants of the
AMBER force field and the GROMOS 53A6 force field. However, in
all cases, the simulated ensembles underestimate the PPII content
while overestimating the experimental radius of gyration. The
conformational model that we propose, based on our small angle
x-ray scattering results and what is known about this molecule
from before, is that of a very flexible, fluctuating structure that on
the level of individual residues explores a wide basin around the
ideal PPII geometry but is never, or only rarely, in the ideal
extended PPII helical conformation.
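For readers unfamiliar with the quantity compared above, the radius of gyration of a set of sites can be computed as below. This is a geometric sketch that assumes equal-mass point sites; the value measured by small-angle x-ray scattering is weighted by scattering contrast, so the two are not strictly identical. The function names and the rod example are illustrative assumptions.

```python
import math

def radius_of_gyration(coords):
    """Root-mean-square distance of equal-mass sites from their centroid."""
    n = len(coords)
    center = [sum(c[k] for c in coords) / n for k in range(3)]
    msd = sum(sum((c[k] - center[k]) ** 2 for k in range(3))
              for c in coords) / n
    return math.sqrt(msd)

def rod_rg(n_sites, spacing):
    """Radius of gyration of n equally spaced sites on a straight line."""
    rod = [(i * spacing, 0.0, 0.0) for i in range(n_sites)]
    return radius_of_gyration(rod)

# A fully extended chain has a larger Rg than the same sites folded back on
# themselves, which is why Rg reports on long-range structure.
print(rod_rg(11, 3.0))
```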
24. How well can simulation predict protein folding kinetics and thermodynamics?
Christopher D. Snow, Eric J. Sorin, Young Min Rhee, and Vijay S. Pande.
Annual Review of Biophysics & Biomolecular Structure (2005)
SUMMARY:
Rather than reporting new data from the Folding@Home project, this review article offers an in-depth
look at the current state-of-the-art in simulation-based prediction. This includes work by our group
and others in the field, including many computational models and methods of extracting information
that can be directly compared to experiment.
TECHNICAL ABSTRACT:
Simulation of protein folding has come a long way in five years. Notably,
new quantitative comparisons with experiments for small, rapidly folding proteins
have become possible. As the only way to validate simulation methodology, this
achievement marks a significant advance. Here, we detail these recent achievements
and ask whether simulations have indeed rendered quantitative predictions in several
areas, including protein folding kinetics, thermodynamics, and physics-based methods
for structure prediction. We conclude by looking to the future of such comparisons
between simulations and experiments.
23. Empirical Force-Field Assessment: The Interplay Between Backbone Torsions and Noncovalent Term Scaling
Eric J. Sorin and Vijay S. Pande. Journal of Computational Chemistry (2005)
SUMMARY:
How do the results of peptide simulations change with slight variations to the models employed?
Here we answer this question with respect to very local changes in the energetics of the polymer,
demonstrating the sensitivity of simulated bulk (i.e. ensemble averaged) structural equilibrium to the
parameters of the model.
TECHNICAL ABSTRACT:
The kinetic and thermodynamic aspects of the helix-coil transition in polyalanine-based peptides have
been studied at the ensemble level using a distributed computing network. This study builds on a previous
report, which critically assessed the performance of several contemporary force fields in reproducing
experimental measurements and elucidated the complex nature of helix-coil systems. Here we consider the effects of modifying
backbone torsions and the scaling of noncovalent interactions. Although these elements determine the potential of mean
force between atoms separated by three covalent bonds (and thus largely determine the local conformational distributions
observed in simulation), we demonstrate that the interplay between these factors is both complex and
force field dependent. We quantitatively assess the heliophilicity of several helix-stabilizing potentials as well as the
changes in heliophilicity resulting from such modifications, which can "make or break" the accuracy
of a given force field, and our findings suggest that future force field development may need to better consider effects that
vary with peptide length. This report also serves as an example of the utility of distributed computing in analyzing and improving upon
contemporary force fields at the level of absolute ensemble equilibrium, the next step in force field
development.
22. Exploring the Helix-Coil Transition via All-atom Equilibrium Ensemble Simulations
Eric J. Sorin and Vijay S. Pande. Biophysical Journal (2005)
SUMMARY: How
good are our models for folding? This question is important to address
in order to understand the usefulness of our work, as well as the
work of everyone in the atomistic simulation field in general. Here,
we've done extremely extensive tests of models used in folding to
show their strengths and weaknesses. Based on their weaknesses, we
have proposed a new model which appears to have a much stronger agreement
with experiment.
TECHNICAL ABSTRACT: The
ensemble folding of two 21-residue α-helical peptides
has been studied using all-atom simulations under several variants of the AMBER
potential in explicit solvent using a global distributed computing network. Our extensive
sampling, orders of magnitude greater than the experimental folding time, results
in complete convergence to ensemble equilibrium. This allows for a quantitative
assessment of these potentials, including a new variant of the AMBER-99 force
field, denoted AMBER-99f, which shows improved agreement with experimental
kinetic and thermodynamic measurements. From bulk analysis of the simulated
AMBER-99f equilibrium, we find that the folding landscape is pseudo-two-state,
with complexity arising from the broad, shallow character of the 'native' and
'unfolded' regions of the phase space. Each of these macrostates allows for
configurational diffusion among a diverse ensemble of conformational microstates
with greatly varying helical content and molecular size. Indeed, the observed
structural dynamics are better represented as a conformational diffusion than
as a simple exponential process, and equilibrium transition rates spanning
several orders of magnitude are reported. After multiple nucleation steps,
on average, helix formation proceeds via a kinetic "alignment" phase
in which two or more short, low-entropy helical segments form a more ideal,
single-helix structure.
21. Does Water Play a Structural Role in the Folding of Small Nucleic Acids?
Eric J. Sorin, Young Min Rhee, and Vijay S. Pande. Biophysical Journal (2005)
SUMMARY: While
previous studies on the folding of nucleic acid hairpins have employed
simplified models of either the nucleic acid or the solvent,
this paper reports the first such study using an explicit treatment
of the surrounding water and counterions. We show that accounting
for water molecules in this manner is necessary to most accurately
characterize the energetics of hairpin folding, whereas monovalent
ions appear to play only a background role.
TECHNICAL ABSTRACT: Nucleic
acid structure and dynamics are known to be closely coupled to local environmental
conditions and, in particular, to the ionic character of the solvent. Here
we consider what role the discrete properties of water and ions play in the
collapse and folding of small nucleic acids. We study the folding of an experimentally
well-characterized RNA hairpin-loop motif (sequence 5'-GGGC[GCAA]GCCU-3') via
ensemble molecular dynamics simulation and, with nearly 500 µs of aggregate
simulation time using an explicit representation of the ionic solvent, report
successful ensemble folding simulations, with a predicted folding time of 8.8(±2.0) µs,
in agreement with experimental measurements of ~10 µs. Comparing our
results to previous folding simulations using the GB/SA continuum solvent model
shows that accounting for water-mediated interactions is necessary to accurately
characterize the free energy surface and stochastic nature of folding. The
formation of secondary structure appears to be more rapid than the fastest
ionic degrees of freedom, and counterions do not participate discretely in
observed folding events. We find that hydrophobic collapse follows a predominantly
expulsive mechanism in which a diffusion-search of early structural compaction
is followed by final formation of native structure that occurs in tandem with
solvent evacuation.
20. Dimerization of the p53 Oligomerization Domain: Identification of a Folding Nucleus by Molecular Dynamics Simulations
Lillian T. Chong, Christopher D. Snow, Young Min Rhee, and Vijay S. Pande. Journal of Molecular Biology (2005)
SUMMARY: Roughly half of all known cancers result from mutations in p53. Our first work
in the cancer area examines the tetramerization domain of p53.
We predict how p53 folds and, in doing so, which amino acid mutations
would be relevant. Our predictions appear to agree with experiment and give
a new interpretation of existing data.
TECHNICAL ABSTRACT: Dimerization
of the p53 oligomerization domain involves coupled folding and binding
of monomers. To examine the dimerization, we have
performed molecular dynamics (MD) simulations of dimer folding from
the rate-limiting transition state ensemble (TSE). Among 799 putative
transition state structures that were selected from a large ensemble
of high-temperature unfolding trajectories, 129 were identified as
members of the TSE via calculation of a 50% transmission coefficient from at least
20 room-temperature simulations. This study is the first to examine the
refolding of a protein dimer using MD simulations in explicit water,
revealing a folding nucleus for dimerization. Our atomistic simulations
are consistent with experiment and offer insight that was previously
unobtainable.
19. Using path sampling to build better Markovian state models: Predicting the folding rate and mechanism of a tryptophan zipper beta hairpin
Nina Singhal, Christopher D. Snow, and Vijay S. Pande. Journal of Chemical Physics (2004)
SUMMARY: How
can Folding@Home use thousands to millions of CPUs to efficiently
simulate long timescale biomolecular dynamics? This paper outlines
the "Markovian State Model" method which is the foundation of
how most new Folding@Home calculations are performed. The MSM method
allows for a very efficient use of uncoupled simulations, as one
would easily get from distributed computing.
TECHNICAL ABSTRACT: We
propose an efficient method for the prediction of protein folding
rate constants and mechanisms. We use molecular dynamics simulation
data to build Markovian state
models (MSMs), discrete representations of the pathways sampled. Using
these MSMs, we can quickly calculate the folding probability (Pfold)
and mean first passage time of all the sampled points. In addition,
we provide techniques for evaluating these values under perturbed
conditions without expensive recomputations. To demonstrate this
method on a challenging system, we apply these techniques to a two-dimensional
model energy landscape and the folding of a tryptophan
zipper beta hairpin.
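Once a Markovian state model is available, the folding probability (Pfold) of each state follows from a small linear problem: Pfold is 0 in the unfolded states, 1 in the folded states, and elsewhere equals the transition-probability-weighted average of Pfold over the states reachable in one step. The sketch below solves this by simple fixed-point iteration. It is an illustration under stated assumptions, not the authors' implementation; the function name `pfold` and the three-state matrix are hypothetical.

```python
def pfold(t, unfolded, folded, tol=1e-12, max_iter=100000):
    """Committor: probability of reaching `folded` before `unfolded`."""
    n = len(t)
    p = [1.0 if i in folded else 0.0 for i in range(n)]
    interior = [i for i in range(n) if i not in folded and i not in unfolded]
    for _ in range(max_iter):
        change = 0.0
        for i in interior:
            new = sum(t[i][j] * p[j] for j in range(n))
            change = max(change, abs(new - p[i]))
            p[i] = new  # update in place (Gauss-Seidel style)
        if change < tol:
            break
    return p

# Toy model: state 0 is unfolded, state 2 is folded, state 1 is intermediate.
toy_t = [[0.90, 0.10, 0.00],
         [0.25, 0.25, 0.50],
         [0.00, 0.10, 0.90]]
toy_pfold = pfold(toy_t, unfolded={0}, folded={2})
print(toy_pfold)
```

For the intermediate state the equation reads p = 0.25 p + 0.5, so its Pfold is 2/3. Evaluating perturbed conditions then amounts to changing entries of the matrix and re-solving, with no new simulation.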
18. Simulations of the role of water in the protein-folding mechanism
Young Min Rhee, Eric J. Sorin, Guha Jayachandran, Erik Lindahl, & Vijay
S Pande. Proceedings of the National Academy of Sciences (2004)
ABSTRACT:
There are many unresolved questions regarding the role of water in protein folding. Does water merely
induce hydrophobic forces, or does the discrete nature of water play a structural role in folding? Are the
nonadditive aspects of water important in determining the folding mechanism? To help to address these
questions, we have performed simulations of the folding of a model protein (BBA5) in explicit solvent.
Starting 10,000 independent trajectories from a fully unfolded conformation, we have observed numerous
folding events, making this work a comprehensive study of the kinetics of protein folding starting from the
unfolded state and reaching the folded state and with an explicit solvation model and experimentally
validated rates. Indeed, both the raw TIP3P folding rate (4.5 ± 2.5 µs) and the
diffusion-constant corrected rate (7.5 ± 4.2 µs) are in strong agreement with the experimentally observed
rate of 7.5 ± 3.5 µs. To address the role of water in folding, the
mechanism is compared with that predicted from implicit solvation
simulations. An examination of solvent density near hydrophobic
groups during folding suggests that in the case of BBA5, there are
water-induced effects not captured by implicit solvation models,
including signs of a ''concurrent mechanism'' of core collapse and
desolvation.
17. Trp zipper folding kinetics by molecular dynamics and temperature-jump
spectroscopy
Christopher D. Snow, Linlin Qiu, Deguo Du, Feng Gai, Stephen J. Hagen, & Vijay S Pande. Proceedings
of the National Academy of Sciences (2004)
ABSTRACT:
We studied the microsecond folding dynamics of three hairpins
(Trp zippers 1-3, TZ1-TZ3) by using temperature-jump fluorescence
and atomistic molecular dynamics in implicit solvent. In addition,
we studied TZ2 by using time-resolved IR spectroscopy. By using
distributed computing, we obtained an aggregate simulation time
of 22 ms. The simulations included 150, 212, and 48 folding events
at room temperature for TZ1, TZ2, and TZ3, respectively. The
all-atom optimized potentials for liquid simulations (OPLSaa) potential
set predicted TZ1 and TZ2 properties well; the estimated
folding rates agreed with the experimentally determined folding
rates and native conformations were the global potential-energy
minimum. The simulations also predicted reasonable unfolding
activation enthalpies. This work, directly comparing large simulated
folding ensembles with multiple spectroscopic probes, revealed
both the surprising predictive ability of current models as
well as their shortcomings. Specifically, for TZ1-TZ3, OPLS for
united atom models had a nonnative free-energy minimum, and
the folding rate for OPLSaa TZ3 was sensitive to the initial conformation.
Finally, we characterized the transition state; all TZs fold
by means of similar, native-like transition-state conformations.
16. Does Native State Topology Determine the RNA Folding Mechanism?
Eric J. Sorin, Bradley J. Nakatani, Young Min Rhee, Guha Jayachandran, V Vishal, & Vijay S Pande. Journal of Molecular Biology (2004)
ABSTRACT:
Recent studies in protein folding suggest that native state topology plays a
dominant role in determining the folding mechanism, yet an analogous
statement has not been made for RNA, most likely due to the strong
coupling between the ionic environment and conformational energetics
that make RNA folding more complex than protein folding. Applying a
distributed computing architecture to sample nearly 5000 complete tRNA
folding events using a minimalist, atomistic model, we have characterized
the role of native topology in tRNA folding dynamics: the simulated bulk
folding behavior predicts well the experimentally observed folding
mechanism. In contrast, single-molecule folding events display multiple
discrete folding transitions and compose a largely diverse, heterogeneous
dynamic ensemble. This both supports an emerging view of heterogeneous
folding dynamics at the microscopic level and highlights the
need for single-molecule experiments and both single-molecule and bulk
simulations in interpreting bulk experimental measurements.
15. Structural correspondence between the alpha-helix and the random-flight chain resolves how unfolded proteins can have native-like properties.
Bojan Zagrovic & Vijay S Pande. Nature Structural Biology (2003)
ABSTRACT: Recently,
we have proposed that, on average, the structure of the unfolded state
of small, mostly alpha-helical proteins may be similar to the native
structure (the 'mean-structure' hypothesis). After examining thousands
of simulations of both the folded and the unfolded states of five polypeptides
in atomistic detail at room temperature, we report here a result that
seems at odds with the mean-structure hypothesis. Specifically, the
average inter-residue distances in the collapsed unfolded structures
agree well with the statistics of the ideal random-flight chain with
link length of 3.8 Å (the length of one amino acid). A possible
resolution of this apparent contradiction is offered by the observation
that the inter-residue distances in a typical alpha-helix over short
stretches are close to the average distances in an ideal random-flight
chain.
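The correspondence described above can be checked with a few lines of arithmetic. For an ideal random-flight chain, the root-mean-square distance between residues separated by n links of length l is l·sqrt(n). For an ideal helix, the distance between residues i and i+n follows from the rise and rotation per residue. In the sketch below the link length of 3.8 Å comes from the abstract, while the helix parameters (1.5 Å rise, 100° rotation per residue, 2.3 Å Cα radius) are commonly quoted textbook values and are assumptions here, not numbers taken from the paper.

```python
import math

LINK = 3.8  # random-flight link length in Å, as quoted in the abstract

def random_flight_rms(n_links, link=LINK):
    """RMS end-to-end distance of an ideal random-flight chain."""
    return link * math.sqrt(n_links)

def helix_distance(n, rise=1.5, radius=2.3, turn_deg=100.0):
    """Distance between residues i and i+n on an ideal helix (assumed geometry)."""
    chord = 2.0 * radius * abs(math.sin(math.radians(0.5 * n * turn_deg)))
    return math.hypot(chord, rise * n)

# Compare the two models over short sequence separations.
for n in range(1, 6):
    print(n, round(random_flight_rms(n), 2), round(helix_distance(n), 2))
```

Under these assumed parameters the two distances are closest at the shortest separations and diverge as n grows, in line with the abstract's restriction to short stretches.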
14. Equilibrium Free Energies from Nonequilibrium Measurements Using Maximum-Likelihood Methods.
Michael R. Shirts, Eric Bair, Giles Hooker, and Vijay S Pande. Physical Review Letters (2003)
ABSTRACT: We
present a maximum likelihood argument for the Bennett acceptance
ratio method, and derive a simple formula for the variance of free
energy estimates generated using this method. This derivation of
the acceptance ratio method, using a form of logistic regression,
a common statistical technique, allows us to shed additional light
on the underlying physical and statistical properties of the method.
For example, we demonstrate that the acceptance ratio method yields
the lowest variance for any estimator of the free energy which is
unbiased in the limit of large numbers of measurements.
13. Extremely precise free energy calculations of amino acid side chain analogs: Comparison of common molecular mechanics force fields for proteins.
Michael R. Shirts, Jed W. Pitera, William C. Swope, and Vijay S. Pande. Journal of Chemical Physics (2003)
ABSTRACT: Quantitative
free energy computation involves both using a model that is sufficiently
faithful to the experimental system under study (accuracy) and establishing
statistically meaningful measures of the uncertainties resulting
from finite sampling (precision). We use large-scale distributed
computing to access sufficient computational resources to extensively
sample molecular systems and thus reduce statistical uncertainty
of measured free energies. In order to examine the accuracy of a
range of common models used for protein simulation, we calculate
the free energy of hydration of 15 amino acid side chain analogs
derived from recent versions of the OPLS-AA, CHARMM, and AMBER parameter
sets in TIP3P water using thermodynamic integration. We achieve a
high degree of statistical precision in our simulations, obtaining
uncertainties for the free energy of hydration of 0.02–0.05
kcal/mol, which are in general an order of magnitude smaller than
those found in other studies. Notably, this level of precision is
comparable to that obtained in experimental hydration free energy
measurements of the same molecules. Root mean square differences
from experiment over the set of molecules examined using AMBER-,
CHARMM-, and OPLS-AA-derived parameters were 1.35 kcal/mol, 1.31
kcal/mol, and 0.85 kcal/mol, respectively. Under the simulation conditions
used, these force fields tend to uniformly underestimate solubility
of all the side chain analogs. The relative free energies of hydration
between amino acid side chain analogs were closer to experiment but
still exhibited significant deviations. Although extensive computational
resources may be needed for large numbers of molecules, sufficient
computational resources to calculate precise free energy calculations
for small molecules are accessible to most researchers.
12. Solvent Viscosity Dependence of the Folding Rate of a Small Protein: Distributed Computing Study.
Bojan Zagrovic and Vijay S. Pande. Journal of Computational Chemistry (2003)
ABSTRACT: By
using distributed computing techniques and a supercluster of more
than 20,000 processors we simulated folding of a 20-residue Trp Cage
miniprotein in atomistic detail with implicit GB/SA solvent at a
variety of solvent viscosities (γ). This
allowed us to analyze the dependence of folding rates on viscosity.
In particular, we focused on the low-viscosity regime (values below the viscosity of water).
In accordance with Kramers' theory, we observe approximately linear
dependence of the folding rate on 1/γ for values from 1 to 10^(-1) × that
of water viscosity. However, for the regime between 10^(-4) and 10^(-1) × that
of water viscosity we observe power-law dependence of the form
k ~ γ^(-1/5). These results suggest that
estimating folding rates from molecular simulations run at low viscosity
under the assumption of linear dependence
of rate on inverse viscosity may lead to erroneous results.
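A short calculation shows how large the error from assuming linear scaling can be. The sketch below encodes the two regimes reported in the abstract as a piecewise function of viscosity relative to water, joined continuously at 10^(-1). The sharp crossover and the continuity condition are simplifying assumptions made for illustration, and the function name `relative_rate` is hypothetical.

```python
CROSSOVER = 0.1  # relative viscosity where the regimes meet (assumed sharp)

def relative_rate(gamma):
    """Folding rate relative to its value at water viscosity (gamma = 1).

    Valid for 1e-4 <= gamma <= 1, with gamma in units of water viscosity.
    """
    if gamma >= CROSSOVER:
        return 1.0 / gamma  # Kramers-like regime: k ~ 1/gamma
    # Power-law regime k ~ gamma^(-1/5), matched at the crossover.
    return (1.0 / CROSSOVER) * (gamma / CROSSOVER) ** (-0.2)

gamma_sim = 0.01                      # simulation run at 1% of water viscosity
linear_speedup = 1.0 / gamma_sim      # speedup assumed under pure 1/gamma scaling
actual_speedup = relative_rate(gamma_sim)
print(linear_speedup / actual_speedup)
```

In this toy model, a rate measured at 1% of water viscosity and rescaled linearly would underestimate the rate at water viscosity by a factor of about six.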
11. Insights Into Nucleic Acid Conformational Dynamics from Massively Parallel Stochastic Simulations.
Eric J. Sorin, Young Min Rhee, Bradley J. Nakatani & Vijay S. Pande. Biophysical Journal (2003)
ABSTRACT: The
helical hairpin is one of the most ubiquitous and elementary secondary
structural motifs in nucleic acids,
capable of serving functional roles and participating in long-range
tertiary contacts. Yet the self-assembly of these structures
has not been well-characterized at the atomic level. With this in mind,
the dynamics of nucleic acid hairpin formation and
disruption have been studied using a novel computational tool: large-scale,
parallel, atomistic molecular dynamics simulation
employing an inhomogeneous distributed computer consisting of more
than 40,000 processors. Using multiple methodologies,
over 500 µs of atomistic simulation time has been collected for a large
ensemble of hairpins (sequence 5'-GGGC[GCAA]GCCU-3'), allowing characterization of rare events not previously
observable in simulation. From uncoupled ensemble dynamics simulations
in unperturbed folding conditions, we report on 1), competing pathways
between the folded and unfolded regions of the conformational space;
2), observed non-native stacking and basepairing traps; and 3), a
helix unwinding-rewinding
mode that is differentiated from the unfolding and folding dynamics.
A heterogeneous
transition state ensemble is characterized structurally through
calculations of conformer-specific folding probabilities and a multiplexed
replica exchange stochastic dynamics algorithm is used to derive
an approximate folding landscape. A comparison between the observed
folding mechanism and that of a peptide β-hairpin analog suggests that
although native topology defines the character of the folding
landscape, the statistical weighting of potential folding pathways
is determined by the chemical nature of the polymer.
10. Multiplexed-Replica Exchange Molecular Dynamics Method for Protein Folding Simulation.
Young Min Rhee & Vijay S. Pande. Biophysical Journal (2003)
ABSTRACT: Simulating
protein folding thermodynamics starting purely from a protein
sequence is a grand challenge of
computational biology. Here, we present an algorithm to calculate
a canonical distribution from molecular dynamics simulation of
protein folding. This algorithm is based on the replica exchange
method
where the kinetic trapping problem is overcome by
exchanging noninteracting replicas simulated at different temperatures.
Our algorithm uses multiplexed-replicas with a number
of independent molecular dynamics runs at each temperature. Exchanges
of configurations between these multiplexed-replicas
are also tried, rendering the algorithm applicable to large-scale
distributed computing (i.e., highly heterogeneous parallel
computers with processors having different computational power).
We demonstrate the enhanced sampling of this algorithm by
simulating the folding thermodynamics of a 23 amino acid miniprotein.
We show that better convergence is achieved compared
to constant temperature molecular dynamics simulation, with an
efficient scaling to a large number of computer processors.
Indeed, this enhanced sampling results in (to our knowledge) the
first example of a replica exchange algorithm that samples
a folded structure starting from a completely unfolded state.
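The exchange step at the heart of replica exchange can be sketched with the standard Metropolis criterion: two replicas at temperatures T_i and T_j with potential energies E_i and E_j swap configurations with probability min(1, exp[(β_i − β_j)(E_i − E_j)]), where β = 1/(k_B T). The code below shows only this pairwise test. How the multiplexed variant chooses which replicas to pair is not reproduced here, and the function name, units (kcal/mol), and example energies are assumptions.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def exchange_probability(e_i, e_j, t_i, t_j, k_b=K_B):
    """Metropolis acceptance probability for swapping two replicas."""
    beta_i = 1.0 / (k_b * t_i)
    beta_j = 1.0 / (k_b * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)

# The colder replica (300 K) holds the higher energy: the swap is always accepted.
print(exchange_probability(-90.0, -100.0, 300.0, 350.0))
# The colder replica already holds the lower energy: the swap is accepted rarely.
print(exchange_probability(-100.0, -90.0, 300.0, 350.0))
```

Because each test needs only two energies and two temperatures, the replicas can otherwise run independently, which is what makes the approach a natural fit for heterogeneous distributed computing.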
9. The Trp Cage: Folding Kinetics and Unfolded State Topology via Molecular Dynamics Simulations.
Christopher D. Snow, Bojan Zagrovic, and Vijay S. Pande. Journal of the American Chemical Society (2002)
ABSTRACT: A
number of rapidly folding proteins have been characterized
in recent years. These small proteins can provide the first direct
comparisons between simulated and experimental protein folding
kinetics and pathways. Proteins have been characterized through
thermodynamic sampling methods, unfolding simulations, and
folding simulations using simple potentials. Here, as described
recently, we use several thousand stochastic dynamics simulations
in a generalized-Born implicit solvent (in atomic detail) to simulate
the folding dynamics of the Trp cage mini-protein under experimental
conditions (27 °C with full solvent viscosity, γ = 91 ps⁻¹).
The Folding@Home distributed computing project was used to
generate an aggregate simulation time of ~100 µs (~250 CPU
years). First we capture the rapid relaxation from an extended
starting condition to a relaxed unfolded state ensemble of thousands
of conformations. With continued simulation, a small fraction of
these simulations reach the folded state. Furthermore, the topology
of the collapsed unfolded state closely resembles the native state.
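When only a small fraction of many short trajectories reaches the folded state, one common way to turn that fraction into a rate is to assume two-state, single-exponential kinetics. The sketch below shows that arithmetic. The two-state assumption, the function names, and the example numbers are all illustrative and are not taken from the abstract above.

```python
import math


def rate_from_fraction(n_folded, n_total, traj_length_ns):
    """Rate estimate (per ns) that inverts f = 1 - exp(-k * t).

    Assumes two-state, single-exponential kinetics and that every
    trajectory has the same length traj_length_ns.
    """
    if n_total <= 0 or traj_length_ns <= 0:
        raise ValueError("n_total and traj_length_ns must be positive")
    if not 0 <= n_folded < n_total:
        raise ValueError("need 0 <= n_folded < n_total")
    fraction = n_folded / n_total
    return -math.log(1.0 - fraction) / traj_length_ns


def rate_short_time(n_folded, n_total, traj_length_ns):
    """Short-time approximation k ~ f / t, valid when k * t << 1."""
    if n_total <= 0 or traj_length_ns <= 0:
        raise ValueError("n_total and traj_length_ns must be positive")
    return n_folded / (n_total * traj_length_ns)


if __name__ == "__main__":
    # Hypothetical numbers: 10 of 5000 trajectories of 20 ns each fold.
    k = rate_from_fraction(10, 5000, 20.0)
    print("rate per ns:", k)
    print("mean folding time in microseconds:", 1.0 / k / 1000.0)
```

The short-time form makes the scaling explicit: the estimate depends only on the number of folding events and the aggregate simulated time, which is why many short runs can stand in for one long one under these assumptions.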
8. Absolute comparison of simulated and experimental protein-folding dynamics.
Christopher D. Snow, Houbi Nguyen, Vijay S. Pande, and Martin Gruebele. Nature (2002)
ABSTRACT: Protein folding is difficult
to simulate with classical molecular dynamics. Secondary structure
motifs such as α-helices and β-hairpins
can form in 0.1–10 µs (ref. 1), whereas small proteins
have been shown to fold completely in tens of microseconds. The
longest folding simulation to date is a single 1-µs simulation
of the villin headpiece; however, such single runs may miss many
features of the folding process as it is a heterogeneous reaction
involving an ensemble of transition states. Here, we have used
a distributed computing implementation to produce tens of thousands
of 5–20-ns trajectories (700 µs) to simulate mutants
of the designed mini-protein BBA5. The fast relaxation dynamics
these predict were compared with the results of laser temperature-jump
experiments. Our computational predictions are in excellent agreement
with the experimentally determined mean folding times and equilibrium
constants. The rapid folding of BBA5 is due to the swift formation
of secondary structure. The convergence of experimentally and computationally
accessible timescales will allow the comparison of absolute quantities
characterizing in vitro and in silico (computed) protein folding.
7. Native-like Mean Structure in the Unfolded Ensemble of Small Proteins.
Bojan Zagrovic, Christopher D. Snow, Siraj Khaliq, Michael R. Shirts, and Vijay S. Pande. Journal of Molecular Biology (2002)
ABSTRACT: The nature of the unfolded state plays a great role in our understanding
of proteins. However, accurately studying the unfolded state with
computer simulation is difficult, due to its complexity and the
great deal of sampling required. Using a supercluster of over 10,000
processors we have performed close to 800 µs of molecular dynamics
simulation in atomistic detail of the folded and unfolded states
of three polypeptides from a range of structural classes: the all-alpha
villin headpiece molecule, the beta hairpin tryptophan zipper,
and a designed alpha-beta zinc finger mimic. A comparison between
the folded and the unfolded ensembles reveals that, even though
virtually none of the individual members of
the unfolded ensemble exhibits native-like features, the mean unfolded structure
(averaged over the entire unfolded ensemble) has a native-like geometry. This
suggests several novel implications for protein folding and structure prediction
as well as new interpretations for experiments which find structure in ensemble-averaged
measurements.
6. Simulation of Folding of a Small Alpha-helical Protein in Atomistic Detail using Worldwide-distributed Computing.
Bojan Zagrovic, Christopher D. Snow, Michael R. Shirts, and Vijay S. Pande. Journal of Molecular Biology (2002)
ABSTRACT: By employing thousands of PCs and new worldwide-distributed computing
techniques, we have simulated in atomistic detail the folding of
a fast-folding 36-residue α-helical protein from the villin headpiece. The total
simulated time exceeds 300 µs, orders of magnitude more than previous
simulations of a molecule of this size. Starting from an extended
state, we obtained an ensemble of folded structures, which is on average
1.7 Å and 1.9 Å away from the native state in Cα distance-based root-mean-square
deviation (dRMS) and Cβ dRMS sense, respectively. The folding
mechanism of villin is most consistent with the hydrophobic collapse
view of folding: the molecule collapses non-specifically very quickly
(20 ns), which greatly reduces the size of the conformational space
that needs to be explored in search of the native state. The conformational
search in the collapsed state appears to be rate-limited by the
formation of the aromatic core: in a significant fraction of our simulations,
the C-terminal phenylalanine residue packs improperly with the rest
of the hydrophobic core. We suggest that the breaking of this interaction
may be the rate-determining step in the course of folding. On the basis
of our simulations we estimate the folding rate of villin to be approximately
5 µs. By analyzing the average features of the folded ensemble
obtained by simulation, we see that the mean folded structure is more similar
to the native fold than any individual folded structure. This finding
highlights the need for simulating ensembles of molecules and averaging the
results in an experiment-like fashion if meaningful comparison
between simulation and experiment is to be attempted. Moreover, our results
demonstrate that (1) the computational methodology exists to simulate
the multi-microsecond regime using distributed computing and (2) that
potential sets used to describe interatomic interactions may be
sufficiently accurate to reach the folded state, at least for small proteins.
We conclude with a comparison between our results and current protein-folding
theory.
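For readers unfamiliar with distance-based RMSD (dRMS), the sketch below computes it under a common definition: the root-mean-square difference between corresponding interatomic distances in two structures, taken over all atom pairs. The exact normalization used in the paper is not stated in the abstract, so this form, the function name, and the toy coordinates are assumptions for illustration.

```python
import math
from itertools import combinations


def drms(coords_a, coords_b):
    """Distance-based RMSD between two structures.

    coords_a and coords_b are equal-length sequences of (x, y, z)
    tuples, e.g. C-alpha positions in angstroms. No superposition is
    needed because only internal distances are compared.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of atoms")
    n = len(coords_a)
    if n < 2:
        raise ValueError("need at least two atoms")
    total = 0.0
    n_pairs = 0
    for i, j in combinations(range(n), 2):
        d_a = math.dist(coords_a[i], coords_a[j])
        d_b = math.dist(coords_b[i], coords_b[j])
        total += (d_a - d_b) ** 2
        n_pairs += 1
    return math.sqrt(total / n_pairs)


if __name__ == "__main__":
    # Toy three-atom "structures" on a line (hypothetical coordinates).
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    print(drms(reference, model))
```

Because dRMS depends only on internal distances, it is unchanged by rigid translation or rotation of either structure.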
5. Folding@Home and Genome@Home: Using distributed computing to tackle previously intractable problems in computational biology.
Stefan M. Larson, Christopher D. Snow, Michael R. Shirts, and Vijay S. Pande. To appear in Computational Genomics, Richard Grant, editor, Horizon Press (2002)
ABSTRACT: For decades, researchers have been applying computer simulation to
address problems in biology. However, many of these "grand challenges"
in computational biology, such as simulating how proteins fold,
remained unsolved due to their great complexity. Indeed, even
to simulate the fastest folding protein would require decades
on the fastest modern CPUs. Here, we review novel methods to
fundamentally speed such previously intractable problems using
a new computational paradigm: distributed computing. By efficiently
harnessing tens of thousands of computers throughout the world,
we have been able to break previous computational barriers. However,
distributed computing brings new challenges, such as how to efficiently
divide a complex calculation across many PCs that are connected by
relatively slow networking. Moreover, even if the challenge of
accurately reproducing reality can be conquered, a new challenge
emerges: how can we take the results of these simulations (typically
tens to hundreds of gigabytes of raw data) and gain some insight
into the questions at hand? This challenge of the analysis of
the sea of data resulting from large-scale simulation will likely
remain for decades to come.
4. Atomistic protein folding simulations on the submillisecond timescale using worldwide distributed computing.
Vijay Pande, et al. Peter Kollman Memorial Issue, Biopolymers (2002)
ABSTRACT: Atomistic simulations of protein folding have the potential to be a great
complement to experimental studies, but have been severely limited
by the time scales accessible with current computer hardware and
algorithms. By employing a worldwide distributed computing network
of tens of thousands of PCs and algorithms designed to efficiently
utilize this new many-processor, highly heterogeneous, loosely
coupled distributed computing paradigm, we have been able to simulate
hundreds of microseconds of atomistic molecular dynamics. This
has allowed us to directly simulate the folding mechanism and to
accurately predict the folding rate of several fast-folding proteins
and polymers, including a nonbiological helix, polypeptide α-helices,
a β-hairpin, and a three-helix bundle protein from the villin headpiece.
Our results demonstrate that one can reach the time scales needed
to simulate fast folding using distributed computing, and that
potential sets used to describe interatomic interactions are sufficiently
accurate to reach the folded state with experimentally
validated rates, at least for small proteins.
3. β-Hairpin Folding Simulations in Atomistic Detail Using an Implicit Solvent Model
Bojan Zagrovic, Eric J. Sorin, and Vijay Pande. Journal of Molecular Biology (2001)
ABSTRACT: We have used distributed computing techniques and a supercluster
of thousands of computer processors to study folding of the C-terminal
β-hairpin from protein G in atomistic detail using the GB/SA
implicit solvent model at 300 K. We have simulated a total of nearly
38 µs of folding time and obtained eight complete and independent
folding trajectories. Starting from an extended state, we observe
relaxation to an unfolded state characterized by non-specific,
temporary hydrogen bonding. This is followed by the appearance
of interactions between hydrophobic residues that stabilize a
bent intermediate. Final formation of the complete hydrophobic
core occurs cooperatively at the same time that the final hydrogen
bonding pattern appears. The folded hairpin structures we observe
all contain a closely packed hydrophobic core and proper β-sheet backbone
dihedral angles, but they differ in backbone hydrogen bonding
pattern. We show that this is consistent with the existing experimental
data on the hairpin alone in solution. Our analysis also reveals
short-lived semi-helical intermediates which define a thermodynamic
trap. Our results are consistent with a three-state mechanism
with a single rate-limiting step in which a varying final hydrogen
bond pattern is apparent, and semi-helical off-pathway intermediates
may appear early in the folding process. We include details
of the ensemble dynamics methodology and a discussion of our achievements
using this new computational device for studying dynamics at the atomic level.
2. Mathematical Foundations of ensemble dynamics.
Michael R. Shirts and Vijay Pande. Physical Review Letters (2001)
ABSTRACT: A set of parallel replicas of a single simulation can be statistically
coupled to closely approximate long trajectories. In many cases,
this produces nearly linear speedup over a single simulation (M times
faster with M simulations), bringing previously intractable
problems within reach of large computer clusters. Interestingly,
by varying the coupling of the parallel simulations, it is possible
in some systems to obtain greater than linear speedup. The methods
are generalizable to any search algorithm with long residence times
in intermediate states.
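The linear speedup has a simple statistical origin when the escape from a state is a single-exponential (Poisson) process: the first of M independent replicas to escape does so, on average, M times sooner than a single run. The sketch below checks this numerically. It covers only one escape event under that exponential assumption, not the full coupling scheme of the paper, and the function name and parameter values are hypothetical.

```python
import random


def mean_first_escape(rate, n_replicas, n_trials, seed=0):
    """Average time until the first of n_replicas escapes.

    Each replica's escape time is drawn from an exponential
    distribution with the given rate (an assumed kinetic model).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        total += min(rng.expovariate(rate) for _ in range(n_replicas))
    return total / n_trials


if __name__ == "__main__":
    single = mean_first_escape(rate=1.0, n_replicas=1, n_trials=20000)
    many = mean_first_escape(rate=1.0, n_replicas=10, n_trials=20000)
    # The ratio should be close to the number of replicas (here 10).
    print("observed speedup:", single / many)
```

If the escape times are not exponentially distributed, the ratio need not equal M, which is consistent with the abstract's remark that other speedups are possible in some systems.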
1. Screen savers of the world, Unite!
Michael R. Shirts and Vijay Pande. Science (2000)
SUMMARY: Is distributed computing a fundamental advance or simply fashionable
computing? In this brief letter, we show how distributed computing
can be used to tackle problems which make even supercomputers quake.
Indeed, we show how distributed computing has the ability to create
a supercomputer thousands of times more powerful than any existing
machine, due to the large number of processors on the internet (hundreds
of millions) and the relatively small number of computer processors
in supercomputers (thousands).