Historically, unfolded proteins were seen as a random distribution
over a large number of structural possibilities. A piece of string
can be folded in many different ways, and the same is true of a
polypeptide chain. Each amino acid contributes two freely rotating
bonds to the backbone of the polypeptide chain, so even
a small protein (100 amino acid residues) has a very large number
of configurations it can adopt when unfolded (10^100,
a number much larger than the number of grains of sand on a typical
beach). It seemed almost paradoxical that, given such a large number
of available states, proteins still manage to fold quickly enough
to carry out their biological function. But are we really
certain that unfolded states are astronomically complex? Our results
suggest there may be a surprising simplicity to this seemingly
heterogeneous mess.
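The size of that number can be checked with a back-of-the-envelope count. The sketch below assumes, purely for illustration, that each rotating bond can sit in a fixed number of discrete positions. The parameter `states_per_bond` is hypothetical and does not come from the text. With two bonds per residue, the count is then states_per_bond raised to the power (2 x residues).

```python
import math

def log10_conformations(residues, states_per_bond, bonds_per_residue=2):
    """Return log10 of the number of backbone conformations.

    Assumes every bond independently takes one of `states_per_bond`
    discrete positions (an illustrative simplification, not a claim
    about real protein chemistry).
    """
    return residues * bonds_per_residue * math.log10(states_per_bond)

# 100 residues with an assumed 3 positions per bond.
exponent_three_states = log10_conformations(100, 3)

# Positions per bond that would give the 10^100 quoted in the text.
states_for_1e100 = 10 ** (100 / (100 * 2))

print(f"3 states per bond: about 10^{exponent_three_states:.0f}")
print(f"states per bond for 10^100: {states_for_1e100:.2f}")
```

Under this simple counting, a little over three positions per bond is already enough to reach 10^100 for a 100-residue chain.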
Selected portion of simulation of HIV INTEGRASE unfolding by Pande
Group (simulation is time-reversed for illustration purposes)
Because the unfolded state of proteins was believed
to be very complex, and because biological function is dominated
by the native state, the unfolded state received significantly
less attention. Recent studies of denatured proteins
suggest that the denatured state may not be as diverse as previously
thought. The unfolded state refers to the heterogeneous state of
proteins before they fold into the native state, and it is different
from the denatured state. For example, cooked eggs contain irreversibly
denatured proteins. However, most denatured proteins can be renatured
when the denaturing chemical agent is diluted or removed altogether.
Computer simulation of the unfolded state seemed formidable
because of the intense computational power needed. However, using more
than 10,000 computer processors through folding@home,
we have run thousands of fully independent simulations of three
small proteins, each simulation tens of nanoseconds (billionths
of a second) long. One advantage of running such a large number
of independent, relatively short folding simulations is that we
can expect a small number of folding events to take place, which
we can then study. Initial random motions do play a role,
but under the right conditions proteins ultimately have little
choice but to fold into their native state, within a fraction of
a second, to carry out their biological function. Running a large
number of parallel simulations is our way of handling the initial
random motions: a small number of proteins fold within tens of
nanoseconds, even though it takes many microseconds (millionths of
a second) for a complete sample of proteins to fold.
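The reasoning behind "many short runs" can be made concrete with a small estimate. The sketch below assumes single-exponential folding kinetics, which is a simplifying assumption not stated in the text. The numbers used (10,000 runs, 20 ns per run, a mean folding time of 5 microseconds) are illustrative placeholders, not values from the simulations.

```python
import math

def expected_folding_events(n_simulations, sim_length_ns, folding_time_ns):
    """Expected number of trajectories that fold within sim_length_ns.

    Assumes single-exponential folding kinetics, so the chance that one
    trajectory has folded by time t is 1 - exp(-t / folding_time).
    """
    p_fold = 1.0 - math.exp(-sim_length_ns / folding_time_ns)
    return n_simulations * p_fold

# Illustrative numbers only: 10,000 runs of 20 ns each and an assumed
# mean folding time of 5 microseconds (5,000 ns).
events = expected_folding_events(10_000, 20.0, 5_000.0)
print(f"expected folding events: {events:.1f}")
```

Each run is far shorter than the typical folding time, yet with enough independent runs a handful of folding events is still expected.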
Having more than 10,000 independent simulations
also gives us a detailed picture of
the unfolded protein very early in folding (tens of nanoseconds
after the initiation of folding). The illustration below shows
some of the stages the proteins go through in the simulation.
Initially the proteins are extended like a piece of spaghetti.
However, they quickly collapse to a compact unfolded state before
reaching the final folded state.
WHAT HAVE WE LEARNED ABOUT THE UNFOLDED STATE OF
THREE SMALL PROTEINS?
Folding simulations for three proteins, villin,
TrpZip and BBA5, were started from extended conformations.
TrpZip and BBA5 collapse from extended into compact conformations
in about 10 ns, and villin in about 20 ns.
Individual members of the unfolded protein ensembles
are very diverse; however, we found that if we look at the average
structure there are some surprising similarities to the folded
structure. First let us discuss how we determine the average
structure. We measure the distances between one selected carbon atom
(the alpha carbon) of each amino acid in the protein. In the
illustration below, a protein with 7 amino acids, the gray circles
represent alpha carbons and the blue lines represent the distances
in Angstroms (1 Angstrom = 1 Å = 10^-10 meters; one meter is roughly
one yard).
Using the above illustration we would organize the
data in a 7 X 7 table (Matrix).
        1       2       3       4       5       6       7
1       0       1.2 Å   5.1 Å   7.6 Å   6.1 Å   5.4 Å   4.2 Å
2       1.2 Å   0
3       5.1 Å           0
4       7.6 Å                   0
5       6.1 Å                           0
6       5.4 Å                                   0
7       4.2 Å                                           0
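A minimal sketch of how such a table could be built from atomic coordinates. The function name `distance_matrix` and the toy coordinates are made up for illustration and do not come from any real protein.

```python
import math

def distance_matrix(coords):
    """Return the N x N table of pairwise distances between points.

    `coords` is a list of (x, y, z) alpha-carbon positions in Angstroms.
    """
    n = len(coords)
    table = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            table[i][j] = d
            table[j][i] = d  # the table is symmetric
    return table

# Made-up coordinates for a 3-residue chain (not from any real protein).
toy_coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
toy_table = distance_matrix(toy_coords)
for row in toy_table:
    print("  ".join(f"{d:5.1f}" for d in row))
```

The diagonal is always zero (each atom is at zero distance from itself), and the table is symmetric, so only the entries above the diagonal carry independent information.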
We would organize a similar table (matrix) for the native protein
in the folded state. Then we would compare the structure of
each protein in the unfolded state with the native folded one.
The comparison uses a mathematical formula. For each pair of atoms
(i and j represent corresponding atoms in the two structures) we
take the difference between the table entries for the unfolded and
the folded protein and square it. We add up the squared differences
over all pairs, multiply by 2, take the square root, and divide by
the number of atoms in the protein. The result of this calculation
is called the distance root-mean-square deviation, or dRMS.
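The recipe above can be sketched in a few lines. This is one reading of the verbal description, taking the squared differences to be summed over every pair i < j; the exact normalization is therefore an assumption. The two small tables are made-up numbers used only to exercise the function.

```python
import math

def drms(table_a, table_b):
    """Distance root-mean-square deviation between two distance tables.

    Follows the verbal recipe: sum the squared differences over each
    pair i < j, multiply by 2, take the square root, divide by N.
    """
    n = len(table_a)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += (table_a[i][j] - table_b[i][j]) ** 2
    return math.sqrt(2.0 * total) / n

# Two made-up 3 x 3 distance tables (Angstroms), for illustration only.
unfolded = [[0.0, 4.0, 9.0], [4.0, 0.0, 5.0], [9.0, 5.0, 0.0]]
folded = [[0.0, 4.0, 6.0], [4.0, 0.0, 5.0], [6.0, 5.0, 0.0]]
print(f"dRMS = {drms(unfolded, folded):.2f} Angstroms")
```

Because dRMS compares internal distances rather than coordinates, the two structures do not need to be rotated or superimposed before the comparison.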
In the case of the villin protein the table (matrix)
would really be a 36 x 36 table, since villin has 36 amino acids,
not just 7 as in our illustration above. If we graph the number
of structures vs. dRMS we get a broad distribution (red
bars), which indicates that the unfolded state is a very diverse
group.
However, the red bars in the above graph come from comparing each
individual unfolded protein with the folded protein. When protein
structures are determined experimentally (with x-ray diffraction or
nuclear magnetic resonance), what the experimenter really determines
is the average structure of a large number of proteins. We wanted
to take a similar approach in analyzing the data from
our simulations, so we averaged these distances over thousands
of protein samples in the simulation. In the example below, the
average distance (indicated by the red line between alpha carbons)
will be: (7 Å + 6 Å + 8 Å) / 3 = 7 Å.
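The same averaging, applied entry by entry to whole distance tables, might look like the sketch below. The helper name `average_tables` is hypothetical, and the three tiny tables simply reuse the 7, 6 and 8 Angstrom distances from the worked example.

```python
def average_tables(tables):
    """Entry-by-entry average of a list of equally sized distance tables."""
    n = len(tables[0])
    count = len(tables)
    return [
        [sum(t[i][j] for t in tables) / count for j in range(n)]
        for i in range(n)
    ]

# Three made-up 2 x 2 tables whose off-diagonal distances are 7, 6 and
# 8 Angstroms, matching the worked example in the text.
samples = [
    [[0.0, 7.0], [7.0, 0.0]],
    [[0.0, 6.0], [6.0, 0.0]],
    [[0.0, 8.0], [8.0, 0.0]],
]
mean_table = average_tables(samples)
print(mean_table[0][1])  # 7.0
```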
From all the averages we get a single table:
36 x 36 for the villin protein (villin has 36 amino acids, and so 36
alpha carbons). We compare the average structure of the unfolded
state with the folded state using the dRMS formula. The result of
this calculation is shown as an arrow in the above graph. The
surprising result is that the average structure of the unfolded state
is quite close to the folded state (dRMS = 2.4 Å). In comparison,
when we compare individual unfolded proteins with the folded one,
the dRMS fluctuates widely.
Our findings lead us to form what we call the "mean-structure
hypothesis", which means the geometry of the collapsed unfolded
state of small peptides and proteins in an average sense corresponds
to the geometry of the native folded state.
We suggest that during the folding process the structure of the
protein, in an average sense, essentially does not change. The
average structure stays in place while folding reduces the large
structural variability described in the beginning.
Individual proteins may take a variety of paths to folding,
but averaged over all proteins in the sample, structurally
things do not change that much.
Experimentally, structural analysis is also done as an
average over a large number of proteins. This approach will
allow us to better validate our results experimentally.
If the mean-structure hypothesis is correct it could help
us refine simulations. Using the distance constraints one could
find the closest individual member of the unfolded ensemble
to the average structure based on the same unfolded ensemble.
If the hypothesis is correct, this structure should be closer
to the native structure than most other individual unfolded
structures.
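The selection step described above could be sketched as follows. The helper names (`drms`, `average_tables`, `closest_to_mean`) are hypothetical, the dRMS normalization follows the verbal recipe given earlier and is an assumption, and the three-member ensemble is made-up data for illustration.

```python
import math

def drms(table_a, table_b):
    """dRMS between two distance tables (sum over pairs i < j)."""
    n = len(table_a)
    total = sum(
        (table_a[i][j] - table_b[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )
    return math.sqrt(2.0 * total) / n

def average_tables(tables):
    """Entry-by-entry average of equally sized distance tables."""
    n = len(tables[0])
    return [
        [sum(t[i][j] for t in tables) / len(tables) for j in range(n)]
        for i in range(n)
    ]

def closest_to_mean(tables):
    """Index of the ensemble member whose table is nearest the mean table."""
    mean_table = average_tables(tables)
    scores = [drms(t, mean_table) for t in tables]
    return min(range(len(tables)), key=scores.__getitem__)

# Made-up ensemble of three 2 x 2 distance tables (Angstroms).
ensemble = [
    [[0.0, 4.0], [4.0, 0.0]],
    [[0.0, 7.0], [7.0, 0.0]],
    [[0.0, 10.0], [10.0, 0.0]],
]
best = closest_to_mean(ensemble)
print(f"member closest to the mean structure: index {best}")
```

If the hypothesis holds, the member picked this way would be a natural candidate to compare against the native structure.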