An Introduction to Proteins
Part 4



Example: The huntingtin protein

Now that you have the basic idea of how a gene becomes a protein, we are going to follow the Huntington gene on its journey to becoming the huntingtin protein. Our story begins in the nucleus, where we zoom in on a particular section of chromosome 4. This section is where the DNA sequence known as the Huntington gene is located. The gene is actually quite long (about 180 kb, or 180,000 bases), so it would take a lot of space to write out. Just to give you an idea, here are the first 600 bases:

    1 TTGCTGTGTG AGGCAGAACC TGCGGGGGCA GGGGCGGGCT GGTTCCCTGG CCAGCCATTG
   61 GCAGAGTCCG CAGGCTAGGG CTGTCAATCA TGCTGGCCGG CGTGGCCCCG CCTCCGCCGG
  121 CGCGGCCCCG CCTCCGCCGG CGCACGTCTG GGACGCAAGG CGCCGTGGGG GCTGCCGGGA
  181 CGGGTCCAAG ATGGACGGCC GCTCAGGTTC TGCTTTTACC TGCGGCCCAG AGCCCCATTC
  241 ATTGCCCCGG TGCTGAGCGG CGCCGCGAGT CGGCCCGAGG CCTCCGGGGA CTGCCGTGCC
  301 GGGCGGGAGA CCGCCATGGC GACCCTGGAA AAGCTGATGA AGGCCTTCGA GTCCCTCAAG
  361 TCCTTCCAGC AGCAGCAGCA GCAGCAGCAG CAGCAGCAGC AGCAGCAGCA GCAGCAGCAG
  421 CAGCAGCAGC AACAGCCGCC ACCGCCGCCG CCGCCGCCGC CGCCTCCTCA GCTTCCTCAG
  481 CCGCCGCCGC AGGCACAGCC GCTGCTGCCT CAGCCGCAGC CGCCCCCGCC GCCGCCCCCG
  541 CCGCCACCCG GCCCGGCTGT GGCTGAGGAG CCGCTGCACC GACCAAAGAA AGAACTTTCA

Remember that there are only four letters in the DNA alphabet: A, C, G, and T. The different combinations of these four letters make up the instructions for everything that occurs in your body. In our case, the instructions in the Huntington gene are to create the huntingtin protein. Take a close look at the sequence starting at base 367 (in the row labeled 361) and running into the row labeled 421. Do you notice anything in particular about this section? The bases C-A-G are repeated a number of times in a row. Everyone has a CAG repeat in the Huntington gene, but the number of repeats varies from person to person. In this case, there are only 21 repeats, which falls within the normal range, so we would refer to this as a non-HD allele. Remember that everyone has two copies (alleles) of the Huntington gene. Having just one copy with too many CAG repeats is enough to cause HD.
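If you would rather let a computer do the counting, here is a minimal Python sketch that scans the 600 bases listed above for the longest uninterrupted run of CAG. The names `LISTING`, `SEQUENCE`, and `longest_repeat` are our own illustrative choices; the row numbers from the listing are left out, and the spaces are stripped before searching.

```python
# The 600 bases listed above, copied row by row (row numbers omitted).
LISTING = """
TTGCTGTGTG AGGCAGAACC TGCGGGGGCA GGGGCGGGCT GGTTCCCTGG CCAGCCATTG
GCAGAGTCCG CAGGCTAGGG CTGTCAATCA TGCTGGCCGG CGTGGCCCCG CCTCCGCCGG
CGCGGCCCCG CCTCCGCCGG CGCACGTCTG GGACGCAAGG CGCCGTGGGG GCTGCCGGGA
CGGGTCCAAG ATGGACGGCC GCTCAGGTTC TGCTTTTACC TGCGGCCCAG AGCCCCATTC
ATTGCCCCGG TGCTGAGCGG CGCCGCGAGT CGGCCCGAGG CCTCCGGGGA CTGCCGTGCC
GGGCGGGAGA CCGCCATGGC GACCCTGGAA AAGCTGATGA AGGCCTTCGA GTCCCTCAAG
TCCTTCCAGC AGCAGCAGCA GCAGCAGCAG CAGCAGCAGC AGCAGCAGCA GCAGCAGCAG
CAGCAGCAGC AACAGCCGCC ACCGCCGCCG CCGCCGCCGC CGCCTCCTCA GCTTCCTCAG
CCGCCGCCGC AGGCACAGCC GCTGCTGCCT CAGCCGCAGC CGCCCCCGCC GCCGCCCCCG
CCGCCACCCG GCCCGGCTGT GGCTGAGGAG CCGCTGCACC GACCAAAGAA AGAACTTTCA
"""

# Keep only the DNA letters, dropping spaces and line breaks.
SEQUENCE = "".join(base for base in LISTING if base in "ACGT")


def longest_repeat(seq: str, unit: str = "CAG") -> tuple[int, int]:
    """Return (number of repeats, 1-based start position) of the longest
    back-to-back run of `unit` in `seq`."""
    best_count, best_start = 0, 0
    for i in range(len(seq)):
        count = 0
        # Count how many copies of `unit` follow one another from index i.
        while seq.startswith(unit, i + count * len(unit)):
            count += 1
        if count > best_count:
            best_count, best_start = count, i + 1  # 1-based, like the listing
    return best_count, best_start


count, start = longest_repeat(SEQUENCE)
print(f"{count} CAG repeats starting at base {start}")
```

Running it reports 21 CAG repeats starting at base 367; the uninterrupted run ends at base 429, where the next three bases are CAA.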

Let’s get back to our journey. When the signal arrives indicating that more huntingtin protein is needed, the process of transcription begins. The DNA section containing the Huntington gene begins to uncoil, and the two complementary strands are separated, exposing the bases. Now the RNA nucleotides begin to fly in, matching up with their base pairs on the DNA. Let’s take a look at how this would work, using the DNA sequence for the first 30 bases:

    DNA:    TTGCTGTGTG AGGCAGAACC TGCGGGGGCA...

Now we must match each DNA base with the complementary RNA base. There is one detail to keep in mind: the sequence shown above is the coding strand of the gene, but the enzyme that builds mRNA (RNA polymerase) actually reads the other strand, called the template strand. Because the two DNA strands are complementary (A pairs with T, and G pairs with C), we can write the template strand directly underneath:

    Coding DNA:     TTGCTGTGTG AGGCAGAACC TGCGGGGGCA...
    Template DNA:   AACGACACAC TCCGTCTTGG ACGCCCCCGT...

Now we match each template base with its complementary RNA base. Recall that T matches with A, G matches with C, and C matches with G. What does A match with? Be careful: RNA does not use T; it uses U instead! The correctly matched bases would be ordered as follows:

    Template DNA:   AACGACACAC TCCGTCTTGG ACGCCCCCGT...
    mRNA:           UUGCUGUGUG AGGCAGAACC UGCGGGGGCA...

Notice that the mRNA ends up with the same sequence as the coding strand, except that every T is replaced by U. This process must be carried out for the entire gene; we have only matched the first 30 bases. Once every base has been matched with its complement, the mRNA transcript is complete. Before it exits the nucleus, it undergoes some processing, such as the removal of some noncoding sections.

We are finished with transcription, so we exit the nucleus and enter the main part of the cell: the cytoplasm. The mRNA transcript must now meet up with a ribosome to get translation going. The two parts of the ribosome clamp down on the beginning of the mRNA transcript, and we are just about ready to start. What do we need to translate from the language of RNA to the language of protein? Our interpreters, tRNA! Different tRNAs fly by, reading the mRNA bases in groups of three; remember that these groups of three are called codons. Each codon translates to one amino acid. Each tRNA carries one particular amino acid on one end and a three-base anticodon on the other. If a tRNA finds a codon on the mRNA that is complementary to its anticodon, we’re in business: the tRNA binds to the ribosome, bringing its amino acid with it. Let’s see how this would go for the first 30 bases of the mRNA that we transcribed. Here is the beginning of the mRNA transcript:

    mRNA:   UUGCUGUGUG AGGCAGAACC UGCGGGGGCA...

Since the tRNAs read the bases as codons, let’s separate the transcript into groups of three bases:

    mRNA:   UUG CUG UGU GAG GCA GAA CCU GCG GGG GCA...

Now each mRNA codon must be matched up with a tRNA that has a complementary anticodon. Remembering that A and U pair with each other and G and C pair with each other, we can write the correct order of tRNA anticodons:

    mRNA:   UUG CUG UGU GAG GCA GAA CCU GCG GGG GCA...
    tRNA:   AAC GAC ACA CUC CGU CUU GGA CGC CCC CGU...

Each codon translates to one amino acid. The tRNA delivers the correct amino acid because each tRNA carries only one kind of amino acid, and its anticodon pairs only with a matching codon. When we humans want to translate, we look up each mRNA codon (not the tRNA anticodon) in a genetic code chart, also called a codon table. The resulting string of amino acids is as follows:

    mRNA:      UUG CUG UGU GAG GCA GAA CCU GCG GGG GCA...
    Protein:   Leu Leu Cys Glu Ala Glu Pro Ala Gly Ala...
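The transcription and translation steps can be sketched in a few lines of Python. This is a minimal illustration, not a bioinformatics tool, and it rests on a few assumptions of our own: the 30 bases are treated as the coding strand (the strand that gene listings normally show), each anticodon is written base-by-base under its codon just as in the worked rows, and the standard genetic code is stored as a compact 64-character string. All function and variable names are hypothetical.

```python
# Standard genetic code as a 64-character string: codons are ordered with
# bases U, C, A, G, first base varying slowest; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val", "*": "Stop",
}

DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}  # pairing between DNA strands
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}  # template base -> RNA base
RNA_TO_RNA = {"A": "U", "U": "A", "G": "C", "C": "G"}  # codon base -> anticodon base


def template_of(coding: str) -> str:
    """Template strand: the base-by-base complement of the coding strand."""
    return "".join(DNA_TO_DNA[b] for b in coding)


def transcribe(template: str) -> str:
    """Build mRNA by pairing an RNA base with each template base."""
    return "".join(DNA_TO_RNA[b] for b in template)


def split_codons(mrna: str) -> list[str]:
    """Split into complete three-base codons (leftover bases are dropped)."""
    return [mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3)]


def anticodon(codon: str) -> str:
    """Anticodon written base-by-base under its codon, as in the rows above."""
    return "".join(RNA_TO_RNA[b] for b in codon)


def translate(mrna: str) -> str:
    """Look up each mRNA codon (not the anticodon) in the codon table."""
    return "".join(CODON_TABLE[c] for c in split_codons(mrna))


coding = "TTGCTGTGTGAGGCAGAACCTGCGGGGGCA"  # first 30 bases of the listing
template = template_of(coding)
mrna = transcribe(template)
codons = split_codons(mrna)
anticodons = [anticodon(c) for c in codons]
protein = translate(mrna)

print("mRNA:   ", " ".join(codons))
print("tRNA:   ", " ".join(anticodons))
print("Protein:", " ".join(THREE_LETTER[aa] for aa in protein), f"({protein})")
```

The last line prints the amino acids in both three-letter and one-letter form. The same lookup shows why a run of CAG codons becomes a run of glutamines: `translate("CAGCAGCAG")` returns `"QQQ"`.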

The amino acids are written here using their three-letter abbreviations. Each amino acid also has a one-letter abbreviation; written that way, our example reads LLCEAEPAGA. (To keep the example simple, we started translating at the very first base shown. In the cell, translation begins at a start codon, AUG; for huntingtin this is the ATG at bases 316 to 318 of the sequence above, so the real protein begins Met-Ala-Thr-Leu-Glu-Lys, or MATLEK.) When we get to the section with the CAG repeats, we will find that the codon CAG translates to the amino acid glutamine (abbreviated as Gln or Q). The many repeated CAG codons translate into a string of glutamines in the resulting protein, which is why HD is also known as a polyglutamine or polyQ disease (“poly” means many).

Fig P-19: Huntingtin Primary Structure

The mRNA transcript continues moving through the ribosome as each new tRNA, carrying its amino acid, binds to the matching codon. A peptide bond forms between the growing amino acid chain and each newly arrived amino acid. This continues until the ribosome reaches a stop codon, which signals the end of the instructions, and the complete polypeptide is released. We are now left with the full sequence of amino acids, or the primary structure.

Fig P-20: HEAT Repeat Sequences

While the three-dimensional structure has not been completely determined, we do know what at least part of the protein looks like. The huntingtin protein is made up mostly of HEAT motifs. These HEAT motifs have a characteristic pattern and often appear repeated together, as they do about ten times in the huntingtin protein. Fig P-20 shows an example of what a section of the huntingtin protein containing these HEAT repeat sequences might look like.

Do you notice any familiar secondary structure motifs? The HEAT motifs of the huntingtin protein contain many alpha helices. The tertiary structure of this section describes how these alpha helices are arranged relative to one another in space. As you can see, this section has two polypeptide strings; the quaternary structure of just this section would include both polypeptides and their relationship to one another. Keep in mind, though, that each huntingtin molecule is itself a single, very long polypeptide chain of more than 3,000 amino acids. Most of its overall shape is therefore tertiary structure; quaternary structure comes into play when huntingtin associates with other polypeptide chains.

It is important to note that this is an example of a “normal” huntingtin protein, one that came from a DNA sequence without enough CAG repeats to cause HD. If the DNA sequence had 40 or more CAG repeats, then the mRNA transcript would also carry 40 or more CAG codons in a row. Each CAG codon would be read by a tRNA with the anticodon GUC, which carries the amino acid glutamine. So, the resulting protein would have 40 or more glutamines in a row (depending on the actual number of repeats in the DNA). This change in the sequence affects how the protein folds. Since the way a protein folds determines its final shape, and the shape of the protein determines its function, this mutant huntingtin protein will not function properly, and the person will develop Huntington’s disease.

We can now begin to understand how the number of CAG repeats determines whether or not a person will develop HD. We can think of 35 repeats as a threshold: up to this number of repeats, the protein can still fold well enough to get its job done. More than 35 repeats, and certainly 40 or more, change the way the protein folds so that it cannot function normally. When a protein’s ability to fold is changed, there are two possibilities for what happens to its function. The protein can simply stop working altogether (this is called “loss of function”), or it can acquire a new kind of function (this is called “gain of function”). A new function could in principle be helpful, but it is more likely to be harmful. A harmful new function is often called “toxic gain of function.” Current research suggests that the protein produced from the HD allele has a toxic gain of function, and this new function contributes to the development of HD.

To summarize, 35 or fewer CAG repeats in the Huntington gene allow the resulting huntingtin protein to fold normally. The normal shape allows normal function, and a person whose two copies of the gene both produce normally folded huntingtin will not develop HD. A Huntington gene with 40 or more CAG repeats will produce an improperly folded huntingtin protein. The abnormal shape resulting from too many glutamines will prevent the huntingtin protein from functioning normally. The change of function in the protein made from just one abnormal copy of the gene is enough to lead to the development of HD.
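The cutoffs in this summary can be written as a small decision rule. The Python sketch below is illustrative only and is not a diagnostic tool; the function names and labels are hypothetical. It uses just the two cutoffs stated in this article (35 or fewer, 40 or more) and labels counts from 36 to 39, which the summary does not classify, as uncertain.

```python
def classify_allele(cag_repeats: int) -> str:
    """Classify one copy of the gene by its CAG repeat count, using the
    article's cutoffs: 35 or fewer is normal, 40 or more causes HD."""
    if cag_repeats < 0:
        raise ValueError("repeat count cannot be negative")
    if cag_repeats <= 35:
        return "normal"
    if cag_repeats >= 40:
        return "HD"
    return "uncertain"  # 36 to 39: not covered by the two stated cutoffs


def predict_outcome(allele_1: int, allele_2: int) -> str:
    """Everyone has two copies; one HD copy is enough to cause the disease."""
    labels = {classify_allele(allele_1), classify_allele(allele_2)}
    if "HD" in labels:
        return "HD expected"
    if labels == {"normal"}:
        return "HD not expected"
    return "uncertain"


print(predict_outcome(21, 18))  # both copies normal -> "HD not expected"
print(predict_outcome(21, 42))  # one copy at 40 or more -> "HD expected"
```

Checking for an HD copy first mirrors the point made above: a single abnormal copy decides the outcome, no matter how normal the other copy is.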

We hope you enjoyed this section of the HOPES website.

-K. Taub, 1-29-06



    Last Modified: 02/12/2006



    An educational product of HOPES, not to be used in place of medical care.