Protein Folding

Energy minimization of a peptide structure.

In principle, protein folding can be understood through interatomic interactions. If you analyze the potential energy of a molecule as a function of the structure of the protein, the conformations which are most often observed are those which have the lowest energies. The protein packs itself in a manner which maximizes favorable interactions and minimizes unfavorable interactions, just as a collection of small magnets will tend to link up with north and south poles aligned. Computer simulation methods are able to mimic this process (albeit with a number of simplifying assumptions), resulting in simplified models of the packing process such as the one illustrated below.
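The idea of searching for low-energy conformations can be sketched in a few lines of Python. This is a toy illustration only, not the simulation method behind the figure: the bead-and-spring chain, the Lennard-Jones-style pair term, the constants `BOND_LENGTH` and `BOND_STIFFNESS`, and the function names are all assumptions chosen for brevity. Real force fields and minimizers are far more elaborate.

```python
import math
import random

BOND_LENGTH = 1.0       # assumed rest length between neighbouring beads
BOND_STIFFNESS = 100.0  # assumed spring constant for the chain bonds

def pair_energy(r):
    """Toy non-bonded term: strongly repulsive up close, weakly attractive further out."""
    inv6 = (1.0 / r) ** 6
    return 4.0 * (inv6 * inv6 - inv6)

def total_energy(coords):
    """Potential energy of a chain of beads as a function of its conformation."""
    energy = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(coords[i], coords[j])
            if j == i + 1:
                # Neighbouring beads are joined by a spring (the chain backbone).
                energy += 0.5 * BOND_STIFFNESS * (r - BOND_LENGTH) ** 2
            else:
                # All other pairs interact through the toy pair term.
                energy += pair_energy(r)
    return energy

def minimize(coords, steps=2000, step_size=0.05, seed=0):
    """Randomly nudge one bead at a time, keeping only moves that lower the energy."""
    rng = random.Random(seed)
    coords = [list(p) for p in coords]
    energy = total_energy(coords)
    for _ in range(steps):
        i = rng.randrange(len(coords))
        old = coords[i][:]
        coords[i] = [x + rng.uniform(-step_size, step_size) for x in old]
        trial = total_energy(coords)
        if trial < energy:
            energy = trial      # accept the downhill move
        else:
            coords[i] = old     # reject and restore the bead
    return coords, energy

# Start from a fully extended chain in two dimensions and let it pack.
extended = [(float(i), 0.0) for i in range(6)]
folded, final_energy = minimize(extended)
print(total_energy(extended), final_energy)
```

Because only downhill moves are accepted, the energy can never rise, but the search may settle in a local minimum rather than the lowest-energy conformation. That limitation is one of the simplifying assumptions alluded to above.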

Proteins, then, are chains of molecules able to wrap around themselves under the influence of the forces between their constituent atoms and between their atoms and water. The resulting shapes control the function of the protein, and we examine this facet of protein science in the following section. However, given the importance of the precise sequence of atoms within a protein, how can nature set about perfectly reproducing copies of protein molecules to satisfy the demands of a complex organism? The answer to this question shows us again the importance of chemical interactions, once more in the form of hydrogen bonds, and nature's ability to exploit such chemical forces to copy and transfer information.

Whenever you use a computer (or any one of the electronic appliances or microchip-controlled devices which now permeate the world) you are using a stored sequence of instructions. The instructions, like the directions that accompany an electronic appliance, explain how to achieve a given effect. They enable a device such as a computer to be used in a variety of different ways, to process words or to sum numbers in a spreadsheet, and turn a useless but fast collection of electronic switches into a business tool or an enthralling game. The term code is often applied to the stored sequence of instructions that enables the computer to respond to the external stimuli supplied by the keyboard, mouse or pen. In manufacturing, the use of a stored sequence of instructions significantly predates its electronic applications. In fact, in the early 1800s Jacquard used a sequence of instructions stored on punched cards (a medium later exploited by the programmers of computers) to control the production of intricate patterns by automated looms, ensuring reproducibility and fashionability for French weavers.

A code is a source of information which, after appropriate processing, may be acted upon. Nature requires a code which may be used to maintain the plans for protein sequences. Once formed in the appropriate environment, the protein itself will adopt the required overall shape, storing the required three-dimensional information within itself. But the sequence information must come from somewhere. Not only must it be stored, but ideally there must also be a mechanism permitting the transfer of the information to succeeding generations of cells and organisms.

Although life on earth is outwardly so diverse, a tremendous linkage between life forms is evident in the code carrier, whose structure was worked out by James Watson and Francis Crick in 1953 at the culmination of a chain of scientific discoveries. Their simple model provided an explanation not only of molecular replication, exploited by all life on earth, but also of the chemical interactions which underlie reproduction and genetic inheritance. The model that emerged had the power to rationalize diverse lines of existing evidence and the truth to provide the basis for remarkable predictions and new fields of research. The double helix, proposed by Watson and Crick with the help of evidence from their colleagues Maurice Wilkins and Rosalind Franklin, is one of the most successful models ever proposed.


The molecule that conveys all genetic information is illustrated in the rotating image above. The feature discovered by James Watson in building cardboard models, which makes these molecules so special, is that, through a subtle pattern of hydrogen bonds (the same bonds which are seen in proteins, ice and water), the bases pair in a very distinctive way. This complementary pairing of bases, shown in the image to the left, means that, under appropriate conditions, a chain of these base molecules can pick up a chain of matching bases to form a complementary strand. Watson and Crick realized that this must be the case and showed how a helical arrangement of two associated strands of nucleic acids could explain the diffraction evidence gathered by the crystallographers Maurice Wilkins and Rosalind Franklin. The now famous double helix, illustrated above, had been discovered.
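The complementary pairing described above can be written down as a simple lookup. The sketch below assumes the standard DNA pairing rules (adenine with thymine, guanine with cytosine) and uses the usual one-letter abbreviations; the names `PAIRS` and `complement` are illustrative only, and strand direction is ignored.

```python
# Standard DNA base pairing: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand):
    """Return the strand of matching bases, position by position."""
    return "".join(PAIRS[base] for base in strand)

print(complement("AAAGCT"))  # TTTCGA
```

Applying `complement` twice returns the original sequence, which is why each strand of the helix carries enough information to rebuild its partner.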

The four bases.
Base pairs.

Watson and Crick's discovery of the registry of two chain molecules, entwined in DNA's helix, snapped many important observations into place. Biologists had reasoned out the molecular size of the genetic information-carrying component of the cell; chemists had determined the logic of its elemental combinations. In 1953 the two scientific fields became entwined.

The DNA molecule conveys critical information, and that makes it centrally important. The double helix, held together by hydrogen bonds, uses its component bases to code for the sequence of amino acids in proteins. Thus, for example, adenine, adenine, adenine in the base sequence of a DNA molecule is translated into a particular amino acid, one component of a protein molecule. Proteins are the multi-talented catalysts and molecular workhorses, carrying vital oxygen and providing the mechanical power of muscles, for example, and, like DNA itself, proteins are long chains of component molecules. Nucleic acids carry in their sequence of bases the codes necessary to build the vast range of different protein molecules that are needed by complex organisms. The sequence of steps that forms the basis of this extraordinary storage and use of information is now quite well understood. The DNA sequence is first carefully read and copied into messenger RNA (transcription); the appropriate amino acids are then collected and concatenated to produce a protein molecule (translation).

The DNA molecule contains not only the information to build protein molecules but also a template for copying itself, and this is the key to molecular reproduction, cell division and life. A simple computer-generated image conveys the essence of the secret of DNA. A single double helix of DNA is uncoiled and each strand of the double helix is complemented with appropriate bases to form two helices where one had been. The helical information carrier is thus faithfully copied. This is the basis by which cells are able to divide and organisms grow. By pairing strands of DNA from different cells, modified child helices are produced which combine the characteristics of the parent DNA molecules. This is the basis of inheritance.

The secret of life is contained in the helix of DNA. Scientists have gradually learned more of the intricate mechanisms by which DNA is controlled within the cells of living organisms. The picture which has emerged is complex and intricate, yet it is built around the beautiful simplicity of the helical information carrier, DNA.
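As a rough illustration of how a base sequence is read three bases at a time, the sketch below copies a DNA coding strand into messenger RNA and then looks up each triplet in a deliberately tiny excerpt of the standard genetic code. The function names and the six-entry table are assumptions made for the example; the real table has 64 entries, and the cell's machinery is far more involved than a dictionary lookup.

```python
# A small excerpt of the standard genetic code, written as mRNA triplets.
CODON_TABLE = {
    "AUG": "Met",   # also the usual start signal
    "AAA": "Lys",   # adenine, adenine, adenine
    "UUU": "Phe",
    "GGC": "Gly",
    "GCU": "Ala",
    "UAA": "STOP",
}

def transcribe(coding_strand):
    """Copy a DNA coding strand into messenger RNA (thymine becomes uracil)."""
    return coding_strand.replace("T", "U")

def translate(mrna):
    """Read the message three bases at a time until a stop signal is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        # Triplets outside this excerpt are marked "Xaa" (unknown amino acid).
        residue = CODON_TABLE.get(mrna[i:i + 3], "Xaa")
        if residue == "STOP":
            break
        protein.append(residue)
    return protein

message = transcribe("ATGAAATTTTAA")
print(message)             # AUGAAAUUUUAA
print(translate(message))  # ['Met', 'Lys', 'Phe']
```

Here the triplet of three adenines is read as lysine, one link in the growing protein chain.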

Is it possible that only a single dominant means of propagating molecular information could have emerged from the turbulent chemical activity of billions of years ago? There is at present a single known exception to revive one's faith in the diversity of propagation mechanisms explored by life. This exception comes in the form of a 'prion', a protein which reproduces by commandeering the reproductive machinery of the cells of other species. The disease 'scrapie' in sheep is caused by the hijacking of the sheep's cells by a prion. It is not known how this parasitic protein is able to make the host cell produce prion molecules, but it is known that the infection is transmitted without the involvement of DNA and that prions themselves contain no nucleic acids, and yet are able to infect and thereby reproduce.

A schematic illustration of DNA replication. Division of the original pair of base strands, shown in blue, leads to two identical copies of the base sequence.
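The replication scheme in this illustration can be mimicked with two strings standing for the paired strands. The sketch assumes the standard pairing rules (A with T, G with C) and, for simplicity, stores the two strands aligned position by position, ignoring the fact that real strands run in opposite directions; the names `complement` and `replicate` are illustrative only.

```python
# Standard DNA base pairing: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand):
    """Build the matching strand, base by base."""
    return "".join(PAIRS[base] for base in strand)

def replicate(helix):
    """Separate the two strands and complete each with fresh matching bases."""
    strand_a, strand_b = helix
    daughter_1 = (strand_a, complement(strand_a))  # keeps original strand a
    daughter_2 = (complement(strand_b), strand_b)  # keeps original strand b
    return daughter_1, daughter_2

parent = ("ATGCCA", complement("ATGCCA"))
first, second = replicate(parent)
print(first == parent, second == parent)  # True True
```

Each daughter helix keeps one of the original strands and gains one newly assembled strand, and both carry the same base sequence as the parent.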

As far as is known, then, prions are an isolated instance of an opportunistic molecular marauder. Nucleic acids and the information they convey underlie all other forms of life. This molecular information has enjoyed considerable success. Why should that be? DNA conveys information efficiently and with remarkable fidelity, so that efficient molecular machines, the proteins, pass to successive generations. But DNA also allows for the combining of information from two individuals, through the combining of two strands of DNA to make a single child molecule which combines the genetic information of the parent molecules. This pairing of information sources allows the molecules of life to contend successfully with changing environments and situations. Not all life forms need to exploit this capability, but for many it turns out to be advantageous. Successive generations are not identical parental copies with the parent's strengths and weaknesses reproduced; instead they are unique individuals built from elements of the genetic codes of both parents. When you see a baby who has clearly inherited her mother's eyes, you are looking at a large tract of copied genetic code with instructions for the manufacture of pigment and proteins. All things being equal, the father's code will be evident in other areas of the child's makeup.