Unit 4. Genes, Proteins, and Evolution

BIOCHEMISTRY FOR CITIZENS

Mutation and evolution

François Jacob, a pioneer of molecular biology, said that the dream of every cell is to become two cells. While dreaming, the cell prepares by replicating its DNA. Copying errors are corrected by so-called editing enzymes, which recognize mismatches between template and copy and then fix them. But a few mismatches escape detection and are inherited. If, in a touch of irony, such an escaped mismatch occurs in a gene that codes for a mismatch-repair enzyme, the daughter cell receiving this mutation, and all of its offspring, will be deficient editors: sloppy copiers with higher mutation rates than their progenitors. Thus one mutation can raise the mutation rate, which in turn makes mutations in genes that regulate cell division more likely. The result of such mutations might be cells that divide without constraint, a hallmark of cancer cells.

So even mutation rates are subject to variation, and perhaps there are “optimal” error rates for different settings. Organisms in fast-changing environments might do better if they are a bit sloppy, and produce more varied offspring, some of which might happen to be better adapted to the next generation’s conditions than to the present ones.

Replication errors are not the only causes of mutation. Ionizing radiation (see Unit 6) and reactive chemicals can damage DNA, and mutations can result from incomplete or faulty repair of the damage. If such environmental factors break both strands of DNA, the damage is much harder to repair accurately, and numerous cell functions can be lost, often leading to cell death. In a multicelled organism, the death of one or a very small number of cells is often of little consequence.

Beneath the apparently highly ordered biology that we observe, there is a great deal of sloppiness.

Evolution: tiny steps to big changes

Mutations are the source of variation that natural selection turns into evolutionary change. By far the majority of mutations are either neutral or harmful. Rarely, a mutation confers benefits. In these rare instances, the recipient of the mutation gains a new function, or an existing function becomes more efficient or effective. The result might be that the mutant organism produces more offspring, due to more efficient use of resources such as nutrients, or to the ability to use a nutrient not usable before, or simply to a longer reproductive lifespan. This modified gene will be passed to offspring, who in turn benefit and produce more offspring than un-mutated ("wild type") organisms. The result is that the beneficial-mutant organisms become more common, and might even supplant the wild type in a specific environment. If mutant and wild type do not interbreed (they might, for example, become isolated from each other by virtue of using different food sources), then the two populations will evolve independently, become even more different due to accumulated mutations, and eventually become unable to interbreed. When two groups become reproductively isolated in this fashion, they have become separate species.
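To see how a small advantage lets a rare mutant take over, here is a minimal sketch of a standard textbook selection model, under loudly hypothetical assumptions: each mutant leaves (1 + s) offspring for every one offspring of a wild-type organism, and chance events, new mutations, and interbreeding are ignored. The function name `mutant_frequencies` and the numbers used (1 mutant in 10,000, a 1% advantage) are illustrative choices, not data.

```python
def mutant_frequencies(p0: float, s: float, generations: int) -> list[float]:
    """Track the share of beneficial mutants in a population, generation by generation.

    Hypothetical model: each mutant leaves (1 + s) offspring for every one
    offspring left by a wild-type organism. Chance events, new mutations,
    and interbreeding effects are ignored.
    """
    freqs = [p0]
    p = p0
    for _ in range(generations):
        mean_offspring = p * (1 + s) + (1 - p)  # population-average offspring per organism
        p = p * (1 + s) / mean_offspring        # mutants' share of the next generation
        freqs.append(p)
    return freqs


# Start with 1 mutant in 10,000 organisms, with a 1% reproductive advantage.
history = mutant_frequencies(p0=1e-4, s=0.01, generations=2000)
for generation in (0, 500, 1000, 1500, 2000):
    print(generation, round(history[generation], 4))
```

In this toy model the mutant share stays tiny for hundreds of generations, then rises steeply and ends up above 99% of the population within 2,000 generations. A 1% edge is invisible in any single generation, yet it is enough to supplant the wild type.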

It is a challenge to imagine such minute changes accumulating into the separation of species. An interesting and entertaining computer simulation of this process was produced by Richard Dawkins and described in his book The Blind Watchmaker, which gives the program its name. The simulation is available at several online sites, and it is fun to play with. In class, I will demonstrate these simulations and show how small variations, along with selection, can produce unexpectedly large and specific changes.
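The power of small variations plus selection can also be sketched in a few lines of code. This is not the biomorph program: it is a simpler text-based illustration of cumulative selection, similar in spirit to the "weasel" example Dawkins uses in the same book. The target phrase, the 5% copying-error rate, and the brood size of 100 are assumptions chosen for illustration. A fixed target is itself a simplification, since real selection has no distant goal; it only favors whatever works best right now.

```python
import random
import string

# Hypothetical "selection criterion": the phrase used in Dawkins's weasel example.
TARGET = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = string.ascii_uppercase + " "  # 27 possible characters


def score(candidate: str) -> int:
    """Number of positions that already match the target."""
    return sum(c == t for c, t in zip(candidate, TARGET))


def mutate(parent: str, rate: float, rng: random.Random) -> str:
    """Copy the parent, replacing each character with a random one at the given rate."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else ch for ch in parent)


def evolve(offspring: int = 100, rate: float = 0.05, seed: int = 1,
           max_generations: int = 10_000) -> int:
    """Return the generation at which the target first appears (-1 if it never does)."""
    rng = random.Random(seed)
    parent = "".join(rng.choice(ALPHABET) for _ in TARGET)
    for generation in range(1, max_generations + 1):
        brood = [mutate(parent, rate, rng) for _ in range(offspring)]
        # Selection: the best-scoring copy (or the parent, if none is better) breeds next.
        parent = max(brood + [parent], key=score)
        if parent == TARGET:
            return generation
    return -1


print("Target reached after", evolve(), "generations")
```

Typing random 28-character strings would require searching among 27^28 (about 10^40) possibilities, but keeping each small improvement reaches the phrase in a modest number of generations. That contrast between one-step chance and cumulative selection is the point the biomorph program makes visually.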

Four implementations of The Blind Watchmaker, each with its own additional tools (and glitches), are at the sites listed here. I am still looking for a perfect one.

http://www.annanardella.it/biomorph.html  At one time, this was the best one, but right now, I can't get it to run.
http://www.phy.syr.edu/courses/mirror/biomorph/ The objects start out so small you can hardly see them. Keep clicking boxes at random, then when one is big enough to see, click it, and keep selecting for larger ones until you can see details. Then start selecting for something interesting.
http://www.rennard.org/alife/english/biomgb.html This one allows you to play with individual genes to see their effects, but there is no simple random-variation/selection mode.
http://www.permadi.com/java/biomorph/ I just found this one; looks promising. Will get back to you.

Images: biomorphs made with online versions of Blind Watchmaker. Clockwise from top left: typical start screen; biomorphs generated at random; result of selecting for round, red forms; results of selecting for tall, narrow, red forms.

A famous protein family

At the molecular level, what specific kinds of evolutionary change, resulting from variation and natural selection, have been documented? There are many splendid examples. One of my favorites is a family of protein-digesting enzymes, all found in your small intestine, each of which specializes in a different aspect of breaking down the proteins you eat and preparing them to be broken down further to amino acids, which are then absorbed into your bloodstream.

Models of three protein-digesting enzymes (blue, green, and yellow) superimposed on each other. If you have eaten recently, all of them are now at work in your small intestine. Colors highlight regions according to how much each model differs from the average of all three: blue regions are very similar in all models, green less similar, yellow even less. More extreme differences would be orange, then red.
Several of the many enzymes that digest proteins in your small intestine are members of a large protein family called serine proteases. Shown here are ribbon models of three members of the family, superimposed on each other. Each ribbon (blue, green, and yellow) models a different enzyme (called, respectively, trypsin, chymotrypsin, and elastase). Only three of the ~250 side chains are shown in each model (ball & stick models in the center). You can see that the ribbons are not identical in shape, but are similar, reflecting their close relationship. But also notice that three side chains are identical and almost perfectly congruent among the three models. This region, called the active site, is the protein-cutting or functional part, which all members of this family have in common.

How did this family arise? Comparison of the residue sequences of all three strongly supports the idea that they all evolved from a common ancestral enzyme, perhaps a general-purpose protein cutter. How does sequence support such a conclusion? There are simply too many likenesses among the three sequences for them to be coincidental. Let's look at the odds that two different proteins will have the same sequence of building blocks.

[NOTE: I cannot find a way to write superscripts on this website. Scientists use superscripts to represent exponents, particularly when discussing very large numbers; for example, 10 with a superscript 3 equals 10 x 10 x 10, or 1000 (also called 10 raised to the power 3, or ten cubed). I will resort to writing exponents with "^", so 1000 can be represented as 10^3. It's not elegant, but it is the best I can think of at the moment, and it is somewhat urgent, because the very next paragraph contains some huge numbers.]

At every position in a sequence, there are twenty possible residues, so the chance that two proteins have the same residue at a given position, just by chance, is 1 in 20. For two successive positions, the chance of identical residues is 1 in 20 times 1 in 20, or 1 in 400 (for each of twenty possible residues in the first position, there are twenty possibilities for the second position). For three successive positions, the chance of identity is 1 in 8000. For two proteins to have, say, 100 identical residues in identical positions, the chance is 1 in 20^100, which is roughly 10^130 (the number one followed by 130 zeros!). You might call these astronomical odds, but they are much larger than that, because 10^130 is 10^50 times the estimated number of atoms in the universe (about 10^80).
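The arithmetic above can be checked directly. This minimal sketch makes the same simplifying assumption as the text: all twenty amino acids are equally likely at every position, which is not strictly true of real proteins. The function name `one_in` is a hypothetical label for "the N in 1 in N".

```python
import math

RESIDUE_TYPES = 20  # assumes all 20 amino acids are equally likely at every position


def one_in(n_positions: int) -> int:
    """Return N such that the chance of n identical positions is 1 in N."""
    return RESIDUE_TYPES ** n_positions


for n in (1, 2, 3, 100):
    n_ways = one_in(n)
    exponent = math.log10(n_ways)
    print(f"{n} positions: 1 in {n_ways:.3e}  (about 10^{exponent:.1f})")
```

Because Python integers have no size limit, 20^100 is computed exactly; it works out to about 1.27 x 10^130, which is where the "roughly 10^130" comes from.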

It is quite safe to say, therefore, that if two protein sequences can be aligned so that, say, 25% of the sequence positions show the same building block in the same position in both, then the two proteins share common ancestry and did not evolve independently of each other. In addition, two proteins with this much in common (or, in the parlance of the field, with this level of sequence identity) will almost certainly fold in the same way, and thus be quite similar in overall structure.
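Percent identity itself is simple to compute once two sequences are aligned. The sketch below assumes the alignment has already been done (finding the best alignment is a separate and harder problem), uses "-" to mark gaps, and skips gapped positions; that last choice is one of several conventions in use. The two fragments are made up for illustration, not real enzyme sequences.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions where both sequences have the same residue.

    Assumes the two sequences are already aligned to equal length, with '-'
    marking gaps; positions where either sequence has a gap are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not compared:
        return 0.0
    identical = sum(a == b for a, b in compared)
    return 100.0 * identical / len(compared)


# Toy aligned fragments (made up for illustration, not real enzyme sequences).
fragment_1 = "MKV-LLAGHW"
fragment_2 = "MRVALLSGHW"
print(f"{percent_identity(fragment_1, fragment_2):.0f}% identical")
```

Here 7 of the 9 ungapped positions match, so the fragments are about 78% identical, far above the 25% level discussed in the text.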

Next question: how does a single protein produce a family? How can it evolve, not just into a slightly different protein, but into several similar proteins? If the gene for this protein is essential, how can it change without costing the organism its life? During the cell divisions that produce eggs or sperm, chromosomes from the mother and father (that is, the mother and father of the parent-to-be) pair up and exchange parts, in a process called crossover. The exchange, in theory, swaps equivalent parts (equal crossover), so that each egg or sperm contains a mixture of DNA from both parents of the egg or sperm producer, roughly half from each on average. So sperm carry, on average, equal DNA representation from the parents of a male, and eggs from the parents of a female. This means that, when an egg and a sperm produce a foetus, all four grandparents of the foetus are, on average, represented about equally.

Thomas Hunt Morgan's illustration of crossing over (1916).

Like all molecular processes, crossover is not always perfect, and, rarely, the exchange is unequal. For a serine protease gene, equal crossover simply swaps the gene between parental chromosomes. But unequal crossover might result in both genes going to the same chromosome, while the other chromosome receives neither. The sperm or egg that gets neither might not survive, while the recipient of both now has two identical genes. One of these genes can evolve (most likely becoming dysfunctional, but rarely taking on a new function), while the other carries out its usual function. This process is called gene duplication, and is thought to be a major driver of evolution. Once again, evolution appears to be mediated by rare errors.
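The difference between equal and unequal crossover can be modeled with chromosomes as simple lists of gene labels. In this toy sketch (the labels and the function names `crossover` and `protease_copies` are hypothetical), each chromosome is cut at one point and the tails are exchanged. Matching cut points give equal crossover; mismatched cut points leave one chromosome with two protease genes and the other with none.

```python
def crossover(chrom_a: list[str], chrom_b: list[str],
              cut_a: int, cut_b: int) -> tuple[list[str], list[str]]:
    """Exchange chromosome tails after cutting A at cut_a and B at cut_b.

    Equal crossover uses the same cut point on both chromosomes; unequal
    crossover uses mismatched cut points. Genes are just labels here.
    """
    new_a = chrom_a[:cut_a] + chrom_b[cut_b:]
    new_b = chrom_b[:cut_b] + chrom_a[cut_a:]
    return new_a, new_b


def protease_copies(chrom: list[str]) -> int:
    """Count protease genes, whichever parent they came from."""
    return sum(gene.upper() == "PROTEASE" for gene in chrom)


# Hypothetical gene labels; uppercase from one parent, lowercase from the other.
mom = ["GENE1", "PROTEASE", "GENE3"]
dad = ["gene1", "protease", "gene3"]

print("equal:  ", crossover(mom, dad, cut_a=1, cut_b=1))
print("unequal:", crossover(mom, dad, cut_a=2, cut_b=1))
```

With equal cuts, the protease genes simply trade places. With the mismatched cuts, one product is ["GENE1", "PROTEASE", "protease", "gene3"] (a duplication) and the other is ["gene1", "GENE3"] (a loss), exactly the two outcomes described above.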

Discovering evolutionary relationships

It might surprise you to realize that all of Earth's organisms have so many genes in common. But after all, they have many functions in common. The vast majority of organisms use oxygen to break down their nutrients, use sugars and fats as their main fuels, and digest proteins to obtain amino acids for making proteins. If, as scientists believe, we all descend from a common ancestor, each of our proteins descends as well from common ancestral proteins, or, more accurately, our genes descend from common ancestral genes. As mentioned earlier, often only a small percentage of the residues in a protein are essential to function. Most of the others can vary by mutation without much effect on function. This means that when groups separate into species, mutations accumulate independently in each species, and proteins that do the same job in both species become more and more different at the sequence positions that don't matter much. So the number of sequence differences reflects the length of time since the two species began evolving independently.
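The last step of this reasoning, from "number of differences" to "length of time", is often called a molecular clock. Here is a minimal sketch, assuming (hypothetically) that changes accumulate at a constant rate in both lineages and that differences are few enough that repeated changes at the same position can be ignored. The rate and counts below are made-up numbers, not measured values for any real protein.

```python
def divergence_time(differences: int, length: int, rate_per_site_per_year: float) -> float:
    """Rough years since two species split, from one protein's sequence differences.

    Assumes (hypothetically) a constant substitution rate in both lineages and
    few enough differences that repeated changes at the same site can be ignored.
    Both lineages accumulate changes after the split, hence the factor of 2.
    """
    fraction_different = differences / length
    return fraction_different / (2 * rate_per_site_per_year)


# Made-up example: 10 differences in 100 positions, 1 change per site per billion years.
print(f"{divergence_time(10, 100, 1e-9):,.0f} years")
```

The factor of 2 is the key design point: after two species separate, each lineage mutates independently, so the differences between them build up twice as fast as in either lineage alone.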

Proteins in different species that descend from a common ancestral protein, and usually do the same job, are called homologous proteins. Comparisons of the sequences of homologous proteins in a large number of species can tell us which species are closely related to each other, and which are more distantly related.
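Such comparisons start with a simple count: for every pair of species, how many aligned positions differ? The sketch below does this for four hypothetical species with short, made-up sequence fragments (not real data), then lists the pairs from most to least similar.

```python
from itertools import combinations


def count_differences(seq_a: str, seq_b: str) -> int:
    """Number of aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


# Made-up aligned fragments for hypothetical species A-D (not real data).
fragments = {
    "species A": "MKTAYIAKQR",
    "species B": "MKTAYIAKQR",
    "species C": "MKTAYLAKQR",
    "species D": "MRTGYLAKNR",
}

pairs = sorted(
    (count_differences(fragments[x], fragments[y]), x, y)
    for x, y in combinations(fragments, 2)
)
for diffs, x, y in pairs:
    print(f"{x} vs {y}: {diffs} differences")
```

In this toy data, A and B are identical (like humans and chimpanzees), C differs from them at one position, and D is the most distant. A table of such counts is the raw material for the phylogenetic trees described below.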

So what we need in order to find evolutionary relationships is a protein found in a very large number of organisms. Practically any oxygen-using organism on Earth has a small protein called cytochrome c, which is a key protein in using oxygen to break down nutrients. In the figure below, five models of cytochrome c are superimposed, and areas of ribbon are colored according to how much the models differ in their building blocks in each area. Blue areas are identical in all five models, yellow areas exhibit the most variation among the models, and green areas are intermediate. Organisms that are the most similar in their cytochrome c sequences are the most closely related. For example, the cytochromes of humans and chimpanzees are identical, those of humans and pigs differ in several areas, and those of humans and whales differ in even more areas.

Five superimposed models of the small protein cytochrome c, each from a different organism. Colors are as described in the protein-family image above.

To work out evolutionary relationships like these, scientists have compared the sequences of cytochrome c molecules from hundreds of organisms. From these data, they construct a tree, called a phylogenetic tree, that displays the evolutionary relationships among many organisms.

Phylogenetic tree, constructed from the sequences of cytochrome c molecules from many organisms (“bottom” of the tree at left). Existing species are at the tips of branches, while the pioneer ancestral cytochrome c is at the far left, the “root” of the tree. From Mulligan, P.K. (2008) Proteins, evolution of, in AccessScience, ©McGraw-Hill Companies.

Each twig tip of the tree represents an existing species, one of the species from which the samples of homologous cytochrome c were obtained. Each branch point represents the common ancestral cytochrome c for the organisms beyond the branch. The sequences at all branch points must be inferred from the living ones by the simplest assumption: that the ancestral protein changed into the living ones by the minimum possible number of mutations. This assumption is probably a little bit wrong, but likely to be uniformly wrong in all cases. Numbers along branches give the average number of sequence differences between branch points.
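One standard way to apply the "minimum possible number of mutations" assumption on a tree whose branching pattern is already known is Fitch's parsimony method; it may not be exactly what was used for the tree shown, so treat this as an illustrative sketch. For each sequence position, it works from the tips inward, keeping the set of residues an ancestor could have had and counting a change whenever two branches cannot agree. The species W to Z, their sequences, and the tree shape are all hypothetical, and the method counts residue changes, not the underlying DNA mutations.

```python
def fitch(tree):
    """Return (possible ancestral residues, minimum number of changes) for one site.

    A leaf is a one-letter residue; an internal node is a (left, right) pair.
    """
    if isinstance(tree, str):
        return {tree}, 0
    left_set, left_cost = fitch(tree[0])
    right_set, right_cost = fitch(tree[1])
    shared = left_set & right_set
    if shared:  # both branches can agree on an ancestor without a change
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one change needed


def site_tree(shape, sequences, i):
    """Replace each species name in the tree shape with its residue at position i."""
    if isinstance(shape, str):
        return sequences[shape][i]
    return (site_tree(shape[0], sequences, i), site_tree(shape[1], sequences, i))


# Hypothetical tree ((W, X), (Y, Z)) and made-up aligned sequences.
shape = (("W", "X"), ("Y", "Z"))
sequences = {"W": "KTAY", "X": "KTAF", "Y": "RSAF", "Z": "RTGF"}

total = sum(fitch(site_tree(shape, sequences, i))[1] for i in range(4))
print("Minimum number of changes on this tree:", total)
```

Each of the four positions here needs exactly one change, for a total of 4. Building a real tree also means comparing many possible branching patterns and keeping the one that needs the fewest changes overall.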

Applications to medicine

Comparisons of genes can reveal much more than phylogenetic relationships. Genes certainly play roles in disease, causing some diseases (sickle-cell anemia), predisposing carriers to some (hypercholesterolemia, cancer), and even protecting against some (such as delaying AIDS in HIV-infected people). The same kinds of comparisons used to establish evolutionary kinship can also find associations between genes and disease. An association between a gene and a disease does not prove that the gene causes the condition, but it calls for further study to establish the underlying cause of the association.
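One common way to put a number on such an association (not necessarily the method behind any particular finding) is the odds ratio from a case-control comparison: how much more common is a gene variant among people with the disease than among people without it? The counts below are made up for illustration. Real studies also test whether the result could be due to chance and adjust for other factors.

```python
def odds_ratio(cases_with: int, cases_without: int,
               controls_with: int, controls_without: int) -> float:
    """Odds ratio for a gene variant in a case-control comparison.

    A value near 1 means no association; well above 1 means the variant is more
    common among people with the disease. It says nothing about cause.
    """
    return (cases_with / cases_without) / (controls_with / controls_without)


# Hypothetical counts (made up): 1,000 people with the disease and 1,000 without.
print(round(odds_ratio(cases_with=300, cases_without=700,
                       controls_with=150, controls_without=850), 2))
```

In these invented numbers the variant is associated with about 2.4 times higher odds of having the disease. As the text stresses, that is a reason to investigate further, not proof that the variant causes the disease.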