Figure 13. A rough diagram of the structure of a gene. Genes are composed of exons and introns, where introns are regions that get spliced out of mRNA before translation occurs.
The purpose of introns is not fully understood, but these regions appear to be evolutionarily favored, which suggests that they are important.
The problem is that the sequences which define the gene boundaries (or the boundaries between the introns and the exons) are not unique, and can occur in many different places in the genome. For this reason, Hidden Markov Models can be used to find the true gene boundaries among many similar sequence signatures. This will be discussed later in the course.
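To make the non-uniqueness concrete, here is a minimal Python sketch that counts how often a few short boundary-like signals occur in a sequence. The motifs used (ATG as a start codon, GT and AG as the dinucleotides that typically begin and end an intron) are assumptions added for illustration and are not spelled out in these notes. The toy sequence and the function name `find_motif` are hypothetical.

```python
def find_motif(sequence: str, motif: str) -> list[int]:
    """Return the 0-based start of every (possibly overlapping) occurrence of motif."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        # Resume one base later so that overlapping matches are not missed.
        start = sequence.find(motif, start + 1)
    return positions


# Hypothetical toy sequence, not real genomic data.
toy_sequence = "ATGGTAAGTCCAGGTATGCAGGTTAGATGAGGT"

for name, motif in [("start codon", "ATG"), ("intron start", "GT"), ("intron end", "AG")]:
    hits = find_motif(toy_sequence, motif)
    print(f"{name} signal {motif}: {len(hits)} candidate positions {hits}")
```

Even in this 33-base toy string, each two-letter signal appears five times, so a signal match alone cannot identify a true boundary. This is the ambiguity that a probabilistic model such as an HMM is meant to resolve.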
As a side note, the UCSC website provides a way to browse different genomes and see the genes predicted by various methods developed over the years. Usually, these methods were developed by computer scientists and later used by biologists to focus their experiments on the predicted gene regions.
1. Evolution at the DNA level
At the DNA level, as mentioned above, evolution occurs through a series of random changes. Here, we’ll consider only individual nucleotide mutations. Those that occur in the “junk” areas of an organism’s genome will not affect the individual, which will therefore be as fit as the previous generation. However, mutations that occur in important regions will rarely be selected for; most often they will be harmful to the individual and thus will not remain in the gene pool.
Figure 14. Diagram of evolution at the DNA level. The colored blocks represent important regions of the DNA (perhaps genes), while the red squiggles represent places of mutations.
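The process in the figure can be mimicked with a small simulation. The sketch below is a deliberately crude model, not a method from the course: it applies one random point mutation per generation, always keeps mutations outside the important region, and keeps mutations inside it only with a small probability. The function name `evolve`, the parameter `keep_probability`, and all of the numbers are assumptions chosen for illustration.

```python
import random

BASES = "ACGT"


def evolve(sequence, important, generations, keep_probability=0.05, seed=0):
    """Apply one random point mutation per generation and return the result.

    `important` is a set of 0-based positions treated as functionally important.
    A mutation at an important position survives only with `keep_probability`
    (an assumed stand-in for selection); mutations elsewhere always survive.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    current = list(sequence)
    for _ in range(generations):
        position = rng.randrange(len(current))
        new_base = rng.choice([b for b in BASES if b != current[position]])
        if position in important and rng.random() >= keep_probability:
            continue  # treated as harmful: it does not remain in the gene pool
        current[position] = new_base
    return "".join(current)


def differences(a, b, positions):
    """Count the positions in `positions` at which sequences a and b differ."""
    return sum(a[i] != b[i] for i in positions)


# Hypothetical ancestor with an important block in the middle.
ancestor = "ACGT" * 10
important_positions = set(range(10, 30))
junk_positions = set(range(len(ancestor))) - important_positions

descendant = evolve(ancestor, important_positions, generations=200)
print("changes in junk region:     ", differences(ancestor, descendant, junk_positions))
print("changes in important region:", differences(ancestor, descendant, important_positions))
```

Under these assumptions the junk positions drift freely while the important block accumulates far fewer changes, which is the pattern the colored blocks and red squiggles are meant to convey.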
Thus the level of conservation of a region is a good clue to its importance. For example, exons (the regions of genes that are not spliced out before translation, and thus directly code for proteins) are very highly conserved throughout evolution. It is therefore much easier to predict the location of genes by finding highly conserved regions across the genomes of many organisms than by looking for boundary sequences in the genome of a single organism.
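As a rough illustration of this idea, the following Python sketch scores each column of a set of already aligned sequences by how many of them agree, then reports runs of fully conserved columns. It assumes the sequences have been aligned beforehand and contain no gaps; the scoring rule, the threshold, the minimum run length, and the toy alignment are all hypothetical choices rather than the method used by any particular gene finder.

```python
from collections import Counter


def column_conservation(aligned):
    """Fraction of sequences sharing the most common base at each alignment column."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*aligned):
        most_common_count = Counter(column).most_common(1)[0][1]
        scores.append(most_common_count / len(aligned))
    return scores


def conserved_regions(scores, threshold=1.0, min_length=4):
    """Return half-open (start, end) intervals where every column scores >= threshold."""
    regions = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i  # a new conserved run begins here
        else:
            if start is not None and i - start >= min_length:
                regions.append((start, i))
            start = None
    # Close a run that extends to the end of the alignment.
    if start is not None and len(scores) - start >= min_length:
        regions.append((start, len(scores)))
    return regions


# Hypothetical toy alignment of the same region from three organisms.
toy_alignment = [
    "ACGTTGCAATGCCGTA",
    "TCATTGCAATGACTTA",
    "GCGATGCAATGTCGAA",
]

scores = column_conservation(toy_alignment)
print(conserved_regions(scores))
```

In the toy alignment, columns 4 through 10 are identical in all three sequences, so that stretch is the only one reported as a candidate important region. Isolated matching columns elsewhere are ignored because they are shorter than `min_length`.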