IV. Genetics in the 21st century
The main change in the 21st century was the arrival of new technology that allows measurements on a massive scale. New DNA sequencing technology lets researchers take a DNA molecule and obtain, in a computer, a sequence of letters representing its nucleotide sequence. Microarrays take the contents of a cell’s cytoplasm and reveal which RNA molecules are present or absent; all 22,000 genes can be read on a single chip. This saves researchers a great deal of time and money, and so allows data collection on a scale that was unheard of in the 20th century.
The goal now is to analyze this data and understand precisely how everything works within a cell. Because there is so much data, computer science plays a major role. The following are a few of the main research areas and technologies in genomics today.
A. Sequencing and Fragment Assembly
DNA sequencing is one of the most useful technologies developed. The problem is that a machine cannot simply read out an entire DNA sequence. In fact, we can only get about 700 letters from the left or right end (or both) of any fragment. A technique known as shotgun sequencing is therefore used.
1. Obtaining data
First, researchers take many copies of the strand of DNA they wish to sequence and shake them until they break into smaller fragments. This is similar to cutting the strands at random positions. Each fragment is then passed to a machine, which produces a pair of reads, one from each end, with an error rate of only about 1-2%. The approximate distance between the two reads of a pair is also known, to within about 20%.
Figure 9. Process of sequencing of one fragment of the DNA strand.
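The data-gathering step above can be mimicked with a small simulation. The following is only a toy sketch under stated assumptions: the function names, the nominal fragment length of 3000 letters, and the substitution-only error model are illustrative choices, not part of the actual laboratory protocol. It also ignores the fact that a real read from the far end comes from the opposite strand.

```python
import random

def add_errors(read, error_rate, rng):
    """Replace each letter by a different one with probability error_rate."""
    out = []
    for base in read:
        if rng.random() < error_rate:
            base = rng.choice([b for b in "ACGT" if b != base])
        out.append(base)
    return "".join(out)

def shotgun_read_pairs(genome, n_fragments, frag_len=3000, read_len=700,
                       error_rate=0.01, seed=0):
    """Simulate paired reads from randomly placed fragments (toy model)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_fragments):
        # Assumption: fragment length varies by about +/-20% around the
        # nominal value, mirroring the uncertainty in the read distance.
        length = min(int(frag_len * rng.uniform(0.8, 1.2)), len(genome))
        # "Cutting at random": choose a uniformly random start position.
        start = rng.randrange(len(genome) - length + 1)
        fragment = genome[start:start + length]
        # Only the two ends of the fragment are read, about 700 letters each.
        left = add_errors(fragment[:read_len], error_rate, rng)
        right = add_errors(fragment[-read_len:], error_rate, rng)
        pairs.append((start, length, left, right))
    return pairs

if __name__ == "__main__":
    rng = random.Random(1)
    genome = "".join(rng.choice("ACGT") for _ in range(20000))
    for start, length, left, right in shotgun_read_pairs(genome, 5):
        print(start, length, len(left), len(right))
```

In a real experiment the start position of each fragment is of course unknown; it is kept here only so that the simulated reads can be checked against the original sequence.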
As a side note, 10-15 years ago this process cost about $10 per letter. Now it costs only about $10-30 million to sequence a mammal’s genome.
Generally, researchers repeat this process until there are about 7-10 reads covering each region of the DNA. This is a rule of thumb: one can do the math and see that, at this coverage, few gaps remain in the resulting sequence once all the reads are combined, while the sequencing effort is not too wasteful. Some parts of the genome will be covered more than others, but this bias is generally not large enough to worry about.
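One common way to "do the math" is an idealized model in which read start positions are uniformly random, so the number of reads covering a given letter is approximately Poisson with mean c (the coverage). This model is an assumption brought in for illustration, not something derived in these notes, and the function names below are hypothetical. Under it, the chance that a given letter is missed by every read is about e^(-c).

```python
import math

def coverage(n_reads, read_len, genome_len):
    """Average number of reads covering each letter: c = N * L / G."""
    return n_reads * read_len / genome_len

def uncovered_fraction(c):
    """Fraction of letters hit by no read, assuming Poisson-distributed coverage."""
    return math.exp(-c)

def reads_needed(c, read_len, genome_len):
    """Number of reads required to reach average coverage c."""
    return math.ceil(c * genome_len / read_len)

if __name__ == "__main__":
    for c in (1, 3, 7, 10):
        print(f"coverage {c:2d}: uncovered fraction {uncovered_fraction(c):.6f}")
```

Under this simplified model, 7-fold coverage leaves roughly 0.09% of letters uncovered and 10-fold coverage roughly 0.005%, while each additional fold costs the same number of extra reads. This is consistent with the rule of thumb of stopping at about 7-10 reads per region.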