Computational Methods to Reckon With: Understanding the Mechanism Behind Diabetes
The current experimental methods of genetics are highly efficient. Ten years ago, a student could get a doctoral degree after spending five years performing experiments that a robot can now finish in a few hours. There is much useful information hidden in the resulting large data sets, but scientists need advanced statistical methods in order to extract it. Formulating genetic questions as mathematical problems based on the data is a research field in itself. These complicated problems are, in turn, impossible to solve without specialized computer programs. The field of developing solution methods for given mathematical problems is called "scientific computing." I work on a scientific computing project concerning problems that arise in a certain field of genetics.
The function of most human genes is unknown. Many geneticists try to find connections between functions and specific genes. I analyze data from geneticists who want to find genes involved in metabolism, which is the process of energy turnover in the body and involves how energy is extracted from food and then transported, stored and used. Body weight is affected by genes involved in metabolism, and diabetes and obesity are two metabolic diseases that partly depend on genetics. By understanding metabolism in general, one may find ways of treating or curing these diseases.
To study metabolism, geneticists have performed mating experiments with pigs. Pigs and humans are remarkably similar on a genetic level, so by studying pigs one can learn much about humans. In the experiment, domestic pigs were mated with wild boars. Domestic pigs and wild boars belong to the same species, but domestic pigs have been bred over hundreds of years to convert as much of the food they eat as possible into fat and muscle. On a genetic level, this corresponds to domestic pigs having very efficient versions of all genes involved in metabolism. Each piglet from the mating experiment has a different mix of wild boar and domestic pig genes. The experimental data for each piglet includes its weight and a list of which genes it has inherited from which parent. As with humans, the function of most pig genes is unknown. However, if all the big piglets have inherited domestic pig versions of a group of genes, while all the small piglets have inherited wild boar versions of the same genes, then it is likely that these genes affect metabolism. This intuitive reasoning has been turned into a mathematical problem by statisticians working on this project, and my task is to compute the solution.
In order to do this, I need to do something called "combinatorial optimization." A simple example is choosing two athletes from a group for a tug-of-war team. If there are five athletes, then ten different combinations are possible. One way to find the best team is to test the strength of all pairs of athletes. This takes a while, especially if there are five thousand candidates instead of five, but I am sure to find the best team since I have tested them all. Another, quicker way is to measure the strength of all athletes individually and then directly pick the two strongest. This works fine as long as each candidate gets along equally well with all the others. However, if the first candidate is Superman, the second is She-Hulk, the third is Popeye, the fourth is a can of spinach, and the fifth is Charlie Chaplin, then finding the best team is not as easy as picking the two strongest individuals. This example is an analogy to my research. The team of athletes represents a combination of genes, and the strength of the team corresponds to how closely the variation in piglet size follows the variation in which versions (wild boar or domestic pig) of those genes the piglets have inherited. I cannot find the best combination by measuring the "strength" of each gene and putting together the strongest individual candidates, because genes interact with each other. Nor would it work to test all possible combinations, because that would take too long.
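The tug-of-war analogy can be sketched in a few lines of Python. This is only a toy illustration, not the method used in the research: the strength numbers and the bonus for pairing Popeye with the can of spinach are invented for the example, and the function names are hypothetical.

```python
from itertools import combinations
from math import comb

# Made-up individual strengths, for illustration only.
strength = {
    "Superman": 100,
    "She-Hulk": 90,
    "Popeye": 10,
    "Can of spinach": 0,
    "Charlie Chaplin": 5,
}

# Made-up interaction: Popeye is far stronger when teamed with spinach.
interaction = {frozenset({"Popeye", "Can of spinach"}): 500}


def team_strength(team):
    """Sum of individual strengths plus any interaction bonus for the pair."""
    return sum(strength[m] for m in team) + interaction.get(frozenset(team), 0)


def best_team_exhaustive(candidates):
    """Test every possible pair and keep the strongest one."""
    return max(combinations(candidates, 2), key=team_strength)


def best_team_greedy(candidates):
    """Pick the two strongest individuals, ignoring interactions."""
    return tuple(sorted(candidates, key=strength.get, reverse=True)[:2])


candidates = list(strength)
pairs = list(combinations(candidates, 2))
exhaustive = best_team_exhaustive(candidates)
greedy = best_team_greedy(candidates)

print("Pairs among 5 candidates:", len(pairs))
print("Pairs among 5000 candidates:", comb(5000, 2))
print("Exhaustive search picks:", exhaustive, team_strength(exhaustive))
print("Strongest individuals:", greedy, team_strength(greedy))
```

With these invented numbers, the two strongest individuals do not form the strongest team, because the quick method never looks at how candidates interact. The exhaustive search does find the best pair, but the number of pairs it must test grows rapidly with the number of candidates.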
So far, I have worked on speeding up the computation of the "strength" of a particular combination of genes. To explain the method, we can use another example. Assume that I want to compute 2*7+3*7 and 2*8+3*8. Doing the calculations as they are written requires six mathematical operations in total: four multiplications and two additions. But if I use the fact that 2*7+3*7 is equal to (2+3)*7 and 2*8+3*8 equals (2+3)*8, I can get the same result by first calculating 2+3=5 and 5*7, and then reusing the partial result 2+3=5 to calculate (2+3)*8. The same two answers are found by using three operations instead of six. The point is that by using a good algorithm, that is, performing well-chosen operations in a well-chosen order, I can drastically reduce the computational effort. In my research, the principle of reusing partial results has proved successful. I can compute the "strengths" of millions of gene combinations in a few days instead of in three years, as it would take with a standard algorithm.
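The arithmetic example can be written out as a short Python sketch that counts operations. It only illustrates the principle of reusing a partial result on the numbers from the text; the function names are hypothetical, and it is not the algorithm used for the gene combinations.

```python
def compute_as_written(a, b, xs):
    """Compute a*x + b*x for each x: two multiplications and one addition each."""
    results, ops = [], 0
    for x in xs:
        results.append(a * x + b * x)
        ops += 3
    return results, ops


def compute_with_reuse(a, b, xs):
    """Compute a + b once, then reuse it: one multiplication per x."""
    partial = a + b  # the partial result that is shared
    results, ops = [], 1
    for x in xs:
        results.append(partial * x)
        ops += 1
    return results, ops


naive_results, naive_ops = compute_as_written(2, 3, [7, 8])
reuse_results, reuse_ops = compute_with_reuse(2, 3, [7, 8])

print(naive_results, naive_ops)
print(reuse_results, reuse_ops)
```

Both functions return the same two answers, and the saving grows with the length of the list: the first version uses three operations per value, while the second uses one operation per value plus a single addition up front.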
Scientific computing is useful for finding the "grains of gold" in data sets that are too large and complicated for the human brain to handle. The valuable information is then returned to the experimentalists, who design new experiments to answer follow-up questions. The data sets from these experiments then need to be analyzed, and so on. After every round, a little more is understood about metabolism, and hopefully, one day, the problems of diabetes, obesity and other metabolic diseases will be solved.
Modified 15 January 2003