First, we have to consider what is meant by the loaded term complexity. I would like to propose a definition of biological complexity related to the Kolmogorov complexity of computer science, which, for a given string of data, is the length of the shortest binary computer program that outputs the string. What is the equivalent measure for biology?
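Kolmogorov complexity itself is uncomputable, but the length of a losslessly compressed representation gives a computable upper bound and conveys the intuition. The sketch below (my own illustration, not from the text) uses Python's standard zlib compressor as that proxy: a highly patterned string compresses to a few bytes, while a random string of the same length barely compresses at all.

```python
import random
import zlib

def compression_proxy(data: bytes) -> int:
    """Length of the zlib-compressed data: a computable upper bound
    (up to an additive constant) on the Kolmogorov complexity of data."""
    return len(zlib.compress(data, 9))

patterned = b"AB" * 500                                  # highly regular, 1000 bytes
rng = random.Random(0)
noisy = bytes(rng.randrange(256) for _ in range(1000))   # near-incompressible

# The regular string compresses drastically; the random one does not.
print(compression_proxy(patterned), compression_proxy(noisy))
```

The gap between the two numbers is the point: "complexity" here measures how much irreducible description a system demands, which is the question being posed for biological circuits.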
More specifically, let’s consider the case of a biological circuit. These circuits regulate the state of a cell in response to a given set of input signals. The parts of the circuit are a wide variety of specific proteins, whose synthesis and function are modified by many different mechanisms. One distinction of biological relative to engineered circuits is that every protein is different, so there are no standardized parts. Proteins interact to modify one another’s function and eventually produce an output, i.e., they change the state of the cell by changing its transcriptional program and thereby the concentrations of its constituent proteins.
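To make the abstraction concrete, here is a deliberately minimal sketch of such a circuit: an input signal drives production of a transcription factor, which in turn drives an output gene. The two-gene cascade, the Hill-type activation kinetics, and every parameter value are hypothetical, chosen only to illustrate how a signal propagates to a change in protein concentration.

```python
# Toy two-gene cascade: signal S activates transcription factor A,
# which activates output gene B. All parameters are hypothetical.

def hill(x, K, n):
    """Hill activation: fraction of maximal transcription rate."""
    return x**n / (K**n + x**n)

def simulate(S, steps=2000, dt=0.01):
    """Forward-Euler integration of the cascade to (near) steady state."""
    A = B = 0.0
    for _ in range(steps):
        dA = hill(S, K=1.0, n=2) - 0.5 * A   # synthesis minus degradation
        dB = hill(A, K=0.5, n=2) - 0.5 * B
        A += dA * dt
        B += dB * dt
    return A, B

# A strong input drives both proteins to a high steady state;
# without the signal, the circuit stays off.
print(simulate(S=5.0))
print(simulate(S=0.0))
```

Even this two-node caricature shows the structure of the question: the output concentration is a function of the input mediated by a chain of protein interactions, and the real issue is how many such interactions must be retained in the model.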
For a given circuit, it remains an open question how many genes can legitimately be considered to be involved. In yeast, genetic maps constructed with high-throughput robotics have identified literally millions of interactions; these are shown in the famous ‘hairball’ plots one frequently encounters. So, is that the answer? Do we need to know everything about everything to make sense of it all? Do we need a map whose legend states that 1 mile = 1 mile to understand the cell? If so, what kind of understanding would that entail? A depressing prospect: being drowned in a sea of information.
So, we want and need reduced models to better understand the functions of a biological circuit. Intuitively, this is what the more classical cell biologists and biochemists have been constructing. Classical mutagenesis tends to focus on mutations to critical genes that severely hamper cell fitness. Thus, inadvertently, the community may have been studying the circuit elements containing the most information about the process. However, several questions arise. First, where do you draw the line and exclude more peripheral elements? And second, is this a legitimate endeavor?
Consider a simple example, where the dynamics of a circuit are dominated by a few interactions, each of strength O(1). There is a second handful of interactions of strength O(epsilon), with epsilon << 1, whose exclusion introduces a negligible error of O(epsilon). However, if the number of weaker interactions is large, as the hairball threatens, we could get into serious trouble. If the number is N ~ 1/epsilon, we would probably still be ok in our approximation, owing to the surprising tendency of asymptotic analysis to give fundamental insight even when it shouldn’t. However, if N >> 1/epsilon, things would start looking quite terrible from the point of view of dimensional reduction. The genetic tractability of circuits and the progress already made argue against the latter case, but the intermediate case is not yet ruled out, and relatively few circuits have been studied in sufficient detail (i.e., the drunk looking for his lost keys at night under the street lamp, because that is where the light is). So, we are left with the following tentative definition of information: a sum over the relevant biochemical interactions, each weighted by its relative impact on the output. Obvious issues left to be resolved concern the definition of ‘relative impact’ and how to quantitatively define the information content of a circuit output.
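The scaling argument above can be illustrated with a back-of-the-envelope simulation. Assuming, as a worst case, that the N neglected interactions of strength ~epsilon contribute errors that simply add (the function and its parameters are hypothetical), the truncation error stays negligible for a handful of terms, is already O(1) when N ~ 1/epsilon, and swamps the reduced model when N >> 1/epsilon:

```python
import random

def truncation_error(N, eps, seed=0):
    """Total error from dropping N weak interactions of strength ~eps,
    assuming (hypothetically) that their contributions simply add."""
    rng = random.Random(seed)
    return sum(rng.uniform(0, eps) for _ in range(N))

eps = 1e-3
# A handful of weak terms: the error stays O(eps) -- safe to neglect.
print(truncation_error(10, eps))
# N ~ 1/eps: the neglected terms already sum to O(1).
print(truncation_error(1_000, eps))
# N >> 1/eps: the 'reduced' model misses most of the dynamics.
print(truncation_error(100_000, eps))
```

This also shows why the intermediate case N ~ 1/epsilon is delicate: the naive worst-case sum is O(1), so the approximation survives only if the weak terms partially cancel or if the asymptotics are kinder than they have any right to be.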
Further understanding of information transmission in biology will require new quantitative imaging tools to better measure chemistry at the level of single cells, where it actually takes place. Integration of these techniques with the rapidly increasing amounts of cell biological and genetic knowledge promises rapid advances in our understanding of information transmission and generation in cellular systems. The rapid technical progress in biology over the past decades leads one to expect significant advances on these fundamental issues within the next decade.