Wednesday, May 6, 2015

Pattern and process: computation and biology

It is obvious that there is a big cultural difference between biologists and computationalists, irrespective of whether we think its a good idea or not. This follows simply from the nature of the activities in the two professions — the activities are different and therefore different personalities are attracted to those professions.

Some of these differences are well known. For example, computations require algorithmic repeatability, along with proof that the algorithms achieve the explicitly stated goal. This means that computationalists have to be pedants in order to succeed. On the other hand, no-one can be pedantic and succeed in biology. Biodiversity is a concept that makes it clear that there are no rules to biological phenomena — any generalization that you can think of will turn out to have numerous exceptions. In the biological sciences we do not look for universal "laws" (as in the physical sciences), because there are none; and if you can't handle that fact then you should not try to become a biologist.

This leads to a further difference between the two professions that I think is sometimes poorly appreciated. In general, computationalists focus on patterns, whereas biologists focus on processes. Many processes can produce the same patterns, and therefore the same computations can be used to detect those patterns; and this is of interest to people who are developing algorithms. On the other hand, in biology processes can produce many different patterns, so that patterns are often unpredictable. Biologists are aware that patterns and processes can be poorly connected, and the biological interest is primarily on understanding the processes, because these are frequently more generalizable than are the patterns.

As a simple example of this dichotomy, consider the following diagram (from Loren H. Rieseberg and Richard D. Noyes. 1998. Genetic map-based studies of reticulate evolution in plants. Trends in Plant Science 3: 254-259). It shows the eight haploid chromosomes of a particular plant species.

Perusal of the figure will lead you to identify the pattern, and this is straightforward to detect computationally. Each chromosomal segment is triplicated, but the triplicates are arranged arbitrarily and are sometimes segmented.

On its own this is of little biological interest. The interest lies in the processes that led to the pattern. These processes could produce an infinite number of similar patterns, and so predicting the exact pattern in this species is impossible. We use abduction to proceed from the pattern to the processes (see What we know, what we know we can know, and what we know we cannot know).

We appear to be looking at a case of allopolyploidy (the nuclear genome is hexaploid) followed by recombination. Neither of these processes necessarily produces patterns that can be predicted in detail.

So, the computation focuses on the pattern and the biology on the process. Sometimes biologists forget this, and naively interpret patterns as inevitably implying a particular process. And sometimes computationalists naively expect patterns to be predictable when they are not.

No comments:

Post a Comment