Wednesday, March 7, 2012


Last week my attention was drawn to the forthcoming conference RECOMB-AB 2012, the First RECOMB Satellite Conference on Open Problems in Algorithmic Biology:
“RECOMB-AB brings together leading researchers in the mathematical, computational, and life sciences to discuss interesting, challenging, and well-formulated open problems in algorithmic biology.”

As someone working in the field of “algorithmic biology” (which, I guess, could be defined as the application of techniques from computer science, discrete mathematics, combinatorial optimization and operations research to computational biology problems) I was, predictably, immediately enthusiastic about the conference.  

However, what really caught my attention was the following paragraph:

“The discussion panels at RECOMB-AB will also address the worrisome proliferation of ill-formulated computational problems in bioinformatics. While some biological problems can be translated into well-formulated computational problems, others defy all attempts to bridge biology and computing. This may result in computational biology papers that lack a formulation of a computational problem they are trying to solve. While some such papers may represent valuable biological contributions (despite lacking a well-defined computational problem), others may represent computational 'pseudoscience.' RECOMB-AB will address the difficult question of how to evaluate computational papers that lack a computational problem formulation.”

Calls-for-participation rarely strike such a negative tone. However, in this case I think the conference organizers have highlighted an extremely important point. Problems arising in computational biology are inherently complex and this entails a bewildering number of parameters and degrees of freedom in the underlying models. Furthermore, it is commonplace for computational biology articles to utilize a large number of intermediate algorithms and software packages to perform auxiliary processing, and this further compounds the number of unknowns (and the inaccuracies) in the system.

All this is, to a certain extent, inevitable. However, this complexity sometimes seems to have become an end in itself. This would be harmless except for the fact that scientists subsequently attempt to draw biological conclusions from this mass of data. Rarely is the question asked: is there actually any “biological signal” left amongst all those numbers? Would we have obtained similar results if we had just fed random noise into the system?
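
One simple way to pose the random-noise question is a permutation test: rerun the same analysis on data whose labels have been shuffled, and count how often the shuffled runs score at least as well as the real one. The Python sketch below is only an illustration, under assumptions that are mine and not taken from any particular paper: `pipeline_score` is a hypothetical stand-in for an entire analysis pipeline (here it is just a difference of group means), and the toy numbers are invented.

```python
import random
from statistics import mean


def pipeline_score(values, labels):
    """Hypothetical stand-in for a full analysis pipeline.

    Returns the absolute difference between the means of the two
    labelled groups (labels are 0 or 1).
    """
    group_a = [v for v, g in zip(values, labels) if g == 0]
    group_b = [v for v, g in zip(values, labels) if g == 1]
    return abs(mean(group_a) - mean(group_b))


def permutation_p_value(values, labels, n_permutations=1000, seed=0):
    """Estimate how often shuffled labels score at least as well as the real ones."""
    rng = random.Random(seed)
    observed = pipeline_score(values, labels)
    shuffled = list(labels)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # destroys any real link between values and labels
        if pipeline_score(values, shuffled) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Invented toy data: two clearly separated groups of five measurements.
    values = [1.0, 1.1, 0.9, 1.2, 1.0, 3.0, 3.1, 2.9, 3.2, 3.0]
    labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
    print(permutation_p_value(values, labels))
```

If the shuffled runs routinely match the real score, the output of the pipeline says little about the biology, however sophisticated the intermediate steps were.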

The fact that these questions are not posed is directly linked to the lack of a clear and explicitly articulated optimization criterion. In other words: just what, exactly, are we trying to optimize? What makes one solution “better” than another? What, at the end of the day, is the question that we are trying to answer? This is exactly what RECOMB-AB is getting at with the sentence, “This may result in computational biology papers that lack a formulation of a computational problem they are trying to solve”. The wording might be slightly formal, but the point it raises is nevertheless fundamental.

It remains to be seen what kind of role phylogenetic networks will play at RECOMB-AB, if any. Certainly, the field of phylogenetic networks continues to generate a vast number of fascinating open algorithmic problems. But are the underlying biological models precise enough to allow us to say that we are actually producing biologically meaningful output? Overall, I think the answer is still no. Nevertheless, I think that there is reason for optimism. The field is young and evolving, and it is likely that both biologists and algorithmic scientists will have a significant role in shaping its future. Hopefully this interplay will allow us to move forward on the biological front without losing sight of the need for explicit optimization criteria.
