Monday, July 23, 2018

Sequence alignment is still an open computational problem


I recently submitted an invited manuscript about multiple sequence alignment to a bioinformatics journal, but it did not fare well with the reviewers (ominously, there were more than the usual two, and it took a couple of months to get the reviews). The bioinformatics referees simply rejected the notion that a multiple alignment is an object in its own right, which is the basic premise of the manuscript.

To explain this: in the normal tabular arrangement of a multiple sequence alignment, the historical relationships among the rows (the taxa) are drawn as a phylogeny, while the historical relationships among the columns represent the homologies among the characters. There is no necessary reason for the phylogeny relationships to take priority over the homology relationships. However, phylogenies are much more prominent in the literature; indeed, sequence alignment is often seen as nothing more than a pesky step on the way to getting a phylogeny.

However, if we accept the notion that homology relationships are both important and interesting in their own right, then multiple sequence alignment is certainly still an open computational problem, because most current automated sequence alignments do not represent homology relationships. Instead, they represent sequence similarity of various sorts, and thus they represent homology only to the extent that similarity reflects history. In fact, similarity = homology + analogy, and the latter is not trivial.
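To make the distinction concrete, here is a minimal Python sketch of a similarity-based objective, assuming a simple sum-of-pairs identity score. The function name, the toy sequences and the two candidate alignments are all invented for illustration; they are not taken from the manuscript or from any particular alignment program.

```python
from itertools import combinations

GAP = "-"


def sum_of_pairs_identity(alignment):
    """Count identical residue pairs, column by column (gaps never match).

    Hypothetical helper for illustration: a bare-bones similarity score,
    not the objective function of any specific alignment program.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all rows of an alignment must have the same length")
    score = 0
    # zip(*alignment) walks the table by columns (characters), not rows (taxa).
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            if a == b and a != GAP:
                score += 1
    return score


# Two hypothetical alignments of the same three toy sequences
# (ACGTA, ACGTTA, ACTTA); only the gap positions differ.
alignment_a = ["ACGT-A", "ACGTTA", "AC-TTA"]
alignment_b = ["ACGTA-", "ACGTTA", "ACTTA-"]

for name, aln in (("A", alignment_a), ("B", alignment_b)):
    print(name, sum_of_pairs_identity(aln))
```

A score of this kind will prefer whichever arrangement matches more characters, but nothing in the calculation refers to the historical origin of those characters. An identical pair that arose by common descent (homology) and one that arose independently (analogy) contribute exactly the same amount.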

I have previously written about the topic of alignment-as-homology for the biological audience:
  • Morrison DA (2015) Is multiple sequence alignment an art or a science? Systematic Botany 40: 14-26.
  • Morrison DA, Morgan MJ, Kelchner SA (2015) Molecular homology and multiple sequence alignment: an analysis of concepts and practice. Australian Systematic Botany 28: 46-62.
This new manuscript is intended to be the equivalent for the bioinformatics audience, explaining why homology ≠ similarity, and therefore why the current alignment algorithms are inadequate.

Rather than let it languish, and since it is likely to be the last single-author paper that I ever write, I tried to add it to the bioRxiv repository, for everyone to read. Sadly, their reviewers decided that it is insufficiently original, being merely a summary of existing information. So, I guess that they are not impressed by the novel ideas, either.

I also tried the arXiv, which may seem to be more appropriate, given the audience, but they no longer recognize my user account, which means that the manuscripts I have there now exist in limbo. The world is apparently against my manuscript!

[ Note: This issue has now been resolved, and the manuscript can be accessed as arXiv:1808.07717 ]

So, I am linking the paper here, instead:
Please have a look; and if you think it is worth it, then please spread the word. Moreover, if you are computationally inclined, then feel free to be inspired to tackle the problem described therein.

PS. I also once wrote a brief blog post about this:
