Monday, December 2, 2013

The bioRxiv — not just a preprint server for biology

The physical sciences have long had preprint archives, notably the arXiv (founded in 1991), which is managed by Cornell University Library. Bioinformaticians have been active users of these archives, at least partly because getting mathematical papers published can take up to 2 years (see Backlog of mathematics research journals). Bioinformatics moves faster than that. There have also been more general preprint services, such as Nature Precedings, which operated from 2007 to 2012.

There have recently been moves afoot to provide similar services specifically for biologists, and the beta version of the bioRxiv has now come online:
bioRxiv (pronounced "bio-archive") is a free online archive and distribution service for unpublished preprints in the life sciences. It is operated by Cold Spring Harbor Laboratory, a not-for-profit research and educational institution. By posting preprints on bioRxiv, authors are able to make their findings immediately available to the scientific community and receive feedback on draft manuscripts before they are submitted to journals.
Many research journals, including all Cold Spring Harbor Laboratory Press titles, EMBO Journal, Nature journals, Science, eLife, and all PLOS journals allow posting on preprint servers such as bioRxiv prior to publication. A few journals will not consider articles that have been posted to preprint servers.
Preprint policies are summarized here: List of academic journals by preprint policy.

Many people seem to see the principal role of archives such as this as bridging the publication delay caused by the peer-review process (see The case for open preprints in biology for a summary of the argument). Indeed, much of the online discussion of preprints in biology seems to be about why biologists have not taken to preprints like ducks to water, asking the rhetorical question: "What are biologists afraid of?" This question presupposes that everyone should use preprints unless there is a good reason not to, rather than the more obvious assumption that no one will use them unless there is a good reason to do so. On the whole, bypassing a peer-review delay of a few months (as is typical in biology) hardly seems like a sufficient incentive for mass usage of preprints.

However, there does seem to be a possible incentive beyond breakneck speed. An equally important point is that archives act as a powerful means of making unpublished work available online. Even if a particular manuscript is ultimately never published in a journal or book, it will still be available in the archive in its final draft form, since the archives are intended to be permanent repositories. That is, the archives are not only for preprints.

There are many reasons why some work never gets formally published, including incompleteness of the data, negative results, lack of perceived profundity, and being out of synch with current trends. If there is nothing inherently faulty about a manuscript, then there is no reason for it to remain unavailable to interested readers. We are no longer beholden to the publishers (or to the referees) for disseminating our data and/or ideas, although we may still prefer formal publication as the primary conduit.

For example, I started using the arXiv after it added a section on "Quantitative Biology" in 2003. I have several manuscripts in the arXiv that, for one reason or another, have not (yet) made it into print:
  • Morrison DA (2005) Counting chickens before they hatch: reciprocal consistency of calibration points for estimating divergence dates. arXiv
  • Morrison DA (2005) Bayesian posterior probabilities: revisited. arXiv
  • Jenkins M, Morrison DA, Auld TD (2005) Estimating seed bank accumulation and patterns in three obligate-seeder Proteaceae species. arXiv
  • Morrison DA (2009) How and where to look for tRNAs in Metazoan mitochondrial genomes, and what you might find when you get there. arXiv
  • Kelk S, Linz S, Morrison DA (2013) Fighting network space: it is time for an SQL-type language to filter phylogenetic networks. arXiv
I do not see these manuscripts as in any way inferior to my published papers.

They have all been indexed by search engines such as Google, and they are thus available via Google Scholar (which also keeps track of citations of preprint papers), as well as via professional sites such as ResearchGate. In this sense, the data and ideas are just as "available" as they would be in any peer-reviewed publication, and potential "scholarly impact" is not compromised. Indeed, Twitter mentions of arXiv papers are recognized as being a powerful means of disseminating their content, irrespective of later publication (see How the scientific community reacts to newly submitted preprints: article downloads, Twitter mentions, and citations). I even know of bioinformatics papers that were still being cited via the online preprint (labeled as a "Technical Report") long after they finally made it into print.

So, preprint archives are a valuable tool for academics, especially when those pesky referees are not being co-operative.

PS. This is post number 200 for this blog.

