Monday, February 25, 2013

One year of network blogging


Today is the first anniversary of starting this blog, and this is post number 120. So, a big thank you to all of our visitors over the past year. We hope that the next year will be as productive as this past one has been.

We have summarized here some of the accumulated data, in order to document at least some of the productivity.

As of this morning, there have been 29,316 pageviews, for a median of 70 per day, but with a range of 3-667 pageviews. The daily pattern for the year is shown in the first graph.

Line graph of pageviews through time, up to today.
The largest value (Day 224) is off the graph.

The erratic nature of the daily variation is apparently all too typical of blogs, and there appears to be no good explanation for it. So, we might take this as a good example of the stochastic nature of the web. Nevertheless, some general patterns are detectable. For example, the steady rise from one third of the way through the year is very gratifying, although the slight dip right at the end is less so. The recent mean pageview data are:
Period                    Mean pageviews per day
October – November        90
December                  130
Christmas – New Year      90
January – mid February    130
late February             90

Some of the sharp peaks in the graph were due to various identifiable events, including the email announcing the existence of the blog, the addition of the blog to the Systematic Biology homepage, the mention of the blog in some posts at the Scientopia blog, and the mention of some of the posts in the monthly Carnival of Evolution blog roundup.

The biggest peak (which goes off the graph) was due to hosting an edition of the Carnival of Evolution, which generated an extra 2,000 pageviews. There were also unexpected Twitter announcements for particular posts, including the fourth Tattoo post (which got picked up when it happened to go out on April Fool's Day) and the one on Scotch Whiskies, which is apparently a topic of widespread interest.

There are also other general patterns in the data, the most obvious one being the day of the week, as shown in the second graph. The posts have usually been on Mondays and Wednesdays, and these two days have had the greatest mean number of pageviews (84 and 90, respectively). The other weekdays have had somewhat fewer (Tuesday 82, Thursday 75, Friday 65), and the weekend fewer still (Saturday 50, Sunday 63).

Boxplot of the daily pageviews, up to last Friday.
The largest value has been excluded.
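For anyone who wants to produce a similar day-of-the-week summary for their own blog, the grouping can be sketched in a few lines of Python. This is only a minimal illustration: the function name `weekday_means` is our own invention, and the sample counts are made up for the example, not taken from this blog's counter.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

# Fixed English names, indexed by date.weekday() (Monday = 0).
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def weekday_means(start, daily_pageviews):
    """Return the mean pageviews for each weekday.

    `daily_pageviews` is a list of counts for consecutive days,
    the first of which falls on the date `start`.
    """
    by_day = defaultdict(list)
    for offset, views in enumerate(daily_pageviews):
        day = start + timedelta(days=offset)
        by_day[WEEKDAYS[day.weekday()]].append(views)
    return {name: mean(values) for name, values in by_day.items()}


# Invented counts for two consecutive weeks, starting on Monday 4 February 2013.
sample = [84, 82, 90, 75, 65, 50, 63,
          86, 80, 92, 77, 67, 48, 61]
means = weekday_means(date(2013, 2, 4), sample)
busiest = max(means, key=means.get)
```

The same grouping could also feed a boxplot, since `by_day` holds the full list of counts for each weekday before they are averaged.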

There were also a few instances of what appear to be "rogue" visits during late December and early January. These involved an almost instantaneous addition of c.100 pageviews, without obvious explanation, which presumably came from bots examining the blog. They occurred once the blog reached 100 posts, which may not be coincidental.

The posts themselves have varied greatly in popularity, as shown in the next graph. It is actually a bit tricky to assign pageviews to particular posts, because visits to the blog's homepage are not attributed by the counter to any specific post. Since the current two posts are the ones that appear on the homepage, these posts are under-counted until they move off the homepage (after which they can be accessed only by a direct visit to their own pages, and thus always get counted). On average, 33% of the blog's pageviews are to the homepage, rather than to a specific post page, and so there is considerable under-counting.

Scatterplot of post pageviews through time, up to today; the line is the median.
Note the log scale, and that the values are under-counted (see the text).

The fact that 33% of the blog's pageviews are to the homepage means that one-third of the visitors are reading the blog as the posts are posted, while two-thirds are visiting via web searches and external links. So, we do have a regular readership, as well as having itinerant visitors.
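To put a rough number on the under-counting, the arithmetic can be sketched as follows. The total and the 33% share are the figures quoted above; the helper name `split_pageviews` is hypothetical, and the result is only approximate because the share is an average over the year.

```python
def split_pageviews(total_views, homepage_share):
    """Split a pageview total into homepage views and post-page views.

    `homepage_share` is the fraction of pageviews that go to the homepage
    and are therefore not attributed to any specific post.
    """
    unattributed = round(total_views * homepage_share)
    attributed = total_views - unattributed
    return unattributed, attributed


total = 29_316   # pageviews for the year, as quoted in this post
share = 0.33     # average share of pageviews that go to the homepage
unattributed, attributed = split_pageviews(total, share)
```

On these figures, somewhere around 9,700 of the year's pageviews cannot be assigned to any particular post.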

It is good to note that the most popular posts were scattered throughout the year. Keeping in mind the under-counting, the top collection of posts (with counted pageviews) have been:
Post no.  Title                                             Pageviews
73        Carnival of Evolution Number 52                   1,559
42        Charles Darwin's unpublished tree sketches        1,302
19        Tattoo Monday IV                                  737
49        Evolutionary trees: old wine in new bottles?      687
10        Why do we still use trees for the dog genealogy?  666
58        Who published the first phylogenetic tree?        606
98        Faux phylogenies                                  600
26        Steven Jay Gould was wrong                        429
67        Metaphors for evolutionary relationships          420
17        Tattoo Monday III                                 415
29        Network analysis of scotch whiskies               414
2         The first phylogenetic network (1755)             403
35        Tattoo Monday V                                   394

This blog has two possible uses: (i) providing an outlet for commentaries and ideas by professionals; and (ii) advertising phylogenetic networks to a wider audience. It has turned out that the posts for a wider audience have appeared mostly on Mondays and the posts for professionals mostly on Wednesdays. Furthermore, it seems reasonable for the professional posts to have fewer pageviews, since the expected audience is much smaller (or "more select", as we prefer to see it).

There have been five main types of posts:

(i) Discussions of methodology
These are the mainstay of the blog for those who are professionally interested in phylogenetic networks. A wide range of topics has been discussed, and there is plenty more that can be said.

If anyone wants to contribute to this part of the blog, then we welcome guest bloggers. This is a good forum to try out all of your half-baked ideas, in order to get some feedback, as well as to raise issues that have not yet received any discussion in the literature. If nothing else, it is a good place to be dogmatic without interference from a referee!

As a blogger, you are very likely to get feedback from people, even if they do not leave comments on the blog itself. Professionals do not yet seem to be very used to writing blog comments, but they will send you an email.

(ii) Explanations
There are all sorts of things that seem obvious to professionals but which are obscure to non-experts. These posts are designed to redress this situation, so that there is somewhere on the web for people to go when they want to find out. They seem to have been rather popular posts.

(iii) Data analyses
The exploratory data analyses (EDA) are intended to illustrate the usefulness of networks as data summaries (as opposed to their use for strictly evolutionary analyses). In particular, choosing datasets outside science advertises the potential uses of scientific data analysis to a wider public. Networks provide a valuable way of visualizing a table of numbers -- so, any time you see such a table you should be tempted to find out whether a network will help people to picture what it says. Most of the analyses have proved quite popular in terms of pageviews, but there has been little feedback about whether the public understands any of it.

(iv) Historical commentaries
These have usually been among the most popular posts with visitors. They simply involve bits of information that have accumulated through time, and the blog seems to be a good place to put them. They often involve phylogenetic trees, rather than networks, but that is only because trees have been used more often and thus have more history. Mind you, you have to have a good title in order to attract the public's attention!

(v) Miscellaneous
These are uncategorizable posts, which just consist of things that relate in some way to phylogenetic analysis, however peripherally. There are almost no other phylogenetics blogs on the web, and so there is no other obvious outlet for this information. The most popular of these posts have been the ones compiling the various pictures of phylogenetic tattoos that are lying around the web -- these are the most common Google search hits to the blog, along with the first compilation of Darwin's unpublished tree sketches.

Along with these posts, we have also started compiling a list of datasets that will be useful for evaluating network algorithms. Such datasets, where biologists seem to have an independently validated idea about the phylogenetic pattern, are hard to come by, and so it is worthwhile to make them available at a centralized location. A blog page is as good as anywhere else for this purpose, and the number of visits to this page is quite steady. Contributions of datasets are always welcome.

Finally, the audience for the blog has been, not unexpectedly, firmly in the USA. Based on the number of pageviews, the data are:
Country           Share of pageviews
United States     37.4%
United Kingdom    6.6%
Germany           5.3%
Russia            4.7%
Canada            4.0%
France            2.7%
Australia         2.3%
New Zealand       1.7%
Netherlands       1.6%
Sweden            1.5%
You will note that this list is dominated by English-speaking countries. The blog does have a link to Google Translate to help other people, but it is clear that the audience is made up almost entirely of those people who are comfortable with English (or Australian, at any rate).
