The Music Genome Project is a database in which 1 million pieces of music (currently) have been coded for 450 distinct musical characteristics. The main use of the database at the moment is to provide the data from which predictions can be made about which other pieces of music might appeal to listeners of any nominated musical set; this is implemented in the Pandora Radio product.
The use of the word "genome" is an analogy, in which the set of musical characteristics is seen as creating a sort of genetic fingerprint for a song. According to one of the originators, Nolan Gasser:
The basic idea ... was to see if we could approach music from almost a scientific perspective; that's why it's called the Music Genome Project, named not accidentally after the Human Genome Project.
I've always taken that metaphor very seriously: biologists have come to understand the human species by identifying all the individual genes in our genome; it's then how each individual gene is manifest or expressed that makes us who we are as individuals, as well as defines how we're related to others: most closely to those in our family, but also indirectly to people who share our same physical attributes or capabilities in sports, and so forth.
That orientation was paramount to my thinking in designing the Music Genome Project.
There seems to be a major misunderstanding here, since the mere idea of atomizing something does not make the atoms genes. After all, the idea behind the Project is simply one of taking music apart and evaluating it by its acoustic elements.
The first problem is that the study of musical attributes is clearly a study of phenotype, not genotype, as Gasser alludes to in the quote above: there are no hereditary units in music. Unfortunately, phenotype and genotype are frequently confused in the social sciences, with serious consequences when the wrong analogy is used (see the blog post False analogies between anthropology and biology). As noted by LessWrong user jmmcd:
I think the Music Genome project is misleadingly-named. A genome is generative: there is a mapping from a genome to an organism. There is no reverse mapping. In the case of music, there is a reverse mapping from a piece of music to these 400 odd features, but there's no forward mapping ... Knowledge of a phenotype is not constructive, because there are many ways of constructing that phenotype; a genotype is unique, and is thus constructive.
Equally importantly for the Music Genome Project, the musical attributes themselves cannot easily be related to genes as a metaphor; they are simply observed features of the music. The attributes cover musical ideas such as genre, type of instruments, type of vocals, tempo, etc. Most of these attributes are objective and observable (e.g. vocal duets, acoustic guitar solo, percussion, triple meter style, etc.), although there are some that are more nuanced (e.g. driving shuffle feel, wildly complex rhythm, epic buildup / breakdown, etc.) and thus involve expert subjective judgment. The attributes are coded on a 10-point scale for the "amount" of each attribute.
Given the quantitative nature of the attributes, the only possible analogy with genetics is that of gene expression, not the genome itself (as Gasser also alludes in the quote above). This is a very different metaphor, at least to a biologist. The power of a metaphor is that if it is a good one then it can give you insights that you might not otherwise have; the danger is that a false metaphor will probably lead you up the garden path. In this case, the genome analogy does seem to lead people astray, because they think that Pandora is picking "related" music in a genealogical sense (a "family resemblance") when it is doing no such thing. After all, trying to construct a phylogeny from gene expression data is not something that biologists have attempted.
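To make the distinction concrete, here is a minimal sketch in Python of what similarity-based selection on scored attributes might look like. It is an illustration under my own assumptions, not a description of Pandora's actual algorithm: the song names, attribute names, scores, and the choice of Euclidean distance are all hypothetical. The point is only that the ranking rests on overall resemblance between observed features, and says nothing about descent.

```python
import math

# Hypothetical songs, each scored on a few attributes (1-10 scale).
# All names and values here are invented for illustration.
SONGS = {
    "song_a": {"acoustic_guitar": 8, "vocal_duet": 2, "triple_meter": 1, "driving_shuffle": 1},
    "song_b": {"acoustic_guitar": 7, "vocal_duet": 3, "triple_meter": 2, "driving_shuffle": 1},
    "song_c": {"acoustic_guitar": 1, "vocal_duet": 1, "triple_meter": 9, "driving_shuffle": 8},
}


def distance(a, b):
    """Euclidean distance between two attribute profiles (an assumed metric)."""
    keys = sorted(set(a) | set(b))
    return math.sqrt(sum((a.get(k, 0) - b.get(k, 0)) ** 2 for k in keys))


def most_similar(seed, songs):
    """Rank every other song by how closely its observed attributes match the seed."""
    others = [name for name in songs if name != seed]
    return sorted(others, key=lambda name: distance(songs[seed], songs[name]))


if __name__ == "__main__":
    # Songs are ordered by overall resemblance (a phenetic comparison);
    # nothing in the calculation refers to ancestry or inheritance.
    print(most_similar("song_a", SONGS))
```

Two songs can end up close together in such a ranking simply because they share observable features, whether or not one style is historically derived from the other.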
Thus, if the Music Genome Project did live up to its name then it would be a very valuable thing for musical anthropologists, because then it would be possible to reconstruct a phylogeny of music. Indeed, such a thing has been proposed for popular music: The Music Phylogeny Project. Furthermore, such phylogenies have already been constructed: A Phylogenetic Tree of Musical Style. In the latter case, the author notes: "Needless to say, the tree is not automatically produced by the raw data itself, but by my own interpretation of the data", which gives you some idea of the technical problems involved.
Finally, I will note that what I have said above applies to the other projects based on a supposed analogy with the Human Genome Project. These include the Book Genome Project and the Game Genome Project. Indeed, the blurb for the Book Genome Project makes it sound even more wildly inappropriate:
The genomic analogy is imperfect but useful nevertheless: we defined the three elements of Language, Story, and Character as the literary equivalent of DNA and RNA classifications. Each gene category contains its own subset of measurements specific to its branch of the book genome structure ... Each individual book produces 32,162 genomic measurements.