Wednesday, March 27, 2013

The Music Genome Project is no such thing

The Music Genome Project is a database in which 1 million pieces of music (currently) have been coded for 450 distinct musical characteristics. The main use of the database at the moment is to provide the data from which predictions can be made about which other pieces of music might appeal to listeners of any nominated musical set; this is implemented in the Pandora Radio product. This seems like a valuable idea.

However, the use of the word "genome" is an analogy, in which the set of musical characteristics is seen as creating a sort of genetic fingerprint for a song. According to one of the originators, Nolan Gasser:
The basic idea ... was to see if we could approach music from almost a scientific perspective; that's why it's called the Music Genome Project, named not accidentally after the Human Genome Project.
     I've always taken that metaphor very seriously: biologists have come to understand the human species by identifying all the individual genes in our genome; it's then how each individual gene is manifest or expressed that makes us who we are as individuals — as well as defines how we're related to others: most closely to those in our family, but also indirectly to people who share our same physical attributes or capabilities in sports, and so forth.
     That orientation was paramount to my thinking in designing the Music Genome Project.
There seems to be a major misunderstanding here, since the mere idea of atomizing something does not make the atoms genes. After all, the idea behind the Project is basically one of taking music apart and evaluating it by its acoustic elements.

The first problem is that the study of musical attributes is clearly a study of phenotype, not genotype, as Gasser alludes to in the quote above: there are no hereditary units in music. Unfortunately, phenotype and genotype are frequently confused in the social sciences, with serious consequences when the wrong analogy is used (see the blog post False analogies between anthropology and biology). As noted by LessWrong user jmmcd:
I think the Music Genome project is misleadingly-named. A genome is generative: there is a mapping from a genome to an organism. There is no reverse mapping. In the case of music, there is a reverse mapping from a piece of music to these 400 odd features, but there's no forward mapping ... Knowledge of a phenotype is not constructive, because there are many ways of constructing that phenotype; a genotype is unique, and is thus constructive.
Equally importantly for the Music Genome Project, the musical attributes themselves cannot easily be related to genes as a metaphor — they are simply observed features of the music. The attributes cover musical ideas such as genre, type of instruments, type of vocals, tempo, etc. Most of these attributes are objective and observable (e.g. vocal duets, acoustic guitar solo, percussion, triple meter style, etc), although there are some that are more nuanced (e.g. driving shuffle feel, wildly complex rhythm, epic buildup / breakdown, etc) and thus involve expert subjective judgment. The attributes are coded on a 10-point scale for the "amount" of each attribute.
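
To make the point concrete, here is a minimal sketch in Python of how songs coded this way could be compared. Everything in it is an assumption for illustration: the song names, the four attribute names, the scores, and the use of Euclidean distance. It is not a description of how Pandora actually makes its predictions.

```python
import math

# Hypothetical attributes, each scored on an assumed 1-to-10 scale.
ATTRIBUTES = ["acoustic_guitar", "vocal_duet", "triple_meter", "driving_shuffle"]

# Invented scores for three invented songs (not real data).
SONGS = {
    "song_a": [8, 2, 1, 1],
    "song_b": [7, 3, 2, 1],
    "song_c": [1, 9, 8, 5],
}


def distance(x, y):
    """Euclidean distance between two attribute vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def most_similar(name, songs):
    """Return the name of the other song whose scores are closest to `name`."""
    target = songs[name]
    candidates = [
        (distance(target, scores), other)
        for other, scores in songs.items()
        if other != name
    ]
    return min(candidates)[1]


print(most_similar("song_a", SONGS))  # prints "song_b"
```

The result says only that two songs have similar observed scores. Nothing in the calculation records where either song came from, so it is a comparison of phenotypes and not a statement about descent.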

Given the quantitative nature of the attributes, the only possible analogy with genetics is that of gene expression, not the genome itself (as Gasser also alludes to in the quote above). This is a very different metaphor, at least to a biologist. The power of a metaphor is that if it is a good one then it can give you insights that you might not otherwise have; the danger is that a false metaphor will probably lead you up the garden path. In this case, the genome analogy does seem to lead people astray, because they think that Pandora is picking "related" music in a genealogical sense (a "family resemblance") when it is doing no such thing. After all, trying to construct a phylogeny from gene expression data is not something that biologists have attempted successfully.

Thus, if the Music Genome Project did live up to its name then it would be a very valuable thing for musical anthropologists, because then it would be possible to reconstruct a phylogeny of music. Indeed, such a thing has been proposed for popular music: The Music Phylogeny Project. Furthermore, such phylogenies have already been constructed: A Phylogenetic Tree of Musical Style. In the latter case, the author notes: "Needless to say, the tree is not automatically produced by the raw data itself, but by my own interpretation of the data", which gives you some idea of the technical problems involved.

Finally, I will note that what I have said above applies to the other projects based on a supposed analogy with the Human Genome Project. These include the Book Genome Project and the Game Genome Project. Indeed, the blurb for the Book Genome Project makes it sound even more wildly inappropriate:
The genomic analogy is imperfect but useful nevertheless: we defined the three elements of Language, Story, and Character as the literary equivalent of DNA and RNA classifications. Each gene category contains its own subset of measurements specific to its branch of the book genome structure ... Each individual book produces 32,162 genomic measurements.
As noted by commentator CypherGames below, these projects would all be more accurately called Phenome Projects.


  1. "...there are no hereditary units in music."

    Sure there are -- phrases, riffs, etc. But these would be better termed "memes" than "genes".

    1. Mike, These would indeed be called "memes" by social scientists. My argument is that there are important differences between memes and genes (if memes do exist, which I personally doubt), and at best memes refer to phenotype not genotype. David

    2. "Meme" refers to the idea itself, not the outward expression of the idea, so it is meant to be analogous. The "meme" itself could be understood as the neural pattern that stores the idea, and any associated expression of the idea would be analogous to a phenotype. Our understanding of these neural patterns is dim, but so was our understanding of molecular patterns when "gene" was coined.

    3. Although I suppose my initial statement was incorrect, in that light -- the phrases and riffs themselves, as played, are not the memes, but the neural patterns that store them in aural memory are.

  2. Analogies. Inspiration. This is the problem with scientists. Solve real problems.

    1. Misunderstanding. Trolling. This is the problem with Anonymous. Get a job.

  3. I wonder if it would be more accurate to call it the Music Phenome Project?

  4. I think the metaphor is fine. What might be throwing you off is that the enormous project to map the human genome is astronomically bigger than simply identifying 400 song characteristics, so you are assuming that they are ignorantly conflating genome with phenome. In fact, they are working on both: (1) Mapping the genome and (2) using that map to describe every phenome. The HGP only does (1).

    The Music Genome "Project" is actually two projects: One is mapping the "music genome", and the other is using their music genome to describe every individual song on Earth.

    In short, the Music Genome Project is a project to actually *use* music's metaphorical genome, not just to map it, so the name is perfectly fine. The most accurate name would be to call it the "Music Genome and Phenome Project", but that would be pedantic and clumsy.

  5. So basically all this article does is point out the quite obvious fact that the MGP has very little connection to genes and that the naming is rather inaccurate.
    It is merely a publicity decision, since it sounds catchy and science-related, leading people to use the product.

  6. How was the Music Genome Project created?

    1. Wikipedia has some good information on that, as does this Ars Technica article:

  7. You are pretty serious there. Nice writing anyway...

  8. Thank you. I do find it important to make such fine distinctions in word usage, especially in the realm of science. However, I agree with Mike Keesey above. "there are no hereditary units in music" is a rather bold and almost certainly unresearched statement. Here's another important distinction: Unquantified is not the same as unquantifiable.

  9. If a phenotype is nothing more than a visible genotype then Pandora is on the correct track in their approach to developing a musical genome.
    Music is an art and therefore enjoyed because of its aesthetics. Aesthetics manifest from the observable traits of a song, "musical ideas such as... type of instruments, type of vocals, tempo, etc." Factoring any qualities beyond phenotypical traits would be pointless for an art since they won't manifest in the song until reaching the "organism" level of the emergent system, at which moment they become phenotypes.

    "After all, trying to construct a phylogeny from gene expression data is not something that biologists have attempted successfully."
    This hasn't been possible for biologists yet because it's difficult to fully answer the question "What are the traits of life?" and then work back to ground-level DNA blocks to figure out how to measure them (though this is an approach taken by many scientists to determine the function of a block of DNA). However, the question "What are the traits of music?" isn't nearly as far-fetched, because music is simple and its building blocks can be measured without breaking a sweat.

    I'm not completely on Pandora's side of the fence either. If you notice the quote above, I omitted "genre" from your list of traits because it's a human construct applied to songs rather than using the song's traits to classify its genre. If this was done correctly, building a legitimate family of music wouldn't be a problem. Pandora, sadly, falls short in their implementation of a beautiful idea by using "expert subjective [human] judgment".

    A true implementation would use a combination of machine learning, pattern recognition, and statistics to apply traits like "driving shuffle feel, wildly complex rhythm, epic buildup / breakdown, etc" to a song based on its underlying characteristics (dynamics, melody, instrumentation, texture, etc).

    P.S. "The attributes are coded on a 10-point scale for the "amount" of each attribute." This is a major flaw in their system of measuring, but not the concept itself.