The Genealogy of Music

Privacy Note:  SameGrain is a privacy-focused social media platform where anonymity is supported and promoted. The data presented in this blog is anonymized, having any attribution to individual users removed.

I recently came across an interesting infographic by Reebee Garofalo titled “The Genealogy of Pop/Rock Music” that shows the origins of the multitude of popular music subgenres born between 1955 and 1978. You can order the book Visual Explanations by Edward Tufte, which published the graphic, here or see it in digital form here. If you zoom in, you can see the birth of surf music in the early 1960s (and its death about 5 years later) and the splitting of pop folk into folk rock, country rock, and soft rock.

It’s a nice graphic that gives context and history to the similarities between songs and musical genres. However, the groupings of the various genres do not appear to have been data-driven, but are rather a product of author discretion. And so I decided to mine the SameGrain data to see what could be learned about the similarities and origins of music genres without applying any preconceived thoughts or notions.


SameGrain (currently available in the Apple App Store) introduces users to other users that are like them in millions of potential ways, including their music tastes. We asked our users to identify their favorite genres of music from the following 26 choices: blues, classical, classic rock, country, dance, disco, electric dance music, electronic, folk, funk, gospel, hard rock, hip-hop, indie, jazz, Latin, metal, opera, pop rock, rap, reggae, relaxation, soul, talk, urban, and world music. The completeness of this list is beyond the scope of this blog post.

Users’ answers to this question provide thousands of anonymized data points with which I can do a cluster analysis; in other words, using the data we can group genres that are similar to each other. I make the assumption that if a user identifies two genres as favorites, the two genres are similar in some way – be it instrumentally, tonally, whatever. This of course is not always a perfect assumption. I like both classical and rock music, but you’d have to go several hundred years into the past to find their common musical ancestor. But aggregated over thousands of responses, this assumption is – to first order – safe.

I used the Pearson product-moment correlation coefficient to measure the similarity between each pair of genres, and then hierarchically clustered the genres into groups by repeatedly merging the two most similar groups. The colorful figure with the circles is a graphical representation of that analysis. The circles are labeled by the musical genre they represent, and the circles’ positions in the plot identify the groupings: genres closer together are more strongly correlated with each other. For example, users who chose hard rock were more likely to also choose classic rock than those who didn’t. Therefore hard rock and classic rock are correlated in the data. Other correlated pairs include reggae and Latin, metal and hard rock, and funk and disco. Rap, pop rock, and hip-hop form a correlated trio. Genres like country and gospel are far away from other genres; while they are mildly correlated with each other, there is very little correlation between country and any other genre. In other words, a user’s preference for country music is not predictive of whether or not they like jazz or dance or any genre other than gospel.
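For the curious, here is a minimal sketch of that two-step recipe in plain Python. It is not the code behind the figure: the responses are made-up toy data, the function names are my own for illustration, and I am assuming "average linkage" (the similarity of two groups is the mean correlation between their members), which is only one of several reasonable ways to compare groups.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy responses (NOT SameGrain data): one set of favorites per user.
responses = [
    {"hard rock", "classic rock", "metal"},
    {"hard rock", "classic rock"},
    {"classic rock", "metal", "hard rock"},
    {"rap", "hip-hop", "pop rock"},
    {"rap", "hip-hop"},
    {"hip-hop", "pop rock"},
    {"country", "gospel"},
    {"country", "pop rock"},
]
genres = sorted(set().union(*responses))

def indicator(genre):
    """1 if a user picked the genre, 0 otherwise, in user order."""
    return [1 if genre in r else 0 for r in responses]

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

# Correlation for every pair of genres.
corr = {
    frozenset((g, h)): pearson(indicator(g), indicator(h))
    for g, h in combinations(genres, 2)
}

def group_similarity(a, b):
    """Average linkage (an assumption): mean correlation across the two groups."""
    pairs = [(g, h) for g in a for h in b]
    return sum(corr[frozenset(p)] for p in pairs) / len(pairs)

def cluster(names):
    """Repeatedly merge the two most similar groups; return the merge history."""
    groups = [(g,) for g in names]
    merges = []
    while len(groups) > 1:
        a, b = max(combinations(groups, 2), key=lambda p: group_similarity(*p))
        merges.append((a, b, group_similarity(a, b)))
        groups = [g for g in groups if g not in (a, b)] + [a + b]
    return merges

merges = cluster(genres)
for a, b, r in merges:
    print(f"{' + '.join(a)}  <->  {' + '.join(b)}  (r = {r:.2f})")
```

In this toy data every user who picked hard rock also picked classic rock and vice versa, so that pair is merged first. The merge history is all you need to draw a tree later.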

Genre Popularity

The size of the circles indicates the genre’s overall popularity. Pop rock is the most popular, with classic rock, indie (somewhat ironically), hip-hop, and country following close behind. Recall that country music is not well correlated with other genres, and yet it is fairly popular. That means it is popular with people of all music tastes; whether they like hip-hop, classic rock, indie, or the blues, they’re equally likely to like country.

Popularity by Age

Finally, the color of the circle is indicative of the popularity of the genre with each age group. If a genre is more popular with users under 30, it is more green in color, and if the genre is more popular with users over 30, it is blue. Rap is the genre that stands out most distinctly as being more popular with younger people, but the color stretch can be deceiving (I didn’t provide a color scale for this plot, sorry). In fact, a “young” person is only 5 times more likely to choose rap as a favorite genre than an “old” person (as a 31-year-old, I take a small offense to my own terminology of anyone over 30 being “old”). And while classical is so often identified as a genre for the “old,” I find that it is only marginally more popular with older generations. That said, the coloring of the dots does provide some historical information about the music genres. If you assume that the age of the generation that prefers the music is related to the age of the genre itself (not a terrible assumption), then rap, hip-hop, and indie are the newest genres of music. And since humans have been speaking for at least a couple hundred thousand years, perhaps it’s no surprise that talk is the “oldest” genre.
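To make the "times more likely" comparison concrete, here is a small sketch of how such a ratio could be computed. The user records and function names are invented for illustration, and I am assuming that someone who is exactly 30 falls in the older group, which the text above leaves open.

```python
# Hypothetical (age, favorite genres) records; NOT SameGrain data.
users = [
    (22, {"rap", "pop rock"}),
    (25, {"rap", "indie"}),
    (28, {"pop rock"}),
    (19, {"rap"}),
    (35, {"classical", "pop rock"}),
    (41, {"rap", "talk"}),
    (52, {"talk", "classical"}),
    (33, {"pop rock"}),
]

def pick_rate(genre, group):
    """Fraction of users in the group who chose the genre as a favorite."""
    if not group:
        return 0.0
    return sum(genre in favorites for _, favorites in group) / len(group)

def young_to_old_ratio(genre, records, cutoff=30):
    """How many times more likely the under-cutoff group is to pick the genre."""
    young = [u for u in records if u[0] < cutoff]
    old = [u for u in records if u[0] >= cutoff]  # assumption: 30 counts as "old"
    old_rate = pick_rate(genre, old)
    if old_rate == 0:
        return float("inf")
    return pick_rate(genre, young) / old_rate

for genre in ("rap", "pop rock", "classical"):
    print(genre, young_to_old_ratio(genre, users))
```

A ratio above 1 means the genre skews young and a ratio below 1 means it skews old. The same number could drive the green-to-blue color of a circle.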


While this style of plot is aesthetically pleasing and allows for a high density of information (circles can have position, size, and color), if you were only interested in the clustering, it’s better to represent this in a dendrogram. Before joining SameGrain, I was a researcher in the field of astrophysics, where dendrograms have become a popular way to illustrate the clustering of galaxies and the fragmentation of star-forming molecular clouds. The second plot is a dendrogram of the genres, where the genres most correlated with each other are connected by the smallest number of junctions, and the horizontal length of the connecting lines is (roughly) proportional to the correlation coefficient. In this plot, it is very easy to see groups of genres, clusters of groups, and clusters of clusters.
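If you want to play with the idea of "number of junctions," here is a rough sketch that stores a dendrogram as nested tuples, prints it as indented text, and counts the junctions on the path between two genres. The tree shape and the correlation values are invented for illustration and are not the SameGrain results, and the function names are my own.

```python
# Hypothetical tree: a leaf is a genre name, a junction is (correlation, left, right).
tree = (0.10,
        (0.60, "classic rock", (0.80, "hard rock", "metal")),
        (0.50, "pop rock", (0.70, "rap", "hip-hop")))

def render(node, depth=0):
    """Return the tree as indented text lines, one junction or genre per line."""
    indent = "    " * depth
    if isinstance(node, str):
        return [indent + node]
    r, left, right = node
    return [f"{indent}+ r={r:.2f}"] + render(left, depth + 1) + render(right, depth + 1)

def path_to(node, leaf):
    """Left/right choices from the root down to a leaf, or None if it is absent."""
    if isinstance(node, str):
        return [] if node == leaf else None
    _, left, right = node
    for side, child in (("L", left), ("R", right)):
        sub = path_to(child, leaf)
        if sub is not None:
            return [side] + sub
    return None

def junctions_between(root, a, b):
    """Number of junctions on the path connecting two genres in the tree."""
    if a == b:
        return 0
    pa, pb = path_to(root, a), path_to(root, b)
    if pa is None or pb is None:
        raise KeyError("genre not found in tree")
    shared = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        shared += 1
    # Junctions below the common ancestor on each side, plus the ancestor itself.
    return (len(pa) - shared) + (len(pb) - shared) - 1

print("\n".join(render(tree)))
print(junctions_between(tree, "hard rock", "metal"))
print(junctions_between(tree, "hard rock", "rap"))
```

In this made-up tree, hard rock and metal meet at a single junction, while hard rock and rap are only connected through the root, five junctions apart.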

This plot is very reminiscent in form of a family tree, and while this analogy works well for some genres (e.g., metal and hard rock are children of classic rock), the parent-child analogy breaks down quickly if you go very far up the tree. However, the analogy works better if we consider linked genres to be cousins of each other, much in the way modern humans are cousins, not descendants, of chimpanzees. Reggae is related to Latin, which is related to dance, which is in turn related to electric dance and electronic music.

These two plots, which combine hierarchical clustering with some basic analysis techniques, led to some interesting insights. Most notable to me was the universality of many genres. Pop rock is by far the most popular genre overall and with both age groups (as its name would imply). Pop’s closest cousins, rap and hip-hop, tend to be preferred by younger generations, but are still popular with those over 30 at a non-negligible level (perhaps owing to the fact that the first rap song, “Rapper’s Delight,” is now over 30 years old). Country is slightly preferred by younger generations relative to older ones, and enjoys modest popularity among people with a diverse range of music tastes.

