MusicBrainz vs. Million Song Dataset (battle of the Open Music databases)
I’m looking to add a significant amount of Music data to the OpenRecommender project to power the new Music Recommendation services.
Since I couldn’t find one, I feel compelled to write a side-by-side comparison between MusicBrainz (the old trusty) and the Million Song Dataset by LabROSA @ Columbia U. (the new kid in town). The following is the breakdown, and though I’ll offer my thoughts at the end, I’ll encourage the reader to decide which is the best data source.
Ready? BATTLE!
| MusicBrainz | Million Song Dataset |
|---|---|
| Data Dump License: Features: | Data Dump License: Features: |
In the end, I think MusicBrainz is a “no-brainer” as the quickest, most effective way to populate data for a Recommender. However, even from only scratching the surface of what’s available in the Million Song Dataset, it’s pretty clear that it’s a requirement for any Recommendation Engine that claims to be complete in the area of music recommendations. Any final product should therefore undertake the extra steps, effort and computing capacity required to run it.
Hi there, great article by the way. I agree with your conclusions… hopefully you can do one for Book data sources in the future? I’ve been trying to find an eBook version of “Searching For Jimmy Buffett” to send my friends, because it’s the funniest book I have read in years! Looking forward to more info on eBook publishers.
You’re right. MusicBrainz is one of the best data sources going today. I couldn’t get by without it. If only Picard Music Tagger could run on my Amazon Kindle! Maybe another product can do it someday…