MusicBrainz vs. Million Song Dataset (Battle of the Open Music Databases)

I’m looking to add a significant amount of music data to the OpenRecommender project to power the new music recommendation services.
I couldn’t find a side-by-side comparison of MusicBrainz (the trusty old-timer) and the Million Song Dataset from LabROSA at Columbia University (the new kid in town), so I feel compelled to write one. The breakdown follows; I’ll offer my thoughts at the end, but I encourage readers to decide for themselves which is the better data source.
Ready? BATTLE!
| | MusicBrainz | Million Song Dataset |
|---|---|---|
| Data Dump | | |
| License | | |
| Features | | |
In the end, I think MusicBrainz is a “no-brainer”: it is the quickest, most effective way to populate a recommender with data. However, even from only scratching the surface of what’s available in the Million Song Dataset, it’s pretty clear that it is a requirement for any recommendation engine that claims to be complete in the area of music. Any final product should therefore take on the extra steps, effort and computing capacity required to run it.
