Semiformalishmaybe

Return of the Provolone Ranger

It was a cheesy western story, and the Provolone Ranger had just ridden into town...

I jumped through all the hoops to get my ratings data back out of Amarok, and wrote some spiffy tools that others can use to do the same. Notes on that:

The problem: Amarok 2.x uses MySQL-embedded (MySQLe) to store its internal database, and it is not trivial to connect to that programmatically from outside the codebase that made it. In theory, MySQL should ship a tool for running the interactive query tool against a MySQLe directory, and in theory somebody should have written Perl (or Python, or whatever) bindings that make connecting to one easy. I imagine there may be some reluctance to do so because a MySQLe instance running inside another app could easily stomp on other instances, since the main MySQL dæmon is not actually handling the data store. MySQLe as implemented is probably a really bad idea.

Anyhow, the solution is to automate hooking up a proper mysqld to the data directory (in a way that doesn't interfere with *the* proper mysqld, if one is running) so that I can look at the schema and get data out.
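To make the server step concrete, here is a minimal Python sketch of the kind of command line a script like amaroksql_server might assemble. It is not the script from the tarball. The Amarok data path (`~/.kde/share/apps/amarok/mysqle`), the `/usr/sbin/mysqld` location, the socket and pid-file paths, and the helper name `mysqld_command` are all assumptions. The mysqld options themselves (`--no-defaults`, `--datadir`, `--socket`, `--port`, `--pid-file`, `--skip-grant-tables`) are standard server options. `--skip-grant-tables` is included on the guess that an embedded database has no privilege tables of its own.

```python
import os
import shlex

# Assumed location of Amarok 2.x's embedded database; some distros use ~/.kde4 instead.
AMAROK_DB_DIR = os.path.expanduser("~/.kde/share/apps/amarok/mysqle")
# Assumed path to the server binary; adjust to wherever your distro keeps it.
MYSQLD = "/usr/sbin/mysqld"


def mysqld_command(datadir, socket_path, port=9999, mysqld=MYSQLD):
    """Build the argv for a private mysqld that serves `datadir`.

    The custom socket and port keep it out of the way of any system-wide
    mysqld that might already be running.
    """
    return [
        mysqld,
        "--no-defaults",  # must come first: ignore my.cnf files so system settings aren't inherited
        f"--datadir={datadir}",
        f"--socket={socket_path}",
        f"--port={port}",
        f"--pid-file={socket_path}.pid",
        "--skip-grant-tables",  # assumption: the embedded store has no user/privilege tables
    ]


if __name__ == "__main__":
    cmd = mysqld_command(AMAROK_DB_DIR, "/tmp/amaroksql.sock")
    # Print rather than run; hand cmd to subprocess.Popen once the paths look right.
    print(shlex.join(cmd))
```

Quit Amarok before actually starting such a server, since two processes writing the same files is exactly the stomping problem mentioned above. A client would then connect with `mysql --socket=/tmp/amaroksql.sock`, and the shared socket path is what ties the two steps together.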

Tarball: amarok_analysis.tar.bz2

Components:

  • amaroksql_server - Starts a user instance of mysqld on port 9999, with a custom local socket, attached to the Amarok MySQLe directory. You might need to tweak the path to mysqld to wherever your distro keeps it.
  • amaroksql_client - Starts the MySQL interactive query tool attached to that local socket. Useful for looking at the schema and figuring out what you want to dump. Probably won't need any tweaking.
  • amaroksql_dump - Does a single query to grab whatever fields you like into a single CSV. Reasonably clever in its structure so users won't need to actually write SQL. Some of this code is pretty cute.
  • amarok_analysis - Loads that CSV file into a PostgreSQL database (creating the database and a table for the data).
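The idea behind amaroksql_dump is to pick fields by name instead of writing SQL. In Python, that can be sketched as a lookup table from friendly names to columns plus a fixed set of joins. This is a hypothetical illustration, not the tarball's code. `FIELDS`, `build_query`, and `write_csv` are made-up names. The table and column names (`tracks`, `artists`, `albums`, `statistics`, joined on id and url columns) are a guess at the Amarok 2.x schema, so check them with amaroksql_client before relying on them. The result rows would come from whatever MySQL client library you connect through the socket.

```python
import csv
import io

# Hypothetical friendly-name -> column mapping; the schema names are assumptions.
FIELDS = {
    "artist": "artists.name",
    "album": "albums.name",
    "title": "tracks.title",
    "rating": "statistics.rating",
}

# Fixed joins, so users only choose columns. Also an assumption about the schema.
FROM_CLAUSE = (
    "tracks "
    "LEFT JOIN artists ON tracks.artist = artists.id "
    "LEFT JOIN albums ON tracks.album = albums.id "
    "LEFT JOIN statistics ON tracks.url = statistics.url"
)


def build_query(wanted):
    """Turn a list of friendly field names into a single SELECT statement."""
    unknown = [f for f in wanted if f not in FIELDS]
    if unknown:
        raise ValueError(f"unknown fields: {unknown}")
    columns = ", ".join(f"{FIELDS[f]} AS {f}" for f in wanted)
    return f"SELECT {columns} FROM {FROM_CLAUSE}"


def write_csv(header, rows, out):
    """Write a header row followed by the result rows to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(header)
    writer.writerows(rows)


if __name__ == "__main__":
    print(build_query(["artist", "album", "title", "rating"]))
    buf = io.StringIO()
    write_csv(["artist", "rating"], [("Some Artist", 9)], buf)  # stand-in row, not real data
    print(buf.getvalue())
```

The joins stay fixed and only the column list varies, which is what spares the user from writing SQL. The same friendly names double as the CSV header.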
With all that loaded, I can do nice queries like:
  • Get a list of albums and how many tracks they have: SELECT music.album, COUNT(music.title) FROM music GROUP BY music.album
  • Get a list of albums and how many tracks they have with a rating over 8 (Amarok uses a scale of 1-10): SELECT music.album, COUNT(music.title) FROM music WHERE RATING > 8 GROUP BY music.album ORDER BY COUNT(music.title);
  • The same, but constrained to albums with more than three such tracks: SELECT * FROM (SELECT music.artist, music.album, COUNT(music.title) FROM music WHERE RATING > 8 GROUP BY music.album,music.artist) AS foo WHERE foo.count > 3 ORDER BY foo.count desc
  • Get a list of tracks where I forgot to rate them: SELECT title FROM music WHERE rating = 0;
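The load-and-query step can be tried without a PostgreSQL server. The Python sketch below uses the standard library's sqlite3 module as a stand-in for PostgreSQL. That is a swap, not what amarok_analysis does. It loads a CSV with assumed columns `artist, album, title, rating` into a `music` table and runs two of the queries above. It also shows that the subquery wrapper in the third query can be written with `HAVING`. PostgreSQL names an unaliased `COUNT(...)` column `count`, which is why `foo.count` works in the original. SQLite does not, so the sketch uses an explicit alias. The sample rows are made up.

```python
import csv
import io
import sqlite3

# Made-up sample rows; an empty rating is treated as 0, i.e. "not rated yet".
SAMPLE_CSV = """artist,album,title,rating
Artist A,Album X,One,10
Artist A,Album X,Two,9
Artist A,Album X,Three,10
Artist A,Album X,Four,9
Artist B,Album Y,Five,9
Artist B,Album Y,Six,
"""

# Albums with more than three tracks rated above 8; HAVING replaces the outer SELECT ... AS foo.
STANDOUT_ALBUMS = """
    SELECT artist, album, COUNT(title) AS n
    FROM music
    WHERE rating > 8
    GROUP BY artist, album
    HAVING COUNT(title) > 3
    ORDER BY n DESC
"""

UNRATED = "SELECT title FROM music WHERE rating = 0"


def load_music(conn, csv_file):
    """Create the music table and fill it from a CSV that has a header row."""
    conn.execute("CREATE TABLE music (artist TEXT, album TEXT, title TEXT, rating INTEGER)")
    rows = (
        {**row, "rating": int(row["rating"] or 0)}  # blank rating -> 0 (unrated)
        for row in csv.DictReader(csv_file)
    )
    conn.executemany("INSERT INTO music VALUES (:artist, :album, :title, :rating)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_music(conn, io.StringIO(SAMPLE_CSV))
    print(conn.execute(STANDOUT_ALBUMS).fetchall())  # [('Artist A', 'Album X', 4)]
    print(conn.execute(UNRATED).fetchall())          # [('Six',)]
```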

And some thoughts about rating music, and about external tools that help people think:

In assigning the ratings, I went more by how much I'd like to hear a song played at a random time than by how much I actually like it. There are a few songs that I like but almost never want to hear at random, with several gradations in between. I also only included songs that I'm already willing to hear at random; I keep other folders for music that I never want to hear at random, for one reason or another.

I was a bit surprised in places to find that I either appeared to really like a musician I didn't normally count among my favourites, or vice versa. Although there's no real way to see genre or year for all this stuff (my usual way of managing ID3/Ogg tags strips out all that info), I learned during the tagging process that I like some kinds of music much more strongly than I thought. I suspect this is not particular to me. Putting parts of ourselves under examination with tools like this can reveal several things: that quantification is itself a problem (often true), that we have selection bias, that our mapping of meaning onto something quantifiable is messy (I found myself wishing for more than a single rating scale, for example), or things that actually seem to be true and that we legitimately haven't learned about ourselves through introspection (even introspection as shallow as "what kind of music do I like?"). I'm not sure whether this is an argument for transhumanism (if so, it is a very careful one, given all the opportunities for fallacy noted above), but it might be: a person plus a laptop (or paper) is potentially better at self-discovery than a person with no tools. Some of this might be because we can correct for "I never thought of that" moments, and some of it might be that it forces us to face conclusions our brains would normally nudge us away from (e.g. liking embarrassing types of music like party music or Hanson).

A few specific notes, as examples of things I've gotten from running a few queries:

Favourite albums (based on count of tracks with a score over 8):

  • Johan Soderqvist - Let the Right One In soundtrack - 14 awesome tracks
  • Fiona Apple - Extraordinary Machine - 8 awesome tracks
  • Fiona Apple - When the Pawn Hits the Conflict - 8 awesome tracks
  • Firewater - Golden Hour - 8 awesome tracks
  • Tosca Tango Orchestra - Waking Life soundtrack - 6 awesome tracks
  • No Use for a Name - Keep Them Confused - 5 awesome tracks
  • Toad the Wet Sprocket - Fear - 5 awesome tracks
  • Oingo Boingo - Dead Man's Party - 5 awesome tracks
  • Firewater - Man on the Burning Tightrope - 5 awesome tracks
  • Händel - Music for the Royal Fireworks - 5 awesome tracks
  • Pet Shop Boys - Fundamental - 4 awesome tracks
  • Watershed - Fifth of July - 4 awesome tracks
  • Brave Combo - Polkas for a Gloomy World - 4 awesome tracks
Favourite artists (same basis):
  • Fiona Apple - 19 awesome tracks
  • Firewater - 17
  • UNKNOWN - 16 (I am slightly surprised this category is not higher)
  • Oingo Boingo - 15
  • Johan Soderqvist - 14
  • No Use for a Name - 12
  • VNV Nation - 11 (they don't show up among the favourite albums because they release a lot of small albums, so their good tracks are spread thin)
  • Thou Shalt Not - 7
  • Radiohead - 7
  • Death Cab for Cutie - 7
  • Avril Lavigne - 7
  • Flogging Molly - 7
  • Chumbawamba - 6
  • Bad Religion - 6
  • TMBG - 6
  • Tosca Tango Orchestra - 6
It would be tempting to refine the queries to go by average album rating, but I tend to banish individual songs I don't like into mirrored "do not play randomly" directories, and that would skew the averages oddly: an album I've pruned heavily would look better than it really is. The count-based queries also reveal nothing about albums that are above average overall but lack any songs that truly stand out. I have a fair selection of Flamenco music like that.
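Both caveats are easy to see with a little arithmetic. The Python sketch below computes two numbers per album: the count of tracks rated above 8 (what the queries above measure) and the plain average. It ignores 0 as "unrated". The two albums and their ratings are invented. "Steady" is uniformly good but has no standouts. "Spiky" has two standouts plus two weak tracks. `album_summary` is a hypothetical helper name.

```python
from collections import defaultdict


def album_summary(rows, threshold=8):
    """Map each album to (tracks rated above threshold, mean rating).

    rows is an iterable of (album, rating) pairs; a rating of 0 means the
    track hasn't been rated, so it is left out of both numbers.
    """
    by_album = defaultdict(list)
    for album, rating in rows:
        if rating > 0:
            by_album[album].append(rating)
    return {
        album: (sum(r > threshold for r in ratings), sum(ratings) / len(ratings))
        for album, ratings in by_album.items()
    }


# Invented example ratings on Amarok's 1-10 scale.
ROWS = [
    ("Steady", 8), ("Steady", 7), ("Steady", 8), ("Steady", 7),
    ("Spiky", 10), ("Spiky", 10), ("Spiky", 3), ("Spiky", 3),
]

if __name__ == "__main__":
    print(album_summary(ROWS))
    # {'Steady': (0, 7.5), 'Spiky': (2, 6.5)}: the count misses Steady entirely.
    pruned = [(album, r) for album, r in ROWS if r > 3]  # "banish" the weak tracks
    print(album_summary(pruned))
    # {'Steady': (0, 7.5), 'Spiky': (2, 10.0)}: pruning inflates Spiky's average.
```

Reporting both numbers side by side, rather than switching to averages alone, is one way to surface the uniformly-good albums while keeping an eye on how much pruning props up the others.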

It would also be interesting to attach enough metadata about the specific contents of songs (e.g. what musical themes they have) to see what queries based on that reveal, though quantifying that would be tricky. I kind of wish I had date and genre information in the ID3/Ogg tags now. I can look at particular musicians and add it in manually to plot their rise and fall in awesomeness over time; TMBG comes most notably to mind: their first few albums were raw, they were pretty good for a while, and their recent stuff is mostly pretty lousy. But that takes a lot of manual munging.

I would encourage other people to play with these tools (if they use Amarok and are willing to rate everything) and see what they can learn about their musical tastes.

People on LJ and in my friendslist will get a link to the actual data files and more full results of analysis.
