One thing you can do in iTunes is give songs a rating from 1 to 5 stars. It also has an option to play highly rated songs more often during shuffle play, but frankly, I can’t be bothered to go through my collection rating songs by hand. Besides, the metadata that iTunes stores for each song includes the number of times it was played, the last time it was played, the number of times it was skipped, the last time it was skipped, and the time it was added to the library. From this, it should be possible to figure out what I like and what I don’t. Specifically, it should be possible to write an AppleScript script that goes through the library and computes ratings.

Play Ratio and Confidence

iTunes lets you rate songs on a scale of 1 to 5 stars (or leave them unrated). Assume that every song has been played at least once. Since playcount + skipcount gives the number of times the song has come up, it would intuitively seem that the rating should scale with the fraction of those times I’ve let the song play all the way through instead of skipping it, or

1 + 4 × playcount/(playcount+skipcount)

The obvious problem with this is that if it’s a new song that’s only been played once (and never been skipped), then it’ll get a rating of 5, even if it’s crap. Clearly, in this case we don’t have enough history to determine whether a song is good or bad, so its rating should start out at 3 stars, or about average. As the song comes up more often, we can pay more attention to the play ratio and allow the rating to vary between 2 and 4 stars. Finally, we can let the play ratio determine the rating all the way from 1 to 5 stars. In other words, rather than using the play ratio directly, we want to take its deviation from the midpoint and scale that by a multiplier of the form f(playcount+skipcount), where f(1) = 0 and f(x) asymptotically approaches 1 as x increases. f(x) = 1-1/x might fit the bill. A song that has come up twice gets f(2) = 1/2, which confines its rating to exactly the 2-to-4-star range.

Putting this together:

Q(p, s) = p/(p+s) - 0.5
C(p, s) = 1 - 1/(p+s)
R(p, s) = 3 + 4 × C(p, s) × Q(p, s)

where p is the play count; s is the skip count; Q() is the quality function, which says how good a song is, based on the proportion of times I’ve allowed it to play all the way through (going from -0.5 to 0.5); C() is a confidence function, which says how confident I am in the output of Q(); and R() is the rating function: it’s just the quality Q, scaled by the confidence C, and adjusted to go from 1 to 5.
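As a rough check on the arithmetic, here is a minimal Python sketch of Q(), C() and R(). It is not the AppleScript the post has in mind. It assumes Q = p/(p+s) - 0.5 and C = 1 - 1/(p+s), following the definitions above. The function names are hypothetical.

```python
def quality(p: int, s: int) -> float:
    """Q(): play ratio shifted to run from -0.5 (always skipped) to 0.5 (always played)."""
    return p / (p + s) - 0.5


def confidence(p: int, s: int) -> float:
    """C(): 0 when the song has come up once, approaching 1 as it comes up more often."""
    return 1.0 - 1.0 / (p + s)


def rating(p: int, s: int) -> float:
    """R(): quality scaled by confidence, mapped onto the 1-5 star range."""
    if p + s < 1:
        raise ValueError("song must have come up at least once")
    return 3.0 + 4.0 * confidence(p, s) * quality(p, s)


# A few sample (plays, skips) pairs.
for p, s in [(1, 0), (2, 0), (0, 2), (28, 0), (3, 7)]:
    print(p, s, round(rating(p, s), 2))
```

A song played once sits at exactly 3 stars. One that has come up twice can only land between 2 and 4. Even the 28-play favorite stays just short of 5, because C() never quite reaches 1.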

The function that most obviously needs tweaking is C(). We want it to converge fairly quickly: the song I’ve played most often has a play count of 28, and most songs are in the 1-10 range, so we don’t want a function that requires a song to come up 50 times before we can confidently conclude that it’s good or that it sucks.
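One way to make C() converge faster is to raise the exponent: C_k(n) = 1 - n^(-k), where n = p + s. This is an assumption, not something from the post, and k is a hypothetical tuning knob. Every k keeps C(1) = 0, and larger values of k reach high confidence well inside the 1-10 range.

```python
def confidence(times_up: int, k: float = 1.0) -> float:
    """Hypothetical family C_k(n) = 1 - n**(-k); k = 1 is the 1 - 1/n curve above."""
    return 1.0 - times_up ** (-k)


def times_needed(target: float, k: float = 1.0, limit: int = 1000) -> int:
    """Smallest number of times a song must come up before confidence reaches target."""
    for n in range(1, limit + 1):
        if confidence(n, k) >= target:
            return n
    raise ValueError("target not reached within limit")


for k in (1.0, 1.5, 2.0):
    print(f"k={k}: C(2)={confidence(2, k):.2f}, "
          f"85% confidence after {times_needed(0.85, k)} plays/skips")
```

With k = 1, a song needs to come up 7 times to reach 85% confidence. With k = 2, it needs only 3.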

Evolving Tastes

A further twist is that my tastes in music change over time: there were albums that I liked a lot ten years ago, but which I can’t stand to listen to anymore. Ideally, if we had a history of when each song was played and skipped, we could compute its popularity as above, but giving recent play and skip events more weight than old ones. Unfortunately, all we have are the total play and skip counts, the time of the most recent play or skip event, and the time the song was added. Still, it ought to be possible to do something with this, though I’m not certain how.

If a song hasn’t come up in 10 years, then presumably its rating should gradually drift back to 3: the more time has passed since I heard it, the more likely it is that my old opinion of it is out of date, and the rating should start afresh.
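The drift back toward 3 could be modelled with exponential decay: the rating's distance from 3 is halved for every half-life of silence. This is purely an assumption. The two-year half-life below is an arbitrary placeholder, and `drifted_rating` is a hypothetical name.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical tuning parameter: an old opinion loses half its weight every two years.
HALF_LIFE_DAYS = 730.0


def drifted_rating(rating: float, last_played: Optional[datetime],
                   last_skipped: Optional[datetime], now: datetime,
                   half_life_days: float = HALF_LIFE_DAYS) -> float:
    """Pull a rating back toward 3 stars the longer the song has gone unheard."""
    events = [t for t in (last_played, last_skipped) if t is not None]
    if not events:
        return 3.0  # no history at all: start afresh
    days_idle = max((now - max(events)).total_seconds() / 86400.0, 0.0)
    weight = 0.5 ** (days_idle / half_life_days)
    return 3.0 + (rating - 3.0) * weight


now = datetime(2010, 1, 1)
print(drifted_rating(5.0, now - timedelta(days=730), None, now))  # -> 4.0
```

A 5-star song that has gone two years without coming up falls to 4 stars. After ten years, it is within a few hundredths of 3.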

The age of a song is also a factor: I tend to listen to new albums more often than old ones, simply because they’re new. So perhaps new songs should get a ratings boost just for being new (though a song that was added a week ago and has been played once and skipped 10 times should still be considered sucky).
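A newness boost could decay with the song's age and be scaled by its play ratio. The scaling means a new song that keeps getting skipped gets almost no help. This is a hypothetical sketch: the half-star maximum, the 30-day half-life and the function names are all placeholders.

```python
# Hypothetical tuning parameters: the boost starts at half a star and halves every 30 days.
MAX_BOOST = 0.5
BOOST_HALF_LIFE_DAYS = 30.0


def novelty_boost(age_days: float, plays: int, skips: int,
                  max_boost: float = MAX_BOOST,
                  half_life_days: float = BOOST_HALF_LIFE_DAYS) -> float:
    """Extra stars for a recently added song, scaled by its play ratio so that
    a new song I keep skipping gets almost no help."""
    if plays + skips == 0:
        return 0.0
    decay = 0.5 ** (max(age_days, 0.0) / half_life_days)
    return max_boost * decay * plays / (plays + skips)


def boosted(rating: float, boost: float) -> float:
    """Apply the boost while keeping the result on the 1-5 scale."""
    return min(5.0, max(1.0, rating + boost))
```

A week-old song that has been played once and skipped ten times keeps less than a tenth of the boost, so it stays sucky.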

The time elapsed between the most recent play and skip events should also influence the rating: if I last played a song yesterday and last skipped it a year ago, that should nudge its rating up. And contrariwise, if I last played it two years ago and last skipped it last week, that should nudge its rating downward.
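The nudge could be a bounded function of the gap between the last play and the last skip. It is positive when the play is more recent and negative when the skip is. In this sketch, `tanh` provides the bounding, and the quarter-star limit and 90-day scale are hypothetical placeholders.

```python
import math
from datetime import datetime, timedelta

# Hypothetical tuning parameters: at most a quarter star either way,
# reaching most of that once play and skip are a few months apart.
MAX_NUDGE = 0.25
SCALE_DAYS = 90.0


def recency_nudge(last_played: datetime, last_skipped: datetime,
                  max_nudge: float = MAX_NUDGE, scale_days: float = SCALE_DAYS) -> float:
    """Positive when the song was played more recently than it was skipped,
    negative in the opposite case, and bounded by +/- max_nudge."""
    gap_days = (last_played - last_skipped).total_seconds() / 86400.0
    return max_nudge * math.tanh(gap_days / scale_days)


today = datetime(2010, 1, 1)
print(recency_nudge(today - timedelta(days=1), today - timedelta(days=365)))  # nudges up
print(recency_nudge(today - timedelta(days=730), today - timedelta(days=7)))  # nudges down
```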

As I said above, I’m not sure how this should work. This is just a sketch of some ideas.

Guilt by Association

We can also introduce the idea of guilt by association: my opinion of a given song is likely to be close to my opinion of similar songs. And since there are relatively few data points for any given song, it can be useful to bring in data from other songs. For instance, if I’ve skipped every Led Zeppelin song that’s come up in the last year, then it’s likely that I’ll have a low opinion of the next Led Zeppelin song, even if there’s little history associated with it.

We can do the same thing with albums: if nine songs on an album are crap, then the tenth is probably crap as well (except, of course, that there are plenty of albums with one good song and nine crap ones). Or genres: if I’ve played a lot of synthpop in the last year, I’m likely to like the next synthpop song that comes up.
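A common way to borrow data from similar songs is a Bayesian average, also called pseudo-counts. It is swapped in here as an assumption; the post does not prescribe it. The group's play ratio (artist, album or genre) acts like a handful of imaginary plays and skips. A song with little history therefore sits near its group, while a well-known song follows its own numbers. The prior strength of 5 is an arbitrary placeholder, and the "in the last year" time weighting is left out.

```python
def smoothed_play_ratio(plays: int, skips: int,
                        group_plays: int, group_skips: int,
                        prior_strength: float = 5.0) -> float:
    """Blend a song's own play ratio with its group's (artist, album, or genre).

    The group's ratio counts as `prior_strength` imaginary plays/skips, so a
    song with little history sits near its group and a well-known song follows
    its own numbers. The group counts should exclude the song itself.
    """
    group_total = group_plays + group_skips
    prior = group_plays / group_total if group_total else 0.5  # unknown group: neutral
    return (plays + prior_strength * prior) / (plays + skips + prior_strength)
```

The smoothed ratio could stand in for p/(p+s) in the quality function. For example, a new Led Zeppelin song would start out near the band's mostly-skipped average instead of at 5 stars.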

Another nice thing about this approach is that for a single song, we can only see when it was last played and when it was last skipped. If we look at all of the songs in a given genre, or by a prolific band like New Order or TMBG, we get more of a view over time. We can see whether the last-played times tend to cluster in the recent past, or have been tapering off over the past few years, and so forth.
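One simple measure of whether a group's last-played times cluster in the recent past is the share of its songs last played within some window. This is an assumption, not something from the post. Comparing the one-year share with, say, a three-year share hints at whether interest in that band or genre is rising or fading. The function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, Optional, Tuple


def recent_share_by_group(tracks: Iterable[Tuple[str, Optional[datetime]]],
                          now: datetime, window_days: int = 365) -> Dict[str, float]:
    """For each group (artist, album, or genre), the fraction of its songs whose
    last play falls within the past `window_days`."""
    cutoff = now - timedelta(days=window_days)
    totals: Dict[str, int] = defaultdict(int)
    recent: Dict[str, int] = defaultdict(int)
    for group, last_played in tracks:
        totals[group] += 1
        if last_played is not None and last_played >= cutoff:
            recent[group] += 1
    return {group: recent[group] / totals[group] for group in totals}
```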

Another way to get extra history for a song is to look for songs that appear on multiple albums, that is, songs with the same artist and title but on different albums. The rating of the version of Birdhouse in Your Soul that appears on Flood should be the same as the rating of the version that appears on the compilation album Dial-A-Song, since they’re the same song. But this isn’t a hard and fast rule: the same Bach cantata might be brilliant or it might be crap, depending on the orchestra and conductor. And how does this apply to EPs with seven vastly different remixes of the same song by the original artist?
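Pooling history across versions amounts to summing play and skip counts over a key that leaves out the album. In this sketch, the `Track` record and `pooled_counts` are hypothetical stand-ins for whatever fields the library exposes, and the counts are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Track(NamedTuple):
    # Hypothetical record type standing in for the fields iTunes stores per track.
    artist: str
    title: str
    album: str
    plays: int
    skips: int


def pooled_counts(tracks: Iterable[Track]) -> Dict[Tuple[str, str], Tuple[int, int]]:
    """Sum play and skip counts over every version of a song, keyed by
    (artist, title) with case and surrounding whitespace ignored; the album
    is deliberately left out of the key."""
    pooled: Dict[Tuple[str, str], List[int]] = defaultdict(lambda: [0, 0])
    for t in tracks:
        key = (t.artist.strip().casefold(), t.title.strip().casefold())
        pooled[key][0] += t.plays
        pooled[key][1] += t.skips
    return {key: (p, s) for key, (p, s) in pooled.items()}


# Made-up counts for the two versions mentioned above.
library = [
    Track("They Might Be Giants", "Birdhouse in Your Soul", "Flood", 5, 1),
    Track("They Might Be Giants", "Birdhouse In Your Soul", "Dial-A-Song", 2, 0),
]
print(pooled_counts(library))
```

This key would wrongly merge the differently conducted Bach cantatas and the remix EP. A real version would need some exception for those cases, such as excluding certain genres, which is left open here.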

Feedback

One thing that strikes me in the above is that there are lots of formulas one can throw at the problem and get a rating. But getting a reasonable result will require a lot of work: should a song’s rating drift back to 3 after not being heard for a year? Three years? Five years? If nine songs on an album suck, to what extent should that affect the rating of the remaining song? What if the album is a compilation?

Since we’re talking about figuring out what I like, why not use the time-honored tradition of adding another layer of indirection, and have the computer figure out how to figure out what I like?

I’m thinking of something like a neural network in which some of the nodes are evaluation criteria, like “play ratio” and “last skipped within the last week”, other nodes combine results of criteria in various ways, and the top level spits out a rating between 1 and 5 stars. If we manually rate some songs, we can then use back-propagation to teach the network to accord more weight to certain criteria and less to others.
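As a much smaller stand-in for that network, here is a single-layer linear model trained by gradient descent on squared error. Back-propagation reduces to exactly this when there are no hidden layers. The model is an assumption, not the post's design. The two features, play ratio and a 0/1 "skipped within the last week" flag, come from the text; the six "manual" ratings are made up.

```python
from typing import List, Sequence, Tuple


def train(features: Sequence[Sequence[float]], ratings: Sequence[float],
          lr: float = 0.1, epochs: int = 5000) -> Tuple[List[float], float]:
    """Fit rating ~ w . x + b by batch gradient descent on mean squared error."""
    n, d = len(features), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, ratings):
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            for i in range(d):
                grad_w[i] += 2.0 * err * x[i] / n
            grad_b += 2.0 * err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Score a song and clamp the result to the 1-5 star range."""
    raw = sum(wi * xi for wi, xi in zip(w, x)) + b
    return min(5.0, max(1.0, raw))


# Made-up training data: (play ratio, skipped within the last week) -> manual rating.
X = [(0.0, 0), (1.0, 0), (0.5, 0), (0.5, 1), (1.0, 1), (0.25, 1)]
y = [1.0, 5.0, 3.0, 2.5, 4.5, 1.5]
w, b = train(X, y)
print([round(v, 2) for v in w], round(b, 2))
```

The learned weights come out close to 4 for play ratio and -0.5 for the recent-skip flag, which are the values used to generate the toy ratings. A real network would add hidden nodes that combine criteria, as described above.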

2 Responses to Calculating My Tastes

There’s an iTunes-like program for Linux (KDE) called Amarok, and one of its cool features is a pluggable rating system. The default scoring formula works pretty well, but it’s relatively simple to write a new one that fits your listening habits. The formula I use now accounts for how long a skipped song played. A song that is skipped within the first few seconds gets a higher penalty than a song that is skipped near the end.
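The commenter's idea, penalizing early skips more than late ones, could be sketched by scoring each time a song comes up by the fraction of it that actually played. This is a hypothetical Python illustration, not Amarok's scoring-script API or its default formula. It also assumes a per-event history that iTunes does not keep.

```python
from typing import Sequence


def listen_rating(fractions_played: Sequence[float]) -> float:
    """Map a history of listens to a 1-5 rating.

    Each entry is the fraction of the track that played before it ended or
    was skipped: 1.0 is a full play, 0.05 is a skip in the first few seconds.
    The penalty for a skip is 1 - fraction, so early skips cost the most.
    """
    if not fractions_played:
        return 3.0  # no history: neutral
    clamped = [min(1.0, max(0.0, f)) for f in fractions_played]
    return 1.0 + 4.0 * sum(clamped) / len(clamped)
```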

Also, since it can use a MySQL database to store your music collection’s metadata, it’s really easy to set up a browser-based interface for searching through your library. Totally cool stuff!