2010/03/19

Higher Order MP3 Directory Organization, A First Step

I have a huge number of MP3s. I am sure I haven't heard all of them. Some have weird tags. Some have no tags. Some are not really MP3s, but "you can't download this" HTML files, or just zero-sized files. When I bump into them, I can fix these things (read: delete the bad files) but it can take some time.

I had wanted to use the power of Perl to help with this, but while there are great numbers of modules to help with just about anything, I didn't have a directory walker I liked.

Everything that claims to be an MP3 gets counted. Yay! (Just so you know, the current count is 37604.) There are lots of included modules that I don't use yet. HOP.pm simply puts MJD's directory walker into a module where I can get it on demand, so I don't have to copy and paste. Setting the directory from the command line would be good, but not today.
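For anyone following along without a copy of the walker, here is a minimal sketch of the counting step. The walker is written in the style of dir_walk from Higher-Order Perl; it is not the HOP.pm module mentioned above, and count_mp3s is a name made up for this example. It takes the directory from the command line and does not guard against symlink loops.

```perl
use strict;
use warnings;

# Recursive walker in the style of dir_walk from "Higher-Order Perl".
# $filefunc is called for each non-directory entry; $dirfunc (optional)
# is called for each directory with the results from its children.
sub dir_walk {
    my ( $top, $filefunc, $dirfunc ) = @_;

    if ( -d $top ) {
        my $dh;
        unless ( opendir $dh, $top ) {
            warn "Couldn't open directory $top: $!; skipping.\n";
            return;
        }
        my @results;
        while ( defined( my $file = readdir $dh ) ) {
            next if $file eq '.' || $file eq '..';
            push @results, dir_walk( "$top/$file", $filefunc, $dirfunc );
        }
        closedir $dh;
        return $dirfunc ? $dirfunc->( $top, @results ) : @results;
    }
    return $filefunc ? $filefunc->($top) : ();
}

# Count everything whose name claims to be an MP3 (hypothetical helper).
sub count_mp3s {
    my ($dir) = @_;
    my $count = 0;
    dir_walk( $dir, sub { $count++ if $_[0] =~ /\.mp3$/i; return } );
    return $count;
}

# Only runs when a directory is given on the command line.
if (@ARGV) {
    print count_mp3s( $ARGV[0] ), "\n";
}
```

Swapping the anonymous sub for one that checks file size or looks for a missing track number is how the same walker gets reused for other jobs.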

And needless to say, you can adjust this to do a lot of other things. Check file sizes. Find file names without track numbers. Stuff like that. There are three downsides so far: you don't have hashes to find repeated songs, you don't have MP3 tag information, and you have to run it again every time (with the associated lag of running a directory walker on 30,000+ MP3s).

But there are solutions.

Digest::SHA1. MP3::Info and/or MP3::Tag. DBI.

I run Linux. sudo apt-get install mysql-server gets me a DB. Run once, save the data and query until you're sick. I started out with this schema.

length is song length in seconds. run_length is song length in HH:MM:SS format, and yeah, I have some MP3s that push that, if not exceed it. Or that's the theory, at least.
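The conversion from length to run_length is simple enough to sketch. This is only an illustration, not the code behind the schema; the function name is borrowed from the column name, and it assumes the length arrives as a non-negative number of seconds.

```perl
use strict;
use warnings;

# Convert a length in seconds to an HH:MM:SS string.
# Hours are not capped at 99, so a very long file just gets a wider
# hours field.
sub run_length {
    my ($seconds) = @_;
    $seconds = int( $seconds + 0.5 );    # round fractional seconds
    my $h = int( $seconds / 3600 );
    my $m = int( ( $seconds % 3600 ) / 60 );
    my $s = $seconds % 60;
    return sprintf '%02d:%02d:%02d', $h, $m, $s;
}

print run_length(215),  "\n";    # 00:03:35
print run_length(3725), "\n";    # 01:02:05
```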

And some would say it's bad schema design, but I'm not so much worried about grouping by artist or album or year. Those fields mostly tell me whether the file has ID3 tags or not. I'm focused on the MP3 file itself here.

This is still a work in progress. I don't use Carp here, though I generally include it where it's needed. As I'm debugging, I always have Data::Dumper floating around so I can see what the data structures are. I could probably just use MP3::Info instead of MP3::Tag. Haven't decided yet. Digest::SHA1 gives a SHA-1 hash of the MP3's contents. SHA-1 is no longer considered safe against deliberately crafted collisions, but two different songs won't collide by accident, so it should detect duplicates. HOP was mentioned earlier, and MusicDB is a wrapper module that allows me to have my DB passwords in one convenient place, so I just have to worry about the actual SQL. There are some bugs (length doesn't give the right info yet), but I have all the info on any discrete MP3 file.
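As a rough sketch of the duplicate-detection idea on its own, without the database: hash each file and group the paths by digest. This uses the core Digest::SHA module in place of Digest::SHA1 so it runs on a stock Perl; file_sha1 and find_duplicates are names invented for the example.

```perl
use strict;
use warnings;
use Digest::SHA;    # core module, used here in place of Digest::SHA1

# Return the SHA-1 hex digest of a file's contents, or undef if the
# file can't be read.
sub file_sha1 {
    my ($path) = @_;
    my $hex = eval { Digest::SHA->new(1)->addfile( $path, 'b' )->hexdigest };
    warn "Couldn't hash $path: $@" unless defined $hex;
    return $hex;
}

# Group paths by digest; return one array ref per group that has more
# than one member.
sub find_duplicates {
    my (@paths) = @_;
    my %by_hash;
    for my $path (@paths) {
        my $hex = file_sha1($path);
        next unless defined $hex;
        push @{ $by_hash{$hex} }, $path;
    }
    return grep { @$_ > 1 } values %by_hash;
}

# Usage: pass file names on the command line.
for my $group ( find_duplicates(@ARGV) ) {
    print join( "\n  ", 'Duplicates:', sort @$group ), "\n";
}
```

Hashing reads every byte of every file, so on a big collection it is worth doing once and storing the result rather than recomputing it on each run.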

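Notice, though, that the function has become sufficiently big and complicated that I've pulled it out and given it a name. Also notice how I'm starting to use placeholders, which let the statement be prepared once and reused for every file, and which leave the quoting of odd file names and tags to the DB driver. That should make my DB interface both more efficient and less fragile.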

A good thing to add would be to see if a file has already been put into the DB, and if so, to get the unique index, file size and hash to check for changes, then update only if there are changes, rather than inserting it again.
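The decision itself can be sketched without touching the DB. In this illustration a plain hash keyed by path stands in for the table, and plan_action, along with the id, size and sha1 field names, is made up for the example; the real version would look the row up with a query first.

```perl
use strict;
use warnings;

# Decide what to do with a scanned file, given what is already stored.
# $stored maps path => { id, size, sha1 } and is an in-memory stand-in
# for the DB table. Returns 'insert', 'update' or 'skip'.
sub plan_action {
    my ( $stored, $path, $size, $sha1 ) = @_;
    my $row = $stored->{$path};
    return 'insert' unless $row;    # never seen this path before
    return 'skip' if $row->{size} == $size && $row->{sha1} eq $sha1;
    return 'update';                # known path, contents changed
}

my %stored = (
    '/music/a.mp3' => { id => 1, size => 1024, sha1 => 'abc' },
);
print plan_action( \%stored, '/music/a.mp3', 1024, 'abc' ), "\n";    # skip
print plan_action( \%stored, '/music/a.mp3', 2048, 'def' ), "\n";    # update
print plan_action( \%stored, '/music/b.mp3', 512,  '123' ), "\n";    # insert
```

When the answer is 'update', the stored id is what the UPDATE statement would key on.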