The subject above IS the subject of this thread, but as far as I can determine, it is a wishful subject - A subject lacking reality. If there IS a database through which one can translate the filename of a DVD into the actual title of its contents, I can't find it.

I've found that with a movie's IMDB number, one can find - of course - all the information about the title that IMDB itself retains, as well as large numbers of posters from other sites. But nowhere is that number or any other bit of identifying information tied to the disc's name itself.

If such a database existed, HandBrake - or some companion program - could automatically (for most discs) supply title, cast, movie poster, and even memorable quotes, all without human intervention. (Except for some TV series discs that all have the same name, drat them.)

Don't forget that a lot of DVDs are mastered sloppily. I'm sure I'm not alone in purchasing a new, commercial disc only to pop it in and find the name is something like DVD_MOVIE.

So you're going to need more than that--probably the titles, their durations, and associated chapter durations and audio tracks would be enough. But then to build a database like that you'll need public contributions. That means in no time at all the well will be poisoned by people who do "Main Feature" rips or remove extra languages.
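For what it's worth, here is a rough sketch of what a layout-based key along those lines might look like. Everything in it is an assumption for illustration: the `Title` fields, the whole-second rounding, and the choice of SHA-1 are made up, and nothing here reflects how any existing service does it.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Title:
    """Hypothetical description of one title on a disc."""
    duration_s: int                 # title length in whole seconds
    chapter_durations_s: List[int]  # chapter lengths in whole seconds
    audio_langs: List[str]          # audio track languages, e.g. ["en", "fr"]


def disc_signature(titles: List[Title]) -> str:
    """Hash the disc layout (not the volume label) into a hex string."""
    parts = []
    for t in titles:
        chapters = ",".join(str(c) for c in t.chapter_durations_s)
        audio = ",".join(t.audio_langs)
        parts.append(f"{t.duration_s}|{chapters}|{audio}")
    # Title order matters: the same titles in a different order hash differently.
    return hashlib.sha1("\n".join(parts).encode("utf-8")).hexdigest()
```

A "Main Feature" rip with the extras and extra languages stripped has a different layout, so it gets a different signature. It cannot be mistaken for the retail disc, but it can still be submitted as a bogus entry of its own.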

You might want to look at TagChimp, which tried to do something similar. But I've never had it work for me, not once.

I've thought about this before,
& even started to code a solution/database for it.

But I found discident.com, and I'm testing it out to see
if it's good enough that I won't have to write my own.
So far it has correctly identified the DVDs I've tried
with the service. But I've still gotta do a bit
more testing to 'trust' it.

I have coded something like this and exposed it as an XML-RPC service on my website (see http://jbcobb.net/?page_id=28 for details). I have a set of wrapper scripts for HandBrakeCLI that use it to automatically select the right tracks to rip and name them appropriately. I have GPLed the source and put it up on SourceForge under the name DVD Metabase (because it is scoped to do a LOT more than just this)...

It is currently only on Linux but any bored Windows or Mac coder could get the dependencies moved to other platforms easy enough. The database itself is one that only my wife and I are populating with our own purchases and as such, until someone pays me to make it all super-accurate or volunteers to help, comes with a 30 foot warranty.

well, I mean you could just query an imdb search for the name and take the first result imdb gives you (which is likely to be the correct one I guess) ... but as someone said before, a lot of movies just have some weird title so you're gonna run into problems with those ..

We initially looked into doing some screen-scraping to try to jump-start the database, but the kind of information we needed simply was not reliably available on IMDB... specifically, extra feature names and track index mappings. Nothing tricky, but enough was missing that we let it go. It's just too valuable *in our use case* to know that track #19 on such-and-such disc is the director's cut of movie xxxxxx.avi, or that track #18, for example, is the gag_reel... Also (still way alpha, but...) it's useful to know that for a certain pressing of Space: Above and Beyond, specific HandBrake settings can overcome crappy analog-to-digital conversion better than the defaults. Kinda like a Wikipedia thing: the more public knowledge, the stronger and more useful it can be. For me it's just fun ATM.

We fooled with screen scraping imdb.com in an effort to sorta seed our database but abandoned the idea because it simply did not answer the same question that we were asking. To them, "Indiana Jones" is a concept and a movie. Our DB tells the ripper "track five is the main movie and you should label it 'Indiana Jones and the Last Crusade.avi'", for example. With special features this is even more important, and those are not on IMDB either.

Also (in pre-release versions) I am tucking in additional or special HandBrake commands, etc., so if a particular disc is notorious for crappy quality, steps can be taken automatically to get the best out of what you have. There are other advantages here but these are the ones that are really working for me. YMMV
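To make the idea concrete, here is a minimal sketch of how a per-disc record could be turned into HandBrakeCLI invocations. The record layout (`tracks`, `extra_args`) and the `build_commands` helper are hypothetical and are not DVD Metabase's actual schema. Only the `-i`, `-t`, `-o` and `--decomb` options are assumed from HandBrakeCLI itself.

```python
from typing import Dict, List

# Hypothetical record for one disc; the field names are illustrative only.
DISC_RECORD = {
    "tracks": {5: "Indiana Jones and the Last Crusade"},  # track number -> label
    "extra_args": ["--decomb"],                            # per-disc workaround
}


def build_commands(device: str, record: Dict, ext: str = "mp4") -> List[List[str]]:
    """Return one HandBrakeCLI argument list per known track on the disc."""
    commands = []
    for track, name in sorted(record["tracks"].items()):
        cmd = ["HandBrakeCLI", "-i", device, "-t", str(track), "-o", f"{name}.{ext}"]
        cmd += record.get("extra_args", [])  # append any disc-specific settings
        commands.append(cmd)
    return commands
```

Each list can be handed to `subprocess.run` as-is, which avoids shell quoting problems with titles that contain spaces.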

jeffcobb wrote:
Also (in pre-release versions) I am tucking in additional or special HandBrake commands, etc., so if a particular disc is notorious for crappy quality, steps can be taken automatically to get the best out of what you have. There are other advantages here but these are the ones that are really working for me. YMMV

That's a great idea; Dark Knight and Wall-E come to mind in terms of ripping problems.
I'm on OS X, but I'll take a look at your code/db and see if I can port it over.

Hi Joel; First, most of the code is Python ATM, so in broad strokes we are most of the way there. The gotcha is something that I am recoding to have at least Linux and Windows counterparts: essentially, the hashing algorithm in libdvdread. If you get the HandBrake source and build it for Linux, one of the extra apps that gets created is called disc_id, which examines a disc and generates a unique hash based on certain characteristics. This is what is used to associate a physical disc with extended metadata, including the additional rip settings.

Now as luck would have it, I find myself in a place where it would be really handy to have this bit of functionality in a more portable and pluggable format, so that is something I am doing right now. I must admit, however, that I know Linux and Windows best and would more than welcome any Mac advice/assistance/whatever.
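For anyone wanting to experiment with a portable version, here is a pure-Python sketch of the general idea. It assumes the ID is an MD5 over VIDEO_TS.IFO followed by the first few VTS_xx_0.IFO files, which is my understanding of what libdvdread does. It is not guaranteed to produce the same value as disc_id, and it expects an already-mounted VIDEO_TS directory with upper-case file names.

```python
import hashlib
from pathlib import Path


def disc_id(video_ts: str) -> str:
    """MD5 over VIDEO_TS.IFO and VTS_01_0.IFO..VTS_09_0.IFO, in that order.

    Files that do not exist are skipped. This is an approximation of the
    libdvdread approach, not a byte-compatible reimplementation.
    """
    root = Path(video_ts)
    names = ["VIDEO_TS.IFO"] + [f"VTS_{n:02d}_0.IFO" for n in range(1, 10)]
    md5 = hashlib.md5()
    found = False
    for name in names:
        path = root / name
        if path.is_file():
            md5.update(path.read_bytes())
            found = True
    if not found:
        raise FileNotFoundError(f"no IFO files under {video_ts}")
    return md5.hexdigest()
```

Because only the IFO files are read, this runs in well under a second and never touches the encrypted VOB data.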

I've just started coding on the Mac actually, but Xcode really seems to make things easy haha.
The really attractive features your db has are the title data and rip settings.

But I'll look over your code. Currently I'm using the solution above, but yours would
be a nice addition (+ fallback db) as yours is much more versatile. But we'll need
community help to build the db. And as stated above, tagChimp got polluted that way.
Yours isn't exactly the same thing, though, and with the way it is queried, it has a chance of being
less prone to bogus data than tagChimp.

So, a locking-based system such as the one tagChimp is using now would be nice.
But I have some other ideas for preventing such pollution if you plan on making a
user-community db. So let me know the direction you plan on
going with that, & I'll be glad to help ^__^.

Joel, seriously, I am making this up as I go, and the reason I am pushing this up to SourceForge is to get other people's input. So if you code, cool, but even if you just help in the discussion that would be wonderful. There are a lot of similar and related issues I am trying to find the best path for right now. Yes, I think it would be optimal if it could be user-community driven, for a lot of reasons that have to do with being more than the sum of its parts. And I think there is potential for more, but I need those ideas to get banging off each other. It's how I learn.

There are areas of discussion on security, policing or self-policing or whatever to keep some kind of standards on things, different models for data propagation, etc. Now is the perfect time to hash this out too, because it's starting to pick up steam and any sudden or pervasive API changes will soon have a much higher cost....

Well, as stated above, I started to make something similar a while back.
So I have plenty of ideas for the data and structure and whatnot.

But let me know how this thing goes, and I'll let you know how the OS X build goes ;P.
And if you put it on SourceForge, I'll be sure to keep up with the discussions there.

I need to check out your current structure, but chapters along with chapter timecodes would
be a nice addition that no other db has ^__^

However it would be freaking GREAT to have someone to bang these ideas off of! It's been a one-man show for too long, and while it solves my use cases I doubt it completely solves anyone else's. Also your timing is superb if it calls for field changes to the db, since I am literally 2 days away from a major upgrade involving a lot of DB changes. I do have some things I would like to chat with you about WRT the chapter hacks... that dovetails into something else I have cooking, so this could really work for everybody, like double the features out of the same code.

I PMed you the other day about the DVD DB, but I'm going to post some ideas here to see what everyone else thinks.

I've been looking into ways to implement all of these features of the database,
and I was thinking of coding some integration with VLC to monitor user input and ask the user
to select certain things, like "Please select the main movie", and then monitor the title/chapters.
"Please select deleted scenes", "Please select any other relevant DVD extras (and name them ^__^)",
etc. It would be completely optional for users to do so, but at least 3 users would need to select the same
things in order for it to count as a "confirmed" entry.
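The three-user rule could be sketched roughly like this. It assumes each user submits at most one answer per question (e.g. the title number they picked as the main movie) and that deduplicating submissions per user happens elsewhere. The function name and threshold constant are made up for illustration.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional

CONFIRM_THRESHOLD = 3  # "at least 3 users" must agree


def confirmed_value(submissions: Iterable[Hashable],
                    threshold: int = CONFIRM_THRESHOLD) -> Optional[Hashable]:
    """Return the most common answer if enough users agree on it, else None."""
    counts = Counter(submissions)
    if not counts:
        return None
    value, votes = counts.most_common(1)[0]
    return value if votes >= threshold else None
```

With submissions `[5, 5, 5, 7]` the entry is confirmed as title 5. With `[5, 5, 7]` nothing is confirmed yet.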

Now for every movie title and feature title, we need to record the chapters and timecodes, the video stream's timecode,
all audio names, languages, formats, and their timecode(s), and every subtitle language and its timecodes.
That way the DB will know just about everything about the movie to tell users about and whatnot. Users would be asked
to name the tracks from the options "Main", "Commentary", "Foreign Language". The VLC scanning program
would be smart enough to look at the formats and languages of each.

Also, the DB (and likewise, the programs that will support it) will only accept REAL DVDs, which basically means
only DVDs with multiple titles, since anything with only 1 is most likely a title-only rip and/or a home video.

Also, much simpler things need to be input, like DVD Title, Date of Release, DVD Rip Workarounds, etc.
& I was also thinking about putting in some kind of genre option, like "Comedy", "Action & Adventure", etc.
& maybe stating whether a DVD is very "noise"-y, needs to be "de-interlaced", and other useful
encoding information.

& also, this next option is completely farfetched and downright ridiculous, but I was
thinking about screen capturing the DVD while the user was selecting the different options and whatnot,
& uploading the pics in order to be able to reference and/or build a pseudo menu in an mp4 file.
It's technically feasible, and though I've never done it, it can't hurt to look into it, right? I mean, chances are it
wouldn't be supported by Apple devices, but it'd probably work in some things ha.

And the people at the Muxo forums seem to have converted a lot of DVD subtitles for use with Muxo,
and I was thinking about asking them to contribute to a centralized DB that could take their Apple-compliant entries
and match them up with movies, and then your DB could reference those SRT files for use with Muxo (since your DB would
already know what subtitles are in whatever movies ;P).

Also the DB needs to be queryable by Movie Title & Year (if the year's not specified, the most recent gets shown), and by your disc "fingerprint".
And I think that's it haha.
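A minimal in-memory sketch of those two lookups, just to pin down the behaviour. `DiscEntry` and both function names are hypothetical, and a real version would sit on top of whatever database the project ends up using.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DiscEntry:
    """Hypothetical minimal DB row."""
    fingerprint: str
    title: str
    year: int


def find_by_fingerprint(db: List[DiscEntry], fingerprint: str) -> Optional[DiscEntry]:
    """Exact match on the disc fingerprint."""
    return next((e for e in db if e.fingerprint == fingerprint), None)


def find_by_title(db: List[DiscEntry], title: str,
                  year: Optional[int] = None) -> Optional[DiscEntry]:
    """Case-insensitive title match; without a year, the most recent entry wins."""
    matches = [e for e in db if e.title.lower() == title.lower()]
    if year is not None:
        matches = [e for e in matches if e.year == year]
    return max(matches, key=lambda e: e.year, default=None)
```

So for a DB holding both the 1933 and 2005 "King Kong", `find_by_title(db, "king kong")` returns the 2005 entry, while passing `year=1933` returns the original.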

Of course these are all suggestions, and I haven't necessarily gotten down to thinking about tech specs for the DB, but I just wanted you and/or anyone
else to comment.

I'm not sure how difficult this would be, but a lot of progress has been made in audio fingerprinting and I think at least some basic algorithms for it are out in the open now. Since most DVDs seem to have at least one trailer included on the disk, and databases like IMDB tend to have trailers for a large number of movies, maybe this would be a way to start?

For example, it could work something like this:

1) Get fingerprints from every audio track shorter than (e.g.) 5 minutes when the DVD is inserted.

2) Query the database against fingerprints of the audio tracks of available on-line trailers on IMDB, use that as starting information for the database.

This answers a slightly different problem than the one you guys are approaching (identification of the movie vs. the particular DVD encoding of that movie), but it should be additive to your project at the least.

The potential problems I would see are legal/technical issues involved with fingerprinting the audio tracks on a DVD, violating IMDB's terms of service (likely), and getting the initial fingerprints. I know services like MusicIP offer their programs, but it would probably be a ToS violation to use their software for this use.

Just a thought. I'd be happy to help, but my coding skill is rather limited and my time even more so. Looking forward to seeing what you guys come up with...

gabberwok wrote: Okay, obvious problem - trailers for other movies included on DVDs. Still, it might help narrow things down.

Thanks for your suggestion ^__^.
We've actually made a lot of progress since that last post though.
And it's not a bad idea, but it would be next to impossible to implement
something like that. There are multiple trailers for every DVD, more than one trailer
on the DVD, and I don't believe there are enough DVDs with trailers of themselves to merit
it, not to mention finding the trailer consistently (which is a function of the DB in the first place) haha.

But you seem rather interested in this, right?
When I polish up the tools for the database, would
you like to beta test them and give me feedback?
That is, if you're on a Mac. If you have Windows or something else,
it'll take me longer to port the tools.
It won't be for a couple of weeks, but it'd be nice to
get a user's opinion of my tools ^__^

I'm not sure if this helps you at all on your quest, but the DVD application "Rip-it" uses a system called DiscIdent to replace the DVD title (e.g. PASSION_CHRIST with Passion of the Christ, as an example) with the movie's true name. Haven't really read how it works, but it DOES work about 95% of the time. If that's any help in what you're doing...

Not to get too far off topic, but this youtube video from Scrubs ( http://www.youtube.com/watch?v=SobnBZVtFa0 ) is possibly one of the funnier moments ever on television. That's one of the situations where an anonymous note left on someone's desk is the best course of action. Good luck with the title database.