CUETools Database

You have probably heard about AccurateRip, a wonderful database of CD rip checksums that helps you make sure your CD rip is an exact copy of the original CD. What it can tell you is how many other people got the same data when copying this CD.

What are the advantages?

The most important feature is the ability not only to detect, but also to correct a small number of errors that occurred in the ripping process. That is, CTDB's purpose is not so much to tell a good rip from a bad rip (although it can be used for this), but more to provide a means to fix a rip when the user is certain their copy is damaged and is confident that there is an undamaged one in the database. (As a general rule of thumb, a CTDB confidence of 2/X or higher indicates that the rip in the database is probably undamaged.)

CTDB is free of offset problems. You don't even need to set up offset correction for your CD drive to be able to verify and, more importantly, submit rips to the database. Different pressings of the same CD are treated as the same disc; the database does not distinguish between them.

Verification results are easier to deal with. There are exactly three possible outcomes: the rip is correct, the rip contains correctable errors, or the rip is unknown (or contains errors beyond repair).

If there's a match, you can be certain it's really a match, because in addition to a recovery record, the database uses a well-known CRC32 checksum of the whole CD image (except for 10×588 offset samples in the first and last seconds of the disc). This checksum is used as a rip ID (CTDBID) in CTDB.
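The whole-image checksum described above can be sketched as follows. This is a minimal illustration, not CTDB's actual code: it assumes the image is raw 16-bit stereo PCM (4 bytes per sample), that 10×588 samples are skipped at each end, and that the ID is shown as eight hex digits. The function names `trimmed_crc32` and `format_id` are hypothetical.

```python
import zlib

SAMPLES_PER_SECTOR = 588                 # stereo samples in one CD sector
BYTES_PER_SAMPLE = 4                     # assumption: 16-bit stereo PCM
SKIP_SAMPLES = 10 * SAMPLES_PER_SECTOR   # samples excluded at each end


def trimmed_crc32(pcm: bytes) -> int:
    """CRC32 of the image with SKIP_SAMPLES dropped from both ends."""
    skip = SKIP_SAMPLES * BYTES_PER_SAMPLE
    if len(pcm) <= 2 * skip:
        raise ValueError("image is too short to trim")
    return zlib.crc32(pcm[skip:len(pcm) - skip]) & 0xFFFFFFFF


def format_id(crc: int) -> str:
    """Render the checksum as eight lowercase hex digits (assumed format)."""
    return f"{crc:08x}"
```

Because the samples at both ends are left out of the checksum, two rips that differ only inside those excluded regions produce the same ID under this sketch.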

What are the downsides and limitations?

Originally, CTDB did not bother with tracks: your rip as a whole was either good/correctable or it wasn't, and if one of the tracks was damaged beyond repair, CTDB could not tell which one. As of CTDB 2.0, individual tracks are verified, although you still need to verify the entire disc.

If your rip contains errors, the verification/correction process will involve downloading about 200 KB of data (or possibly more for popular CDs), which is much more than it takes for AccurateRip.

The verification process is slower than with AccurateRip (AR).

The database was just born, and at the moment contains far fewer CDs than AR.

Right now [as of Oct. 2011] confidence levels are a bit of a mess, because they combine past measurements of AR confidence levels with the number of direct CTDB submissions (even if from the same rip) since that point. Eventually, CTDB confidence levels will be reset to the actual number of independent submissions. As of CTDB 2.0, confidence levels are no longer based on AccurateRip data.

How many errors can a rip contain and still be repairable?

That depends. The best case is a single continuous damaged area, which can be up to 30-40 sectors (about half a second) long for most discs. As of CTDB 2.0, a single continuous damaged area of up to about 75 sectors (one second) can be repaired on popular discs.
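The sector figures above convert to playing time with the standard CD audio constants (75 sectors per second, 588 stereo samples per sector). The helper names in this short sketch are made up for illustration:

```python
SECTORS_PER_SECOND = 75    # CD audio plays 75 sectors per second
SAMPLES_PER_SECTOR = 588   # stereo samples in one sector


def sectors_to_seconds(sectors: int) -> float:
    """Playing time covered by a run of sectors."""
    return sectors / SECTORS_PER_SECOND


def sectors_to_samples(sectors: int) -> int:
    """Number of stereo samples covered by a run of sectors."""
    return sectors * SAMPLES_PER_SECTOR


for n in (30, 40, 75):
    print(f"{n} sectors = {sectors_to_seconds(n):.2f} s, "
          f"{sectors_to_samples(n)} samples")
# 30 sectors = 0.40 s, 17640 samples
# 40 sectors = 0.53 s, 23520 samples
# 75 sectors = 1.00 s, 44100 samples
```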

What information does the database contain for each CD?

An offset-finding checksum: a small (16-byte) recovery record for a set of samples throughout the CD, which allows detection of the offset difference between the rip in the database and your rip, even if your rip contains some errors.

The CRC32 of the whole disc (except for some lead-in/lead-out samples), which serves as the CTDBID.

Submission date, artist, title.

A 180 KB recovery record (about twice that for more popular CDs), which is stored separately and accessed only when verifying a broken rip or repairing it.

A submission log, including users' IP addresses and CD drive models, which is used to eliminate erroneous submissions and virtual/faulty drives, and to verify whether a submission by one user is independently confirmed.
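The per-CD information listed above could be modelled roughly as follows. This is only an illustrative sketch: the class names, field names and types are hypothetical and do not reflect CTDB's real schema. Counting distinct IP addresses as a stand-in for "independent" submissions is likewise an assumption.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Submission:
    ip_address: str      # submitter's IP address
    drive_model: str     # CD drive model reported with the submission


@dataclass
class CtdbEntry:
    offset_checksum: bytes       # small (16-byte) offset-finding record
    ctdbid: int                  # CRC32 of the whole disc
    submitted: date
    artist: str
    title: str
    recovery_record_size: int    # bytes; the record itself is stored separately
    log: list[Submission] = field(default_factory=list)

    def __post_init__(self) -> None:
        if len(self.offset_checksum) != 16:
            raise ValueError("offset checksum must be 16 bytes")

    def distinct_submitters(self) -> int:
        """Assumption: one IP address counts as one independent submitter."""
        return len({s.ip_address for s in self.log})
```

For example, an entry whose log holds three submissions from two different addresses would report two distinct submitters under this simplification.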