I have some wave files from a set of CDs that I'm trying to fix. They have what I'd call "digital screw-up" on them (these are purchased CDs, yet they have artifacts like a bad CD-R).

It appears that groups of 5 samples in the right channel are offset by varying amounts, in patches throughout the tracks. It sounds incredibly annoying, and I have tried lots of different ways to fix them: filtering, click removal, interlacing other sections of the song over them. They definitely improve the sound quality, but naturally, I want what they *originally* sounded like!

What dawned on me recently, though, is that the affected samples and their offsets seem to follow a pattern. You get regularly spaced groups of 5, with regular spacing between each bad sample. And the amount each is offset by, from what I can tell so far, appears to be the difference between its actual value and the value before it... which would be hard to work out mathematically, because how are you meant to know what it "should" be?

Anyway, the best way I thought of to fix this problem would be to get a friend to write a program that manipulates the samples in mathematical form. I can do maths, so I could probably find a reasonable way of correcting the samples, then he could write a program that performs the algorithm on the affected samples in a wave.

My question is, could this be done by converting a wave file into a file that contains the sample values as integers? Or is this what a wave file already is, meaning it could be acted on directly by a program that searches through the values and alters certain ones?

Thanks!

(i can post samples for any kind person who wants to help figure out how to fix them!)

edit: OK, I've looked at the files some more, and noticed something quite obvious. The problem samples all take the same value as their preceding sample. I found that taking the difference between the corresponding left channel's sample and its preceding sample was a good approximation (it moved the bad samples in the right direction and by close to the right amount), and that would be why.

Is there a way of predicting how the samples should be changed to follow their natural curve (as would be hinted at by the preceding and following samples)? It would need to work specifically for when one sample is of unknown value but the others are correct. I'm not sure if standard click restoration in Cool Edit can deal with this: just single bad samples, rather than a click in vinyl which spans dozens of samples...

I'd probably use Matlab for this if it was a small number of files. It's slow, but it has a lot of functions (it can load and save wave files, has DSP and statistical functions, etc.) and is easy to program. But of course, if you don't have access to a computer with Matlab... you'd have to try something else.

Edit: By the way... I assume you have already tried EAC in secure mode and Deglitch?

Aaah, that's a good idea! I have access to Matlab on the computers at university. I was going to ask my maths lecturer about it, and about the prediction algorithms I saw on the Monkey's Audio site; it seems like they could be adapted to find a good approximation to the missing value.

I ripped the CDs with AudioCatalyst years ago and the problem was there. I took those CDs back and said hey, they're damaged. Then more recently I bought them again, ripping them with CDex, and again I found the problem, in the same portions of the same tracks!

I ripped them with EAC as well, but without using Deglitch. Instead I tried the standalone Deglitch program, which claims to be more accurate. It works on some passages, generally the quieter ones, but when there are drums involved (as there are in most parts) it only picks up about half the bad samples, and gets some wrong too (as in, it corrects left channel samples when it shouldn't), so I'm guessing I need to do it myself, as I know exactly which samples are messed up and which ones aren't.

Question is, how best to do it?

I don't have a clue how to use Matlab yet, but I could probably collar one of the computer technicians for some help! And then it's a case of finding the best predictor?

If you had 4 samples, x1, x2, x3 & x4 (x4 being the unknown one)

something like x4 = 2*x3 - x2, or x4 = 3*x3 - 3*x2 + x1?

If I found the best one of these to use (or a way to decide the best of many predictors), would it help to run it forwards as well as backwards? I know the samples that follow on from the broken sample, so it could predict backwards from those too...
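For what it's worth, those two predictors, run from both sides and averaged, might look something like this in Python (just a sketch with made-up function names; the thread later uses Matlab for the real thing):

```python
def linear_predict(a, b):
    # First-order extrapolation: x4 = 2*x3 - x2 (assumes constant slope)
    return 2 * b - a

def quadratic_predict(a, b, c):
    # Second-order extrapolation: x4 = 3*x3 - 3*x2 + x1
    return 3 * c - 3 * b + a

def repair_sample(samples, i):
    """Predict sample i from the two samples before it, and again from
    the two samples after it, then average the two estimates."""
    forward = linear_predict(samples[i - 2], samples[i - 1])
    backward = linear_predict(samples[i + 2], samples[i + 1])
    return (forward + backward) / 2.0
```

On a straight ramp both directions agree exactly; on curved audio the two estimates bracket the truth, so averaging them usually beats either one alone.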

And I have intact left channel information (most of the time).. the way I was editing the files was by taking the difference between the two channels, this leaves a waveform with much lower amplitude, so it was easier to find and remove the peaks. Could this 'difference' waveform be used at all? Or is it in effect just the same information presented in a different way?

2nd degree, x1, x2, x3, x4: if the origin of the coordinates is at X1, the interval between the samples is 1, and the parabola formula is y = (b*x + a)^2 + c,

solving the system

X1 = a^2 + c
X2 = (b + a)^2 + c
X3 = (2b + a)^2 + c
X4 = (3b + a)^2 + c

Knowing X1, X2 and X4 will give you a, b, c, and X3.
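Since y = (b*x + a)^2 + c expands to an ordinary quadratic in x, you don't actually have to solve for a, b and c separately: fitting any parabola through the three known points (0, X1), (1, X2), (3, X4) and evaluating it at x = 2 gives X3 directly (it works out to X3 = -X1/3 + X2 + X4/3). A possible sketch in Python/numpy (hypothetical function name, not code from the thread):

```python
import numpy as np

def estimate_x3(x1, x2, x4):
    """Fit a parabola through the samples at positions 0, 1 and 3,
    then evaluate it at the missing position 2."""
    coeffs = np.polyfit([0, 1, 3], [x1, x2, x4], deg=2)
    return np.polyval(coeffs, 2)
```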

Besides, I would be very interested in the exact pattern your samples are coming in. In the DAE quality analysis I'm running, I found some groups of 60 samples, showing mostly groups of 3 (bad-good-bad-good-bad) separated by 24 good ones. After a lot of research in the rare and wrong documents about CIRC on the web, and some help from BobHere in the EAC forum, I found that it was the exact pattern of a 17-frame burst error passed through a CIRC chipset correcting 4 wrong symbols at the C2 stage. I'd be curious to see if yours match a 2- or 3-error chipset.

Just tell me if your groups of 5 come in bigger clusters, and the total number of wrong samples in a cluster (there should be between 10 and 500 wrong samples), and I'll give you the theoretical pattern. Because so far, having problems with the CIRC documentation, I've been using the experimental results to correct the theory. Now I'd like to see if the corrected theory predicts other experimental results like yours, so don't give me the pattern, just whether there are clusters.

I never realized you'd posted here, sorry for the long delay in my reply.

I've been beavering away with Cool Edit cutting up the song into the small bad segments and making sure they're all aligned for a program to sort out automatically! I have the first test results for you too!

I've put the exact structure (as best I know it) here, but if you don't want to read it till later, just skip this white portion. Come back and select it all with your mouse to read later!

Here is the overall pattern :

[good section][bad section][good section][bad section]......

The good section was ~55 seconds, or ~2'400'000 samples, and the bad section was ~3 seconds, or ~140'000 samples.

Within each bad section, I had this structure :

[broken cluster][good cluster][broken cluster][good cluster].......

A broken cluster spanned 64 samples, starting with a bad sample, then 15 good, then a bad, then 15 good.. until the 64th was the final bad sample.

Then a good cluster lasted 224 samples, leaving the pattern repeating itself every 288 samples.

It wasn't quite this regular though; sometimes bad samples would be missing. At the very beginning and end of the bad section, you'd only have 1-3 bad ones, but they would still be spread across 64 samples with spacings of 16 (or 32, 48).

And also, I believe that every 6th broken cluster only contained 4 bad samples, spanning 48 in total. So this meant every 6th repetition only lasted 256 samples, rather than 288.

I'm not sure if my problem will fit your theory, but I'd like to see if your theory is correct; it could be very useful in diagnosing problems.

I get groups of 5 bad samples, and within each affected cluster there are just under 500 groups, about 2400 broken samples in total. All the clusters are roughly the same size, within 4 groups of each other.

[just to remind me and you... group = 5 bad, cluster = set of ~500 groups]

So, it's quite a complex pattern. I've tried ripping the CDs on different drives with EAC / CDex, and the best proof that they are physically wrong is that they play identically on my hi-fi. I'm fairly sure the waves I've ripped are actually what is on the CDs; it's not a problem with my drives, unless the same problem could affect a non-PC CD drive?

I've taken ErikS's advice on this and have used Matlab. It's an awesome program. Thanks!

An input .au file can be read and converted into a table of numbers. We then scanned through this list and picked out values that were the same as the previous value, discarding any whose position wasn't divisible by 16 (to exclude samples that genuinely repeat the previous value by chance, which happens with 16-bit quantization).

We then removed all these bad samples from the dataset and used cubic-spline fitting to estimate the missing values. The resulting file could be saved as .au and compared.
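The thread's actual Matlab code isn't shown, but the pipeline as described might look like this in Python/scipy (hypothetical names; the 16-sample spacing check is the pattern described earlier in the thread):

```python
import numpy as np
from scipy.interpolate import CubicSpline

def repair_channel(samples, spacing=16):
    """Flag samples that repeat their predecessor at positions consistent
    with the glitch spacing, then refill them with a cubic spline fitted
    through the remaining good samples."""
    samples = np.asarray(samples, dtype=float)
    # indices where a sample equals the previous one
    repeats = np.flatnonzero(samples[1:] == samples[:-1]) + 1
    # keep only repeats whose position fits the 16-sample pattern
    bad = repeats[repeats % spacing == 0]
    good = np.setdiff1d(np.arange(len(samples)), bad)
    spline = CubicSpline(good, samples[good])
    fixed = samples.copy()
    fixed[bad] = spline(bad)
    return fixed, bad
```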

I know spectral graphs aren't the right way to go about it, but check this :

I have learnt a lot about program structures and Matlab doing this! I'm definitely gonna buy the dude who helped me a beer! He even likes drum'n'bass too (that's what some of these samples are of).

I'm going to try and improve on the results, as I have the L-R channel difference as well... I can use the same matrix of broken sample locations and apply it to this 2nd file, which usually contains a much smoother waveform, and so should be predicted better, I hope.
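That idea could be sketched like this (again hypothetical Python/scipy, not the thread's code): spline-interpolate the smoother L-R difference at the known bad positions, then rebuild the right channel from the intact left channel.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def repair_via_difference(left, right, bad):
    """Interpolate the (usually smoother) L-R difference signal at the
    bad positions, then reconstruct the right channel as R = L - (L-R)."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    bad = np.asarray(bad)
    diff = left - right
    good = np.setdiff1d(np.arange(len(diff)), bad)
    spline = CubicSpline(good, diff[good])
    fixed = right.copy()
    fixed[bad] = left[bad] - spline(bad)
    return fixed
```

The design hunch here is the one stated above: when left and right are correlated, L-R has lower amplitude and less structure than R itself, so the spline has an easier curve to follow.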

Wow! This is what I call glitch removal! They aren't even visible in the waveform, nor detected by SoundForge, nor by Deglitch! And cubic interpolation! Even hi-fi CD players use linear.

About the pattern, it seems to have nothing to do with CIRC, at least not with a CIRC burst error. This means that it shouldn't come from a CD playback problem. It doesn't sound like a DAT problem either. I don't know what can have caused this.

Hehe, thanks! It was done using a "cubic spline". I've searched on Google and can't find anything simple enough to understand how it works... but the way it was explained to me is that it estimates your missing value using every other value in the data-set, not just the surrounding 2 (as in linear regression) or the surrounding 4 (as in cubic interpolation).
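To illustrate why the global fit helps, here's a tiny comparison (Python/scipy, illustrative only): filling one missing sample of a curved signal with the average of its two neighbours, versus a cubic spline fitted through all the other samples.

```python
import numpy as np
from scipy.interpolate import CubicSpline

t = np.arange(20)
signal = np.sin(t / 3.0)   # a curved test signal
missing = 10
good = np.delete(t, missing)

# Two-neighbour linear fill vs. a spline through every other sample
linear_guess = 0.5 * (signal[missing - 1] + signal[missing + 1])
spline_guess = float(CubicSpline(good, signal[good])(missing))

linear_err = abs(linear_guess - signal[missing])
spline_err = abs(spline_guess - signal[missing])
```

On this signal the spline's error is far smaller than the linear fill's, because the spline can follow the curvature that the straight-line guess ignores.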

Pretty cool, huh? I'm trying to improve upon it though; I noticed a couple of points after the fitting seemed worse than they were before, in areas of the waveform that looked (to the eye) hard to predict, so I'm going to use the left-right difference to help improve that.