gts, with regards to your JPEG analogy... what you describe is actually closer to the way that GIF works. GIF works only on 256-colour images: any image you save as a GIF will become a 256-colour image, but aside from reducing the number of colours in the image, GIF doesn't use any perceptual encoding. The compression is achieved through methods very similar to ZIP archive compression. A superior image format to GIF is PNG. It uses a similar compression method but does not limit you to 256 colours.

These image formats cannot be compared to MP3. Rather, you can consider FLAC, LPAC or Monkey's Audio to be the audio equivalent of GIF or PNG. They analyze the waveform of the file and eliminate redundant information in such a way that the decoder can restore it bit-perfectly.
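As a toy sketch of what "eliminating redundant information" means (plain Python, made-up sample values - real FLAC uses higher-order predictors and Rice coding, but the principle is the same): predict each sample from the previous one and store only the residuals, which the entropy-coding stage then packs tightly.

```python
# Toy sketch of the FLAC/LPAC idea: predict each sample from the previous
# one and keep only the residuals.  Made-up sample values.
samples = [10, 12, 15, 15, 14, 12, 9, 7, 6, 6]

# "Eliminate redundancy": store the first sample, then the differences.
residuals = [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]

# The residuals are small integers, which an entropy coder packs tightly.
# The decoder undoes the prediction exactly - nothing is thrown away.
decoded = []
acc = 0
for r in residuals:
    acc += r
    decoded.append(acc)

assert decoded == samples  # bit-perfect round trip
```

Because the prediction step is exactly invertible, the round trip is lossless - which is precisely what separates these formats from MP3.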

Codecs like AAC, MP3 and Vorbis have an added issue: the MDCT. The more cosine transforms you apply to a signal, the worse it gets, no matter what bitrate you use. MPC, in this case, has the advantage of being a subband codec, and therefore doesn't degrade that much with repeated re-encodings.

I see other people keep repeating this kind of argument, which now sounds more like a magic formula to me - sort of "mdct ist dead, subband ist god".

Do you actually refer to the problem of time resolution? Wouldn't it be more accurate then to talk about "not-so-good time resolution vs. better time resolution"?

Besides, theoretically AAC (and Vorbis?) short blocks can achieve nearly as good time resolution as MPC. Is that right?

Also, AFAIK, a codec being "subband" might not guarantee better time resolution per se. It's just that the mpeg subbanding algorithm is such that the time resolution is high. Is that right?

Not to resurrect a dead thread or anything, but it does seem to be the most appropriate place for the question. (If it's not, please forgive me.)

I understand the reasoning why re-encoding is still a lossy process, but I've always been curious about the following idea(s):

Assume mp3 encoding, in its basic form, consists of two steps (I'm not too well versed in the nitty-gritty of encoding, so don't hesitate to correct me where I'm wrong):

- a lossy, psycho-perceptual "throwing away" / modifying of the waveform,
- followed by a lossless compression of this new, modified wave (presumably the same waveform that results from decoding).

Could it not be possible, then, to design some sort of method to *insert* the decoded wave back into the encode process, *after* the "throwing away" / alteration point, so that this wave (which has already undergone this process, in its original encoding) can now undergo the same lossless compression, thereby getting it back to its "original" mp3-encoded state?

And in a somewhat related manner: is the mp3 decode process reversible? I.e. is there an inverse algorithm that can be used to take a decoded waveform and reconstruct the mp3 that generated it (obviously knowing all the original encoding conditions)? Not encoding in the traditional sense, merely an "undoing" of whatever the decode process did.

Of course, I am by no means saying these things are possible; I'm merely curious about the ideas.

Lossy sound encoding is not quite as you described. It's more like analysing the frequency content of the file in small chunks, deciding which frequencies to dump or round, and compressing the frequency information. The decoding is then synthesizing the frequencies back into waveforms while making sure that the small chunks flow smoothly into each other.
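A toy illustration of that chunk-by-chunk analysis (plain Python DFT with a made-up magnitude threshold - real codecs use an MDCT/filterbank and a psychoacoustic model, not a fixed cutoff): analyse one chunk, dump the quiet frequencies, and synthesize the rest back into a waveform.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform of a real chunk."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Synthesize the frequency bins back into a (real) waveform."""
    N = len(X)
    return [(sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N).real
            for n in range(N)]

N = 16
loud = [math.cos(2 * math.pi * n / N) for n in range(N)]            # strong tone
quiet = [0.01 * math.cos(2 * math.pi * 5 * n / N) for n in range(N)]  # weak tone
chunk = [a + b for a, b in zip(loud, quiet)]

# "Encode": keep only bins whose magnitude is above a (made-up) threshold.
spectrum = dft(chunk)
kept = [c if abs(c) > 1.0 else 0 for c in spectrum]

# "Decode": synthesize the remaining frequencies back into a waveform.
out = idft(kept)

# The loud tone survives; the quiet one was dumped for good.
assert max(abs(a - b) for a, b in zip(out, loud)) < 1e-9
```

The dumped bins are gone: re-analysing `out` cannot tell you the quiet tone was ever there, which is the sense in which the process is one-way.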

Frequency information is no more complex than the waveform, so in principle you would know enough to reconstruct it.

If you try to analyze the file again, you will get the frequencies that were in the first mp3, plus noise coming from rounding and smoothing the chunks together. The encoder will not necessarily be able to tell that the noise is to be discarded; in fact, the noise may modify your original frequencies in unpredictable ways. So the encoder may decide to dump or round something else.

There was another thread recently where we agreed that, in theory, one could write a codec that could be re-encoded without loss. However, the codec would be very inefficient.

A general purpose algorithm for recreating the original MP3 is possible for certain definitions of "possible." One such algorithm is a brute-force technique: generate every MP3 of the correct length and compare their output with the wav file. It would complete in exponential time - in this case, that probably means "many orders of magnitude longer than the age of the universe". You'd have to be certain that your decoder produced output the same way as the one that decoded that wav file, otherwise you'd wait an awful long time just to get 42.
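For flavour, here is the brute-force idea at a ridiculously small scale, with a made-up 6-bit "decoder" standing in for mp3: enumerate every candidate bitstream and compare its decode against the target. The search space is 2^n, which is what makes it exponential.

```python
from itertools import product

# Made-up toy "decoder": turns a 6-bit string into three 2-bit samples.
# Any fixed decoder works here - the point is the size of the search space.
def decode(bits):
    return [bits[i] * 2 + bits[i + 1] for i in range(0, len(bits), 2)]

# Pretend this wav came from decoding some unknown 6-bit "mp3".
target = decode((1, 0, 1, 1, 0, 1))

# Brute force: try every candidate bitstream of the right length.
n = 6
tried = 0
for candidate in product((0, 1), repeat=n):
    tried += 1
    if decode(candidate) == target:
        break

assert candidate == (1, 0, 1, 1, 0, 1)
assert tried <= 2 ** n  # 2**n candidates: exponential in the stream length
```

At 6 bits that's at most 64 tries; at the roughly 8 million bits of a real 128kbps mp3 minute, 2^n is the "age of the universe" figure above.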

This is probably what I would call a Hard Problem. If I had a dollar, I'd wager it's NP-complete, but I can't prove it (in the mathematical sense). If you find a solution to an NP-complete problem that finishes in polynomial time, you would become Very Famous. I'm on the P != NP side of the fence though, so good luck. :-)

edit: Doh, need to type faster or write shorter posts.

This post has been edited by phong: Aug 21 2003, 02:59

--------------------

I am *expanding!* It is so much *squishy* to *smell* you! *Campers* are the best! I have *anticipation* and then what? Better parties in *the middle* for sure.http://www.phong.org/

Could it not be possible, then, to design some sort of method to *insert* the decoded wave back into the encode process, *after* the "throwing away" / alteration point, so that this wave (which has already undergone this process, in its original encoding) can now undergo the same lossless compression, thereby getting it back to its "original" mp3-encoded state?

IIRC FhG looked at this. You could call it an mp3 un-decoder if you like.

I don't know if they managed it. But in theory it's no harder than cracking an encryption algorithm. Forget the audio processing - think of mp3 decoding as a mathematical process for converting one set of numbers into a larger set of numbers - the task is to reverse the transformation.

Now, it may be impractically complicated to do so, but to say it's impossible is foolish - especially when there isn't even any encryption key to crack! And remember - we can get a close result just by re-encoding the .wav, so it's not like you have to start from no knowledge.

In reply to the original question, the simplistic answer is to think of psychoacoustic coding as a spreading process.

The audio signal is spread in the time and frequency domain (blurred, if you want to use an image analogy) as much as possible, without it being noticeable.

But next time you send the audio through the codec, it gets spread again. So now it's even more spread, or blurred, and (even in a perfect codec) this takes it beyond the point where the blurring is just noticeable. And if you repeat this many times... yuk!

You can download such an encoder and test the claim for yourself (e.g. SoloH - set it to layer I, 384kbps). I think you'll find that the claim is marketing nonsense or urban legend - the quality degrades as you transcode, just like with any lossy codec.

However, high bitrate layer I and layer II is designed so that it can be transcoded several times before audible problems appear. This is because it was expected that these codecs would be widely used in broadcasting, where 10 generations of coding/decoding may be required.

Despite this design goal, the results are not bit-identical, and would probably not sound particularly good on critical material to critical ears.

For digital distribution, the BBC and some other broadcasters in Europe used NICAM rather than high-bitrate MPEG. (Now that 128kbps mp2 is considered broadcast quality, they don't seem to worry about things like quality!)

I've just tried the experiment using an MPEG 1 layer I encoder, and of course it doesn't work.

What's more, now that I've engaged my brain, it's obvious that the DCC > DCC copy would not be bit-exact even if the audio codec offered this possibility, because there is no mechanism in the PCM SPDIF signal to synchronise the two audio codecs. MPEG operates on frames of audio data. Ignoring for a moment the fact that bit-perfect transcoding is impossible, you can't even begin to think about getting close unless you align the frames in the coded and re-coded versions (because quantisation decisions are made on a frame-by-frame basis). Correct alignment is something you only have a 1 in 384 chance of hitting when dubbing between two DCC decks.

And, even if you did, the copy would not be bit perfect, for all the reasons discussed in this thread.
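A little sketch of why frame alignment matters, using a made-up block quantiser (a per-block step size standing in for per-frame quantisation decisions): re-encode with the blocks aligned and nothing changes, because every value re-quantises to itself; shift the block boundaries and you pick up fresh error.

```python
import math

# Made-up block codec: quantise each 4-sample block with a step size derived
# from the block's peak (standing in for per-frame quantisation decisions).
def encode_decode(x, offset=0):
    out = list(x)
    for start in range(offset, len(x) - 3, 4):
        block = out[start:start + 4]
        step = max(abs(v) for v in block) / 8 or 1.0
        out[start:start + 4] = [round(v / step) * step for v in block]
    return out

signal = [math.sin(0.7 * n) for n in range(64)]

once = encode_decode(signal)

# Re-encode with the frames aligned: every value re-quantises to itself.
assert encode_decode(once) == once

# Re-encode with the frame boundaries shifted by 2 samples: the blocks now
# have different peaks, so different step sizes, so fresh rounding error.
shifted = encode_decode(once, offset=2)
assert shifted != once
```

This is only the alignment half of the story - a real transcode also re-runs the psychoacoustic model, so even aligned frames won't generally survive untouched.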

Just a re-clarification of my point, in case it wasn't too clear: in the first example, what I'm trying to describe is not re-encoding as it is commonly understood. It's more of a "cheat", whereby you "skip" the processes that typically add noise and error - the idea that "this piece of audio has already been processed, so there's no need to process it again". Key to this idea, of course, is the concept that the mp3 process goes something like "take the sample; edit, change, manipulate it; psycho-perceptually minimize it; all so that it can compress nicely", where this last process, the "compression", is essentially reversible. You don't get the original back, of course, but you can get the "edited, changed, manipulated part" back.

Of course, that's based on my general and "mainstream" impression of how mp3 operates, which is likely not the case, heh. So I guess that's what I am asking right now: is there no real "compression" part of the mp3 encode process? A compression in the sense that it's reversible? (I wonder where I got this idea from? heh. I think I may have read somewhere about the use of Huffman encoding in Vorbis, so I just assumed that some form of "end of the line" lossless compression is used in most audio codecs.)

With regards to mp3 "un-decode" being NP-complete: does that mean to say that there are non-deterministic aspects of the decode algorithm? Or at least, non-invertible ones? I'd imagine that only those sorts of characteristics would require a "brute force" attack, heh.

phong: I think you are wrong on both counts in the last message, but I like your iterative approach. ;-)

aggies: it's perfectly clear what you mean, and your intuition is not far off, but you need to read more into the replies.

Obviously every mainstream data compressor will use every trick to pack data as tightly as possible. Huffman and similar algorithms (Rice, arithmetic, etc.) are indeed the last step, because they pack data at the level of individual bits, and there is not much packing you can do beyond that.
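To make that last step concrete, here is a minimal Huffman coder (a sketch in plain Python with made-up "quantised spectral values" - not mp3's actual code tables). The common symbol gets the shortest code, and the whole step is perfectly reversible, which is why it is the lossless tail end of a lossy pipeline.

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """Build a prefix code from symbol frequencies (textbook Huffman)."""
    freq = Counter(data)
    # Heap entries: (frequency, tiebreak, {symbol: code-so-far}).
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate one-symbol input
        return {sym: "0" for sym in heap[0][2]}
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

# Made-up quantised "spectral values": lots of zeros, as a lossy stage
# tends to produce.
values = [0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 1, 0, 0, 2, 0]
codes = huffman_codes(values)
encoded = "".join(codes[v] for v in values)

# The common symbol (0) gets the shortest code, so the stream shrinks
# versus a fixed 2-bits-per-value encoding.
assert len(codes[0]) < len(codes[3])
assert len(encoded) < len(values) * 2
```

Decoding walks the prefix codes back to symbols with zero loss - the loss in mp3 happened earlier, when the values were quantised.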

The central idea of lossy audio compression is to represent sound in the most compressible way so that the quality loss is inaudible. Fewer numbers, or rounder numbers, pack tighter, so individual frequencies are dropped or rounded at the codec's discretion to maximize compressor efficiency down the line.
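"Rounder numbers pack tighter" is easy to demonstrate with a general-purpose compressor standing in for the real entropy coder (made-up signal and step size):

```python
import math
import struct
import zlib

# Made-up "signal": a tone plus some low-level detail.
samples = [math.sin(n / 5) + 0.001 * math.sin(n * 1.7) for n in range(2000)]

def pack(values):
    """Serialise to 32-bit floats so zlib sees the same layout both times."""
    return struct.pack("<%df" % len(values), *values)

raw = len(zlib.compress(pack(samples), 9))

# The lossy step: round every value onto a coarse grid first.
rounded = [round(v * 64) / 64 for v in samples]
small = len(zlib.compress(pack(rounded), 9))

# Rounder numbers pack tighter.
assert small < raw
```

zlib here is just a stand-in: real codecs use Huffman or arithmetic coding tuned to the quantised values, but the effect is the same - quantisation exists to make the final lossless stage cheap.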

The decompressor is of course deterministic, but it is indeed irreversible because it performs blending and rounding. That translates to noise - new content - that you will have to locate and discard in order to reconstruct the original mp3.

Now, back to the iterative approach: if it were possible to specify a meaningful distance between two decodings so that successive approximations could minimize it, the problem would land squarely in P. I'm not going to develop this further, because I've got to make the architects happy with their hatches.

I don't know if they managed it. But in theory it's no harder than cracking an encryption algorithm. Forget the audio processing - think of mp3 decoding as a mathematical process for converting one set of numbers into a larger set of numbers - the task is to reverse the transformation.

Now, it may be impractically complicated to do so, but to say it's impossible is foolish - especially when there isn't even any encryption key to crack! And remember - we can get a close result just by re-encoding the .wav, so it's not like you have to start from no knowledge.

Wouldn't the pigeonhole principle prevent something like this from working properly?

I've thought of something similar, figuring that a neural net might actually be useful in something like this, assuming you could devise a good input/output scheme.

My idea was more for restoration of JPEG artifacting, rather than music, but this way you could train the net for a certain kind of restoration (i.e., specifically restoring nature photos) and improve image quality somewhat, due to knowledge of the specific type of image.

I later figured, after reading some stuff about compression, the pigeonhole principle, and the like, that it wouldn't really be possible to create something that brought lost fidelity back. That information is lost from the bitstream and cannot be retrieved. There may be some elaborate hacks to restore a facsimile of that information with a fraction of the original dataset, like SBR, but unless properly coded data is there, all the un-coder would be able to do would be to make the audio sound better to our ears, not necessarily more accurate.

I could be completely off-base here... I know nothing of the mathematics of compression, only calculus and computer programming.

Sublime: the easiest way to think of it is that it's like xeroxing a xerox - it's just never as good as the original, and if you keep re-encoding the same file, it will get worse and worse...

Indeed. And with that in mind, the idea would be to somehow determine the "copier's memory after scanning the original image" from the end resulting copy. Obviously it's not the original, but it would be able to reproduce the copy, at will, without any loss.

But alas, I had thought/feared it was as such. I figured that I couldn't be the first person to think along those lines, and that if it were possible, it'd probably already have been done.

With regards to the pigeonhole principle though, I'd say that one has to take such theories with a grain of salt. Yes, in the strictest sense, you can't create more information (or recover lost information) from data that has lost/reduced it. In music that's lost fidelity; in images it could be lost resolution/clarity. For the simple reason that the set of inputs producing that output is larger than 1 - i.e. there are many starting points to get you to that end state, so there is no way to figure out which one it was. Too many pigeons, not enough holes.

However, it becomes very difficult to correctly quantify the "amount" of "information" in some item of data, meaning that you can sometimes "cheat". For example, look at image reconstruction algorithms. They can take a blurry/low-resolution picture and enhance it into a clearer/higher-resolution image. In the strictest sense, that should be impossible. But they use things like light refraction from known light sources, and other such "cheats", to essentially eliminate a lot of the possibilities - to reduce the number of pigeons, if you will. In actuality, they are increasing the amount of "information" found in the image; they are able to find "more" info in the image. Conventional information theory says that you can't get "more" from "less". But you can get around that in neat ways when you can show that "less" is actually not as little as you had thought.

I've always wondered about the classic NP travelling salesman problem. Yes, it takes way more than polynomial time to compute all possible routes to find the best one. But what if you could cheat, eliminating subroutes that are likely not in the solution, so that any master route containing such a subroute would automatically be ignored? Do it enough and you can reduce the complexity of the result set. Sure, it's not guaranteed to succeed. But what if you had a probability of success of 80%? Then statistically you just run it a bunch of times to guarantee success, which still comes in as polynomial. Basically, the idea is that each possible answer is NOT equally likely to be correct, and so you focus your attention on the much smaller subset of more likely answers.
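That "run it a bunch of times" intuition is real, and it's usually called probability amplification: if one run fails with probability 0.2, then k independent runs all fail with probability 0.2^k. A sketch with a stand-in randomised solver (this works directly when you can recognise a successful run; for TSP you'd keep the best of the k tours instead):

```python
import random

random.seed(0)

def unreliable_solver():
    """Stand-in for a heuristic that gives the right answer 80% of the time."""
    return random.random() < 0.8

def amplified(k=10):
    """Accept if any of k independent runs succeeds."""
    return any(unreliable_solver() for _ in range(k))

# Each extra run multiplies the failure probability by 0.2 ...
assert 0.2 ** 10 < 1e-6

# ... so the amplified version essentially never fails: the per-call failure
# chance is about one in ten million, so 1000 calls should all succeed.
failures = sum(not amplified() for _ in range(1000))
assert failures <= 1
```

The catch for NP-hard problems is that nobody has shown such a heuristic with a *guaranteed* success probability for the optimal tour - which is why this amplification trick, real as it is, doesn't settle P vs. NP.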

But I digress a bit, heh. If there is indeed uncertainty in the decode process, such that it does not have a well-defined inverse, then all hope is lost, heh. Although, that would mean it is possible for more than one mp3 datastream to decode to the exact same resulting wav?

50 years ago, there were a lot of old 78rpm discs around that people wanted to issue onto LPs. But they couldn't get the noise out of them. Or if they did, the results sounded horrible. And there's a problem: you have a valid signal, and added noise. And theory tells you that when the two are mixed together, you can no longer separate them. Where both are essentially unknown, you can't reconstruct either. It is not possible. You can't get back that nice clean audio signal from within all that noise.

And then Cedar came along, followed by many others, and built products that declick and denoise old records! How? Is theory wrong? No! But it doesn't matter if the result is mathematically identical to what was originally recorded - it just matters that, after cedar processing, it sounds significantly better than the noisy version.

Whether or not you can "un-decode" an mp3, I'm not sure. I'm 100% certain that you can't get from the mp3 back to the original .wav without loss. However, we often say "the information has been lost - it can't be put back". Not perfectly, no. But come 2050, when the only source of some recordings are 128kbps mp3s, I bet you some bright spark has a process which makes them sound a lot closer to the original than the surviving mp3 does.

So, my advice is to try to do the impossible. (Though please, don't jump out of your upstairs window thinking you can fly!). Because some people succeed, and they usually make a lot of money out of a product that is "good enough" - much more money than the theorists who say "it is theoretically impossible to do this perfectly".


Cheers, David.

This is what I like about this forum. To quote a famous US President, John F. Kennedy: "Some folks dream of things as they are and say, why... I dream of things that never were and say, why not?"

Do it enough and you can reduce the complexity of the result set. Sure, it's not guaranteed to succeed. But what if you had a probability of success of 80%? Then statistically you just run it a bunch of times to guarantee success, which still comes in as polynomial.

This is a very good definition of a heuristic. Heuristics are used all the time to get "pretty good" answers to Hard problems. A good example is checking a number for primality. You can do this with 100% certainty by factoring (very slow), or with a recently discovered algorithm that takes about O(n^12) time in the number of digits (still quite slow). There are other algorithms (some of which only work on certain sets of numbers) that, when given a number to test and a seed number, will tell you if it's prime or not, but get it wrong a certain percentage of the time (for the sake of example, we'll say they guess right 75% of the time). These tests can be run quickly, and changing the seed number results in an independent test. If you test a number with a few dozen seed numbers, you can be sure that it's prime within a very small margin of error.
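The probabilistic tests described above are the Fermat/Miller-Rabin family. Here is a compact Miller-Rabin in plain Python: each round with a random base wrongly passes a composite with probability at most 1/4 (the "seed" is the base), so a few dozen independent rounds make the error negligible.

```python
import random

def miller_rabin_round(n, a):
    """One round of the Miller-Rabin test: True means 'probably prime'."""
    d, r = n - 1, 0
    while d % 2 == 0:       # write n - 1 as d * 2**r with d odd
        d //= 2
        r += 1
    x = pow(a, d, n)
    if x == 1 or x == n - 1:
        return True
    for _ in range(r - 1):
        x = x * x % n
        if x == n - 1:
            return True
    return False

def is_probably_prime(n, rounds=20):
    """Independent random bases shrink the error exponentially (<= 4**-rounds)."""
    if n < 2:
        return False
    if n in (2, 3):
        return True
    if n % 2 == 0:
        return False
    return all(miller_rabin_round(n, random.randrange(2, n - 1))
               for _ in range(rounds))

assert is_probably_prime(2**31 - 1)      # a Mersenne prime
assert not is_probably_prime(2**31 - 3)  # composite (divisible by 5)
assert not is_probably_prime(561)        # Carmichael number: fools Fermat,
                                         # but not Miller-Rabin
```

Twenty rounds leave an error bound of 4^-20, around one in a trillion - exactly the "small margin of error" trade-off described above.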

So, it's certainly possible to develop a heuristic to create a close approximation of the original mp3 file from a decoded wav file, and it would be much easier than getting the EXACT mp3 file back. Don't ask me to write it though - it doesn't sound like fun at all. I'd much rather work on graph-coloring register allocation.

As far as "image enhancement" algorithms go... most don't get much additional information from the image. Instead, they make up fictitious image data that looks pretty.


Cosine transformation is NOT LOSSY. After the transformation you just cut some frequencies. If there is no information in the cut-out frequencies, you get LOSSLESS compression. So if you cut something once, you can't cut it a second time. There is no "rounding" or anything similar.

For example, say you have the sequence of bytes '15 35 156 204 32 0 14 87 5 9 45 34 35 147 21 224', and let's assume that after the cosine transform you have '245 125 54 2 20 6 1 2 0 1 0 0 0 0 0 0'. In order to compress, you just cut off the last 6 bytes. After decoding you will get exactly the same wav file. So mpeg compression is NOT ALWAYS lossy. If you repeatedly compress and decompress the same file with the same parameters, you will not lose anything.

And there is no need to use 192 or even 128 kbps if your sound source is radio quality - you won't get any better quality. Imagine that instead of the last 6 zeroes you cut only 3: you just get bigger files. VBR helps a lot here, because it will cut more or less depending on how many zeroes you have at the end of the spectrum.
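The experiment in this post is easy to run, and it does behave as described as long as the *only* operation is cutting the same trailing bins (toy DCT below, using the poster's example bytes as input). The catch, as discussed earlier in the thread, is that real mp3 also quantises the bins it keeps, and frames shift between encodes - which is where the generational loss comes from.

```python
import math

def dct(x):
    """Type-II DCT (unnormalised)."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi / N * (n + 0.5) * k) for n in range(N))
            for k in range(N)]

def idct(X):
    """Matching inverse (type-III DCT with the usual scale factor)."""
    N = len(X)
    return [(X[0] / 2 + sum(X[k] * math.cos(math.pi / N * (n + 0.5) * k)
                            for k in range(1, N))) * 2 / N
            for n in range(N)]

def cut(x, keep):
    """'Encode': transform and zero everything past the first `keep` bins."""
    X = dct(x)
    return idct(X[:keep] + [0.0] * (len(X) - keep))

x = [15, 35, 156, 204, 32, 0, 14, 87, 5, 9, 45, 34, 35, 147, 21, 224]

once = cut(x, 10)
twice = cut(once, 10)

# Cutting the same bins a second time changes (almost) nothing:
assert max(abs(a - b) for a, b in zip(once, twice)) < 1e-6

# ...but the first cut *was* lossy for this signal - its trailing
# bins were not actually zero:
assert max(abs(a - b) for a, b in zip(once, x)) > 1e-3
```

So the idempotence claim is right for pure bin-cutting with identical parameters, and wrong as a model of mp3: the kept bins are quantised with frame-dependent precision, which is exactly the rounding this post says doesn't exist.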