Video Production Stack Exchange is a question and answer site for engineers, producers, editors, and enthusiasts spanning the fields of video and media creation. It's 100% free, no registration required.

I am facing a common issue that I cannot find a term to describe: I have a few low-quality VHS rips that were encoded with a certain Apple ProRes codec (which one is irrelevant in the context of this particular question) and take up several GB of space. I know for sure that the video content is of such low quality that it shouldn't take up more space than, say, an XDCAM EX 1080 HD file of longer length and higher visual quality.

Similarly, how is it possible that one video file can become ten times larger without any gains in fidelity when encoded with a lossless codec or with a higher bitrate? I would guess that somehow the encoder is adding "extra" information as it re-encodes... but obviously it is impossible for a low fidelity file to gain quality when it is merely re-encoded with higher quality settings... so what is going on here?

Is there a word to describe this phenomenon? Also, how can I make sure my transcoding doesn't INCREASE the file size? I've been looking for tools that can measure the appropriate bitrate for files (like the low-fi VHS rips that somehow take up to 10GB, which shouldn't happen), but I can't find any tools like this... What I want to do in this situation is re-encode these massive files to an appropriate format with a reasonable file size without losing any visible quality.

3 Answers

This isn't a phenomenon; it's simply how compression works.

Compression works by taking an input, running it through some algorithms, and producing an output that matches the original input either exactly (lossless) or approximately (lossy). The video is not stored like normal video data as a set of pixels, but rather as some form of data that models the original set of pixels using less space.

When you decode a video, the decoder works from the compressed format and generates a stream of pixel values that can be played back as actual video. When you transcode from one format to another, the video first goes through this decoding process to get a set of pixel color values and then goes through the encoding for the new format.

What you seem to be missing is that it does not matter how high-quality the signal going into the encoder is; it will work off whatever you give it. You could save a low-quality, artifact-riddled, grainy, nasty, noisy video in a lossless format and you would end up with a perfectly preserved low-quality, artifact-riddled, grainy, nasty, noisy video. There would be no further degradation in quality, but you would still have crappy video, and it would take a huge amount of space because the format is not storing an approximation of the crappy video, but rather the exact data.

If instead you were to use a lossy compression that was "similar" to the quality level of the video, you would introduce even more bad artifacts and lose any semblance of actual signal that was left in the video.

This principle is also referred to as "generations of loss". It is mostly a carry-over from the analog days, when no copy was lossless and the number of generations of copies had to be kept to a minimum to preserve quality. In the digital era this problem was greatly alleviated, since a digital file can be copied perfectly; however, it still lives on in transcoding, where each re-encode is a new generation of loss.

This point actually also explains why the SD video files are so much larger than you expect. Most consumers are used to end-user compression formats with relatively small file sizes. These file sizes are great for end users because the files don't need to be edited or re-encoded; however, such formats rapidly fall apart when you try to encode them again because of their small file size and large amount of loss (even if it isn't readily obvious to a normal viewer).

Production-quality DV video used a gigabyte or more of storage every 4.7 minutes or so, so it isn't at all unexpected that your high-quality SD capture (even if the source itself was crap) is "large" by consumer standards. If you don't need to edit the videos, it is perfectly fine to transcode them as finished videos down to a low-bitrate consumption file size. I'd recommend using 2-pass VBR encoding for this purpose; h.264 at a low bitrate should do fine.

Bitrates are ultimately what determine file size in compression. The bitrate defines the amount of data that is allowed to be used to encode the video; roughly put, the compression algorithm will approximate the footage as best it can within the given bitrate. The lower the bitrate, the smaller the file, but the lower the quality and the less able you will be to re-encode the video in the future without catastrophic quality loss.
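Because average bitrate and duration fully determine file size, you can estimate sizes with back-of-the-envelope arithmetic. A minimal Python sketch (the bitrates below are illustrative examples, not recommendations for your material):

```python
# Rough file-size estimate from average bitrate and duration.
# The example bitrates are illustrative, not recommendations.

def file_size_gb(bitrate_mbps: float, duration_min: float) -> float:
    """Approximate file size in GB for a given average bitrate."""
    bits = bitrate_mbps * 1_000_000 * duration_min * 60
    return bits / 8 / 1_000_000_000  # bits -> bytes -> GB

# A 90-minute SD capture at a ProRes-like 40 Mbit/s vs. a 2 Mbit/s h.264 encode:
print(f"{file_size_gb(40, 90):.1f} GB")  # prints "27.0 GB"
print(f"{file_size_gb(2, 90):.2f} GB")   # prints "1.35 GB"
```

This is why a mezzanine/capture codec can produce a 10GB file from low-quality source material: the encoder spends its configured bitrate regardless of how much genuine detail the footage contains.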

As far as what levels you can use without noticeable quality loss for a final output, that really depends on the content and is far too broad for a Q/A site without a sample of video to look at. Compression is an incredibly complex field with lots of options, and there are people who do nothing but compression for a living. The colors, the amount of motion, the amount of noise, etc., all have an impact on how much data is actually required to store the video in a way that the end user won't notice artifacts.

"Generations of loss" was the term I was looking for. Sometimes I forget that compression is as complex as it seems, so I guess I'll just have to play with settings until I find some that will work for archiving the VHS rips. Thanks! This one was a tough tie; the other answer was really great at giving some easy-to-understand information on how video compression works.
– omega.richard Jul 23 '14 at 1:38

To understand this you need to understand how codecs actually work.
A plain uncompressed video frame, i.e. a single picture, is pretty large. I'm talking about a bitmap, not a losslessly encoded video: no encoding at all, just plain pixel information.

Here's a simple example of a Full HD frame for some perspective:
We have a resolution of 1920x1080, which equals 2073600 pixels. Now each pixel has three color values, usually 8 bits per color (though it could also be 10/12/16 or 32 bit). So we have 2073600*3*8, which equals 49766400 bits. To get bytes we divide by 8 and get 6220800 bytes, or 5.93MB.
So that's nearly 6MB per frame; for a 30FPS video that's 178MB per second!
A 10 minute video would be an insane 104GB.
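The arithmetic above can be reproduced in a few lines of Python:

```python
# Uncompressed Full HD frame size, following the numbers above.
width, height = 1920, 1080
pixels = width * height                    # 2,073,600 pixels
bits_per_frame = pixels * 3 * 8            # 3 colour channels, 8 bits each
bytes_per_frame = bits_per_frame // 8      # 6,220,800 bytes (~5.93 MB)

fps = 30
bytes_per_second = bytes_per_frame * fps   # ~178 MB every second
bytes_10_min = bytes_per_second * 600      # ~104 GB for 10 minutes

print(bytes_per_frame / 2**20)   # MB per frame  (~5.93)
print(bytes_per_second / 2**20)  # MB per second (~178)
print(bytes_10_min / 2**30)      # GB per 10 min (~104)
```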

Now luckily some really smart people developed all kinds of algorithms to define a frame in a smarter/shorter way while still containing the full information of the original bitmap. That's what we would call a lossless codec, just a different way to save the same information.

So when you record your VHS, your capture device is getting every frame as a bitmap that it will encode the way you tell it to. So even though VHS uses analog compression and has overall bad quality, you still end up with the same chunk of information as with a Blu-ray encoded at the same resolution and bit depth your capture device is recording at. "Good quality" is only determined by your perception of the picture.

Just keep in mind that when you decode a video file, or capture something to play back in your video player, or encode it with another codec (transcode), you always have this huge raw bitmap as the intermediate step: the "raw" information that you can use to describe the video in a new way, or simply display on your monitor.

Now there is a limit to how well you can describe an image in a shorter/smaller way without discarding some information.

That's where the so-called lossy codecs like h.264 come in. These use even smarter, more complex algorithms in which a frame isn't just a frame anymore: the encoder looks at several frames and tries to estimate what the frames in between the so-called "keyframes" will look like, based on the information in those keyframes.
They also divide the image into several blocks of information and try to "guess" how they should look. Because the decoder knows how the guessing works, it only needs a "clue" to reconstruct each chunk of information pretty accurately, but the result is not a 100% accurate representation of the source information.

That's why it can be so much smaller than our accurate representation with a lossless codec.
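The idea that a frame isn't stored whole can be illustrated without a real video codec: storing only the *difference* from a similar reference frame compresses far better than storing the frame outright. A toy sketch where zlib stands in for a codec's entropy coder and the frames are synthetic:

```python
import zlib
import numpy as np

# Two synthetic 8-bit "frames": a detailed texture (seeded random noise)
# and the same frame with one small region changed. Most pixels are
# identical between them, just like consecutive frames of real video.
h, w = 240, 320
rng = np.random.default_rng(0)
frame1 = rng.integers(0, 256, size=(h, w), dtype=np.uint8)
frame2 = frame1.copy()
frame2[50:70, 80:120] = 255  # a small object appears in frame 2

# Storing frame2 outright vs. storing only its difference from frame1.
full = zlib.compress(frame2.tobytes())
delta = zlib.compress((frame2.astype(np.int16) - frame1.astype(np.int16)).tobytes())

print(len(full), len(delta))  # the delta compresses far smaller
```

Real codecs go much further (motion compensation, transforms, quantization), but the core saving is the same: the frames between keyframes share most of their information with their neighbours.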

It's also the reason we saw blocking in heavily compressed MPEG-2 videos back in the earlier days of the internet. h.264 handles this in a much smarter way, so we don't really see that type of artifact anymore; instead we sometimes get weird distortion where the image smears in a "blocky" way. That happens when there is a decoding error and a keyframe is missed, so the frames up to the next keyframe cannot be decoded correctly.

To answer your last bit: MP4 with h.264-encoded video is generally a good idea. Use a tool like Handbrake; it makes the task very easy and you still have a lot of control over the outcome. Either choose a preset or set the "RF" setting to 18 to get a visually lossless video, meaning you won't see a visual quality difference.

It takes a lot of bits to accurately, or nearly accurately, reproduce the input pixels, regardless of what they contain. The only exception is low-complexity stuff like a screen capture or animation, where big areas are EXACTLY the same colour, and/or are bit-for-bit identical from frame to frame.

The difference between your intuition and real life comes from the fact that it's complexity as measured by a computer, not human-perceived visual complexity / quality, that codecs operate on.

Compressing the noise is where all the bits are going. A round trip through a lossy codec will actually decrease the compressibility even of animation, because the blurring and especially the blocking artifacts become input that you're asking the next codec to reproduce.
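You can see how much noise costs with a generic lossless compressor standing in for a codec: the same smooth image with a little noise added takes many times more bits. A hedged sketch (zlib and synthetic images are stand-ins, not how video codecs actually work):

```python
import zlib
import numpy as np

h, w = 240, 320
# A smooth "clean" image: every row is the same horizontal gradient.
row = np.linspace(0, 255, w).astype(np.uint8)
clean = np.tile(row, (h, 1))

# The same image with mild seeded noise added: visually similar,
# but full of pixel-level complexity the compressor must reproduce.
rng = np.random.default_rng(1)
noise = rng.integers(-8, 9, size=(h, w))
noisy = np.clip(clean.astype(np.int16) + noise, 0, 255).astype(np.uint8)

clean_z = zlib.compress(clean.tobytes())
noisy_z = zlib.compress(noisy.tobytes())
print(len(clean_z), len(noisy_z))  # the noisy image takes many times more bits
```

The two images look nearly identical to a viewer, which is exactly the gap between human-perceived quality and the machine-measured complexity the codec has to pay for.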

If there was a way to pick out just the image you want to see from the VHS noise and MPEG blocking artifacts on a typical capture, i.e. find just the complexity you want to keep, and throw away everything else, it would be the Holy Grail of computer video. It would be by definition the perfect denoise filter, as well as something you could build a ridiculously efficient codec around.

All we have now are rough approximations that try to guess what's important to keep based on DCT coefficients or wavelets, and similar measures. (e.g. high energy in higher coefficients means there are probably some edges here, so it would be worse than usual to distort this 16x16 pixel block of the picture).
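That DCT intuition can be sketched directly: for a featureless block virtually all energy lands in the lowest (DC) coefficient, while a block containing an edge pushes energy into higher coefficients. A minimal sketch with an explicit orthonormal DCT-II matrix (this illustrates the measure, not any codec's actual transform pipeline):

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]   # frequency index
    m = np.arange(n)[None, :]   # sample index
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def dct2(block: np.ndarray) -> np.ndarray:
    """Separable 2-D DCT of a square block."""
    c = dct_matrix(block.shape[0])
    return c @ block @ c.T

n = 16
flat = np.full((n, n), 128.0)   # featureless 16x16 block
edge = np.zeros((n, n))
edge[:, n // 2:] = 255.0        # sharp vertical edge

for name, block in (("flat", flat), ("edge", edge)):
    coeffs = dct2(block)
    ac = 1.0 - coeffs[0, 0] ** 2 / np.sum(coeffs ** 2)
    print(f"{name}: {100 * ac:.1f}% of energy outside the DC coefficient")
```

The flat block reports essentially 0% while the edge block puts fully half its energy into higher (horizontal-frequency) coefficients, which is the signal an encoder uses to decide that a block deserves more bits.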