That math would only hold if video were a sequence of bitmap images; it's already off for PNG/JPG image files.
–
Daniel Beck♦May 6 '12 at 18:24

The two existing answers don't emphasize the salient attribute of video compression: most (if not all) video codecs employ lossy compression. That is, some picture information is discarded when the raw video is compressed and encoded. The amount of discarded image information and detail is determined by a quality factor. As for audio compression, there are both lossy and lossless techniques.
–
sawdustMay 6 '12 at 23:19

@sawdust: They don't? I thought my third paragraph made that fairly clear. Anyway, giving too much information is sometimes not so good; I believe in giving enough to allow the asker to learn more, if desired. Otherwise, I could say your post doesn't emphasize why someone would pick one compressor over another, or why there are so many different methods, etc, etc.
–
Marty FriedMay 6 '12 at 23:48

@sawdust You're correct, this was somewhat buried in the JPEG part. I added a few more details.
–
slhck♦May 7 '12 at 9:55

2 Answers

What you've calculated is the bitrate for a raw, uncompressed video. You typically won't find these except in research or other specialized applications. Even broadcasters use compressed video, albeit at a much higher bitrate than your typical YouTube video.
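For instance, here is that arithmetic for an assumed 1080p stream at 30 fps (example numbers, not necessarily the ones from the question):

```python
# Raw (uncompressed) bitrate for an example video stream.
width, height = 1920, 1080   # assumed resolution
fps = 30                     # assumed frame rate
bits_per_pixel = 24          # 8 bits each for R, G, B

bits_per_frame = width * height * bits_per_pixel
bitrate = bits_per_frame * fps  # bits per second

print(f"{bitrate / 1e6:.0f} Mbit/s")  # ~1493 Mbit/s, i.e. roughly 1.5 Gbit/s
```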

So, video quality has a lot to do with how the video was compressed. The more you compress it, the fewer bits it takes per frame, but also the worse the quality gets. Now, some videos are much easier to compress than others; in essence, this is why they can have a lower bitrate even at the same resolution and framerate.

In order to understand why this is, you need to be aware of the two main principles video compression uses: "spatial" and "temporal" redundancy.

Spatial redundancy

Spatial redundancy exists in images that show natural content. This is the reason JPEG works so well: it compresses image data by coding blocks of pixels together, 8 × 8 pixels at a time, for example. These blocks are called "macroblocks".

Modern video codecs do the same: they basically use algorithms similar to JPEG in order to compress a frame, block by block. So you no longer store bits per pixel, but bits per macroblock, because you "summarize" pixels into larger groups. While summarizing, the algorithm also discards information that is not visible to the human eye, and this is where most of the bitrate reduction happens. It works by quantizing the data: frequencies that are more perceivable are retained, while those we can barely see are "thrown away". The quantization factor is expressed as "QP" in most codecs, and it's the main control knob for quality.
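Here is a rough sketch of that idea in Python, quantizing the DCT coefficients of a single 8 × 8 block with one uniform step size. Real codecs use per-frequency tables and smarter scaling, so treat the step derived from `qp` as an assumption for illustration only:

```python
import numpy as np
from scipy.fft import dctn, idctn

# Transform coding on one 8x8 block: transform, quantize, reconstruct.
rng = np.random.default_rng(0)
block = rng.integers(0, 256, size=(8, 8)).astype(float)  # stand-in pixel values

coeffs = dctn(block, norm="ortho")       # to the frequency domain

qp = 20                                   # bigger qp -> coarser quantization
quantized = np.round(coeffs / qp)         # many high-frequency values become 0
print("non-zero coefficients:", np.count_nonzero(quantized))

reconstructed = idctn(quantized * qp, norm="ortho")  # what the decoder rebuilds
print("max pixel error:", np.abs(block - reconstructed).max())
```

Raising `qp` zeroes out more coefficients, which means fewer bits but a larger reconstruction error; that is the quality/bitrate trade-off in miniature.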

You can even go ahead and predict macroblocks from macroblocks that have previously been encoded in the same image. This is called intra prediction. For example, if part of a grey wall was already encoded in the upper left corner of the frame, that macroblock can be reused within the same frame, say for the macroblock right next to it. We just store its difference to the previous one and save data. This way, we don't have to fully encode two macroblocks that are very similar to each other.
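As a toy illustration (the simple "copy the neighbour" predictor below is an assumption made for brevity; real codecs offer many directional prediction modes):

```python
import numpy as np

# Predict a block from its already-coded left neighbour and store only
# the residual, which is what actually gets transformed and quantized.
left_block = np.full((8, 8), 128)  # the grey wall, already encoded
current = left_block + np.random.default_rng(1).integers(-2, 3, (8, 8))

prediction = left_block            # simplest possible predictor: copy
residual = current - prediction    # tiny values, cheap to encode

print("residual range:", residual.min(), "to", residual.max())
```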

Why does the bitrate change for the same image size? Well, some images are easier to encode than others. The higher the spatial activity, the more you actually have to encode. Smooth textures take fewer bits than detailed ones. The same goes for intra prediction: a frame of a grey wall allows you to use one macroblock to predict all the others, whereas a frame of flowing water won't work nearly as well.
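You can see this effect with the same quantization sketch as above: a smooth block keeps only a handful of non-zero coefficients, while a noisy, detailed one keeps most of them (the step size of 20 is again just an illustrative assumption):

```python
import numpy as np
from scipy.fft import dctn

# Spatial activity: a smooth gradient block vs. pseudo-random noise.
smooth = np.tile(np.linspace(0, 255, 8), (8, 1))               # grey ramp
noisy = np.random.default_rng(2).integers(0, 256, (8, 8)).astype(float)

for name, block in [("smooth", smooth), ("noisy", noisy)]:
    q = np.round(dctn(block, norm="ortho") / 20)
    print(name, "-> non-zero coefficients:", np.count_nonzero(q))
# The noisy block retains far more coefficients, so it needs far more
# bits at the same quality setting.
```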

Temporal redundancy

Temporal redundancy exists because a frame is probably very similar to its predecessor. Mostly, only a tiny bit changes, and it wouldn't make sense to encode the frame in full. So video encoders encode just the difference between two subsequent frames, much like they do for macroblocks within a frame.

Wikipedia's article on motion compensation illustrates this with two images: an original frame, and then the difference to the next frame, which is mostly empty because so little has changed.

The encoder now stores only the actual differences, not the pixel-by-pixel values. This is why the number of bits used per frame is not the same every time. These "difference" frames depend on a fully encoded frame, and this is why there are at least two types of frames for modern codecs:

- I-frames (also called keyframes), which are fully encoded
- P-frames ("predicted" frames), which only store the difference to previously encoded frames

Many codecs additionally use B-frames, which can reference both preceding and following frames.
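Here is a toy sketch of temporal differencing in Python. It does plain frame subtraction only, which is a simplification: real encoders also perform per-block motion search.

```python
import numpy as np

# Encode frame 2 as a difference from frame 1. The scene is static
# except for a small patch that appears, so the difference is tiny.
frame1 = np.zeros((64, 64), dtype=np.int16)
frame2 = frame1.copy()
frame2[10:18, 10:18] = 200            # an object appears in frame 2

diff = frame2 - frame1                 # roughly the P-frame payload
print("changed pixels:", np.count_nonzero(diff), "of", diff.size)
# Only the changed region needs bits; the rest is "same as before".
```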

You occasionally need to insert I-frames into a video, so the actual bitrate also depends on the number of I-frames used. Moreover, the more motion there is between two subsequent frames, the more the encoder has to store. A video of "nothing" moving will be easier to encode than a sports video, and will use fewer bits per frame.

I believe your math is actually correct, but there is a little more to it; compression is the missing link here.

You calculated the uncompressed bit rate, and in doing so found the very reason compression exists: bit rates become impossibly large with uncompressed video. So the video is compressed at the source and decompressed at the receiver, and then the bit rate becomes manageable. You just need a fast enough decompressor, which may be hardware or software.

So, the issue becomes how much compression can be tolerated. It usually isn't lossless, so you are losing information, but encoders try to be intelligent about it and discard the less important data that won't be as noticeable. This is usually fairly easy until there is a lot of motion; then it becomes more complicated.

Edit: Forgot to add that the part that implements the compression method is the codec; I noticed that you used this as a tag in your post.