Data compression

From Wikipedia, the free encyclopedia

"Source coding" redirects here. For the term in computer programming, see Source code.

Process of encoding information using fewer bits than the original representation

In signal processing, data compression, source coding,[1] or bit-rate reduction involves encoding information using fewer bits than the original representation.[2] Compression can be either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost in lossless compression. Lossy compression reduces bits by removing unnecessary or less important information.[3]

The process of reducing the size of a data file is often referred to as data compression. In the context of data transmission, it is called source coding: encoding done at the source of the data before it is stored or transmitted.[4] Source coding should not be confused with channel coding, for error detection and correction, or with line coding, the means for mapping data onto a signal.

Compression is useful because it reduces resources required to store and transmit data. Computational resources are consumed in the compression process and, usually, in the reversal of the process (decompression). Data compression is subject to a space–time complexity trade-off. For instance, a compression scheme for video may require expensive hardware for the video to be decompressed fast enough to be viewed as it is being decompressed, and the option to decompress the video in full before watching it may be inconvenient or require additional storage. The design of data compression schemes involves trade-offs among various factors, including the degree of compression, the amount of distortion introduced (when using lossy data compression), and the computational resources required to compress and decompress the data.[5][6]

Transcription

This episode is brought to you by Curiosity
Stream.
Hi, I'm Carrie Anne, and welcome to Crash
Course Computer Science!
Last episode we talked about Files, bundles
of data, stored on a computer, that are formatted
and arranged to encode information, like text,
sound or images.
We even discussed some basic file formats,
like text, wave, and bitmap.
While these formats are perfectly fine and
still used today, their simplicity also means
they’re not very efficient.
Ideally, we want files to be as small as possible,
so we can store lots of them without filling
up our hard drives, and also transmit them
more quickly.
Nothing is more frustrating than waiting for
an email attachment to download. Ugh!
The answer is compression, which literally
squeezes data into a smaller size.
To do this, we have to encode data using fewer
bits than the original representation.
That might sound like magic, but it’s actually
computer science!
INTRO
Let's return to our old friend from last episode,
Mr. Pac-Man!
This image is 4 pixels by 4 pixels.
As we discussed, image data is typically stored
as a list of pixel values.
To know where rows end, image files have metadata,
which defines properties like dimensions.
But, to keep it simple today, we’re not
going to worry about it.
Each pixel’s color is a combination of three
additive primary colors: red, green and blue.
We store each of those values in one byte,
giving us a range of 0 to 255 for each color.
If you mix full intensity red, green and blue
- that’s 255 for all three values - you
get the color white.
If you mix full intensity red and green, but
no blue (it’s 0), you get yellow.
We have 16 pixels in our image, and each of
those needs 3 bytes of color data.
That means this image’s data will consume
48 bytes of storage.
But, we can compress the data and pack it
into a smaller number of bytes than 48!
One way to compress data is to reduce repeated
or redundant information.
The most straightforward way to do this is
called Run-Length Encoding.
This takes advantage of the fact that there
are often runs of identical values in files.
For example, in our Pac-Man image, there are
7 yellow pixels in a row.
Instead of encoding redundant data: yellow
pixel, yellow pixel, yellow pixel, and so
on, we can just say “there’s 7 yellow
pixels in a row” by inserting an extra byte
that specifies the length of the run, like
so:
And then we can eliminate the redundant data
behind it.
To ensure that computers don’t get confused
with which bytes are run lengths and which
bytes represent color, we have to be consistent
in how we apply this scheme.
So, we need to preface all pixels with their
run-length.
In some cases, this actually adds data, but
on the whole, we’ve dramatically reduced
the number of bytes we need to encode this
image.
We’re now at 24 bytes, down from 48.
That’s 50% smaller!
A huge saving!
Also note that we haven’t lost any data.
We can easily expand this back to the original
form without any degradation.
A compression technique that has this characteristic
is called lossless compression, because we
don’t lose anything.
The decompressed data is identical to the
original before compression, bit for bit.
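
To make this concrete, here is a minimal sketch of run-length encoding in Python. It is an illustrative toy that treats pixels as simple color labels rather than RGB byte triples, not the exact scheme any particular file format uses.

    # Minimal run-length encoding sketch (illustrative only).
    # Runs of identical values are collapsed into (count, value) pairs,
    # and expanded back losslessly.

    def rle_encode(pixels):
        encoded = []
        for value in pixels:
            if encoded and encoded[-1][1] == value:
                encoded[-1][0] += 1          # extend the current run
            else:
                encoded.append([1, value])   # start a new run
        return encoded

    def rle_decode(encoded):
        pixels = []
        for count, value in encoded:
            pixels.extend([value] * count)
        return pixels

    row = ["yellow"] * 7 + ["white"]
    packed = rle_encode(row)                  # [[7, 'yellow'], [1, 'white']]
    assert rle_decode(packed) == row          # lossless: round-trips exactly
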
Let's take a look at another type of lossless
compression, where blocks of data are replaced
by more compact representations.
This is sort of like “don’t forget to
be awesome” being replaced by DFTBA.
To do this, we need a dictionary that stores
the mapping from codes to data.
Let's see how this works for our example.
We can view our image as not just a string
of individual pixels, but as little blocks
of data.
For simplicity, we’re going to use pixel
pairs, which are 6 bytes long, but blocks
can be any size.
In our example, there are only four pairings:
White-yellow, black-yellow, yellow-yellow
and white-white.
Those are the data blocks in our dictionary
we want to generate compact codes for.
What’s interesting is that these blocks
occur at different frequencies.
There are 4 yellow-yellow pairs, 2 white-yellow pairs, and 1 each of black-yellow and white-white.
Because yellow-yellow is the most common block,
we want that to be substituted for the most
compact representation.
On the other hand, black-yellow and white-white
can be substituted for something longer because
those blocks are infrequent.
One method for generating efficient codes
is building a Huffman Tree, invented by David
Huffman while he was a student at MIT in the
1950s.
His algorithm goes like this.
First, you lay out all the possible blocks
and their frequencies.
At every round, you select the two with the
lowest frequencies.
Here, that’s Black-Yellow and White-White,
each with a frequency of 1.
You combine these into a little tree... ...which
has a combined frequency of 2, so we record
that.
And now one step of the algorithm is done.
Now we repeat the process.
This time we have three things to choose from.
Just like before, we select the two with the
lowest frequency, put them into a little tree,
and record the new total frequency of all
the sub items.
Ok, we’re almost done.
This time it’s easy to select the two items
with the lowest frequency because there are
only two things left to pick.
We combine these into a tree, and now we’re
done!
Our tree looks like this, and it has a very
cool property: it’s arranged by frequency,
with less common items lower down.
So, now we have a tree, but you may be wondering
how this gets us to a dictionary.
Well, we use our frequency-sorted tree to
generate the codes we need by labeling each
branch with a 0 or a 1, like so:
With this, we can write out our code dictionary.
Yellow-yellow is encoded as just a single
0.
White-yellow is encoded as 1 0 (“one zero”)
Black-Yellow is 1 1 0
and finally white-white is 1 1 1.
The really cool thing about these codewords
is that there’s no way to have conflicting
codes, because each path down the tree is
unique.
This means our codes are prefix-free, that
is, no code starts with another complete code.
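
Here is a small Python sketch of that tree-building procedure, using a priority queue to repeatedly grab the two lowest-frequency items. With this input and tie-breaking it reproduces the codes above, though other valid Huffman trees exist; it is an illustration, not any standard library's implementation.

    # Sketch of Huffman code construction from block frequencies (illustrative).
    import heapq
    import itertools

    def huffman_codes(frequencies):
        # Each heap entry: (frequency, tie-breaker, tree), where a tree is
        # either a symbol or a (left, right) pair of subtrees.
        counter = itertools.count()
        heap = [(freq, next(counter), symbol) for symbol, freq in frequencies.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            f1, _, left = heapq.heappop(heap)    # two lowest-frequency nodes
            f2, _, right = heapq.heappop(heap)
            heapq.heappush(heap, (f1 + f2, next(counter), (left, right)))
        _, _, tree = heap[0]
        codes = {}
        def walk(node, prefix):
            if isinstance(node, tuple):          # internal node: label branches 0 and 1
                walk(node[0], prefix + "0")
                walk(node[1], prefix + "1")
            else:
                codes[node] = prefix or "0"      # single-symbol edge case
        walk(tree, "")
        return codes

    freqs = {"yellow-yellow": 4, "white-yellow": 2, "black-yellow": 1, "white-white": 1}
    print(huffman_codes(freqs))
    # {'yellow-yellow': '0', 'white-yellow': '10',
    #  'black-yellow': '110', 'white-white': '111'}
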
Now, let’s return to our image data and
compress it!
Our first pixel pair, white-yellow, is substituted
for the bits “1 0”.
The next pair is black-yellow, which is substituted
for “1 1 0”.
Next is yellow-yellow with the incredibly
compact substitution of just “0”.
And this process repeats for the rest of the
image:
So instead of 48 bytes of image data ...this
process has encoded it into 14 bits -- NOT
BYTES -- BITS!!
That’s less than 2 bytes of data!
But, don’t break out the champagne quite
yet!
This data is meaningless unless we also save
our code dictionary.
So, we’ll need to append it to the front
of the image data, like this.
Now, including the dictionary, our image data
is 30 bytes long.
That’s still a significant improvement over 48
bytes.
The two approaches we discussed, removing
redundancies and using more compact representations,
are often combined, and underlie almost all
lossless compressed file formats, like GIF,
PNG, PDF and ZIP files.
Both run-length encoding and dictionary coders
are lossless compression techniques.
No information is lost; when you decompress,
you get the original file.
That’s really important for many types of
files.
Like, it’d be very odd if I zipped up a
word document to send to you, and when you
decompressed it on your computer, the text
was different.
But, there are other types of files where
we can get away with little changes, perhaps
by removing unnecessary or less important
information, especially information that human
perception is not good at detecting.
And this trick underlies most lossy compression
techniques.
These tend to be pretty complicated, so we’re
going to attack this at a conceptual level.
Let’s take sound as an example.
Your hearing is not perfect.
We can hear some frequencies of sound better
than others.
And there are some we can’t hear at all,
like ultrasound.
Unless you’re a bat.
Basically, if we make a recording of music,
and there’s data in the ultrasonic frequency
range, we can discard it, because we know
that humans can’t hear it.
On the other hand, humans are very sensitive
to frequencies in the vocal range, like people
singing, so it’s best to preserve quality
there as much as possible.
Deep bass is somewhere in between.
Humans can hear it, but we’re less attuned
to it.
We mostly sense it.
Lossy audio compressors take advantage of
this, and encode different frequency bands
at different precisions.
Even if the result is rougher, it’s likely
that users won’t perceive the difference.
Or at least it doesn’t dramatically affect
the experience.
And here comes the hate mail from the audiophiles!
You encounter this type of audio compression
all the time.
It’s one of the reasons you sound different
on a cellphone versus in person.
The audio data is being compressed, allowing
more people to take calls at once.
As the signal quality or bandwidth gets worse,
compression algorithms remove more data, further
reducing precision, which is why Skype calls
sometimes sound like robots talking.
Compared to an uncompressed audio format like WAV, or a losslessly compressed one like FLAC (there we go, got the audiophiles back),
lossily compressed audio files, like MP3s,
are often 10 times smaller.
That’s a huge saving!
And it’s why I’ve got a killer music collection
on my retro iPod.
Don’t judge.
This idea of discarding or reducing precision
in a manner that aligns with human perception
is called perceptual coding, and it relies
on models of human perception,
which come from a field of study called Psychophysics.
This same idea is the basis of lossy compressed
image formats, most famously JPEGs.
Like hearing, the human visual system is imperfect.
We’re really good at detecting sharp contrasts,
like the edges of objects, but our perceptual
system isn’t so hot with subtle color variations.
JPEG takes advantage of this by breaking images
up into blocks of 8x8 pixels, then throwing
away a lot of the high-frequency spatial data.
For example, take this photo of our director’s
dog - Noodle.
So cute!
Let’s look at a patch of 8x8 pixels.
Pretty much every pixel is different from
its neighbor, making it hard to compress with
lossless techniques because there’s just
a lot going on.
Lots of little details.
But human perception doesn’t register all
those details.
So, we can discard a lot of that detail, and
replace it with a simplified patch like this.
This maintains the visual essence, but might
only use 10% of the data.
We can do this for all the patches in the
image and get this result.
You can still see it’s a dog, but the image
is rougher.
So, that’s an extreme example, going from
a slightly compressed JPEG to a highly compressed
one, one-eighth the original file size.
Often, you can get away with a quality somewhere
in between, and perceptually, it’s basically
the same as the original.
The one on the left is one-third the file
size of the one on the right.
That’s a big savings for essentially the
same thing.
Can you tell the difference between the two?
Probably not, but I should mention that video
compression plays a role in that too, since
I’m literally being compressed in a video
right now.
Videos are really just long sequences of images,
so a lot of what I said about images applies
here too.
But videos can do some extra clever stuff,
because between frames, a lot of pixels are
going to be the same.
Like this whole background behind me!
This is called temporal redundancy.
We don’t need to re-transmit those pixels
every frame of the video.
We can just copy patches of data forward.
When there are small pixel differences, like
the readout on this frequency generator behind
me, most video formats send data that encodes
just the difference between patches, which
is more efficient than re-transmitting all
the pixels afresh, again taking advantage
of inter-frame similarity.
The fanciest video compression formats go
one step further.
They find patches that are similar between
frames, and not only copy them forward, with
or without differences, but also can apply
simple effects to them, like a shift or rotation.
They can also lighten or darken a patch between
frames.
So, if I move my hand side to side like this
the video compressor will identify the similarity,
capture my hand in one or more patches, then
just move these patches around between frames.
You’re actually seeing my hand from the
past… kinda freaky, but it uses a lot less data.
MPEG-4 videos, a common standard, are often
20 to 200 times smaller than the original,
uncompressed file.
However, encoding frames as translations and
rotations of patches from previous frames
can go horribly wrong when you compress too
heavily, and there isn’t enough space to
update pixel data inside of the patches.
The video player will forge ahead, applying
the right motions, even if the patch data
is wrong.
And this leads to some hilarious and trippy
effects, which I’m sure you’ve seen.
Overall, it’s extremely useful to have compression techniques for all the types of data I discussed today.
(I guess our imperfect vision and hearing
are “useful,” too.)
And it’s important to know about compression
because it allows users to store pictures,
music, and videos in efficient ways.
Without it, streaming your favorite Carpool
Karaoke videos on YouTube would be nearly
impossible, due to bandwidth and the economics
of transmitting that volume of data for free.
And now when your Skype calls sound like they’re
being taken over by demons, you’ll know
what’s really going on.
I’ll see you next week.
Hey guys, this week’s episode was brought
to you by CuriosityStream which is a streaming
service full of documentaries and nonfiction
titles from some really great filmmakers,
including exclusive originals.
Now I normally give computer science recommendations since this is Crash Course Computer Science and all
and Curiosity Stream has a ton of
great ones. But you absolutely have to check
out “Miniverse” starring everyone’s
favorite space-station-singing-Canadian astronaut,
Chris Hadfield, as he takes a roadtrip across
the Solar System scaled down to the size
of the United States.
It’s basically 50 minutes of Chris and his
passengers geeking out about our amazing planetary
neighbors and you don’t want to miss it.
So get unlimited access today, and your first two months are free if you sign up at curiositystream.com/crashcourse
and use the promo code "crashcourse" during the sign up process.

Lossless

Lossless data compression algorithms usually exploit statistical redundancy to represent data without losing any information, so that the process is reversible. Lossless compression is possible because most real-world data exhibits statistical redundancy. For example, an image may have areas of color that do not change over several pixels; instead of coding "red pixel, red pixel, ..." the data may be encoded as "279 red pixels". This is a basic example of run-length encoding; there are many schemes to reduce file size by eliminating redundancy.

The Lempel–Ziv (LZ) compression methods are among the most popular algorithms for lossless storage.[7] DEFLATE is a variation on LZ optimized for decompression speed and compression ratio, but compression can be slow. In the mid-1980s, following work by Terry Welch, the Lempel–Ziv–Welch (LZW) algorithm rapidly became the method of choice for most general-purpose compression systems. LZW is used in GIF images, programs such as PKZIP, and hardware devices such as modems.[8] LZ methods use a table-based compression model where table entries are substituted for repeated strings of data. For most LZ methods, this table is generated dynamically from earlier data in the input. The table itself is often Huffman encoded. Grammar-based codes can compress highly repetitive input extremely effectively, for instance, a biological data collection of the same or closely related species, a huge versioned document collection, internet archives, etc. The basic task of grammar-based codes is constructing a context-free grammar deriving a single string. Other practical grammar compression algorithms include Sequitur and Re-Pair.
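
A minimal sketch of the LZW encoding loop follows (illustrative Python, not the exact variant used in GIF or PKZIP, which add code-width management, dictionary resets, and bit packing). The dictionary starts with all single bytes and grows dynamically from earlier input, as described above.

    # Minimal LZW compression sketch (illustrative only).

    def lzw_encode(data: bytes):
        table = {bytes([i]): i for i in range(256)}   # start with all single bytes
        next_code = 256
        current = b""
        output = []
        for byte in data:
            candidate = current + bytes([byte])
            if candidate in table:
                current = candidate                    # keep extending the match
            else:
                output.append(table[current])          # emit code for the longest match
                table[candidate] = next_code           # learn the new string
                next_code += 1
                current = bytes([byte])
        if current:
            output.append(table[current])
        return output

    codes = lzw_encode(b"TOBEORNOTTOBEORTOBEORNOT")
    print(codes)   # repeated substrings are replaced by dictionary codes >= 256
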

The strongest modern lossless compressors use probabilistic models, such as prediction by partial matching. The Burrows–Wheeler transform can also be viewed as an indirect form of statistical modelling.[9] In a further refinement of the direct use of probabilistic modelling, statistical estimates can be coupled to an algorithm called arithmetic coding. Arithmetic coding is a more modern coding technique that uses the mathematical calculations of a finite-state machine to produce a string of encoded bits from a series of input data symbols. It can achieve superior compression compared to other techniques such as the better-known Huffman algorithm. It uses an internal memory state to avoid the need to perform a one-to-one mapping of individual input symbols to distinct representations that use an integer number of bits, and it clears out the internal memory only after encoding the entire string of data symbols. Arithmetic coding applies especially well to adaptive data compression tasks where the statistics vary and are context-dependent, as it can be easily coupled with an adaptive model of the probability distribution of the input data. An early example of the use of arithmetic coding was in an optional (but not widely used) feature of the JPEG image coding standard.[10] It has since been applied in various other designs including H.263, H.264/MPEG-4 AVC and HEVC for video coding.[11]
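
The interval-narrowing idea behind arithmetic coding can be illustrated with a toy encoder that uses exact rational arithmetic over the whole message; practical coders instead use the fixed-precision, renormalizing finite-state form described above and emit bits incrementally. The sketch below is purely illustrative.

    # Toy arithmetic coder using exact fractions (illustrative only).
    from fractions import Fraction
    from collections import Counter

    def model(symbols):
        # Assign each symbol a sub-interval of [0, 1) proportional to its count.
        counts = Counter(symbols)
        total = sum(counts.values())
        intervals, low = {}, Fraction(0)
        for sym in sorted(counts):
            width = Fraction(counts[sym], total)
            intervals[sym] = (low, low + width)
            low += width
        return intervals

    def encode(symbols, intervals):
        low, high = Fraction(0), Fraction(1)
        for s in symbols:
            s_low, s_high = intervals[s]
            span = high - low
            low, high = low + span * s_low, low + span * s_high
        return (low + high) / 2     # any value in [low, high) identifies the message

    def decode(value, intervals, length):
        out, low, high = [], Fraction(0), Fraction(1)
        for _ in range(length):
            span = high - low
            target = (value - low) / span
            for s, (s_low, s_high) in intervals.items():
                if s_low <= target < s_high:
                    out.append(s)
                    low, high = low + span * s_low, low + span * s_high
                    break
        return "".join(out)

    msg = "ABRACADABRA"
    m = model(msg)
    code = encode(msg, m)
    assert decode(code, m, len(msg)) == msg   # lossless round trip
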

Lossy

In the late 1980s, digital images became more common, and standards for lossless image compression emerged. In the early 1990s, lossy compression methods began to be widely used.[8] In these schemes, some loss of information is accepted as dropping nonessential detail can save storage space. There is a corresponding trade-off between preserving information and reducing size. Lossy data compression schemes are designed by research on how people perceive the data in question. For example, the human eye is more sensitive to subtle variations in luminance than it is to the variations in color. JPEG image compression works in part by rounding off nonessential bits of information.[12] A number of popular compression formats exploit these perceptual differences, including psychoacoustics for sound, and psychovisuals for images and video.

In lossy audio compression, methods of psychoacoustics are used to remove non-audible (or less audible) components of the audio signal. Compression of human speech is often performed with even more specialized techniques; speech coding, or voice coding, is sometimes distinguished as a separate discipline from audio compression. Different audio and speech compression standards are listed under audio coding formats. Voice compression is used in internet telephony, for example, while audio compression is used for CD ripping and is decoded by audio players.[9]

Machine learning

There is a close connection between machine learning and compression: a system that predicts the posterior probabilities of a sequence given its entire history can be used for optimal data compression (by using arithmetic coding on the output distribution) while an optimal compressor can be used for prediction (by finding the symbol that compresses best, given the previous history). This equivalence has been used as a justification for using data compression as a benchmark for "general intelligence."[14][15][16]

Feature space vectors

However, a new, alternative view shows that compression algorithms implicitly map strings into implicit feature space vectors, and that compression-based similarity measures compute similarity within these feature spaces. For each compressor C(.), we define an associated vector space ℵ, such that C(.) maps an input string x to a vector whose norm is ||~x||. An exhaustive examination of the feature spaces underlying all compression algorithms is precluded by space; instead, the analysis examines three representative lossless compression methods: LZW, LZ77, and PPM.[17]
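
One concrete compression-based similarity measure is the normalized compression distance. The sketch below uses zlib as the compressor for convenience; it illustrates the general idea rather than the LZW/LZ77/PPM analysis of the cited work.

    # Normalized compression distance (NCD) sketch using zlib as the compressor.
    # Smaller values indicate more shared structure between the two strings.
    import zlib

    def c(data: bytes) -> int:
        return len(zlib.compress(data, 9))

    def ncd(x: bytes, y: bytes) -> float:
        cx, cy, cxy = c(x), c(y), c(x + y)
        return (cxy - min(cx, cy)) / max(cx, cy)

    a = b"the quick brown fox jumps over the lazy dog " * 20
    b = b"the quick brown fox jumps over the lazy cat " * 20
    u = b"completely unrelated text about video coding standards " * 20
    print(ncd(a, b))    # relatively small: the strings share most of their structure
    print(ncd(a, u))    # larger: little shared structure
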

Data differencing

Data compression can be viewed as a special case of data differencing:[18][19] Data differencing consists of producing a difference given a source and a target, with patching producing a target given a source and a difference, while data compression consists of producing a compressed file given a target, and decompression consists of producing a target given only a compressed file. Thus, one can consider data compression as data differencing with empty source data, the compressed file corresponding to a "difference from nothing." This is the same as considering absolute entropy (corresponding to data compression) as a special case of relative entropy (corresponding to data differencing) with no initial data.

When one wishes to emphasize the connection, one may use the term differential compression to refer to data differencing.

Uses

Audio

Audio data compression, not to be confused with dynamic range compression, has the potential to reduce the transmission bandwidth and storage requirements of audio data. Audio compression algorithms are implemented in software as audio codecs. Lossy audio compression algorithms provide higher compression at the cost of fidelity and are used in numerous audio applications. These algorithms almost all rely on psychoacoustics to eliminate or reduce fidelity of less audible sounds, thereby reducing the space required to store or transmit them.[2]

The acceptable trade-off between loss of audio quality and transmission or storage size depends upon the application. For example, one 640 MB compact disc (CD) holds approximately one hour of uncompressed high fidelity music, less than 2 hours of music compressed losslessly, or 7 hours of music compressed in the MP3 format at a medium bit rate. A digital sound recorder can typically store around 200 hours of clearly intelligible speech in 640 MB.[20]

Lossless audio compression produces a representation of digital data that decompresses to an exact digital duplicate of the original audio stream, unlike playback from lossy compression techniques such as Vorbis and MP3. Compression ratios are around 50–60% of original size,[21] which is similar to those for generic lossless data compression. Lossless compression is unable to attain high compression ratios due to the complexity of waveforms and the rapid changes in sound forms. Codecs like FLAC, Shorten, and TTA use linear prediction to estimate the spectrum of the signal. Many of these algorithms use convolution with the filter [-1 1] to slightly whiten or flatten the spectrum, thereby allowing traditional lossless compression to work more efficiently. The process is reversed upon decompression.
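
As a minimal illustration of the whitening idea (not FLAC's actual predictor), first-order differencing, i.e. convolution with [-1 1], turns a slowly varying waveform into small residuals that a generic lossless coder can pack more tightly, and the step is exactly reversible:

    # First-order linear prediction (convolution with [-1, 1]):
    # store the residuals, reverse exactly on decode.
    import numpy as np

    def whiten(samples: np.ndarray) -> np.ndarray:
        residual = np.empty_like(samples)
        residual[0] = samples[0]                   # first sample kept verbatim
        residual[1:] = samples[1:] - samples[:-1]  # predict each sample from the previous one
        return residual

    def unwhiten(residual: np.ndarray) -> np.ndarray:
        return np.cumsum(residual)                 # exact inverse: running sum restores the signal

    t = np.arange(1000)
    signal = np.round(1000 * np.sin(2 * np.pi * t / 200)).astype(np.int64)
    res = whiten(signal)
    assert np.array_equal(unwhiten(res), signal)   # lossless round trip
    print(signal.std(), res.std())                 # residuals have a much smaller spread
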

When audio files are to be processed, either by further compression or for editing, it is desirable to work from an unchanged original (uncompressed or losslessly compressed). Processing of a lossily compressed file for some purpose usually produces a final result inferior to the creation of the same compressed file from an uncompressed original. In addition to sound editing or mixing, lossless audio compression is often used for archival storage, or as master copies.

Lossy audio compression

Comparison of spectrograms of audio in an uncompressed format and several lossy formats. The lossy spectrograms show bandlimiting of higher frequencies, a common technique associated with lossy audio compression.

Lossy audio compression is used in a wide range of applications. In addition to the direct applications (MP3 players or computers), digitally compressed audio streams are used in most video DVDs, digital television, streaming media on the internet, satellite and cable radio, and increasingly in terrestrial radio broadcasts. Lossy compression typically achieves far greater compression than lossless compression (5–20% of the original size, rather than 50–60%), by discarding less-critical data.[22]

The innovation of lossy audio compression was to use psychoacoustics to recognize that not all data in an audio stream can be perceived by the human auditory system. Most lossy compression reduces perceptual redundancy by first identifying perceptually irrelevant sounds, that is, sounds that are very hard to hear. Typical examples include high frequencies or sounds that occur at the same time as louder sounds. Those sounds are coded with decreased accuracy or not at all.

Due to the nature of lossy algorithms, audio quality suffers when a file is decompressed and recompressed (digital generation loss). This makes lossy compression unsuitable for storing intermediate results in professional audio engineering applications, such as sound editing and multitrack recording. However, lossy formats are very popular with end users (particularly MP3), as a megabyte can store about a minute's worth of music at adequate quality.

Coding methods

To determine what information in an audio signal is perceptually irrelevant, most lossy compression algorithms use transforms such as the modified discrete cosine transform (MDCT) to convert time domain sampled waveforms into a transform domain. Once transformed, typically into the frequency domain, component frequencies can be allocated bits according to how audible they are. Audibility of spectral components is calculated using the absolute threshold of hearing and the principles of simultaneous masking—the phenomenon wherein a signal is masked by another signal separated by frequency—and, in some cases, temporal masking—where a signal is masked by another signal separated by time. Equal-loudness contours may also be used to weight the perceptual importance of components. Models of the human ear-brain combination incorporating such effects are often called psychoacoustic models.[23]
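
A bare-bones sketch of the transform step follows; it computes a direct (slow) MDCT of a single frame and is illustrative only, omitting the windowing, frame overlap, and psychoacoustic bit allocation that real codecs apply around it.

    # Direct MDCT of one frame: 2N time samples -> N frequency coefficients.
    import numpy as np

    def mdct(frame: np.ndarray) -> np.ndarray:
        n = frame.size // 2
        ks = np.arange(n)
        ns = np.arange(2 * n)
        basis = np.cos(np.pi / n * (ns[None, :] + 0.5 + n / 2) * (ks[:, None] + 0.5))
        return basis @ frame

    fs = 8000
    t = np.arange(2048) / fs
    frame = np.sin(2 * np.pi * 440 * t)           # a 440 Hz tone
    coeffs = mdct(frame)
    print(np.argmax(np.abs(coeffs)))              # energy concentrates around bin 440/(fs/2)*1024, roughly 112
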

Other types of lossy compressors, such as the linear predictive coding (LPC) used with speech, are source-based coders. These coders use a model of the sound's generator (such as the human vocal tract with LPC) to whiten the audio signal (i.e., flatten its spectrum) before quantization. LPC may be thought of as a basic perceptual coding technique: reconstruction of an audio signal using a linear predictor shapes the coder's quantization noise into the spectrum of the target signal, partially masking it.[22]

Lossy formats are often used for the distribution of streaming audio or interactive applications (such as the coding of speech for digital transmission in cell phone networks). In such applications, the data must be decompressed as the data flows, rather than after the entire data stream has been transmitted. Not all audio codecs can be used for streaming applications, and for such applications a codec designed to stream data effectively will usually be chosen.[22]

Latency results from the methods used to encode and decode the data. Some codecs will analyze a longer segment of the data to optimize efficiency, and then code it in a manner that requires a larger segment of data at one time to decode. (Often codecs create segments called a "frame" to create discrete data segments for encoding and decoding.) The inherent latency of the coding algorithm can be critical; for example, when there is a two-way transmission of data, such as with a telephone conversation, significant delays may seriously degrade the perceived quality.

In contrast to the speed of compression, which is proportional to the number of operations required by the algorithm, here latency refers to the number of samples that must be analysed before a block of audio is processed. In the minimum case, latency is zero samples (e.g., if the coder/decoder simply reduces the number of bits used to quantize the signal). Time domain algorithms such as LPC also often have low latencies, hence their popularity in speech coding for telephony. In algorithms such as MP3, however, a large number of samples have to be analyzed to implement a psychoacoustic model in the frequency domain, and latency is on the order of 23 ms (46 ms for two-way communication).

Speech encoding

Speech encoding is an important category of audio data compression. The perceptual models used to estimate what a human ear can hear are generally somewhat different from those used for music. The range of frequencies needed to convey the sounds of a human voice is normally far narrower than that needed for music, and the sound is normally less complex. As a result, speech can be encoded at high quality using a relatively low bit rate.

If the data to be compressed is analog (such as a voltage that varies with time), quantization is employed to digitize it into numbers (normally integers). This is referred to as analog-to-digital (A/D) conversion. If the integers generated by quantization are 8 bits each, then the entire range of the analog signal is divided into 256 intervals and all the signal values within an interval are quantized to the same number. If 16-bit integers are generated, then the range of the analog signal is divided into 65,536 intervals.
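
A short sketch of uniform quantization follows (illustrative; speech codecs in practice typically use non-uniform companding such as μ-law rather than this plain linear mapping).

    # Uniform quantization of an analog-style signal to 8-bit integers.
    # With 8 bits the full range is split into 256 intervals; every value in an
    # interval maps to the same integer, which is where the information loss occurs.
    import numpy as np

    def quantize(x: np.ndarray, bits: int, lo: float, hi: float) -> np.ndarray:
        levels = 2 ** bits
        step = (hi - lo) / levels
        q = np.floor((x - lo) / step).astype(int)
        return np.clip(q, 0, levels - 1)

    def dequantize(q: np.ndarray, bits: int, lo: float, hi: float) -> np.ndarray:
        step = (hi - lo) / 2 ** bits
        return lo + (q + 0.5) * step          # reconstruct at interval midpoints

    t = np.linspace(0, 1, 1000)
    signal = np.sin(2 * np.pi * 5 * t)        # values in [-1, 1]
    q8 = quantize(signal, 8, -1.0, 1.0)
    error = signal - dequantize(q8, 8, -1.0, 1.0)
    print(np.abs(error).max())                # bounded by half a step: (2/256)/2, about 0.0039
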

This relation illustrates the compromise between high resolution (a large number of analog intervals) and high compression (small integers generated). This application of quantization is used by several speech compression methods. This is accomplished, in general, by some combination of two approaches:

Only encoding sounds that could be made by a single human voice.

Throwing away more of the data in the signal—keeping just enough to reconstruct an "intelligible" voice rather than the full frequency range of human hearing.

History

A literature compendium for a large variety of audio coding systems was published in the IEEE Journal on Selected Areas in Communications (JSAC), February 1988. While there were some papers from before that time, this collection documented an entire variety of finished, working audio coders, nearly all of them using perceptual (i.e. masking) techniques and some kind of frequency analysis and back-end noiseless coding.[24] Several of these papers remarked on the difficulty of obtaining good, clean digital audio for research purposes. Most, if not all, of the authors in the JSAC edition were also active in the MPEG-1 Audio committee.

The world's first commercial broadcast automation audio compression system was developed by Oscar Bonello, an engineering professor at the University of Buenos Aires.[25] In 1983, using the psychoacoustic principle of the masking of critical bands first published in 1967,[26] he started developing a practical application based on the recently developed IBM PC computer, and the broadcast automation system was launched in 1987 under the name Audicom. Twenty years later, almost all the radio stations in the world were using similar technology manufactured by a number of companies.

Video

Video compression is a practical implementation of source coding in information theory. In practice, most video codecs are used alongside audio compression techniques to store the separate but complementary data streams as one combined package using so-called container formats.[27]

Encoding theory

Video data may be represented as a series of still image frames. Such data usually contains abundant amounts of spatial and temporal redundancy. Video compression algorithms attempt to reduce redundancy and store information more compactly.

Most video compression formats and codecs exploit both spatial and temporal redundancy (e.g. through difference coding with motion compensation). Similarities can be encoded by only storing differences between e.g. temporally adjacent frames (inter-frame coding) or spatially adjacent pixels (intra-frame coding).
Inter-frame compression (a temporal delta encoding) is one of the most powerful compression techniques. It (re)uses data from one or more earlier or later frames in a sequence to describe the current frame. Intra-frame coding, on the other hand, uses only data from within the current frame, effectively being still-image compression.[23]

A class of specialized formats used in camcorders and video editing uses less complex compression schemes that restrict their prediction techniques to intra-frame prediction.

Usually video compression additionally employs lossy compression techniques like quantization that reduce aspects of the source data that are (more or less) irrelevant to the human visual perception by exploiting perceptual features of human vision. For example, small differences in color are more difficult to perceive than are changes in brightness. Compression algorithms can average a color across these similar areas to reduce space, in a manner similar to those used in JPEG image compression.[10] As in all lossy compression, there is a trade-off between video quality and bit rate, cost of processing the compression and decompression, and system requirements. Highly compressed video may present visible or distracting artifacts.

Other methods than the prevalent DCT-based transform formats, such as fractal compression, matching pursuit and the use of a discrete wavelet transform (DWT), have been the subject of some research, but are typically not used in practical products (except for the use of wavelet coding as still-image coders without motion compensation). Interest in fractal compression seems to be waning, due to recent theoretical analysis showing a comparative lack of effectiveness of such methods.[23]

Inter-frame coding

Inter-frame coding works by comparing each frame in the video with the previous one. Individual frames of a video sequence are compared from one frame to the next, and the video compression codec sends only the differences to the reference frame. If the frame contains areas where nothing has moved, the system can simply issue a short command that copies that part of the previous frame into the next one. If sections of the frame move in a simple manner, the compressor can emit a (slightly longer) command that tells the decompressor to shift, rotate, lighten, or darken the copy. This longer command still remains much shorter than intraframe compression. Usually the encoder will also transmit a residue signal which describes the remaining more subtle differences to the reference imagery. Using entropy coding, these residue signals have a more compact representation than the full signal. In areas of video with more motion, the compression must encode more data to keep up with the larger number of pixels that are changing. Commonly during explosions, flames, flocks of animals, and in some panning shots, the high-frequency detail leads to quality decreases or to increases in the variable bitrate.
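
A minimal sketch of inter-frame difference coding follows (illustrative only; real codecs add motion compensation, transform coding, and entropy coding on top of this, and work block-wise rather than per pixel).

    # Inter-frame delta coding sketch: send the first frame, then only the changes.
    import numpy as np

    def encode_sequence(frames):
        prev = None
        for frame in frames:
            if prev is None:
                yield ("key", frame.copy())              # full intra-coded frame
            else:
                diff = frame.astype(np.int16) - prev.astype(np.int16)
                changed = np.nonzero(diff)               # indices of pixels that changed
                yield ("delta", changed, diff[changed])  # residue for changed pixels only
            prev = frame

    def decode_sequence(packets):
        current = None
        for packet in packets:
            if packet[0] == "key":
                current = packet[1].copy()
            else:
                _, changed, values = packet
                current = current.copy()
                current[changed] = (current[changed].astype(np.int16) + values).astype(current.dtype)
            yield current

    f0 = np.zeros((4, 4), dtype=np.uint8)
    f1 = f0.copy(); f1[1, 2] = 200                       # only one pixel changes
    frames = [f0, f1]
    decoded = list(decode_sequence(encode_sequence(frames)))
    assert all(np.array_equal(a, b) for a, b in zip(frames, decoded))
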

Hybrid block-based transform formats

Processing stages of a typical video encoder

Today, nearly all commonly used video compression methods (e.g., those in standards approved by the ITU-T or ISO) share the same basic architecture that dates back to H.261 which was standardized in 1988 by the ITU-T. They mostly rely on the DCT, applied to rectangular blocks of neighboring pixels, and temporal prediction using motion vectors, as well as nowadays also an in-loop filtering step.

In the prediction stage, various deduplication and difference-coding techniques are applied that help decorrelate data and describe new data based on already transmitted data.

Then rectangular blocks of (residue) pixel data are transformed to the frequency domain to ease targeting irrelevant information in quantization and for some spatial redundancy reduction. The discrete cosine transform (DCT) that is widely used in this regard was introduced by N. Ahmed, T. Natarajan and K. R. Rao in 1974.[29]
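
As an illustration of this transform stage, the sketch below applies a 2-D DCT to an 8x8 block and quantizes the coefficients with a single made-up step size; real codecs use standardized quantization matrices and operate on prediction residuals.

    # 2-D DCT of an 8x8 block followed by coarse quantization.
    # After quantization most high-frequency coefficients become zero, which the
    # entropy coder then represents very compactly.
    import numpy as np
    from scipy.fft import dctn, idctn

    block = np.outer(np.linspace(0, 255, 8), np.ones(8))    # a smooth 8x8 gradient
    coeffs = dctn(block, norm="ortho")                       # frequency-domain representation
    q_step = 16                                              # illustrative step size
    quantized = np.round(coeffs / q_step).astype(int)
    print(np.count_nonzero(quantized), "of 64 coefficients survive quantization")

    reconstructed = idctn(quantized * q_step, norm="ortho")  # lossy round trip
    print(np.abs(reconstructed - block).max())               # modest error relative to the 0-255 range
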

In the main lossy processing stage that data gets quantized in order to reduce information that is irrelevant to human visual perception.

In the last stage statistical redundancy gets largely eliminated by an entropy coder which often applies some form of arithmetic coding.

In an additional in-loop filtering stage, various filters can be applied to the reconstructed image signal. By computing these filters inside the encoding loop, they can help compression because they can be applied to reference material before it gets used in the prediction process, and they can be guided using the original signal. The most popular example is the deblocking filter, which blurs out blocking artefacts from quantization discontinuities at transform block boundaries.

History

All the basic algorithms of today's dominant video codec architecture were invented before 1979.
In 1950, Bell Labs filed the patent on DPCM,[30] which was soon applied to video coding. Entropy coding started in the 1940s with the introduction of Shannon–Fano coding,[31] on which the widely used Huffman coding, developed in 1950, is based;[32] the more modern context-adaptive binary arithmetic coding (CABAC) was published in the early 1990s.[33] Transform coding (using the Hadamard transform) was introduced in 1969,[34] and the popular discrete cosine transform (DCT) appeared in scientific literature in 1974.[29][35]
The ITU-T's standard H.261 from 1988 introduced the prevalent basic architecture of video compression technology.

Genetics

Genetics compression algorithms are the latest generation of lossless algorithms that compress data (typically sequences of nucleotides) using both conventional compression algorithms and genetic algorithms adapted to the specific datatype. In 2012, a team of scientists from Johns Hopkins University published a genetic compression algorithm that does not use a reference genome for compression. HAPZIPPER was tailored for HapMap data and achieves over 20-fold compression (95% reduction in file size), providing 2- to 4-fold better compression in much less time than the leading general-purpose compression utilities. For this, Chanda, Elhaik, and Bader introduced MAF-based encoding (MAFE), which reduces the heterogeneity of the dataset by sorting SNPs by their minor allele frequency, thus homogenizing the dataset.[36] Other algorithms from 2009 and 2013 (DNAZip and GenomeZip) have compression ratios of up to 1200-fold—allowing 6 billion basepair diploid human genomes to be stored in 2.5 megabytes (relative to a reference genome or averaged over many genomes).[37][38] For a benchmark in genetics/genomics data compressors, see [39].

Outlook and currently unused potential

It is estimated that the total amount of data that is stored on the world's storage devices could be further compressed with existing compression algorithms by a remaining average factor of 4.5:1.[citation needed] It is estimated that the combined technological capacity of the world to store information provided 1,300 exabytes of hardware digits in 2007, but when the corresponding content is optimally compressed, this only represents 295 exabytes of Shannon information.[40]