Data compression

From Wikipedia, the free encyclopedia


Data compression is the process of encoding data so that it takes up less storage space or less transmission time than it would if it were not compressed.

This is made possible by exploiting redundancy in the data.
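As a minimal illustration (a sketch added here for concreteness, not part of the original article), run-length encoding exploits one simple form of redundancy, namely runs of repeated symbols:

# Run-length encoding: a toy lossless scheme that exploits runs of repeated symbols.
# Illustrative sketch only, not a practical codec.

def rle_encode(data):
    """Collapse runs of identical characters into (character, count) pairs."""
    encoded = []
    for ch in data:
        if encoded and encoded[-1][0] == ch:
            encoded[-1] = (ch, encoded[-1][1] + 1)
        else:
            encoded.append((ch, 1))
    return encoded

def rle_decode(pairs):
    """Expand (character, count) pairs back into the original string."""
    return "".join(ch * count for ch, count in pairs)

original = "aaaabbbcccd"
pairs = rle_encode(original)          # [('a', 4), ('b', 3), ('c', 3), ('d', 1)]
assert rle_decode(pairs) == original  # nothing is lost

Highly redundant input shrinks under such a scheme, while input with no repeated runs would actually grow, which is why practical compressors use more sophisticated models of redundancy.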

Data compression is a fundamental topic of computer science.


Compressed data is data that has been encoded by a data compression system.


Uncompressed data is data that has not been encoded by a data compression system.


There are two fundamentally different types of compression:

  • Lossless data compression loses no data, but noise in a signal, being essentially incompressible, may dominate the compression ratio. For symbolic data such as spreadsheets, text, executables, etc., losslessness is essential, because changing even a single bit cannot be tolerated (except in some limited cases).
  • Lossy data compression loses some data irrecoverably. This technique works well for data such as music or pictures, where it is usually possible to discard parts of the signal in a way that is hardly perceptible to humans. Lossy compression methods can compress a signal more tightly than lossless methods. Lossy compression can be used on symbolic data only when replacing symbols with other symbols that are equivalent in some sense, such as stripping rich text data such as HTML down to plain text, replacing long expressions in a spreadsheet with shorter expressions that produce the same value, or removing debugging symbols from an executable. A short sketch contrasting the two approaches follows this list.
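The following sketch (illustrative only; it assumes Python's standard-library zlib module) contrasts the two: a lossless round trip reproduces the input exactly, while quantization, a typical lossy step, discards detail irrecoverably.

import zlib

# Lossless: every bit of the original is recovered exactly.
text = b"the quick brown fox jumps over the lazy dog " * 100
packed = zlib.compress(text)
assert zlib.decompress(packed) == text          # perfect round trip
print(len(text), "->", len(packed), "bytes")    # redundancy makes the data shrink

# Lossy (sketch): quantizing samples throws away fine detail for good.
# Acceptable for signals such as audio, but not for symbolic data.
samples = [0.12, 0.57, 0.61, 0.58, 0.13]
step = 0.25
quantized = [round(s / step) for s in samples]  # small integers are cheaper to store
restored = [q * step for q in quantized]        # close to, but not equal to, the input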


Many data compression systems are best viewed with a four-stage compression model.


Data compression topics:


Common data compression algorithms:


  • arithmetic coding (a more advanced form of entropy coding; encumbered by patents as of October 2001)
  • JPEG (image compression using a windowed cosine transform, then quantization, then Huffman coding; a toy sketch of the transform-and-quantize stage follows this list)
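The sketch below is a rough illustration of that transform-and-quantize stage only (it assumes NumPy and SciPy are available; the real JPEG codec adds chroma subsampling, zig-zag ordering, and Huffman entropy coding, and uses a perceptually tuned quantization table rather than a constant step).

import numpy as np
from scipy.fft import dctn, idctn

# One 8x8 block of pixel values, shifted to be centered around zero.
block = np.random.randint(0, 256, (8, 8)).astype(float) - 128

# The 2-D discrete cosine transform concentrates the block's energy
# into a few low-frequency coefficients.
coeffs = dctn(block, norm="ortho")

# Quantization: divide by a step size and round, discarding fine detail.
step = 16.0
quantized = np.round(coeffs / step)

# Decoding reverses the steps but cannot recover what rounding threw away.
reconstructed = idctn(quantized * step, norm="ortho")
error = np.abs(reconstructed - block).max()  # small but nonzero: the stage is lossy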


Closely allied with data compression are the fields of coding theory and cryptography. Theoretical background is provided by information theory and algorithmic information theory. When compressing information in the form of signals, we often use methods of digital signal processing.

The idea of data compression is deeply connected with statistical inference and particularly with the maximum likelihood principle.
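One standard way to make this connection concrete (an information-theoretic identity, stated here for illustration): an entropy coder built on a probability model P assigns a symbol x a code of about -log2 P(x) bits, so the total code length for a data set is its negative log-likelihood under P. Choosing the model that compresses the data best is therefore the same as choosing the maximum likelihood model:

\arg\min_{P} \sum_i \bigl(-\log_2 P(x_i)\bigr) \;=\; \arg\max_{P} \prod_i P(x_i)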