INTRODUCTION TO DATA COMPRESSION

 

 

Data compression is a technique that allows us to squeeze more information onto an existing WAN circuit. Compression improves the throughput of the network and is a way to free up bandwidth.

 

Bandwidth in Wide Area Networks has always been expensive. More importantly, mission-critical applications now require more bandwidth than ever before, and demand keeps growing. Enterprises need to find cost-effective ways to address this, and one of the alternatives is data compression.

Data compression can significantly decrease the size of a frame, thus reducing traffic over the network.

 

“Data compression works by identifying patterns in a stream of data, and choosing a more efficient method of representing the same information. Essentially, an algorithm is applied to the data to remove as much redundancy as possible.” [i]

 

The metric used to measure data compression is the "compression ratio", the size of the original data divided by the size of the compressed data. This shows how effective an algorithm is at compressing data.
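As an illustration, the ratio can be computed by comparing input and output sizes. This sketch uses Python's standard zlib library simply as a convenient lossless compressor; it is not the algorithm a router would run:

```python
import zlib

# Highly repetitive input compresses very well.
text = b"the quick brown fox jumps over the lazy dog " * 100
compressed = zlib.compress(text)

# Compression ratio = original size / compressed size.
ratio = len(text) / len(compressed)
print(f"{len(text)} -> {len(compressed)} bytes, ratio {ratio:.1f}:1")
```

On less redundant input the same code reports a ratio much closer to 1:1, which is the point made below about algorithms suiting particular data sources.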

 

There are many algorithms used to compress data, although some are more effective than others depending on the source of the data. For example, an algorithm can be effective at compressing an image file but not as efficient when compressing an audio file.

 

Another important point to note here is Shannon's limit: information theory shows that there is a bound, the entropy of the source, below which data cannot be compressed losslessly. If data is reduced past that point, it is impossible to decompress it accurately.
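A minimal sketch of this bound: the Shannon entropy H = -Σ p·log₂(p) over the byte frequencies of a message gives the minimum average number of bits per byte that any lossless scheme can achieve on that source:

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Minimum average bits per byte needed to encode `data` losslessly."""
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A highly repetitive message has low entropy and compresses well...
low = shannon_entropy(b"aaaaaaab")
# ...while a message using every byte value equally often hits the 8-bit ceiling.
high = shannon_entropy(bytes(range(256)))
print(f"repetitive: {low:.2f} bits/byte, uniform: {high:.2f} bits/byte")
```

No lossless algorithm can beat these figures on average; lossy methods, discussed later, sidestep the bound by discarding information.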

 

Compression is a technique designed to gain as much bandwidth as possible from an existing internetwork. Like queuing, compression aims to buy critical time to plan and implement network upgrades, as well as improving network utilisation. However, executing a compression algorithm places a strain on the router, as it needs more time to process traffic and uses more CPU cycles.

 

As WAN utilisation is reduced, CPU utilisation of the router increases considerably. As you can see, there is a trade-off between the two, although its extent depends on the compression algorithm employed.

 

It is recommended to examine the possible effects of compression before it is implemented, and it is also important to find out the current CPU utilisation of the router. By typing the command (show processes cpu) we can see the state of the router; if it is already running at 80% or more, compression could cause the router to crash, bringing the whole network down.

 

Compression is best applied on slower links; since the process of compressing and decompressing also takes time, it is not as useful on faster links.

 

 

Lossless Compression

 

Lossless data compression is used when the data has to be decompressed exactly as it was before compression. Data compression techniques that generate exact copies of the original data upon decompression are classified as lossless compression algorithms. Text files are stored using lossless techniques, since losing a single character can seriously change the meaning of a word or render it meaningless.
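This round-trip property is easy to demonstrate. The sketch below again uses Python's zlib as a stand-in for any lossless algorithm: compressing and then decompressing must give back a byte-for-byte copy:

```python
import zlib

original = b"Lossless compression must reproduce the input exactly."

# Compress, then decompress: the result is identical to the input.
restored = zlib.decompress(zlib.compress(original))
print(restored == original)
```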

 

However, there are strict limits to the amount of compression that can be obtained with lossless compression. Lossless compression ratios are generally in the range of 2:1 to 8:1.

 

Lossless compression methods can be categorized by the type of file they have been designed to compress. There are three main types of files: images, sound, and text.

In theory any algorithm should be able to compress any type of data. This is true; however, an algorithm is unable to significantly compress a file unless it is the type it was designed to work with.

 

Most lossless compression methods combine two components: a statistical model of the input data, and a coder that maps the input data to bit strings. In this way, data that is more frequent produces a shorter output than data that is less frequent.
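As a sketch of this two-part design, here is a toy Huffman construction (an illustration of the principle, not any particular product's coder): the byte frequencies are the statistical model, and the depth of each symbol in the tree is the length of its bit string, so frequent symbols end up with shorter codes:

```python
import heapq
from collections import Counter

def huffman_code_lengths(data: str) -> dict:
    """Return each symbol's Huffman code length for `data`.

    Symbol frequencies (the statistical model) drive tree construction;
    more frequent symbols sit higher in the tree and get shorter codes.
    """
    freq = Counter(data)
    # Heap entries: (weight, unique tiebreak, {symbol: depth-so-far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)   # two lightest subtrees...
        w2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**a, **b}.items()}  # ...gain one bit
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

lengths = huffman_code_lengths("aaaaaaaabbbc")
print(lengths)  # 'a' (most frequent) receives the shortest code
```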

 

Lossless compression is used in common programs and formats such as the ZIP file format, used by PKZIP, WinZip, and Mac OS X 10.3, and the Unix programs bzip2, gzip, and compress. Other popular formats include StuffIt and RAR.

 

In the case of general-purpose networking and communications, data integrity is clearly of vital importance and there is no point beyond which data loss is permissible. Therefore, all network compression technologies must be based solely on lossless compression.

 

Lossy Compression

 

A “lossy” compression algorithm is one that does not necessarily produce an exact copy of the input data after compression and decompression. Since lossy compression techniques are not required to retain all of the information content of the original data, they are able to reduce the input data to a size that is smaller than the limit set by Shannon’s law.

 

Most image, video, and audio compression algorithms are lossy in nature. This is because our audio and visual senses have a limited range of resolution beyond which finer details and differences are no longer distinguishable to our eyes or ears. By eliminating these imperceptible differences, lossy compression techniques can typically achieve reduction ratios that exceed those of lossless techniques.
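A toy illustration of the idea, assuming simple 8-bit samples: quantization discards low-order detail that the listener or viewer would not notice, so near-identical values collapse to one, after which a lossless coder can shrink the now-redundant result far more effectively:

```python
def quantize(samples, keep_bits=4):
    """Zero out the least-significant bits of each 8-bit sample.

    The fine differences between samples are irreversibly discarded --
    this is the 'lossy' step that buys the higher compression ratio.
    """
    drop = 8 - keep_bits
    return [(s >> drop) << drop for s in samples]

original = [200, 201, 202, 203]   # four barely different samples
coarse = quantize(original)
print(coarse)                     # all four collapse to the same value
```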

 

Lossy compression is a familiar type of compression and is widely used on the Internet, particularly in streaming media and telephony applications. These methods are normally referred to as codecs.

 

 

There are two types of lossy compression codecs: transform codecs and predictive codecs. Transform codecs take samples of sound or pictures, chop them into small segments, transform them into a new basis space, and quantize the result. Predictive codecs use previous data to predict the current sample and encode only the (smaller) difference between the prediction and the real data. [ii]

Statistical vs. Dictionary compression

 

Statistical compression algorithms are usually employed in single applications where the data is predictable and consistent. Since traffic through networks is inconsistent and unpredictable, statistical compression is not very suitable for compressing data on routers.

 

Dictionary compression algorithms, on the other hand, are better suited to compressing data on routers. They take advantage of redundancy in the data, replacing repeated sequences with codes; the symbols represented by the codes are stored in the “dictionary”. This approach is more responsive to data variations and offers more flexibility, which is important when transmitting over Wide Area Networks.

The “dictionary” can also adapt to variations in the data to suit the needs of the traffic, and by using a larger dictionary we can improve the compression ratio.
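A minimal sketch of adaptive dictionary compression, in the style of the well-known LZW algorithm (shown here purely as an illustration of the dictionary idea, not as the specific algorithm any router runs): the dictionary starts with single characters and grows on the fly as repeated sequences appear in the traffic:

```python
def lzw_compress(data: str) -> list:
    """LZW sketch: replace repeated substrings with integer codes.

    The dictionary adapts to the data -- every new substring seen is
    added, so later repetitions are emitted as a single code.
    """
    dictionary = {chr(i): i for i in range(256)}  # seed with single chars
    w, out = "", []
    for ch in data:
        wc = w + ch
        if wc in dictionary:
            w = wc                      # extend the current match
        else:
            out.append(dictionary[w])   # emit code for longest match
            dictionary[wc] = len(dictionary)  # grow the dictionary
            w = ch
    if w:
        out.append(dictionary[w])
    return out

codes = lzw_compress("abababababab")
print(f"{len(codes)} codes for 12 characters")
```

Note how the redundancy in the input lets 12 characters shrink to far fewer codes; on random data the dictionary finds no repeats and the output does not shrink at all.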

 

 

 


 

[i] http://www.cisco.com/warp/public/116/wan_compression_faq.html (accessed 17/12/2004)

[ii] http://en.wikipedia.org/wiki/Lossy_data_compression (accessed 24/12/2004)