
Effects of Dynamic Parameter Discovery on Lossless Compression Algorithms


Title: Effects of Dynamic Parameter Discovery on Lossless Compression Algorithms
Author(s): Lübbers, Henning
Contributor: University of Helsinki, Faculty of Science, Department of Computer Science
Discipline: Computer science
Language: English
Acceptance year: 2012
Abstract:
General-purpose lossless data compression continues to be an important aspect of the daily use of computers, and a multitude of methods and corresponding compression programs has emerged since information theory was established as a distinct area of research by C.E. Shannon shortly after World War II. Shannon and others discovered several theoretical bounds on data compression, and it is, for instance, nowadays known that there can be neither a compressor that compresses all possible inputs, nor any mechanism that compresses at least some inputs while preserving the lengths of the incompressible ones. Although it is therefore indisputable that any compressor must necessarily expand some inputs, it is nonetheless possible to limit the expansion of any input to a constant number of bits in the worst case.

In order to determine how the established theoretical bounds relate to existing compression programs, I examined two popular compressors, GZip and BZip2, and concluded that their behaviour is not optimal in all respects, as they may expand inputs in the worst case by more than is theoretically necessary. On the other hand, the examined programs provide very good compression for most realistic inputs rather swiftly, a characteristic that is most likely appreciated by most computer users.

Motivated by a review of the most fundamental bounds of data compression, i.e. Kolmogorov Complexity, Entropy and the Minimum Description Length Principle, and further encouraged by my analysis of GZip and BZip2, I propose in this thesis a generic, pipelined architecture that can (at least in theory) be employed to achieve optimal compression in two passes over the input to be compressed. I subsequently put the proposed architecture directly to the test, and use it to substantiate my claim that the performance of compression (boosting) methods can be improved if they are configured anew for each input with a dynamically discovered set of optimal parameters.
In a simple empirical study I use Huffman Coding, a classic entropy-based compression method, as well as Move-To-Front Coding (MTF), a compression-boosting method designed to exploit locality among source symbols, to demonstrate that the choice of implied source alphabet influences the achieved compression ratio, and that different test inputs require different source alphabets to achieve optimal compression.


Files in this item:
Lubbers.pdf (1.032 MB, PDF)
