Skip to main content
Login | Suomeksi | På svenska | In English

Browsing by Subject "Relative Lempel-Ziv"

Sort by: Order: Results:

  • Kivivuori, Eve (2023)
    In this thesis, we discuss the Relative Lempel-Ziv (RLZ) lossless compression algorithm, our implementation of it, and the performance of RLZ in comparison to more traditional lossless compression programs such as gzip. Like the LZ77 compression algorithm, the RLZ algorithm compresses its input by parsing it into a series of phrases, which are then encoded as a position+length number pair describing the location of the phrase within the text. Unlike ordinary LZ77, where these pairs refer to earlier points in the same text and thus decompression must happen sequentially, in RLZ the pairs point to an external text called the dictionary. The benefit of this approach is faster random access to the original input given its compressed form: with RLZ, we can rapidly (in linear time with respect to the compressed length of the text) begin decompression from anywhere. With non-repetitive data, such as the text of a single book, website, or one version of a program's source code, RLZ tends to perform poorer than traditional compression methods, both in terms of compression ratio and in terms of runtime. However, with very similar or highly repetitive data, such as the entire version history of a Wikipedia article or many versions of a genome sequence assembly, RLZ can compress data better than gzip and approximately as well as xz. Dictionary selection requires care, though, as compression performance relies entirely on it.