Skip to main content
Login | Suomeksi | På svenska | In English

Browsing by Author "Kähkönen, Harri"

Sort by: Order: Results:

  • Kähkönen, Harri (2023)
    The volume of data generated by high-throughput DNA sequencing has grown to a magnitude that leads to substantial computational challenges in storing and searching the data. To tackle this problem, various computational methodologies have been developed in recent years to space-efficiently index collections of data sets and enable efficient searches. One of the most recent indexing methods, Spectral Burrows-Wheeler Transform (SBWT), presents all distinct k-mers of a DNA sequence using only 4 bits and a small additional space for the rank data structures per k-mer. In addition to being space-efficient, it also enables k-mer membership queries in linear time relative to k, and constant time relative to the number of distinct k-mers in the sequence. The queries rely on rank queries over bit vectors. Experiments run on a single CPU thread have shown that in one second, hundreds of thousands of k-mer membership queries can be performed over SBWT. By parallelizing the queries on a CPU, it is possible to execute millions of queries per second. However, Graphic Processing Units (GPUs) have much more parallelization potential. The main contribution of the thesis is an implementation of the k-mer membership queries over SBWT with GPU computing. Optimizing the queries to be performed on a GPU made it possible to perform over a billion queries per second. Furthermore, the thesis presents a new enhancement for the queries over SBWT called presearching, which doubles the speed of the original SBWT search query. The rank query needed for the membership queries is implemented using space-efficient poppy rank data structures, and its derivative cumulative-poppy data structure which is one of the contributions of the thesis.