Skip to main content
Login | Suomeksi | På svenska | In English

A Variant of the Multiple Random Projection Trees Algorithm for Nearest Neighbors Search

Show full item record

Title: A Variant of the Multiple Random Projection Trees Algorithm for Nearest Neighbors Search
Author(s): Tuomainen, Risto Olli Oskari
Contributor: University of Helsinki, Faculty of Science, Department of Computer Science
Language: English
Acceptance year: 2016
Abstract:
In nearest neighbors search the task is to find points from a data set that lie close in space to a given query point. To improve on brute force search, that computes distances between the query point and all data points, numerous data structures have been developed. These however perform poorly in high dimensional spaces. To tackle nearest neighbors search in high dimensions it is commonplace to use approximate methods that only return nearest neighbors with high probability. In practice an approximate solution is often as good as an exact one, among other reasons because approximations can be of such a high quality that they are practically indistinguishable from exact solutions. Approximate nearest neighbors search has found applications in many different fields, and can for example be used in the context of recommendation systems. One class of approximate nearest neighbors algorithms is space partitioning methods. These algorithms recursively partition the data set to smaller subsets in order to construct a search structure. Queries can then be performed very efficiently by using this structure to prune data points without needing to evaluate their distances to the query point. A recent proposal belonging to this class of algorithms is multiple random projections trees (MRPT). MRPT uses random projection trees (RP-trees) to prune the set from which nearest neighbors are searched. This thesis proposes a voting algorithm for using multiple RP-trees in nearest neighbors search. We also discuss a further improvement, called mixture method. The performance of these algorithms was evaluated against the previous MRPT algorithm using two moderately high dimensional data sets. Mixture method was found to improve considerably on MRPT in terms of accuracy attained. The results presented in this thesis suggest that the mixture method may potentially be a strong algorithm for nearest neighbors search, especially in very high dimensional spaces.


Files in this item

Files Size Format View
tuomainen_gradu_final.pdf 670.7Kb PDF

This item appears in the following Collection(s)

Show full item record