Skip to main content
Login | Suomeksi | På svenska | In English

Browsing by Author "Tuomainen, Risto Olli Oskari"

Sort by: Order: Results:

  • Tuomainen, Risto Olli Oskari (2016)
    In nearest neighbors search the task is to find points from a data set that lie close in space to a given query point. To improve on brute force search, that computes distances between the query point and all data points, numerous data structures have been developed. These however perform poorly in high dimensional spaces. To tackle nearest neighbors search in high dimensions it is commonplace to use approximate methods that only return nearest neighbors with high probability. In practice an approximate solution is often as good as an exact one, among other reasons because approximations can be of such a high quality that they are practically indistinguishable from exact solutions. Approximate nearest neighbors search has found applications in many different fields, and can for example be used in the context of recommendation systems. One class of approximate nearest neighbors algorithms is space partitioning methods. These algorithms recursively partition the data set to smaller subsets in order to construct a search structure. Queries can then be performed very efficiently by using this structure to prune data points without needing to evaluate their distances to the query point. A recent proposal belonging to this class of algorithms is multiple random projections trees (MRPT). MRPT uses random projection trees (RP-trees) to prune the set from which nearest neighbors are searched. This thesis proposes a voting algorithm for using multiple RP-trees in nearest neighbors search. We also discuss a further improvement, called mixture method. The performance of these algorithms was evaluated against the previous MRPT algorithm using two moderately high dimensional data sets. Mixture method was found to improve considerably on MRPT in terms of accuracy attained. The results presented in this thesis suggest that the mixture method may potentially be a strong algorithm for nearest neighbors search, especially in very high dimensional spaces.