
Browsing by Title


  • Laine, Matti (2018)
    Since data-driven business and decision making have been trending, data has come to be seen as a valuable part of an organization's capital. A centralized data warehouse has typically been the solution for organizations to collect, manage, and exploit data. As data volumes have grown, data warehouse technologies have evolved and technical architectures have become more complex. In the process, metadata has become an essential instrument for managing large-scale data warehouse and analytics environments. Though metadata-based data warehouse architecture and automation have become quite common concepts, there are still areas and subsets of metadata that have remained largely out of the spotlight. One of these is the metadata generated by the usage of the data warehouse and analytics platform. The capability to measure a product and its usage is key not only to improving the product technically, but also to gaining insight into how end users consume the data. Before usage metadata and metrics can be analysed and the results exploited, the data must be collected and stored in a format that facilitates further utilization. This raises several questions: how should usage-related metadata and metrics be modelled and stored to facilitate their utilization together with the other metadata of the data warehouse environment? How should the collected usage metadata be managed? And, depending on the technical environment and setup, how can usage metadata and metrics be collected in the first place? As a result of this study, a design process for modelling database usage by analysing requirements and conceptualizing usage in a hierarchical model is presented, together with a logical-level reference model for usage metadata and methods for usage metadata management. In addition, concepts for the usage metadata and metrics collection process are discussed in the context of the Amazon Redshift data warehouse service. Though the presented reference model is one example outcome of the design process, the phases of the design process can be generalized and exploited when designing a usage metadata model for different use cases and technical environments.
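    As a hedged sketch of the collection step discussed above: Amazon Redshift exposes query history in its STL_QUERY system table, which can be aggregated into per-user usage metrics. The endpoint, credentials and downstream handling below are illustrative assumptions, not details from the thesis.

    ```python
    # Sketch: collect usage metadata from Redshift's STL_QUERY system table.
    import psycopg2

    conn = psycopg2.connect(
        host="example-cluster.redshift.amazonaws.com",  # hypothetical endpoint
        port=5439, dbname="dw", user="metadata_etl", password="..."
    )

    SQL = """
        SELECT userid,
               DATE_TRUNC('day', starttime)          AS usage_day,
               COUNT(*)                              AS query_count,
               SUM(DATEDIFF(ms, starttime, endtime)) AS total_runtime_ms
        FROM stl_query
        GROUP BY 1, 2;
    """

    with conn, conn.cursor() as cur:
        cur.execute(SQL)
        for userid, day, n_queries, runtime_ms in cur.fetchall():
            # A real pipeline would load these rows into the usage metadata
            # model alongside the warehouse's other metadata.
            print(userid, day, n_queries, runtime_ms)
    ```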
  • Suontausta, Kati (2012)
    When dealing with life insurance, the object of study is always the insured's remaining lifetime, described by a positive random variable T. In the case of a multi-life insurance policy, the remaining lifetimes of the insured are assumed to be mutually independent. Another essential concept in life insurance is the mortality (intensity), which is used to compute probabilities for the insured's remaining lifetime. In life insurance mathematics, policy periods can be long, even several decades. From the company's point of view it is important that issuing the policy is profitable. The basic principle of an insurance company is always the equivalence principle: the premiums must correspond to the benefits paid to the insured over the whole policy period. Since the payment times may differ considerably, pricing must take the accumulation of interest into account. In a multi-life pure endowment insurance, the benefit S is paid at the end of the policy period at time n, and the size of the benefit depends on which of the insured are then alive. Usually the size of the desired benefit S is known first, and the premium P is obtained by computing the expected value of the benefit S discounted to time zero. In a multi-life term insurance, a benefit is paid whenever a death occurs before time n, so there can be as many payment times as there are insured. For term insurance the principle is the same as for pure endowment insurance, but in practice the computation is much more demanding, precisely because of the number and timing of the payment times. To manage its risks, an important concept for the company is the technical provision (reserve), which by definition is the difference between the present values of future benefits and future premiums. The reserve thus tells how much capital the company must hold in order to cover the benefits payable as a consequence of future insurance events. For multi-life insurance, the size of the reserve depends especially on the state the process is in at the time of inspection, i.e., which of the insured are alive and which have died. An important tool for predicting the size of the reserve is Thiele's equation, which describes the changes in the reserve. When deriving Thiele's equation, a flow diagram is used to examine what transitions between the states of the insured may occur over a small time interval. The benefits and premiums payable as a consequence of the possible events are weighted by the probabilities of the corresponding events, and from the resulting sum Thiele's differential equation is obtained. Multi-life insurance can also be approached from the viewpoint of a Markov process. In that case the process is assumed to move from state to state with intensities corresponding to the mortalities of the insured. The transition probabilities of the process can then be computed from these intensities, and the net single premium is computed using the same principle as before. The Markov approach is particularly helpful when handling policies of three or more lives.
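    As a worked illustration of the reserve dynamics mentioned above, the standard single-life form of Thiele's differential equation (textbook notation, not taken from the thesis itself) is:

    ```latex
    % Thiele's differential equation for the prospective reserve V(t):
    %   \pi(t) = continuous premium rate,  \delta = interest intensity,
    %   \mu(t) = mortality intensity,      S = benefit paid on death.
    \[
      \frac{dV(t)}{dt} = \pi(t) + \delta\,V(t) - \mu(t)\bigl(S - V(t)\bigr)
    \]
    ```

    Interest grows the reserve, premiums add to it, and the mortality term charges the sum at risk S - V(t) weighted by the probability of death; in the multi-life case one such equation is written for each state of the flow diagram.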
  • Attallah, Nashwa (2022)
    The demand for natural and man-made cellulosic materials is continually increasing due to world population growth, and cotton production does not meet this demand. Consequently, a rational strategy to close this "cellulosic products gap" is to increase the production of man-made cellulosic products, following the principles of green chemistry. Cellulose is an essential skeletal component in plants and a nearly limitless polymeric raw material with intriguing structure and properties. Due to its inherent insolubility, this crystalline and stiff homopolymer has not yet reached its full application potential. The diversity of regenerated cellulose materials formed through physical dissolution and regeneration has been remarkable in recent decades, showing tremendous possibilities in the fields of textiles, packaging, biomedicine, water treatment, and optical/electrical devices. Since no chemical reactions take place, the nature of cellulose is preserved and most of the agents used in the physical dissolution and regeneration process can be recycled and reused. This method is therefore environmentally friendly and holds the promise of bringing about a new Green Revolution in the widespread use of cellulose-like natural resources. Given the fabrication of new materials using an ecologically benign technology and the replacement of petroleum-based materials, the effects and advantages of such physical processes on society are very fascinating. This thesis covers the dissolution of microcrystalline cellulose (MCC), a highly crystalline and pure cellulose model substrate, in the superbase ionic liquid (SIL) 7-methyl-1,5,7-triazabicyclo[4.4.0]dec-5-enium acetate [mTBDH][OAc], which has an extreme dissolution power for cellulose. Cellulose is first dissolved in the IL at 80 °C and then regenerated, upon cooling, by the addition of n-propanol as an antisolvent for cellulose, leading to a phase separation. The second part of the thesis covers the regeneration of cellulose in the form of films from cellulose-IL solutions; [mTBDH][OAc] was used for the first time as a plasticizer for the preparation of transparent cellulose films.
  • Goriachev, Vladimir (2018)
    In remote inspection and maintenance operations, the quality and amount of information available to the operator on demand play a significant role. In knowledge-intensive tasks performed remotely or in a hazardous environment, augmented and virtual reality technologies are often seen as a solution capable of providing the required level of information support. Application of these technologies has faced many obstacles over the years, mostly due to the insufficient maturity of their technical implementations. This thesis describes research work related to the use of augmented and virtual reality in remote inspection and maintenance operations, aimed at solving some of the most common problems associated with applying these technologies. During the project, a calibration method for optical see-through augmented reality glasses was developed, as well as a virtual reality application for robotic teleoperation. The implemented teleoperation system was tested in two different simulated scenarios, and the additional questions of immersive environment reconstruction, spatial user interfaces, and the connection between the virtual and real worlds are also addressed in this thesis report.
  • Barua, Shawon (2019)
    Monitoring of indoor air quality (IAQ) is important because IAQ is directly related to human health and comfort. The purpose of this study was to develop a non-targeted approach for the screening of organic compounds present in indoor air. The sampling was done using cryogenic active and passive samplers, and the separation and analysis were done using a liquid chromatograph coupled with a triple quadrupole mass spectrometer (LC-QQQ-MS). First, an experimental design for the sampling variables, cooling temperature and sampling period, was made; these were optimized to -15 °C and 120 minutes, respectively, to ensure efficient sample collection. For mass spectrometry in both positive and negative ionization modes, the ion source parameters, gas temperature, gas flow, nebulizer pressure and capillary voltage, were optimized to 300 °C, 10 L/min, 45 psi and 4000 V, respectively, to maximize detector response and thereby facilitate detection and analysis of the compounds in the sample. Because the concentration of compounds in the raw sample was very low, one important step was to optimize the sample preparation method to enrich the sample for reliable detection and further analysis. Since the sample was collected in the form of condensate water, different sample preparation methods such as evaporation, liquid-liquid extraction (LLE) and solid-phase extractions (SPEs) with different cartridges were tried for preconcentration. Comparing the outcomes of the different sample preparation methods, it was found that sequential SPE using C2 and C18 cartridges gives the maximum compound recovery: 1.5 and 2.6 times that of evaporation and LLE, respectively, in positive ion mode, and 2.6 and 4.1 times in negative ion mode. This methodology was therefore adopted to analyze condensate water samples from two sick houses in Finland. The results from the sick houses were compared with a reference house having no sick building syndrome (SBS) to look for potential compounds causing health issues. The data analysis was done using MZmine 2.3.4 software. Additionally, tandem mass spectrometric (MS/MS) acquisition parameters were optimized and product ions were determined as the initial step of identifying compounds in the sample from the first house. The methods developed in this work should also be useful for analyzing various natural samples, including outdoor air.
  • Oksanen, Valtteri (2023)
    Catechol is a widely produced platform chemical, and many fine chemicals, including pharmaceuticals and pesticides, contain catechol moieties. Catechols are nucleophilic, but their polarity can be reversed by oxidizing them into electrophilic o-benzoquinones (OBQs). OBQs are highly reactive and react readily with nucleophiles, but also with dienes, dienophiles and ylides. However, OBQs have many reactive sites, which often leads to a lack of selectivity in their reactions. The literature review of this thesis surveys methods to control selectivity in nucleophilic additions, cycloadditions and Wittig reactions of OBQs. Selectivity is often increased by blocking undesired reactive sites with substituents. If the substituents cannot be altered, it is possible to control selectivity through the choice of catalysts and substrates as well as the stoichiometric ratios of the substrates. The literature review also covers how o-benzoquinones have been utilized in organic synthesis. In the experimental part, nucleophilic additions of amino acids and silyl enol ethers to o-benzoquinones are studied in practice. Reactions of amino acids with OBQs resulted only in polymerization despite efforts to control the selectivity. However, ZnCl2-catalysed addition of silyl enol ethers to OBQs yielded only 1,4-addition products. The method was optimized with two model reactions, after which 30 different 1,4-addition products were successfully synthesized. For most of these products this method is the only proven synthesis route.
  • Vasilieva, Marta (2016)
    The work includes research in the fields of information visualization, eye tracking and information retrieval. The emphasis is on visualization techniques that raise the level of a user's attention so that search tasks can be performed more efficiently. The visualization technique described in the work is the visual cue: visual cues direct users' attention to the relevant information on the screen. By tracking eye gaze data, common user behavior patterns are defined and eye gaze metrics are calculated. By analyzing eye gaze patterns we gain insight into possible design solutions, which include the implementation of visual cues for improving a user's performance. To increase user performance, we propose introducing visual cues that reduce the time spent on the search task and facilitate attention switching between areas of interest. One of the key metrics derived from the eye gaze data is the time spent on reading activities during the search session. In the work we calculate the duration of each type of reading activity and suggest ways of decreasing the overall time spent, as well as means of enhancing attention switching. ACM Computing Classification System (CCS): B.2.2 Performance Analysis and Design Aids; D.2.2 Design Tools and Techniques; H.3.3 Information Search and Retrieval
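    As a small hedged sketch of the kind of gaze metrics described above: given a fixation log, dwell time per area of interest (AOI) and the number of attention switches can be computed with pandas. The column names and toy values are assumptions about the data format, not from the thesis.

    ```python
    # Sketch: dwell time per AOI and attention-switch count from fixations.
    import pandas as pd

    fix = pd.DataFrame({                      # toy fixation log
        "aoi": ["query", "results", "results", "snippet", "results"],
        "duration_ms": [180, 310, 220, 260, 450],
    })

    dwell = fix.groupby("aoi")["duration_ms"].sum()       # reading time per AOI
    switches = int((fix["aoi"] != fix["aoi"].shift()).sum()) - 1  # AOI transitions

    print(dwell)
    print("attention switches:", switches)
    ```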
  • Karvo, Sara (2023)
    Zooplankton are an important link in marine pelagic food webs, as they transfer energy from primary producers to higher trophic levels such as planktivorous fish. They migrate vertically in the water column, ascending to feed near the surface at night and descending to hide from visual predators during the day (diel vertical migration, DVM). Zooplankton can be detected with Acoustic Doppler Current Profilers (ADCPs). These devices were developed for measuring water currents with acoustic pulses, a technique which requires particles such as zooplankton in the water column to scatter the sound. As a by-product of the velocity measurements, an ADCP provides information about these scatterers as echo intensity. This method has been used in researching zooplankton DVM, but not in the northern Baltic Sea prior to this study. In this thesis, the data processing steps required to analyze echo intensity were examined for the specific environment of the Finnish Archipelago Sea. A one-year time series was processed and averaged seasonally to investigate different patterns in zooplankton DVM. Vertical velocity data were used to estimate migration speed, and available reference measurements were combined with the data to examine the environmental factors affecting zooplankton DVM. Synchronized DVM was observed especially in autumn; however, indications of other migration patterns, such as unsynchronized and reverse migration, were detected during summer and winter, respectively. The primary cue behind zooplankton DVM was light, but additional contributing factors such as phytoplankton and currents were identified and discussed. The maximum migration speeds detected were approximately 10 cm/s downwards and 4 cm/s upwards. ADCP data are a good indicator of zooplankton migration in the northern Baltic Sea, and in the future they could prove beneficial in zooplankton monitoring and biomass estimation.
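    As a hedged illustration of the averaging step described above, the sketch below bins echo intensity by hour of day and depth with pandas; the file and column names are assumed stand-ins for the processed ADCP time series.

    ```python
    # Sketch: hour-by-depth mean echo intensity; in a synchronized DVM
    # pattern the scattering maximum is shallow at night and deep by day.
    import pandas as pd

    df = pd.read_csv("adcp_echo.csv", parse_dates=["time"])  # hypothetical file
    df["hour"] = df["time"].dt.hour

    dvm = df.pivot_table(index="depth_m", columns="hour",
                         values="echo_intensity_db", aggfunc="mean")
    print(dvm.round(1))
    ```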
  • Häkkinen, Iira (2024)
    Foundation models have the potential to reduce the level of supervision required for medical image segmentation tasks. Currently, the medical image segmentation field still largely relies on supervised, task-specific models. The aim of this thesis is to investigate whether a foundation model, the Segment Anything Model (SAM), can be used to reduce the level of supervision needed for medical image segmentation. The main goal is to see whether the annotation workload required to generate labeled medical segmentation datasets can be significantly reduced with the help of SAM. The second goal is to validate the zero-shot performance of SAM on a medical segmentation dataset. A UNet model is used as a baseline. The results of this thesis support SAM's suitability as a tool for medical image annotation. During the experiments, it was found that, especially for homogeneous, clearly outlined targets such as organs, training a UNet model on "pseudo labels" generated by SAM resulted in accuracy comparable to training a UNet model on human-annotated labels. Furthermore, the results show that zero-shot SAM has somewhat comparable performance to UNet, and even beats UNet in two of the tested tasks. For one complexly structured task, both SAM and the UNet trained on pseudo labels generated from SAM's masks fail to produce accurate results. It is notable that some of the tasks have small training datasets, which limits the test accuracy of UNet. The results are in accordance with recent literature showing that zero-shot SAM can have performance comparable to state-of-the-art models on large and distinct objects, but for small, complex structures SAM is not up to par with state-of-the-art medical segmentation models in accuracy.
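    As a hedged sketch of the pseudo-labeling workflow described above, the snippet below prompts SAM with a bounding box to produce a mask that could serve as a training label; the checkpoint path, image and box prompt are placeholders, while the `segment_anything` API itself is the one published by Meta.

    ```python
    # Sketch: generating a "pseudo label" mask with the Segment Anything Model.
    import numpy as np
    from segment_anything import sam_model_registry, SamPredictor

    sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")  # placeholder path
    predictor = SamPredictor(sam)

    image = np.zeros((512, 512, 3), dtype=np.uint8)  # stand-in for a medical image
    predictor.set_image(image)

    # A rough organ bounding box would come from a cheap annotation pass.
    masks, scores, _ = predictor.predict(box=np.array([100, 100, 300, 300]),
                                         multimask_output=False)
    pseudo_label = masks[0]   # binary mask usable as a UNet training target
    ```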
  • Bagchi, Rupsha (2017)
    The Internet of Things is a proliferating industry that is transforming many homes and businesses, making them smart. However, the rapid growth in the number of these devices and the interactions between them introduces many challenges, including that of a secure management system for the devices' identities and interactions. While the centralized model has worked well for many years, there is a risk of the servers becoming bottlenecks and single points of failure, making them vulnerable to Denial-of-Service attacks. As a backbone for these interactions, a blockchain is capable of creating a highly secure, independent and distributed platform. A blockchain is a peer-to-peer, distributed ledger system that stores all the transactions taking place within the network. The main purpose of the servers that form part of the distributed system is to reach consensus, using various consensus algorithms, on the state of the blockchain at any given time, and to store a copy of all the transactions taking place. This thesis explores blockchain technology in general and investigates its potential with regard to access management for constrained devices. A proof-of-concept system has been designed and implemented that demonstrates a simplified access management system using the Ethereum blockchain; this was done to check whether the concept can be applied at a global level. Although the latency of the network depends on the computing power of the resources participating in the blockchain, the proof-of-concept system was evaluated keeping in mind the smallest device that can be involved in the consensus process. Docker containers were used to simulate a cluster of the nodes participating in the blockchain in order to examine the implemented system. An outline of the advantages and limitations of blockchains in general, as well as of the developed proof-of-concept system, is also provided.
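    As a hedged sketch of how such access decisions could be queried from Ethereum, the snippet below uses web3.py against a hypothetical access-management contract; the contract address, ABI and `isAllowed` function are invented for illustration and are not the thesis's actual contract.

    ```python
    # Sketch: querying a (hypothetical) access-management smart contract.
    from web3 import Web3

    w3 = Web3(Web3.HTTPProvider("http://localhost:8545"))   # local Ethereum node

    ACCESS_ABI = [{                      # minimal ABI for the invented function
        "name": "isAllowed", "type": "function", "stateMutability": "view",
        "inputs": [{"name": "device", "type": "address"},
                   {"name": "resource", "type": "bytes32"}],
        "outputs": [{"name": "", "type": "bool"}],
    }]
    contract = w3.eth.contract(address="0x" + "00" * 20, abi=ACCESS_ABI)

    allowed = contract.functions.isAllowed(
        "0x" + "11" * 20,                       # device address (placeholder)
        b"sensor-data".ljust(32, b"\0"),        # resource id as bytes32
    ).call()
    print("access granted:", allowed)
    ```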
  • Leppänen, Leo (2017)
    In this study we use element-level usage data, collected from the online learning material of a university-level introductory programming course, to identify areas of interest in the course material and to predict student learning outcomes. The data was collected in situ using a JavaScript component embedded in the online learning material, which recorded which HTML elements were visible on the user's screen after each interaction (movement and click) and whenever the user's screen had been still for at least 2500 milliseconds. A visual analysis indicates that students spend large amounts of time on material sections that discuss special syntactic structures they are unable to infer from previous experience. Overall, the analysis was able to identify areas of the online learning material that seem too long and in-depth for the concepts they discuss, once the things the students have previously learned are taken into account. This high-level analysis also revealed that the time students spent viewing an assignment's prompt was statistically significantly correlated with the perceived workload, difficulty and educational value of that assignment. We observe that when partial correlations are considered, and multiple comparisons are corrected for, time spent with an assignment's prompt on the screen is no longer statistically significantly correlated with the three variables. The same usage data was used to investigate whether material usage statistics can predict learning outcomes or identify strong and at-risk students. The results indicate that based on just three to four weeks of data, it is possible to identify strong and at-risk students with some accuracy. Furthermore, it seems possible to predict student programming assignment scores and total course scores with somewhat high accuracy. Models based on material usage statistics also displayed some predictive power for student exam scores. It was also shown that the predictive power of these models is not based solely on student effort or time-on-task. All told, this thesis demonstrates that fine-grained online learning material usage data is feasible to collect and useful for understanding both the students and the learning material. The results suggest that very simple and almost entirely domain-independent data sources can be used to predict student performance to a relatively large degree, suggesting that a combination of such simple domain-independent metrics could match highly domain-dependent and more complex metrics in predictive power, giving rise to more widely usable educational analytics tools.
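    As a hedged sketch of the prediction task described above: a simple classifier over early-weeks usage features can flag at-risk students. The features and labels below are synthetic toys; the thesis's actual models and features are not reproduced here.

    ```python
    # Sketch: identifying at-risk students from material-usage features.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.random((200, 3))   # e.g. viewing time, interactions, still periods
    y = (X[:, 0] + 0.3 * rng.random(200) < 0.5).astype(int)  # toy at-risk flag

    clf = LogisticRegression()
    print("AUC:", cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean())
    ```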
  • Äijälä, Cecilia (2020)
    Tropes are storytelling devices or conventions that can be found in storytelling media, for example in movies. DBTropes is an RDF dataset of media-trope connections extracted from TV Tropes, a community-edited wiki listing tropes for various creative works. This study investigates whether the tropes of films can be used to cluster similar movies. First, we extracted film-trope connections from the DBTropes dataset. We then took four samples from the dataset, three for clustering films and one for clustering tropes. We used the film-trope connections to calculate Euclidean and cosine distances between movies, and for the last sample between tropes. We then clustered the samples with hierarchical clustering using complete and Ward linkage. Our results show that hierarchical clustering can group similar films together using this dataset. For calculating distances, the cosine distance method works significantly better than Euclidean distance. Both hierarchical clustering methods, complete and Ward, work well; which of them results in a clearer and more interpretable output depends on the chosen sample. We conclude that the data works well for clustering similar films or similar tropes.
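    As a hedged sketch of the clustering pipeline above (cosine distances plus complete linkage over a binary film-by-trope matrix), with an invented toy matrix in place of the DBTropes sample:

    ```python
    # Sketch: hierarchical clustering of films by their tropes.
    import numpy as np
    from scipy.spatial.distance import pdist
    from scipy.cluster.hierarchy import linkage, fcluster

    films = ["A", "B", "C", "D"]
    X = np.array([[1, 1, 0, 0, 1],        # rows = films, columns = tropes
                  [1, 1, 0, 0, 0],
                  [0, 0, 1, 1, 0],
                  [0, 0, 1, 1, 1]])

    Z = linkage(pdist(X, metric="cosine"), method="complete")
    print(dict(zip(films, fcluster(Z, t=2, criterion="maxclust"))))
    # expected: A and B land in one cluster, C and D in the other
    ```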
  • Hästbacka, Matti (2023)
    The direct economic impacts of the global tourism industry account for 4 % of global GDP and 8 % of global greenhouse gas emissions. The industry is being transformed by climate change, political instability and rapid technological development. In addition, the relationship between biodiversity conservation and tourism, as well as the sector's growing popularity, are considered megatrends impacting it. Traditional mass tourism destinations, such as the Canary Islands, may start seeing new kinds of visitors if traveling to exotic destinations becomes difficult as a result of these transformations. Understanding the transformations affecting tourism requires information about tourists' mobilities, interests and preferences. However, traditional data collection methods are not necessarily suited to studying quickly changing tourism. The need for information about visits to natural and protected areas (PAs) is especially high, as traditional tourism indicators, such as flight and accommodation statistics, do not tell where tourists spend their time. Social media data may enable the production of new kinds of knowledge and new ways of studying nature-based tourism. In this thesis, I assess the role of nature in tourism in the Canary Islands, Spain, using data from the photo-sharing platform Flickr. First, I compare the spatiotemporal patterns of Flickr data against official data about tourism flows to confirm the feasibility of Flickr as a data source in the Canary Islands context. I then try to understand the importance of nature visits and the differences in nature visitation patterns between visitors from different countries. Finally, I analyse the contents of the images to see what kinds of nature-related topics are important for each group, making use of deep learning and cluster detection algorithms. I verify the results of my empirical analysis with data collected through interviews with experts familiar with Canary Islands tourism. The results of my research show that Flickr reflects Canary Islands tourism patterns moderately well and that it can be used to produce information about differences in nature visitation patterns. Protected areas are shown to be important and central for Canary Islands tourism, but the differences in interest toward these areas between groups are notable. The content analyses show that while differences between groups exist, both nature-related content and photos of humans are important in content posted from PAs. The verification data collected through the expert interviews shows that the observed differences between groups correspond to the experts' perceptions of the differences between groups. The findings of my thesis demonstrate the importance of nature and protected areas in Canary Islands tourism and confirm earlier knowledge about the use of Flickr in studying nature visits. The results may inform future research in the Canary Islands. More broadly, they provide information about the feasibility and limitations of using social media data for nature-based tourism research.
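    As a hedged sketch of one possible cluster detection step: DBSCAN over haversine distances groups geotagged photos into hotspots. The coordinates and thresholds below are illustrative, not the thesis's actual data or parameters.

    ```python
    # Sketch: spatial clustering of geotagged photos (lat/lon in degrees).
    import numpy as np
    from sklearn.cluster import DBSCAN

    coords_deg = np.array([[28.27, -16.64], [28.28, -16.63],   # Teide area
                           [28.05, -16.72], [29.05, -13.60]])
    coords = np.radians(coords_deg)

    km_per_rad = 6371.0      # Earth radius converts radians to kilometres
    db = DBSCAN(eps=5 / km_per_rad, min_samples=2,
                metric="haversine", algorithm="ball_tree").fit(coords)
    print(db.labels_)   # -1 marks noise; other labels are photo hotspots
    ```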
  • Ba, Yue (2021)
    Ringed seals (Pusa hispida) and grey seals (Halichoerus grypus) are known to have hybridized in captivity despite belonging to different taxonomic genera. Earlier genetic analyses have indicated hybridization in the wild, and the resulting introgression of genetic material across species boundaries could potentially explain the intermediate phenotypes observed, e.g., in their dentition. Introgression can be detected using genome data, but existing inference methods typically require phased genotype data or cannot separate heterozygous and homozygous introgression tracts. In my thesis, I present a method based on Hidden Markov Models (HMMs) to identify genomic regions with a high density of single nucleotide variants (SNVs) of foreign ancestry. Unlike other methods, this method can use unphased genotype data and can separate heterozygous and homozygous introgression tracts. I apply the method to study introgression in Baltic ringed seals and grey seals: I compare it to an alternative method, assess it with simulated data in terms of precision and recall, and then apply it to the seal data to search for introgression. Finally, I discuss future directions for improving the method.
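    As a hedged sketch of the HMM idea described above: a three-state model (no introgression / heterozygous tract / homozygous tract) can be decoded with the Viterbi algorithm over per-window counts of foreign-ancestry SNVs. The state means, transition probabilities and toy counts below are invented illustration values, not the thesis's parameters.

    ```python
    # Sketch: Viterbi decoding of a 3-state introgression HMM with
    # Poisson emissions over windowed foreign-SNV counts.
    import numpy as np
    from scipy.stats import poisson

    counts = np.array([0, 1, 0, 7, 9, 8, 2, 15, 14, 1])  # toy SNVs per window
    means = np.array([0.5, 8.0, 15.0])       # expected SNVs: none / het / hom
    log_trans = np.log(np.array([[0.98, 0.01, 0.01],
                                 [0.02, 0.96, 0.02],
                                 [0.02, 0.02, 0.96]]))
    log_start = np.log(np.array([0.9, 0.05, 0.05]))

    n, k = len(counts), 3
    log_emit = poisson.logpmf(counts[:, None], means[None, :])  # shape (n, k)

    # Dynamic programming in log space.
    V = np.full((n, k), -np.inf)
    ptr = np.zeros((n, k), dtype=int)
    V[0] = log_start + log_emit[0]
    for t in range(1, n):
        scores = V[t - 1][:, None] + log_trans        # (prev, cur)
        ptr[t] = scores.argmax(axis=0)
        V[t] = scores.max(axis=0) + log_emit[t]

    # Backtrace the most likely state path.
    path = np.zeros(n, dtype=int)
    path[-1] = V[-1].argmax()
    for t in range(n - 2, -1, -1):
        path[t] = ptr[t + 1, path[t + 1]]
    print(path)   # 0 = none, 1 = heterozygous, 2 = homozygous tract
    ```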
  • Laitinen, Emma (2023)
    Implementing software process improvement (SPI) models or standards can be challenging for a small organization due to its limited resources compared to larger companies. The ISO/IEC 29110 series of systems and software engineering standards was designed especially for very small entities (VSEs), i.e., organizations with up to 25 employees. Company X is a small Finnish software company following a Scrum workflow. At Company X, challenges have been identified in the software testing process. Because of the company's size, ISO/IEC 29110 was identified among the various SPI standards as a potential fit for improving this process. While the ISO/IEC 29110 standard can be applied to any software life cycle method, including agile, there is no formal guide on how to implement the standard in an agile environment. The aims of this thesis are two-fold: first, to investigate how Scrum corresponds with the standard, and second, to use the standard to identify weak points in Company X's current software testing process and action points to address them. The mappings between Scrum and the standard were investigated by carrying out a systematic literature review (SLR). A self-assessment and a software testing deployment package provided with the standard were used to assess the current testing process and to identify its shortcomings. The shortcomings were analyzed, and action points feasible in Company X's context were suggested. The improved process containing the suggested action points was then re-assessed. The SLR yielded only a handful of papers, indicating that implementing ISO/IEC 29110 in an agile life cycle in practice is a relatively unexplored topic. Together, the three papers provided mappings between all three aspects of the standard and their counterparts in Scrum: activities, roles, and work products. The baseline assessment of Company X's current process yielded a rating of 'Partially' achieved (46.5 %). Seven shortcomings were identified in the assessment process, and nine action points were suggested to address them. Re-assessing the improved process raised the rating to 'Fully' implemented (97 %).
  • Niiranen, Juha (2016)
    The demand for mobile services is increasing constantly, and mobile network operators need to significantly upgrade their networks to respond to the demand. The increasing complexity of the networks makes it impossible for a human operator to manage them optimally. Currently, network management operations are automated using a pre-defined logic. The future goal is to introduce cognitive network management functions which can adapt to changes in the network context and handle uncertainty in network data. This thesis discusses using Markov Logic Networks for cognitive management of mobile networks. The method allows for uncertain and partial information and makes it possible to consolidate knowledge from multiple sources into a single, compact representation. The model can be used to infer configuration changes in network parameters, and the model parameters can be learned from data. We test the method in a simulated LTE network and examine the results in terms of improvements in network performance and computational cost.
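    As a hedged toy example of the Markov Logic Network formalism mentioned above, where a world's probability is proportional to the exponential of the weighted count of satisfied formulas: the atoms, rules and weights below are invented for illustration and are unrelated to the thesis's actual network model.

    ```python
    # Toy MLN: P(world) ∝ exp(sum_i w_i * n_i(world)), n_i = satisfied formulas.
    import itertools, math

    atoms = ["HighLoad", "IncreasePower"]      # two ground atoms

    def satisfied(world):
        high, inc = world
        f1 = (not high) or inc    # w=1.5: HighLoad -> IncreasePower
        f2 = not inc              # w=0.8: soft prior against config changes
        return [f1, f2]

    weights = [1.5, 0.8]
    worlds = list(itertools.product([False, True], repeat=len(atoms)))
    scores = [math.exp(sum(w for w, f in zip(weights, satisfied(wld)) if f))
              for wld in worlds]
    Z = sum(scores)                            # partition function
    for wld, s in zip(worlds, scores):
        print(dict(zip(atoms, wld)), round(s / Z, 3))
    ```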
  • Karvonen, Mikko (University of Helsinki, 2008)
    The usual task in music information retrieval (MIR) is to find occurrences of a monophonic query pattern within a music database, which can contain both monophonic and polyphonic content. The so-called query-by-humming systems are a famous instance of content-based MIR. In such a system, the user's hummed query is converted into symbolic form to perform search operations in a similarly encoded database. The symbolic representation (e.g., textual, MIDI or vector data) is typically a quantized and simplified version of the sampled audio data, yielding faster search algorithms and space requirements that can be met in real-life situations. In this thesis, we investigate geometric approaches to MIR. We first study some musicological properties often needed in MIR algorithms, and then give a literature review of traditional (e.g., string-matching-based) MIR algorithms and novel techniques based on geometry. We also introduce some concepts from digital image processing, namely mathematical morphology, which we use to develop and implement four algorithms for geometric music retrieval. The symbolic representation in the case of our algorithms is a binary 2-D image. We use various morphological pre- and post-processing operations on the query and database images to perform template matching and pattern recognition on the images. The algorithms are essentially extensions of the classic image correlation and hit-or-miss transformation techniques widely used in template matching applications. They aim to be a future extension to the retrieval engine of C-BRAHMS, a research project of the Department of Computer Science at the University of Helsinki.
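    As a hedged sketch of the hit-or-miss idea applied to symbolic music: with the score encoded as a binary pitch-by-time image, scipy's hit-or-miss transform finds exact occurrences of a query pattern. The toy matrices are invented; only `binary_hit_or_miss` itself is a real scipy.ndimage function.

    ```python
    # Sketch: geometric pattern search on a binary piano-roll image.
    import numpy as np
    from scipy.ndimage import binary_hit_or_miss

    # Rows = pitches, columns = onset times; True marks a sounding note.
    score = np.array([[0, 1, 0, 0, 1, 0],
                      [1, 0, 1, 1, 0, 1],
                      [0, 0, 0, 0, 0, 0]], dtype=bool)

    # Query pattern: two notes a step apart.
    query = np.array([[1, 0],
                      [0, 1]], dtype=bool)

    # Hit-or-miss fires where the query's on-cells are present AND its
    # off-cells are absent, i.e. an exact occurrence of the pattern.
    hits = binary_hit_or_miss(score, structure1=query, structure2=~query)
    print(np.argwhere(hits))   # positions of exact pattern occurrences
    ```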
  • Vartiainen, Panu (2014)
    The thesis discusses possibilities for using metadata and context information in annotating, sharing, and searching user-created content in the mobile domain. The first part of the thesis discusses metadata, ontologies, context information, and imaging. The latter part describes a prototype system for classifying and annotating digital photographs and storing context information as metadata of the photographs in a mobile phone. Another role of the prototype system is to perform context- and ontology-based information retrieval through a mobile phone user interface. The prototype system contains a limited RDF metadata engine and an ontology browser for mobile phones, as well as a server-side metadata and content repository. The implementation demonstrates that part of the creation-time context, such as the location and temporal context, can be gathered automatically in a mobile phone and stored as metadata for the content. In addition, the same parts of the context information can be used for searching. The content and the metadata can be stored on a server and shared with other users. The prototype is built around a tourism scenario that serves as an example of how these technologies can be used in a mobile phone.
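    As a hedged sketch of storing creation-time context as RDF, using rdflib (a real library); the namespace and property names are invented stand-ins for the prototype's actual vocabulary.

    ```python
    # Sketch: photo context (time, location) as RDF triples, plus a search.
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import XSD

    CTX = Namespace("http://example.org/context#")  # hypothetical vocabulary
    g = Graph()
    photo = URIRef("http://example.org/photos/1234")

    g.add((photo, CTX.capturedAt,
           Literal("2014-05-01T12:30:00", datatype=XSD.dateTime)))
    g.add((photo, CTX.latitude,  Literal(60.1699, datatype=XSD.decimal)))
    g.add((photo, CTX.longitude, Literal(24.9384, datatype=XSD.decimal)))

    # Context-based search: find photos captured above a given latitude.
    q = """SELECT ?p WHERE { ?p <http://example.org/context#latitude> ?lat .
                             FILTER(?lat > 60.0) }"""
    for row in g.query(q):
        print(row.p)
    ```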
  • Hussain, Zafar (2020)
    The National Library of Finland has digitized newspapers starting from the late eighteenth century. The digitized data of Finnish newspapers is a heterogeneous data set containing the content and metadata of historical newspapers. This research work focuses on studying this rich materiality data to find a data-driven categorization of newspapers. Since the data is not known beforehand, the objective is to understand the development of newspapers and to use statistical methods to analyze fluctuations in the attributes of this metadata. An important aspect of this research work is to study which computational and statistical methods can best express the complexity of Finnish historical newspaper metadata. Exploratory analyses are performed to gain an understanding of the attributes and to extract patterns among them. To explicate the attributes' dependencies on each other, Ordinary Least Squares (OLS) linear regression methods are applied; their results confirm significant correlations between the attributes. To categorize the data, spectral and hierarchical clustering methods are studied for grouping newspapers with similar attributes. The clustered data further helps in dividing and understanding the data over time and place. Decision trees are constructed to split the newspapers along the attributes' logical divisions, and the resulting Random Forest decision trees show the paths of development of the attributes. The goal of applying these various methods is to get a comprehensive interpretation of the attributes' development based on language, time, and place, and to evaluate the usefulness of the methods on the newspaper data. From the features' perspective, area appears to be the most important feature, and a language-based comparison shows Swedish newspapers ahead of Finnish newspapers in adopting the popular trends of the time. When the newspaper publishing places are divided into regions, small towns show more fluctuation in publishing trends, while from the perspective of time, the second half of the twentieth century saw a large increase in newspapers and publishing trends. This research work combines information on region, language, page size, density, and area of newspapers and offers a robust statistical analysis of newspapers published in Finland.
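    As a hedged sketch of the OLS step described above, regressing one newspaper attribute on time and language with statsmodels; the file and column names are invented placeholders for the metadata attributes.

    ```python
    # Sketch: OLS of page area on publication year and language dummies.
    import pandas as pd
    import statsmodels.api as sm

    df = pd.read_csv("newspaper_metadata.csv")      # hypothetical file
    X = pd.get_dummies(df[["year", "language"]], columns=["language"],
                       drop_first=True, dtype=float)
    y = df["page_area_cm2"]                         # hypothetical column

    model = sm.OLS(y, sm.add_constant(X)).fit()
    print(model.summary())   # coefficients trace the attribute's development
    ```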
  • Rychkova, Kseniya (2022)
    The Traveling Salesman Problem (TSP) is a well-known optimization problem. The time needed to solve TSP classically grows exponentially with the size of the input, placing it in the NP-hard computational complexity class, the class of problems that are at least as hard as any problem solvable in nondeterministic polynomial time. Quantum computing gives us a new approach to searching through such a huge search space, using methods such as quantum annealing and phase estimation. Although the current state of quantum computers does not give us enough resources to solve TSP with large inputs, we can use quantum computing methods to improve on existing classical algorithms. The thesis reviews existing methods for efficiently tackling TSP with potential quantum resources, and discusses augmenting classical algorithms with quantum techniques to reduce the time complexity of solving this computationally challenging problem.
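    As a hedged sketch of how TSP is handed to a quantum annealer: the tour is encoded as binary variables x[i, t] ("city i visited at step t") whose quadratic energy combines tour length with constraint penalties, i.e., a QUBO. The 4-city distances and penalty weight below are illustrative choices, not values from the thesis.

    ```python
    # Sketch: QUBO-style energy for a tiny TSP, verified by brute force.
    import itertools
    import numpy as np

    D = np.array([[0, 2, 9, 4],
                  [2, 0, 6, 3],
                  [9, 6, 0, 5],
                  [4, 3, 5, 0]], dtype=float)
    n = len(D)
    A = 2 * D.max()                  # constraint penalty > any tour edge

    def energy(x):
        """x[i, t] = 1 iff city i is visited at step t (cyclic tour)."""
        e = 0.0
        for t in range(n):           # tour-length objective
            for i in range(n):
                for j in range(n):
                    e += D[i, j] * x[i, t] * x[j, (t + 1) % n]
        # each city exactly once, each step exactly one city
        e += A * ((x.sum(axis=1) - 1) ** 2).sum()
        e += A * ((x.sum(axis=0) - 1) ** 2).sum()
        return e

    # Classical brute force over permutations just to verify the encoding.
    best = min(itertools.permutations(range(n)),
               key=lambda p: energy(np.eye(n)[list(p)].T))
    print("best tour:", best)
    ```

    On an annealer the same energy function is passed as a QUBO matrix rather than evaluated per candidate; the penalty weight A must dominate the distance terms so that invalid assignments never win.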