Browsing by Title

  • Särkijärvi, Joona (2023)
    Both descriptive combinatorics and distributed algorithms are concerned with solving graph problems under certain local constraints. This connection is not merely superficial, as Bernshteyn showed in his seminal 2020 paper. This thesis focuses on that connection by restating Bernshteyn's results, showing that a common theory of locality connects the two fields. We also restate the results connecting these findings to continuous dynamics, namely that solving a colouring problem on the free part of the subshift 2^Γ is equivalent to the existence of a fast LOCAL algorithm solving the same problem on finite sections of the Cayley graph of Γ. Finally, we restate Bernshteyn's result on the continuous version of the Lovász Local Lemma (LLL). The LLL is a powerful probabilistic tool used throughout combinatorics and distributed computing, and Bernshteyn proved a version of the lemma that, under certain topological constraints, produces continuous solutions.
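For context, the classical symmetric form of the Lovász Local Lemma referred to above can be stated as follows (a standard textbook statement, not quoted from the thesis):

```latex
% Symmetric Lovász Local Lemma (standard statement; not taken from the thesis itself).
% Let A_1, ..., A_n be "bad" events, each occurring with probability at most p,
% and suppose each A_i is mutually independent of all but at most d of the others.
% If e * p * (d + 1) <= 1, then with positive probability no bad event occurs:
\[
  e \, p \, (d+1) \le 1
  \quad\Longrightarrow\quad
  \Pr\Big[\,\bigcap_{i=1}^{n} \overline{A_i}\,\Big] > 0 .
\]
```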
  • Meaney, Alexander (2015)
    X-ray computed tomography (CT) is widely used in medical imaging and materials science. In this imaging modality, cross-sectional images of a physical object are formed by taking numerous X-ray projections from different angles and then applying a reconstruction algorithm to the measured data. The cross-sectional slices can be used to form a three-dimensional model of the interior structure of the object. CT is a prime example of an inverse problem, in which the aim is to recover an unknown cause from a known effect. CT technology continues to develop, motivated by the desire for increased image quality and spatial resolution in reconstructions. In medical CT, reducing patient dose is a major goal. The branch of CT known as X-ray microtomography (micro-CT) produces reconstructions with spatial resolutions in the micrometer range. Micro-CT has been practiced at the University of Helsinki since 2008. The research projects are often interdisciplinary, combining physics with fields such as biosciences, paleontology, geology, geophysics, metallurgy and food technology. This thesis documents the design and construction of a new X-ray imaging system for computed tomography. The system is a cone beam micro-CT scanner intended for teaching and research in inverse problems and X-ray physics. The scanner consists of a molybdenum target X-ray tube, a sample manipulator, and a flat panel detector, and it is built inside a radiation shielding cabinet. Measurements were made for calibrating the measurement geometry and for testing reconstruction quality. Two-dimensional reconstructions of various samples were computed using the plane which passes through the X-ray point source and is perpendicular to the axis of rotation. This central plane of the cone beam reduces to fan beam geometry. All reconstructions were computed using the filtered backprojection (FBP) algorithm, which is the industry standard. Tomographic reconstructions of high quality were obtained from the measurements. The results show that the imaging system is well suited for CT and the study of reconstruction algorithms.
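As an illustration of the filtered backprojection algorithm mentioned in the abstract above, the following sketch reconstructs a synthetic phantom with scikit-image. It uses a parallel-beam geometry rather than the scanner's fan-beam geometry, so it is only a simplified stand-in for the thesis's setup:

```python
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon

# Synthetic test object standing in for a measured sample.
phantom = shepp_logan_phantom()

# Forward projection: simulate projections over 180 degrees.
angles = np.linspace(0.0, 180.0, 360, endpoint=False)
sinogram = radon(phantom, theta=angles)

# Filtered backprojection (iradon applies a ramp filter by default).
reconstruction = iradon(sinogram, theta=angles)

rmse = np.sqrt(np.mean((reconstruction - phantom) ** 2))
print(f"RMSE between phantom and FBP reconstruction: {rmse:.4f}")
```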
  • Bakharzy, Mohammad (2014)
    In the new era of the digital economy, agility and the ability to adapt to market changes and customers' needs are crucial for sustainable competitiveness. It is vital to identify and consider customers' and users' needs in order to make fact-driven decisions and to evaluate assumptions and hypotheses before actually allocating resources to them. Understanding customers' needs and delivering valuable products or services based on deep customer insight demands Continuous Experimentation. Continuous Experimentation refers to constantly collecting customers' and users' feedback and understanding the real value of products and services, so that new ideas and hypotheses can be tested as early as possible with minimal resource allocation. Experimentation requires a technical infrastructure including tools, methods, processes, interfaces and APIs to collect, store, visualize and analyze the data. This thesis analyses the state of the practice and the state of the art regarding current tools with functionalities that support or might support continuous experimentation. The result of this analysis is a set of problems identified in current tools as well as a set of requirements to be fulfilled in order to tackle those problems. Among the problems, customizability of the tools to meet the needs of different companies and scenarios is of utmost importance. The lack of customizability in current tools has led companies to allocate their resources to developing their own proprietary tools tailored to their custom needs. Based on requirements that support better customizability, a prototype tool that supports continuous experimentation has been designed and implemented. The tool is evaluated in a real-world scenario with respect to the requirements and the customizability issue.
  • Tolvanen, Pinja (2022)
    The role of geographic thinking is essential in tackling topical challenges such as the climate crisis, biodiversity loss and the sustainable production of food. One powerful tool that helps to model and analyze these complex geographic phenomena is geographic information systems (GIS). Using GIS as part of high school geography education has many benefits when it is applied intentionally. However, many teachers still struggle to implement GIS in long-term classroom use even if they have received prior GIS training and have access to internet-based GIS, easily accessible data and easier-to-use software. There is still a need for further research on how teachers can be supported in GIS education on a practical level. This thesis aims to find solutions to this need. The research is conducted as design-based research that consists of problem analyses and a cyclic development process in which a design solution, a GIS learning activity, is created. The problem analyses showed that combining new and existing knowledge, using multimodal learning environments, and supporting motivation and the development of metacognitive skills are important considerations in designing the learning activity. They also examined features that lead to successful GIS teacher training. The interviews conducted revealed that the biggest challenges in GIS education relate to scarcity of time, insufficient technical skills, and training that does not provide practical value. Teachers wished for very practical support that is time-efficient and offers them learning materials that are ready for easy classroom use. Based on these findings, a GIS learning activity was designed to answer these common challenges. The practical exercise was tested consecutively by two geography teachers from a collaborating high school. Feedback revealed that the first teacher faced some challenges related to time management during the lesson but found the activity useful. The second teacher tested the activity after some modifications had been made, and the testing was overall successful. Both teachers expressed interest in using the material and the GIS software again in the future. The findings suggest that providing teachers with this research-based GIS learning material has the potential to support them in GIS education and to remove many common challenges. Some advantages of the exercise were offering teachers a web-based GIS with a simple user interface, preprocessed data already included in the service, and a ready-made exercise that can be completed in one lesson. The theme also supported the national core curriculum, which is very valuable in creating new GIS materials for educational use. This study showed that relevant and inquiry-based GIS activities are still needed in high school geography education. It also serves as the first opening for the new LUMA Taita project, which promotes international science education collaboration and brings research into schools in an inspiring way.
  • Ghulam, Shenelle Pearl (2016)
    Bonding is a central concept in chemistry education; thus a thorough understanding of it is crucial in order to understand various other concepts of chemistry. However, students often find it difficult to understand the concept of bonding and as a result develop alternative conceptions. Living in a macroscopic world, students may find it difficult to shift between the macroscopic and molecular levels; this is one of the reasons why students find chemical bonding difficult to understand. The wide range of complex and sophisticated scientific models that scientists have developed to explain bonding can also be confusing for students. Moreover, students develop alternative conceptions as a result of the way they are taught. Computer-based molecular modelling could be utilized to facilitate and enhance student understanding of bonding. This thesis describes a study on the supportive opportunities and challenges encountered when using computer-based molecular modelling to enhance student understanding of bonding, focusing on three main inquiries: investigating the challenges students face when utilizing computer-based molecular modelling to understand and explain chemical bonding; exploring the features of computer-based molecular modelling that enhance student understanding of bonding; and analysing how to optimally support students' understanding of bonding when using computer-based models. The study was conducted as design-based research, centred particularly on students' opinions. An exercise was designed and implemented with 20 International Baccalaureate students (11th grade) during their chemistry lessons. The exercise sheet comprised brief explanations of bonding, instructions for visualizing models in Edumol (a web-based molecular modelling and visualization environment), and questions to be answered after visualizing the models. The research results highlighted the importance of well-planned activities to ensure the effective use of computer-based models. Prior to using computer-based models in class, teachers must consider possible solutions for technical difficulties that might arise. They must also plan activities based on students' prior experiences with models, to ensure that nothing hinders the students' learning process. Additionally, teachers must individualize activities by taking students' opinions and preferences into consideration, to ensure productive learning. Furthermore, teachers should make the most of the effective features of computer-based models, such as molecular electrostatic potentials, which can only be visualized with computer-based models. Finally, teachers should use the necessary supportive materials in conjunction with the computer-based models to enhance student understanding of bonding.
  • Hiillos, Nicolas (2023)
    This master's thesis describes the development and validation of a uniform control interface for drawing robots with ROS2. The robot control software was tasked with taking SVG images as input and producing them as drawings with three different robots. These robots are the Evil Mad Scientist AxiDraw V3/A3, UFACTORY xArm Lite6, and a virtual xArm Lite6. The intended use case for the robots and the companion control software is experiments studying human perception of the creativity of the drawing robots. The control software was implemented over the course of a little over six months and used a combination of C++ and Python. The design of the software utilizes ROS2 abstractions such as nodes and topics to combine the different components of the software. The control software is validated against the given requirements and found to fulfil the main objectives of the project. The most important of these are that the robots successfully draw SVG images, that they do so in a similar time frame, and that the resulting images look very similar. Drawing similarity was tested by scanning the drawn images, aligning them by minimizing the alignment error, and then comparing them visually after overlaying the images. Comparing aligned images was useful in detecting subtle differences in the drawing similarity of the robots and was used to discover issues with the robot control software. MSE and SSIM were also calculated for a set of these aligned images, allowing the effect of future changes made to the robot control software to be evaluated quantitatively. Drawing time for the robots was evaluated by measuring the time taken to draw a set of images. This testing showed that the AxiDraw's velocity and acceleration needed to be reduced by 56% so that the xArm Lite6 could draw in a similar time.
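A minimal sketch of the kind of aligned-image comparison described above, using scikit-image to compute MSE and SSIM; the file names are hypothetical placeholders for pre-aligned scans:

```python
import numpy as np
from skimage.io import imread
from skimage.metrics import mean_squared_error, structural_similarity

# Hypothetical file names for two pre-aligned grayscale scans of the same SVG
# drawn by two different robots.
drawing_a = imread("drawing_axidraw.png", as_gray=True)
drawing_b = imread("drawing_xarm.png", as_gray=True)

mse = mean_squared_error(drawing_a, drawing_b)
ssim, diff = structural_similarity(drawing_a, drawing_b, full=True, data_range=1.0)

print(f"MSE:  {mse:.5f}")
print(f"SSIM: {ssim:.3f}")
# `diff` is a per-pixel similarity map that can be overlaid on the scans to
# locate the regions where the two drawings disagree.
```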
  • Hertweck, Corinna (2020)
    In this work, we seek robust methods for designing affirmative action policies for university admissions. Specifically, we study university admissions under a real centralized system that uses grades and standardized test scores to match applicants to university programs. For the purposes of affirmative action, we consider policies that assign bonus points to applicants from underrepresented groups with the goal of preventing large gaps in admission rates across groups, while ensuring that the admitted students are for the most part those with the highest scores. Since such policies have to be announced before the start of the application period, there is uncertainty about which students will apply to which programs. This poses a difficult challenge for policy-makers. Hence, we introduce a strategy to design policies for the upcoming round of applications that can address either a single demographic group or multiple groups. Our strategy is based on application data from previous years and a predictive model trained on this data. By comparing this predictive strategy to simpler strategies based only on application data from, e.g., the previous year, we show that the predictive strategy is generally more conservative in its policy suggestions. As a result, policies suggested by the predictive strategy lead to more robust effects and fewer cases where the gap in admission rates is inadvertently increased through the suggested policy intervention. Our findings imply that universities can employ predictive methods to increase the reliability of the effects expected from the implementation of an affirmative action policy.
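The following sketch illustrates the basic mechanics of a bonus-point policy as described above: bonus points are added to the scores of an underrepresented group before ranking, and the resulting gap in admission rates is measured. The column names, group labels and single-program setting are illustrative assumptions, not the thesis's actual implementation:

```python
import pandas as pd

def admission_gap(applicants: pd.DataFrame, bonus: float, capacity: int) -> float:
    """Admission-rate gap between groups for one program under a bonus-point policy.

    `applicants` is assumed to have columns "score" and "group", with the value
    "underrepresented" marking the group that receives the bonus (illustrative names).
    """
    adjusted = applicants["score"] + bonus * (applicants["group"] == "underrepresented")
    admitted = adjusted.rank(ascending=False, method="first") <= capacity
    rates = admitted.groupby(applicants["group"]).mean()
    return float(rates.max() - rates.min())

# Example: six applicants, three places, a bonus of two points.
applicants = pd.DataFrame({
    "score": [92, 88, 85, 84, 80, 78],
    "group": ["majority", "majority", "underrepresented", "majority",
              "underrepresented", "underrepresented"],
})
print(admission_gap(applicants, bonus=2.0, capacity=3))
```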
  • Mäkinen, Sasu (2021)
    Deploying machine learning models has been found to be a massive issue in the field. DevOps and Continuous Integration and Continuous Delivery (CI/CD) have proven to streamline and accelerate deployments in the field of software development. Creating CI/CD pipelines for software that includes elements of Machine Learning (MLOps) has unique problems, and trail-blazers in the field solve them with proprietary tooling, often offered by cloud providers. In this thesis, we describe the elements of MLOps. We study what the requirements are for automating the CI/CD of Machine Learning systems in the MLOps methodology. We study whether it is feasible to create a state-of-the-art MLOps pipeline with existing open-source and cloud-native tooling in a cloud provider agnostic way. We designed an extendable and cloud-native pipeline covering most of the CI/CD needs of a Machine Learning system. We motivated why Machine Learning systems should be included in the DevOps methodology. We studied what unique challenges machine learning brings to CI/CD pipelines, production environments and monitoring. We analyzed the pipeline’s design, architecture, and implementation details, and its applicability and value to Machine Learning projects. We evaluate our solution as a promising MLOps pipeline that manages to solve many issues of automating a reproducible Machine Learning project and its delivery to production. We designed it as a fully open-source solution that is relatively cloud provider agnostic. Configuring the pipeline to fit client needs uses easy-to-use declarative configuration languages (YAML, JSON) that require minimal learning overhead.
  • Hore, Sayantan (2015)
    Content-Based Image Retrieval (CBIR) systems have become the state-of-the-art image retrieval technique over the past few years. They have shown commendable retrieval performance compared with traditional annotation-based retrieval. CBIR systems use relevance feedback as the input query. CBIR systems developed so far have not put much effort into suitable user interfaces for accepting relevance feedback efficiently, i.e., interfaces that place less cognitive load on the user and provide a greater amount of exploration in a limited amount of time. In this study we propose a new interface, 'FutureView', which allows peeking into the future, providing access to more images in less time than traditional interfaces. This idea helps the user choose more appropriate images without getting diverted. We used the Gaussian process upper confidence bound (GP-UCB) algorithm for recommending images. We compared this algorithm with Random and Exploitation baselines, with positive results.
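A minimal sketch of GP-UCB-style image recommendation as described above, using scikit-learn's Gaussian process regressor; the feature vectors, batch size and exploration weight are illustrative assumptions rather than the thesis's actual settings:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X_rated = rng.normal(size=(10, 5))        # feature vectors of images the user has rated
y_rated = rng.random(10)                  # relevance feedback for those images
X_candidates = rng.normal(size=(200, 5))  # feature vectors of images not yet shown

# Fit a Gaussian process to the feedback collected so far.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
gp.fit(X_rated, y_rated)

# Upper confidence bound: predicted relevance plus an exploration bonus.
mu, sigma = gp.predict(X_candidates, return_std=True)
beta = 2.0                                # exploration weight (assumed value)
ucb = mu + np.sqrt(beta) * sigma

next_batch = np.argsort(-ucb)[:8]         # indices of the next images to display
print(next_batch)
```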
  • Ylikotila, Henri (2018)
    Self-aware computing is an emerging research area which aims to solve issues stemming from a combination of increasingly complex systems and diverse operating conditions. It adapts the key concepts of human self-awareness to the computational context. These concepts are well established in cognitive science, psychology, social psychology and philosophy, but novel in the software engineering context. Self-aware systems are able to acquire information about their environment and their internal state. The obtained information is used to build knowledge through reasoning. This increasing knowledge is used to build an internal learning model, which enables the system to adapt its actions. Self-aware systems can autonomously navigate runtime changes in their goals and environmental conditions, thus enabling a high degree of adaptivity to changing conditions that are difficult to predict at design time. In contrast, traditional systems have to operate with a set of predefined rules that rely on the design-time knowledge of the designer. This study aims to identify proposed software architecture solutions that are of value to both practitioners and researchers. The study was conducted as a systematic literature review, for which we developed a repeatable review protocol in order to cover all relevant literature in the area. The review protocol was explicitly defined and applied rigorously. In our review we extracted several proposed architecture designs from the 9 primary studies reviewed. These include reference architectures, architecture frameworks, and architectural patterns for designing self-aware systems. This study can be used to get an overview of state-of-the-art software architecture designs for self-aware systems. Additionally, this study can provide support for finding future research directions regarding self-aware systems.
  • Laaja, Oskari (2022)
    Mobile applications have become common, and end-users expect to be able to use either of the major platforms: iOS or Android. The expectation of finding the application in the respective platform stores is strongly present. The process of publishing mobile applications to these application stores can be cumbersome. The frequency of mobile application updates can suffer from the heaviness of the process, reducing end-user satisfaction. As manually completed processes are prone to human error, the robustness of the process decreases and the quality of the application may diminish. This thesis presents an automated pipeline that completes the process of publishing a cross-platform mobile application to the App Store and Play Store. The goal of this pipeline is to make the process faster to complete, more robust and more accessible to people without technical know-how. The work was done with the design science methodology. As results, two artifacts are produced by this thesis: a model of a pipeline design to improve the process, and an implementation of that model to functionally prove the feasibility of the design. The design is evaluated against requirements set by the company for which the implementation was done. As a result, the process used in the project in which the implementation was taken into use became faster and simpler, and it became possible for non-development personnel to use.
  • Lagerspetz, Eemil (University of Helsinki, 2009)
    Current smartphones have a storage capacity of several gigabytes. More and more information is stored on mobile devices. To meet the challenge of information organization, we turn to desktop search. Users often possess multiple devices, and synchronize (subsets of) information between them. This makes file synchronization more important. This thesis presents Dessy, a desktop search and synchronization framework for mobile devices. Dessy uses desktop search techniques, such as indexing, query and index term stemming, and search relevance ranking. Dessy finds files by their content, metadata, and context information. For example, PDF files may be found by their author, subject, title, or text. EXIF data of JPEG files may be used in finding them. User-defined tags can be added to files to organize and retrieve them later. Retrieved files are ranked according to their relevance to the search query. The Dessy prototype uses the BM25 ranking function, used widely in information retrieval. Dessy provides an interface for locating files for both users and applications. Dessy is closely integrated with the Syxaw file synchronizer, which provides efficient file and metadata synchronization, optimizing network usage. Dessy supports synchronization of search results, individual files, and directory trees. It allows finding and synchronizing files that reside on remote computers, or on the Internet. Dessy is designed to solve the problem of efficient mobile desktop search and synchronization, also supporting remote and Internet search. Remote searches may be carried out offline using a downloaded index, or while connected to the remote machine over a weak network. To secure user data, transmissions between the Dessy client and server are encrypted using symmetric encryption. Symmetric encryption keys are exchanged with RSA key exchange. Dessy emphasizes extensibility: the cryptography can be extended, users may tag their files with context tags and control custom file metadata, and adding new indexed file types, metadata fields, ranking methods, and index types is easy. Finding files is done with virtual directories, which are views into the user's files, browseable by regular file managers. On mobile devices, the Dessy GUI provides easy access to the search and synchronization system. This thesis includes results of Dessy synchronization and search experiments, including power usage measurements. Finally, Dessy has been designed with mobility and device constraints in mind. It requires only MIDP 2.0 Mobile Java with FileConnection support, and Java 1.5 on desktop machines.
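A small reference implementation of the Okapi BM25 ranking function mentioned above (a common textbook variant; the parameter values k1=1.2 and b=0.75 are conventional defaults, not necessarily those used by the Dessy prototype):

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, doc_freq, n_docs, avgdl, k1=1.2, b=0.75):
    """Okapi BM25 score of one document for a query.

    `doc_freq` maps a term to the number of documents containing it,
    `n_docs` is the collection size and `avgdl` the average document length.
    """
    tf = Counter(doc_terms)
    score = 0.0
    for term in set(query_terms):
        if term not in tf:
            continue
        df = doc_freq.get(term, 0)
        idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))   # non-negative IDF variant
        f = tf[term]
        norm = f + k1 * (1.0 - b + b * len(doc_terms) / avgdl)   # length normalisation
        score += idf * f * (k1 + 1.0) / norm
    return score
```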
  • Savolainen, Outi (2022)
    Today, Global Navigation Satellite Systems (GNSS) provide services that many critical systems [1], as well as ordinary users, need in everyday life. These signals are threatened by unintentional and intentional interference. The received satellite signals are complex-valued by nature; however, state-of-the-art anomaly detection approaches operate in the real domain. Moving anomaly detection into the complex domain allows the phase component of the signal data to be preserved. In this thesis, I developed and tested a fully complex-valued Long Short-Term Memory (LSTM) based autoencoder for anomaly detection. I also developed a method for scaling complex numbers that forces both the real and imaginary parts into the range [-1, 1] and does not change the direction of a complex vector. The model is trained and tested both in the time and frequency domains, and the frequency domain is divided into two parts: the real and the complex domain. The developed model’s training data consists only of clean sample data, and the output of the model is the reconstruction of the model’s input. In testing, it can be determined whether the output is clean or anomalous based on the reconstruction error and a computed threshold value. The results show that the autoencoder model in the real domain outperforms the model trained in the complex domain. This does not indicate that anomaly detection in the complex domain does not work; rather, the model’s architecture needs improvements, and the amount of training data must be increased to reduce the overfitting of the complex-domain model and thus improve its anomaly detection capability. It was also observed that some anomalous sample sequences contain a few large-valued spikes while other values in the same data snapshot are smaller. After scaling, the values other than the spikes get close to zero. This phenomenon causes small reconstruction errors in the model and yields false predictions in the complex domain.
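A minimal sketch of the two ideas described above: scaling a complex snapshot by a single positive real factor (which keeps every sample's phase unchanged while bringing both real and imaginary parts into [-1, 1]), and flagging a snapshot whose reconstruction error exceeds a threshold learned from clean data. This is an illustrative interpretation, not the thesis's exact formulas:

```python
import numpy as np

def scale_complex(snapshot: np.ndarray) -> np.ndarray:
    """Divide a complex-valued snapshot by one positive real factor so that both
    the real and imaginary parts fall into [-1, 1]; dividing by a positive real
    scalar leaves the direction (phase) of every sample unchanged."""
    denom = max(np.abs(snapshot.real).max(), np.abs(snapshot.imag).max())
    return snapshot / denom if denom > 0 else snapshot

def is_anomalous(snapshot: np.ndarray, reconstruct, threshold: float) -> bool:
    """Flag a snapshot whose autoencoder reconstruction error exceeds a threshold
    computed from the errors observed on clean training data."""
    x = scale_complex(snapshot)
    x_hat = reconstruct(x)                    # autoencoder forward pass (model not shown here)
    error = np.mean(np.abs(x - x_hat) ** 2)   # mean squared error in the complex domain
    return error > threshold
```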
  • Rannisto, Meeri (2020)
    Bat monitoring is commonly based on audio analysis. By collecting audio recordings from large areas and analysing their content, it is possible to estimate the distributions of bat species and changes in them. It is easy to collect a large amount of audio recordings by leaving automatic recording units in nature and collecting them later. However, it takes a lot of time and effort to analyse these recordings. Because of that, there is a great need for automatic tools. We developed a program for detecting bat calls automatically in audio recordings. The program is designed for recordings that are collected in Finland with the AudioMoth recording device. Our method is based on a median clipping method that has previously shown promising results in the field of bird song detection. We add several modifications to the basic method in order to make it work well for our purpose. We use real-world field recordings that we have annotated to evaluate the performance of the detector and to compare it to two other freely available programs (Kaleidoscope and Bat Detective). Our method showed good results and achieved the best F2-score in the comparison.
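A sketch of the basic median clipping step referred to above, as it is commonly used in bird song detection: a time-frequency cell of the spectrogram is kept if its magnitude exceeds a multiple of both its row and column medians. The factor of 3 is a commonly used value, and the thesis adds further modifications on top of this baseline:

```python
import numpy as np

def median_clip_mask(spectrogram: np.ndarray, factor: float = 3.0) -> np.ndarray:
    """Boolean mask of spectrogram cells whose magnitude exceeds `factor` times
    both the median of their frequency bin (row) and the median of their time
    frame (column); the surviving cells form candidate call regions."""
    row_median = np.median(spectrogram, axis=1, keepdims=True)   # per frequency bin
    col_median = np.median(spectrogram, axis=0, keepdims=True)   # per time frame
    return (spectrogram > factor * row_median) & (spectrogram > factor * col_median)
```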
  • Unknown author (2023)
    This study focused on detecting horizontal and vertical collusion within Indonesian government procurement processes, leveraging data-driven techniques and statistical methods. Regarding horizontal collusion, we applied clustering techniques to categorize companies based on their supply patterns, revealing clusters with similar bidding practices that may indicate potential collusion. Additionally, we identified patterns where specific supplier groups consistently won procurements, raising questions about potential competitive advantages or strategic practices that need further examination for collusion. For vertical collusion, we examined the frequency of associations between specific government employees and winning companies. While high-frequency collaborations were observed, it is essential to interpret these results with caution as they do not definitively indicate collusion, and legitimate factors might justify such associations. Despite revealing important patterns, the study acknowledges its limitations, including the representativeness of the dataset and the reliance on quantitative methods. Nevertheless, our findings carry substantial implications for enhancing procurement monitoring, strengthening anti-collusion regulations, and promoting transparency in Indonesian government procurement processes. Future research could enrich these findings by incorporating qualitative methods, exploring additional indicators of collusion, and leveraging machine learning techniques to detect collusion.
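To make the horizontal-collusion step more concrete, the following is a minimal sketch of clustering companies by their bidding or supply patterns with scikit-learn; the features, the number of clusters and the synthetic data are illustrative assumptions, not details from the study:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical per-company features derived from procurement records,
# e.g. win rate, average bid-to-estimate ratio, number of distinct co-bidders,
# and shares of procurement categories supplied (synthetic stand-ins here).
rng = np.random.default_rng(1)
company_features = rng.normal(size=(500, 6))

X = StandardScaler().fit_transform(company_features)
labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(X)

# Companies sharing a cluster exhibit similar bidding/supply patterns; such
# clusters are candidates for closer review, not evidence of collusion by itself.
for cluster in range(8):
    print(cluster, int(np.sum(labels == cluster)), "companies")
```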
  • Ikkala, Tapio (2020)
    This thesis presents a scalable method for identifying anomalous periods of non-activity in short periodic event sequences. The method is tested with real-world point-of-sale (POS) data from a grocery retail setting. However, the method can also be applied to other problem domains that produce similar sequential data. The proposed method models the underlying event sequence as a non-homogeneous Poisson process with a piecewise constant rate function. The rate function for the piecewise homogeneous Poisson process can be estimated with a change point detection algorithm that minimises a cost function consisting of the negative Poisson log-likelihood and a penalty term that is linear in the number of change points. The resulting model can be queried for anomalously long periods of time with no events, i.e., waiting times, by defining a threshold below which the waiting time observations are deemed anomalies. The first experimental part of the thesis focuses on model selection, i.e., on finding a penalty value that results in the change point detection algorithm detecting the true changes in the intensity of the arrivals of the events while not reacting to random fluctuations in the data. In the second experimental part the performance of the anomaly detection methodology is measured against stock-out data, which gives an approximate ground truth for the termination of a POS event sequence. The performance of the anomaly detector is found to be subpar in terms of precision and recall, i.e., the positive predictive value and the true positive rate. The number of false positives remains high even with small threshold values. This needs to be taken into account when considering applying the anomaly detection procedure in practice. Nevertheless, the methodology may have practical value in the retail setting, e.g., in guiding the store personnel where to focus their resources in ensuring the availability of the products.
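An illustrative sketch of the querying step described above: given estimated change points and piecewise constant rates, a waiting time between consecutive events is flagged when its tail probability under the corresponding exponential distribution falls below a threshold. The thresholding detail is one plausible reading of the abstract, not the thesis's exact procedure:

```python
import numpy as np

def flag_long_gaps(event_times, change_points, rates, p_threshold=1e-3):
    """Flag anomalously long gaps between consecutive events.

    `rates[i]` is the estimated event rate on the segment starting at
    `change_points[i]`; under a constant rate the waiting time is exponential,
    so P(gap longer than w) = exp(-rate * w). Gaps whose tail probability falls
    below `p_threshold` are returned as anomalies.
    """
    change_points = np.asarray(change_points)
    flagged = []
    for start, end in zip(event_times[:-1], event_times[1:]):
        gap = end - start
        segment = max(np.searchsorted(change_points, start, side="right") - 1, 0)
        if np.exp(-rates[segment] * gap) < p_threshold:
            flagged.append((start, end))
    return flagged
```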
  • Rauth, Ella (2022)
    Northern peatlands are a large source of methane (CH4) to the atmosphere, and their emissions can vary strongly depending on local environmental conditions. However, few studies have mapped fine-grained CH4 fluxes at the landscape level. The aim of this study was to predict land cover and CH4 flux patterns in Pallastunturi, Finland, in a study area dominated by forests, peatlands, fells, and lakes. I used random forest models to map land cover types and CH4 fluxes with multi-source remote sensing data, and upscaled CH4 fluxes based on the land cover maps. The random forest classifier reliably detected the same land cover patterns as the CORINE Land Cover maps. The main differences between the land cover maps were in forest type classification, misclassification between neighboring peatland types, and detection of sparsely vegetated areas on fells. The upscaled CH4 fluxes of sinks were very robust to changes in land cover classification, but shrub tundra and peatland CH4 fluxes were sensitive to the level of detail in the land cover classification. The random forest regression performed well (NRMSE 6.6%, R2 82%) and predicted similar CH4 flux patterns as the upscaled CH4 flux maps, although it predicted larger areas acting as CH4 sources than the upscaled CH4 flux maps. The random forest regressor also predicted CH4 fluxes in peatlands better, owing to the added information about soil moisture content from the remote sensing data. Random forests are a good model choice for detecting landscape patterns and predicting CH4 patterns in northern peatlands based on remote sensing and topographic data.
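A minimal sketch of the random forest regression and the two reported metrics (NRMSE and R2) using scikit-learn; the feature matrix and flux values below are synthetic stand-ins for the remote sensing and CH4 data, and range-based normalisation of the RMSE is an assumption:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-ins: per-pixel remote sensing / topographic features and CH4 fluxes.
rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 12))
y = 0.5 * X[:, 0] - 0.2 * X[:, 3] + rng.normal(scale=0.1, size=2000)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = RandomForestRegressor(n_estimators=500, random_state=0)
model.fit(X_train, y_train)
pred = model.predict(X_test)

rmse = mean_squared_error(y_test, pred) ** 0.5
nrmse = rmse / (y_test.max() - y_test.min())   # normalised by the observed flux range
print(f"NRMSE: {nrmse:.1%}  R2: {r2_score(y_test, pred):.1%}")
```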
  • Reunamo, Antti (2020)
    The popularity of mobile instant messaging applications has flourished during the last ten years, and people use them to exchange private and personal information on a daily basis. These applications can be freely installed from online marketplaces, and average users may have several of them installed on their devices. The amount of information available from these messaging applications to a third-party eavesdropper via network traffic analysis has therefore grown significantly as well. The security features of these applications have also been developing over the years, and the communication between the applications and the background server infrastructure nowadays practically always employs encryption. Recently, more advanced end-to-end encryption methods have been developed to hide the content of the exchanged data even from the messaging service providers. Machine learning techniques have successfully been utilized in analyzing encrypted network traffic, and previous research has shown that this approach can effectively be used to detect mobile applications and the actions users are performing in those applications regardless of encryption. While the actual content of the messages and other transferred data cannot be accessed by the eavesdropper, these methods can still lead to serious privacy compromises. This thesis discusses the present state of machine learning-based identification of applications and user actions, how feasible it would be to actually perform such detection in a Wi-Fi network, and what kind of privacy concerns would arise.
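To illustrate the kind of machine learning-based identification discussed above, the sketch below trains a classifier on simple side-channel features (packet counts, byte counts, timing) that remain observable despite encryption. The features and labels are synthetic placeholders, not data or results from the thesis:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Each row summarises one traffic burst with features that survive encryption:
# e.g. packet counts, byte counts and inter-arrival statistics in both directions.
# Synthetic stand-ins; real features would be extracted from Wi-Fi captures.
rng = np.random.default_rng(7)
X = rng.normal(size=(3000, 8))
y = rng.integers(0, 5, size=3000)   # label: which application / user action produced the burst

clf = RandomForestClassifier(n_estimators=300, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print("cross-validated accuracy:", scores.mean())
```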
  • Roy, Suravi Saha (2020)
    COVID-19, a global pandemic, began in December 2019 in Wuhan, China. Since then it has spread all around the globe and was declared a global pandemic in early March 2020 by the World Health Organization (WHO). Ever since the pandemic started, the number of infections has grown exponentially. At the time of writing there is a global rise in COVID-19 cases, with 3.6 million new cases and a 21% weekly growth in new deaths. The disease outbreak has caused over 55.6 million infected cases and more than 1.34 million deaths worldwide since the beginning of the pandemic. The reverse transcription polymerase chain reaction (RT-PCR) test is the best protocol currently in use to detect COVID-19 positive patients. In low-resource settings, especially in developing countries with huge populations, the RT-PCR test is not always a viable option, as it is expensive, time-consuming and requires trained professionals. With the overwhelming number of infected cases, there is a significant need for a substitute that is cheaper, faster and more accessible. In that regard, machine learning classification models are developed in this study to detect COVID-19 positive patients and predict patient deterioration in the presence of missing data, using a dataset published by Hospital Israelita Albert Einstein in São Paulo, Brazil. The dataset consists of 5644 anonymous samples from patients who visited the hospital and were tested with RT-PCR, along with additional laboratory test results providing 111 clinical features. Additionally, there are more than 90% missing values in this dataset. To explore missing data analysis on COVID-19 clinical data, a comparison between a complete case analysis and an imputed case analysis is reported in this study. It is established that the logistic regression model with multivariate imputation by chained equations (MICE) on the data provides 91% and 85% sensitivity for detecting COVID-19 positive patients and predicting patient deterioration, respectively. The area under the receiver operating characteristic curve (AUC) score is reported at 93% and 89% for the two tasks, respectively. Sensitivity and AUC scores are selected for evaluating the model’s performance, as false negatives are harmful in patient screening and triaging. The proposed pipeline is an alternative approach towards COVID-19 diagnosis and prognosis. Clinicians can employ this pipeline for early screening of suspected COVID-19 patients, for triaging medical procedures, and as a secondary diagnostic tool for deciding patients' priority for treatment by utilizing low-cost, readily available laboratory test results.
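A minimal sketch of the modelling approach described above: chained-equation (MICE-style) imputation followed by logistic regression, evaluated with sensitivity and AUC. scikit-learn's IterativeImputer is used as a stand-in for MICE, and the data below are synthetic placeholders rather than the Albert Einstein hospital dataset:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401  (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic placeholders: laboratory features with heavy missingness and an RT-PCR label.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = (X[:, 0] + rng.normal(scale=1.0, size=1000) > 0).astype(int)
X[rng.random(X.shape) < 0.5] = np.nan            # inject missing values

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
model = make_pipeline(
    IterativeImputer(max_iter=10, random_state=0),   # MICE-style chained imputation
    StandardScaler(),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)

prob = model.predict_proba(X_test)[:, 1]
print("sensitivity:", recall_score(y_test, model.predict(X_test)))   # recall of the positive class
print("AUC:", roc_auc_score(y_test, prob))
```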
  • Katainen, Riku (2013)
    After the Human Genome Project completed the mapping of the human DNA sequence in 2001, a new era began in biological and medical research. The genetic basis of various diseases, such as cancer, could be studied with higher precision than ever before. The map of the human genome enabled next-generation sequencing (NGS) techniques, and not only did DNA sequencing become faster and cheaper to perform, but the amount of data also started to increase exponentially. The field of bioinformatics, which combines computer science and the life sciences, faced a great challenge in handling all the available data and digging relevant information out of it. Various tools with heavily enhanced or completely new kinds of algorithms were developed for the demanding task of analysing NGS data; these tools are the focus of this thesis. For the search for cancer-causing mutations, NGS methods enable genome-scale studies with the precision of a single molecule. However, the spectacular scale and precision of the data pose another challenge: how to distinguish trivial data from non-trivial, and furthermore, how to separate reliable data from erroneous data. The raw data must be put through a pipeline of various processing tools, which organize and humanize the data with the help of the map of the human genome. After data processing, the data is ready for the actual cancer-specific analysis, in which causative mutations can be hunted down. For this purpose, I have developed analysis and visualization software, Rikurator, which provides various features and tools for handling NGS data. Rikurator is designed for comparative analysis of dozens of cancer samples, quality filtering, controlling, and visualization, to name a few of its uses. In addition to the tools in the data processing pipeline, this thesis describes the features and implementation of Rikurator.