
Browsing by master's degree program "Datatieteen maisteriohjelma" (Master's Programme in Data Science)


  • Zhao, Chenhui (2023)
    In recent years, classical neural networks have been widely used in various applications and have achieved remarkable success. However, with the advent of quantum computing, there is growing interest in quantum neural networks (QNNs) as a potential alternative to classical machine learning. In this thesis, we study the architectures of quantum and classical neural networks. We also investigate the performance of QNNs compared to classical neural networks from various aspects, such as vanishing gradients, trainability, and expressivity. Our experiments demonstrate that QNNs have the potential to outperform classical neural networks in specific scenarios. While more powerful QNNs exhibit improved performance compared to classical neural networks, our findings also indicate that less powerful QNNs may not always yield significant improvements. This suggests that the effectiveness of QNNs in surpassing classical approaches is contingent on factors such as network architecture, optimization techniques, and problem complexity.
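A central trainability question for QNNs is how gradients are obtained from circuit evaluations. The sketch below illustrates the parameter-shift rule on a toy one-qubit "circuit" whose expectation value is known analytically (⟨Z⟩ = cos θ after RY(θ)|0⟩); a real QNN would estimate each expectation by sampling hardware or a simulator.

```python
import math

def expectation(theta: float) -> float:
    # Toy QNN "circuit": RY(theta)|0>, measured in the Z basis.
    # Analytically <Z> = cos(theta); a real QNN would estimate this by sampling.
    return math.cos(theta)

def parameter_shift_grad(f, theta: float) -> float:
    # Parameter-shift rule: an exact gradient from two circuit evaluations,
    # d f / d theta = [f(theta + pi/2) - f(theta - pi/2)] / 2.
    return (f(theta + math.pi / 2) - f(theta - math.pi / 2)) / 2

theta = 0.7
grad = parameter_shift_grad(expectation, theta)  # equals -sin(theta) here
```

Vanishing-gradient (barren plateau) studies of the kind the abstract mentions examine how the variance of such gradients shrinks as circuits grow.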
  • Trizna, Dmitrijs (2022)
    The detection heuristic in contemporary machine learning Windows malware classifiers is typically based on the static properties of the sample. In contrast, simultaneous utilization of static and behavioral telemetry is vaguely explored. We propose a hybrid model that employs dynamic malware analysis techniques, contextual information such as an executable's filesystem path on the system, and static representations used in modern state-of-the-art detectors. It does not require an operating system virtualization platform. Instead, it relies on kernel emulation for dynamic analysis. Our model reports an enhanced detection heuristic and identifies malicious samples even if none of the separate models expresses high confidence in categorizing the file as malevolent. For instance, given the $0.05\%$ false positive rate, individual static, dynamic, and contextual model detection rates are $18.04\%$, $37.20\%$, and $15.66\%$. However, we show that composite processing of all three achieves a detection rate of $96.54\%$, above the cumulative performance of individual components. Moreover, the simultaneous use of distinct malware analysis techniques addresses independent unit weaknesses, minimizing false positives and increasing adversarial robustness. Our experiments show a decrease in contemporary adversarial attack evasion rates from $26.06\%$ to $0.35\%$ when behavioral and contextual representations of the sample are employed in the detection heuristic.
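The composite-beats-components result can be illustrated with a late-fusion sketch. This is not the thesis's actual fusion method (which is not specified in the abstract); it only shows how averaging three weak detector scores can flag a sample that no single detector is confident about.

```python
def fuse_scores(static: float, dynamic: float, contextual: float,
                threshold: float = 0.5) -> bool:
    # Hypothetical late fusion: average the three detectors' maliciousness
    # scores (each in [0, 1]) and flag the sample if the mean crosses the
    # threshold. The composite can fire even when no single model is confident.
    mean = (static + dynamic + contextual) / 3
    return mean >= threshold

# No individual score is far above the threshold, yet the composite flags it.
verdict = fuse_scores(0.45, 0.48, 0.49, threshold=0.45)
```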
  • Shappo, Viacheslav (2022)
    The primary concern of companies working with many customers is proper customer segmentation, i.e., division of the customers into different groups based on their common characteristics. Customer segmentation helps marketing specialists to adjust their offers and reach potential customer groups interested in a specific type of product or service. In addition, knowing such customer segments may help search for new look-alike customers sharing similar characteristics. The first and most crucial segmentation is splitting the customers into B2B (business to business) and B2C (business to consumer). The next step is to analyze these groups properly and create more thorough product-specific groups. Nowadays, machine learning plays a vital role in customer segmentation, because various classification algorithms can see more patterns in customer characteristics and create more tailored customer segmentations than a human can. Therefore, utilizing machine learning approaches in customer segmentation may help companies save costs on marketing campaigns and increase their sales by targeting the correct customers. This thesis aims to analyze B2B customers potentially interested in the renewable diesel "Neste MY" and to create a classification model for such segmentation. The first part of the thesis is focused on the theoretical background of customer segmentation and its use in marketing. Firstly, the thesis introduces general information about Neste as a company and discusses the marketing stages that involve the customer segmentation approach. Secondly, the data features used in the study are presented. Then the methodological part of the thesis is introduced, and the performance of three selected algorithms is evaluated on the test data. Finally, the study's findings and future means of improvement are discussed. The most significant finding of the study is that carefully selected features may significantly improve model performance while saving computational power. Several important features are selected as the most crucial customer characteristics, which the marketing department afterward uses for future customer segmentations.
  • Tiittanen, Henri (2019)
    Estimating the error level of models is an important task in machine learning. If the data used is independent and identically distributed, as is usually assumed, there exist standard methods to estimate the error level. However, if the data distribution changes, i.e., a phenomenon known as concept drift occurs, those methods may not work properly anymore. Most existing methods for detecting concept drift focus on the case in which the ground truth values are immediately known. In practice, that is often not the case. Even when the ground truth is unknown, a certain type of concept drift called virtual concept drift can be detected. In this thesis we present a method called drifter for estimating the error level of arbitrary regression functions when the ground truth is not known. Concept drift detection is a straightforward application of error level estimation. Error level based concept drift detection can be more useful than traditional approaches based on direct distribution comparison, since only changes that affect the error level are detected. In this work we describe the drifter algorithm in detail, including its theoretical basis, and present an experimental evaluation of its performance in virtual concept drift detection on multiple datasets, both synthetic and real-world, and multiple regression functions. Our experiments show that the drifter algorithm can be used to detect virtual concept drift with reasonable accuracy.
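The general idea of error-level-based drift detection can be sketched as follows. This is an illustrative baseline, not the thesis's drifter algorithm: it assumes labels are available so the error sequence can be computed directly, whereas drifter targets the harder case where the ground truth (and hence the true error) is unknown.

```python
def detect_drift(errors, window: int = 20, factor: float = 2.0) -> bool:
    # Error-level drift check (illustrative): flag drift when the mean
    # absolute error over the most recent window exceeds `factor` times
    # the mean error of the preceding history. Only changes that affect
    # the error level are detected, not every distributional change.
    if len(errors) <= window:
        return False
    recent = errors[-window:]
    baseline = errors[:-window]
    base_mean = sum(abs(e) for e in baseline) / len(baseline)
    recent_mean = sum(abs(e) for e in recent) / len(recent)
    return recent_mean > factor * base_mean

errors = [0.1] * 100 + [0.5] * 20   # error level jumps at the end
drifted = detect_drift(errors)
```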
  • Rehn, Aki (2022)
    The application of Gaussian processes (GPs) is limited by the rather slow process of optimizing the hyperparameters of a GP kernel, which causes problems especially in applications -- such as Bayesian optimization -- that involve repeated optimization of the kernel hyperparameters. Recently, the issue was addressed by a method that "amortizes" the inference of the hyperparameters using a hierarchical neural network architecture to predict the GP hyperparameters from data; the model is trained on a synthetic GP dataset and in general does not require retraining for unseen data. We asked whether we could understand the method well enough to replicate it with a squared exponential kernel with automatic relevance determination (SE-ARD). We also asked whether it is feasible to extend the system to predict posterior approximations instead of point estimates to support fully Bayesian GPs. We introduce the theory behind Bayesian inference; gradient-based optimization; Gaussian process regression; variational inference; neural networks and the transformer architecture; the method that predicts point estimates of the hyperparameters; and finally our proposed architecture to extend the method to a variational inference framework. We were able to successfully replicate the method from scratch with an SE-ARD kernel. In our experiments, we show that our replicated version of the method works and gives good results. We also implemented the proposed extension of the method to a variational inference framework. In our experiments, we do not find concrete reasons that would prevent the model from functioning, but observe that the model is very difficult to train. The final model that we were able to train predicted good means for (Gaussian) posterior approximations, but the variances that the model predicted were abnormally large. We analyze possible causes and suggest future work.
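The SE-ARD kernel at the center of the replication is compact enough to state directly. A minimal sketch, with the standard form k(x, y) = σ² exp(−½ Σ_d (x_d − y_d)² / ℓ_d²); the hyperparameters being amortized are the variance σ² and the per-dimension lengthscales ℓ_d.

```python
import math

def se_ard(x, y, variance: float, lengthscales) -> float:
    # Squared exponential kernel with automatic relevance determination.
    # One lengthscale l_d per input dimension: dimensions with a large l_d
    # barely affect the kernel value, which is what "automatic relevance
    # determination" refers to.
    sq = sum((a - b) ** 2 / (l ** 2) for a, b, l in zip(x, y, lengthscales))
    return variance * math.exp(-0.5 * sq)

k = se_ard([0.0, 1.0], [0.0, 1.0], variance=1.5, lengthscales=[1.0, 2.0])
# at x == y the kernel returns the variance
```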
  • Comănescu, Andrei-Daniel (2020)
    Social networks represent a public forum of discussion for various topics, some of them controversial. Twitter is such a social network; it acts as a public space where discourse occurs. In recent years the role of social networks in information spreading has increased, as have the fears regarding the increasingly polarised discourse on social networks, caused by the tendency of users to avoid exposure to opposing opinions while increasingly interacting only with like-minded individuals. This work looks at controversial topics on Twitter, over a long period of time, through the prism of political polarisation. We use the daily interactions, and the underlying structure of the whole conversation, to create daily graphs that are then used to obtain daily graph embeddings. We estimate the political ideologies of the users that are represented in the graph embeddings. By using the political ideologies of users and the daily graph embeddings, we offer a series of methods that allow us to detect and analyse changes in the political polarisation of the conversation. This enables us to conclude that, during our analysed time period, the overall polarisation levels for our examined controversial topics have stagnated. We also explore the effects of topic-related controversial events on the conversation, thus revealing their short-term effect on the conversation as a whole. Additionally, the linkage between increased interest in a topic and the increase of political polarisation is explored. Our findings reveal that as the interest in the controversial topic increases, so does the political polarisation.
  • Louhi, Jarkko (2023)
    The rapid growth of artificial intelligence (AI) and machine learning (ML) solutions has created a need to develop, deploy, and maintain such solutions in production reliably and efficiently. The MLOps (Machine Learning Operations) framework is a collection of tools and practices that aims to address this challenge. Within the MLOps framework, a concept called the feature store is introduced, serving as a central repository responsible for storing, managing, and facilitating the sharing and reuse of extracted features derived from raw data. This study first gives an overview of the MLOps framework, delves deeper into feature engineering and feature data management, and explores the challenges related to these processes. Further, feature stores are presented: what exactly they are and what benefits they introduce to organizations and companies developing ML solutions. The study also reviews some of the currently popular feature store tools. The primary goal of this study is to provide recommendations for organizations to leverage feature stores as a solution to the challenges they currently encounter in managing feature data. Through an analysis of the current state of the art and a comprehensive study of organizations' practices and challenges, this research presents key insights into the benefits of feature stores in the context of MLOps. Overall, the thesis highlights the potential of feature stores as a valuable tool for organizations seeking to optimize their ML practices and achieve a competitive advantage in today's data-driven landscape. The research explores and gathers practitioners' experiences and opinions on the aforementioned topics through interviews conducted with experts from Finnish organizations.
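The feature-store concept described above reduces to a keyed repository through which features computed once can be shared across models. A minimal in-memory sketch of that idea follows; production systems (e.g. Feast) add versioning, point-in-time correctness, and separate offline/online serving, none of which is modeled here.

```python
class FeatureStore:
    # Toy feature store: a central repository keyed by entity id, so a
    # feature computed by one team's pipeline can be reused by another's
    # model without re-deriving it from raw data.
    def __init__(self):
        self._store = {}

    def put(self, entity_id: str, features: dict) -> None:
        # Register or update features for an entity.
        self._store.setdefault(entity_id, {}).update(features)

    def get(self, entity_id: str, names: list) -> dict:
        # Retrieve a subset of features for model training or serving.
        row = self._store.get(entity_id, {})
        return {name: row[name] for name in names if name in row}

fs = FeatureStore()
fs.put("customer_42", {"avg_basket": 31.5, "visits_30d": 4})
features = fs.get("customer_42", ["avg_basket"])
```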
  • Kang, Taize (2022)
    Story generation is an artificial intelligence task in which a computer program is used to create literature or stories. Such a task usually involves giving an initial scene, characters, background information, and goals, and then letting the computer program automatically generate a storyline and complete the narrative of the story. Transformers are widely used and have achieved state-of-the-art results for many different natural language processing tasks, including story generation. With the help of the attention mechanism, transformers can overcome overfitting and achieve great results. The Generative Pre-trained Transformer (GPT) series is among the best-known transformers and has attracted many researchers. In this thesis, transformer models are used to design and implement a machine learning method for the generation of very short stories. By introducing a commonsense knowledge base and a rule generator based on it, the models can learn the relationships within the context and generate coherent narratives. Given the first sentence of the story as input, the model can complete the story. The model is based on the GPT-2 model and COINS. The dataset used is a collection of short stories. By comparing the generated results of different models in many aspects, we demonstrate the effectiveness of the model. In addition, the compared results are analyzed to find potential optimization methods.
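Story completion with a GPT-style model boils down to repeatedly sampling a next token from the model's output distribution. The sketch below shows one such decoding step, top-k sampling, over a hand-made logit dictionary; in a real system the logits would come from the trained model conditioned on the story so far, and the thesis's actual decoding strategy is not specified in the abstract.

```python
import math, random

def top_k_sample(logits: dict, k: int = 2, rng=None) -> str:
    # One decoding step: keep the k highest-scoring candidate tokens,
    # renormalize them with a softmax, and sample. Restricting to the
    # top k avoids drawing low-probability tokens that derail the story.
    rng = rng or random.Random(0)
    top = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:k]
    m = max(v for _, v in top)
    weights = [math.exp(v - m) for _, v in top]   # softmax, shifted for stability
    tokens = [t for t, _ in top]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Hypothetical logits for the next word after "The knight rode into the ..."
token = top_k_sample({"forest": 2.0, "castle": 1.5, "the": -3.0}, k=2)
```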
  • Hu, Rosanna Yingying (2024)
    Buildings consume approximately 40% of global energy; hence, understanding and analyzing the energy consumption patterns of buildings is essential for bringing desirable insights to building management stakeholders for better decision-making and energy efficiency. Based on a specific use case of a Finnish building management company, this thesis presents the challenge of optimizing energy consumption forecasting and building management by addressing the shortcomings of current individual building-level forecasting approaches and the dynamic nature of building energy use. The research investigates the plausibility of a system of building clusters by studying the representative cluster profiles and dynamic cluster changes. We focus on a dataset comprising hourly energy consumption time series from a variety of Finnish university buildings, employing these as subjects to implement a novel stream clustering approach called ClipStream. ClipStream is an attribute-based stream clustering algorithm that performs continuous online clustering of time series data batches, involving iterative data abstraction, clustering, and change detection phases. This thesis shows that it was plausible to build clusters of buildings based on energy consumption time series. 23 buildings were successfully clustered into 3-5 clusters during each two-week window of the period of investigation. The study’s findings revealed distinct and evolving energy consumption clusters of buildings and characterized 7 predominant cluster profiles, which reflected significant seasonal variations and operational changes over time. Qualitative analyses of the clusters primarily confirmed the noticeable shifts in energy consumption patterns from 2019 to 2022, underscoring the potential of our approach to enhance forecasting efficiency and management effectiveness. These findings could be further extended to inform energy policy, building management practices, and broader sustainability efforts. This suggests that improved energy efficiency can be achieved through the application of machine learning techniques such as cluster analysis.
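The abstraction-then-clustering loop can be sketched in miniature. This is not ClipStream itself: the attribute set below (mean and standard deviation per series) and the single nearest-centroid assignment step are simplifications standing in for its richer iterative abstraction, clustering, and change-detection phases.

```python
def abstract_series(series):
    # Attribute abstraction: reduce each consumption time series in a
    # batch to a few summary attributes (here: mean, standard deviation).
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    return (mean, var ** 0.5)

def assign(attrs, centroids):
    # Assign each abstracted series to its nearest centroid (squared
    # Euclidean distance). Re-running this per two-week batch lets
    # cluster membership evolve over time, as observed in the thesis.
    def d2(a, c):
        return sum((x - y) ** 2 for x, y in zip(a, c))
    return [min(range(len(centroids)), key=lambda i: d2(a, centroids[i]))
            for a in attrs]

# One batch: two low-consumption buildings and one high-consumption one.
batch = [[1, 1, 1, 1], [10, 12, 8, 10], [1, 2, 1, 2]]
attrs = [abstract_series(s) for s in batch]
labels = assign(attrs, centroids=[(1.5, 0.5), (10.0, 1.5)])
```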
  • Kotola, Mikko Markus (2021)
    Image captioning is the task of generating a natural language description of an image. The task requires techniques from two research areas, computer vision and natural language generation. This thesis investigates the architectures of leading image captioning systems. The research question is: What components and architectures are used in state-of-the-art image captioning systems, and how could image captioning systems be further improved by utilizing improved components and architectures? Five openly reported leading image captioning systems are investigated in detail: Attention on Attention, the Meshed-Memory Transformer, the X-Linear Attention Network, the Show, Edit and Tell method, and Prophet Attention. The investigated leading image captioners all rely on the same object detector, the Faster R-CNN based Bottom-Up object detection network. Four out of five also rely on the same backbone convolutional neural network, ResNet-101. Both the backbone and the object detector could be improved by using newer approaches. The best choice among CNN-based object detectors is EfficientDet with an EfficientNet backbone. A completely transformer-based approach with a Vision Transformer backbone and a Detection Transformer object detector is a fast-developing alternative. The main area of variation between the leading image captioners is in the types of attention blocks used in the high-level image encoder, the type of natural language decoder, and the connections between these components. The best architectures and attention approaches to implement these components are currently the Meshed-Memory Transformer and the bilinear pooling approach of the X-Linear Attention Network. Implementing the Prophet Attention approach of using the future words available in the supervised training phase to guide the decoder attention further improves performance. Pretraining the backbone using large image datasets is essential to reach semantically correct object detections and object features. The feature richness and dense annotation of data are equally important in training the object detector.
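All of the attention variants surveyed above build on the same primitive, scaled dot-product attention: softmax(QKᵀ/√d)V. A minimal sketch written out for small Python lists of vectors (a real encoder or decoder would batch this over tensors and add learned projections):

```python
import math

def attention(Q, K, V):
    # Scaled dot-product attention over lists of vectors.
    # For each query: score every key, softmax the scores, and return
    # the weighted average of the value vectors.
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        m = max(scores)                          # shift for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# One query attending over two key/value pairs; it aligns with the first key.
result = attention(Q=[[1.0, 0.0]], K=[[1.0, 0.0], [0.0, 1.0]], V=[[1.0], [0.0]])
```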
  • Huertas, Andres (2020)
    Investment funds are continuously looking for new technologies and ideas to enhance their results. Lately, with the success observed in other fields, wealth managers are taking a closer look at machine learning methods. Even if the use of ML is not entirely new in finance, leveraging new techniques has proved to be challenging, and few funds succeed in doing so. The present work explores the use of reinforcement learning algorithms for portfolio management in the stock market. The stochastic nature of stocks is well known, and aiming to predict the market is unrealistic; nevertheless, the question of how to use machine learning to find useful patterns in the data that enable small market edges remains open. Based on the ideas of reinforcement learning, a portfolio optimization approach is proposed. RL agents are trained to trade in a stock exchange, using portfolio returns as rewards for their RL optimization problem, thus seeking optimal resource allocation. For this purpose, a set of 68 stock tickers in the Frankfurt exchange market was selected, and two RL methods were applied, namely Advantage Actor-Critic (A2C) and Proximal Policy Optimization (PPO). Their performance was compared against three commonly traded ETFs (exchange-traded funds) to assess the algorithms' ability to generate returns compared to real-life investments. Both algorithms were able to achieve positive returns in a year of testing (5.4\% and 9.3\% for A2C and PPO respectively; a European ETF (VGK, Vanguard FTSE Europe Index Fund) reported 9.0\% returns for the same period) as well as healthy risk-to-return ratios. The results do not aim to be financial advice or trading strategies, but rather explore the potential of RL for studying small to medium size stock portfolios.
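The reward signal such agents receive can be sketched as a one-step portfolio return. This is an illustrative simplification: the thesis's exact reward definition is not given in the abstract, and a realistic environment would also model transaction costs and rebalancing.

```python
import math

def portfolio_log_return(weights, prices_t, prices_t1):
    # One-step log return of a portfolio: the growth factor is the
    # weighted average of each asset's price ratio. `weights` should
    # sum to 1; transaction costs are ignored in this sketch.
    growth = sum(w * (p1 / p0)
                 for w, p0, p1 in zip(weights, prices_t, prices_t1))
    return math.log(growth)

# Equal-weight portfolio of two stocks, one up 10% and one down 10%:
# the gains and losses cancel, so the log return is (approximately) zero.
r = portfolio_log_return([0.5, 0.5], [100.0, 50.0], [110.0, 45.0])
```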
  • Moilanen, Jouni Petteri (2023)
    In recent years, a concern has grown within the NLG community about the comparability of systems and the reproducibility of research results. This concern has mainly been focused on the evaluation of NLG systems. Problems with automated metrics, crowd-sourced human evaluations, sloppy experimental design and error reporting, etc. have been widely discussed in the literature. A lot of proposals for best practices, metrics, frameworks, and benchmarks for NLG evaluation have lately been issued to address these problems. In this thesis we examine the current state of NLG evaluation – focusing on data-to-text evaluation – in terms of proposed best practices, benchmarks, etc., and their adoption in practice. Academic publications concerning NLG evaluation indexed in the Scopus database and published in 2018-2022 were examined. After manual inspection, I deemed 141 of those to contain some kind of concrete proposal for improvements in evaluation practices. The adoption (use in practice) of those was then examined by inspecting papers citing them. There seems to be a willingness in the academic community to adopt these proposals, especially ”best practices” and metrics. As for datasets, benchmarks, evaluation platforms, etc., the results are inconclusive.
  • Uvarova, Elizaveta (2024)
    Asteroids within our Solar System attract considerable attention for their potential impact on Earth and their role in elucidating the Solar System's formation and evolution. Understanding asteroids' composition is crucial for determining their origin and history, making spectral classification a cornerstone of asteroid categorization. Spectral classes, determined by asteroids' reflectance spectrum, offer insights into their surface composition. Early attempts at classification, predating 1973, utilized photometric observations in ultraviolet and visible wavelengths. The Chapman-McCord-Johnson classification system of 1973 marked the beginning of formal asteroid taxonomy, employing reflectance spectrum slopes for classification. Subsequent developments included machine learning techniques, such as principal component analysis and artificial neural networks, for improved classification accuracy. The Gaia mission's Data Release 3 has significantly expanded asteroid datasets, allowing more extensive analyses. In this study, I examine the relationship between asteroid photometric slopes, spectra, and taxonomy using a feed-forward neural network trained on known spectral types to classify asteroids of unknown types. Our classification achieved a mean accuracy of 80.4 ± 2.0 % over 100 iterations and successfully separated three asteroid taxonomic groups (C, S, and X) and the asteroid class D.
  • Paavola, Jaakko (2024)
    Lenders assess the credit risk of loan applicants from both affordability and indebtedness perspective. The affordability perspective involves assessing the applicant’s disposable income after accounting for regular household expenditures and existing credit commitments, a measure called money-at-disposal or MaD. Having an estimate of the applicant’s expenditures is crucial, but simply asking applicants for their expenditures could lead to inaccuracies. Thus, lenders must produce their own estimates based on statistical or survey data about household expenditures, which are then passed to the MaD framework as input parameters or used as control limits to ascertain expenditure information reported by the applicant is truthful or at least adequately conservative. More accurate expenditure estimates in the loan origination would enable lenders to quantify mortgage credit risk more precisely, tailor loan terms more aptly, and protect customers against over-indebtedness better. Consequently, this would facilitate the lenders to be more profitable in their lending business as well as serve their customers better. But there is also a need for interpretability of the estimates stemming from compliance and trustworthiness motives. In this study, we examine the accuracy and interpretability of expenditure predictions of supervised models fitted to a microdataset of household consumption expenditures. To our knowledge, this is the first study to use such a granular and broad dataset to create predictive models of loan applicants’ expenditures. The virtually uninterpretable "black box" models we used, aiming at maximizing predictive power, rarely did better accuracy-wise than interpretable linear regression ones. Even when they did, the gain was marginal or in predicting minor expenditure categories that contributed only a low share of the total expenditures. 
Thus, we suggest that ordinary linear regression generally provides the best combination of predictive power and interpretability. After careful feature selection, the best predictive power was attained with 20-54 predictor variables, the number depending on the expenditure category. If a very simple interpretation is needed, we suggest either a linear regression model of three predictor variables representing the number of household members, or a model based on the means within 12 "common sense groups" into which we divided the households. An alternative solution, with predictive power somewhere between the full linear regression model and the two simpler models, is to use decision trees, which provide easy interpretation in the form of a set of rules.
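The interpretability of the recommended linear models comes from their closed form. A minimal sketch with a single predictor (the study's models used 3-54 predictors; one predictor keeps the closed form short), on hypothetical numbers: slope = cov(x, y) / var(x), intercept = mean(y) − slope · mean(x).

```python
def fit_simple_ols(x, y):
    # Ordinary least squares with one predictor, via the closed-form
    # solution. The fitted slope reads directly as "expenditure change
    # per additional unit of the predictor" -- the interpretability the
    # study values.
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical data: monthly food expenditure (EUR) vs. household size.
intercept, slope = fit_simple_ols([1, 2, 3, 4], [200.0, 390.0, 610.0, 800.0])
```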
  • Ulkuniemi, Uula (2022)
    This thesis presents a complication risk comparison of the most used surgical interventions for benign prostatic hyperplasia (BPH). The investigated complications are the development of either a post-surgery BPH recurrence (reoperation), an urethral stricture or stress incontinence severe enough to require a surgical procedure for their treatment. The analysis is conducted with survival analysis methods on a data set of urological patients sourced from the Finnish Institute for Health and Welfare. The complication risk development is estimated with the Aalen-Johansen estimator, and the effects of certain covariates on the complication risks are estimated with the Cox PH regression model. One of the regression covariates is the Charlson Comorbidity Index score, which attempts to quantify the disease load of a patient at a certain point in time as a single number. A novel Spark algorithm was designed to facilitate the efficient calculation of the Charlson Comorbidity Index score on a data set of the same size as the one used in the analyses here. The algorithm achieved at least similar performance to the previously available ones and scaled better on larger data sets and with stricter computing resource constraints. Both the urethral stricture and urinary incontinence endpoints suffered from a lower number of samples, which made the associated results less accurate. The estimated complication probabilities in both endpoint types were also so low that the BPH procedures couldn’t be reliably differentiated. In contrast, BPH reoperation risk analyses yielded noticeable differences among the initial BPH procedures. Regression analysis results suggested that the Charlson Comorbidity Index score isn’t a particularly good predictor in any of the endpoints. However, certain cancer types that are included in the Charlson Comorbidity Index score did predict the endpoints well when used as separate covariates.
An increase in the patient’s age was associated with a higher complication risk, but less so than expected. In the urethral stricture and urinary incontinence endpoints the number of preceding BPH operations was usually associated with a notable complication risk increase.
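The per-patient computation that the thesis's Spark algorithm performs at scale reduces to a weighted sum over condition groups. A sketch with a small illustrative subset of weights; the full index covers 17+ condition groups, several weight revisions exist, and mapping raw diagnosis codes to these groups is the hard part omitted here.

```python
# Illustrative subset of Charlson condition weights (not the full table).
CHARLSON_WEIGHTS = {
    "myocardial_infarction": 1,
    "diabetes_with_complications": 2,
    "moderate_severe_liver_disease": 3,
    "metastatic_solid_tumour": 6,
}

def charlson_score(conditions) -> int:
    # Sum the weight of each distinct condition present for the patient
    # at the evaluation time point; conditions outside the table
    # contribute nothing.
    return sum(CHARLSON_WEIGHTS.get(c, 0) for c in set(conditions))

score = charlson_score(["diabetes_with_complications",
                        "myocardial_infarction"])
```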
  • Aarne, Onni (2022)
    The content we see is increasingly determined by ever more advanced recommender systems, and popular social media platform TikTok represents the forefront of this development (See Chapter 1). There has been much speculation about the workings of these recommender systems, but precious little systematic, controlled study (See Chapter 2). To improve our understanding of these systems, I developed sock puppet bots that consume content on TikTok as a normal user would (See Chapter 3). This allowed me to run controlled experiments to see how the TikTok recommender system would respond to sock puppets exhibiting different behaviors and preferences in a Finnish context, and how this would differ from the results obtained by earlier investigations (See Chapter 4). This research was done as part of a journalistic investigation in collaboration with Long Play. I found that TikTok appears to have adjusted their recommender system to personalize content seen by users to a much lesser degree, likely in response to a previous investigation by the WSJ. However, I came to the conclusion that, while sock puppet audits can be useful, they are not a sufficiently scalable solution to algorithm governance, and other types of audits with more internal access are needed (See Chapter 5).
  • Kinnunen, Samuli (2024)
    Chemical reaction optimization is an iterative process that targets identifying reaction conditions that maximize reaction output, typically yield. The evolution of optimization techniques has progressed from intuitive approaches to simple heuristics and, more recently, to statistical methods such as the Design of Experiments approach. Bayesian optimization, which iteratively updates beliefs about a response surface and suggests parameters both exploiting conditions near the known optima and exploring uncharted regions, has shown promising results by reducing the number of experiments needed for finding the optimum in various optimization tasks. In chemical reaction optimization, the method allows minimizing the number of experiments required for finding the optimal reaction conditions. Automated tools like pipetting robots hold potential to accelerate optimization by executing multiple reactions concurrently. The integration of Bayesian optimization with automation not only reduces the workload but also improves throughput and optimization efficiency. However, the adoption of these advanced techniques faces a barrier, as chemists often lack proficiency in machine learning and programming. To bridge this gap, Automated Chemical Reaction Optimization Software (ACROS) is introduced. This tool orchestrates an optimization loop: Bayesian optimization suggests reaction candidates, the parameters are translated into commands for a pipetting robot, the robot executes the operations, a chemist interprets the results, and the data is fed back to the software for suggesting the next reaction candidates. The optimization tool was evaluated empirically on a numerical test function, on a Direct Arylation reaction dataset, and in real-time optimization of Sonogashira and Suzuki coupling reactions. The findings demonstrate that Bayesian optimization efficiently identifies optimal conditions, outperforming the Design of Experiments approach, particularly in optimizing discrete parameters in batch settings. Three acquisition functions were compared: Expected Improvement, Log Expected Improvement, and Upper Confidence Bound. It can be concluded that expected improvement-based methods are more robust, especially in batch settings with process constraints.
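The Expected Improvement acquisition at the heart of this comparison has a closed form under a Gaussian surrogate posterior. A sketch for maximization, EI = (μ − f*)Φ(z) + σφ(z) with z = (μ − f*)/σ; the candidate names and posterior numbers below are hypothetical, standing in for a fitted surrogate's predictions at untried reaction conditions.

```python
import math

def expected_improvement(mean: float, std: float, best: float) -> float:
    # EI for maximization, given the surrogate's posterior mean/std at a
    # candidate point and the best yield observed so far. Balances
    # exploiting high predicted yield against exploring high uncertainty.
    if std <= 0.0:
        return max(mean - best, 0.0)
    z = (mean - best) / std
    phi = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)   # normal pdf
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2)))          # normal cdf
    return (mean - best) * Phi + std * phi

# Hypothetical candidates: (posterior mean yield, posterior std).
candidates = {"60C_1h": (0.72, 0.05), "80C_2h": (0.70, 0.15)}
best_yield = 0.71
scores = {name: expected_improvement(m, s, best_yield)
          for name, (m, s) in candidates.items()}
```

Note how the more uncertain candidate wins despite its lower mean: EI rewards exploration, which is why a batch of suggestions mixes exploitation and exploration.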
  • Ilse, Tse (2019)
    Background: Electroencephalography (EEG) depicts electrical activity in the brain, and can be used in clinical practice to monitor brain function. In neonatal care, physicians can use continuous bedside EEG monitoring to determine the cerebral recovery of newborns who have suffered birth asphyxia, which creates a need for frequent, accurate interpretation of the signals over a period of monitoring. An automated grading system can aid physicians in the Neonatal Intensive Care Unit by automatically distinguishing between different grades of abnormality in the neonatal EEG background activity patterns. Methods: This thesis describes using a support vector machine as a base classifier to classify seven grades of EEG background pattern abnormality in data provided by the BAby Brain Activity (BABA) Center in Helsinki. We are particularly interested in reconciling the manual grading of EEG signals by independent graders, and we analyze the inter-rater variability of EEG graders by building the classifier using selected epochs graded in consensus compared to a classifier using full-duration recordings. Results: The inter-rater agreement score between the two graders was κ=0.45, which indicated moderate agreement between the EEG grades. The most common grade of EEG abnormality was grade 0 (continuous), which made up 63% of the epochs graded in consensus. We first trained two baseline reference models using the full-duration recording and labels of the two graders, which achieved 71% and 57% accuracy. We achieved 82% overall accuracy in classifying selected patterns graded in consensus into seven grades using a multi-class classifier, though this model did not outperform the two baseline models when evaluated with the respective graders’ labels. In addition, we achieved 67% accuracy in classifying all patterns from the full-duration recording using a multilabel classifier.
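The κ=0.45 agreement figure is Cohen's kappa, which can be computed directly from the two graders' label sequences: κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance. A sketch on a tiny hypothetical grading example:

```python
def cohens_kappa(labels_a, labels_b):
    # Inter-rater agreement between two graders, chance-corrected.
    # p_o: fraction of epochs where the graders assign the same grade.
    # p_e: agreement expected by chance from each grader's grade rates.
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    grades = set(labels_a) | set(labels_b)
    p_e = sum((labels_a.count(g) / n) * (labels_b.count(g) / n)
              for g in grades)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical example: two graders agreeing on 3 of 4 epochs.
kappa = cohens_kappa([0, 0, 1, 2], [0, 0, 1, 1])
```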
  • Aaltonen, Topi (2024)
    Positron annihilation lifetime spectroscopy (PALS) is a method used to analyse the properties of materials, namely their composition and what kinds of defects they may contain. PALS is based on the annihilation of positrons with the electrons of a studied material. The average lifetime of a positron coming into contact with a studied material depends on the density of electrons in the surroundings of the positron, with higher electron densities naturally resulting in faster annihilations on average. Introducing positrons into a material and recording the annihilation times results in a spectrum that is, in general, a noisy sum of exponential decays. These decay components have lifetimes that depend on the different density regions present in the material, and relative intensities that depend on the fraction of each region in the material. Thus, the problem in PALS is inverting the spectrum to recover the lifetimes and intensities, a problem known more generally as exponential analysis. A convolutional neural network architecture was trained and tested on simulated PALS spectra. The aim was to test whether simulated data could be used to train a network that could predict the components of PALS spectra accurately enough to be usable on spectra gathered from real experiments. Reasons for testing the approach included making the analysis of PALS spectra more automated and decreasing user-induced bias compared to some other approaches. Additionally, the approach was designed to require few computational resources, ideally being trainable and usable on a single computer. Overall, testing showed that the approach has some potential, but the prediction performance of the network depends on the parameters of the components of the target spectra, with likely issues being similar to those reported in previous literature. In turn, the approach was shown to be sufficiently automatable, particularly once training has been performed.
Further, while some bias is introduced in specifying the variation of the training data used, this bias is not substantial. Finally, the network can be trained without considerable computational requirements within a sensible time frame.
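A PALS spectrum of the kind described above can be simulated as a histogram of annihilation times drawn from a mixture of exponential decays; Poisson-like counting noise arises naturally from sampling. This sketch is a simplified illustration of such simulated training data (function name, bin width, and count values are assumptions, and real spectra additionally involve instrument resolution and background terms):

```python
import random

def simulate_pals_spectrum(lifetimes, intensities, n_bins=200,
                           bin_width=0.05, counts=100000, seed=0):
    """Histogram `counts` annihilation times drawn from a mixture of
    exponential decays. `lifetimes` are in ns, `intensities` sum to 1."""
    rng = random.Random(seed)
    hist = [0] * n_bins
    for _ in range(counts):
        # Pick a decay component according to its relative intensity.
        r, acc = rng.random(), 0.0
        tau = lifetimes[-1]
        for t, w in zip(lifetimes, intensities):
            acc += w
            if r < acc:
                tau = t
                break
        # Draw an annihilation time from that component's decay.
        time = rng.expovariate(1.0 / tau)
        b = int(time / bin_width)
        if b < n_bins:
            hist[b] += 1
    return hist
```

The inverse problem the thesis addresses is recovering `lifetimes` and `intensities` from `hist` alone, which is ill-conditioned when component lifetimes are close together.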
  • Kovanen, Veikko (2020)
    Real estate appraisal, or property valuation, requires strong expertise to perform successfully and is therefore a costly service to produce. However, with structured data on historical transactions, the use of machine learning (ML) enables automated, data-driven valuation which is instant, virtually costless and potentially more objective than traditional methods. Yet, fully ML-based appraisal is not widely used in real business applications, as the existing solutions are not sufficiently accurate and reliable. In this study, we introduce an interpretable ML model for real estate appraisal using hierarchical linear modelling (HLM). The model is trained and tested on an empirical dataset of apartment transactions in the Helsinki area, collected during the past decade. As a result, we introduce a model with competitive predictive performance that is simultaneously explainable and reliable. The main outcome of this study is the observation that hierarchical linear modelling is a highly promising approach for automated real estate appraisal. The key advantage of HLM over alternative learning algorithms is its balance of performance and simplicity: this algorithm is complex enough to avoid underfitting but simple enough to be interpretable and easy to productize. In particular, the ability of these models to output complete probability distributions quantifying the uncertainty of the estimates makes them suitable for actual business use cases where high reliability is required.
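The core idea behind hierarchical linear modelling, as used in the abstract above, is partial pooling: group-level estimates (e.g. per-neighbourhood mean prices) are shrunk toward the global mean, with less shrinkage for groups that have more data. A minimal sketch of that estimator (the function name, the fixed variance ratio, and the grouping are illustrative assumptions, not the thesis's actual model):

```python
def partial_pooled_means(groups, var_ratio=1.0):
    """Shrink each group's mean toward the grand mean.
    `groups` maps a group key (e.g. neighbourhood) to a list of values;
    `var_ratio` is an assumed ratio of within- to between-group variance."""
    all_vals = [v for vals in groups.values() for v in vals]
    grand = sum(all_vals) / len(all_vals)
    pooled = {}
    for g, vals in groups.items():
        n = len(vals)
        group_mean = sum(vals) / n
        # Larger groups get more weight on their own mean (less shrinkage).
        w = n / (n + var_ratio)
        pooled[g] = w * group_mean + (1.0 - w) * grand
    return pooled
```

In a full HLM the variance components are estimated from the data rather than fixed, and the same machinery yields the predictive distributions (not just point estimates) that the abstract highlights.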