
Browsing by master's degree program "Datatieteen maisteriohjelma"


  • Bouri, Ioanna (2019)
    In model selection, it is necessary to select a model from a set of candidate models based on some observed data. The model should fit the data well, but without being overly complex, since excess complexity prevents the model from generalizing well to unseen data. Information criteria are widely used model selection methods that estimate a score for each candidate model and use that score to make a selection. A common way of estimating such a score rewards the candidate model for its goodness of fit on the observed data and penalizes it for model complexity. Many popular information criteria, such as Akaike's Information Criterion (AIC) and the Bayesian Information Criterion (BIC), penalize model complexity by the feature dimension. However, in a non-standard setting with inherent dependencies, these criteria are prone to over-penalizing the complexity of the model. Motivated by this tendency to over-penalize, we evaluate AIC and BIC in a multi-target setting with correlated features. We compare AIC and BIC with the Fisher Information Criterion (FIC), a criterion that takes correlations amongst features into consideration and does not penalize model complexity solely by the feature dimension of the candidate model. We evaluate the feature selection and predictive performance of the three information criteria in a linear regression setting with correlated features, measuring the precision, recall and F1 score of the set of features each criterion selects against the feature set of the generative model. Under this setting's assumptions, we find that FIC yields the best results, compared to AIC and BIC, in both the feature selection and the predictive performance evaluation. Finally, using FIC's properties for feature selection, we derive a formulation that allows us to approximate the effective feature dimension of models with correlated features in linear regression settings.
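    For context, the two baseline criteria penalize complexity through the raw parameter count alone; with $k$ free parameters, $n$ observations, and maximized likelihood $\hat{L}$, the standard definitions are:

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad \mathrm{BIC} = k\ln n - 2\ln\hat{L}
```

    It is exactly this dependence on $k$ alone that FIC relaxes when features are correlated.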
  • Melkas, Laila (2021)
    Multiple algorithms exist for the detection of causal relations from observational data, but they are limited by their required assumptions regarding the data or by the available computational resources. Only a limited amount of information can be extracted from finite data, but domain experts often have some knowledge of the underlying processes. We propose combining an expert's prior knowledge with data likelihood to find models with high posterior probability. Our high-level procedure for interactive causal structure discovery contains three modules: discovery of initial models, navigation in the space of causal structures, and validation for model selection and evaluation. We present one way of formulating the problem and implementing the approach, assuming a rational, Bayesian expert; this assumption is also used to model the user in simulated experiments. The expert navigates greedily in the structure space, using their prior information and the structures' fit to data to find a local maximum a posteriori structure. Existing algorithms provide initial models for the navigation. Through simulated user experiments with synthetic data and use cases with real-world data, we find that the results of causal analysis can be improved by adding prior knowledge. Additionally, different initial models can lead the expert to different causal models, and model validation helps detect overfitting and concept drift.
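    A minimal sketch of the greedy navigation step, assuming hypothetical `neighbors`, `log_prior`, and `log_likelihood` functions supplied by the surrounding system (illustrative only, not the thesis implementation):

```python
# Greedy hill climbing in the space of causal structures: move to an
# improving one-edge-edit neighbor until a local maximum a posteriori
# structure is reached.
def greedy_map_search(initial_dag, neighbors, log_prior, log_likelihood):
    current = initial_dag
    current_score = log_prior(current) + log_likelihood(current)
    improved = True
    while improved:
        improved = False
        for candidate in neighbors(current):  # DAGs one edge-edit away
            score = log_prior(candidate) + log_likelihood(candidate)
            if score > current_score:
                current, current_score = candidate, score
                improved = True
    return current
```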
  • Ylitalo, Markku (2023)
    This Master’s thesis covers the visualization process of the Finnish housing and mortgage markets, following Tamara Munzner’s nested visualization process model [25]. The work was implemented as an assignment for the Bank of Finland, the national monetary authority and central bank of Finland. The thesis includes a literature survey, in which the different stages of the visualization task are related to previous studies, and an experimental part that describes the implementation of the visualization ensemble: an encompassing collection of interactive dashboard sheets on the Finnish housing and mortgage markets, built as a supporting analysis tool for the economists of the Bank of Finland. The domain aspects of the visualization task were validated with an end-user survey of the Bank of Finland's economists. Nearly a hundred open answers were collected and processed, from which the fundamental guidelines of the desired end product were formed. By following these guidelines and building on the know-how of previous studies, the concrete visualization task was completed successfully. According to the gathered feedback, the visualization ensemble met the expectations of the end users comprehensively and fulfilled its essential purpose as a macroeconomic analysis tool laudably. ACM Computing Classification System (CCS): Human-centered computing → Visualization → Visualization techniques; Human-centered computing → Visualization → Empirical studies in visualization
  • Ersalan, Muzaffer Gür (2019)
    In this thesis, convolutional neural networks (CNNs) and inverse mathematics methods are discussed for automated defect detection in materials used for radiation detectors. The first part of the thesis is dedicated to a literature review of the methods used: a general overview of neural networks, computer vision algorithms, and inverse mathematics methods such as wavelet transformations and total variation denoising. The Materials and Methods section examines how these methods can be utilized in this problem setting, and the Results and Discussion section presents the outcomes and takeaways from the experiments. The focus of the thesis is on the CNN architecture that best fits the task, how to optimize that chosen architecture, and how selected inputs created by inverse mathematics influence the neural network and its performance. The results of this research reveal that the initially chosen RetinaNet is well suited for the task, and that the inverse mathematics methods utilized in this thesis provided useful insights.
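    As a concrete illustration of one inverse-mathematics method named above, total variation denoising can be applied to an image before it is fed to a network; a minimal sketch on placeholder data (not the thesis pipeline):

```python
# Total variation denoising as an input-preparation step; the weight and the
# random stand-in image are placeholders, not the thesis configuration.
import numpy as np
from skimage.restoration import denoise_tv_chambolle

rng = np.random.default_rng(0)
image = rng.random((128, 128))                       # stand-in detector-material image
denoised = denoise_tv_chambolle(image, weight=0.1)   # larger weight => smoother result
```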
  • Niemi, Roope Oskari (2022)
    DeepRx is a deep learning receiver which replaces much of the functionality of a traditional 5G receiver. It is a deep model which uses residual connections and a fully convolutional architecture to process an incoming signal, and it outputs log-likelihood ratios for each bit. However, the deep model can be computationally too heavy to use in a real environment. Nokia Bell Labs has recently developed an iterative version of DeepRx, where a model with fewer layers is used iteratively. This thesis focuses on developing a neural network which determines how many iterations the iterative DeepRx needs to use. We trained a separate neural network, the stopping condition neural network, which is used together with the iterative model. It predicts the number of iterations the model requires to process the input correctly, with the aim that each inference uses as few iterations as possible. The model also stops the inference early if it predicts that the required number of iterations is greater than the allowed maximum. Our results show that an iterative model with a stopping condition neural network has significantly fewer parameters than the deep model. The results also show that while the stopping condition neural network could predict with high accuracy which samples could be decoded, using it also increased the uncoded bit error rate of the iterative model slightly. Therefore, using a stopping condition neural network together with an iterative model seems to be a flexible, lightweight alternative to the DeepRx model.
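    A minimal sketch of the early-stopping idea, with hypothetical `stop_net` and `iter_block` callables standing in for the trained networks (not the Bell Labs implementation):

```python
# The stopping-condition network estimates how many iterations the shallow
# receiver block needs; inference bails out early when the estimate exceeds
# the allowed maximum.
def iterative_receive(signal, stop_net, iter_block, max_iters=8):
    predicted = stop_net(signal)      # integer estimate of iterations needed
    if predicted > max_iters:
        return None                   # early exit: sample likely undecodable
    state = signal
    for _ in range(predicted):
        state = iter_block(state)     # one pass of the shallow model
    return state                      # log-likelihood ratios per bit
```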
  • Holmberg, Daniel (2022)
    The LHC particle accelerator at CERN probes the elementary building blocks of matter by colliding protons at a center-of-mass energy of √s = 13 TeV. When quarks and gluons are produced at high energies, they give rise to collimated sprays of particles, which are reconstructed from measured data and clustered together into jets. Accurate measurements of the energy of jets are paramount for sensitive particle physics analyses at the CMS experiment. Jet energy corrections are therefore used to map measurements towards Monte Carlo simulated truth values, which are independent of detector response. The aim of this thesis is to improve upon the standard jet energy corrections by utilizing deep learning. Recent advancements in learning from point clouds in the machine learning community have been adopted in particle physics studies to improve jet flavor classification accuracy. This includes representing jet constituents as an unordered set, a so-called “particle cloud”. Two highly performant models suitable for such data are the set-based Particle Flow Network and the graph-based ParticleNet. A natural next step in the advancement of jet energy corrections is to adopt a similar methodology, changing only the problem statement from classification to regression. The deep learning models developed in this work provide energy corrections that are generically applicable to differently flavored jets. Their performance is presented in the form of jet energy response resolution and reduction in flavor dependence. The models achieve state-of-the-art performance on both metrics, significantly surpassing the standard corrections benchmark.
  • Martikainen, Jussi-Pekka (2019)
    Wood is the fuel of the forest industry. Fellable wood is collected from the forests and requires transportation to the mills, often over very long distances. The most common means of long-distance wood transportation in Finland is by road, with timber trucks. The poor condition of the lower road network increases transportation costs not only for the forest industry but for the whole natural resources industry. Timely information about the conditions of the lower road network is considered beneficial for wood transportation and for road maintenance planning, reducing transportation-related costs. Acquiring timely information about the conditions of the lower road network is a laborious challenge for industry specialists due to the vast size of the road network in Finland. Until the recent developments in ubiquitous mobile computing, collecting road measurement data and detecting certain road anomalies from the measurements required expensive and specialized equipment. Crowdsensing with the capabilities of a modern smartphone is seen as an inexpensive means, with high potential, of acquiring timely information about the conditions of the lower road network. In this thesis, a literature review is conducted to find out the deteriorative factors behind the conditions of the lower road network in Finland, and initial assumptions are drawn about the detectability of such factors from the inertial sensor data of a smartphone. The literature on different computational methods for detecting road anomalies from the obtained accelerometer and gyroscope measurements is reviewed. As a result, a summary of the usability of the reviewed computational methods for detecting the reviewed deteriorative factors is presented. Finally, suggestions are made for further analysis, for obtaining more training data for machine learning methods, and for predicting road conditions.
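    One of the simplest families of computational methods reviewed for this task thresholds deviations in the accelerometer signal; a hedged sketch (parameters and method are illustrative, not the thesis' recommendation):

```python
# Flag candidate road anomalies where vertical acceleration deviates from its
# local mean by more than k standard deviations of the residual.
import numpy as np

def detect_anomalies(accel_z, window=50, k=3.0):
    """Return indices where |a_z - local mean| exceeds k residual std devs."""
    kernel = np.ones(window) / window
    local_mean = np.convolve(accel_z, kernel, mode="same")
    residual = accel_z - local_mean
    threshold = k * residual.std()
    return np.flatnonzero(np.abs(residual) > threshold)
```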
  • Moisio, Mikko (2021)
    Semantic textual similarity (STS), the procedure of determining how similar pieces of text are in terms of their meaning, is an important problem in the rapidly evolving field of natural language processing (NLP). STS accelerates major information retrieval applications dealing with natural language text, such as web search engines. For computational efficiency, text pieces are often encoded into semantically meaningful real-valued vectors, sentence embeddings, that can be compared with similarity metrics. The majority of recent NLP research has focused on a small set of the largest Indo-European languages and Chinese. Although much of the research is machine learning oriented and thus often applicable across languages, languages with smaller speaker populations, such as Finnish, often lack the annotated data required to train, or even evaluate, complex models. BERT, a language representation framework building on transfer learning, is one of the recent quantum leaps in NLP research. BERT-type models take advantage of unsupervised pre-training, reducing annotated data demands for supervised tasks. Furthermore, a BERT modification called Sentence-BERT enables us to extend and train BERT-type models to derive semantically meaningful sentence embeddings. However, although the annotated data demands for conventionally training a Sentence-BERT are relatively low, even such data is often unavailable for low-resourced languages. Multilingual knowledge distillation has been shown to be a working strategy for extending monolingual Sentence-BERT models to new languages. This technique allows transferring and merging the desired properties of two language models and, instead of annotated data, consumes bilingual parallel samples. In this thesis we study using knowledge distillation to transfer STS properties learnt from English into a model pre-trained on Finnish, bypassing the lack of annotated Finnish data. Further, we experiment with distillation using different types of data, English-Finnish bilingual, English monolingual and random pseudo samples, to observe which properties of the training data are really necessary. We acquire a bilingual English-Finnish test dataset by translating an existing annotated English dataset and use this set to evaluate the fit of our resulting models. We evaluate the performance of the models on different tasks, English, Finnish and English-Finnish cross-lingual STS, to observe how well the properties being transferred are captured, and how well the models retain the desired properties they already have. We find that knowledge distillation is indeed a feasible approach for obtaining a relatively high-quality Sentence-BERT for Finnish. Surprisingly, in all setups a large portion of the desired properties are transferred to the Finnish model, and training with English-Finnish bilingual data yields the best Finnish sentence embedding model we are aware of.
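    A minimal sketch of the multilingual distillation objective: the student is trained so that its embeddings of an English sentence and of its Finnish translation both match the frozen teacher's English embedding. The model handles here are hypothetical and assumed to expose a differentiable `encode` returning tensors:

```python
import torch
import torch.nn.functional as F

def distillation_loss(teacher, student, english_batch, finnish_batch):
    """MSE distillation over a bilingual parallel batch."""
    with torch.no_grad():
        target = teacher.encode(english_batch)   # frozen teacher embeddings
    loss_en = F.mse_loss(student.encode(english_batch), target)
    loss_fi = F.mse_loss(student.encode(finnish_batch), target)
    return loss_en + loss_fi
```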
  • Saukkoriipi, Mikko (2022)
    Two factors define the success of a deep neural network (DNN) based application: the training data and the model. Nowadays, many state-of-the-art DNN models are available free of charge, and training and deploying them is easier than ever before. As a result, anyone can set up a state-of-the-art DNN algorithm within days or even hours. In the past, most of the focus was given to the model, as researchers built faster and more accurate deep learning architectures. These research groups commonly use large, high-quality datasets in their work, which is not the case when one wants to train a new model for a specific use case. Training a DNN algorithm for a specific task requires collecting a vast amount of unlabelled data and then labeling the training data. To train a high-performance model, the labeled training dataset must be large and diverse enough to cover all relevant scenarios of the intended use case. This thesis presents an efficient and straightforward active learning method for sampling the most informative images to train a powerful anchor-free, Intersection over Union (IoU) predicting object detector. Our method uses only classification confidences and IoU predictions to estimate image informativeness. By collecting the most informative images, we can cover the whole diversity of the images with fewer human-annotated training images. This saves time and resources, as we avoid labeling images that would not be beneficial.
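    A hedged sketch of the selection step, assuming per-detection confidences and IoU predictions are already available per image; the scoring rule below is a plausible stand-in, not necessarily the thesis' exact formula:

```python
# Images whose detections have low classification confidence and low predicted
# IoU are treated as most informative and selected for annotation first.
import numpy as np

def informativeness(confidences, predicted_ious):
    """Per-image score; lower model certainty => higher informativeness."""
    certainty = np.asarray(confidences) * np.asarray(predicted_ious)
    return 1.0 - certainty.mean() if certainty.size else 1.0

def select_for_labeling(pool, k):
    """pool: list of dicts with per-detection 'conf' and 'iou' arrays."""
    scored = sorted(pool, key=lambda im: informativeness(im["conf"], im["iou"]),
                    reverse=True)
    return scored[:k]
```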
  • Lehtoranta, Selina (2020)
    This thesis was carried out at the request of a Finnish food and logistics company, whose main goal was to answer the question "Can the supply chain temperature be used for the reception temperature measurement required of the delivery customer?" The thesis presents and applies two different clustering techniques: k-means clustering and the EM algorithm for Gaussian mixture models. The two techniques are briefly compared to determine which of them is better suited to this kind of study. In addition to the conventional EM-GMM approach, the EM algorithm is applied to Gaussian mixture models with the help of principal component analysis. Finally, the research question is answered using relative change.
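    A minimal sketch of the two clustering techniques, plus the PCA-assisted EM-GMM variant, on placeholder data using standard scikit-learn components rather than the thesis code:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

X = np.random.default_rng(0).normal(size=(500, 8))  # stand-in temperature features

km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
gmm_labels = GaussianMixture(n_components=2, random_state=0).fit_predict(X)

X_pca = PCA(n_components=2).fit_transform(X)        # PCA before EM-GMM
gmm_pca_labels = GaussianMixture(n_components=2, random_state=0).fit_predict(X_pca)
```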
  • Huttunen, Mika (2021)
    Predicting the future price formation of a security is of interest both to investors and to market participants trading actively. If future price formation can be predicted with sufficient accuracy, a market participant can buy a security before a potential rise in its market price, or hedge a portfolio that already holds it if there is a risk that the market price will fall significantly over time. In this thesis I study the application of machine learning to technical analysis. I investigate whether the future price formation of a market or a security can be predicted over a short horizon with sufficient accuracy based on technical analysis. I describe how securities markets operate and how the future balance of supply and demand in a market can be estimated using technical analysis. I also provide background on the technical analysis indicators and machine learning methods used in my own experiments and review earlier research on the problem. I found that predicting the future price formation of markets is challenging. With the supervised learning methods used, I did not succeed in generating a model that could predict, for the S&P 500 stock index, whether at the end of the short interval following a given point in time the market price would be higher than, or at most as high as, at the time of observation. The trained models achieved at best only 50.8–51.4% prediction accuracy, whereas a naive classifier that always predicts the market price to have risen by the end of each interval achieves 53.0% accuracy. The results for the wheat futures market were more promising: the trained models achieved at best 51.7–52.5% prediction accuracy on the aforementioned problem, exceeding the naive classifier's 50.9% accuracy. I analyze the results and suggest directions for further research to improve the models.
  • Lampinen, Sebastian (2022)
    Modeling customer engagement assists a business in identifying its high-risk and high-potential customers. One way to define high-risk and high-potential customers in a Software-as-a-Service (SaaS) business is as customers with a high potential to churn or upgrade. Identifying these customers in time can help the business retain and grow revenue. This thesis uses churn and upgrade prediction classifiers to define a customer engagement score for a SaaS business. The classifiers used and compared in the research were logistic regression, random forest and XGBoost. The classifiers were trained using data from the case-company containing customer data such as user count and feature usage. To tackle class imbalance, the models were also trained with oversampled training data. The hyperparameters of each classifier were optimised using grid search. After training, the performance of the classifiers was evaluated on test data. In the end, the XGBoost classifiers outperformed the other classifiers in churn prediction. In predicting customer upgrades, the results were more mixed. Feature importances were also calculated, and the results showed that the importances differ between churn and upgrade prediction.
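    A hedged sketch of the training setup on synthetic data; the feature set, parameter grid, and oversampler choice are placeholders, not the case-company's configuration:

```python
# Oversample the minority class, grid-search hyperparameters, and evaluate
# one of the compared classifiers (XGBoost) on held-out test data.
from imblearn.over_sampling import RandomOverSampler
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, weights=[0.9], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
X_res, y_res = RandomOverSampler(random_state=0).fit_resample(X_train, y_train)

grid = GridSearchCV(
    XGBClassifier(eval_metric="logloss"),
    param_grid={"max_depth": [3, 5], "n_estimators": [100, 300]},
    scoring="roc_auc", cv=5,
)
grid.fit(X_res, y_res)
print(grid.best_params_, grid.score(X_test, y_test))
```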
  • Hytönen, Jimi (2022)
    In recent years, significant progress has been made in computer vision regarding object detection and tracking, which has allowed the emergence of various applications. These often focus on identifying and tracking people in different environments such as buildings. Detecting people allows us to get a more comprehensive view of people flow, as traditional IoT data from elevators cannot track individual people and their trajectories. In this thesis, we concentrate on people detection in elevator lobbies, which can be used to improve the efficiency of the elevators and the convenience of the building. We compare the performance and speed of various object detection algorithms. Additionally, we study an edge device's capability to run an object detection model on multiple cameras and whether a single device can cover the target building. We were able to train an object detection algorithm suitable for our application, allowing accurate people detection that can be used for people counting. We found that, of the three object detection algorithms we trained, YOLOv3 was the only one capable of generalizing to unseen environments, which is essential for a general-purpose application. The performance of the other two models (SSD and Faster R-CNN) was poor in terms of either accuracy or speed. Based on these results, we chose to deploy YOLOv3 to the edge device. We found that the edge device's inference time is linearly dependent on the number of cameras. Therefore, we can conclude that one edge device should be sufficient for our target building, allowing two cameras for each floor. We also demonstrated that the edge device allows easy addition of an object tracking layer, which is required for the solution to work in a real-life office building.
  • Muiruri, Dennis (2021)
    Ubiquitous sensing is transforming our societies and how we interact with our surrounding environment; sensors provide large streams of data, while machine learning techniques and artificial intelligence provide the tools needed to generate insights from the data. These developments have taken place in almost every industry sector, with topics such as smart cities and smart buildings becoming key topical issues as societies seek more sustainable ways of living. Smart buildings are the main context of this thesis. These are buildings equipped with various sensors used to collect data from the surrounding environment, allowing the building to adapt itself and increase its operational efficiency. Previously, most efforts in realizing smart buildings have focused on energy management and automation, where the goal is to reduce the costs associated with heating, ventilation, and air conditioning. A less studied area involves smart buildings and their indoor environments, especially relative to sub-spaces within a building. Developments in low-cost sensor technologies have created new opportunities to sense indoor environments in more granular ways, providing new possibilities to model finer attributes of spaces within a building. This thesis focuses on modeling indoor environment data obtained from a multipurpose building that serves primarily as a school. The aim is to explore the quality of the indoor environment relative to regulatory guidelines and to explore suitable predictive models for thermal comfort and indoor air quality. Additionally, design science methodology is applied in the creation of a proof-of-concept software system. This system demonstrates the use of Web APIs to provide sensor data to clients, which may use the data to render analytics, among other insights, to a building's stakeholders. Overall, the main technical contributions of this thesis are twofold: (i) a potential web-application design for indoor air quality IoT data and (ii) an exposition of modeling of indoor air quality data based on a variety of sensors and multiple spaces within the same building. Results indicate that a software-based tool supporting the monitoring of a building's indoor environment would be beneficial in maintaining the correct levels of various indoor parameters. Further, modeling data from different spaces within the building shows a need for heterogeneous models to predict variables in these spaces. This implies that the parameters used to predict thermal comfort and air quality differ across spaces, especially where the spaces differ in size, indoor climate control settings, and other attributes such as occupancy control.
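    A minimal sketch of the kind of Web API such a system could expose; the framework, route, and data shape are assumptions for illustration, not the thesis implementation:

```python
# Hypothetical endpoint serving the latest indoor-environment readings for a
# single space; the store would be fed by the building's sensor pipeline.
from fastapi import FastAPI

app = FastAPI()
latest_readings = {}  # space_id -> {"co2_ppm": ..., "temp_c": ..., "rh_pct": ...}

@app.get("/spaces/{space_id}/latest")
def get_latest(space_id: str):
    """Return the most recent sensor readings for one space in the building."""
    return latest_readings.get(space_id, {"error": "unknown space"})
```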
  • Noykova, Neli (2022)
    This work is focused on Bayesian hierarchical modeling of the geographical distribution of the marine species Coregonus lavaretus L. s.l. along the Gulf of Bothnia. Spatial dependences are modeled by Gaussian processes. The main modeling objective is to predict whitefish larvae distribution for previously unobserved spatial locations along the Gulf of Bothnia. To achieve this objective, we have to solve two main tasks: investigating the sensitivity of posterior parameter estimates with respect to different parameter priors, and solving the model selection task. In model selection, among all candidate models, we have to choose the model with the best predictive performance. The candidate models were divided into two main groups: models that describe spatial effects, and models without such a description. The candidates in each group involved different numbers (6 or 8) and expressions of environmental variables. In the group describing spatial effects, we analyzed four different models of the Gaussian mean, and for every mean model we used four different prior parameter combinations. The same four models of the latent function were used in the candidates where spatial dependences were not described; for every such model we assigned four different priors on the overdispersion parameter. Thus, all in all, 32 candidate models were analyzed. All candidate models were estimated with the Hamiltonian Monte Carlo MCMC algorithm. Model checks were conducted using the posterior predictive distributions. The predictive distributions were evaluated using the logarithmic score with 10-fold cross-validation. The analysis of posterior estimates in models describing spatial effects revealed that these estimates were very sensitive to the choice of priors. The provided sensitivity analysis helped us choose the most suitable prior combination. The results from model selection showed that the model with the best predictive performance does not need to be very complicated or to involve a description of spatial effects when the data are not informative enough to detect the spatial effects well. Although the selected model was simpler, the corresponding predictive maps of log larvae intensity correctly predicted the larvae distribution along the Gulf of Bothnia.
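    For context, the predictive criterion used above is the standard cross-validated logarithmic score; writing $s(i)$ for the fold containing observation $i$, it can be expressed as:

```latex
\mathrm{LogS} = \sum_{i=1}^{n} \log p\left(y_i \mid y_{-s(i)}\right)
```

    where $y_{-s(i)}$ denotes the data with observation $i$'s fold held out; higher scores indicate better predictive performance.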
  • Niska, Päivö (2024)
    This thesis delves into the complex world of multi-model database migration, investigating its theoretical foundations, strategic implementation, and implications for modern data management. The research uses a mixed-methods approach, combining quantitative benchmarking tests with qualitative insights from industry practitioners to build a comprehensive understanding of the migration process. The importance of smart migration techniques, as well as the crucial function of schema mapping in ensuring data consistency, are highlighted as key results. Success stories from a variety of industries highlight the practical relevance and advantages of multi-model database migration, while implications for theoretical advances and practical issues in organizational contexts are discussed. The strategic implementation framework leads businesses through rigorous project planning, schema mapping, and iterative optimization, stressing the joint efforts of multiple stakeholders. Future concerns include the influence of emerging technologies, the dynamic interaction between migration and data security, and industry-specific subtleties affecting migration tactics as the technological environment advances. The synthesis of these ideas contributes to a common knowledge base that shapes the discourse on data management strategy. This investigation serves as a road map for informed decision-making, iterative optimization, and continual adaptation in database management, developing a better understanding of multi-model database migration in the context of modern data ecosystems.
  • Chen, Cheng (2022)
    How to store data is an enduring topic in computer science; traditional relational databases have done this well and are still widely used today. However, with the growth of non-relational data and the challenges of the big data era, a series of NoSQL databases have come into view. Thus, comparing, evaluating, and choosing the better database has become a worthy topic of research. In this thesis, an experiment is designed that stores the same data set and executes the same tasks or workload on relational, graph and multi-model databases. The investigation proposes how to adapt relational data (tables) to a graph database and, conversely, how to store graph data in a relational database. Similarly, the tasks performed are unified across query languages. We conducted exhaustive experiments to compare and report the performance of the three databases. In addition, we propose a workload classification method to analyze the performance of the databases and compare multiple aspects of the databases from an end-user perspective. We selected PostgreSQL, ArangoDB and Neo4j as representatives. In terms of task execution time, no database wins outright. The results show that the relational database has performance advantages for tasks such as data import, but the execution of multi-table join tasks is slow and graph algorithm support is lacking. The multi-model database has impressive support for simultaneous storage of multiple data formats and unified query language, but its performance is not outstanding. The graph database has strong graph algorithm support and intuitive support for a graph query language, but it is also important to consider whether the format and interrelationships of the original data can be converted well into graph form.
  • Gierlach, Mateusz Tadeusz (2020)
    Visual fashion understanding (VFU) is a discipline which aims to solve tasks related to clothing recognition, such as garment categorization, garment attribute prediction or clothes retrieval, with the use of computer vision algorithms trained on fashion-related data. Having surveyed the VFU-related scientific literature, I conclude that, because at the heart of all VFU tasks is the same issue of visually understanding garments, those VFU tasks are in fact related. I present a hypothesis that building larger multi-task learning models dedicated to predicting multiple VFU tasks at once might lead to better generalization properties of VFU models. I assess the validity of my hypothesis by implementing two deep learning solutions dedicated primarily to category and attribute prediction. The first solution uses the multi-task learning concept of sharing features from an additional branch dedicated to the localization task of predicting landmarks' positions; the second solution does not share knowledge from the localization branch. Comparison of the two implementations confirmed my hypothesis, as sharing knowledge between tasks increased category prediction accuracy by 53% and attribute prediction recall by 149%. I conclude that multi-task learning improves the generalization properties of deep learning-based visual fashion understanding models across tasks.
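    An illustrative sketch of the hard-parameter-sharing idea above; the architecture details are placeholders, not the thesis model: one shared backbone feeds a category head, an attribute head, and a landmark-localization head.

```python
import torch.nn as nn

class MultiTaskVFU(nn.Module):
    def __init__(self, n_categories, n_attributes, n_landmarks):
        super().__init__()
        self.backbone = nn.Sequential(            # stand-in feature extractor
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.category_head = nn.Linear(32, n_categories)
        self.attribute_head = nn.Linear(32, n_attributes)
        self.landmark_head = nn.Linear(32, 2 * n_landmarks)  # (x, y) per landmark

    def forward(self, x):
        features = self.backbone(x)               # shared across all tasks
        return (self.category_head(features),
                self.attribute_head(features),
                self.landmark_head(features))
```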
  • Hätönen, Vili (2020)
    Recently it has been shown that sparse neural networks perform better than dense networks with a similar number of parameters. In addition, large overparameterized networks have been shown to contain sparse subnetworks which, when trained in isolation, reach or exceed the performance of the large model. However, methods to explain the success of sparse networks are still lacking. In this work I study the performance of sparse networks using the network’s activation regions and patterns, concepts from the neural network expressivity literature. I define network specialization, a novel concept that considers how distinctly a feed-forward neural network (FFNN) has learned to process high-level features in the data. I propose the Minimal Blanket Hypervolume (MBH) algorithm to measure the specialization of an FFNN. It finds parts of the input space that the network associates with some user-defined high-level feature, and compares their hypervolume to the hypervolume of the input space. My hypothesis is that sparse networks specialize more to high-level features than dense networks with the same number of hidden network parameters. Network specialization and MBH also contribute to the interpretability of deep neural networks (DNNs). The capability to learn representations on several levels of abstraction is at the core of deep learning, and MBH enables numerical evaluation of how specialized an FFNN is w.r.t. any abstract concept (a high-level feature) that can be embodied in an input. MBH can be applied to FFNNs in any problem domain, e.g. visual object recognition, natural language processing, or speech recognition. It also enables comparison between FFNNs with different architectures, since the metric is calculated in the common input space. I test different pruning and initialization scenarios on the MNIST Digits and Fashion datasets. I find that sparse networks approximate more complex functions, exploit redundancy in the data, and specialize to high-level features better than dense, fully parameterized networks with the same number of hidden network parameters.
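    A minimal sketch of the activation-pattern notion used above, on a toy ReLU network (not the thesis code): each input yields a binary pattern recording which units fire, and inputs sharing a pattern lie in the same linear activation region.

```python
import numpy as np

def activation_pattern(x, weights, biases):
    """Binary ReLU on/off pattern of a feed-forward network for input x."""
    pattern = []
    h = x
    for W, b in zip(weights, biases):
        pre = W @ h + b
        pattern.append(pre > 0)          # which units are active in this layer
        h = np.maximum(pre, 0)           # ReLU
    return np.concatenate(pattern)
```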
  • Li, Yinong (2024)
    The thesis is about developing a new neural network-based simulation-based inference (SBI) method for performing flexible point estimation; we call this method Neural Amortization of Bayesian Point Estimation (NBPE). Firstly, using neural networks, we can achieve amortized inference, so that most of the computation cost is spent on training the neural network while performing inference only costs a few milliseconds. In this thesis, we utilize an encoder-decoder architecture: the encoder serves as a summary network that extracts informative features from raw data, which are fed to the decoder, an inference network that outputs point estimations. Moreover, with a novel training method, a variable $\alpha$ in the loss function $|\theta_i - \theta_{\text{pred}}|^\alpha$ enables the prediction of different statistics (mean, median, mode) of the posterior distribution. Thus, with our method, we get a fast point estimation at inference time, and to obtain different statistics of the posterior we only have to specify the value of the power of the loss, $\alpha$. When $\alpha = 2$, the result will be the mean; when $\alpha = 1$, the result will be the median; and as $\alpha$ approaches 0, the result will approach the mode. We conducted comprehensive experiments on both toy and simulator models to demonstrate these features. In the first part of the analysis, we focused on testing the accuracy and efficiency of our method, NBPE. We compared it to the established method called Neural Posterior Estimation (NPE) in the BayesFlow SBI software. NBPE performs with competitive accuracy compared to NPE and can perform faster inference than NPE. In the second part of the analysis, we concentrated on the flexible point estimation capabilities of NBPE. We conducted experiments on three conjugate models, since the posterior mean, median, and mode of most of these models have analytical expressions, which allows more straightforward analysis. The results show that at inference time the different choices of $\alpha$ influence the output as intended, and the results align with our expectations. In summary, in this thesis we propose a new neural SBI method, NBPE, that can perform fast, accurate, and flexible point estimation, broadening the application of SBI in downstream tasks of Bayesian inference.
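    For context, the statistics recovered by minimizing the expected loss above follow a standard fact from Bayesian decision theory:

```latex
\hat{\theta}_\alpha = \arg\min_{t}\; \mathbb{E}_{\theta \sim p(\theta \mid x)}\, |\theta - t|^{\alpha},
\qquad
\hat{\theta}_{2} = \text{posterior mean}, \quad
\hat{\theta}_{1} = \text{posterior median}, \quad
\lim_{\alpha \to 0^{+}} \hat{\theta}_\alpha = \text{posterior mode}.
```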