
Browsing by Subject "machine learning"


  • Lehtonen, Leevi (2021)
    Quantum computing has enormous potential in machine learning, where problems can quickly scale to be intractable for classical computation. A Boltzmann machine is a well-known energy-based graphical model suitable for various machine learning tasks. Plenty of work has already been conducted on realizing Boltzmann machines in quantum computing, with somewhat different characteristics in each approach. In this thesis, we conduct a survey of the state of the art in quantum Boltzmann machines and their training approaches. Primarily, we examine the variational quantum Boltzmann machine, a variant suited to near-term quantum hardware. Moreover, as the variational quantum Boltzmann machine relies heavily on variational quantum imaginary time evolution, we also analyze variational quantum imaginary time evolution in depth. Compared to previous work, we evaluate the execution of variational quantum imaginary time evolution with a more comprehensive collection of hyperparameters. Furthermore, we train variational quantum Boltzmann machines on the toy problem of bars and stripes, which represents a more multimodal probability distribution than the Bell states and the Greenberger-Horne-Zeilinger states considered in earlier studies.
  • Hämäläinen, Kreetta (2021)
    Personalized medicine tailors therapies for the patient based on predicted risk factors. Genetics and metabolomics are among the tools used for making predictions on the safety and efficacy of drugs. This thesis focuses on identifying biomarkers for the activity level of the drug transporter organic anion transporting polypeptide 1B1 (OATP1B1) from data acquired through untargeted metabolite profiling. OATP1B1 transports various drugs, such as statins, from portal blood into the hepatocytes. OATP1B1 is a genetically polymorphic influx transporter, which is expressed in human hepatocytes. Statins are low-density lipoprotein cholesterol-lowering drugs, and decreased or poor OATP1B1 function has been shown to be associated with statin-induced myopathy. Based on genetic variability, individuals can be classified as having normal, decreased or poor OATP1B1 function. These activity classes were employed to identify metabolomic biomarkers for OATP1B1. To find the most efficient way to predict the activity level and to find the biomarkers associated with it, five machine learning models were tested on a dataset of 356 fasting blood samples with 9152 metabolite features. The models included a random forest regressor and classifier, a gradient boosted decision tree regressor and classifier, and a deep neural network regressor. Hindrances specific to this type of data were the collinearity between the features and the large number of features compared to the number of samples, which led to issues in determining the important features of the neural network model. To adjust to this, the features were clustered according to their Spearman rank-order correlations. Feature importances were calculated using two methods: for the neural network, permutation feature importance using mean squared error; for the random forest and gradient boosted decision trees, Gini impurity. The performance of each model was measured, and all classifiers had a poor ability to predict the decreased and poor function classes. All regressors performed very similarly to each other. The gradient boosted decision tree regressor performed best by a slight margin, but the random forest regressor and neural network regressor performed nearly as well. The best features from all three models were cross-referenced with the features found by y-aware PCA analysis. The y-aware PCA analysis indicated that the 14 best features cover 95% of the explained variance, so 14 features were picked from each model and cross-referenced with each other. Cross-referencing the highest-scoring features reported by the best models revealed multiple features that showed up as important in many models. Taken together, machine learning methods provide powerful tools to identify potential biomarkers from untargeted metabolomics data.
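Below is a minimal Python sketch of permutation feature importance scored with mean squared error, the approach described above for the neural network regressor; the synthetic data, feature count and MLP settings are illustrative assumptions, not the thesis code.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(356, 20))             # e.g. clustered metabolite features (placeholder)
y = 2.0 * X[:, 0] + rng.normal(size=356)   # synthetic activity score

model = MLPRegressor(hidden_layer_sizes=(64,), max_iter=500, random_state=0).fit(X, y)
baseline = mean_squared_error(y, model.predict(X))

importances = []
for j in range(X.shape[1]):
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])   # break the feature-target link
    importances.append(mean_squared_error(y, model.predict(X_perm)) - baseline)

# A larger increase in MSE after permutation means a more important feature.
print(np.argsort(importances)[::-1][:5])
```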
  • Kahilakoski, Marko (2022)
    Various Denial of Service (DoS) attacks are a common phenomenon on the Internet. They can consume server resources, congest networks, disrupt services, or even halt systems. Many machine learning approaches attempt to detect and prevent attacks at multiple levels of abstraction. This thesis examines and reports different aspects of creating and using a dataset for machine learning purposes to detect attacks in a web server environment. We describe the problem field, the origins and reasons behind the attacks, their typical characteristics, and various types of attacks. We detail ways to mitigate the attacks and provide a review of current benchmark datasets. For the dataset used in this thesis, network traffic was captured in a real-world setting and flow records were labeled. The experiments performed include selecting important features, comparing two supervised learning algorithms, and observing how a classifier model trained on network traffic from a specific date performs in detecting new malicious records over time in the same environment. The model was also tested with a recent benchmark dataset.
  • Huertas, Andres (2020)
    Investment funds are continuously looking for new technologies and ideas to enhance their results. Lately, with the success observed in other fields, wealth managers are taking a closer look at machine learning methods. Even if the use of ML is not entirely new in finance, leveraging new techniques has proved to be challenging and few funds succeed in doing so. The present work explores the use of reinforcement learning algorithms for portfolio management in the stock market. The stochastic nature of stock prices is well known, and aiming to predict the market is unrealistic; nevertheless, the question of how to use machine learning to find useful patterns in the data that enable small market edges remains open. Based on the ideas of reinforcement learning, a portfolio optimization approach is proposed. RL agents are trained to trade in a stock exchange, using portfolio returns as rewards for their RL optimization problem, thus seeking optimal resource allocation. For this purpose, a set of 68 stock tickers in the Frankfurt exchange market was selected, and two RL methods were applied, namely Advantage Actor-Critic (A2C) and Proximal Policy Optimization (PPO). Their performance was compared against three commonly traded ETFs (exchange-traded funds) to assess the algorithms' ability to generate returns compared to real-life investments. Both algorithms were able to achieve positive returns in a year of testing (5.4% and 9.3% for A2C and PPO respectively; a European ETF (VGK, Vanguard FTSE Europe Index Fund) reported 9.0% returns for the same period) as well as healthy risk-to-return ratios. The results do not aim to be financial advice or trading strategies, but rather explore the potential of RL for studying small to medium size stock portfolios.
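As an illustration of the reward design described above, the following Python sketch defines a toy portfolio environment in which the action is a weight vector over assets and the reward is the one-step portfolio return; the price data, the long-only constraint and the random policy are assumptions for the example, not the thesis implementation.

```python
import numpy as np

class PortfolioEnv:
    """Toy environment: the action is a weight vector over assets and the
    reward is the resulting one-step portfolio return."""

    def __init__(self, prices):
        self.prices = prices      # shape (T, n_assets)
        self.t = 0

    def reset(self):
        self.t = 0
        return self.prices[self.t]

    def step(self, weights):
        weights = np.clip(weights, 0.0, None)
        weights = weights / (weights.sum() + 1e-12)         # long-only, fully invested
        asset_returns = self.prices[self.t + 1] / self.prices[self.t] - 1.0
        reward = float(weights @ asset_returns)             # portfolio return = RL reward
        self.t += 1
        done = self.t >= len(self.prices) - 1
        return self.prices[self.t], reward, done

# Random-policy rollout, just to show the interface an RL agent (A2C/PPO) would use.
rng = np.random.default_rng(1)
prices = np.cumprod(1 + rng.normal(0, 0.01, size=(250, 68)), axis=0)
env = PortfolioEnv(prices)
obs, done, total = env.reset(), False, 0.0
while not done:
    obs, reward, done = env.step(rng.random(68))
    total += reward
print(f"cumulative reward (sum of daily returns): {total:.3f}")
```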
  • Romppainen, Jonna (2020)
    Surface diffusion in metals can be simulated with the atomistic kinetic Monte Carlo (KMC) method, where the evolution of a system is modeled by successive atomic jumps. The parametrisation of the method requires calculating the energy barriers of the different jumps that can occur in the system, which poses a limitation to its use. A promising solution to this is offered by machine learning methods, such as artificial neural networks, which can be trained to predict barriers based on a set of pre-calculated ones. In this work, an existing neural network based parametrisation scheme is enhanced by expanding the atomic environment of the jump to include more atoms. A set of surface diffusion jumps was selected and their barriers were calculated with the nudged elastic band method. Artificial neural networks were then trained on the calculated barriers. Finally, KMC simulations of nanotip flattening were run using barriers predicted by the neural networks. The simulations were compared to the KMC results obtained with the existing scheme. The additional atoms in the jump environment caused significant changes to the barriers, which cannot be described by the existing model. The trained networks also showed good prediction accuracy. However, the KMC results were in some cases as realistic as or more realistic than the previous results, but often worse. The quality of the results also depended strongly on the selection of training barriers. We suggest that, for example, active learning methods could be used in the future to select the training data optimally.
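The following Python sketch shows one kinetic Monte Carlo step in which the jump rates are derived from (for example, neural-network-predicted) energy barriers via an Arrhenius expression; the barrier values, attempt frequency and temperature are illustrative assumptions.

```python
import numpy as np

K_B = 8.617e-5       # Boltzmann constant, eV/K
PREFACTOR = 1e13     # attempt frequency, 1/s (assumed)
T = 600.0            # temperature, K (assumed)

def kmc_step(barriers_eV, rng):
    """Pick one jump and a time increment from a list of predicted barriers."""
    rates = PREFACTOR * np.exp(-np.asarray(barriers_eV) / (K_B * T))
    total = rates.sum()
    # choose a jump with probability proportional to its rate
    jump = rng.choice(len(rates), p=rates / total)
    # residence time drawn from an exponential distribution
    dt = -np.log(rng.random()) / total
    return jump, dt

rng = np.random.default_rng(0)
predicted_barriers = [0.45, 0.62, 0.50, 0.71]   # e.g. outputs of a trained network
print(kmc_step(predicted_barriers, rng))
```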
  • Holopainen, Markus (2023)
    Context: Over the past years, the development of machine learning (ML) enabled software has seen a rise in popularity. Alongside this trend, new challenges have been identified, such as growing concerns, including ethical ones, about the use of ML models, as misuse can lead to severe consequences for human beings. To alleviate this problem, more comprehensive model documentation has been suggested, but how can that documentation be made part of a modern, continuous development process? Objective: We design and develop a solution, consisting of a software artefact and its surrounding process, that enables and moderates continuous documentation of ML models. The solution needs to comply with the modern way of working of software development. Method: We apply the design science research methodology to divide the design and development into six separate tasks, i.e., problem identification, objective definition, design and development, demonstration, evaluation, and communication. Results: The solution uses model cards for storing model details. These model cards are tested automatically and manually, forming a quality gate and ensuring the integrity of the documentation. The software artefact is implemented in the form of a GitHub Action. Conclusion: We conclude that the software artefact supports and assures proper model documentation in the form of a model card. The artefact allows for customization by the user, thereby supporting domain-specific use cases.
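As a rough illustration of the kind of automated check such a quality gate could run, the Python sketch below verifies that a model card file contains a set of required sections and fails otherwise; the section names, file path and exit-code convention are assumptions for the example, not the artefact built in the thesis (which is a GitHub Action).

```python
from pathlib import Path
import sys

# Hypothetical required sections; the real quality gate may check different ones.
REQUIRED_SECTIONS = ["Model Details", "Intended Use", "Training Data",
                     "Evaluation", "Ethical Considerations"]

def check_model_card(path: str) -> list[str]:
    """Return the list of required sections missing from the model card."""
    text = Path(path).read_text(encoding="utf-8")
    return [section for section in REQUIRED_SECTIONS if section not in text]

if __name__ == "__main__":
    card_path = sys.argv[1] if len(sys.argv) > 1 else "MODEL_CARD.md"
    missing = check_model_card(card_path)
    if missing:
        print("Model card is missing sections:", ", ".join(missing))
        sys.exit(1)   # non-zero exit fails the CI job: documentation gate not passed
    print("Model card passes the automated checks.")
```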
  • Nygren, Saara (2020)
    A relational database management system's configuration is essential when optimizing database performance. Finding the optimal knob configuration for the database requires tuning multiple interdependent knobs. Over the past few years, relational database vendors have added machine learning models to their products, and Oracle announced the first autonomous (i.e., self-driving) database in 2017. This thesis clarifies the autonomous database concept and surveys the latest research on machine learning methods for relational database knob tuning. The study aimed to find solutions that can tune multiple database knobs and be applied to any relational database. The survey found three machine learning implementations that tune multiple knobs at a time: OtterTune, CDBTune, and QTune. OtterTune uses traditional machine learning techniques, while CDBTune and QTune rely on deep reinforcement learning. These implementations are presented in this thesis, along with a discussion of the features they offer. The thesis also presents basic concepts of autonomic systems, such as self-CHOP and the MAPE-K feedback loop, and a knowledge model defining the knowledge needed to implement them. These can be used in the autonomous database context, along with Intelligent Machine Design and the Five Levels of AI-Native Database, to present requirements for the autonomous database.
  • Mukhtar, Usama (2020)
    Sales forecasting is crucial for running any retail business efficiently. Profits are maximized if popular products are available to fulfill the demand, and it is equally important to minimize the loss caused by unsold stock. Fashion retailers face certain challenges which make sales forecasting difficult, such as the short life cycle of products and the introduction of new products throughout the year. The goal of this thesis is to study forecasting methods for fashion. We use the attributes of products in one season to build a model that can forecast sales for all the products in the next season. Sales for different attributes are analysed over three years. Sales vary across attribute values, which indicates that a model fitted on product attributes may be used for forecasting sales. A series of experiments is conducted with multiple variants of the dataset. We implemented multiple machine learning models and compared them against each other. Empirical results are reported along with baseline comparisons to answer the research questions. Results from the first experiment indicate that the machine learning models perform almost as well as the baseline model that uses mean values as predictions. The results may improve in the upcoming years when more data is available for training. The second experiment shows that models built for specific product groups are better than generic models used to predict sales for all kinds of products. Since we observed a heavy tail in the data, a third experiment was conducted using logarithmic sales for predictions, but the results do not improve much compared to the previous methods. The conclusion of the thesis is that machine learning methods can be used for attribute-based sales forecasting in the fashion industry, but more data is needed, and modeling specific groups of products brings better results.
  • Mylläri, Juha (2022)
    Anomaly detection in images is the machine learning task of classifying inputs as normal or anomalous. Anomaly localization is the related task of segmenting input images into normal and anomalous regions. The output of an anomaly localization model is a 2D array, called an anomaly map, of pixel-level anomaly scores. For example, an anomaly localization model trained on images of non-defective industrial products should output high anomaly scores in image regions corresponding to visible defects. In unsupervised anomaly localization the model is trained solely on normal data, i.e. without labelled training observations that contain anomalies. This is often necessary as anomalous observations may be hard to obtain in sufficient quantities and labelling them is time-consuming and costly. Student-teacher feature pyramid matching (STFPM) is a recent and powerful method for unsupervised anomaly detection and localization that uses a pair of convolutional neural networks of identical architecture. In this thesis we propose two methods of augmenting STFPM to produce better segmentations. Our first method, discrepancy scaling, significantly improves the segmentation performance of STFPM by leveraging pre-calculated statistics containing information about the model’s behaviour on normal data. Our second method, student-teacher model assisted segmentation, uses a frozen STFPM model as a feature detector for a segmentation model which is then trained on data with artificially generated anomalies. Using this second method we are able to produce sharper anomaly maps for which it is easier to set a threshold value that produces good segmentations. Finally, we propose the concept of expected goodness of segmentation, a way of assessing the performance of unsupervised anomaly localization models that, in contrast to current metrics, explicitly takes into account the fact that a segmentation threshold needs to be set. Our primary method, discrepancy scaling, improves segmentation AUROC on the MVTec AD dataset over the base model by 13%, measured in the shrinkage of the residual (1.0 − AUROC). On the image-level anomaly detection task, a variant of the discrepancy scaling method improves performance by 12%.
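The following Python (PyTorch) sketch illustrates the general idea of computing a pixel-level anomaly map from the discrepancy between teacher and student feature maps and normalising it with statistics precomputed on normal data; the exact discrepancy formula, tensors and statistics here are illustrative assumptions, not the thesis model.

```python
import torch

def anomaly_map(teacher_feat, student_feat, mean, std, eps=1e-6):
    """teacher_feat, student_feat: (C, H, W) feature maps from one pyramid level.
    mean, std: per-location discrepancy statistics estimated on normal images."""
    # per-pixel discrepancy between L2-normalised feature vectors
    t = torch.nn.functional.normalize(teacher_feat, dim=0)
    s = torch.nn.functional.normalize(student_feat, dim=0)
    d = 0.5 * ((t - s) ** 2).sum(dim=0)          # (H, W)
    # scale by how large the discrepancy typically is on normal data
    return (d - mean) / (std + eps)

# Placeholder tensors standing in for real teacher/student activations.
t = torch.randn(64, 32, 32)
s = t + 0.05 * torch.randn(64, 32, 32)
stats_mean, stats_std = torch.zeros(32, 32), torch.ones(32, 32)
print(anomaly_map(t, s, stats_mean, stats_std).shape)   # torch.Size([32, 32])
```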
  • Jääskeläinen, Matias (2020)
    This thesis explores descriptors for atmospheric molecular clusters. Descriptors are needed for applying machine learning methods to molecular systems. A collection of descriptors is readily available in the DScribe library, developed at Aalto University for custom machine learning applications; which descriptors to use is up to the user. This study takes the first steps in integrating machine learning into an existing configurational sampling procedure that aims to find the optimal structure for any given molecular cluster of interest. The structure selection step forms a bottleneck in the configurational sampling procedure. A new structure selection method presented in this study uses k-means clustering to find structures that are similar to each other. The clustering results can be used to discard redundant structures more effectively than before, which leaves fewer structures to be calculated with more expensive computations. Altogether, this speeds up the configurational sampling procedure. To aid the selection of a suitable descriptor for this application, four descriptors available in DScribe are compared. A procedure was implemented in which atmospheric clusters are represented with descriptors and grouped with k-means. The performance of the descriptors was compared with a custom score suitable for this application, and it was found that MBTR outperforms the other descriptors. This structure selection method will be utilized in the existing configurational sampling procedure for atmospheric molecular clusters, but it is not restricted to that application.
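A minimal Python sketch of descriptor-based structure selection with k-means follows: per-structure descriptor vectors are clustered and one representative per cluster is kept; the random descriptor vectors and cluster count are placeholders, whereas in the thesis the vectors would come from a DScribe descriptor such as MBTR.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
descriptors = rng.normal(size=(500, 128))    # one descriptor vector per cluster structure

kmeans = KMeans(n_clusters=50, n_init=10, random_state=0).fit(descriptors)

# Keep the structure closest to each cluster centre and discard the rest,
# so only the representatives go on to more expensive calculations.
selected = []
for c in range(kmeans.n_clusters):
    members = np.flatnonzero(kmeans.labels_ == c)
    dists = np.linalg.norm(descriptors[members] - kmeans.cluster_centers_[c], axis=1)
    selected.append(members[np.argmin(dists)])

print(f"{len(selected)} representative structures kept out of {len(descriptors)}")
```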
  • Davis, Keith III (2020)
    We study the use of data collected via electroencephalography (EEG) to classify stimuli presented to subjects, using a variety of mathematical approaches. We report an experiment with three objectives: 1) to train individual classifiers that reliably infer the class labels of visual stimuli from EEG data collected from subjects; 2) to demonstrate brainsourcing, a technique that combines brain responses from a group of human contributors, each performing a recognition task, to determine the classes of stimuli; 3) to explore collaborative filtering techniques applied to the outputs of individual classifiers to predict subject responses for stimuli for which data is unavailable or otherwise missing. We show that all individual classifier models perform better than a random baseline, while a brainsourcing model using data from as few as four participants achieves performance superior to any individual classifier. We also show that matrix factorization applied to classifier outputs as a collaborative filtering approach achieves predictive results that perform better than random. Although the technique is fairly sensitive to the sparsity of the dataset, it nonetheless demonstrates a viable proof of concept and warrants further investigation.
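The Python sketch below illustrates matrix factorisation used as collaborative filtering: a subjects-by-stimuli matrix of classifier outputs with missing entries is approximated by a low-rank product whose reconstruction fills the gaps; the data, rank and simple iterative scheme are assumptions for the example, not the study's exact method.

```python
import numpy as np

rng = np.random.default_rng(0)
true = rng.random((10, 40))                  # hypothetical classifier confidences
mask = rng.random(true.shape) > 0.3          # True where the entry is observed
observed = np.where(mask, true, np.nan)

# Fill missing entries with column means, then iterate: truncate the SVD and
# re-impose the observed entries, so the low-rank structure fills the gaps.
filled = np.where(mask, observed, np.nanmean(observed, axis=0, keepdims=True))
for _ in range(20):
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    low_rank = (U[:, :3] * s[:3]) @ Vt[:3]              # rank-3 approximation
    filled = np.where(mask, observed, low_rank)

rmse = np.sqrt(np.mean((low_rank[~mask] - true[~mask]) ** 2))
print(f"RMSE on the held-out (missing) entries: {rmse:.3f}")
```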
  • Alcantara, Jose Carlos (2020)
    A recent machine learning technique called federated learning (Konecny, McMahan, et al., 2016) offers a new paradigm for distributed learning. It consists of performing machine learning on multiple edge devices while simultaneously optimizing a global model for all of them, without transmitting user data. The goal of this thesis was to demonstrate the benefits of applying federated learning to forecasting telecom key performance indicator (KPI) values from radio network cells. After performing experiments with different aggregations of data sources and comparing against a centralized learning model, the results revealed that a federated model can shorten the training time for modelling new radio cells. Moreover, the amount of data transferred to a central server is reduced drastically while keeping performance equivalent to a traditional centralized model. These experiments were performed with a multi-layer perceptron as the model architecture, after comparing its performance against an LSTM. Both input and output data were sequences of KPI values.
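A minimal Python sketch of federated averaging is shown below: each client (e.g. a radio cell's data source) trains locally and only model weights, never raw KPI data, are sent to the server and averaged; the linear model, local SGD and number of rounds are illustrative stand-ins for the multi-layer perceptron setup described above.

```python
import numpy as np

def local_update(weights, X, y, lr=0.01, epochs=5):
    """A few local gradient steps on a linear model; stands in for the MLP training."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
# Each client holds its own KPI history; the raw data never leaves the client.
clients = [(rng.normal(size=(100, 8)), rng.normal(size=100)) for _ in range(5)]
global_w = np.zeros(8)

for round_ in range(10):
    # clients train locally, starting from the current global model
    local_weights = [local_update(global_w, X, y) for X, y in clients]
    # the server aggregates by (equal-weight) averaging of the weights
    global_w = np.mean(local_weights, axis=0)

print(global_w)
```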
  • Mäkinen, Sasu (2021)
    Deploying machine learning models has proven to be a major issue in the field. DevOps and Continuous Integration and Continuous Delivery (CI/CD) have proven to streamline and accelerate deployments in software development. Creating CI/CD pipelines for software that includes elements of machine learning (MLOps) poses unique problems, and trail-blazers in the field solve them with proprietary tooling, often offered by cloud providers. In this thesis, we describe the elements of MLOps. We study what the requirements are for automating the CI/CD of machine learning systems in the MLOps methodology, and whether it is feasible to create a state-of-the-art MLOps pipeline with existing open-source and cloud-native tooling in a cloud-provider-agnostic way. We designed an extendable and cloud-native pipeline covering most of the CI/CD needs of a machine learning system. We motivated why machine learning systems should be included in the DevOps methodology, and studied what unique challenges machine learning brings to CI/CD pipelines, production environments and monitoring. We analyzed the pipeline's design, architecture, and implementation details, and its applicability and value to machine learning projects. We evaluate our solution as a promising MLOps pipeline that manages to solve many issues of automating a reproducible machine learning project and its delivery to production. We designed it as a fully open-source solution that is relatively cloud provider agnostic. Configuring the pipeline to fit client needs relies on easy-to-use declarative configuration languages (YAML, JSON) that require minimal learning overhead.
  • Roy, Suravi Saha (2020)
    COVID-19, a global pandemic, began in December 2019 in Wuhan, China. Since then it has spread around the globe and was declared a pandemic in early March 2020 by the World Health Organization (WHO). Ever since this pandemic started, the number of infections has grown exponentially. Currently, there is a global rise in COVID-19 cases, with 3.6 million new cases and a 21% weekly growth in new deaths. The disease outbreak has caused over 55.6 million infected cases and more than 1.34 million deaths worldwide since the beginning of the pandemic. The reverse transcription polymerase chain reaction (RT-PCR) test is the best protocol currently in use to detect COVID-19 positive patients. In low-resource settings, especially in developing countries with huge populations, the RT-PCR test is not always a viable option, as it is expensive, time-consuming and requires trained professionals. With the overwhelming number of infected cases, there is a significant need for a substitute that is cheaper, faster and more accessible. In that regard, machine learning classification models are developed in this study to detect COVID-19 positive patients and predict patient deterioration in the presence of missing data, using a dataset published by Hospital Israelita Albert Einstein in São Paulo, Brazil. The dataset consists of 5644 anonymous samples from patients who visited the hospital and were tested with RT-PCR, along with additional laboratory test results providing 111 clinical features. Additionally, more than 90% of the values in this dataset are missing. To explore missing data analysis on COVID-19 clinical data, a comparison between a complete case analysis and an imputed case analysis is reported in this study. It is established that a logistic regression model with multivariate imputation by chained equations (MICE) on the data provides 91% and 85% sensitivity for detecting COVID-19 positive patients and predicting patient deterioration, respectively. The area under the receiver operating characteristic curve (AUC) is 93% and 89% for the two tasks, respectively. Sensitivity and AUC were selected for evaluating the models' performance, as false negatives are harmful in patient screening and triaging. The proposed pipeline is an alternative approach towards COVID-19 diagnosis and prognosis. Clinicians can employ this pipeline for early screening of suspected COVID-19 patients, for triaging medical procedures and as a secondary diagnostic tool for deciding a patient's priority for treatment by utilizing low-cost, readily available laboratory test results.
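The Python sketch below outlines the modelling pipeline described above, with scikit-learn's IterativeImputer standing in for MICE-style imputation followed by logistic regression; the data shapes, missingness rate and class balance are illustrative assumptions (and far milder than the >90% missingness of the real dataset).

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables the import below)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 30))                 # placeholder laboratory features
X[rng.random(X.shape) < 0.5] = np.nan          # inject missing values
y = (rng.random(600) < 0.1).astype(int)        # hypothetical positive labels

model = make_pipeline(
    IterativeImputer(max_iter=10, random_state=0),   # MICE-style chained imputation
    LogisticRegression(max_iter=1000, class_weight="balanced"),
)
# AUC, since sensitivity and AUC are the metrics emphasised in the abstract.
print(cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean())
```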
  • Suihkonen, Sini (2023)
    The importance of protecting sensitive data from information breaches has increased in recent years, as companies and other institutions gather massive datasets about their customers, including personally identifiable information. Differential privacy is one of the state-of-the-art methods for providing provable privacy to these datasets, protecting them from adversarial attacks. This thesis focuses on studying existing differentially private random forest (DPRF) algorithms, comparing them, and constructing a version of the DPRF algorithm based on them. Twelve articles from the late 2000s to 2022, each implementing a version of the DPRF algorithm, are included in the review of previous work. The created algorithm, called DPRF_thesis, uses a privatized median as the method for splitting internal nodes of the decision trees, and the class counts of the leaf nodes are determined with the exponential mechanism. Tests on the DPRF_thesis algorithm were run on three binary classification UCI datasets, and the accuracy results were mostly comparable to those of the two existing DPRF algorithms DPRF_thesis was compared against. ACM Computing Classification System (CCS): Computing methodologies → Machine learning → Machine learning approaches → Classification and regression trees; Security and privacy → Database and storage security → Data anonymization and sanitization
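As an illustration of the exponential mechanism mentioned above, the Python sketch below samples a leaf's class label with probability proportional to exp(epsilon * count / (2 * sensitivity)); the epsilon, sensitivity and counts are illustrative assumptions, not the DPRF_thesis parameters.

```python
import numpy as np

def exponential_mechanism(counts, epsilon, sensitivity=1.0, rng=None):
    """Return a class index sampled with probability ~ exp(eps * count / (2 * sens))."""
    rng = rng or np.random.default_rng()
    counts = np.asarray(counts, dtype=float)
    scores = epsilon * counts / (2.0 * sensitivity)
    scores -= scores.max()                       # shift for numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return rng.choice(len(counts), p=probs)

leaf_counts = [23, 5]   # e.g. class counts of the samples reaching one leaf
print(exponential_mechanism(leaf_counts, epsilon=0.5, rng=np.random.default_rng(0)))
```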
  • Jokinen, Olli (2024)
    The rise of large language models (LLMs) has revolutionized natural language processing, particularly through transfer learning and fine-tuning paradigms that enhance the understanding of complex textual data. This thesis builds upon the concept of fine-tuning to improve the understanding of Finnish Wikipedia articles. Specifically, a BERT-based language model is fine-tuned to create high-quality document representations from Finnish texts. The learned representations are applied to downstream tasks, where the model's performance is evaluated against baseline models. This thesis draws on the SPECTER paper, published in 2020, which introduced a training framework for fine-tuning a general-purpose document embedder. SPECTER was trained using a document-level training objective that leveraged document link information. Originally, SPECTER was designed for scientific articles, utilizing citations between articles. The training instances consisted of triplets of query, positive, and negative papers, with the aim of capturing the semantic similarity of the documents. This work extends the SPECTER framework to Finnish Wikipedia data. While scientific articles have citations, Wikipedia's cross-references are used to build a document graph that captures the relatedness between articles. Additionally, Wikipedia data is publicly available as a full data dump, making it an attractive choice for the dataset in this thesis. One of the objectives is to demonstrate the flexibility of the SPECTER framework on a new dataset that has a similar networked structure to that of scientific articles. The fine-tuned model can be used as a general-purpose tool for various tasks and applications; however, in this thesis, its performance is measured in topic classification and cross-reference ranking. The Transformer-based language model produces fixed-length embeddings, which are used as features in the topic classification task and as vectors to measure the L2 distance of article vectors in the cross-reference prediction task. This thesis shows that the proposed model, WikiSpecter, optimized with a document-level objective, outperformed baseline models in both tasks. The performance indicates that Finnish Wikipedia provides relevant cross-references that help the model capture relationships across a range of topics.
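The Python (PyTorch) sketch below illustrates the SPECTER-style training objective: a triplet margin loss over query, positive and negative document embeddings, where positives would be cross-referenced articles; the small linear encoder and random features stand in for the fine-tuned Finnish BERT model.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(768, 768), nn.Tanh())   # stand-in for BERT pooling
triplet_loss = nn.TripletMarginLoss(margin=1.0, p=2)
optimizer = torch.optim.AdamW(encoder.parameters(), lr=2e-5)

# Hypothetical batch of 768-d document features for query / positive / negative docs;
# in the SPECTER setup, positives are linked (cross-referenced) articles.
query, positive, negative = (torch.randn(16, 768) for _ in range(3))

for step in range(3):
    optimizer.zero_grad()
    loss = triplet_loss(encoder(query), encoder(positive), encoder(negative))
    loss.backward()
    optimizer.step()
    print(f"step {step}: triplet loss {loss.item():.3f}")
```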
  • Bortolussi, Federica (2022)
    The exploration of mineral resources is a major challenge in a world that seeks sustainable energy, renewable energy, advanced engineering, and new commercial technological devices. The rapid decrease in mineral reserves has shifted the focus to under-explored and low-accessibility areas, which has led to the use of on-site portable techniques for mineral mapping purposes, such as near-infrared hyperspectral image sensors. The large datasets acquired with these instruments need pre-processing, a series of mathematical manipulations that can be achieved using machine learning. The aim of this thesis is to improve an existing method for mineralogy mapping by focusing on the mineral classification phase. More specifically, a spectral similarity index was utilized to support machine learning classifiers. This was introduced because of the inability of the employed classification models to recognize samples that are not part of a given database; the models always classified samples with one of the known labels of the database. This can be a problem in hyperspectral images, as the pure component found in a sample could correspond to a mineral but also to noise or artefacts arising for a variety of reasons, such as baseline correction. The spectral similarity index calculates the similarity between a sample spectrum and its assigned database class spectrum; a threshold then defines whether the sample belongs to the class or not. The metrics utilized in the spectral similarity index were the spectral angle mapper, the correlation coefficient and five different distances. The machine learning classifiers used to evaluate the spectral similarity index were the decision tree, k-nearest neighbor, and support vector machine. Simulated distortions were also introduced into the dataset to test the robustness of the indexes and to choose the best classifier. The spectral similarity index was assessed with a dataset of nine minerals obtained from the Geological Survey of Finland and acquired with a Specim SWIR camera. The validation of the indexes was assessed with two mine samples obtained with a VTT active hyperspectral sensor prototype. The support vector machine was chosen after the comparison between the three classifiers, as it showed higher tolerance to distorted data. The evaluation of the spectral similarity indexes found that the best performance was achieved with SAM and the Chebyshev distance, which maintained high stability across both smaller and larger threshold changes. The best threshold value found is the one that, in the dataset analysed, corresponds to the number of spectra available for each class. No reference was available for the validation procedure; for this reason, the results for the mine samples obtained with the spectral similarity index were compared with results obtained through visual interpretation, and the two were in agreement. The proposed method can be useful for future mineral exploration, as it is of great importance to correctly classify the minerals found during exploration, regardless of the database utilized.
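The following Python sketch illustrates a spectral similarity check with the spectral angle mapper (SAM): if the angle between a sample spectrum and the reference spectrum of its predicted class exceeds a threshold, the sample is treated as not belonging to any database class; the spectra and threshold value are illustrative placeholders.

```python
import numpy as np

def spectral_angle(a, b):
    """Angle (radians) between two spectra treated as vectors."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def accept_classification(sample, class_reference, threshold=0.15):
    """Keep the classifier's label only if the spectra are similar enough."""
    return spectral_angle(sample, class_reference) <= threshold

wavelengths = np.linspace(1000, 2500, 256)                 # nm, SWIR range
reference = np.exp(-((wavelengths - 1900) / 120) ** 2)     # mock mineral spectrum
sample_ok = reference + 0.02 * np.random.default_rng(0).normal(size=256)
sample_noise = np.random.default_rng(1).random(256)

print(accept_classification(sample_ok, reference))      # True: keep the class label
print(accept_classification(sample_noise, reference))   # False: treat as unknown
```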
  • Suomela, Samu (2021)
    Large graphs often have labels for only a subset of nodes. Node classification is a semi-supervised learning task where unlabeled nodes are assigned labels utilizing the known information of the graph. In this thesis, three node classification methods are evaluated based on two metrics: computational speed and node classification accuracy. The three methods evaluated are label propagation, harmonic functions with Gaussian fields, and the Graph Convolutional Neural Network (GCNN). Each method is tested on five citation networks of different sizes extracted from a large scientific publication graph, MAG240M-LSC. For each graph, the task is to predict the subject areas of scientific publications, e.g., cs.LG (Machine Learning). The motivation of the experiments is to give insight into whether the methods would be suitable for automatic labeling of scientific publications. The results show that label propagation and harmonic functions with Gaussian fields reach mediocre accuracy in the node classification task, while the GCNN has low accuracy. Label propagation was computationally slow compared to the other methods, whereas harmonic functions were exceptionally fast. Training the GCNN took a long time compared to harmonic functions, but its computational speed was acceptable. However, none of the methods reached a high enough classification accuracy to be utilized in automatic labeling of scientific publications.
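A minimal Python sketch of label propagation follows: known labels are kept fixed and unlabeled nodes repeatedly adopt the most common label among their neighbours; the small adjacency matrix and two-class labels are illustrative, not the MAG240M-LSC setup.

```python
import numpy as np

def label_propagation(adj, labels, n_classes, n_iter=20):
    """adj: (N, N) symmetric adjacency matrix; labels: -1 marks unlabeled nodes."""
    labels = labels.copy()
    fixed = labels != -1
    for _ in range(n_iter):
        for v in np.flatnonzero(~fixed):
            neigh_labels = labels[np.flatnonzero(adj[v])]
            neigh_labels = neigh_labels[neigh_labels != -1]
            if len(neigh_labels):
                # adopt the majority label among already-labeled neighbours
                labels[v] = np.bincount(neigh_labels, minlength=n_classes).argmax()
    return labels

adj = np.array([[0, 1, 1, 0, 0],
                [1, 0, 1, 0, 0],
                [1, 1, 0, 1, 0],
                [0, 0, 1, 0, 1],
                [0, 0, 0, 1, 0]])
labels = np.array([0, -1, -1, 1, -1])   # two known subject areas, three unknown
print(label_propagation(adj, labels, n_classes=2))
```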
  • Saarinen, Tuomo (2020)
    The use of machine learning and algorithms in decision making processes in our everyday life has been growing rapidly. The uses range from bank loans and taxation to criminal sentences and child care decisions. Because of the possible high importance of such decisions, we need to make sure that the algorithms used are as unbiased as possible. The purpose of this thesis is to provide an overview of the possible biases in algorithm-assisted decision making, how these biases affect the decision making process, and to go through some proposals on how to tackle these biases. Some of the proposed solutions are more technical, including algorithms and different ways to filter bias from the machine learning phase. Other solutions are more societal and legal, and address the things we need to take into account when deciding what can be done to reduce bias by legislation or by enlightening people on the issues of data mining and big data.
  • Säkkinen, Niko (2020)
    Predicting patient deterioration in an Intensive Care Unit (ICU) effectively is a critical health care task serving patient health and resource allocation. At times, the task may be highly complex for a physician, yet high-stakes and time-critical decisions need to be made based on it. In this work, we investigate the ability of a set of machine learning models to algorithmically predict the future occurrence of in-hospital death based on Electronic Health Record (EHR) data of ICU patients. For one, we assess the generalizability of the models by evaluating them on hospitals whose data was not considered when training the models. For another, we consider the case in which we have access to some EHR data for the patients treated at a hospital of interest, and assess how EHR data from other hospitals can best be used to improve prediction accuracy. This study is important for the deployment and integration of such predictive models in practice, e.g., for real-time algorithmic deterioration prediction for clinical decision support. To address these questions, we use the eICU Collaborative Research Database, which contains EHRs of patients treated at a heterogeneous collection of hospitals in the United States. In this work, we use patient demographics, vital signs and the Glasgow coma score as the predictors. We devise and describe three computational experiments to test generalization in different ways. The models used are the random forest, gradient boosted trees and a long short-term memory network. In our first experiment concerning generalization, we show that, with the chosen limited set of predictors, the models generalize reasonably across hospitals and that only a small data mismatch is observed. Moreover, with this setting, our second experiment shows that model performance does not significantly improve when increasing the heterogeneity of the training set. Given these observations, our third experiment shows that