
Detection of COVID-19 infected patients and patient deterioration from regular laboratory test results with Machine Learning


Title: Detection of COVID-19 infected patients and patient deterioration from regular laboratory test results with Machine Learning
Author(s): Roy, Suravi Saha
Contributor: University of Helsinki, Faculty of Science, Department of Computer Science
Discipline: Computer Science
Language: English
Acceptance year: 2020
Abstract: COVID-19 emerged in Wuhan, China, in December 2019. It has since spread around the globe, and the World Health Organization (WHO) declared it a global pandemic in early March 2020. Since the pandemic started, the number of infections has grown exponentially. At the time of writing, COVID-19 cases are rising globally, with 3.6 million new cases and new deaths growing by 21% per week. Since the beginning of the pandemic, the outbreak has caused over 55.6 million infections and more than 1.34 million deaths worldwide. The reverse transcription polymerase chain reaction (RT-PCR) test is the best protocol currently in use for detecting COVID-19 positive patients. In low-resource settings, especially in developing countries with large populations, the RT-PCR test is not always a viable option because it is expensive, time-consuming and requires trained professionals. With the overwhelming number of infected cases, there is a significant need for a substitute that is cheaper, faster and more accessible. To that end, this study develops machine learning classification models to detect COVID-19 positive patients and to predict patient deterioration in the presence of missing data, using a dataset published by Hospital Israelita Albert Einstein in São Paulo, Brazil. The dataset consists of 5644 anonymous samples from patients who visited the hospital and were tested by RT-PCR, together with additional laboratory test results providing 111 clinical features. More than 90% of the values in this dataset are missing. To explore missing data analysis on COVID-19 clinical data, this study reports a comparison between a complete case analysis and an imputed case analysis. A logistic regression model applied to data imputed with multivariate imputation by chained equations (MICE) achieves 91% sensitivity for detecting COVID-19 positive patients and 85% sensitivity for predicting patient deterioration.
The area under the receiver operating characteristic curve (AUC) is 93% and 89% for the two tasks, respectively. Sensitivity and AUC were chosen to evaluate model performance because false negatives are harmful in patient screening and triage. The proposed pipeline is an alternative approach to COVID-19 diagnosis and prognosis. Using low-cost, readily available laboratory test results, clinicians can employ this pipeline for early screening of suspected COVID-19 patients, for triaging medical procedures, and as a secondary diagnostic tool for deciding a patient's treatment priority.
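The two metrics named in the abstract can be illustrated with a minimal sketch. This is not the thesis's evaluation code: the function names, the labels and the scores below are hypothetical and chosen only to show how sensitivity depends on the decision threshold while AUC does not.

```python
def sensitivity(y_true, y_pred):
    """Share of truly positive cases that are predicted positive: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def auc(y_true, scores):
    """AUC as the probability that a random positive case is scored above
    a random negative case (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data for illustration only: 1 = positive, 0 = negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.45, 0.3, 0.6, 0.4, 0.35, 0.2, 0.1, 0.05]

for threshold in (0.5, 0.25):
    y_pred = [1 if s >= threshold else 0 for s in scores]
    missed = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    print(f"threshold={threshold}: sensitivity={sensitivity(y_true, y_pred):.2f}, "
          f"false negatives={missed}")

print(f"AUC={auc(y_true, scores):.3f}")
```

In this toy data, lowering the threshold from 0.5 to 0.25 removes both false negatives at the cost of more false positives, which is the trade-off a screening tool would typically accept. AUC summarizes ranking quality across all thresholds.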
Keyword(s): COVID-19 detection; RT-PCR test; laboratory results; machine learning; missing data imputation

Files in this item

File: Roy_Suravi_thesis_2020.pdf
Size: 1.935Mb
Format: PDF
