Skip to main content
Login | Suomeksi | På svenska | In English

Correcting the Effects of Missing Data in Helsinki Psychotherapy Study using Multiple Imputation

Show full item record

Title: Correcting the Effects of Missing Data in Helsinki Psychotherapy Study using Multiple Imputation
Author(s): Yi, Xinxin
Contributor: University of Helsinki, Faculty of Science, Department of Mathematics and Statistics
Language: English
Acceptance year: 2015
Abstract:
Problem: Helsinki psychotherapy study (HPS) is a quasi-experimental clinical trial, which is designed to compare the effects of different treatments (i.e. psychotherapy and psychoanalysis) on patients with mood and anxiety disorders. During its 5-year follow-ups from the year 2000 to 2005, repeated measurements were carried out at 0, 12, 24, 36, 48, 60 months. However, some individuals did not show up at certain data collection points or dropped out of the study forever, leading to the occurrence of missing values. This will prevent the applications of further statistical methods and violate the intention-to-treat (ITT) principle in longitudinal clinical trials (LCT). Method: Multiple Imputation (MI) has many claimed advantages in handling missing values. This research will compare different MI methods i.e. Markov chain Monte Carlo (MCMC), Bayesian Linear Regression (BLR), Predictive Mean Matching (PMM), Regression Tree (RT), Random Forest (RF) in their treatments of HPS missing data. The statistical software is SAS PROC MI procedure (version 9.3) and R MICE package (version 2.9). Results: MI has better performance than the ad-hoc methods such as listwise deletion in the detections of potential relationships and the reduction of potential biases in parameter estimations if missing completely at random (MCAR) assumption is not satisfied. PMM, RT and RF have better performance in generating imputed values inside the range of the observed data than BLR and MCMC. The machine learning methods i.e. RT and RF are preferable than the regression methods such as PMM and BLR since the imputed data have quite similar distribution curves and other features (e.g. median, interguatile, skewness of distribution) as the observed data. Implications: It is suggestive to use MI methods to replace those ad-hoc methods in the treatments of missing data, if additional efforts and time are not a problem. The machine learning methods such as RT and RF are more preferable than those relatively arbitrary user-specified regression methods such as PMM and BLR according to our data, but further research are required to approve this indication. R is more flexible than SAS where RT and RF can be applied.


Files in this item

Files Size Format View
Master Thesis Yi Xinxin 014302752.pdf 15.78Mb PDF

This item appears in the following Collection(s)

Show full item record