Browsing by Author "Yi, Xinxin"

  • Yi, Xinxin (2015)
    Problem: The Helsinki Psychotherapy Study (HPS) is a quasi-experimental clinical trial designed to compare the effects of different treatments (i.e. psychotherapy and psychoanalysis) on patients with mood and anxiety disorders. During its 5-year follow-up from 2000 to 2005, repeated measurements were carried out at 0, 12, 24, 36, 48 and 60 months. However, some individuals missed certain data collection points or dropped out of the study permanently, which produced missing values. Missing values hinder the application of further statistical methods and violate the intention-to-treat (ITT) principle in longitudinal clinical trials (LCT). Method: Multiple imputation (MI) has many claimed advantages in handling missing values. This research compares different MI methods, i.e. Markov chain Monte Carlo (MCMC), Bayesian linear regression (BLR), predictive mean matching (PMM), regression tree (RT) and random forest (RF), in their treatment of the HPS missing data. The statistical software used is the SAS PROC MI procedure (version 9.3) and the R MICE package (version 2.9). Results: MI performs better than ad hoc methods such as listwise deletion in detecting potential relationships and in reducing potential bias in parameter estimates when the missing completely at random (MCAR) assumption is not satisfied. PMM, RT and RF perform better than BLR and MCMC in generating imputed values inside the range of the observed data. The machine learning methods, i.e. RT and RF, are preferable to the regression methods such as PMM and BLR, since the imputed data have distribution curves and other features (e.g. median, interquartile range, skewness) quite similar to those of the observed data. Implications: MI methods are recommended in place of ad hoc methods for treating missing data, provided that the additional effort and time are not a problem. According to our data, the machine learning methods such as RT and RF are preferable to the relatively arbitrary user-specified regression methods such as PMM and BLR, but further research is required to confirm this indication. R is more flexible than SAS in that RT and RF can be applied in R.
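
    The Results above note that PMM keeps imputed values inside the range of the observed data. The following is a minimal Python sketch of why that holds, not the thesis's SAS or R code. It assumes a single predictor, uses made-up data, and omits the random draw of regression parameters that a full PMM implementation performs before matching. The names `fit_line` and `pmm_impute` are hypothetical helpers written for this illustration.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def pmm_impute(xs, ys, k=3, rng=None):
    """Fill None entries of ys by simplified predictive mean matching.

    Simplification (assumption): the regression coefficients are used
    as fitted, without drawing them from a posterior distribution.
    """
    rng = rng or random.Random(0)
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    a, b = fit_line([x for x, _ in observed], [y for _, y in observed])
    completed = []
    for x, y in zip(xs, ys):
        if y is not None:
            completed.append(y)  # observed values are never altered
            continue
        target = a + b * x
        # Donors: the k observed cases whose predicted value is closest
        # to the prediction for the incomplete case.
        donors = sorted(observed, key=lambda d: abs((a + b * d[0]) - target))[:k]
        # The imputed value is an actually observed y borrowed from a donor.
        completed.append(rng.choice(donors)[1])
    return completed


# Made-up data: the last case has a missing outcome and an extreme predictor.
xs = [1, 2, 3, 4, 5, 10]
ys = [2.0, 4.1, 5.9, 8.2, 9.9, None]

completed = pmm_impute(xs, ys)

# For contrast, the plain regression prediction extrapolates past the data.
a, b = fit_line(xs[:5], ys[:5])
regression_guess = a + b * xs[-1]
observed_max = max(y for y in ys if y is not None)
```

    Because the imputed value is always borrowed from a donor, `completed[-1]` is one of the observed outcomes, whereas `regression_guess` lies above `observed_max` in this toy data set. Regression-based draws such as BLR can leave the observed range in the same way.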