Browsing by Subject "Cross-validation"
Now showing items 1-3 of 3
-
(2020) It is challenging to identify causal genes and pathways explaining the associations with diseases and traits found by genome-wide association studies (GWASs). To solve this problem, a variety of methods that prioritize genes based on the variants identified by GWASs have been developed. In this thesis, the methods Data-driven Expression Prioritized Integration for Complex Traits (DEPICT) and Multi-marker Analysis of GenoMic Annotation (MAGMA) are used to prioritize causal genes based on the most recently published publicly available schizophrenia GWAS summary statistics. The two methods are compared using the Benchmarker framework, which allows an unbiased comparison of gene prioritization methods. The study has four aims. First, to explain how the gene prioritization methods DEPICT and MAGMA work and how they differ. Second, to explain how the Benchmarker framework can be used to compare gene prioritization methods in an unbiased way. Third, to compare the performance of DEPICT and MAGMA in prioritizing genes based on the latest schizophrenia summary statistics from 2018 using the Benchmarker framework. Last, to compare the performance of DEPICT and MAGMA on a schizophrenia GWAS with a smaller sample size by using Benchmarker. As a first step, the published results of the Benchmarker analyses using the schizophrenia GWAS from 2014 were replicated to verify that the framework was run correctly. The results were very similar, and both the original and the replicated results show that DEPICT and MAGMA do not perform significantly differently. Furthermore, they show that the intersection of genes prioritized by DEPICT and MAGMA outperforms the outersection, which is defined as genes prioritized by only one of these methods. Next, Benchmarker was used to compare the performance of DEPICT and MAGMA in prioritizing genes using the schizophrenia GWAS from 2018. 
The results of the Benchmarker analyses suggest that, as with the GWAS from 2014, DEPICT and MAGMA perform similarly to each other on the GWAS from 2018. Furthermore, an earlier schizophrenia GWAS from 2011 was used to check whether the performance of DEPICT and MAGMA differs when a GWAS with lower statistical power is used. The results of the Benchmarker analyses show that MAGMA performs better than DEPICT in prioritizing genes using this smaller data set. In addition, for the schizophrenia GWAS from 2011 the outersection of genes prioritized by DEPICT and MAGMA outperforms the intersection. To conclude, the Benchmarker framework is a useful tool for comparing gene prioritization methods in an unbiased way. For the most recently published schizophrenia GWAS from 2018, there is no significant difference between the performance of DEPICT and MAGMA in prioritizing genes according to Benchmarker. For the smaller schizophrenia GWAS from 2011, however, MAGMA outperformed DEPICT.
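The intersection and outersection described in this abstract can be illustrated with plain set operations. The following is only a minimal sketch: the function name and the placeholder gene labels are hypothetical and are not taken from the thesis, DEPICT, MAGMA, or Benchmarker.

```python
def intersection_and_outersection(depict_genes, magma_genes):
    """Split two sets of prioritized genes into the two groups the abstract names.

    intersection: genes prioritized by both methods
    outersection: genes prioritized by exactly one of the two methods
    """
    both = depict_genes & magma_genes       # set intersection
    only_one = depict_genes ^ magma_genes   # symmetric difference
    return both, only_one


# Placeholder labels, not real prioritization results.
depict = {"GENE_A", "GENE_B", "GENE_C"}
magma = {"GENE_B", "GENE_C", "GENE_D"}
both, only_one = intersection_and_outersection(depict, magma)
```

In this toy input, `both` holds the two shared labels and `only_one` holds the two labels that appear in a single list.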
-
(2019) This study is a cross-validation of a hierarchical theory-based model of personality trait factors that comprises hypotheses regarding which personality constructs predict specific job performance criteria. The personality measures include the Big Five dimensions together with the Need for Achievement factor. The predictor variables have been conceptually aligned with specific criterion variables that are clusters of competencies. The model consists of six one-to-one predictor-criterion relationships that are paired up into three higher-order relationships, which in turn are aggregated into a single score of General Factor of Personality (GFP) on the predictor side and overall work performance on the criterion side. The original study conducted in 2015 (N=929) was based on an international sample of participants from various organisations, whereas this sample consists of employees from a single global company (N=109). The aim was to explore the similarities and differences in the results in comparison to the original data. All the participants completed the same online personality self-assessment with 31 psychometric scales and a 360-feedback tool measuring 22 competencies. At least one external reviewer nominated by the participant completed a review rating on those competencies. Principal components were extracted to investigate how well the model fits these data, and the results were compared with those of the original study. Correlations between the first-order and second-order (composite) variables were also checked. Finally, regression analyses were conducted to test nine hypotheses derived from the theoretical model. The results of this study show that there is a clear relationship between the GFP and overall performance: the observed validity is r = .39, which is even higher than in the original study, where this value was r = .23. 
Out of the six personality factors, Extraversion and Conscientiousness are the only significant predictors of various job performance outcomes in this data set and, all in all, three hypotheses out of nine are fully confirmed and a fourth one partially. The results are also discussed with a view to what kind of role a specific company culture, or the behaviours expected of people working in certain job roles, might play in the results.
-
(2022) Comparison of amphetamine profiles is a task in forensic chemistry whose goal is to decide whether two samples of amphetamine originate from the same source. These decisions help identify and prosecute the suppliers of amphetamine, which is an illicit drug in Finland. The traditional approach to comparing amphetamine samples involves computing the Pearson correlation coefficient between two real-valued sample vectors obtained by gas chromatography-mass spectrometry analysis. A two-sample problem, such as the problem of comparing drug samples, can also be tackled with methods such as a t-test or Bayes factors. Recently, a newer method called predictive agreement (PA) has been applied in the comparison of amphetamine profiles, comparing the posterior predictive distributions induced by two samples. In this thesis, we statistically validated the use of this newer method in amphetamine profile comparison by comparing its performance to that of the traditional method based on the Pearson correlation coefficient. Simulation and cross-validation were used in the validation. In the simulation part, we simulated enough data to compute 10 000 PA and correlation values between sample pairs. Cross-validation was used in a case study, where a repeated 5-fold group cross-validation was used to study the effect of changes in the data used to train the model. In the cross-validation, the performance of the models was measured with area under the curve (AUC) values of receiver operating characteristic (ROC) and precision-recall (PR) curves. For the validation, two separate datasets collected by the National Bureau of Investigation of Finland (NBI) were available. One of the datasets was a larger collection of amphetamine samples, whereas the other was a more curated group of samples, for which it is also known which samples are linked to each other. 
In addition to these datasets, we simulated data representing amphetamine samples that were from either different sources or the same source. The results showed that with the simulated data, predictive agreement outperformed the traditional method in distinguishing sample pairs from different sources from sample pairs from the same source. The case study showed that changes in the training data have only a marginal effect on the performance of the predictive agreement method, and that with real-world data, the PA method outperformed the traditional method in terms of AUC-ROC and AUC-PR values. Additionally, we concluded that the PA method has the benefit of interpretability: the PA value between two samples can be interpreted as the probability that these samples originate from the same source.
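The building blocks named in this abstract (Pearson correlation between two sample vectors, repeated 5-fold group cross-validation, and AUC-ROC) can be sketched with the Python standard library alone. This is an illustrative sketch under stated assumptions, not the thesis's implementation: all function names are hypothetical, the round-robin assignment of groups to folds is one simple choice among several, and the predictive agreement method itself is not reproduced here.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def repeated_group_kfold(groups, k=5, repeats=3, seed=0):
    """Yield (repeat, fold, train_idx, test_idx).

    All items sharing a group label land on the same side of each split,
    so linked samples never appear in both training and test data.
    """
    rng = random.Random(seed)
    unique = sorted(set(groups))
    for r in range(repeats):
        shuffled = unique[:]
        rng.shuffle(shuffled)
        # Assumption: deal the shuffled groups round-robin into k folds.
        fold_of = {g: i % k for i, g in enumerate(shuffled)}
        for fold in range(k):
            test = [i for i, g in enumerate(groups) if fold_of[g] == fold]
            train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
            yield r, fold, train, test


def auc_roc(scores, labels):
    """AUC-ROC as the probability that a positive outscores a negative.

    Ties count as one half. Labels are 1 (same source) or 0 (different).
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

A similarity score per sample pair (for example, the output of `pearson`) would be passed to `auc_roc` together with same-source labels, once per test fold produced by `repeated_group_kfold`.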