dc.date.accessioned |
2022-06-15T08:49:14Z |
|
dc.date.available |
2022-06-15T08:49:14Z |
|
dc.date.issued |
2022-06-15 |
|
dc.identifier.uri |
http://hdl.handle.net/123456789/41566 |
|
dc.title |
Comparison of high-dimensional Bayesian variable selection methods with application in genetics |
en |
ethesis.faculty |
Matemaattis-luonnontieteellinen tiedekunta |
fi |
ethesis.faculty |
Faculty of Science |
en |
ethesis.faculty |
Matematisk-naturvetenskapliga fakulteten |
sv |
ethesis.faculty.URI |
http://data.hulib.helsinki.fi/id/8d59209f-6614-4edd-9744-1ebdaf1d13ca |
|
ethesis.university.URI |
http://data.hulib.helsinki.fi/id/50ae46d8-7ba9-4821-877c-c994c78b0d97 |
|
ethesis.university |
Helsingin yliopisto |
fi |
ethesis.university |
University of Helsinki |
en |
ethesis.university |
Helsingfors universitet |
sv |
dct.creator |
Laiho, Aleksi |
|
dct.issued |
2022 |
xx |
dct.abstract |
In statistics, data can be high-dimensional, with a very large number of variables,
often larger than the number of samples. In such cases, selection of a
relevant configuration of significant variables is often needed. One such case is in
genetics, especially genome-wide association studies (GWAS).
To select the relevant variables from high-dimensional data, there exist various
statistical methods, with many of them relating to Bayesian statistics. This thesis aims
to review and compare two such methods, FINEMAP and Sum of Single Effects
(SuSiE). The methods are reviewed according to their accuracy in identifying the
relevant configurations of variables and their computational efficiency, especially in the
case where there exist high inter-variable correlations within the dataset. The
methods are also compared to more conventional variable selection methods, such
as LASSO.
The results show that both FINEMAP and SuSiE outperform LASSO in terms of
selection accuracy and efficiency, with FINEMAP producing slightly more accurate
results at the expense of computation time compared to SuSiE. These results can be
used as guidelines in selecting an appropriate variable selection method based on the
study and data. |
en |
dct.subject |
statistics |
|
dct.subject |
variable selection |
|
dct.subject |
bayesian |
|
dct.subject |
gwas |
|
ethesis.isPublicationLicenseAccepted |
true |
|
ethesis.language.URI |
http://data.hulib.helsinki.fi/id/languages/eng |
|
ethesis.language |
englanti |
fi |
ethesis.language |
English |
en |
ethesis.language |
engelska |
sv |
ethesis.thesistype |
pro gradu -tutkielmat |
fi |
ethesis.thesistype |
master's thesis |
en |
ethesis.thesistype |
pro gradu-avhandlingar |
sv |
ethesis.thesistype.URI |
http://data.hulib.helsinki.fi/id/thesistypes/mastersthesis |
|
dct.identifier.ethesis |
E-thesisID:631ac5d5-0a52-4511-93a8-52f6b1cd3c34 |
|
dct.identifier.urn |
URN:NBN:fi:hulib-202206152701 |
|
dct.alternative |
Suuriulotteisten bayesilaisten muuttujanvalintamenetelmien vertailu sovellettuna genetiikkaan |
fi |
ethesis.facultystudyline |
Tilastotiede |
fi |
ethesis.facultystudyline |
Statistics |
en |
ethesis.facultystudyline |
Statistik |
sv |
ethesis.facultystudyline.URI |
http://data.hulib.helsinki.fi/id/SH50_051 |
|
ethesis.mastersdegreeprogram |
Matematiikan ja tilastotieteen maisteriohjelma |
fi |
ethesis.mastersdegreeprogram |
Master's Programme in Mathematics and Statistics |
en |
ethesis.mastersdegreeprogram |
Magisterprogrammet i matematik och statistik |
sv |
ethesis.mastersdegreeprogram.URI |
http://data.hulib.helsinki.fi/id/MH50_001 |
|