Independent Component Analysis for Binary Data
dc.date.accessioned | 2021-07-26T06:28:54Z | |
dc.date.available | 2021-07-26T06:28:54Z | |
dc.date.issued | 2021-07-26 | |
dc.identifier.uri | http://hdl.handle.net/123456789/37514 | |
dc.title | Independent Component Analysis for Binary Data | en |
ethesis.faculty | Matemaattis-luonnontieteellinen tiedekunta | fi |
ethesis.faculty | Faculty of Science | en |
ethesis.faculty | Matematisk-naturvetenskapliga fakulteten | sv |
ethesis.faculty.URI | http://data.hulib.helsinki.fi/id/8d59209f-6614-4edd-9744-1ebdaf1d13ca | |
ethesis.university.URI | http://data.hulib.helsinki.fi/id/50ae46d8-7ba9-4821-877c-c994c78b0d97 | |
ethesis.university | Helsingin yliopisto | fi |
ethesis.university | University of Helsinki | en |
ethesis.university | Helsingfors universitet | sv |
dct.creator | Barin Pacela, Vitória | |
dct.issued | 2021 | |
dct.language.ISO639-2 | eng | |
dct.abstract | Independent Component Analysis (ICA) aims to separate observed signals into the underlying independent components that generated them. Most research in ICA has focused on continuous signals, while the methodology for binary and discrete signals is less developed. Yet binary observations are common in many fields and applications, such as causal discovery, signal processing, and bioinformatics. In the last decade, Boolean OR and XOR mixtures have been shown to be identifiable by ICA, but such models suffer from limited expressivity, calling for new methods to solve the problem. In this thesis, "Independent Component Analysis for Binary Data", we estimate the mixing matrix of ICA from binary observations and an additionally observed auxiliary variable by employing a linear model inspired by the Identifiable Variational Autoencoder (iVAE), which exploits the non-stationarity of the data. The model is optimized with a gradient-based algorithm that uses limited-memory second-order optimization, resulting in training times on the order of seconds for the cases studied. We investigate which conditions allow the mixing matrix to be reconstructed and conclude that the method can identify the mixing matrix when the number of observed variables is greater than the number of sources. In such cases, the linear binary iVAE can reconstruct the mixing matrix up to order and scale indeterminacies, which the evaluation accounts for through the Mean Cosine Similarity Score. Furthermore, the model can reconstruct the mixing matrix even with a limited sample size. This work therefore demonstrates the potential for applications to real-world data and offers a starting point for studying and formalizing identifiability in future work.
In summary, the most important contributions of this thesis are the empirical study of the conditions that enable reconstruction of the mixing matrix with the binary iVAE, and the empirical results on the performance and efficiency of the model. The latter were achieved through a new combination of existing methods, including modifications and simplifications of a linear binary iVAE model and the optimization of such a model under limited computational resources. | en |
dct.subject | independent component analysis | |
dct.subject | neural networks | |
dct.subject | variational autoencoder | |
dct.subject | binary data | |
dct.subject | ICA | |
dct.subject | deep learning | |
dct.subject | VAE | |
dct.language | en | |
ethesis.isPublicationLicenseAccepted | true | |
ethesis.language.URI | http://data.hulib.helsinki.fi/id/languages/eng | |
ethesis.language | englanti | fi |
ethesis.language | English | en |
ethesis.language | engelska | sv |
ethesis.thesistype | pro gradu -tutkielmat | fi |
ethesis.thesistype | master's thesis | en |
ethesis.thesistype | pro gradu-avhandlingar | sv |
ethesis.thesistype.URI | http://data.hulib.helsinki.fi/id/thesistypes/mastersthesis | |
dct.identifier.ethesis | E-thesisID:f3da1a05-c2e3-4a71-b2bd-cb4fd8a41f8b | |
dct.identifier.urn | URN:NBN:fi:hulib-202107263423 | |
dc.type.dcmitype | Text | |
ethesis.facultystudyline | Mathematics / Computer and data science / Physics / Chemistry | fi |
ethesis.facultystudyline | Mathematics / Computer and data science / Physics / Chemistry | en |
ethesis.facultystudyline | Mathematics / Computer and data science / Physics / Chemistry | sv |
ethesis.facultystudyline.URI | http://data.hulib.helsinki.fi/id/SH50_147 | |
ethesis.mastersdegreeprogram | Datatieteen maisteriohjelma | fi |
ethesis.mastersdegreeprogram | Master's Programme in Data Science | en |
ethesis.mastersdegreeprogram | Magisterprogrammet i data science | sv |
ethesis.mastersdegreeprogram.URI | http://data.hulib.helsinki.fi/id/MH50_010 | |
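
The abstract states that the mixing matrix is recovered only up to order and scale indeterminacies and that the Mean Cosine Similarity Score takes these into account. This record does not give the thesis's exact definition of the score, so the following is only a minimal sketch under stated assumptions: columns are compared by absolute cosine similarity (which ignores scale and sign), and the column order is resolved by trying every permutation. The function names and the example matrices are hypothetical and are not taken from the thesis.

```python
import math
from itertools import permutations


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def columns(matrix):
    """Return the columns of a row-major matrix as a list of lists."""
    return [list(col) for col in zip(*matrix)]


def mean_cosine_similarity(a_true, a_est):
    """Assumed score: best mean |cosine| over all column matchings.

    Taking the absolute cosine makes the score insensitive to the scale
    (and sign) of each column; maximizing over permutations makes it
    insensitive to column order. Brute force, so only for few sources.
    """
    cols_true = columns(a_true)
    cols_est = columns(a_est)
    n = len(cols_true)
    sim = [[abs(cosine(t, e)) for e in cols_est] for t in cols_true]
    return max(
        sum(sim[i][perm[i]] for i in range(n)) / n
        for perm in permutations(range(n))
    )


# Hypothetical example: 3 observed variables, 2 sources.
a_true = [[1.0, 0.0],
          [0.0, 1.0],
          [1.0, 1.0]]
# Same columns, but swapped and rescaled (one with a flipped sign).
a_est = [[0.0, 0.5],
         [-2.0, 0.0],
         [-2.0, 0.5]]

score = mean_cosine_similarity(a_true, a_est)
print(score)
```

Because the estimate above differs from the true matrix only by column order and per-column scaling, the score is at its maximum; an estimate with differently oriented columns would score lower. For more than a handful of sources, the permutation search would need to be replaced by an assignment solver.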
Files in this item
Files | Size | Format |
---|---|---|
BarinPacela_Vitoria_thesis_2021.pdf | 1.471 MB | PDF |
This item appears in the following Collection(s)
- Faculty of Science [3595]