Independent Component Analysis for Binary Data

2021-07-26T06:28:54Z 2021-07-26T06:28:54Z 2021-07-26
dc.title Independent Component Analysis for Binary Data en
ethesis.faculty Matemaattis-luonnontieteellinen tiedekunta fi
ethesis.faculty Faculty of Science en
ethesis.faculty Matematisk-naturvetenskapliga fakulteten sv
ethesis.faculty.URI Helsingin yliopisto fi University of Helsinki en Helsingfors universitet sv
dct.creator Barin Pacela, Vitória
dct.issued 2021
dct.language.ISO639-2 eng
dct.abstract Independent Component Analysis (ICA) aims to separate observed signals into the underlying independent components that generate them. Most research in ICA has focused on continuous signals, while the methodology for binary and discrete signals is less developed. Yet binary observations are also common in various fields and applications, such as causal discovery, signal processing, and bioinformatics. In the last decade, Boolean OR and XOR mixtures have been shown to be identifiable by ICA, but such models suffer from limited expressivity, calling for new methods to solve the problem. In this thesis, "Independent Component Analysis for Binary Data", we estimate the ICA mixing matrix from binary observations and an additionally observed auxiliary variable by employing a linear model inspired by the Identifiable Variational Autoencoder (iVAE), which exploits the non-stationarity of the data. The model is optimized with a gradient-based algorithm that uses limited-memory second-order optimization, resulting in training times on the order of seconds for the cases studied. We investigate which conditions lead to the reconstruction of the mixing matrix and conclude that the method identifies the mixing matrix when the number of observed variables is greater than the number of sources. In such cases, the linear binary iVAE can reconstruct the mixing matrix up to order and scale indeterminacies, which are accounted for in the evaluation with the Mean Cosine Similarity Score. Furthermore, the model can reconstruct the mixing matrix even under a limited sample size. Therefore, this work demonstrates the potential for applications to real-world data and also offers a possibility to study and formalize identifiability in future work.
In summary, the most important contributions of this thesis are the empirical study of the conditions that enable reconstruction of the mixing matrix using the binary iVAE, and the empirical results on the performance and efficiency of the model. The latter was achieved through a new combination of existing methods, including modifications and simplifications of a linear binary iVAE model and the optimization of such a model under limited computational resources. en
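The abstract states that the mixing matrix is recovered only up to order and scale, and that the evaluation accounts for this with the Mean Cosine Similarity Score. The sketch below is one plausible reading of such a score, not the thesis's own definition, which this record does not give. The function names are hypothetical, sign flips are assumed to count as part of the scale indeterminacy, and the column matching is a brute-force search over permutations that is only practical for a handful of sources.

```python
import itertools
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def mean_cosine_similarity(A_true, A_est):
    """Best mean |cosine| between matched columns of two mixing matrices.

    Matrices are given as lists of rows. Taking the absolute cosine makes
    the score invariant to column scale (assumption: including sign), and
    maximizing over column permutations makes it invariant to column order.
    """
    cols_true = list(zip(*A_true))
    cols_est = list(zip(*A_est))
    k = len(cols_true)
    best = 0.0
    for perm in itertools.permutations(range(k)):
        score = sum(
            abs(cosine(cols_true[i], cols_est[j])) for i, j in enumerate(perm)
        ) / k
        best = max(best, score)
    return best


# Toy example: 3 observed variables, 2 sources.
A = [[1.0, 0.5],
     [0.2, 2.0],
     [-1.0, 1.0]]
# Same columns as A, but swapped and rescaled (one with a sign flip).
A_hat = [[-1.5, 3.0],
         [-6.0, 0.6],
         [-3.0, -3.0]]

score = mean_cosine_similarity(A, A_hat)
```

In this toy example the estimated matrix differs from the true one only by column order and scale, so the score is at its maximum of 1 up to floating-point error. For more than a few sources, the permutation search would normally be replaced by an assignment solver.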
dct.subject independent component analysis
dct.subject neural networks
dct.subject variational autoencoder
dct.subject binary data
dct.subject ICA
dct.subject deep learning
dct.subject VAE
dct.language en
ethesis.isPublicationLicenseAccepted true
ethesis.language englanti fi
ethesis.language English en
ethesis.language engelska sv
ethesis.thesistype pro gradu -tutkielmat fi
ethesis.thesistype master's thesis en
ethesis.thesistype pro gradu-avhandlingar sv
dct.identifier.ethesis E-thesisID:f3da1a05-c2e3-4a71-b2bd-cb4fd8a41f8b
dct.identifier.urn URN:NBN:fi:hulib-202107263423
dc.type.dcmitype Text
ethesis.facultystudyline Mathematics / Computer and data science / Physics / Chemistry fi
ethesis.facultystudyline Mathematics / Computer and data science / Physics / Chemistry en
ethesis.facultystudyline Mathematics / Computer and data science / Physics / Chemistry sv
ethesis.mastersdegreeprogram Datatieteen maisteriohjelma fi
ethesis.mastersdegreeprogram Master's Programme in Data Science en
ethesis.mastersdegreeprogram Magisterprogrammet i data science sv

Files in this item

File Size Format
BarinPacela_Vitoria_thesis_2021.pdf 1.471Mb PDF