
Newton-based Optimization Methods for Noise-Contrastive Estimation

Show simple item record

dc.date.accessioned 2015-03-04T17:30:09Z und
dc.date.accessioned 2017-10-24T12:23:58Z
dc.date.available 2015-03-04T17:30:09Z und
dc.date.available 2017-10-24T12:23:58Z
dc.date.issued 2015-03-04T17:30:09Z
dc.identifier.uri http://radr.hulib.helsinki.fi/handle/10138.1/4521 und
dc.identifier.uri http://hdl.handle.net/10138.1/4521
dc.title Newton-based Optimization Methods for Noise-Contrastive Estimation en
ethesis.department.URI http://data.hulib.helsinki.fi/id/225405e8-3362-4197-a7fd-6e7b79e52d14
ethesis.department Institutionen för datavetenskap sv
ethesis.department Department of Computer Science en
ethesis.department Tietojenkäsittelytieteen laitos fi
ethesis.faculty Matematisk-naturvetenskapliga fakulteten sv
ethesis.faculty Matemaattis-luonnontieteellinen tiedekunta fi
ethesis.faculty Faculty of Science en
ethesis.faculty.URI http://data.hulib.helsinki.fi/id/8d59209f-6614-4edd-9744-1ebdaf1d13ca
ethesis.university.URI http://data.hulib.helsinki.fi/id/50ae46d8-7ba9-4821-877c-c994c78b0d97
ethesis.university Helsingfors universitet sv
ethesis.university University of Helsinki en
ethesis.university Helsingin yliopisto fi
dct.creator Qaiser, Beenish
dct.issued 2015
dct.language.ISO639-2 eng
dct.abstract The main focus of this thesis is on the use of Newton-based optimization methods for the optimization of an objective function that is used in the estimation of unnormalized statistical models. For these models, the probability density function (pdf) is known only up to a multiplicative normalizing factor. A properly normalized pdf is essential in maximum likelihood estimation (MLE) for density estimation. An unnormalized model can be converted into a normalized one by dividing it by its integral (or sum), known as the partition function. The partition function can be computed analytically or approximated using numerical integration methods. Here, we assume that the partition function is not available in closed form, which makes MLE unsuitable for density estimation. We use a method known as noise-contrastive estimation (NCE) for density estimation of unnormalized models. This method does not rely on numerical integration to approximate the partition function; instead, it estimates the normalizing constant along with the other unknown quantities of the model. The estimation is based on the optimization of a well-defined objective function known as the cross-entropy error function. The optimization is unconstrained, and hence powerful methods designed for unconstrained optimization can be used. Currently, a first-order optimization method known as the non-linear conjugate gradient (CG) method is used. However, this method has been shown to converge slowly on large datasets. The computation time of the algorithm can be reduced by using only a fraction of the input samples (data and noise), a technique known as sample average approximation (SAA). However, the accuracy of the estimates is compromised when random subsets of the input samples are used to improve the computational performance of the non-linear CG method.
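The NCE idea described above can be sketched in a few lines: the unnormalized model is compared against a known noise distribution, and the cross-entropy (logistic) loss is minimized over the model parameters and the log normalizing constant jointly. The following is a minimal hypothetical 1-D example (not taken from the thesis), assuming a zero-mean Gaussian model with unknown precision `tau` and Gaussian noise, with a noise-to-data ratio of 1:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)

# Toy 1-D illustration: the unnormalized model exp(-tau * x**2 / 2) has an
# unknown precision `tau`; NCE estimates the log normalizing constant `c`
# jointly with `tau`, so no numerical integration is needed.
x = rng.normal(0.0, 2.0, size=2000)        # observed data, true tau = 0.25
y = rng.normal(0.0, 3.0, size=2000)        # noise sample, ratio nu = 1
log_pn = norm(0.0, 3.0).logpdf             # known noise log-density

def neg_J(theta):
    """Negative NCE objective: logistic loss on the 'data vs noise' labels."""
    tau, c = theta
    log_pm = lambda u: -0.5 * tau * u**2 + c   # log unnormalized model + c
    G_x = log_pm(x) - log_pn(x)                # log-ratio at data points
    G_y = log_pm(y) - log_pn(y)                # log-ratio at noise points
    # -log sigmoid(G_x) and -log sigmoid(-G_y), written stably via logaddexp
    return np.logaddexp(0.0, -G_x).mean() + np.logaddexp(0.0, G_y).mean()

res = minimize(neg_J, x0=np.array([1.0, 0.0]), method="BFGS")
tau_hat, c_hat = res.x   # should approach 0.25 and -log(2*sqrt(2*pi)) = -1.61
```

Because the optimization is unconstrained, any general-purpose unconstrained optimizer can be plugged in here; the thesis replaces the first-order solver with Newton-CG variants.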
There exists a trade-off between the statistical accuracy of the estimates and the computational performance of the algorithm. We propose to use second-order Newton-based optimization methods, namely the line search Newton-CG and the trust region Newton-CG methods. These methods produce better search directions than the non-linear CG method because they employ both the gradient and the Hessian of the objective function. However, the Newton method requires the Hessian to be positive definite in order to make progress, so we use the Gauss-Newton approximation to the Hessian to avoid directions of negative curvature in case they occur. Furthermore, every iteration of the Newton method is computationally intensive, as it requires the computation of the Hessian and its inverse. We integrate the Newton-CG methods with the SAA framework to provide an efficient solution: the gradient is computed using the whole set of input samples, whereas the Hessian is computed using random subsets. As a result, we are able to reduce the computation times of the Newton-CG algorithms without losing the statistical accuracy of the estimates. It is shown that the trust region strategy performs better computationally than the line search strategy. The Newton-CG methods converge faster and do not compromise the accuracy of the estimates even when random subsets consisting of only 10% of the input samples are used during the optimization. This is a considerable improvement over the currently employed non-linear CG method. en
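The full-gradient/subsampled-Hessian scheme described above can be sketched with SciPy's trust-region Newton-CG solver, which accepts a Hessian-vector product callback. This is a hypothetical stand-in problem (a logistic-regression cross-entropy loss, not the NCE objective itself), assuming a fixed random 10% subset for the Hessian while the gradient always uses all samples:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Hypothetical stand-in objective: logistic regression on synthetic data.
n, d = 5000, 10
A = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
labels = (rng.random(n) < 1.0 / (1.0 + np.exp(-A @ w_true))).astype(float)

def loss(w):
    # Cross-entropy error over the WHOLE sample set
    z = A @ w
    return np.mean(np.logaddexp(0.0, z) - labels * z)

def grad(w):
    # Gradient also uses the whole sample set, preserving accuracy
    p = 1.0 / (1.0 + np.exp(-(A @ w)))
    return A.T @ (p - labels) / n

idx = rng.choice(n, size=n // 10, replace=False)   # random 10% subset
As = A[idx]

def hessp(w, v):
    # Hessian-vector product computed only on the subset (SAA idea):
    # H ≈ As.T @ diag(p*(1-p)) @ As / |subset|, applied to v without
    # ever forming the Hessian explicitly.
    p = 1.0 / (1.0 + np.exp(-(As @ w)))
    return As.T @ ((p * (1.0 - p)) * (As @ v)) / len(idx)

res = minimize(loss, np.zeros(d), jac=grad, hessp=hessp, method="trust-ncg")
```

Since termination is still governed by the exact full-sample gradient, the subsampled Hessian only influences the search directions, which is why the accuracy of the final estimate is not compromised; the logistic Hessian is also positive semidefinite, mirroring the role of the Gauss-Newton approximation in the thesis.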
dct.language en
ethesis.language.URI http://data.hulib.helsinki.fi/id/languages/eng
ethesis.language English en
ethesis.language englanti fi
ethesis.language engelska sv
ethesis.thesistype pro gradu-avhandlingar sv
ethesis.thesistype pro gradu -tutkielmat fi
ethesis.thesistype master's thesis en
ethesis.thesistype.URI http://data.hulib.helsinki.fi/id/thesistypes/mastersthesis
ethesis.degreeprogram Algorithms and Machine Learning en
dct.identifier.urn URN:NBN:fi-fe2017112251289
dc.type.dcmitype Text

Files in this item

Files Size Format View
BeenishQaiser15022015.pdf 2.456Mb PDF
