
Newton-based Optimization Methods for Noise-Contrastive Estimation

Show simple item record

dc.date.accessioned 2015-03-04T17:30:09Z und
dc.date.accessioned 2017-10-24T12:23:58Z
dc.date.available 2015-03-04T17:30:09Z und
dc.date.available 2017-10-24T12:23:58Z
dc.date.issued 2015-03-04T17:30:09Z
dc.identifier.uri http://radr.hulib.helsinki.fi/handle/10138.1/4521 und
dc.identifier.uri http://hdl.handle.net/10138.1/4521
dc.title Newton-based Optimization Methods for Noise-Contrastive Estimation en
ethesis.department.URI http://data.hulib.helsinki.fi/id/225405e8-3362-4197-a7fd-6e7b79e52d14
ethesis.department Institutionen för datavetenskap sv
ethesis.department Department of Computer Science en
ethesis.department Tietojenkäsittelytieteen laitos fi
ethesis.faculty Matematisk-naturvetenskapliga fakulteten sv
ethesis.faculty Matemaattis-luonnontieteellinen tiedekunta fi
ethesis.faculty Faculty of Science en
ethesis.faculty.URI http://data.hulib.helsinki.fi/id/8d59209f-6614-4edd-9744-1ebdaf1d13ca
ethesis.university.URI http://data.hulib.helsinki.fi/id/50ae46d8-7ba9-4821-877c-c994c78b0d97
ethesis.university Helsingfors universitet sv
ethesis.university University of Helsinki en
ethesis.university Helsingin yliopisto fi
dct.creator Qaiser, Beenish
dct.issued 2015
dct.language.ISO639-2 eng
dct.abstract The main focus of this thesis is on the use of Newton-based optimization methods for the optimization of an objective function that is used in the estimation of unnormalized statistical models. For these models, the probability density function (pdf) is known only up to a multiplicative normalizing factor. A properly normalized pdf is essential in maximum likelihood estimation (MLE) for density estimation. An unnormalized model can be converted into a normalized one by dividing it by its integral (or sum), known as the partition function. The partition function can be computed analytically or approximated using numerical integration methods. Here, we assume that the partition function is not available in closed form, which makes MLE unsuitable for density estimation. We use a method known as noise-contrastive estimation (NCE) for density estimation of unnormalized models. This method does not rely on numerical integration to approximate the partition function; instead, it estimates the normalizing constant along with the other unknown quantities of the model. The estimation is based on the optimization of a well-defined objective function known as the cross-entropy error function. The optimization is unconstrained, and hence powerful methods designed for unconstrained optimization can be used. Currently, a first-order optimization method known as the non-linear conjugate gradient (CG) method is used. However, this method has been shown to converge slowly on large datasets. The computation time of the algorithm can be reduced by using only a fraction of the input samples (data and noise), a technique known as sample average approximation (SAA). However, the accuracy of the estimates is compromised when random subsets of the input samples are used to improve the computational performance of the non-linear CG method.
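The NCE idea described above can be sketched in a few lines: the unnormalized model is compared against a known noise distribution, and the cross-entropy (logistic) loss is minimized over the model parameters and the log normalizing constant jointly. The following is a minimal hypothetical 1-D example (not taken from the thesis), assuming a zero-mean Gaussian model with unknown precision `tau` and Gaussian noise, with a noise-to-data ratio of 1:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)

# Toy 1-D illustration: the unnormalized model exp(-tau * x**2 / 2) has an
# unknown precision `tau`; NCE estimates the log normalizing constant `c`
# jointly with `tau`, so no numerical integration is needed.
x = rng.normal(0.0, 2.0, size=2000)        # observed data, true tau = 0.25
y = rng.normal(0.0, 3.0, size=2000)        # noise sample, ratio nu = 1
log_pn = norm(0.0, 3.0).logpdf             # known noise log-density

def neg_J(theta):
    """Negative NCE objective: logistic loss on the 'data vs noise' labels."""
    tau, c = theta
    log_pm = lambda u: -0.5 * tau * u**2 + c   # log unnormalized model + c
    G_x = log_pm(x) - log_pn(x)                # log-ratio at data points
    G_y = log_pm(y) - log_pn(y)                # log-ratio at noise points
    # -log sigmoid(G_x) and -log sigmoid(-G_y), written stably via logaddexp
    return np.logaddexp(0.0, -G_x).mean() + np.logaddexp(0.0, G_y).mean()

res = minimize(neg_J, x0=np.array([1.0, 0.0]), method="BFGS")
tau_hat, c_hat = res.x   # should approach 0.25 and -log(2*sqrt(2*pi)) = -1.61
```

Because the optimization is unconstrained, any general-purpose unconstrained optimizer can be plugged in here; the thesis replaces the first-order solver with Newton-CG variants.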
There exists a trade-off between the statistical accuracy of the estimates and the computational performance of the algorithm. We propose to use second-order Newton-based optimization methods, namely the line search Newton-CG and the trust region Newton-CG methods. These methods produce better search directions than the non-linear CG method because they employ both the gradient and the Hessian of the objective function. However, the Newton method requires the Hessian to be positive definite in order to make progress, so we use the Gauss-Newton approximation to the Hessian to avoid directions of negative curvature in case they occur. Furthermore, every iteration of the Newton method is computationally intensive, as it requires the computation of the Hessian and its inverse. We integrate the Newton-CG methods with the SAA framework to provide an efficient solution: the gradient is computed using the whole set of input samples, whereas the Hessian is computed using random subsets. As a result, we are able to reduce the computation times of the Newton-CG algorithms without losing the statistical accuracy of the estimates. It is shown that the trust region strategy performs better computationally than the line search strategy. The Newton-CG methods converge faster and do not compromise the accuracy of the estimates even when random subsets consisting of only 10% of the input samples are used during the optimization. This is a considerable improvement over the currently employed non-linear CG method. en
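The full-gradient/subsampled-Hessian scheme described above can be sketched with SciPy's trust-region Newton-CG solver, which accepts a Hessian-vector product callback. This is a hypothetical stand-in problem (a logistic-regression cross-entropy loss, not the NCE objective itself), assuming a fixed random 10% subset for the Hessian while the gradient always uses all samples:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Hypothetical stand-in objective: logistic regression on synthetic data.
n, d = 5000, 10
A = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
labels = (rng.random(n) < 1.0 / (1.0 + np.exp(-A @ w_true))).astype(float)

def loss(w):
    # Cross-entropy error over the WHOLE sample set
    z = A @ w
    return np.mean(np.logaddexp(0.0, z) - labels * z)

def grad(w):
    # Gradient also uses the whole sample set, preserving accuracy
    p = 1.0 / (1.0 + np.exp(-(A @ w)))
    return A.T @ (p - labels) / n

idx = rng.choice(n, size=n // 10, replace=False)   # random 10% subset
As = A[idx]

def hessp(w, v):
    # Hessian-vector product computed only on the subset (SAA idea):
    # H ≈ As.T @ diag(p*(1-p)) @ As / |subset|, applied to v without
    # ever forming the Hessian explicitly.
    p = 1.0 / (1.0 + np.exp(-(As @ w)))
    return As.T @ ((p * (1.0 - p)) * (As @ v)) / len(idx)

res = minimize(loss, np.zeros(d), jac=grad, hessp=hessp, method="trust-ncg")
```

Since termination is still governed by the exact full-sample gradient, the subsampled Hessian only influences the search directions, which is why the accuracy of the final estimate is not compromised; the logistic Hessian is also positive semidefinite, mirroring the role of the Gauss-Newton approximation in the thesis.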
dct.language en
ethesis.language.URI http://data.hulib.helsinki.fi/id/languages/eng
ethesis.language English en
ethesis.language englanti fi
ethesis.language engelska sv
ethesis.thesistype pro gradu-avhandlingar sv
ethesis.thesistype pro gradu -tutkielmat fi
ethesis.thesistype master's thesis en
ethesis.thesistype.URI http://data.hulib.helsinki.fi/id/thesistypes/mastersthesis
ethesis.degreeprogram Algorithms and Machine Learning en
dct.identifier.urn URN:NBN:fi-fe2017112251289
dc.type.dcmitype Text

Files in this item

Files Size Format View
BeenishQaiser15022015.pdf 2.456Mb PDF
