dc.contributor.advisor Jermaine, Christopher
dc.creator Gao, Zekai
dc.date.accessioned 2016-01-15T21:32:33Z
dc.date.available 2016-01-15T21:32:33Z
dc.date.created 2014-12
dc.date.issued 2014-09-26
dc.date.submitted December 2014
dc.identifier.citation Gao, Zekai. "Distributed Algorithms for Computing Very Large Thresholded Covariance Matrices." (2014) Master’s Thesis, Rice University. https://hdl.handle.net/1911/87863.
dc.identifier.uri https://hdl.handle.net/1911/87863
dc.description.abstract Computation of covariance matrices from observed data is an important problem, as such matrices are used in applications such as PCA, LDA, and, increasingly, in the learning and application of probabilistic graphical models. One of the most challenging aspects of constructing and managing covariance matrices is that they can be huge, and their size makes them expensive to compute. For a p-dimensional data set with n rows, the covariance matrix will have p(p-1)/2 distinct off-diagonal entries, and the naive algorithm to compute the matrix takes O(np^2) time. For large p (greater than 10,000) and n much greater than p, this cost is prohibitive. In this thesis, we consider the problem of computing a large covariance matrix efficiently, in a distributed fashion, over a large data set. We begin by considering the naive algorithm in detail, pointing out where it will and will not be feasible. We then consider reducing the time complexity by using sampling-based methods to compute an approximate, thresholded version of the covariance matrix. Here “thresholding” means that all of the unimportant values in the matrix have been dropped and replaced with zeroes. Our algorithms have probabilistic bounds which imply that, with high probability, all of the top K entries in the matrix have been retained.
dc.format.mimetype application/pdf
dc.language.iso eng
dc.subject Distributed algorithms
covariance matrices
dc.title Distributed Algorithms for Computing Very Large Thresholded Covariance Matrices
dc.type Thesis
dc.contributor.committeeMember Nakhleh, Luay
dc.contributor.committeeMember Subramanian, Devika
dc.date.updated 2016-01-15T21:32:33Z
dc.type.material Text
thesis.degree.department Computer Science
thesis.degree.discipline Engineering
thesis.degree.grantor Rice University
thesis.degree.level Masters
thesis.degree.name Master of Science
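The abstract contrasts the naive O(np^2) covariance computation with a thresholded matrix that keeps only the top K entries. The following is a minimal single-machine sketch of those two ideas, assuming plain Python lists as input; it is not the thesis's distributed or sampling-based method. The function names `naive_covariance` and `threshold_top_k`, ranking entries by absolute value, and leaving the diagonal (the variances) untouched are all hypothetical choices made for illustration.

```python
import heapq
from itertools import combinations


def naive_covariance(rows):
    """Sample covariance of the p columns of n rows: O(n * p^2) time."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for r in rows:  # one pass over the n rows
        centered = [r[j] - means[j] for j in range(p)]
        for i in range(p):  # roughly p^2 / 2 products per row
            for j in range(i, p):
                cov[i][j] += centered[i] * centered[j]
    for i in range(p):  # normalize and mirror the upper triangle
        for j in range(i, p):
            cov[i][j] /= n - 1
            cov[j][i] = cov[i][j]
    return cov


def threshold_top_k(cov, k):
    """Keep the k largest-magnitude off-diagonal entries; zero the rest.

    Assumption: the diagonal (variances) is always kept.
    """
    p = len(cov)
    pairs = combinations(range(p), 2)  # the p(p-1)/2 off-diagonal pairs
    keep = set(heapq.nlargest(k, pairs, key=lambda ij: abs(cov[ij[0]][ij[1]])))
    out = [[0.0] * p for _ in range(p)]
    for i in range(p):
        out[i][i] = cov[i][i]
        for j in range(i + 1, p):
            if (i, j) in keep:
                out[i][j] = out[j][i] = cov[i][j]
    return out


rows = [[1, 2, 3], [2, 4, 1], [3, 6, 2], [4, 8, 0]]
thresholded = threshold_top_k(naive_covariance(rows), k=1)
```

With k = 1 on this toy data, only the entry for columns 0 and 1 (value 10/3) survives off the diagonal. Computing the full matrix before thresholding, as done here, is exactly the O(np^2) cost the abstract describes as infeasible at scale.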

