dc.contributor.advisor Koushanfar, Farinaz
dc.creator Mirhoseini, Azalia
dc.date.accessioned 2016-01-27T22:46:20Z
dc.date.available 2016-01-27T22:46:20Z
dc.date.created 2015-05
dc.date.issued 2015-04-24
dc.date.submitted May 2015
dc.identifier.citation Mirhoseini, Azalia. "A Data and Platform-Aware Framework For Large-Scale Machine Learning." (2015) Diss., Rice University. https://hdl.handle.net/1911/88212.
dc.identifier.uri https://hdl.handle.net/1911/88212
dc.description.abstract This thesis introduces a novel framework for executing a broad class of iterative machine learning algorithms on massive, dense (non-sparse) datasets. Several classes of critical and fast-growing data, including image and video content, contain dense dependencies. Current approaches are overwhelmed by the excessive computation, memory access, and inter-processor communication overhead incurred when processing dense data. On the one hand, solutions that employ data-aware processing techniques produce transformations that are oblivious to the overhead they create on the underlying computing platform. On the other hand, solutions that leverage platform-aware approaches do not exploit the latent geometry of the data. My work is the first to develop a comprehensive data- and platform-aware solution that provably optimizes the cost (in terms of runtime, energy, power, and memory usage) of iterative learning analysis on dense data. My solution is founded on a novel, tunable data transformation methodology that can be customized to the underlying computing resources and constraints. My key contributions include: (i) introducing a scalable, parametric data transformation methodology that leverages coarse-grained parallelism in the data to create versatile, tunable data representations; (ii) developing automated methods for quantifying platform-specific computing costs in distributed settings; (iii) devising optimally bounded partitioning and distributed flow-scheduling techniques for running iterative updates on dense correlation matrices; (iv) devising methods that enable transforming and learning on streaming dense data; and (v) providing user-friendly open-source APIs that facilitate adoption of my solution on multiple platforms, including multi-core and many-core CPUs and FPGAs. Several learning algorithms, such as regularized regression, cone optimization, and power iteration, can be readily solved using my APIs.
My solutions are evaluated on a number of learning applications, including image classification, super-resolution, and denoising. I perform experiments on various real-world datasets with up to 5 billion non-zeros on a range of computing platforms, including Intel i7 CPUs, Amazon EC2, IBM iDataPlex, and Xilinx Virtex-6 FPGAs. I demonstrate that my framework can achieve up to two orders of magnitude better performance than current state-of-the-art solutions.
dc.format.mimetype application/pdf
dc.language.iso eng
dc.subject Big Data
Machine Learning
Data-Aware
Platform-Aware
Distributed optimization
Dense Data
dc.title A Data and Platform-Aware Framework For Large-Scale Machine Learning
dc.contributor.committeeMember Aazhang, Behnaam
dc.contributor.committeeMember Baraniuk, Richard
dc.contributor.committeeMember Jermaine, Christopher
dc.date.updated 2016-01-27T22:46:20Z
dc.type.genre Thesis
dc.type.material Text
thesis.degree.department Electrical and Computer Engineering
thesis.degree.discipline Engineering
thesis.degree.grantor Rice University
thesis.degree.level Doctoral
thesis.degree.name Doctor of Philosophy