Show simple item record

dc.contributor.advisor Shrivastava, Anshumali
dc.creator Chen, Beidi
dc.date.accessioned 2020-08-11T21:58:07Z
dc.date.available 2020-08-11T21:58:07Z
dc.date.created 2020-08
dc.date.issued 2020-08-11
dc.date.submitted August 2020
dc.identifier.citation Chen, Beidi. "Locality Sensitive Sampling for Extreme-Scale Optimization and Deep Learning." (2020) Diss., Rice University. https://hdl.handle.net/1911/109187.
dc.identifier.uri https://hdl.handle.net/1911/109187
dc.description.abstract The exponential growth of data poses a number of challenges for scaling learning algorithms in machine learning and deep learning. This thesis explores and tackles these computational challenges with randomized hashing algorithms and sheds new light on Locality Sensitive Hashing (LSH) as an adaptive sampler for large-scale estimation. We first introduce the chicken-and-egg loop in large-scale optimization algorithms: SGD estimates the gradient by uniform sampling with sample size one, and although several works suggest faster epoch-wise convergence via weighted non-uniform sampling for better gradient estimates, the per-iteration cost of maintaining this adaptive distribution exceeds that of computing the full gradient itself. We break this barrier with the first demonstration of LSH-sampled stochastic gradient descent (LGD), which yields superior gradient estimates while keeping the per-iteration sampling cost similar to that of uniform sampling. We then demonstrate the power of LSH sampling in SLIDE (SUb-LInear Deep learning Engine), which drastically reduces the computation of extreme-scale neural network training and, running only on a CPU, outperforms an optimized implementation of TensorFlow (TF) on the best available GPU. Our evaluations on industry-scale recommendation datasets with large fully connected architectures show that training with SLIDE on a 44-core CPU is more than 3.5 times faster (1 hour vs. 3.5 hours) than the same network trained with TF on a Tesla V100 at any given accuracy level. In addition to exploring these new possibilities for LSH, we also build upon and improve several classical hashing algorithms from the literature.
dc.format.mimetype application/pdf
dc.language.iso eng
dc.subject Deep Learning
dc.subject Locality Sensitive Hashing
dc.title Locality Sensitive Sampling for Extreme-Scale Optimization and Deep Learning
dc.type Thesis
dc.date.updated 2020-08-11T21:58:07Z
dc.type.material Text
thesis.degree.department Computer Science
thesis.degree.discipline Engineering
thesis.degree.grantor Rice University
thesis.degree.level Doctoral
thesis.degree.name Doctor of Philosophy
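The abstract describes LSH acting as an adaptive sampler: points that collide with a query under a locality-sensitive hash are retrieved with probability that grows with their similarity to it, which biases sampling toward high-contribution points (e.g., large-gradient examples or high-activation neurons). A minimal sketch of this idea, assuming a SimHash (signed random projection) family; the class name, parameters, and toy data below are illustrative, not the thesis implementation:

```python
import numpy as np

class SimHashSampler:
    """Toy LSH sampler: retrieval probability rises with cosine similarity."""

    def __init__(self, dim, n_bits=8, n_tables=4, seed=0):
        rng = np.random.default_rng(seed)
        # One random hyperplane matrix per hash table.
        self.planes = [rng.standard_normal((n_bits, dim)) for _ in range(n_tables)]
        self.tables = [dict() for _ in range(n_tables)]

    def _hash(self, planes, x):
        # Sign of each random projection contributes one bit of the code.
        return ((planes @ x) > 0).tobytes()

    def index(self, data):
        for planes, table in zip(self.planes, self.tables):
            for i, x in enumerate(data):
                table.setdefault(self._hash(planes, x), []).append(i)

    def sample(self, query):
        # Union of colliding buckets: similar points collide with higher
        # probability, so the returned set is a similarity-biased sample.
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates.update(table.get(self._hash(planes, query), []))
        return sorted(candidates)

# Tiny demo: points aligned with the query are retrieved, points pointing
# the opposite way almost never collide with it.
rng = np.random.default_rng(1)
e1 = np.eye(8)[0]
data = np.vstack([e1 + 0.01 * rng.standard_normal(8) for _ in range(5)]
                 + [-e1 + 0.01 * rng.standard_normal(8) for _ in range(5)])
sampler = SimHashSampler(dim=8)
sampler.index(data)
hits = sampler.sample(e1)  # indices of near-duplicates of e1
```

The key property is that no explicit sampling distribution is ever computed or maintained; the hash tables implicitly encode it, which is what keeps the per-iteration cost near that of uniform sampling.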

