A Resource-Aware Streaming-based Framework for Big Data Analysis
Darvish Rouhani, Bita
Master of Science
The ever-growing body of digital data is challenging conventional analytical techniques in machine learning, computer vision, and signal processing. Traditional analytical methods were mainly developed on the assumption that designers can work with data within the confines of their own computing environment. The growth of big data, however, is changing that paradigm, especially in scenarios with severe memory and computational resource constraints. This thesis addresses major challenges in big data learning by devising a new customizable computing framework that holistically takes into account the data structure and the underlying platform constraints. It targets a widely used class of analytical algorithms that model data dependencies by iteratively updating a set of matrix parameters, including but not limited to most regression methods, expectation maximization, and stochastic optimization, as well as emerging deep learning techniques. The key to our approach is a customizable, streaming-based data projection methodology that adaptively transforms data into a new lower-dimensional embedding by simultaneously considering both data and hardware characteristics. It enables scalable data analysis and rapid prototyping of an arbitrary matrix-based learning task using a sparse approximation of the collection that is constantly updated in line with data arrival. Our work is supported by a set of user-friendly Application Programming Interfaces (APIs) that ensure automated adaptation of the proposed framework to various datasets and System on Chip (SoC) platforms including CPUs, GPUs, and FPGAs. Proof-of-concept evaluations using a variety of large contemporary datasets corroborate the practicability and scalability of our approach in resource-limited settings.
For instance, our results demonstrate a 50-fold improvement over the best-known prior art in terms of memory, energy, power, and runtime for training and executing deep learning models in different sensing applications, including indoor localization and speech recognition, on the constrained embedded platforms used in today's IoT-enabled devices such as autonomous vehicles, robots, and smartphones.
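
To make the idea of a streaming, resource-bounded projection concrete, the following is a minimal sketch and not the framework described in the thesis. It assumes a simple greedy scheme: each arriving sample is approximated by least squares over a small dictionary of previously seen samples, and the dictionary grows only when the relative residual exceeds a tolerance and a size budget allows it. The class name `StreamingProjector` and the parameters `error_tol` and `max_atoms` are hypothetical, chosen for illustration; `max_atoms` stands in for a platform memory constraint.

```python
import numpy as np


class StreamingProjector:
    """Hypothetical streaming dictionary that grows only when needed."""

    def __init__(self, error_tol=0.1, max_atoms=32):
        self.error_tol = error_tol  # relative residual allowed before growing
        self.max_atoms = max_atoms  # assumed stand-in for a memory budget
        self.atoms = []             # unit-norm dictionary columns

    def update(self, x):
        """Process one sample; return its coefficients over the dictionary."""
        x = np.asarray(x, dtype=float)
        norm = np.linalg.norm(x)
        if norm == 0.0:
            return np.zeros(len(self.atoms))
        if self.atoms:
            D = np.column_stack(self.atoms)
            coef, *_ = np.linalg.lstsq(D, x, rcond=None)
            residual = np.linalg.norm(x - D @ coef) / norm
        else:
            coef, residual = np.zeros(0), 1.0
        if residual > self.error_tol and len(self.atoms) < self.max_atoms:
            # Poorly approximated: store the normalized sample as a new atom.
            # The sample is then represented exactly by one nonzero coefficient.
            self.atoms.append(x / norm)
            coef = np.append(np.zeros(len(self.atoms) - 1), norm)
        return coef


# Stream 200 samples that lie in a 3-dimensional subspace of R^20.
rng = np.random.default_rng(0)
basis = rng.standard_normal((20, 3))
projector = StreamingProjector(error_tol=0.1)
for _ in range(200):
    projector.update(basis @ rng.standard_normal(3))
print(len(projector.atoms))  # at most 3: the stream spans only 3 dimensions
```

In this sketch the tolerance trades approximation accuracy against dictionary size, which loosely mirrors the abstract's point that the embedding should be tuned to both the data and the available hardware resources.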