dc.contributor.advisor Jermaine, Christopher M
dc.creator Perez, Luis Leopoldo
dc.date.accessioned 2016-02-05T22:16:50Z
dc.date.available 2016-02-05T22:16:50Z
dc.date.created 2014-12
dc.date.issued 2014-12-03
dc.date.submitted December 2014
dc.identifier.citation Perez, Luis Leopoldo. "Query Processing and Optimization for Database Stochastic Analytics." (2014) Diss., Rice University. http://hdl.handle.net/1911/88437.
dc.identifier.uri http://hdl.handle.net/1911/88437
dc.description.abstract The application of relational database systems to analytical processing has been an active area of research for about two decades, motivated by constant surges in the scale of the data and in the complexity of the analysis tasks. Simultaneously, stochastic techniques have become commonplace in large-scale data analytics. This work is concerned with the application of relational database systems to support stochastic analytical tasks, particularly with the query evaluation and optimization phases. In this work, three problems are addressed in the context of MCDB/SimSQL, a relational database system for uncertain data management and analytics. The first contribution is a set of efficient techniques for evaluating queries that require satisfying a probability threshold, such as "Which pending orders are estimated to be processed and shipped by the end of the month, with a probability of at least 95%?" where the processing and shipment times of each order are generated by an arbitrary stochastic process. Results show that these techniques make sensible use of resources, weeding out data elements that require relatively few samples during the early stages of query evaluation. The second problem is concerned with recycling the materialized intermediate results of a query to optimize other queries in the future. Taking the assumption that a history of past queries provides an accurate picture of the workload, I describe techniques for query optimization that evaluate the costs and benefits of materializing intermediate results, with the objective of minimizing the hypothetical costs of future queries, subject to constraints on disk space. Results show a substantial improvement over conventional query caching techniques in workload and average query execution time. 
Finally, this work addresses the problem of evaluating queries for stochastic generative models, specified in a high-level notation that treats random variables as first-class objects and allows operations with structured objects such as vectors and matrices. I describe a notation that, relying on the syntax of comprehensions, provides a language for denoting generative models that guarantees correspondence with relational algebra expressions, and techniques for translating a model into a database schema and set of relational queries.
dc.format.mimetype application/pdf
dc.language.iso eng
dc.subject Computing
Databases
Optimization
Analytics
dc.title Query Processing and Optimization for Database Stochastic Analytics
dc.contributor.committeeMember Ng, T.S. Eugene
dc.contributor.committeeMember Varman, Peter J
dc.date.updated 2016-02-05T22:16:50Z
dc.type.genre Thesis
dc.type.material Text
thesis.degree.department Computer Science
thesis.degree.discipline Engineering
thesis.degree.grantor Rice University
thesis.degree.level Doctoral
thesis.degree.name Doctor of Philosophy

