dc.contributor.advisor Wickham, Hadley
dc.creator Grolemund, Garrett
dc.date.accessioned 2013-07-24T19:29:56Z
dc.date.accessioned 2013-07-24T19:30:02Z
dc.date.available 2013-07-24T19:29:56Z
dc.date.available 2013-07-24T19:30:02Z
dc.date.created 2012-12
dc.date.issued 2013-07-24
dc.date.submitted December 2012
dc.identifier.uri http://hdl.handle.net/1911/71653
dc.description.abstract This thesis proposes a scientific model to explain the data analysis process. I argue that data analysis is primarily a procedure to build understanding and as such, it dovetails with the cognitive processes of the human mind. Data analysis tasks closely resemble the cognitive process known as sensemaking. I demonstrate how data analysis is a sensemaking task adapted to use quantitative data. This identification highlights a universal structure within data analysis activities and provides a foundation for a theory of data analysis. The model identifies two competing challenges within data analysis: the need to make sense of information that we cannot know and the need to make sense of information that we cannot attend to. Classical statistics provides solutions to the first challenge, but has little to say about the second. However, managing attention is the primary obstacle when analyzing big data. I introduce three tools for managing attention during data analysis. Each tool is built upon a different method for managing attention. ggsubplot creates embedded plots, which transform data into a format that can be easily processed by the human mind. lubridate helps users automate sensemaking outside of the mind by improving the way computers handle date-time data. Visual Inference Tools develop expertise in young statisticians that can later be used to efficiently direct attention. The insights of this thesis are especially helpful for consultants, applied statisticians, and teachers of data analysis.
dc.format.mimetype application/pdf
dc.language.iso eng
dc.subject Data analysis
Data science
Sensemaking
Grammar of graphics
Embedded plots
dc.title Tools and theory to improve data analysis
dc.contributor.committeeMember Scott, David W.
dc.contributor.committeeMember Lane, David M.
dc.date.updated 2013-07-24T19:30:02Z
dc.identifier.slug 123456789/ETD-2012-12-265
dc.type.genre Thesis
dc.type.material Text
thesis.degree.department Statistics
thesis.degree.discipline Engineering
thesis.degree.grantor Rice University
thesis.degree.level Doctoral
thesis.degree.name Doctor of Philosophy
dc.identifier.citation Grolemund, Garrett. "Tools and theory to improve data analysis." (2013) Diss., Rice University. http://hdl.handle.net/1911/71653.

