
    Mining Massive-Scale Time Series Data using Hashing

    Name:
    LUO-DOCUMENT-2017.pdf
    Size:
    1.742Mb
    Format:
    PDF
    Author
    Luo, Chen
    Date
    2017-05-09
    Advisor
    Shrivastava, Anshumali
    Degree
    Master of Science
    Abstract
    Similarity search on time series is a frequent operation in large-scale data-driven applications. Because time series are usually misaligned, matching them typically requires sophisticated similarity measures. Dynamic Time Warping (DTW) is the most widely used similarity measure for time series because it performs alignment and matching at the same time. However, the alignment makes DTW slow. To speed up the expensive similarity search with DTW, branch-and-bound pruning strategies are commonly adopted. These strategies are useful only for very short queries (low-dimensional time series); for longer queries the bounds are quite weak, and branch-and-bound pruning effectively degenerates into a brute-force search. To circumvent this issue, we design SSH (Sketch, Shingle, & Hashing), an efficient approximate hashing scheme that is much faster than the state-of-the-art branch-and-bound search technique, the UCR suite. SSH uses a novel combination of sketching, shingling, and hashing techniques to produce (probabilistic) indexes that align (near perfectly) with the DTW similarity measure. The generated indexes are then used to create hash buckets for sub-linear search. Empirical results on two large-scale benchmark time series datasets show that our proposed method prunes around 95% of time series candidates and can be around 20 times faster than the state-of-the-art package (UCR suite) without any significant loss in accuracy.
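For readers unfamiliar with DTW, the following is a minimal sketch of the textbook dynamic-programming formulation, included only to illustrate why the alignment step is costly (the table has one cell per pair of points, so the cost is quadratic in the series length). It is not taken from the thesis, and it is not the SSH scheme or the UCR suite. The function name `dtw_distance` and the squared-difference local cost are assumptions chosen for illustration.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW between two numeric sequences (illustrative sketch).

    Assumption: squared difference as the local cost, with the square
    root of the accumulated cost returned as the distance.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible warping steps.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return math.sqrt(cost[n][m])
```

In this sketch, a series and a copy of it with one repeated point, such as `[0, 1, 2]` and `[0, 0, 1, 2]`, are at distance zero, because the warping path absorbs the misalignment. A point-by-point measure is not even defined for these two series, since their lengths differ.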
    Keyword
    Time Series; Searching; Data Mining; Machine Learning
    Citation
    Luo, Chen. "Mining Massive-Scale Time Series Data using Hashing." (2017) Master’s Thesis, Rice University. https://hdl.handle.net/1911/96124.
    Collections
    • Rice University Electronic Theses and Dissertations [13409]

    Home | FAQ | Contact Us | Privacy Notice | Accessibility Statement
    Managed by the Digital Scholarship Services at Fondren Library, Rice University
    Physical Address: 6100 Main Street, Houston, Texas 77005
    Mailing Address: MS-44, P.O.BOX 1892, Houston, Texas 77251-1892
    Site Map

     
