    Code Similarity Search in a Latent Space

    Name:
    QI-DOCUMENT-2017.pdf
    Size:
    1.700Mb
    Format:
    PDF
    Author
    Qi, Letao
    Date
    2017-04-21
    Advisor
    Jermaine, Christopher
    Degree
    Master of Science
    Abstract
    A large database of program source code that supports fast search by code similarity would be useful for several applications, including automated program synthesis and debugging, and user-facing code search in an integrated development environment. Here, "similar" is defined with respect to a set of application-defined similarity functions. The key difficulty in realizing this goal is that standard database indexing techniques cannot be applied to the problem of querying based on arbitrary similarity functions. To address this difficulty, I propose a dictionary-based approach in which I represent each piece of code by a vector of its similarities to a set of example database codes. Cosine similarity between the vector representing a query code and the vector representing a database code can then be used to measure closeness. However, the dictionary may need to be very high-dimensional if the goal is to accurately index a wide variety of database codes. Hence, I explore the idea of using a projection matrix to reduce the dimensionality of the problem. One approach is to use random projection. The other approach that I explore is to learn the projection matrix with a machine learning algorithm supervised by the text/code pairs provided by StackOverflow, a question-answering website for programmers.
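The pipeline outlined in the abstract (dictionary-based similarity vectors, cosine similarity, and a random projection) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `token_jaccard` is a hypothetical stand-in for the application-defined similarity functions, the Gaussian projection is one common way to build a random projection matrix, and the toy snippets are invented. The learned projection trained on StackOverflow pairs is not shown, and this is not the thesis's implementation.

```python
import math
import random

def token_jaccard(a, b):
    """Hypothetical similarity function: Jaccard overlap of whitespace tokens."""
    ta, tb = set(a.split()), set(b.split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def embed(code, dictionary, sim=token_jaccard):
    """Represent a piece of code by its similarities to each dictionary entry."""
    return [sim(code, entry) for entry in dictionary]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def random_projection(in_dim, out_dim, seed=0):
    """Build an out_dim x in_dim matrix with i.i.d. Gaussian entries (an assumed construction)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(out_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
            for _ in range(out_dim)]

def project(matrix, vec):
    """Multiply the projection matrix by a vector."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]

def search(query, database, dictionary, matrix=None):
    """Rank database codes by cosine similarity to the query, best match first."""
    def to_vector(code):
        vec = embed(code, dictionary)
        return project(matrix, vec) if matrix is not None else vec
    q = to_vector(query)
    scored = [(cosine(q, to_vector(code)), code) for code in database]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Toy data, invented for this sketch.
DICTIONARY = [
    "for i in range ( n ) : total += i",
    "while x > 0 : x -= 1",
    "def f ( a , b ) : return a + b",
    "print ( 'hello' )",
]
DATABASE = [
    "for j in range ( m ) : s += j",
    "def add ( x , y ) : return x + y",
    "while k > 0 : k -= 1",
]
```

Calling `search("def add ( x , y ) : return x + y", DATABASE, DICTIONARY)` ranks the database in the full dictionary space; passing `random_projection(len(DICTIONARY), 2)` as `matrix` performs the same ranking in a lower-dimensional space. In a real system the database vectors would be computed once and indexed rather than recomputed for every query.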
    Keyword
    code similarity search; latent space
    Citation
    Qi, Letao. "Code Similarity Search in a Latent Space." (2017) Master’s Thesis, Rice University. http://hdl.handle.net/1911/96022.
    Collections
    • Rice University Electronic Theses and Dissertations [12052]

    Home | FAQ | Contact Us
    Managed by the Digital Scholarship Services at Fondren Library, Rice University
    Physical Address: 6100 Main Street, Houston, Texas 77005
    Mailing Address: MS-44, P.O.BOX 1892, Houston, Texas 77251-1892
     

     
