
    Synchronization and Pipelining on Multicore: Shaping Parallelism for a New Generation of Processors

    Name:
    TR09-8.pdf
    Size:
    170.6 KB
    Format:
    PDF
    Author
    Youssefi, Annahita
    Date
    November 14, 2009
    Abstract
    The potential for higher performance from increasing on-chip transistor densities, on the one hand, and the limitations in instruction-level parallelism of sequential applications and in the scalability of increasingly complicated superscalar and multithreaded architectures, on the other, are leading the microprocessor industry to embrace chip multiprocessors as a cost-effective solution for the general-purpose computing market. Multicore processors allow manufacturers to integrate larger numbers of simpler processing cores onto the same chip, thereby shortening design time and reducing costs. They provide higher throughput for multiprogrammed workloads by enabling simultaneous processing of independent jobs, and they can improve the performance of parallel applications by exploiting thread-level parallelism. Additionally, the individual cores may themselves be superscalar or multithreaded, thereby exploiting finer-grained levels of parallelism as well. While many design alternatives exist for multicore processors, one common choice is sharing the lower levels of the on-chip memory hierarchy among multiple processing cores. Although larger shared caches incur higher access latencies and more complex logic, they provide a larger aggregate pool and reduce duplicate cache lines, thereby generally reducing capacity misses. However, sharing the cache can also hurt performance when the cache-use behaviors of concurrent processes interfere with each other. Thus a good balance can be achieved by combining small, private first-level caches, for fast, contention-free access, with large, shared lower-level on-chip caches, for flexible workload tolerance. Performance on multicore processors is affected by many of the same factors that influence performance on other shared-memory parallel architectures. However, the tighter coupling of on-chip resources changes some of the cost ratios that shape the design of parallel algorithms.
Shared-cache multicore architectures introduce the potential for cheap inter-core communication, synchronization, and data sharing. They also introduce greater potential for cache contention. One alternative to the data-parallel programming model is pipelining a computation across multiple processors, effectively treating the processors as high-level vector units. Allen and Kennedy discuss pipelined parallelism in the context of the doacross loop [7]. Vadlamani and Jenks refer to this method as the Synchronized Pipelined Parallelism Model [12]. In this paper, we examine the opportunities a shared-cache multicore processor presents for pipelined parallelism. Using the dual-core, shared-cache Intel Core Duo architecture as our experimental setting, we first analyze inter-core synchronization costs using a simple synchronization microbenchmark. Then we evaluate a pipelined parallel version of Recursive Prismatic Time Skewing (RPTS) [6] on a 2D Gauss-Seidel kernel benchmark. RPTS is an optimization technique for iterative stencil computations that increases temporal locality by skewing spatial domains across a time domain and blocking in both domains. In the next subsection, we introduce our experimental setting. In Section 2, we discuss background issues, including factors impacting performance in shared-cache architectures, the effects of cache-sharing contention on application performance, and the synchronized pipelined parallelism approach to programming for shared-cache environments. In Section 3, we present a simple synchronization microbenchmark and analyze its performance on the Intel Core Duo. In Section 4, we discuss optimization techniques for iterative stencil computations, then analyze their parallelization for a shared-cache multicore context. Finally, we present experimental results from a pipelined parallel implementation of RPTS in Section 5.
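The pipelined (doacross) parallelization of a Gauss-Seidel sweep that the abstract describes can be sketched as follows. This is a minimal illustrative example, not the report's RPTS implementation: the function names, the column partitioning, and the use of per-row events for the cross-thread handoff are all assumptions made for the sketch.

```python
# Illustrative sketch of pipelined (doacross) Gauss-Seidel parallelism.
# Columns are split among threads; thread t may update row i of its block
# only after thread t-1 has finished row i, mirroring the doacross
# dependence carried by the in-place stencil's left neighbor.
import threading

def gauss_seidel_sweep(a):
    """Reference sequential sweep: one in-place 4-point stencil pass."""
    n = len(a)
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            a[i][j] = 0.25 * (a[i-1][j] + a[i+1][j] + a[i][j-1] + a[i][j+1])

def pipelined_sweep(a, nthreads=2):
    """Pipelined sweep producing the same values as the sequential one."""
    n = len(a)
    cols = list(range(1, n - 1))
    chunk = (len(cols) + nthreads - 1) // nthreads
    blocks = [cols[t * chunk:(t + 1) * chunk] for t in range(nthreads)]
    # done[i][t] is set once thread t has finished row i of its block.
    done = [[threading.Event() for _ in range(nthreads)] for _ in range(n)]

    def worker(t):
        for i in range(1, n - 1):
            if t > 0:
                done[i][t - 1].wait()   # wait for left neighbor's row i
            for j in blocks[t]:
                a[i][j] = 0.25 * (a[i-1][j] + a[i+1][j]
                                  + a[i][j-1] + a[i][j+1])
            done[i][t].set()            # release the right neighbor

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(nthreads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
```

Because each cell reads exactly the same operands in the same order as the sequential sweep, the pipelined version reproduces its results bit-for-bit; the per-row wait/set handoff is the synchronization cost that a microbenchmark of the kind described above would measure.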
    Citation
    Youssefi, Annahita. "Synchronization and Pipelining on Multicore: Shaping Parallelism for a New Generation of Processors." (2009) https://hdl.handle.net/1911/96380.
    Type
    Technical report
    Citable link to this page
    https://hdl.handle.net/1911/96380
    Rights
    You are granted permission for the noncommercial reproduction, distribution, display, and performance of this technical report in any format, but this permission is only for a period of forty-five (45) days from the most recent time that you verified that this technical report is still available from the Computer Science Department of Rice University under terms that include this permission. All other rights are reserved by the author(s).
    Metadata
    Show full item record
    Collections
    • Computer Science Technical Reports [245]

    Home | FAQ | Contact Us | Privacy Notice | Accessibility Statement
    Managed by the Digital Scholarship Services at Fondren Library, Rice University
    Physical Address: 6100 Main Street, Houston, Texas 77005
    Mailing Address: MS-44, P.O.BOX 1892, Houston, Texas 77251-1892
    Site Map

     

