
    • Implementing a Static Debugger for a First-Order Functional Programming Language 

      Felleisen, Matthias; Steckler, Paul A. (2001-04)
      A static debugger assists a programmer in finding potential errors in programs. The key to a static debugger is set-based analysis (SBA). Many authors have described formulations of SBA, but leave open gaps among that theory, its implementation, and its use for a particular purpose. An implementation needs to confront these practical issues. While ...
    • Implementing linear algebra algorithms on high performance architectures 

      Aleksandrov, Lyudmil; Candev, Michael; Djidjev, Hristo N. (1997-07-25)
      In this paper we consider the data distribution and data movement issues related to the solution of the basic linear algebra problems on high performance systems. The algorithms we discuss in detail are the Gauss and Gauss-Jordan methods for solving a system of linear equations, Cholesky's algorithm for LL^T-factorization, and QR-factorization ...
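      For reference, the LL^T factorization named above can be sketched serially as follows; this is the textbook column-by-column Cholesky algorithm, not the distributed formulation studied in the report, and the function name and use of NumPy are illustrative assumptions.

        import numpy as np

        def cholesky_llt(A):
            """Return lower-triangular L with A = L @ L.T for a symmetric
            positive-definite A (serial, column-by-column textbook variant)."""
            n = A.shape[0]
            L = np.zeros_like(A, dtype=float)
            for j in range(n):
                # Diagonal entry: remove the contribution of the already-computed row.
                L[j, j] = np.sqrt(A[j, j] - np.dot(L[j, :j], L[j, :j]))
                # Entries below the diagonal in column j.
                for i in range(j + 1, n):
                    L[i, j] = (A[i, j] - np.dot(L[i, :j], L[j, :j])) / L[j, j]
            return L

        A = np.array([[4.0, 2.0], [2.0, 3.0]])
        assert np.allclose(cholesky_llt(A) @ cholesky_llt(A).T, A)

      Each column j depends on all previously computed columns, which is what makes the data-distribution and data-movement choices discussed in the report non-trivial on a parallel machine.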
    • Implementing the Top-Down Close Algorithm on the TI 6200 Architecture 

      Dasgupta, Anshuman (2002-12-12)
      Partitioned register-set architectures pose a challenge to standard scheduling algorithms. To create an efficient schedule, an instruction scheduler for such an architecture must consider the location of an operand in the register file, the availability of the inter-cluster data bus, and the profitability of an inter-cluster copy instruction. This ...
    • Implicitly Heterogeneous Multi-stage Programming 

      Eckhardt, Jason; Kaiabachev, Roumen; Pašalić, Emir; Swadi, Kedar; Taha, Walid (2005-04-16)
      Previous work on semantics-based multi-stage programming (MSP) language design focused on homogeneous language designs, where the generating and the generated languages are the same. Homogeneous designs simply add a hygienic quasi-quotation and evaluation mechanism to a base language. An apparent disadvantage of this approach is that the programmer ...
    • Improving Effective Bandwidth through Compiler Enhancement of Global and Dynamic Cache Reuse 

      Ding, Chen (2000-01-21)
      While CPU speed has been improved by a factor of 6400 over the past twenty years, memory bandwidth has increased by a factor of only 139 during the same period. Consequently, on modern machines the limited data supply simply cannot keep a CPU busy, and applications often utilize only a few percent of peak CPU performance. The hardware solution, which ...
    • Improving Memory Hierarchy Performance for Irregular Applications 

      Kennedy, Ken; Mellor-Crummey, John; Whalley, David (1999-03-10)
      The gap between CPU speed and memory speed in modern computer systems is widening as new generations of hardware are introduced. Loop blocking and prefetching transformations help bridge this gap for regular applications; however, these techniques don't deal well with irregular applications. This paper investigates using data and computation reordering ...
    • Improving Performance with Integrated Program Transformations 

      Jin, Guohua; Mellor-Crummey, John; Qasem, Apan (2004-09-09)
      Achieving a high fraction of peak performance on today’s computer systems is difficult for complex scientific applications. To do so, an application’s characteristics must be tailored to exploit the characteristics of its target architecture. Today, commercial compilers do not adequately tailor programs automatically; thus, application scientists ...
    • Improving TLB Miss Handling with Page Table Pointer Caches 

      Wu, Michael; Zwaenepoel, Willy (1997-12-16)
      Page table pointer caches are a hardware supplement for TLBs that cache pointers to pages of page table entries rather than page table entries themselves. A PTPC traps and handles most TLB misses in hardware with low overhead (usually a single memory access). PTPC misses are filled in software, allowing for an easy hardware implementation, similar ...
    • Input vector control for post-silicon leakage current minimization under manufacturing variations 

      Alkabani, Yousra; Koushanfar, Farinaz; Massey, Tammara; Potkonjak, Miodrag (2008-02-04)
      We present the first approach for post-silicon leakage power reduction through input vector control (IVC) that takes into account the impact of manufacturing variability (MV). Because of MV, the integrated circuits (ICs) implementing one design require different input vectors to achieve their lowest leakage states. There are two major challenges ...
    • Inside Time-based Software Transactional Memory 

      Zhang, Rui; Budimlić, Zoran; Scherer, William N., III (2007-07-06)
      We present a comprehensive analysis and experimental evaluation of time-based validation techniques for Software Transactional Memory (STM). Time-based validation techniques have recently emerged as an effective way to reduce the validation overhead for STM systems. In a time-based strategy, information based on global time enables the system to avoid a ...
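      As a generic illustration of the time-based idea, here is a sketch loosely in the style of global-version-clock designs such as TL2, not the specific systems evaluated in the report; class and function names are invented, and writer transactions are reduced to a single atomic commit for brevity.

        import threading

        class ConflictError(Exception):
            pass

        GLOBAL_CLOCK = 0                      # advanced by committing writers
        _clock_lock = threading.Lock()

        class TVar:
            """Transactional variable stamped with the commit time of its last writer."""
            def __init__(self, value):
                self.value = value
                self.version = 0

        def commit_write(tvar, new_value):
            """Writer commit (greatly simplified): advance global time, stamp the variable."""
            global GLOBAL_CLOCK
            with _clock_lock:
                GLOBAL_CLOCK += 1
                tvar.value = new_value
                tvar.version = GLOBAL_CLOCK

        class Transaction:
            def __init__(self):
                self.rv = GLOBAL_CLOCK        # snapshot of global time at transaction start

            def read(self, tvar):
                # Time-based validation: a variable last written at or before our start
                # time is consistent with every earlier read, so no per-read revalidation
                # of the whole read set is needed.
                if tvar.version > self.rv:
                    raise ConflictError("variable written after this transaction began")
                return tvar.value
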
    • Interprocedural Pointer Analysis for C 

      Lu, John (1998-05-20)
      Many powerful code optimization techniques rely on accurate information connecting the definitions and uses of values in a program. This information is difficult to produce for programs written with pointer-based languages such as C. For values in memory locations, accurate information is difficult to obtain at call sites and pointer-based memory ...
    • Interprocedural Strength Reduction of Critical Sections in Explicitly-Parallel Programs 

      Barik, Rajkishore; Sarkar, Vivek; Zhao, Jisheng (2013-05-01)
      In this paper, we introduce novel compiler optimization techniques to reduce the number of operations performed in critical sections that occur in explicitly-parallel programs. Specifically, we focus on three code transformations: 1) Partial Strength Reduction (PSR) of critical sections to replace critical sections by non-critical sections on certain ...
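      A generic illustration of reducing the work performed inside a critical section is shown below; this is the simplest intraprocedural case, not the interprocedural PSR transformation described in the paper, and all names are invented.

        import threading

        lock = threading.Lock()
        shared_total = 0

        def update_before(values):
            """Original form: the pure summation executes while the lock is held."""
            global shared_total
            with lock:
                s = sum(v * v for v in values)     # touches no shared state
                shared_total += s

        def update_after(values):
            """Transformed form: the pure work is hoisted out of the critical
            section, which now contains only the single shared update."""
            global shared_total
            s = sum(v * v for v in values)
            with lock:
                shared_total += s
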
    • Interprocedural Symbolic Analysis 

      Havlak, Paul (1994-05)
      Compiling for efficient execution on advanced computer architectures requires extensive program analysis and transformation. Most compilers limit their analysis to simple phenomena within single procedures, limiting effective optimization of modular codes and making the programmer’s job harder. We present methods for analyzing array side effects and ...
    • IO-Lite: A Copy-free UNIX I/O System 

      Pai, Vivek (1997-01-11)
      Memory copy speed is known to be a significant barrier to high-speed communication. We perform an analysis of the requirements for a copy-free buffer system, develop an implementation-independent applications programming interface (API) based on those requirements, and then implement a system that conforms to the API. In addition, we design and ...
    • IO-Lite: A unified I/O buffering and caching system 

      Druschel, Peter; Pai, Vivek; Zwaenepoel, Willy (1997-10-27)
      This paper presents the design, implementation, and evaluation of IO-Lite, a unified I/O buffering and caching system. IO-Lite unifies all buffering and caching in the system, to the extent permitted by the hardware. In particular, it allows applications, interprocess communication, the file system, the file cache, and the network subsystem to share ...
    • Issues in Instruction Scheduling 

      Schielke, Philip (1998-09-15)
      Instruction scheduling is a code reordering transformation that attempts to hide latencies present in modern day microprocessors. Current applications of these microprocessors and the microprocessors themselves present new parameters under which the scheduler must operate. For example, some multiple functional unit processors have partitioned register ...
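      For context, the baseline that such schedulers extend is greedy list scheduling over the dependence graph; the single-issue sketch below is illustrative only, not the partitioned-register-set techniques examined in the thesis, and the function and parameter names are assumptions.

        def list_schedule(latency, deps, priority):
            """Greedy list scheduling: each cycle, issue the ready instruction with the
            highest priority (e.g. critical-path length).  deps[i] holds the instructions
            that must complete before i may issue; one instruction issues per cycle."""
            n = len(latency)
            finish = {}                           # instruction -> cycle its result is ready
            schedule = []
            cycle = 0
            while len(schedule) < n:
                ready = [i for i in range(n) if i not in finish
                         and all(finish.get(d, float("inf")) <= cycle for d in deps.get(i, ()))]
                if ready:
                    i = max(ready, key=priority)  # pick the most urgent ready instruction
                    finish[i] = cycle + latency[i]
                    schedule.append((cycle, i))
                cycle += 1                        # advance time; an empty ready list is a stall
            return schedule

        # Two independent loads (latency 3) feeding an add (latency 1):
        print(list_schedule([3, 3, 1], {2: {0, 1}}, priority=lambda i: -i))
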
    • Iterative Data-flow Analysis, Revisited 

      Cooper, Keith D.; Harvey, Timothy J.; Kennedy, Ken (2004-03-26)
      The iterative algorithm is widely used to solve instances of data-flow analysis problems. The algorithm is attractive because it is easy to implement and robust in its behavior. The theory behind the algorithm shows that, for a broad class of problems, it terminates and produces correct results. The theory also establishes a set of conditions where ...
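      The algorithm in question, written in its worklist form and specialized to a gen/kill problem such as reaching definitions, looks roughly as follows; the graph and lattice encodings here are illustrative choices, not taken from the paper.

        def solve_forward(nodes, succs, gen, kill):
            """Iterative data-flow analysis, worklist form, for a forward problem:
            OUT(n) = gen(n) | (IN(n) - kill(n)),  IN(n) = union of OUT(p) over preds p.
            A node is re-processed whenever a predecessor changes, until a fixed point."""
            preds = {n: set() for n in nodes}
            for n in nodes:
                for s in succs.get(n, ()):
                    preds[s].add(n)
            out = {n: set() for n in nodes}
            worklist = set(nodes)
            while worklist:
                n = worklist.pop()
                in_n = set().union(*(out[p] for p in preds[n])) if preds[n] else set()
                new_out = gen[n] | (in_n - kill[n])
                if new_out != out[n]:
                    out[n] = new_out
                    worklist |= set(succs.get(n, ()))
            return out

      Monotone transfer functions over a finite lattice guarantee that this loop terminates, which is the kind of condition the theory cited in the abstract establishes.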
    • Lazy Release Consistency for Distributed Shared Memory 

      Keleher, Peter (1995-01)
      A software distributed shared memory (DSM) system allows shared memory parallel programs to execute on networks of workstations. This thesis presents a new class of protocols that has lower communication requirements than previous DSM protocols, and can consequently achieve higher performance. The lazy release consistent protocols achieve this reduction ...
    • Leaky Buffer: A Novel Abstraction for Relieving Memory Pressure from Cluster Data Processing Frameworks 

      Liu, Zhaolei; Ng, T. S. Eugene (2016-03-25)
      The shift to the in-memory data processing paradigm has had a major influence on the development of cluster data processing frameworks. Numerous frameworks from industry, the open source community, and academia are adopting the in-memory paradigm to achieve functionality and performance breakthroughs. However, despite the advantages of these in ...
    • Lifetime Optimization Using Energy Allocation in Wireless Ad-hoc Networks 

      Koushanfar, Farinaz; Shamsi, Davood (2008-02-12)
      We develop energy-balancing strategies for energy resource allocation and deployment in wireless ad-hoc networks. The objective is to extend the network lifetime. We find the amount of energy storage that each node requires for balanced energy consumption throughout the network. For a limited set of energy resources in the deployment area, we ...