
    Manetho: Fault Tolerance in Distributed Systems Using Rollback-Recovery and Process Replication

    Name:
    TR93-212.pdf
    Size:
    5.176Mb
    Format:
    PDF
    Author
    Elnozahy, Elmootazbellah
    Date
    October 1993
    Abstract
    This dissertation presents a new protocol that allows rollback-recovery and process replication to coexist in a distributed system. The protocol relies on a novel data structure called the antecedence graph, which tracks the nondeterministic events during failure-free operation and provides information for recreating them if a failure occurs. The rollback-recovery part of the protocol combines the low failure-free overhead of optimistic rollback-recovery with the advantages of pessimistic rollback-recovery, namely fast output commit, limited rollback, and failure containment. The process replication part of the protocol features a new multicast protocol designed specifically to support process replication. Unlike previous work, the new protocol provides high throughput and low latency in message delivery without relying on the application semantics. The protocol has been implemented in the Manetho prototype. Experience with a number of long-running, compute-intensive parallel applications confirms the performance advantages of the new protocol. The implementation also features several performance optimizations that are applicable to other rollback-recovery and multicast protocols.
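
    The abstract does not specify how the antecedence graph is represented, so the following is only a rough sketch of the general idea, under stated assumptions rather than the dissertation's actual design. It assumes each nondeterministic event is identified by a process name and a local index, and that logging an event records the events it directly depends on. The names `Event`, `AntecedenceGraph`, `record`, and `antecedents` are hypothetical and chosen for illustration.

    ```python
    from dataclasses import dataclass, field


    @dataclass(frozen=True)
    class Event:
        # Hypothetical identifier for one nondeterministic event.
        process: str  # process where the event occurred
        index: int    # position of the event in that process's local order


    @dataclass
    class AntecedenceGraph:
        # Maps each logged event to the events that directly precede it.
        parents: dict = field(default_factory=dict)

        def record(self, event, *direct_antecedents):
            """Log an event together with the events it directly depends on."""
            self.parents[event] = list(direct_antecedents)

        def antecedents(self, event):
            """Return every event that `event` transitively depends on,
            ordered so that each event appears after its own antecedents."""
            ordered, seen = [], set()

            def visit(e):
                if e in seen:
                    return
                seen.add(e)
                for parent in self.parents.get(e, []):
                    visit(parent)
                ordered.append(e)

            for parent in self.parents.get(event, []):
                visit(parent)
            return ordered
    ```

    In this sketch, a recovering process could walk `antecedents(e)` to obtain a replay order for the events that preceded `e`. How Manetho actually propagates, stores, and prunes the graph is described in the report itself, not here.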
    Description
    This work was also published as a Rice University thesis/dissertation: http://hdl.handle.net/1911/19117
    Citation
    Elnozahy, Elmootazbellah. "Manetho: Fault Tolerance in Distributed Systems Using Rollback-Recovery and Process Replication." (1993) https://hdl.handle.net/1911/96435.
    Type
    Technical report
    Citable link to this page
    https://hdl.handle.net/1911/96435
    Rights
    You are granted permission for the noncommercial reproduction, distribution, display, and performance of this technical report in any format, but this permission is only for a period of forty-five (45) days from the most recent time that you verified that this technical report is still available from the Computer Science Department of Rice University under terms that include this permission. All other rights are reserved by the author(s).
    Collections
    • Computer Science Technical Reports [245]

    Home | FAQ | Contact Us | Privacy Notice | Accessibility Statement
    Managed by the Digital Scholarship Services at Fondren Library, Rice University
    Physical Address: 6100 Main Street, Houston, Texas 77005
    Mailing Address: MS-44, P.O.BOX 1892, Houston, Texas 77251-1892
    Site Map

     
