Accountability for distributed systems
Doctor of Philosophy
Nodes in a distributed system can fail for many reasons, such as bugs, misconfigurations, hardware failures, intrusions, or insider attacks. Once a node has become faulty, its behavior can change arbitrarily. In benign cases, the node might simply stop; in less benign cases, it might actively try to subvert the rest of the system. A reliable distributed system must have a way to handle such faults. In this thesis, we explore a novel approach to this problem, which is based on accountability. In an accountable system, each node records its past actions in a tamper-evident log, and nodes inspect each other's log for signs of misbehavior. When nodes become faulty, the other nodes can eventually detect this, and they can obtain evidence that irrefutably links the fault to a faulty node. At the same time, correct nodes can always defend themselves against any false accusations. We characterize the class of faults that can be detected with our approach, and we show that it includes any fault that causally affects at least one correct node. We also present a set of techniques for enforcing accountability, including an algorithm for tamper-evident logs, and two techniques for detecting faults in the log: One relies on state machine replay to check a node's behavior against a reference implementation, while the other checks the logs against a declarative specification of the expected behavior. Each of these techniques can be applied to a wide range of distributed systems. To demonstrate that accountability is widely applicable, we have added it to several different types of systems, including a decentralized email system, a server-based file system, a peer-to-peer content distribution system, the Internet's interdomain routing system, and two multi-player games. In each case, accountability was able to detect a variety of problems that were previously reported in the literature. This shows that accountability is very general and can supersede a number of existing defenses. Our evaluation shows that accountability is practical, that its overhead is reasonable, and that it can scale to large numbers of nodes.