fault tolerant replication

Zookeeper, Byzantine fault tolerance, eventual linearizability

My prior research investigated fundamental issues of distributed systems design, such as coordination and replication. Large-scale distributed systems made of inexpensive servers constitute the core computing infrastructure of companies like Google, Yahoo, or Facebook. Because of the large number of servers, failures of single servers are commonplace.

The bulk of my work is on fault-tolerant consensus and replication. I formalized and analyzed the correctness of the Zookeeper atomic broadcast protocol (Zab), the replication component of the Zookeeper coordination service, which is a fundamental building block of many “big data” systems, such as Apache Yarn (the resource manager of Hadoop), Apache HBase (a key-value store built on top of the Hadoop Distributed File System), Apache Kafka (the messaging source of many stream processing systems), and Apache Solr (a distributed search system). The work received the Test- of-Time Award at the IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) 2021.

I also worked on the tolerance of adversarial (or Byzantine) faults, such as intrusions or network partitions. My Ph.D. thesis introduced Scrooge, a Byzantine-fault tolerant algorithm that achieves minimal performance overhead with low replication cost even in the presence of faults. In systems that need to tolerate only one Byzantine fault and any number of crashes, Scrooge uses a minimal number of replicas.

I introduced a novel fault model for hardware data corruption faults, called Arbitrary State Corruption (ASC). I developed ASC hardening, the first generic technique that provably makes crash-tolerant distributed systems tolerant to data corruption faults with low performance overhead. ASC hardening avoids paying the cost of Byzantine fault tolerant replication to tolerate non-silent but non-malicious faults.

The thesis also defined new weak consistency semantics. It formalized eventual linearizability, the correctness condition of a replicated system that degrades consistency only in the presence of network partitions.

References

2015

  1. Scalable Error Isolation for Distributed Systems
    Diogo Behrens, Marco Serafini, Flavio P Junqueira, Sergei Arnautov, and Christof Fetzer
    In 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2015

2012

  1. Practical Hardening of Crash-Tolerant Systems
    Miguel Correia, Daniel Gómez Ferro, Flavio P Junqueira, and Marco Serafini
    In 2012 USENIX Annual Technical Conference (USENIX ATC), 2012

2011

  1. Zab: High-Performance Broadcast for Primary-Backup Systems
    Flavio P Junqueira, Benjamin C. Reed, and Marco Serafini
    In 2011 IEEE/IFIP 41st International Conference on Dependable Systems & Networks (DSN), 2011

2010

  1. Scrooge: Reducing the costs of fast Byzantine replication in presence of unresponsive replicas
    Marco Serafini, Péter Bokor, Dan Dobre, Matthias Majuntke, and Neeraj Suri
    In 2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN), 2010
  2. Eventually linearizable shared objects
    Marco Serafini, Dan Dobre, Matthias Majuntke, Péter Bokor, and Neeraj Suri
    In Proceedings of the 29th ACM SIGACT-SIGOPS symposium on Principles of Distributed Computing (PODC), 2010