Phone: 413 577 0354
Office: CS 348
Fall ‘19: Systems for Data Science (COMPSCI 532)
Fall ‘18: Systems for Data Science (COMPSCI 590S)
I am currently active in two main areas:
Graph mining algorithms such as pattern matching, frequent subgraph mining, motif counting, or finding dense subgraphs, are powerful tools to extract interesting subgraphs from a large graph. They are very important for areas such as social networks, semantic web, and bioinformatics. However, they have high computational complexity, which is mainly due to the need for enumerating a huge number of subgraphs of the input graph.
Arabesque is a system that helps programmers implement graph mining algorithms by abstracting away subgraph enumeration. It offers a simple functional API where applications just have to specify a boolean filter function, which takes a subgraph as input and returns whether the subgraph is interesting. Arabesque also transparently distributes and optimizes the enumeration process across multiple servers running Hadoop. It keeps all its state in memory in order to avoid the cost of random accesses to disk. After the appearance of Arabesque, developing generic systems for scalable graph mining has become a new exciting area of research that has seen several recent developments from different research groups across the world.
Graph search is a different problem than graph mining: in this case, we have a specific pattern of interest (i.e., a query graph) and we want to find all of its instances in a large data graph. In QFrag, we implement graph search by partitioning computation, not data: every worker has a complete copy of the data graph, like in Arabesque. This enables using state-of-the-art sequential subgraph isomorphism algorithms for distributed graph search. QFrag introduces a technique called task fragmentation to balance load while preserving the sequential nature of the subgraph isomorphism algorithms it uses.
Arabesque and QFrag are open source systems running on top of Spark. Check out the project homepage at arabesque.io
Cloud computing platforms gives the opportunity to reduce the hardware cost of running a database management system by dynamically add and remove servers from a distributed cluster according to load changes. In order to leverage this flexibility, applications need to be designed to be elastic, i.e., to transfer data and computation to and from servers whenever it is needed. Elastic databases abstract away the complexity of elasticity from applications. They are able to detect changes in load on-line and devise a data and load migration plan that can accommodate these changes. I have worked on four seminal systems for database elasticity. Accordion partitions the database in small blocks and transfers them as needed based on online monitoring information. E-Store uses a two-tiered approach to identify and move hot tuples at very high granularity. Clay extends E-Store to support arbitrary transactional access patterns involving multiple tuples. All these systems use a reactive approach, where reconfigurations are triggered whenever a workload change is detected. P-Store uses workload prediction to start reconfigurations ahead of time. This avoids reconfiguring the system at peak load.
Distributed stream processing system allow running user-defined operators on an input stream of key-value pairs. Operators can also produce tuples that form the input stream for other operators. Each input stream is partitioned on multiple sub-streams based on the keys of the tuples. For scalability, sub-streams can be assigned to different physical servers running the same operator. Input streams can be very skewed so it is important to balance load carefully by assigning the right sub-streams to the right operators. However, it is often impossible to predict which key will be more expensive than others. In this project, we identify online mechanisms to detect hot keys and balance load even in presence of very high skew. We first used a technique called partial key grouping (aka “the power of both choices”), which involves replicating each operator instance on two servers and selecting the right replica dynamically on a tuple-by-tuple basis. The technique has been integrated in the main release of Apache Storm. Our second technique deals with settings with a large number of servers, where balancing load is more difficult and more than two choices may be needed.
Both choices: Muhammad Anis Uddin Nasir, Gianmarco De Francisci Morales, David García-Soriano, Nicolas Kourtellis, Marco Serafini, “The Power of Both Choices: Practical Load Balancing for Distributed Stream Processing Engines”. IEEE Int. Conf. on Data Engineering (ICDE) 2015.
N choices: Muhammad Anis Uddin Nasir, Gianmarco De Francisci Morales, Nicolas Kourtellis and Marco Serafini. “When Two Choices Are not Enough: Balancing at Scale in Distributed Stream Processing.” IEEE Int. Conf. on Data Engineering (ICDE), 2016
Prior to joining QCRI, Marco worked at Yahoo! Research in Barcelona where he studied the problem of protecting large scale distributed systems from hardware data corruption, which is expected to becoming more and more frequent in future generations of CPUs because of reduced geometry and voltage levels. Unreliable commodity servers are typically used to store data that is critical from an infrastructure viewpoint (e.g., configuration metadata), personal viewpoint (e.g., emails), or financial viewpoint (e.g., ad clicks and billing information). Critical systems like Google’s Mesa, for example, detect corruption errors through application-specific replication techniques. I proposed a new formal model for Arbitrary State Corruption (ASC), which captures the essential aspects of data corruption faults in an application-independent manner. I used the model to develop PASC, a runtime library that can provably detect data corruption errors in any event-based distributed system without using replication. SEI makes PASC more practical by supporting multiple threads, substantially eliminating memory overhead, and making it easier to transparently harden existing applications.
Zookeeper is a popular open-source system use to store metadata and to coordinate distributed systems. It is used by many products and open-source projects. While at Yahoo! Research, I also worked on the Zookeeper Atomic Broadcast (Zab) protocol, which is at the heart of Zookeeper. People often have a hard time understanding the difference between active replication protocols like Paxos, which exchange operations, and passive ones like Zab, which exchange state updates. I introduced the notion of barrier to provide a simple discrimination criteria.
At Yahoo! I also developed social piggybacking, a technique that uses graph mining techniques on social graphs to increase the throughput of social event feed systems.
I received my PhD at TU Darmstadt. My thesis introduced the concept of Eventual Linearizability, a correctness condition for systems that are usually strongly consistent and degrade consistency only when necessary. It also introduced Scrooge, a fast Byzantine fault tolerant algorithm running on a small set of replicas.
Eventual linearizability: M. Serafini, D. Dobre, M. Majuntke, P. Bokor and N. Suri, “Eventually Linearizable Shared Objects”. ACM Symp on Principles of Distributed Computing (PODC), 2010.
Applications of EL: Marco Serafini and Flavio Junqueira. “Weak Consistency as a Last Resort”. Workshop on Large-Scale Distributed Systems and Middleware (LADIS) 2010.
Scrooge: M. Serafini, P. Bokor, D. Dobre, M. Majuntke, and N. Suri, “Scrooge: Reducing the cost of fast Byzantine replication in presence of unresponsive replicas”. IEEE Int. Conf. on Dependable Systems and Networks (DSN), 2010.
Program Committee member: SIGMOD 2020, VLDB 2019 (Demo), SIGMOD 2019 (Industry), DSN 2019, DASFAA 2019, VLDB 2018, ICDE 2018, APSys 2018, OPODIS 2018, SOSP 2017, VLDB 2017, Eurosys 2017, WWW 2017, DSN 2017, IC2E 2017, APSys 2017, Eurosys 2016, ICDCS 2016, ICDE 2016, SRDS 2106, IC2E 2016, SIGMOD 2016 demo, ICDCS 2015, WWW 2015, SRDS 2015, OPODIS 2014, SSS 2014, Middleware 2012 (Industry), EDCC 2012, SOFSEM 2011.
Program Committee chair: LADIS 2018 workshop (Large-Scale Distributed Systems and Middleware), colocated with PODC 2018; PaPoC 2015 workshop (Principles and Practice of Consistency for Distributed Data), colocated with Eurosys 2015.
Workshop Chair: IEEE Symp. on Reliable Distributed Systems 2019.
Award Committee member: Eurosys Roger Needham PhD Thesis Award 2019.
My PhD thesis was nominated for the “Best PhD Thesis of 2010” by the German, Swiss and Austrian Computer Science societies and the German chapter of ACM for PhD thesis titled “Efficient and Low-Cost Fault-Tolerance for Web-Based Systems”.