Marco Serafini

Office: LGRC A335

740 N Pleasant St

Amherst, MA 01003, USA

I am an Associate Professor in the College of Information and Computer Sciences at the University of Massachusetts Amherst. I am a member of the Data systems Research for Exploration, Analytics, and Modeling (DREAM) lab.

I lead the Data Systems group, which works on systems for machine learning and data science, data management systems, and parallel and distributed systems. We focus on performance, scalability, fault tolerance, and programming abstractions. My research areas and projects are listed here.

Before joining UMass, I was with Yahoo Research and QCRI. I got my PhD from the Technical University of Darmstadt, Germany.

news

Sep 2025	I was promoted to Associate Professor with tenure. Many thanks to my students, collaborators, the department, as well as mentors and peers across institutions who have helped me along the way!
Apr 2025	GSplit published at MLSys. GSplit is a multi-GPU Graph Neural Network training system that introduces split parallelism and probabilistic splitting algorithms to reduce sampling, loading, and training overheads.
Dec 2024	The paper reporting our extensive comparison of full-graph and mini-batch GNN training systems was accepted at VLDB 2025. We found that mini-batch training systems achieve lower time-to-accuracy in all scenarios we considered and comparable accuracy. More interesting results in the paper!
Nov 2024	I gave an invited talk titled Harnessing Structure with Parallelism: Scalable Systems for GNN Training at the UC Berkeley SkyLab Seminar series, Meta, and the Microsoft Gray Systems Lab.
Jul 2024	Our paper FlexPushdownDB: Rethinking Computation Pushdown for Cloud OLAP DBMSs was accepted for publication at the VLDB Journal. We propose a new adaptive mechanism to avoid overloading the computational capacity of the storage layer during computation pushdown, and identify new opportunities for pushing down query operators.
Mar 2024	I received a new Adobe Research Collaboration Grant on using Temporal Graph Neural Networks for query prediction. Many thanks to Adobe for the continued support!
Jan 2024	Our paper GMorph: Accelerating Multi-DNN Inference via Model Fusion was accepted at EuroSys’24. The paper proposes “model fusion”, a new approach to fuse multiple task-specific, pre-trained, and heterogeneous DNNs into a single multi-task model to reduce inference latency.
Aug 2023	GraphMini paper accepted at PACT’23. GraphMini speeds up graph pattern matching, a key step in graph mining, by up to an order of magnitude compared to GraphPi and Dryadic. It builds auxiliary graphs by proactively pruning the input graph during query execution.
Oct 2022	Amazon Research Award on split-parallel graph neural network training (PI).
Jul 2022	NSF CNS Core Small grant on split-parallel graph neural network training (PI).
Aug 2021	FlexPushdownDB paper appeared at VLDB. It investigates the tradeoff between caching data at the query execution server vs. pushing computation to storage in analytical query workloads.
Jul 2021	Test-of-time award for the ZooKeeper Atomic Broadcast (Zab) paper at DSN’21.
Jun 2021	Our paper on scalable graph neural network training using sampling appeared in the ACM SIGOPS Operating Systems Review.
Apr 2021	NextDoor paper appeared at EuroSys. NextDoor proposes pushing graph sampling to the GPU in order to significantly speed up end-to-end training time for GNNs and graph ML.
Jan 2021	Adobe Research Collaboration Grant on distributed data caching (PI).
Dec 2020	I became an ACM Senior Member.
Aug 2020	Our paper on finding optimal resource configurations on the cloud appeared at VLDB. We evaluate and compare several commonly used black-box optimization algorithms.
Aug 2020	LiveGraph paper appeared at VLDB. LiveGraph is the first graph storage system that supports transactions.
Jul 2020	Facebook Systems for ML Research Award on the NextDoor project, which pushes graph sampling to the GPU for graph machine learning (PI).
Apr 2020	PushdownDB paper appeared at ICDE. It studies the effectiveness of pushing parts of DBMS analytics queries onto the storage layer, specifically the S3 service by AWS.
Aug 2019	Our paper on choosing a cloud DBMS appeared at VLDB. We discuss the tradeoffs involved in using shared-nothing vs. shared-storage designs on the cloud, considering different databases.
Mar 2019	I gave a keynote at the DataStax 2019 Product and Engineering Summit.

students

current

Md. Ashraful Islam

Juelin Liu

Hojae Son

alumni

Abhinav Jangda (with Arjun Guha) - Microsoft Research

Sandeep Polisetty (with Hui Guan) - Annapurna Labs

awards

DSN 2021 Test-of-Time Award for the paper “Zab: High-Performance Broadcast for Primary-Backup Systems”.

Nomination for the “Dissertationspreis” (Doctoral Dissertation Award) by the German, Swiss and Austrian Computer Science societies and the German chapter of ACM.

ACM Senior Member.