distributed database management systems

Elasticity, caching, storage disaggregation

Cloud systems are influencing the architecture of distributed database management systems (DBMSs) in significant ways. One important dimension is elastic provisioning: cloud deployments enable adding and removing servers of a distributed DBMS cluster on demand. The second dimension is resource disaggregation, and in particular storage disaggregation: large cloud storage systems are hosted by a dedicated set of servers, which are different from the ones performing computation. These two architectural trends are at odds with the shared-nothing model adopted by traditional DBMSs, where there is a static cluster of servers executing queries on locally-attached storage.

My previous work developed elastic DBMSs that autonomously add or remove servers to the cluster and migrate data to adapt to changing workload volume and skew. In particular, I developed DBMSs that support On-Line Transactional Processing (OLTP) workloads and ACID transactions. In these systems, it is important to store data accessed by the same transaction on the same server as much as possible. When this requirement is violated, the DBMS must execute distributed transactions, which are very costly. My work extended DBMSs with low-overhead fine-grained monitoring components and with controllers that use this monitoring information to determine a reconfiguration plan.

I developed several systems to address these issues. The Accordion system divides the database tuples into small chunks and uses Mixed-Integer Linear Programming to elastically migrate them when needed while minimizing distributed transactions. The E-Store system performs finer-grained monitoring and migration. It uses two types of monitoring. During regular execution, it monitors aggregate metrics that are cheap to obtain. When regular monitoring detects an overload, it turns on fine-grained monitoring, which is more expensive, for a short interval of time. This is sufficient to detect workload skew. E-Store, however, does not monitor for co-accesses among tuples, assuming instead that the database has a tree schema and transactions mostly access tuples in the same subtree. The Clay system removes this assumption and monitors tuple co-accesses online. It shows that fine-grained monitoring can be lightweight but still capture sufficient information to build a graph of co-accesses among tuples. Clay then uses this graph to migrate data while keeping frequently co-accessed data together. The P-Store system uses workload prediction to proactively reconfigure the system before it is overloaded. A proactive approach is preferable to a reactive one because reconfigurations are themselves costly, so they should be completed before the system starts becoming overloaded.

More recently, I worked on cloud databases and on how they can make effective use of disaggregated storage, including by pushing computation to the storage layer. PushDownDB explores the use of S3 select for pushing down computation. FlexPushDownDB combines data caching and computation pushdown.

References

2021

  1. Flexpushdowndb: Hybrid Pushdown and Caching in a Cloud DBMS
    Yifei Yang, Matt Youill, Matthew Woicik, Yizhou Liu, Xiangyao Yu, Marco Serafini, Ashraf Aboulnaga, and Michael Stonebraker
    Proceedings of the VLDB Endowment, 2021

2020

  1. PushdownDB: Accelerating a DBMS using S3 computation
    Xiangyao Yu, Matt Youill, Matthew Woicik, Abdurrahman Ghanem, Marco Serafini, Ashraf Aboulnaga, and Michael Stonebraker
    In 2020 IEEE 36th International Conference on Data Engineering (ICDE), 2020

2019

  1. Choosing a Cloud DBMS: Architectures and Tradeoffs
    Junjay Tan, Thanaa Ghanem, Matthew Perron, Xiangyao Yu, Michael Stonebraker, David DeWitt, Marco Serafini, Ashraf Aboulnaga, and Tim Kraska
    Proceedings of the VLDB Endowment, 2019

2018

  1. P-store: An Elastic Database System with Predictive Provisioning
    Rebecca Taft, Nosayba El-Sayed, Marco Serafini, Yu Lu, Ashraf Aboulnaga, Michael Stonebraker, Ricardo Mayerhofer, and Francisco Andrade
    In Proceedings of the 2018 International Conference on Management of Data (SIGMOD), 2018

2016

  1. Clay: Fine-Grained Adaptive Partitioning for General Database Schemas
    Marco Serafini, Rebecca Taft, Aaron J. Elmore, Andrew Pavlo, Ashraf Aboulnaga, and Michael Stonebraker
    Proceedings of the VLDB Endowment, 2016

2014

  1. Accordion: Elastic Scalability for Database Systems Supporting Distributed Transactions
    Marco Serafini, Essam Mansour, Ashraf Aboulnaga, Kenneth Salem, Taha Rafiq, and Umar Farooq Minhas
    Proceedings of the VLDB Endowment, 2014
  2. E-store: Fine-Grained Elastic Partitioning for Distributed Transaction Processing Systems
    Rebecca Taft, Essam Mansour, Marco Serafini, Jennie Duggan, Aaron J Elmore, Ashraf Aboulnaga, Andrew Pavlo, and Michael Stonebraker
    Proceedings of the VLDB Endowment, 2014