Distributed database management systems
Elasticity, caching, storage disaggregation
Cloud systems are influencing the architecture of distributed database management systems (DBMSs) in significant ways. One important dimension is elastic provisioning: cloud deployments make it possible to add and remove servers of a distributed DBMS cluster on demand. A second dimension is resource disaggregation, and in particular storage disaggregation: large cloud storage systems are hosted by a dedicated set of servers, separate from the ones performing computation. Both trends are at odds with the shared-nothing model of traditional distributed DBMSs, where a static cluster of servers executes queries on locally-attached storage.
My previous work developed elastic DBMSs that autonomously add servers to or remove servers from the cluster and migrate data to adapt to changes in workload volume and skew. In particular, I developed DBMSs that support On-Line Transaction Processing (OLTP) workloads and ACID transactions. In these systems, it is important that data accessed by the same transaction be stored on the same server as much as possible; when this is not the case, the DBMS must execute distributed transactions, which are very costly. My work extended DBMSs with low-overhead, fine-grained monitoring components and with controllers that use the monitoring information to compute a reconfiguration plan.
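As a rough illustration of this split between monitoring and control, the Python sketch below (hypothetical names and a deliberately simplistic policy; it is not the code of any of the systems discussed here) shows a controller that consumes per-partition load statistics reported by the servers and greedily produces a plan that moves the hottest partitions off overloaded servers.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical load statistic reported by each server's monitoring component.
@dataclass
class PartitionStats:
    server: str
    partition: int
    tx_per_sec: float

def plan_reconfiguration(stats: List[PartitionStats],
                         capacity_tx_per_sec: float) -> Dict[int, str]:
    """Return a partition -> target-server map that moves the hottest
    partitions off overloaded servers onto the least-loaded server.
    This greedy rule is a placeholder for the real controller logic."""
    load: Dict[str, float] = {}
    for s in stats:
        load[s.server] = load.get(s.server, 0.0) + s.tx_per_sec

    moves: Dict[int, str] = {}
    for s in sorted(stats, key=lambda x: x.tx_per_sec, reverse=True):
        if load[s.server] <= capacity_tx_per_sec:
            continue  # this partition's server is not overloaded
        target = min(load, key=load.get)
        if target == s.server:
            continue
        moves[s.partition] = target
        load[s.server] -= s.tx_per_sec
        load[target] += s.tx_per_sec
    return moves

if __name__ == "__main__":
    stats = [
        PartitionStats("server-1", 0, 900.0),
        PartitionStats("server-1", 1, 400.0),
        PartitionStats("server-2", 2, 100.0),
    ]
    print(plan_reconfiguration(stats, capacity_tx_per_sec=1000.0))
```

The actual controllers are considerably more sophisticated: they also account for the cost of migration itself and for which tuples are accessed together, as discussed next.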
I developed several systems to address these issues. The Accordion system divides the database into small chunks of tuples and uses Mixed-Integer Linear Programming to compute migration plans that elastically rebalance the cluster while minimizing distributed transactions. The E-Store system performs finer-grained monitoring and migration, using two types of monitoring: during regular execution it tracks aggregate metrics that are cheap to obtain, and when these detect an overload it switches on more expensive per-tuple monitoring for a short interval, which is sufficient to detect workload skew. E-Store, however, does not monitor co-accesses among tuples; it assumes that the database has a tree schema and that transactions mostly access tuples in the same subtree. The Clay system removes this assumption and monitors tuple co-accesses online, showing that fine-grained monitoring can remain lightweight while still capturing enough information to build a graph of co-accesses among tuples. Clay then uses this graph to migrate data while keeping frequently co-accessed tuples together. Finally, the P-Store system uses workload prediction to reconfigure the system proactively, before it becomes overloaded. A proactive approach is preferable to a reactive one because reconfigurations are themselves costly and should complete before the system starts to become overloaded.
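To make the co-access idea concrete, the following sketch (my own simplification, not Clay's actual implementation; the tuple identifiers and transaction traces are made up) builds a weighted co-access graph from per-transaction access lists and then greedily grows a group of frequently co-accessed tuples around a hot seed tuple, which is roughly the shape of the decision Clay makes when choosing what to migrate together.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

def build_coaccess_graph(transactions: Iterable[List[str]]
                         ) -> Dict[Tuple[str, str], int]:
    """Edge weight = number of transactions that accessed both tuples."""
    graph: Dict[Tuple[str, str], int] = defaultdict(int)
    for accessed in transactions:
        for a, b in combinations(sorted(set(accessed)), 2):
            graph[(a, b)] += 1
    return graph

def grow_clump(seed: str, graph: Dict[Tuple[str, str], int],
               max_size: int) -> Set[str]:
    """Greedily add the outside tuple with the heaviest edge into the
    group, so frequently co-accessed tuples end up migrated together."""
    clump = {seed}
    while len(clump) < max_size:
        best, best_w = None, 0
        for (a, b), w in graph.items():
            if (a in clump) != (b in clump):  # edge crossing the boundary
                outside = b if a in clump else a
                if w > best_w:
                    best, best_w = outside, w
        if best is None:
            break
        clump.add(best)
    return clump

if __name__ == "__main__":
    txns = [["t1", "t2"], ["t1", "t2", "t3"], ["t1", "t3"], ["t4", "t5"]]
    graph = build_coaccess_graph(txns)
    print(grow_clump("t1", graph, max_size=3))  # {'t1', 't2', 't3'}
```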
More recently, I have worked on cloud databases and on how they can make effective use of disaggregated storage, including by pushing computation to the storage layer. PushdownDB explores the use of S3 Select to push down computation. FlexPushdownDB combines caching data at the compute layer with computation pushdown to the storage layer.
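The pushdown mechanism itself can be illustrated with S3 Select, which evaluates a SQL selection and projection inside the storage service so that only matching rows cross the network. The snippet below is a minimal boto3 sketch under assumed conditions (a CSV object with a header row, placeholder bucket and key names, valid AWS credentials); it is not code from PushdownDB or FlexPushdownDB.

```python
import boto3

# Placeholder object location; assumes a CSV object with a header row.
BUCKET = "my-bucket"
KEY = "lineitem.csv"

def pushdown_scan(sql: str) -> bytes:
    """Run the selection/projection inside S3 via S3 Select, so only the
    matching rows travel over the network to the compute layer."""
    s3 = boto3.client("s3")
    resp = s3.select_object_content(
        Bucket=BUCKET,
        Key=KEY,
        ExpressionType="SQL",
        Expression=sql,
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )
    chunks = []
    for event in resp["Payload"]:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    return b"".join(chunks)

if __name__ == "__main__":
    # Filter and project at the storage layer instead of shipping the file.
    rows = pushdown_scan(
        "SELECT s.l_orderkey, s.l_quantity FROM s3object s "
        "WHERE CAST(s.l_quantity AS FLOAT) > 30"
    )
    print(rows.decode("utf-8"))
```

FlexPushdownDB's contribution is deciding, for each part of a table, whether to answer from data cached at the compute node or to push the scan down to storage; that policy is not shown here.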
References
2021
- FlexPushdownDB: Hybrid Pushdown and Caching in a Cloud DBMS. Proceedings of the VLDB Endowment, 2021.

2020
- PushdownDB: Accelerating a DBMS Using S3 Computation. In Proceedings of the IEEE 36th International Conference on Data Engineering (ICDE), 2020.

2018
- P-Store: An Elastic Database System with Predictive Provisioning. In Proceedings of the 2018 ACM International Conference on Management of Data (SIGMOD), 2018.

2016
- Clay: Fine-Grained Adaptive Partitioning for General Database Schemas. Proceedings of the VLDB Endowment, 2016.

2014
- Accordion: Elastic Scalability for Database Systems Supporting Distributed Transactions. Proceedings of the VLDB Endowment, 2014.
- E-Store: Fine-Grained Elastic Partitioning for Distributed Transaction Processing Systems. Proceedings of the VLDB Endowment, 2014.