Distributed database management systems
Elasticity, caching, storage disaggregation
Cloud systems are influencing the architecture of distributed database management systems (DBMSs) in significant ways. One important dimension is elastic provisioning: cloud deployments make it possible to add and remove servers of a distributed DBMS cluster on demand. A second dimension is resource disaggregation, and in particular storage disaggregation: large cloud storage systems are hosted by a dedicated set of servers, separate from the ones performing computation. Both trends are at odds with the shared-nothing model of traditional distributed DBMSs, where a static cluster of servers executes queries on locally-attached storage.
My previous work developed elastic DBMSs that autonomously add servers to or remove servers from the cluster and migrate data to adapt to changes in workload volume and skew. In particular, I developed DBMSs that support On-Line Transaction Processing (OLTP) workloads and ACID transactions. In these systems, it is important that data accessed by the same transaction be stored on the same server as much as possible; when this is not the case, the DBMS must execute distributed transactions, which are very costly. My work extended DBMSs with low-overhead, fine-grained monitoring components and with controllers that use the monitoring information to compute a reconfiguration plan.
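As a rough illustration of this split between monitoring and control, the Python sketch below (hypothetical names and a deliberately simplistic policy; it is not the code of any of the systems discussed here) shows a controller that consumes per-partition load statistics reported by the servers and greedily produces a plan that moves the hottest partitions off overloaded servers.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical load statistic reported by each server's monitoring component.
@dataclass
class PartitionStats:
    server: str
    partition: int
    tx_per_sec: float

def plan_reconfiguration(stats: List[PartitionStats],
                         capacity_tx_per_sec: float) -> Dict[int, str]:
    """Return a partition -> target-server map that moves the hottest
    partitions off overloaded servers onto the least-loaded server.
    This greedy rule is a placeholder for the real controller logic."""
    load: Dict[str, float] = {}
    for s in stats:
        load[s.server] = load.get(s.server, 0.0) + s.tx_per_sec

    moves: Dict[int, str] = {}
    for s in sorted(stats, key=lambda x: x.tx_per_sec, reverse=True):
        if load[s.server] <= capacity_tx_per_sec:
            continue  # this partition's server is not overloaded
        target = min(load, key=load.get)
        if target == s.server:
            continue
        moves[s.partition] = target
        load[s.server] -= s.tx_per_sec
        load[target] += s.tx_per_sec
    return moves

if __name__ == "__main__":
    stats = [
        PartitionStats("server-1", 0, 900.0),
        PartitionStats("server-1", 1, 400.0),
        PartitionStats("server-2", 2, 100.0),
    ]
    print(plan_reconfiguration(stats, capacity_tx_per_sec=1000.0))
```

The actual controllers are considerably more sophisticated: they also account for the cost of migration itself and for which tuples are accessed together, as discussed next.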
I developed several systems to address these issues. The Accordion system divides the database into small chunks of tuples and uses Mixed-Integer Linear Programming to compute migration plans that elastically rebalance the cluster while minimizing distributed transactions. The E-Store system performs finer-grained monitoring and migration, using two types of monitoring: during regular execution it tracks aggregate metrics that are cheap to obtain, and when these detect an overload it switches on more expensive per-tuple monitoring for a short interval, which is sufficient to detect workload skew. E-Store, however, does not monitor co-accesses among tuples; it assumes that the database has a tree schema and that transactions mostly access tuples in the same subtree. The Clay system removes this assumption and monitors tuple co-accesses online, showing that fine-grained monitoring can remain lightweight while still capturing enough information to build a graph of co-accesses among tuples. Clay then uses this graph to migrate data while keeping frequently co-accessed tuples together. Finally, the P-Store system uses workload prediction to reconfigure the system proactively, before it becomes overloaded. A proactive approach is preferable to a reactive one because reconfigurations are themselves costly and should complete before the system starts to become overloaded.
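To make the co-access idea concrete, the following sketch (my own simplification, not Clay's actual implementation; the tuple identifiers and transaction traces are made up) builds a weighted co-access graph from per-transaction access lists and then greedily grows a group of frequently co-accessed tuples around a hot seed tuple, which is roughly the shape of the decision Clay makes when choosing what to migrate together.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

def build_coaccess_graph(transactions: Iterable[List[str]]
                         ) -> Dict[Tuple[str, str], int]:
    """Edge weight = number of transactions that accessed both tuples."""
    graph: Dict[Tuple[str, str], int] = defaultdict(int)
    for accessed in transactions:
        for a, b in combinations(sorted(set(accessed)), 2):
            graph[(a, b)] += 1
    return graph

def grow_clump(seed: str, graph: Dict[Tuple[str, str], int],
               max_size: int) -> Set[str]:
    """Greedily add the outside tuple with the heaviest edge into the
    group, so frequently co-accessed tuples end up migrated together."""
    clump = {seed}
    while len(clump) < max_size:
        best, best_w = None, 0
        for (a, b), w in graph.items():
            if (a in clump) != (b in clump):  # edge crossing the boundary
                outside = b if a in clump else a
                if w > best_w:
                    best, best_w = outside, w
        if best is None:
            break
        clump.add(best)
    return clump

if __name__ == "__main__":
    txns = [["t1", "t2"], ["t1", "t2", "t3"], ["t1", "t3"], ["t4", "t5"]]
    graph = build_coaccess_graph(txns)
    print(grow_clump("t1", graph, max_size=3))  # {'t1', 't2', 't3'}
```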
More recently, I have worked on cloud databases and on how they can make effective use of disaggregated storage, including by pushing computation to the storage layer. PushdownDB explores the use of S3 Select to push down computation. FlexPushdownDB combines caching data at the compute layer with computation pushdown to the storage layer.
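The pushdown mechanism itself can be illustrated with S3 Select, which evaluates a SQL selection and projection inside the storage service so that only matching rows cross the network. The snippet below is a minimal boto3 sketch under assumed conditions (a CSV object with a header row, placeholder bucket and key names, valid AWS credentials); it is not code from PushdownDB or FlexPushdownDB.

```python
import boto3

# Placeholder object location; assumes a CSV object with a header row.
BUCKET = "my-bucket"
KEY = "lineitem.csv"

def pushdown_scan(sql: str) -> bytes:
    """Run the selection/projection inside S3 via S3 Select, so only the
    matching rows travel over the network to the compute layer."""
    s3 = boto3.client("s3")
    resp = s3.select_object_content(
        Bucket=BUCKET,
        Key=KEY,
        ExpressionType="SQL",
        Expression=sql,
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )
    chunks = []
    for event in resp["Payload"]:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    return b"".join(chunks)

if __name__ == "__main__":
    # Filter and project at the storage layer instead of shipping the file.
    rows = pushdown_scan(
        "SELECT s.l_orderkey, s.l_quantity FROM s3object s "
        "WHERE CAST(s.l_quantity AS FLOAT) > 30"
    )
    print(rows.decode("utf-8"))
```

FlexPushdownDB's contribution is deciding, for each part of a table, whether to answer from data cached at the compute node or to push the scan down to storage; that policy is not shown here.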
References
2021
- FlexPushdownDB: Hybrid Pushdown and Caching in a Cloud DBMS. Proceedings of the VLDB Endowment, 2021.

2020
- PushdownDB: Accelerating a DBMS Using S3 Computation. In Proceedings of the IEEE 36th International Conference on Data Engineering (ICDE), 2020.

2018
- P-Store: An Elastic Database System with Predictive Provisioning. In Proceedings of the 2018 ACM International Conference on Management of Data (SIGMOD), 2018.

2016
- Clay: Fine-Grained Adaptive Partitioning for General Database Schemas. Proceedings of the VLDB Endowment, 2016.

2014
- Accordion: Elastic Scalability for Database Systems Supporting Distributed Transactions. Proceedings of the VLDB Endowment, 2014.
- E-Store: Fine-Grained Elastic Partitioning for Distributed Transaction Processing Systems. Proceedings of the VLDB Endowment, 2014.