Publications
2025
- GRewriter: Practical Query Rewriting with Automatic Rule Set Expansion in GaussDB (to appear)Zhe Jiang, Zhaoguo Wang, Haoning Lan, Chuzhe Tang, Haoran Ding, Lefeng Wang, Zhuoran Wei, Yongcun Liu, Xiang Yu, Yang Ren, Guoliang Li, and Haibo ChenPVLDB, Volume 18, Issue 12, Aug 2025
- Sonata: Multi-Database Transactions Made Fast and SerializableChuzhe Tang, Zhaoguo Wang, Jinyang Li, and Haibo ChenPVLDB, Volume 18, Issue 10, Jun 2025
Today, the wide adoption of distributed service-oriented applications has rendered multi-database transactions increasingly important. They protect cross-service workflows that access multiple database systems from concurrency anomalies and failures. This paper presents Sonata, a new multi-database transaction system that provides high performance, global serializability, and seamless integration with existing applications and database systems. Sonata builds on the theory of commitment ordering to ensure global serializability and uses two-phase commit for atomicity and durability. Instead of treating database systems as black box storage, Sonata reuses existing database systems’ concurrency control yet refrains from exposing or modifying their internals. It performs additional non-blocking coordination only at prepare time via applicationlevel shim layers, allowing applications to incorporate Sonata without changing their existing queries or database systems. Evaluation using TPC-C shows that Sonata incurs 7.1% coordination overhead on average and outperforms prior work by up to 1114.3%.
- Many Faces of Ad Hoc TransactionsChuzhe Tang, Zhaoguo Wang, Xiaodong Zhang, Qianmian Yu, Binyu Zang, Haibing Guan, and Haibo ChenCACM, Volume 68, Issue 4, Mar 2025CACM Research Highlight and Cover Article
Transactions are the fundamental database abstraction for ensuring application correctness under concurrency and failures. Using database transactions, developers are only tasked with demarcating critical business logic into transactions, and the underlying database system is responsible for coordinating their execution. Yet, in today’s Web applications, transactions are often constructed in an ad hoc manner. That is, developers might explicitly use locking primitives or manual validation to coordinate critical code fragments, instead of relying on database transactions. We refer to such application-coordinated transactions as ad hoc transactions.
This paper presents the first comprehensive study on ad hoc transactions. Examining 91 ad hoc transaction instances taken from eight popular open-source Web applications, we find that (i) every application studied uses ad hoc transactions (up to 16 per application), 71 of which play critical roles; (ii) compared with database transactions, coordination strategies of ad hoc transactions is much more flexible; (iii) ad hoc transactions are error-prone—53 of them have correctness issues, and 33 of them are confirmed by developers; and (iv) ad hoc transactions have the potential to improve performance in contentious workloads by utilizing application semantics such as access patterns. Finally, this paper concludes with a discussion about the implications of ad hoc transactions and opportunities for future research.
2024
- WeBridge: Synthesizing Stored Procedures for Large-Scale Real-World Web ApplicationsGansen Hu, Zhaoguo Wang, Chuzhe Tang, Jiahuan Shen, Zhiyuan Dong, Sheng Yao, and Haibo ChenPACMMOD, Volume 2, Issue 1, Mar 2024
Modern web applications use databases to store their data. When processing user requests, these applications retrieve and store data in the database server, which incurs network round trips. These round trips significantly increase the application’s latency. Previous approaches have attempted to reduce these round trips by prefetching query results or batching database accesses. However, neither method can efficiently reduce the latency when some queries depend on previous queries’ results. In real-world applications, nearly 50% of the queries depend on the result of other queries.This paper presents WeBridge, the first system capable of synthesizing stored procedures for large-scale real-world web applications. First, WeBridge employs concolic execution technique to analyze the applications and generate stored procedures for hot program paths. Then, it seamlessly integrates the stored procedures into the application by extending the database access library. Finally, it improves the efficiency of the stored procedures with speculative execution. Evaluation using real-world web applications and workloads show that WeBridge achieves up to 79.8% median latency reduction and up to 2× peak throughput.
- Ad Hoc Transactions through the Looking Glass: An Empirical Study of Application-Level Transactions in Web ApplicationsZhaoguo Wang, Chuzhe Tang, Xiaodong Zhang, Qianmian Yu, Binyu Zang, Haibing Guan, and Haibo ChenACM TODS, Volume 49, Issue 1, Feb 2024A “Best of SIGMOD 2022” paper
Many transactions in web applications are constructed ad hoc in the application code. For example, developers might explicitly use locking primitives or validation procedures to coordinate critical code fragments. We refer to database operations coordinated by application code as ad hoc transactions. Until now, little is known about them. This paper presents the first comprehensive study on ad hoc transactions. By studying 91 ad hoc transactions among eight popular open-source web applications, we found that (i) every studied application uses ad hoc transactions (up to 16 per application), 71 of which play critical roles; (ii) compared with database transactions, concurrency control of ad hoc transactions is much more flexible; (iii) ad hoc transactions are error-prone—53 of them have correctness issues, and 33 of them are confirmed by developers; and (iv) ad hoc transactions have the potential for improving performance in contentious workloads by utilizing application semantics such as access patterns. Based on these findings, we discuss the implications of ad hoc transactions to the database research community.
2023
- Empowering System Software with Machine Learning Methods: Challenges, Practice, and Prospects (in Chinese)Chuzhe Tang, Zhaoguo Wang, and Haibo ChenJCRD, Volume 60, Issue 5, May 2023
Machine learning methods have brought new opportunities for building system software that fully utilizes hardware resources to support emerging applications. However, in order to adapt to the demands of various application scenarios, system software design and implementation need continuous improvement and evolution. Meanwhile, machine learning methods have the potential to extract patterns from data and automatically optimize system performance. Despite this potential, applying machine learning methods to empower system software faces several challenges, such as customizing models for system software, obtaining training data with sufficient quality and quantity, reducing the impact of model execution costs on system performance, and avoiding the hindrance of model errors on system correctness. We present the practical experience of the Institute of Parallel and Distributed Systems (IPADS) at Shanghai Jiao Tong University in applying machine learning methods to optimize system software for index structures, key-value storage systems, and concurrency control protocols. The lessons learned from the practice in model design, system integration, and practitioner knowledge are summarized. Additionally, we briefly review relevant research at home and abroad, and propose prospects and suggestions for this line of research, including collaboration between systems and machine learning experts, building modular, reusable system prototypes, and exploring model optimization techniques dedicated to systems context. The aim is to offer references and help for future work.
- Ad Hoc Transactions: What They Are and Why We Should CareChuzhe Tang, Zhaoguo Wang, Xiaodong Zhang, Qianmian Yu, Binyu Zang, Haibing Guan, and Haibo ChenSIGMOD Record, Volume 52, Issue 1, Mar 2023SIGMOD Research Highlight
Many transactions in web applications are constructed ad hoc in the application code. For example, developers might explicitly use locking primitives or validation procedures to coordinate critical code fragments. We refer to database operations coordinated by application code as ad hoc transactions. Until now, little is known about them. This paper presents the first comprehensive study on ad hoc transactions. By studying 91 ad hoc transactions among 8 popular open-source web applications, we find that (i) every studied application uses ad hoc transactions (up to 16 per application), 71 of which play critical roles; (ii) compared with database transactions, concurrency control of ad hoc transactions is much more flexible; (iii) ad hoc transactions are error-prone—53 of them have correctness issues, and 33 of them are confirmed by developers; and (iv) ad hoc transactions have the potential to improve performance in contentious workloads by utilizing application semantics such as access patterns. Finally, implications of ad hoc transactions to the database research community are discussed.
2022
- Ad Hoc Transactions in Web Applications: The Good, the Bad, and the UglyChuzhe Tang, Zhaoguo Wang, Xiaodong Zhang, Qianmian Yu, Binyu Zang, Haibing Guan, and Haibo ChenSIGMOD 2022, Philadelphia, PA, USA, Jun 2022Best Paper Honorable Mention
Many transactions in web applications are constructed ad hoc in the application code. For example, developers might explicitly use locking primitives or validation procedures to coordinate critical code fragments. We refer to database operations coordinated by application code as ad hoc transactions. Until now, little is known about them. This paper presents the first comprehensive study on ad hoc transactions. By studying 91 ad hoc transactions among 8 popular open-source web applications, we find that (i) every studied application uses ad hoc transactions (up to 16 per application), 71 of which play critical roles; (ii) compared with database transactions, concurrency control of ad hoc transactions is much more flexible; (iii) ad hoc transactions are error-prone—53 of them have correctness issues, and 33 of them are confirmed by developers; and (iv) ad hoc transactions have the potential to improve performance in contentious workloads by utilizing application semantics such as access patterns. Based on the findings, we discuss the implications of ad hoc transactions to the database research community.
- WeTune: Automatic Discovery and Verification of Query Rewrite RulesZhaoguo Wang, Zhou Zhou, Yicun Yang, Haoran Ding, Gansen Hu, Ding Ding, Chuzhe Tang, Haibo Chen, and Jinyang LiSIGMOD 2022, Philadelphia, PA, USA, Jun 2022
Query rewriting transforms a relational database query into an equivalent but more efficient one, which is crucial for the performance of database-backed applications. Such rewriting relies on pre-specified rewrite rules. In existing systems, these rewrite rules are discovered through manual insights and accumulate slowly over the years. In this paper, we present WeTune, a rule generator that automatically discovers new rewrite rules. Inspired by compiler superoptimization, WeTune enumerates all valid logical query plans up to a certain size and tries to discover equivalent plans that could potentially lead to more efficient rewrites. The core challenge is to determine which set of conditions (aka constraints) allows one to prove the equivalence between a pair of query plans. We address this challenge by enumerating combinations of "interesting" constraints that relate tables and their attributes between each pair of queries. We also propose a new SMT-based verifier to verify the equivalence of a query pair under different enumerated constraints. To evaluate the usefulness of rewrite rules discovered by WeTune, we apply them on the SQL queries collected from the 20 most popular open-source web applications on GitHub. WeTune successfully optimizes 247 queries that existing databases cannot optimize, resulting in substantial performance improvements.
- The Concurrent Learned Indexes for Multicore Data StorageZhaoguo Wang, Haibo Chen, Youyun Wang, Chuzhe Tang, and Huan WangACM TOS, Volume 18, Issue 1, Feb 2022
We present XIndex, which is a concurrent index library and designed for fast queries. It includes a concurrent ordered index (XIndex-R) and a concurrent hash index (XIndex-H). Similar to a recent proposal of the learned index, the indexes in XIndex use learned models to optimize index efficiency. Compared with the learned index, for the ordered index, XIndex-R is able to handle concurrent writes effectively and adapts its structure according to runtime workload characteristics. For the hash index, XIndex-H is able to avoid the resize operation blocking concurrent writes. Furthermore, the indexes in XIndex can index string keys much more efficiently than the learned index. We demonstrate the advantages of XIndex with YCSB, TPC-C (KV), which is a TPC-C-inspired benchmark for key-value stores, and micro-benchmarks. Compared with ordered indexes of Masstree and Wormhole, XIndex-R achieves up to 3.2× and 4.4× performance improvement on a 24-core machine. Compared with hash indexes of Intel TBB HashMap, XIndex-H achieves up to 3.1× speedup. The performance further improves by 91% after adding the optimizations on indexing string keys. The library is open-sourced.
2020
- SIndex: A Scalable Learned Index for String KeysYouyun Wang, Chuzhe Tang, Zhaoguo Wang, and Haibo ChenAPSys 2020, Tsukuba, Japan, Aug 2020
The learned index structures have reshaped our perspectives on the design of traditional data structures. With machine learning (ML) techniques, they can achieve better lookup performance than existing indexes. However, current learned indexes primarily focus on integer-key workloads and failed to efficiently index variable-length string keys. We introduce SIndex, a concurrent learned index specialized in variable-length string key workloads. To reduce the cost of model inference and data accesses, SIndex groups keys with shared prefixes and use each key’s unique part for model training. We evaluate SIndex with both real-world and synthesized datasets. The result shows that SIndex can achieve up to 91% better performance compared with other state-of-the-art index structures. We have open-sourced our implementation.
- Optimistic Transaction Processing in Deterministic DatabaseZhiyuan Dong, Chuzhe Tang, Jiachen Wang, Zhaoguo Wang, Haibo Chen, and Binyu ZangJCST, Volume 35, Issue 2, Mar 2020
Deterministic databases can improve the performance of distributed workload by eliminating the distributed commit protocol and reducing the contention cost. Unfortunately, the current deterministic scheme does not consider the performance scalability within a single machine. In this paper, we describe a scalable deterministic concurrency control, Deterministic and Optimistic Concurrency Control (DOCC), which is able to scale the performance both within a single node and across multiple nodes. The performance improvement comes from enforcing the determinism lazily and avoiding read-only transaction blocking the execution. The evaluation shows that DOCC achieves 8× performance improvement than the popular deterministic database system, Calvin.
- XIndex: A Scalable Learned Index for Multicore Data StorageChuzhe Tang, Youyun Wang, Zhiyuan Dong, Gansen Hu, Zhaoguo Wang, Minjie Wang, and Haibo ChenPPoPP 2020, San Diego, California, Feb 2020
We present XIndex, a concurrent ordered index designed for fast queries. Similar to a recent proposal of the learned index, XIndex uses learned models to optimize index efficiency. Comparing with the learned index, XIndex is able to effectively handle concurrent writes without affecting the query performance by leveraging fine-grained synchronization and a new compaction scheme, Two-Phase Compaction. Furthermore, XIndex adapts its structure according to run-time workload characteristics to support dynamic workload. We demonstrate the advantages of XIndex with both YCSB and TPC-C (KV), a TPC-C variant for key-value stores. XIndex achieves up to 3.2× and 4.4× performance improvement comparing with Masstree and Wormhole, respectively, on a 24-core machine, and it is open-sourced.