An In-Network Replica Selection Framework for Latency-Critical Distributed Data Stores

Published in IEEE Transactions on Cloud Computing, Early Access, 2020, ISSN 2168-7161


Recommended Citation: Yi Su, Dan Feng, Yu Hua, Zhan Shi, Tingwei Zhu, An In-Network Replica Selection Framework for Latency-Critical Distributed Data Stores, IEEE Transactions on Cloud Computing, Early Access, 2020, Pages 1-1, ISSN 2168-7161, https://doi.org/10.1109/TCC.2020.2976008.

Abstract: In distributed data stores, performance fluctuates across servers, especially when the servers are deployed in a cloud environment. Hence, the replica selected for a read request directly affects the response latency. However, replica selection is challenging in latency-critical data stores (e.g., key-value stores): such data stores generally handle small data items, and clients have to select replicas independently. Even the state-of-the-art replica selection algorithm leaves considerable room for reducing response latency. In this paper, we first present the fundamental factors that prevent replica selection algorithms from being effective. We then address these factors by proposing NetRS, a framework that enables in-network replica selection for distributed data stores. NetRS exploits emerging network devices, including programmable switches and network accelerators, to select replicas for requests. NetRS supports diverse replica selection algorithms and suits the network topology of modern data centers. In our extensive evaluations, compared with the conventional scheme in which clients select replicas for requests, NetRS reduces the mean latency by up to 50.3% and the 99th-percentile latency by up to 69.7%. Moreover, NetRS can effectively cut the response latency even when unexpected events (e.g., workload changes, network device failures) occur.
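The abstract describes replica selection only at a high level. As a rough illustration of the kind of per-request decision a latency-aware selector makes (whether it runs on a client, as in the conventional scheme, or on a network device, as under NetRS), here is a minimal Python sketch. It assumes the selector keeps, per replica, an exponentially weighted moving average (EWMA) of observed response times and a count of outstanding requests. The class and method names (`ReplicaSelector`, `select`, `on_response`), the scoring rule, and the `alpha` parameter are all hypothetical. This is not the algorithm from the paper or any specific published algorithm.

```python
from dataclasses import dataclass


@dataclass
class ReplicaStats:
    """Per-replica feedback a selector might track (hypothetical fields)."""
    ewma_latency_ms: float = 0.0  # smoothed observed response time
    outstanding: int = 0          # requests sent but not yet answered
    observed: bool = False        # whether any response has been seen yet


class ReplicaSelector:
    """Hypothetical latency-aware replica selector; illustrative only."""

    def __init__(self, replicas, alpha=0.2):
        self.alpha = alpha  # EWMA weight given to the newest sample
        self.stats = {r: ReplicaStats() for r in replicas}

    def score(self, replica):
        s = self.stats[replica]
        # Estimated latency: smoothed response time scaled by queued work.
        # Unobserved replicas score 0, so each one is probed once early on.
        return s.ewma_latency_ms * (1 + s.outstanding)

    def select(self):
        # Pick the lowest estimated latency; ties are broken by replica name.
        choice = min(self.stats, key=lambda r: (self.score(r), r))
        self.stats[choice].outstanding += 1
        return choice

    def on_response(self, replica, latency_ms):
        s = self.stats[replica]
        s.outstanding = max(0, s.outstanding - 1)
        if not s.observed:
            s.ewma_latency_ms = latency_ms
            s.observed = True
        else:
            s.ewma_latency_ms = (self.alpha * latency_ms
                                 + (1 - self.alpha) * s.ewma_latency_ms)


if __name__ == "__main__":
    sel = ReplicaSelector(["r1", "r2", "r3"])
    service_ms = {"r1": 2.0, "r2": 8.0, "r3": 4.0}  # made-up demo latencies
    # Warm-up: each replica is probed once because unobserved replicas score 0.
    for _ in range(3):
        r = sel.select()
        sel.on_response(r, service_ms[r])
    print(sel.select())  # r1: lowest smoothed latency, nothing outstanding
```

Weighting the smoothed latency by the outstanding-request count keeps a selector from sending every request to the replica that merely looked fastest most recently. The abstract attributes part of the difficulty to clients having to select replicas independently: each client sees only its own feedback, which is the limitation NetRS addresses by moving selection into network devices.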