Better Together: Combining Sketching and Sampling for Effective Stream Processing
Monitoring large data streams and maintaining statistics about them is a challenging task, revolving around the tradeoff triangle between memory frugality, computational complexity, and accuracy. The two common approaches for addressing these problems are sketching and sampling. In this talk, I will present a couple of examples of how an effective combination of the two can yield better results than either of them.
The first example is NitroSketch, a generic framework that boosts the performance of all sketches that employ multiple counter arrays, including, e.g., the famous count-min sketch, count-sketch, and Univmon. NitroSketch systematically addresses the performance bottlenecks of sketches without sacrificing robustness and generality. Its key contribution is the careful synthesis of rigorous, yet practical solutions to reduce the number of per-packet CPU and memory operations. NitroSketch is implemented on three popular software switch platforms (Open vSwitch-DPDK, FD.io-VPP, and BESS). Our performance evaluation shows that accuracy is comparable to unmodified sketches while attaining up to two orders of
magnitude speedup, and up to 45% reduction in CPU usage.
The second example is SQUAD, a novel algorithm for tracking quantiles (e.g., tail latencies) of significant items within a stream, where an item can be the source IP + destination IP addresses in a networking application, a URI or a user ID in a web service, or an object ID in a key-value store. While quantile sketches have been studied in the past, naively applying one instance of such sketches to each item is very memory wasteful. Similarly, applying sampling alone also requires prohibitive amounts of memory. In contrast, SQUAD addresses this problem by combining sampling and sketching in a way that improves the asymptotic space complexity. Intuitively, SQUAD allocates a sketch only to items identified as likely to be significant and uses a background sampling process to capture the behavior of the quantiles of an item before it is allocated with a sketch. This allows SQUAD to use fewer samples and sketches. An empirical evaluation demonstrates SQUAD’s superiority using extensive simulations on real-world traces.
* Based on joint works with Ran Ben-Basat, Vladimir Braverman, Gil Einziger, Yaron Kassner, Zaoxing Liu, Vyas Sekar, and Rana Shahout