Sizing, Benchmarking and Performance Tuning Apache Flink Clusters

Breakout Session

A common question when adopting Apache Flink is how to size the workload: How many CPUs and how much memory will Flink require for a particular use case? What throughput and latency can you expect from your hardware?

We’ll kick off this talk by discussing why these questions are extremely difficult to answer for a generic stream processing framework like Flink. But we won’t stop there. The best way to answer sizing questions is to benchmark your Flink workload. We will present how we set up a Flink SQL-based benchmarking environment and share some benchmarking results, so that attendees can correlate our results with their own workloads and approximate their resource requirements.

Naturally, benchmarking raises the topic of performance tuning: Are you making optimal use of the allocated resources? How do you identify performance bottlenecks? What are the most common performance issues, and how do you resolve them? In our case, a few configuration changes improved the throughput from 230 MB/s to over 3200 MB/s. How many CPU cores does Flink need for that? Attend the talk to find out: it’s fewer than you would expect.

This talk is for both Flink beginners who want to get an idea of Flink’s performance and operational behavior and advanced users looking for best practices to improve performance and efficiency.

Robert Metzger

Confluent