Background

Breakout Session

Topic Federation: Enhance Kafka Availability with Sharded Topics Across Clusters

Powering various business verticals at Uber (e.g., ride sharing, eats, and delivery), the Kafka messaging system is one of the most critical components of the real-time data infrastructure, and its availability is of utmost importance. Intuitive approaches, such as scaling a cluster vertically or horizontally or adding more replicas for topics, can enhance Kafka availability to a certain degree. However, they fail to address the fundamental issue of cluster-level degradation.

In this talk, we will shed some light on our topic federation solution for improving Kafka availability. Topic federation essentially provides an abstraction layer through which a single logical topic can be sharded into multiple physical topics across different Kafka clusters. It also includes a novel control plane that disseminates metadata changes to clients in near real time, which allows us to fail traffic over dynamically and, most importantly, without disrupting users or services when a cluster degrades.
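To make the idea concrete, here is a minimal sketch of how a logical topic might map to physical shards and how a client might route around a degraded cluster. It is an illustration only, not Uber's implementation: the names `Shard`, `FederatedTopic`, `apply_metadata_update`, and `route`, the cluster and topic names, and the round-robin policy are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass(frozen=True)
class Shard:
    cluster: str          # hypothetical cluster name
    physical_topic: str   # physical topic hosted on that cluster


@dataclass
class FederatedTopic:
    logical_name: str
    shards: list
    degraded: set = field(default_factory=set)   # clusters marked unhealthy
    _counter: count = field(default_factory=count, repr=False)

    def apply_metadata_update(self, cluster, healthy):
        """Stand-in for a control-plane push that changes a cluster's status."""
        if healthy:
            self.degraded.discard(cluster)
        else:
            self.degraded.add(cluster)

    def route(self):
        """Return the next healthy shard in round-robin order (assumed policy)."""
        healthy = [s for s in self.shards if s.cluster not in self.degraded]
        if not healthy:
            raise RuntimeError(f"no healthy shard for {self.logical_name}")
        return healthy[next(self._counter) % len(healthy)]


# One logical topic backed by two physical topics on two clusters.
topic = FederatedTopic(
    "trips",
    [Shard("cluster-a", "trips-shard-0"), Shard("cluster-b", "trips-shard-1")],
)
topic.apply_metadata_update("cluster-a", healthy=False)  # simulated degradation
print(topic.route())  # traffic now goes only to the shard on cluster-b
```

In this sketch the producer only ever names the logical topic; which physical topic receives the message is decided at send time from the latest metadata, which is what would let a fail-over happen without any change on the caller's side.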

Specifically, we will cover the following topics:

1. Brief overview of the Kafka ecosystem at Uber and the architecture of topic federation

2. The use cases and business needs for building topic federation    

3. Deep dive into the components that make federation possible, in particular:

3.1 the new control plane, which is capable of disseminating metadata changes to clients in near real time, and

3.2 the intelligent in-house clients that make routing decisions judiciously.

4. Experiences from the safe rollout, including the challenges we faced and the practical issues we ran into

Qiushi Han

Uber

Xinli Shang

Uber