The Kafka Protocol Deconstructed: A Live-Coded Deep Dive

Breakout Session

Kafka powers the real-time data infrastructure of countless organizations, but how many of us really understand the magic behind its speed and reliability? What makes a Kafka broker capable of handling millions of events per second while ensuring durability, ordering, and scalability? And why do features like idempotent producers, log compaction, and consumer group rebalancing work the way they do?

In this deep-dive live-coding session, we’ll dissect Kafka down to its essence and rebuild a minimal but fully functional broker from scratch. Starting with a raw TCP socket, we’ll implement:

- Kafka’s Binary Wire Protocol: decode Fetch and Produce requests, frame by frame (see the framing sketch after this list)

- Log-Structured Storage: the secret behind Kafka’s append-only performance (also sketched after this list)

- Batching & Compression: How Kafka turns thousands of messages into one efficient disk write

- Consumer Coordination: group rebalances, offset tracking, and the challenges of "who reads what?"

- Replication & Fault Tolerance: why ISR (In-Sync Replicas) is needed for high availability

- Idempotence & Exactly-Once Semantics: the hidden complexity behind "no duplicates"

Along the way, we’ll expose Kafka’s design superpowers and the tradeoffs behind them, contrasting our minimal implementation with the layers the real Kafka adds on top (KRaft, SASL, quotas, etc.).

By the end, you won’t just use Kafka; you’ll understand it. Whether you’re debugging a production issue, tuning performance, or just curious about distributed systems, this session will change how you see Kafka.

Key Takeaways:

- How Kafka’s binary wire protocol frames and encodes requests

- The role of log-structured storage in real-time systems

- Why replication and consumer coordination are harder than they look

- Where the real Kafka adds complexity

No prior knowledge of Kafka internals is needed; just bring a love for distributed systems and live coding.


Mateo Rojas

LittleHorse