Background

Breakout Session

Atlassian's Lithium Platform: Dynamic, Self-Hosted, and Distributed Ephemeral Streaming Pipelines

Atlassian has numerous use cases that require moving large amounts of data around, validating and transforming it in flight. Platforms such as Apache Flink can be excellent choices for moving and transforming data at scale, effectively streaming ETL. However, our use cases present some challenges: stream processing pipelines must be provisioned entirely at runtime, including dedicated Kafka topics, parallelism, and the selection of stream processors with appropriate and available compute resources; stream processors must be hostable directly in product and platform services, enabling in-process access to service context so we can meet throughput and concurrency goals; and pipelines require coordination among sources, transforms, and sinks that live in different product and infrastructure services. This led us to build the Lithium Platform to meet these needs. It is 100% event driven and built on Kafka and Kafka Streams.
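As a rough illustration of what runtime provisioning can look like, the sketch below (an assumption for this abstract, not Lithium's actual code) creates a dedicated Kafka topic for a dynamically provisioned pipeline using the standard Kafka AdminClient. The topic naming scheme, partition count, replication factor, and retention setting are all hypothetical.

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.List;
import java.util.Map;
import java.util.Properties;

public class PipelineTopicProvisioner {

    // Creates a dedicated topic for a pipeline provisioned at runtime.
    // Naming, partitions, replication, and retention are illustrative assumptions.
    public static void provisionPipelineTopic(String bootstrapServers, String pipelineId) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);

        try (Admin admin = Admin.create(props)) {
            NewTopic topic = new NewTopic("lithium.pipeline." + pipelineId, 12, (short) 3)
                    .configs(Map.of("retention.ms", "86400000")); // one day, hypothetical
            admin.createTopics(List.of(topic)).all().get(); // block until the topic exists
        }
    }
}
```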

In this session we'll cover:

- control plane and data plane communication

- dynamic Kafka topic, compute resource, and stream processor provisioning and deprovisioning  

- processor versioning  

- streaming data isolation  

- data pipeline pause, resume, rewind, and in-flight data remediation via sidelining  

- declarative workplan specifications  

- reliable state management built on Kafka Streams aggregators (see the sketch after this list)

- auction model for selecting workplan compute resources  

- custom source, transform, validation, sink, and workplan state processors  

- multi-cluster and cross-region cloud architecture using message relay
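To give a feel for the state management topic above, the following sketch (an assumption, not Lithium's implementation) uses a Kafka Streams aggregator to fold per-workplan status events into a named, queryable state store and republish the latest state. The topic names, event payloads, and store name are hypothetical.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.state.KeyValueStore;

import java.util.Properties;

public class WorkplanStateAggregator {

    public static KafkaStreams build(String bootstrapServers) {
        StreamsBuilder builder = new StreamsBuilder();

        // Hypothetical topic of workplan status events, keyed by workplan id,
        // with simple string payloads such as "PROVISIONED", "RUNNING", "PAUSED".
        KTable<String, String> workplanState = builder
                .stream("lithium.workplan.events", Consumed.with(Serdes.String(), Serdes.String()))
                .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
                // The aggregator folds each event into the latest-known state; a real
                // implementation would merge structured events rather than strings.
                .aggregate(
                        () -> "CREATED",
                        (workplanId, event, currentState) -> event,
                        Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("workplan-state-store")
                                .withKeySerde(Serdes.String())
                                .withValueSerde(Serdes.String()));

        // Publish the latest state so other services can react (topic name is hypothetical).
        workplanState.toStream()
                .to("lithium.workplan.state", Produced.with(Serdes.String(), Serdes.String()));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "workplan-state-aggregator");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);

        return new KafkaStreams(builder.build(), props);
    }
}
```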

Attendees will learn how the Lithium Platform was built, why its unique model is so important to the Atlassian ecosystem, and how Kafka and Kafka Streams were used to bring this critical capability to life.

Robert Englander

Atlassian