Background

Breakout Session

Effective Data Lineage Strategies for Real-Time Systems

Data in motion is data on a journey, and understanding that journey is a crucial part of streaming governance. Data lineage enables organizations to unravel the intricacies of real-time data flows, enhance transparency, and build trust across an event-driven landscape. Data lands on a topic, is consumed and transformed, and is then produced to another topic; the cycle then repeats. In this way, data travels through any number of topics and intermediary systems in real time before reaching its final destination. This poses a significant hurdle: without detailed message-level tracking, the chain of custody and the transformations applied to that data can be lost.

This session will examine various methodologies for incorporating data lineage into event-driven systems. It will detail the design of an opinionated, streaming-centric solution that is applicable across diverse architectures. This solution provides the system with message-level tracking, reconciliation, and replay capability without introducing third-party tools. Next, the session will consider how to integrate data lineage capabilities into distributed tracing, which many organizations have already adopted. Finally, it will analyze off-the-shelf solutions and assess their effectiveness in satisfying the requirements of event-driven architectures.

Attendees will acquire a deep understanding of the challenges associated with data lineage, gaining clarity around its complex and fractured landscape. Architects will be equipped with streaming-centric strategies for building data lineage, complete with powerful message-level capabilities. These strategies empower them to navigate the sea of third-party tools, build a bespoke solution, or build on top of distributed tracing.

Mark Soule

Improving