Getting Started with Apache Flink: Essential Patterns and Best Practices

Breakout Session

This session provides a comprehensive introduction to Apache Flink for developers and architects who seek to build streaming solutions that are resilient, efficient, and maintainable. I will move through three critical layers of Flink development:

1. Establish a solid foundation based on well-engineered data products

You will learn best practices for:

Managing formats and schemas for the long term.

Ensuring data integrity and implementing error handling.

Working with streams of immutable records vs. streams with updates.

Handling the nuances of watermarking and late-data strategies.

2. Compose solutions from event streaming patterns

Rather than writing monolithic scripts, I will show you how to decompose complex problems using reusable components based on these design patterns:

Deduplication: removing duplicate events

Correlation: linking related events across streams (e.g., orders and their shipments)

Aggregation: computing real-time analytics

Enrichment: adding context to events from reference data

Pattern matching: detecting sequences or anomalies in event streams

3. Insist on operational excellence

Finally, I will ground the technical theory in operational reality and discuss the fundamentals that help your application scale without breaking the bank or the cluster. You will learn how to:

Manage state mindfully and prevent indefinite state growth.

Navigate hidden costs by understanding the trade-offs and limitations inherent in some common situations.

Guarantee quality by creating solutions that can be tested, maintained, and evolved.

Key takeaway: Whether you are new to Flink or looking to improve your existing streaming platform, you will walk away with a practical checklist and a library of patterns for building data products that are as resilient as they are performant.


David Anderson

Confluent