Getting Started with Apache Flink: Essential Patterns and Best Practices
Breakout Session
This session provides a comprehensive introduction to Apache Flink for developers and architects who seek to build streaming solutions that are resilient, efficient, and maintainable. I will move through three critical layers of Flink development:
1. Establish a solid foundation based on well-engineered data products
You will learn best practices for:
Managing formats and schemas for the long term.
Ensuring data integrity and implementing error handling.
Working with streams of immutable records vs. streams with updates.
Handling the nuances of watermarking and late-data strategies.
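To make the watermarking and late-data item concrete, here is a minimal sketch in plain Python. It does not use Flink's API (Flink has its own mechanisms, such as `WatermarkStrategy.forBoundedOutOfOrderness`). The function name, the `max_delay_ms` parameter, and the simplified rule "an event is late if its timestamp is behind the watermark derived from earlier events" are assumptions made for illustration only.

```python
def split_on_time_and_late(event_times_ms, max_delay_ms):
    """Classify event timestamps as on-time or late (illustrative model only).

    The watermark trails the largest timestamp seen so far by max_delay_ms.
    An event is treated as late when its timestamp is below the watermark
    computed from the events that arrived before it.
    """
    on_time, late = [], []
    max_seen_ms = None  # largest event time observed so far

    for ts in event_times_ms:
        # Before any event arrives there is no watermark yet.
        watermark = None if max_seen_ms is None else max_seen_ms - max_delay_ms

        if watermark is not None and ts < watermark:
            late.append(ts)      # e.g. route to a side output or dead-letter sink
        else:
            on_time.append(ts)   # safe to include in normal processing

        if max_seen_ms is None or ts > max_seen_ms:
            max_seen_ms = ts

    return on_time, late


# Events listed in arrival order; timestamps are event times in milliseconds.
arrivals = [1000, 2000, 1500, 5000, 2500, 4800]
on_time, late = split_on_time_and_late(arrivals, max_delay_ms=1000)
```

A larger `max_delay_ms` tolerates more disorder at the cost of higher latency, which is the trade-off a late-data strategy has to settle.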
2. Compose solutions from event streaming patterns
Rather than writing monolithic scripts, I will show you how to decompose complex problems using reusable components based on these design patterns:
Deduplication: removing duplicate events
Correlation: linking related events across streams (e.g., orders and their shipments)
Aggregation: computing real-time analytics
Enrichment: adding context to events from reference data
Pattern matching: detecting sequences or anomalies in event streams
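As an illustration of the first pattern in this list, deduplication can be sketched in plain Python as a small reusable component. This is not Flink code: the `event_id` field and the `deduplicate` function are hypothetical names, and the unbounded `seen` set stands in for what would be keyed state in a streaming job.

```python
def deduplicate(events, key="event_id"):
    """Yield only the first event observed for each key value.

    `events` is any iterable of dicts. The `seen` set plays the role of
    per-key state and grows without bound in this simplified sketch.
    """
    seen = set()
    for event in events:
        event_key = event[key]
        if event_key in seen:
            continue  # duplicate: drop it
        seen.add(event_key)
        yield event


stream = [
    {"event_id": "a", "amount": 10},
    {"event_id": "b", "amount": 20},
    {"event_id": "a", "amount": 10},  # redelivered duplicate
]
unique_events = list(deduplicate(stream))
```

Because the component takes and returns a stream of events, it can be composed with the other patterns instead of being buried inside a monolithic script.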
3. Insist on operational excellence
Finally, I ground the technical theory in operational reality and discuss the fundamentals that help your application scale without breaking the bank or the cluster. You will learn how to:
Manage state mindfully and prevent indefinite state growth.
Navigate hidden costs by understanding the trade-offs and limitations inherent in some common situations.
Guarantee quality by creating solutions that can be tested, maintained, and evolved.
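One common way to prevent indefinite state growth is to expire entries that have not been updated within a time-to-live (TTL). The sketch below models that idea in plain Python. The class `ExpiringKeyedState` and its methods are hypothetical and are not Flink's state API (Flink offers its own state TTL configuration); timestamps are passed in explicitly so the behavior stays deterministic.

```python
class ExpiringKeyedState:
    """Keyed state whose entries expire ttl_ms after their last update."""

    def __init__(self, ttl_ms):
        self.ttl_ms = ttl_ms
        self._entries = {}  # key -> (value, last_update_ms)

    def put(self, key, value, now_ms):
        # Writing a value refreshes the entry's last-update time.
        self._entries[key] = (value, now_ms)

    def get(self, key, now_ms):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, last_update_ms = entry
        if now_ms - last_update_ms >= self.ttl_ms:
            del self._entries[key]  # lazily drop the expired entry on read
            return None
        return value

    def evict_expired(self, now_ms):
        """Remove all expired entries and return how many were removed."""
        expired = [
            key
            for key, (_, last_update_ms) in self._entries.items()
            if now_ms - last_update_ms >= self.ttl_ms
        ]
        for key in expired:
            del self._entries[key]
        return len(expired)

    def __len__(self):
        return len(self._entries)


state = ExpiringKeyedState(ttl_ms=1000)
state.put("order-1", "pending", now_ms=0)
state.put("order-2", "pending", now_ms=500)
removed = state.evict_expired(now_ms=1200)  # only "order-1" has expired
```

Choosing the TTL is itself a trade-off: a short TTL keeps state small but may discard entries that a late event still needs.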
Key takeaway: Whether you are new to Flink or looking to improve your existing streaming platform, you will walk away with a practical checklist and a library of patterns for building data products that are as resilient as they are performant.
David Anderson
Confluent