Breakout Session
A common pattern in stream processing applications is to enrich incoming events with contextual information. For example, a fraud detection pipeline might enrich incoming transactions with information about the customer, the merchant, previous transactions, etc., as part of building up a feature vector to hand off to a fraud detection model.
Apache Flink® offers a bewildering array of possibilities for how to do enrichment. With the SQL/Table API, a temporal join, a lookup join, or an AsyncScalarFunction can all be good choices, depending on the details of your use case, while the DataStream API offers KeyedCoProcessFunctions, BroadcastProcessFunctions, AsyncFunctions, and more. Which enrichment method you choose can be affected by where the enrichment data lives, the size of that dataset, and how often it changes, among other things.
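As a taste of one of these techniques, here is a minimal sketch of a temporal join in Flink SQL, which enriches each event with the version of the reference data that was current as of the event's timestamp. The table and column names (`transactions`, `customers`, `txn_time`, `risk_tier`) are hypothetical; the real schemas would need a watermark on the event-time column and a primary key on the versioned table.

```sql
-- Hypothetical schemas: `transactions` is an append-only stream with an
-- event-time attribute `txn_time`; `customers` is a versioned table with
-- primary key `customer_id`. FOR SYSTEM_TIME AS OF picks the customer row
-- that was valid at each transaction's event time.
SELECT
  t.txn_id,
  t.amount,
  c.risk_tier
FROM transactions AS t
JOIN customers FOR SYSTEM_TIME AS OF t.txn_time AS c
  ON t.customer_id = c.customer_id;
```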
In this talk we’ll bring structure to this landscape. You’ll learn the most useful techniques for implementing streaming enrichment with Flink, and see concrete examples of where each technique is appropriate. To keep you on track, we’ll then explore some pitfalls you may encounter, and offer best-practice strategies for dealing with those challenges. Along the way you’ll learn something about Flink SQL, the DataStream API, and tricky corner cases with watermarks.