Background

Lightning Talk

Streaming Entity Resolution for Kafka with Quine

"These two records are really the same thing!" Whether it's misspelled company names, users who sign up twice with different email addresses, or bank accounts controlled by the same shadowy figure, the data received in a Kafka stream doesn't always resolve to the reality we're trying to represent. This problem has been slowing down streaming adoption across the industry, from the world's largest banks to small teams trying to deploy their first streaming data pipeline. But cleaning that data quickly has recently gotten easier: new open source streaming graph tools can perform powerful entity resolution on streaming data at scale while it's still in motion, even if it arrives out of order.

This lightning talk will highlight two approaches to real-time entity resolution on streaming data using the Quine streaming graph. We'll look at how to view your stream as a graph and why that's the key to both approaches: using event-triggered "standing queries" for real-time entity resolution in graphs, and using the history of a stream to unlock AI-powered entity resolution with graph neural networks. In each case, a Kafka stream with messy data comes in, and a Kafka stream with clean "entity-resolved" data comes out.
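To make the "stream as a graph" idea concrete, here is a minimal sketch in plain Python. It is not Quine's API or its standing query language, and it leaves out Kafka entirely. It assumes a hypothetical record shape (an `id` plus optional `email` and `phone` fields) and a deliberately simple matching rule: two records belong to the same entity if they share any normalized key value. Each record is treated as a node, shared key values act as edges, and a union-find structure tracks the connected components as records arrive.

```python
class StreamingResolver:
    """Toy resolver: records that share any normalized key value are merged.

    Illustrative only. The field names and the matching rule are assumptions,
    not how Quine resolves entities.
    """

    KEY_FIELDS = ("email", "phone")  # hypothetical identifying fields

    def __init__(self):
        self.parent = {}     # union-find parent pointers over record ids
        self.key_owner = {}  # (field, normalized value) -> first record id seen with it

    def _find(self, rid):
        # Follow parent pointers to the root, halving the path as we go.
        while self.parent[rid] != rid:
            self.parent[rid] = self.parent[self.parent[rid]]
            rid = self.parent[rid]
        return rid

    def _union(self, a, b):
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            # Keep the smaller id as the root, so the entity id is always the
            # smallest record id in the group, whatever the arrival order.
            self.parent[max(root_a, root_b)] = min(root_a, root_b)

    def ingest(self, record):
        """Add one record and return the id of the entity it currently resolves to."""
        rid = record["id"]
        self.parent.setdefault(rid, rid)
        for field in self.KEY_FIELDS:
            value = record.get(field)
            if not value:
                continue
            key = (field, value.strip().lower())
            owner = self.key_owner.setdefault(key, rid)
            self._union(rid, owner)
        return self._find(rid)

    def entity_of(self, rid):
        """Return the current entity id for a record that has already been ingested."""
        return self._find(rid)


records = [
    {"id": "u1", "email": "Ann@Example.com"},
    {"id": "u2", "email": "ann@example.com ", "phone": "555-0100"},
    {"id": "u3", "phone": "555-0100"},
    {"id": "u4", "email": "bob@example.com"},
]

resolver = StreamingResolver()
for record in records:
    resolver.ingest(record)

for record in records:
    print(record["id"], "->", resolver.entity_of(record["id"]))
```

Because the entity id is always the smallest record id in a connected group, feeding the same records in a different order produces the same grouping, which is the property that matters when a stream arrives out of order. A production pipeline would read from and write to Kafka topics and use far richer matching than exact key equality.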

Ryan Wright

thatDot
