Background

Breakout Session

A Survey of Fake Data Generation Tools for Data Streaming Applications: Stop Writing Your Own

Dummy data is commonly used for testing applications. For small, functional tests, developers can quickly and easily mock up something close to what they expect. When it comes to data streaming applications, however, generating realistic streams of data that exercise all of an application's functionality means choosing from a crop of data stream generation tools, each with its own nuances, strengths, and limitations. So how does one choose?

In this session, you'll first be introduced to the tools available for generating streams of data, such as Faker, kafka-connect-datagen, and the built-in Kafka producer and consumer perf-test clients. We'll take stock of the different parameters you can tweak for each, like the number of connections, requests, and messages. We will examine how well they synthesize data given a schema, as well as their ability to generate data in specified traffic patterns. Then, we'll dive in and explore the types of use cases where each of these technologies excels, going far beyond simply generating data for a test case. You'll learn which tools are best for performance or load testing, whether they can generate bursts of pseudo-random data to simulate specific traffic needs, and which ones synthesize data that most closely resembles your actual data streams.
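To make the traffic-pattern idea concrete, here is a minimal stdlib-only Python sketch of what "bursty" synthetic traffic looks like. This is not the output of any tool covered in the talk; the `burst_schedule` function and its parameters are purely illustrative.

```python
import random

def burst_schedule(n_bursts, burst_size, quiet_gap_s, intra_gap_s):
    """Build a list of (send_time_seconds, payload) pairs that simulates
    bursty traffic: tight clusters of messages separated by quiet gaps.

    n_bursts:    number of bursts to generate
    burst_size:  messages per burst
    quiet_gap_s: idle time between bursts, in seconds
    intra_gap_s: spacing between messages inside a burst, in seconds
    """
    events = []
    t = 0.0
    for _ in range(n_bursts):
        for _ in range(burst_size):
            # Pseudo-random payload standing in for a real record.
            events.append((t, {"value": random.random()}))
            t += intra_gap_s
        t += quiet_gap_s
    return events
```

A load-test harness would then replay these events against a topic, sleeping until each scheduled send time; the dedicated tools discussed in the session handle this kind of shaping (and much more) for you.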

By the end of the talk, you'll have a solid understanding of the current generation of data stream generation tools and know how best to leverage them for your specific use case.

Afzal Mazhar

Confluent