Getting Started with Data Prepper

Data Prepper is an independent component that converts data for use with OpenSearch.

If you are migrating from Open Distro Data Prepper, visit the Migrating from Open Distro page.

1. Installing Data Prepper

There are two ways to install Data Prepper:

  1. Run the Docker image.
  2. Build from source.

The easiest way to use Data Prepper is by running the Docker image. We suggest you use this approach if you have Docker available.

You can pull the Docker image:

docker pull opensearchproject/data-prepper:latest

2. Configuring Data Prepper

You must configure Data Prepper with a pipeline before running it.

You will configure two files:

  • data-prepper-config.yaml
  • pipelines.yaml
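
This page does not show the contents of data-prepper-config.yaml. The following is a minimal sketch, assuming you want the server API to accept plain HTTP requests on port 4900. The `ssl: false` setting and the `CONFIG_DIR` variable are assumptions for illustration, so check them against the documentation for your version.

```shell
# Write a minimal data-prepper-config.yaml.
# Assumption: "ssl: false" turns off TLS on the server API so that plain
# http:// requests to port 4900 are accepted.
CONFIG_DIR="${CONFIG_DIR:-$PWD}"   # hypothetical variable; defaults to the current directory
cat > "$CONFIG_DIR/data-prepper-config.yaml" <<'EOF'
ssl: false
EOF
```

For anything beyond local experimentation, you would likely keep TLS enabled rather than use this setting.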

3. Defining a pipeline

Create a Data Prepper pipeline file, pipelines.yaml, with the following configuration:

simple-test-pipeline:
  workers: 2
  delay: "5000"
  source:
    random:
  sink:
    - stdout:

4. Running Data Prepper

Run the following command with your pipeline configuration YAML.

docker run --name data-prepper \
    -v /full/path/to/pipelines.yaml:/usr/share/data-prepper/pipelines/pipelines.yaml \
    opensearchproject/data-prepper:latest

The sample pipeline configuration above defines a simple pipeline in which a source (random) sends data to a sink (stdout). For more examples and details on more advanced pipeline configurations, see Pipelines.

After starting Data Prepper, you should see log output and some UUIDs after a few seconds:

2021-09-30T20:19:44,147 [main] INFO - Data Prepper server running at :4900
2021-09-30T20:19:44,681 [random-source-pool-0] INFO - Writing to buffer
2021-09-30T20:19:45,183 [random-source-pool-0] INFO - Writing to buffer
2021-09-30T20:19:45,687 [random-source-pool-0] INFO - Writing to buffer
2021-09-30T20:19:46,191 [random-source-pool-0] INFO - Writing to buffer
2021-09-30T20:19:46,694 [random-source-pool-0] INFO - Writing to buffer
2021-09-30T20:19:47,200 [random-source-pool-0] INFO - Writing to buffer
2021-09-30T20:19:49,181 [simple-test-pipeline-processor-worker-1-thread-1] INFO - simple-test-pipeline Worker: Processing 6 records from buffer
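
To confirm that records are flowing, you could total the counts from the "Processing N records from buffer" lines. This is a rough sketch that assumes the log format matches the sample output above. The function name `count_processed` is hypothetical.

```shell
# Sum the record counts from "Processing N records from buffer" lines read on stdin.
# Assumes the log format shown in the sample output; prints 0 if nothing matches.
count_processed() {
  awk '/Processing [0-9]+ records from buffer/ {
         for (i = 1; i < NF; i++)
           if ($i == "Processing") total += $(i + 1)
       }
       END { print total + 0 }'
}

# Example usage against the running container (not executed here):
#   docker logs data-prepper 2>&1 | count_processed
```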

The remainder of this page provides examples for running Data Prepper from the Docker image.

However you configure your pipeline, you will run Data Prepper the same way. You run the Docker image and supply both the pipelines.yaml and data-prepper-config.yaml files.

For Data Prepper 2.0 or later, use this command:

docker run --name data-prepper -p 4900:4900 -v ${PWD}/pipelines.yaml:/usr/share/data-prepper/pipelines/pipelines.yaml -v ${PWD}/data-prepper-config.yaml:/usr/share/data-prepper/config/data-prepper-config.yaml opensearchproject/data-prepper:latest

For Data Prepper before version 2.0, use this command:

docker run --name data-prepper -p 4900:4900 -v ${PWD}/pipelines.yaml:/usr/share/data-prepper/pipelines.yaml -v ${PWD}/data-prepper-config.yaml:/usr/share/data-prepper/data-prepper-config.yaml opensearchproject/data-prepper:1.x
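
The two commands differ only in where the files are mounted inside the container and in the image tag. The sketch below makes that difference explicit by printing (not running) the command for a given tag. The helper name `data_prepper_cmd` and its tag-matching rule are assumptions for illustration: any tag starting with `1.` is treated as pre-2.0, and everything else, including `latest`, is treated as 2.0 or later.

```shell
# Print the docker command for a given Data Prepper image tag.
# Hypothetical helper; it only builds the command string and does not run Docker.
data_prepper_cmd() {
  tag="$1"
  case "$tag" in
    1|1.*)
      # Before 2.0: both files are mounted directly in the home directory.
      pipelines=/usr/share/data-prepper/pipelines.yaml
      config=/usr/share/data-prepper/data-prepper-config.yaml
      ;;
    *)
      # 2.0 or later: files go under the pipelines/ and config/ subdirectories.
      pipelines=/usr/share/data-prepper/pipelines/pipelines.yaml
      config=/usr/share/data-prepper/config/data-prepper-config.yaml
      ;;
  esac
  printf 'docker run --name data-prepper -p 4900:4900 -v %s/pipelines.yaml:%s -v %s/data-prepper-config.yaml:%s opensearchproject/data-prepper:%s\n' \
    "$PWD" "$pipelines" "$PWD" "$config" "$tag"
}
```

Calling `data_prepper_cmd latest` prints the 2.0-style command, and `data_prepper_cmd 1.x` prints the pre-2.0 one, so you can review the mounts before running anything.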

Once Data Prepper is running, it processes data until it is shut down. When you are done, shut it down with the following command:

curl -X POST http://localhost:4900/shutdown

Additional configurations

For Data Prepper 2.0 or later, the Log4j 2 configuration file is read from config/ in the application's home directory. By default, it uses the file in the shared-config directory.

For Data Prepper 1.5 or earlier, to pass a custom Log4j 2 properties file, add "-Dlog4j.configurationFile=config/" to the command. If no properties file is provided, Data Prepper defaults to the file in the shared-config directory.