This quickstart guide provides a hands-on look at how you can move data into and out of Kafka without writing
a single line of code. It is helpful to review the concepts for Kafka Connect in tandem
with running the steps in this guide to gain a deeper understanding. By the end of this quickstart you will have
used Kafka Connect to stream data from a file into a Kafka topic and from that topic back out to another file.

To demonstrate the basic functionality of Kafka Connect and its integration with the Confluent Schema Registry, we’ll
run a few local standalone Kafka Connect processes with connectors that let you insert data written to a file
into Kafka and write data from a Kafka topic to the console. For those who choose to use JSON as the Connect data format,
please see the instructions here for a tutorial
that does not include the Schema Registry.

We need two configuration files to start a FileStreamSourceConnector that reads data from a file and outputs it to Kafka.
This tutorial runs the Connect worker in standalone mode, so it will act as a lone agent. Details on standalone versus
distributed mode for Kafka Connect workers can be found in the concepts section.
Because we will run in standalone mode, we need a worker configuration file and a connector configuration file. First, the
worker configuration file, which is located at ./etc/schema-registry/connect-avro-standalone.properties. Below is an explanation
of the contents:

# Sample configuration for a standalone Kafka Connect worker that uses Avro serialization and
# integrates with the Schema Registry. This sample configuration assumes a local installation of
# Confluent Platform with all services running on their default ports.

# Bootstrap Kafka servers. If multiple servers are specified, they should be comma-separated.
bootstrap.servers=localhost:9092

# The converters specify the format of data in Kafka and how to translate it into Connect data.
# Every Connect user will need to configure these based on the format they want their data in
# when loaded from or stored into Kafka.
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081

# The internal converter used for offsets and config data is configurable and must be specified,
# but most users will always want to use the built-in default. Offset and config data is never
# visible outside of Connect in this format.
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false

# Local storage file for offset data
offset.storage.file.filename=/tmp/connect.offsets

If you choose to follow this tutorial without the Schema Registry, you will need to set key.converter and
value.converter to org.apache.kafka.connect.json.JsonConverter and remove the Schema Registry references.
Note that the values for internal.key.converter and internal.value.converter should be left alone in most cases.
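For reference, here is a sketch of the relevant worker settings when using JSON instead of Avro (the schemas.enable flags are optional and control whether a schema is embedded in every JSON message):

key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false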

The second file is the connector configuration file specific to the FileStreamSourceConnector. This file is located
at ./etc/schema-registry/connect-file-source.properties and the contents with an explanation are below:

# User defined connector instance name
name=local-file-source

# The class implementing the connector
connector.class=FileStreamSource

# Maximum number of tasks to run for this connector instance
tasks.max=1

# The input file (path relative to worker's working directory)
# This is the only setting specific to the FileStreamSource
file=test.txt

# The output topic in Kafka
topic=connect-test

Now let’s seed the file with some sample data. Note that the connector configuration specifies a relative path for the
file, so you should create the file in the same directory that you will run the Kafka Connect worker from.

$ echo -e "log line 1\nlog line 2" > test.txt

Next, start a Kafka Connect instance in standalone mode running this connector. For standalone mode, you can specify
the connector configuration files directly on the command line when starting the worker.
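With the default Confluent Platform layout, the invocation would look something like the following (adjust the connect-standalone path and the properties file paths for your installation):

$ ./bin/connect-standalone ./etc/schema-registry/connect-avro-standalone.properties \
      ./etc/schema-registry/connect-file-source.properties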

Each of the two lines in our log file should be delivered to Kafka, and a schema for the data should have been registered
with the Schema Registry. One way to validate that the data is there is to use the console consumer in another terminal
to inspect the contents of the topic.
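With a default local setup, the command would look something like this (newer versions of the tool accept --bootstrap-server, while older releases use --zookeeper localhost:2181 instead):

$ ./bin/kafka-avro-console-consumer --bootstrap-server localhost:9092 \
      --topic connect-test --from-beginning

You should see the two lines from the input file printed as Avro-decoded strings, for example "log line 1" and "log line 2".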

Note that we use the kafka-avro-console-consumer because the data has been stored in Kafka in Avro format. This
consumer uses the Avro converter that is bundled with the Schema Registry in order to properly look up the schema for the
Avro data.

After you are done experimenting with reading from a file with Connect, hit Ctrl-C to gracefully stop the Connect worker.

Now that we have written some data to a Kafka topic with Connect, let’s consume that data with a downstream process.
In this section, we will have a sink connector running in the worker in addition to the source that we ran in the last
section. The sink will write messages to a local file. Below is the connector
configuration, with an explanation, for the FileStreamSink connector that writes data to a file. This file is located at
./etc/kafka/connect-file-sink.properties.

# User defined name for the connector instance
name=local-file-sink

# Name of the connector class to be run
connector.class=FileStreamSink

# Max number of tasks to spawn for this connector instance
tasks.max=1

# Output file name relative to worker's current working directory
# This is the only property specific to the FileStreamSink connector
file=test.sink.txt

# Comma-separated input topic list
topics=connect-test

Note that the configuration contains similar settings to the file source. A key difference is that multiple input
topics are specified with topics whereas the file source allows for only one output topic specified with topic.
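For instance, a sink could subscribe to several topics at once by listing them in topics (the extra topic names below are purely illustrative):

# Comma-separated input topic list
topics=connect-test,orders,clicks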

Now start the Kafka Connect standalone process, but this time specify both connector configurations. They will run in
the same process, but each will have its own dedicated thread.
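Again assuming the default Confluent Platform layout, the command would look something like this, with the sink connector's properties file appended:

$ ./bin/connect-standalone ./etc/schema-registry/connect-avro-standalone.properties \
      ./etc/schema-registry/connect-file-source.properties \
      ./etc/kafka/connect-file-sink.properties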

Once the process is up and running, you should see the two log lines written to the file test.sink.txt as the sink connector
consumes them. Note that the messages were not written again by the source connector because it was able to resume from
the same point in the file where it left off when we shut down the previous process.
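For example, inspecting the sink file should show the two log lines delivered end to end:

$ cat test.sink.txt
log line 1
log line 2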

With both connectors running, we can see data flowing end-to-end in real time. Use another terminal to tail the output file
and one more to add more lines to the text file:

$ tail -f test.sink.txt

$ echo -e "log line 3\nlog line 4" >> test.txt

You should see the additional lines output to test.sink.txt. The new data was picked up by the
source connector, written to Kafka, read by the sink connector from Kafka, and finally output to the file.

Both source and sink connectors track offsets, so you can stop and restart the process any number of times, add more
data to the input file, and both will resume where they previously left off.

The connectors demonstrated in this quickstart are intentionally simple, so no additional dependencies are necessary.
Most connectors will require a bit more configuration to specify how to connect to the source or sink system and what
data to copy, and for many you will want to run them on a Kafka Connect cluster for scalability and fault tolerance.
To get started, you'll want to see the user guide for more details on running and managing
Kafka Connect, including how to run in distributed mode. The Connectors section includes
details on configuring and deploying the connectors that ship with Confluent Platform.

Tip

The easiest way to create, configure, and manage connectors is with Confluent Control Center. To
learn more about Control Center, see Introduction.