In Stashing Your First Event, you created a basic Logstash pipeline to test your Logstash setup. In the real world, a Logstash
pipeline is a bit more complex: it typically has one or more input, filter, and output plugins.

In this section, you create a Logstash pipeline that uses Filebeat to take Apache web logs as input, parses those
logs to create specific, named fields from the logs, and writes the parsed data to an Elasticsearch cluster. Rather than
defining the pipeline configuration at the command line, you’ll define the pipeline in a config file.

To get started, download the sample data set used in this example and unpack the file.

Before you create the Logstash pipeline, you’ll configure Filebeat to send log lines to Logstash.
The Filebeat client is a lightweight, resource-friendly tool
that collects logs from files on the server and forwards them to your Logstash instance for processing.
Filebeat is designed for reliability and low latency: it has a light resource footprint on the host machine,
and the Beats input plugin minimizes the resource demands on the Logstash
instance.

In a typical use case, Filebeat runs on a separate machine from the machine running your
Logstash instance. For the purposes of this tutorial, Logstash and Filebeat are running on the
same machine.

The default Logstash installation includes the Beats input plugin. The Beats
input plugin enables Logstash to receive events from the Elastic Beats framework, which means that any Beat written
to work with the Beats framework, such as Packetbeat and Metricbeat, can also send event data to Logstash.

To install Filebeat on your data source machine, download the appropriate package from the Filebeat product page. You can also refer to
Getting Started with Filebeat in the Beats documentation for additional
installation instructions.

After installing Filebeat, you need to configure it. Open the filebeat.yml file located in your Filebeat installation
directory, and replace the contents with the following lines. Make sure paths points to the example Apache log file,
logstash-tutorial.log, that you downloaded earlier:
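The exact keys differ slightly between Filebeat versions; this sketch uses the Filebeat 5.x prospector syntax (newer versions call the section filebeat.inputs and the setting type: log), and the path is a placeholder you should adjust:

filebeat.prospectors:
- input_type: log
  paths:
    - /path/to/file/logstash-tutorial.log
output.logstash:
  hosts: ["localhost:5043"]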

To keep the configuration simple, you won’t specify TLS/SSL settings as you would in a real-world
scenario.

At the data source machine, run Filebeat with the following command:

sudo ./filebeat -e -c filebeat.yml -d "publish"

Filebeat will attempt to connect on port 5043. Until Logstash starts with an active Beats plugin, there
won’t be any answer on that port, so any messages you see regarding failure to connect on that port are normal for now.

Next, you create a Logstash pipeline configuration that uses the Beats input plugin to receive
events from Beats.

The following text represents the skeleton of a pipeline configuration:

# The # character at the beginning of a line indicates a comment. Use
# comments to describe your configuration.
input {
}
# The filter part of this file is commented out to indicate that it is
# optional.
# filter {
#
# }
output {
}

This skeleton is non-functional, because the input and output sections don’t have any valid options defined.

To get started, copy and paste the skeleton pipeline configuration into a file named first-pipeline.conf in your Logstash
home directory.

Next, configure your Logstash instance to use the Beats input plugin by adding the following lines to the input section
of the first-pipeline.conf file:

beats {
    port => "5043"
}

You’ll configure Logstash to write to Elasticsearch later. For now, you can add the following line
to the output section so that the output is printed to stdout when you run Logstash:

stdout { codec => rubydebug }

When you’re done, the contents of first-pipeline.conf should look something like this (the skeleton plus the snippets you just added):
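input {
    beats {
        port => "5043"
    }
}
# The filter part of this file is commented out to indicate that it is
# optional.
# filter {
#
# }
output {
    stdout { codec => rubydebug }
}

If Logstash isn’t running yet, you can check the configuration for syntax errors and then start it with automatic config reloading enabled, so that later edits to the file are picked up without a restart. Assuming you run these commands from the Logstash home directory:

bin/logstash -f first-pipeline.conf --config.test_and_exit
bin/logstash -f first-pipeline.conf --config.reload.automatic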

Now you have a working pipeline that reads log lines from Filebeat. However, you’ll notice that the format of the log messages
is not ideal. You want to parse the log messages to create specific, named fields from the logs.
To do this, you’ll use the grok filter plugin.

The grok filter plugin is one of several plugins that are available by default in
Logstash. For details on how to manage Logstash plugins, see the reference documentation for
the plugin manager.

The grok filter plugin enables you to parse the unstructured log data into something structured and queryable.

Because the grok filter plugin looks for patterns in the incoming log data, configuring the plugin requires you to
make decisions about how to identify the patterns that are of interest to your use case. A representative line from the
web server log sample, in the Apache combined log format, looks like the following (illustrative; the exact values in your copy may differ):
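83.149.9.216 - - [04/Jan/2015:05:13:42 +0000] "GET /presentations/logstash-monitorama-2013/images/kibana-search.png HTTP/1.1" 200 203023 "http://semicomplete.com/presentations/logstash-monitorama-2013/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36"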

The IP address at the beginning of the line is easy to identify, as is the timestamp in brackets. To parse the data, you can use the %{COMBINEDAPACHELOG} grok pattern, which structures lines from the Apache log using the following schema:

Information          Field Name
IP Address           clientip
User ID              ident
User Authentication  auth
Timestamp            timestamp
HTTP Verb            verb
Request body         request
HTTP Version         httpversion
HTTP Status Code     response
Bytes served         bytes
Referrer URL         referrer
User agent           agent

Edit the first-pipeline.conf file and replace the entire filter section with the following text:

filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
}

When you’re done, the contents of first-pipeline.conf should look something like this (the same input and output as before, with the new filter section in between):
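input {
    beats {
        port => "5043"
    }
}
filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
}
output {
    stdout { codec => rubydebug }
}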

Save your changes. Because you’ve enabled automatic config reloading, you don’t have to restart Logstash to
pick up your changes. However, you do need to force Filebeat to read the log file from scratch. To do this,
go to the terminal window where Filebeat is running and press Ctrl+C to shut down Filebeat. Then delete the
Filebeat registry file. For example, run:

sudo rm data/registry

Since Filebeat stores the state of each file it harvests in the registry, deleting the registry file forces
Filebeat to read all the files it’s harvesting from scratch.

Next, restart Filebeat with the following command:

sudo ./filebeat -e -c filebeat.yml -d "publish"

After the grok filter processes the log lines, each event has a JSON representation along these lines (abridged here to the grok-generated fields; Filebeat and Logstash add further metadata fields such as source, offset, @timestamp, and @version):
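{
    "clientip" : "83.149.9.216",
    "ident" : "-",
    "auth" : "-",
    "timestamp" : "04/Jan/2015:05:13:42 +0000",
    "verb" : "GET",
    "request" : "/presentations/logstash-monitorama-2013/images/kibana-search.png",
    "httpversion" : "1.1",
    "response" : "200",
    "bytes" : "203023",
    "referrer" : "\"http://semicomplete.com/presentations/logstash-monitorama-2013/\"",
    "agent" : "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.77 Safari/537.36\""
}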

In addition to parsing log data for better searches, filter plugins can derive supplementary information from existing
data. As an example, the geoip plugin looks up IP addresses, derives geographic
location information from the addresses, and adds that location information to the logs.

Configure your Logstash instance to use the geoip filter plugin by adding the following lines to the filter section
of the first-pipeline.conf file:

geoip {
    source => "clientip"
}

The geoip plugin configuration requires you to specify the name of the source field that contains the IP address to look up. In this example, the clientip field contains the IP address.

Since filters are evaluated in sequence, make sure that the geoip section is after the grok section of
the configuration file and that both the grok and geoip sections are nested within the filter section.

When you’re done, the contents of first-pipeline.conf should look something like this, with the geoip filter after the grok filter:
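input {
    beats {
        port => "5043"
    }
}
filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    geoip {
        source => "clientip"
    }
}
output {
    stdout { codec => rubydebug }
}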

Save your changes. To force Filebeat to read the log file from scratch, as you did earlier, shut down Filebeat (press Ctrl+C),
delete the registry file, and then restart Filebeat with the following command:
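sudo ./filebeat -e -c filebeat.yml -d "publish"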

Now that the web logs are broken down into specific fields, the Logstash pipeline can index the data into an
Elasticsearch cluster. Edit the first-pipeline.conf file and replace the entire output section with the following
text:

output {
    elasticsearch {
        hosts => [ "localhost:9200" ]
    }
}

With this configuration, Logstash uses the HTTP protocol to connect to Elasticsearch. The example above assumes that
Logstash and Elasticsearch are running on the same host. You can specify a remote Elasticsearch instance by using
the hosts setting, for example hosts => [ "es-machine:9092" ].

At this point, your first-pipeline.conf file has input, filter, and output sections properly configured, and looks
something like this:
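input {
    beats {
        port => "5043"
    }
}
filter {
    grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    geoip {
        source => "clientip"
    }
}
output {
    elasticsearch {
        hosts => [ "localhost:9200" ]
    }
}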

Save your changes. To force Filebeat to read the log file from scratch, as you did earlier, shut down Filebeat (press Ctrl+C),
delete the registry file, and then restart Filebeat with the following command:
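sudo ./filebeat -e -c filebeat.yml -d "publish"

Once Filebeat has re-sent the events, you can test the pipeline by querying Elasticsearch. By default, the elasticsearch output writes to an index named logstash- followed by the current date, so a query along these lines should return events with a 200 response code (replace $DATE with the current date, in YYYY.MM.DD format):

curl -XGET 'localhost:9200/logstash-$DATE/_search?pretty&q=response=200'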

The date used in the index name is based on UTC, not the timezone where Logstash is running.
If the query returns index_not_found_exception, make sure that logstash-$DATE reflects the actual
name of the index. To see a list of available indexes, use this query: curl 'localhost:9200/_cat/indices?v'.

You’ve successfully created a pipeline that uses Filebeat to take Apache web logs as input, parses those logs to
create specific, named fields from the logs, and writes the parsed data to an Elasticsearch cluster. Next, you
learn how to create a pipeline that uses multiple input and output plugins.