With the release of Logstash 1.5 we have introduced the ability to add metadata to an event. The difference between
regular event data and metadata is that metadata is not serialized by any output. This means any metadata you add is
transient in the Logstash pipeline and will not be included in the output. Using this feature, you can add custom data
to an event, and perform additional filtering or add conditionals based on the metadata while the event flows through the
Logstash pipeline. This simplifies your configuration and removes the need to define temporary fields.

To access the metadata fields you can use the standard field syntax:

[@metadata][foo]

Use Cases

Let us consider some use cases to illustrate the power of metadata. In all of them, we will
be using the rubydebug codec with the stdout output to check our transformations, so make sure you correctly define the
output codec with the metadata option set to true.

Note: The rubydebug codec used in the stdout output is currently
the only way to see what is in
@metadata at output time.

output {
  stdout {
    codec => rubydebug {
      metadata => true
    }
  }
}

Date filter

Since logs arrive in a wide variety of formats, grok is used to extract the timestamp from the message, and the date
filter to parse it and overwrite the
@timestamp field with the timestamp from the log event. Frequently, though, users
forget to remove the source timestamp field after the conversion and overwrite.

Here's a rough example of how the new @metadata field could be used with the date filter and prevent a temporary
timestamp field from making it into Elasticsearch:
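A minimal sketch of that idea, where grok captures the timestamp directly into @metadata so it never reaches the output (the HTTPDATE pattern and field names are illustrative; adapt them to your log format):

filter {
  grok {
    # Capture the timestamp into @metadata instead of a regular field
    match => { "message" => "%{HTTPDATE:[@metadata][timestamp]}" }
  }
  date {
    # Parse the metadata timestamp and overwrite @timestamp
    match => [ "[@metadata][timestamp]", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}

Because the captured timestamp lives under [@metadata], there is nothing left over to clean up before the event is shipped.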

Before Logstash 1.5, you would remove the redundant timestamp field by adding a remove_field directive to the date
filter. Removing a field after the fact is, in theory, slower than never serializing it in the first place. That makes using the
@metadata field a performance booster!
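For comparison, the pre-1.5 approach looks roughly like this (same illustrative pattern as above):

filter {
  grok {
    # Capture the timestamp into a regular, serialized field
    match => { "message" => "%{HTTPDATE:timestamp}" }
  }
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    # Explicit cleanup step that @metadata makes unnecessary
    remove_field => [ "timestamp" ]
  }
}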

The @metadata field acts like a normal field, and you can perform all the usual operations and filtering on it. Use it as a scratchpad whenever you don't need to persist the information.
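For instance, you can set metadata with mutate and branch on it with an ordinary conditional; the env value below is purely illustrative:

filter {
  mutate {
    # Stash a transient value under @metadata
    add_field => { "[@metadata][env]" => "staging" }
  }
  # Conditionals work on metadata just like on any other field
  if [@metadata][env] == "staging" {
    mutate {
      add_tag => [ "staging" ]
    }
  }
}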

Elasticsearch input

Some plugins leverage metadata out of the box, like the elasticsearch input. It allows you to keep the document
information in a predefined
@metadata field. This information is available to the rest of the Logstash pipeline, but will not be persisted in the Elasticsearch documents.
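A sketch of how this can look, reindexing documents while preserving their original index and id (the docinfo option comes from the elasticsearch input plugin's documentation; the hosts and index values are illustrative):

input {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "source-index"
    # Store each document's index, type, and id under @metadata
    docinfo => true
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    # Reuse the document info at output time; it is never written to _source
    index => "%{[@metadata][_index]}"
    document_id => "%{[@metadata][_id]}"
  }
}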

Create your own ID from your event data

Out of the box, Elasticsearch provides an efficient way to create unique IDs for every document that you insert. In most cases, you should let Elasticsearch generate the IDs. However, there are scenarios where you want to generate a unique identifier in Logstash based on the content of the event. Using IDs based on event data lets Elasticsearch perform de-duplication. In our example, we will generate the IDs using the logstash-filter-fingerprint plugin with the default hash method (SHA1).
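A sketch of such a configuration (the key value is a placeholder; depending on the plugin version, the SHA1 method is computed as an HMAC and requires a key):

filter {
  fingerprint {
    # Hash the message content into a transient metadata field
    source => ["message"]
    target => "[@metadata][generated_id]"
    method => "SHA1"
    key => "0123"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    # Identical events hash to the same id, so re-inserts overwrite
    # rather than duplicate
    document_id => "%{[@metadata][generated_id]}"
  }
}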

Like in the previous examples, we use the field-reference syntax to access the generated_id in the @metadata hash.
The Elasticsearch output uses this value as the document id, but the intermediate value
generated_id will not be saved as part of the _source inside Elasticsearch.
If you query for the specific document using the generated ID, you will see that it contains the saved event data but not the generated_id field.

Similarly, you can also reference @metadata with the field-reference syntax anywhere in your configuration, like any other field:

"from server: %{[@metadata][source]}"

Conclusion

As you have seen in the examples above, the addition of metadata provides a simple yet convenient way to store intermediate results. This makes configurations less complex -- you don't have to use remove_field explicitly. You also avoid storing unnecessary fields in Elasticsearch, which helps reduce the size of your indices. Metadata is a powerful addition to your Logstash toolset. Start using it in your configurations today!