Structured Logging

As developers, many of us have worked with logging frameworks such as Apache Log4j, Commons Logging, SLF4J, etc. We know that a log is “a stream of messages generated from a running application”. We usually write log statements so that the resulting messages are understandable by humans. That is, the log messages we get are “unstructured”. We have to use grep, awk or scripts with regular expressions to get the intended information out of the log files. Most of the time we add log messages to understand the exceptions and errors raised by the system. But there are more use cases where we can get useful information from the logs. These use cases include user behavior, security auditing, analytics and application monitoring (which lets the user receive alerts about unusual application behavior). There is another scenario: a human cannot go to each server instance and read the logs if there are thousands of server instances in the cluster. We need a centralized log system where a machine can process the logs and provide us with the insights.
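To illustrate the regular-expression approach mentioned above, here is a minimal sketch in Java. The class name `RegexLogParser`, the method `parseLogin` and the exact message wording are assumptions made for illustration; they are not taken from any particular framework.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexLogParser {
    // Hypothetical pattern tied to one exact human-readable wording.
    private static final Pattern LOGIN =
        Pattern.compile("^User (\\S+) is logged in from (\\S+) browser$");

    // Returns {username, browser} when the line matches, otherwise an empty Optional.
    static Optional<String[]> parseLogin(String line) {
        Matcher m = LOGIN.matcher(line);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new String[] { m.group(1), m.group(2) });
    }

    public static void main(String[] args) {
        // Matches the expected wording.
        parseLogin("User xyz is logged in from Firefox browser")
            .ifPresent(f -> System.out.println(f[0] + " / " + f[1]));
        // A small rewording of the message silently stops matching.
        System.out.println(parseLogin("User xyz logged in using Firefox").isPresent());
    }
}
```

The pattern only works as long as nobody changes the wording of the log statement, which is one reason this style of extraction tends to be fragile.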

To get this kind of functionality from logs, we have to feed the log messages to a machine that can understand them and generate insights for the user. For a machine to understand the log messages, we need to follow “structured logging”. To see the difference, consider a typical unstructured log statement first.
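As a minimal sketch of such an unstructured statement, the example below uses the JDK's built-in `java.util.logging` package; the same pattern applies to log4j or slf4j. The class name `UnstructuredLogExample` and the helper `buildMessage` are hypothetical and exist only for illustration.

```java
import java.util.logging.Logger;

public class UnstructuredLogExample {
    private static final Logger LOG =
        Logger.getLogger(UnstructuredLogExample.class.getName());

    // Hypothetical helper: builds the message by plain string concatenation.
    static String buildMessage(String username, String browser) {
        return "User " + username + " is logged in from " + browser + " browser";
    }

    public static void main(String[] args) {
        // Logs: User xyz is logged in from Firefox browser
        LOG.info(buildMessage("xyz", "Firefox"));
    }
}
```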

An unstructured log message of this kind is simply a string concatenated with the required variables. The output will be something like “User xyz is logged in from Firefox browser”. It is totally unstructured, so getting a machine to understand it and provide insights is not trivial. The same unstructured log message can be converted into a structured one like below.

{"event":"User logged in","username":"xyz","browser":"Firefox"}

In the above message, “User logged in” is the event, and the data is expressed as key-value pairs. The machine can separate the events, keys and values and then generate a report.
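One way to produce such a message from code is sketched below, using only the JDK. The class `StructuredLogExample`, the methods `toJson` and `escape`, and the `event` key name are assumptions for illustration; in practice a JSON library or the logging framework's own structured layout would usually do this work.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StructuredLogExample {

    // Escapes backslashes and double quotes so values stay valid inside JSON strings.
    // Control characters are not handled in this sketch.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Renders an event name plus key/value data as a single-line JSON object.
    static String toJson(String event, Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{\"event\":\"")
            .append(escape(event)).append('"');
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append(",\"").append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("username", "xyz");
        fields.put("browser", "Firefox");
        // Prints: {"event":"User logged in","username":"xyz","browser":"Firefox"}
        System.out.println(toJson("User logged in", fields));
    }
}
```

Because every line carries the same keys, a log processor can filter on `event` or group by `browser` without knowing the wording of the message.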