Tests and coverage

Documentation

Command-line interface

The current --help looks like this:

usage: haproxy_log_analysis [-h] [-l LOG] [-s START] [-d DELTA] [-c COMMAND]
[-f FILTER] [-n] [--list-commands]
[--list-filters]
Analyze HAProxy log files and outputs statistics about it
optional arguments:
-h, --help show this help message and exit
-l LOG, --log LOG HAProxy log file to analyze
-s START, --start START
Process log entries starting at this time, in HAProxy
date format (e.g. 11/Dec/2013 or
11/Dec/2013:19:31:41). At least provide the
day/month/year. Values not specified will use their
base value (e.g. 00 for hour). Use in conjunction with
-d to limit the number of entries to process.
-d DELTA, --delta DELTA
Limit the number of entries to process. Express the
time delta as a number and a time unit, e.g.: 1s, 10m,
3h or 4d (for 1 second, 10 minutes, 3 hours or 4
days). Use in conjunction with -s to only analyze
certain time delta. If no start time is given, the
time on the first line will be used instead.
-c COMMAND, --command COMMAND
List of commands, comma separated, to run on the log
file. See --list-commands to get a full list of them.
-f FILTER, --filter FILTER
List of filters to apply on the log file. Passed as
comma separated and parameters within square brackets,
e.g ip[192.168.1.1],ssl,path[/some/path]. See --list-
filters to get a full list of them.
-n, --negate-filter Make filters passed with -f work the other way around,
i.e. if the ``ssl`` filter is passed instead of showing
only ssl requests it will show non-ssl traffic. If the
``ip`` filter is used, then all but that ip passed to
the filter will be used.
--list-commands Lists all commands available.
--list-filters Lists all filters available.
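
The -s and -d values described in the help text can be illustrated with a small standalone sketch. It is not the tool's own parser: the helper names ``parse_delta`` and ``parse_start`` are hypothetical, and it only handles the two start formats shown in the help text (date only, or date with full time). Month names are parsed with ``%b``, so an English or C locale is assumed.

```python
import re
from datetime import datetime, timedelta

# Map each unit letter from the help text to a timedelta keyword.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_delta(value):
    """Turn a string such as '10m' into a timedelta (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)([smhd])", value)
    if match is None:
        raise ValueError("invalid delta: %r" % value)
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def parse_start(value):
    """Parse an HAProxy-style date, with or without the time part."""
    for fmt in ("%d/%b/%Y:%H:%M:%S", "%d/%b/%Y"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError("invalid start time: %r" % value)


# A window like "-s 11/Dec/2013 -d 3h": missing time parts default to 00.
start = parse_start("11/Dec/2013")
end = start + parse_delta("3h")
print(start, end)
```

Only log lines whose timestamp falls between ``start`` and ``end`` would be considered in such a window.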

Commands

Commands are small, purpose-specific programs that each report a particular statistic about the log file being analyzed.
See the --help output (or the section above) to learn how to run them.

counter

Reports how many log lines could be parsed.

counter_invalid

Reports how many log lines could not be parsed.

http_methods

Reports a breakdown of how many requests have been made per HTTP method
(GET, POST…).

ip_counter

Reports a breakdown of how many requests have been made per IP.
Note that for this to work you need to configure HAProxy to capture the header that has the IP on it
(usually the X-Forwarded-For header).
Something like:
capture request header X-Forwarded-For len 20

top_ips

Reports the 10 IPs with most requests (and the amount of requests).
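
The relationship between ip_counter and top_ips can be sketched with ``collections.Counter``. This is an illustration of the counting idea only, not the tool's code, and the IP list is made-up sample data.

```python
from collections import Counter

# Hypothetical captured IPs, one per parsed log line.
ips = ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.1", "10.0.0.2"]

# Full breakdown per IP, in the spirit of ip_counter.
ip_counts = Counter(ips)

# At most 10 (ip, count) pairs ordered by count, in the spirit of top_ips.
top_ten = ip_counts.most_common(10)

for ip, count in top_ten:
    print(ip, count)
```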

status_codes_counter

Reports a breakdown of how many requests per HTTP status code
(404, 500, 200, 301…) are in the log file.

request_path_counter

Reports a breakdown of how many requests per path (/rss, /, /another/path).

top_request_paths

Reports the 10 paths with most requests.

slow_requests

Reports a list of requests that downstream servers took more than 1 second to respond to.

counter_slow_requests

Reports the number of requests that downstream servers took more than 1 second to respond to.

average_response_time

Reports the average time (in milliseconds) servers spend to answer requests.
.. note:: Aborted requests are not considered.
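
A minimal sketch of such an average follows. It assumes that aborted requests carry a negative response time (HAProxy logs unset timers as -1); the function name and sample values are hypothetical and not taken from the tool.

```python
def average_response_time(times_ms):
    """Average the non-negative values; return 0 when none are left."""
    # Assumption: a negative value marks an aborted request.
    valid = [t for t in times_ms if t >= 0]
    if not valid:
        return 0  # avoid dividing by zero on an empty selection
    return sum(valid) / len(valid)


print(average_response_time([120, -1, 80, 100]))
```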

average_waiting_time

Reports the average time (in milliseconds) requests spend waiting on the various HAProxy queues.

server_load

Reports a breakdown of how many requests were processed by each downstream server.
Note that currently it does not take into account the backend the server is configured on.

queue_peaks

Reports a list of queue peaks.
A queue peak is the largest backend queue value within a series of consecutive queued log lines, where the series sits between log lines that were not queued.
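
The definition of a queue peak can be sketched as follows. This is an illustration rather than the tool's implementation: the function name is hypothetical, the input is assumed to be the backend queue value of each log line in order, and reporting a run that is still open at the end of the input is a choice made for this sketch.

```python
def queue_peaks(queue_values):
    """Return the largest value of every run of consecutive non-zero values."""
    peaks = []
    current = 0  # largest queue value seen in the current run
    for value in queue_values:
        if value > 0:
            current = max(current, value)
        elif current > 0:
            # A line that was not queued closes the run.
            peaks.append(current)
            current = 0
    if current > 0:
        # The input ended while requests were still queued.
        peaks.append(current)
    return peaks


print(queue_peaks([0, 1, 4, 2, 0, 0, 3, 0, 5, 6]))
```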

connection_type

Reports on how many requests were made on SSL and how many on plain HTTP.
This command only works if the default port for SSL (443) appears on the path.

requests_per_minute

Reports on how many requests were made per minute.
It works best when used with -s and -d command line arguments,
as the output can be huge.
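
Counting per minute amounts to truncating each timestamp to the minute and counting the results. The sketch below uses made-up timestamps in the date format from the help text, without the millisecond part that real log lines may carry, and is not the tool's own code.

```python
from collections import Counter
from datetime import datetime

# Hypothetical accept dates, one per log line.
accept_dates = [
    "11/Dec/2013:19:31:41",
    "11/Dec/2013:19:31:59",
    "11/Dec/2013:19:32:05",
]

# Drop the seconds so that all requests in the same minute share one key.
per_minute = Counter(
    datetime.strptime(raw, "%d/%b/%Y:%H:%M:%S").replace(second=0)
    for raw in accept_dates
)

for minute, count in sorted(per_minute.items()):
    print(minute.strftime("%d/%b/%Y:%H:%M"), count)
```

One output row is produced per distinct minute, which is why narrowing the time window first keeps the report manageable.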

print

Prints the raw lines.
This can be useful to trim down a file (with -s and -d for example) so that later runs are faster.

Filters

Filters, contrary to commands,
are a way to reduce the amount of log lines to be processed.

.. note:: The -n command line argument reverses the output of the filters.

This helps when looking for specific traces, like a certain IP, a path…
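
The -f syntax from the help text (comma separated names, parameters in square brackets) and the effect of -n can be sketched like this. The helper names are hypothetical and the parsing is an assumption based on the documented syntax only; it is not the tool's own parser.

```python
import re


def parse_filters(spec):
    """Split 'ip[1.2.3.4],ssl' into [('ip', '1.2.3.4'), ('ssl', None)]."""
    pairs = re.findall(r"(\w+)(?:\[([^\]]*)\])?", spec)
    # An empty second group means the filter took no parameter.
    return [(name, argument or None) for name, argument in pairs]


def apply_filter(lines, predicate, negate=False):
    """Keep lines matching predicate, or the opposite when negate is set."""
    return [line for line in lines if predicate(line) != negate]


def is_target(ip):
    """Example predicate standing in for an ip[192.168.1.1] filter."""
    return ip == "192.168.1.1"


parsed = parse_filters("ip[192.168.1.1],ssl,path[/some/path]")
print(parsed)

# Hypothetical parsed lines, reduced to the captured IP.
lines = ["192.168.1.1", "10.0.0.7", "192.168.1.1"]
print(apply_filter(lines, is_target))               # like -f ip[...]
print(apply_filter(lines, is_target, negate=True))  # like -f ip[...] -n
```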

ip

Filters log lines by the given IP.

ip_range

Filters log lines by the given IP range
(all IPs that begin with the same prefix).
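
Prefix matching has one subtlety worth showing. The sketch below assumes a plain string prefix comparison, which is an assumption about how the filter behaves; the function name is hypothetical.

```python
def in_ip_range(ip, prefix):
    """Return True when the IP starts with the given textual prefix."""
    return ip.startswith(prefix)


# A prefix without a trailing dot also matches longer octets.
print(in_ip_range("192.168.1.20", "192.168.1"))   # same /24-like range
print(in_ip_range("192.168.10.5", "192.168.1"))   # also matches: 10 starts with 1
print(in_ip_range("192.168.10.5", "192.168.1."))  # trailing dot avoids that
```

Under that assumption, ending the prefix with a dot keeps the match on octet boundaries.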

path

Filters log lines by the given string.

ssl

Filters log lines that are from SSL connections.
See :meth:`.HaproxyLogLine.is_https` for its limitations.

slow_requests

Filters log lines that take at least the given time to get answered
(in milliseconds).

time_frame

This is an implicit filter that is applied when --start and, optionally, --delta are given.
Do not use this filter on the command line; use --start and --delta instead.

wait_on_queues

Filters log lines by the amount of time (in milliseconds) the request had to wait on HAProxy queues.
A request is accepted if it waited less than the given amount of time.

Installation

After installation you will have a console script haproxy_log_analysis:

$ python setup.py install

TODO

add more commands: (help appreciated)

reports on servers connection time

reports on termination state

reports around connections (active, frontend, backend, server)

your ideas here

think of a way to show the commands output in a meaningful way

be able to specify an output format. For any command that makes sense (slow
requests for example) output the given fields for each log line (i.e.
acceptance date, path, downstream server, load at that time…)

your ideas

CHANGES

2.0.2 (2016-11-17)

Improve performance for cmd_print.
[kevinjqiu]

2.0.1 (2016-10-29)

Allow hostnames to have a dot in it.
[gforcada]

2.0 (2016-07-06)

Handle unparseable HTTP requests.
[gforcada]

Only test on python 2.7 and 3.5
[gforcada]

2.0b0 (2016-04-18)

Check the divisor before doing a division to not get ZeroDivisionError exceptions.
[gforcada]

2.0a0 (2016-03-29)

Major refactoring:

# Rename modules and classes:

haproxy_logline -> line

haproxy_logfile -> logfile

HaproxyLogLine -> Line

HaproxyLogFile -> Log

# Parse the log file on Log() creation (i.e. in its __init__)

[gforcada]

1.3 (2016-03-29)

New filter: filter_wait_on_queues.
Get all requests that waited at most X milliseconds on HAProxy queues.
[gforcada]

0.0.3 (2014-07-09)

0.0.2 (2014-07-09)

0.0.1 (2014-07-09)

Add a way to negate the filters, so that instead of being able to filter by
IP, it can output all but that IP information.
[gforcada]

Add lots of filters: ip, path, ssl, backend, frontend, server, status_code
and so on. See --list-filters for a complete list of them.
[gforcada]

Add :meth:`.HaproxyLogFile.parse_data` method to get data from data stream.
It allows you to use it as a library.
[bogdangi]

Add --list-filters argument on the command line interface.
[gforcada]

Add --filter argument on the command line interface, inspired by
Bogdan’s early design.
[bogdangi] [gforcada]

Create a new module :mod:`haproxy.filters` that holds all available filters.
[gforcada]

Improve :meth:`.HaproxyLogFile.cmd_queue_peaks` output to not only show
peaks but also when requests started to queue and when they finished and
the amount of requests that had been queued.
[gforcada]

Show help when no argument is given.
[gforcada]

Polish documentation and docstrings here and there.
[gforcada]

Add a --list-commands argument on the command line interface.
[gforcada]

Generate an API doc for HaproxyLogLine and HaproxyLogFile.
[bogdangi]

Create a console_script ``haproxy_log_analysis`` for ease of use.
[bogdangi]

Add Sphinx documentation system, still empty.
[gforcada]

Keep valid log lines sorted so that the exact order of connections is kept.
[gforcada]

Add quite a few commands, see README.rst for a complete list of them.
[gforcada]

Run commands passed as arguments (with -c flag).
[gforcada]

Add a requirements.txt file to keep track of dependencies and pin them.
[gforcada]