The server access log records all requests processed by the server. The location and content of the access log are controlled by the CustomLog directive. Of course, storing the information in the access log is only the start of log management. The next step is to analyse this information to produce useful statistics.
The principal use of awk is to break up each line of a file into ‘fields’ or ‘columns’ using a pre-defined separator. Because each line of the log file is based on the standard format we can do many things quite easily.
With the default separator, which is any whitespace (spaces or tabs), each whitespace-delimited element of a log line becomes a numbered field ($1, $2 and so on).
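The breakdown can be sketched as follows. This assumes the log uses the common ‘combined’ format, and the sample entry (IP address, URL, referrer and user agent) is made up for illustration rather than taken from a real log:

```shell
# Hypothetical sample entry in the "combined" log format (illustrative only).
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "http://example.com/start.html" "Mozilla/5.0 (X11; Linux x86_64)"
EOF

# With the default whitespace separator, each field is addressed by position.
awk '{print $1}' "$sample_log"      # remote host (%h)
awk '{print $2}' "$sample_log"      # remote logname (%l)
awk '{print $3}' "$sample_log"      # remote user (%u)
awk '{print $4, $5}' "$sample_log"  # timestamp (%t), split in two by its space
awk '{print $6}' "$sample_log"      # request method, with a leading quote
awk '{print $7}' "$sample_log"      # requested URL
awk '{print $8}' "$sample_log"      # protocol, with a trailing quote
awk '{print $9}' "$sample_log"      # status code
awk '{print $10}' "$sample_log"     # response size in bytes (%b)
```

Note that the quoted fields (request line, referrer, user agent) contain spaces of their own, so whitespace splitting cuts them into several pieces. That is why some examples switch the separator to a double quote instead.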

Now that you understand the basics of breaking up the log file and identifying different elements, we can move on to more practical examples. But before we do that, we should explain how you can modify your log format and quickly extend the capabilities of these simple examples.
The format argument to the LogFormat and CustomLog directives is a string. This string is used to log each request to the log file. It can contain literal characters copied into the log files and the C-style control characters “\n” and “\t” to represent new-lines and tabs. Literal quotes and backslashes should be escaped with backslashes.
The characteristics of the request itself are logged by placing “%” directives in the format string, which are replaced in the log file by the values as follows:
%%
The percent sign

%a
Remote IP-address

%A
Local IP-address

%B
Size of response in bytes, excluding HTTP headers.

%b
Size of response in bytes, excluding HTTP headers. In CLF format, i.e. a ‘-’ rather than a 0 when no bytes are sent.

%{Foobar}C
The contents of cookie Foobar in the request sent to the server. Only version 0 cookies are fully supported.

%D
The time taken to serve the request, in microseconds.

%{FOOBAR}e
The contents of the environment variable FOOBAR

%f
Filename

%h
Remote host

%H
The request protocol

%{Foobar}i
The contents of Foobar: header line(s) in the request sent to the server. Changes made by other modules (e.g. mod_headers) affect this. If you’re interested in what the request header was prior to when most modules would have modified it, use mod_setenvif to copy the header into an internal environment variable and log that value with the %{VARNAME}e described above.

%k
Number of keepalive requests handled on this connection. Interesting if KeepAlive is being used, so that, for example, a ‘1’ means the first keepalive request after the initial one, ‘2’ the second, etc…; otherwise this is always 0 (indicating the initial request). Available in versions 2.2.11 and later.

%l
Remote logname (from identd, if supplied). This will return a dash unless mod_ident is present and IdentityCheck is set On.

%m
The request method

%{Foobar}n
The contents of note Foobar from another module.

%{Foobar}o
The contents of Foobar: header line(s) in the reply.

%p
The canonical port of the server serving the request

%{format}p
The canonical port of the server serving the request or the server’s actual port or the client’s actual port. Valid formats are canonical, local, or remote.

%P
The process ID of the child that serviced the request.

%{format}P
The process ID or thread id of the child that serviced the request. Valid formats are pid, tid, and hextid. hextid requires APR 1.2.0 or higher.

%q
The query string (prepended with a ? if a query string exists, otherwise an empty string)

%r
First line of request

%R
The handler generating the response (if any).

%s
Status. For requests that got internally redirected, this is the status of the *original* request — %>s for the last.

%t
Time the request was received (standard English format)

%{format}t
The time, in the form given by format, which should be in an extended strftime(3) format (potentially localized). If the format starts with begin: (default) the time is taken at the beginning of the request processing. If it starts with end: it is the time when the log entry gets written, close to the end of the request processing. In addition to the formats supported by strftime(3), the following format tokens are supported:
sec
number of seconds since the Epoch
msec
number of milliseconds since the Epoch
usec
number of microseconds since the Epoch
msec_frac
millisecond fraction
usec_frac
microsecond fraction

These tokens can not be combined with each other or strftime(3) formatting in the same format string. You can use multiple %{format}t tokens instead. The extended strftime(3) tokens are available in 2.2.30 and later.

%T
The time taken to serve the request, in seconds.

%{UNIT}T
The time taken to serve the request, in a time unit given by UNIT. Valid units are ms for milliseconds, us for microseconds, and s for seconds. Using s gives the same result as %T without any format; using us gives the same result as %D. Combining %T with a unit is available in 2.2.30 and later.

%u
Remote user (from auth; may be bogus if return status (%s) is 401)

%U
The URL path requested, not including any query string.

%v
The canonical ServerName of the server serving the request.

%V
The server name according to the UseCanonicalName setting.

%X
Connection status when response is completed:

X =
connection aborted before the response completed.

+ =
connection may be kept alive after the response is sent.

- =
connection will be closed after the response is sent.

(This directive was %c in late versions of Apache 1.3, but this conflicted with the historical ssl %{var}c syntax.)

%I
Bytes received, including request and headers; cannot be zero. You need to enable mod_logio to use this.

%O
Bytes sent, including headers; cannot be zero. You need to enable mod_logio to use this.

%{VARNAME}^ti
The contents of VARNAME: trailer line(s) in the request sent to the server.

%{VARNAME}^to
The contents of VARNAME: trailer line(s) in the response sent from the server.
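As one illustration of how an extended format opens up new analyses, suppose %D were appended as the last item of the format string. This is an assumption made for the sketch below; the sample entries and the position of the field are hypothetical. The time taken is then available in awk as $NF, the last field on each line:

```shell
# Hypothetical log entries whose format ends with %D (time taken, microseconds).
timed_log=$(mktemp)
cat > "$timed_log" <<'EOF'
192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 1500
192.0.2.11 - - [10/Oct/2023:13:55:37 +0000] "GET /about.html HTTP/1.1" 200 1024 2500
192.0.2.12 - - [10/Oct/2023:13:55:38 +0000] "GET /missing.html HTTP/1.1" 404 512 500
EOF

# $NF is the last field on the line, here the %D value.
# Sum it over all lines and print the mean, still in microseconds.
avg_us=$(awk '{ total += $NF } END { if (NR > 0) printf "%d\n", total / NR }' "$timed_log")
echo "average time per request: ${avg_us} microseconds"
```

Placing a new item at the end of the format keeps the positions of the existing fields unchanged, so one-liners written for the standard format should continue to work.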

List all user agents ordered by the number of times they appear
awk -F\" '{print $6}' access.log | sort | uniq -c | sort -rn
Identify problems with your site
Identify problems with your site by identifying the different server responses and the requests that caused them:
awk '{print $9}' access.log | sort | uniq -c | sort
The output shows how many responses of each status code your site is returning. A ‘normal’ request results in a 200 code, which means a page or file has been requested and delivered, but there are many other possibilities.
The most common responses are:
200 – OK
206 – Partial Content
301 – Moved Permanently
302 – Found
304 – Not Modified
401 – Unauthorised (password required)
403 – Forbidden
404 – Not Found
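To see which requests caused a particular response, the status field can be used as a filter. The following is a minimal sketch for 404 responses, assuming the status is in field 9 and the requested URL in field 7 as in the standard format; the sample entries are made up for illustration:

```shell
# Hypothetical sample entries (illustrative only).
status_log=$(mktemp)
cat > "$status_log" <<'EOF'
192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"
192.0.2.11 - - [10/Oct/2023:13:55:37 +0000] "GET /old-page.html HTTP/1.1" 404 512 "-" "Mozilla/5.0"
192.0.2.12 - - [10/Oct/2023:13:55:38 +0000] "GET /old-page.html HTTP/1.1" 404 512 "-" "Mozilla/5.0"
192.0.2.13 - - [10/Oct/2023:13:55:39 +0000] "GET /typo.html HTTP/1.1" 404 512 "-" "Mozilla/5.0"
EOF

# Keep only entries whose status field ($9) is 404 and print the requested URL ($7),
# then count each URL and list the most frequently requested first.
not_found=$(awk '$9 == 404 {print $7}' "$status_log" | sort | uniq -c | sort -rn)
echo "$not_found"
```

Changing 404 to another code from the list above gives the same report for that response.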