There are additional content modifiers that provide protocol-specific
capabilities at the application layer. More information can be found in
Payload Keywords. These keywords ensure that the signature checks only
specific parts of the network traffic, for instance the request URI,
cookies, or the HTTP request or response body.

All HTTP keywords are modifiers. Note the difference between content modifiers
and sticky buffers. See Modifier Keywords for more information. As a
refresher:

‘content modifiers’ look back in the rule, e.g.:

alert http any any -> any any (content:"index.php"; http_uri; sid:1;)

‘sticky buffers’ are placed first, and all keywords following them apply to that buffer, for instance:

alert http any any -> any any (file_data; content:"abc"; content:"xyz";)

It is important to understand the structure of HTTP requests and
responses. A simple example of an HTTP request and response follows:

HTTP request

GET /index.html HTTP/1.0\r\n

GET is a request method. Examples of methods are: GET, POST, PUT,
HEAD, etc. The URI path is /index.html and the HTTP version is
HTTP/1.0. Several HTTP versions have been used over the years; of
the versions 0.9, 1.0 and 1.1, 1.0 and 1.1 are the most commonly used
today.

HTTP response

HTTP/1.0 200 OK\r\n
<html><title>some page</title></HTML>

In this example, HTTP/1.0 is the HTTP version, 200 the response status
code and OK the response status message.

Although cookies are sent in an HTTP header, you cannot match on them
with the http_header keyword. Cookies are matched with their own
keyword, namely http_cookie.

Each part of an HTTP request or response belongs to a so-called
buffer. The HTTP method belongs to the method buffer, HTTP headers to
the header buffer, and so on. A buffer is a specific portion of the
request or response that Suricata extracts in memory for inspection.

All previously described keywords can be used in combination with a
buffer in a signature. The keywords distance and within are
relative modifiers, so they may only be used within the same
buffer. You cannot relate content matches against different buffers
with relative modifiers.
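
To make the idea of separate buffers concrete, here is a minimal Python sketch that splits a raw request into named buffers. It is a simplified model for illustration only, not Suricata's actual parser: the function name `split_into_buffers`, the dictionary keys, and the sample request are all assumptions made for this example.

```python
# Simplified, hypothetical model of per-buffer extraction.
# This is NOT how Suricata parses HTTP; it only illustrates the idea.
def split_into_buffers(raw_request: str) -> dict:
    head, _, body = raw_request.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, uri, _version = lines[0].split(" ", 2)

    headers, cookies = [], []
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "cookie":
            # Cookies are kept out of the header buffer.
            cookies.append(value.strip())
        else:
            headers.append(line)

    return {
        "method": method,
        "uri": uri,
        "header": "\r\n".join(headers),
        # Joining several Cookie headers with "; " is an assumption.
        "cookie": "; ".join(cookies),
        "client_body": body,
    }


request = (
    "POST /login.php HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Cookie: session=abc123\r\n"
    "\r\n"
    "user=admin"
)
buffers = split_into_buffers(request)
```

In this model a pattern such as `session=` is found in `buffers["cookie"]` but not in `buffers["header"]`, and a relative match only makes sense between two patterns looked up in the same dictionary entry.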

With the http_method content modifier, it is possible to match
specifically and only on the HTTP method buffer. The keyword can be
used in combination with all previously mentioned content modifiers
such as: depth, distance, offset, nocase and within.

With the http_uri and the http_raw_uri content modifiers, it
is possible to match specifically and only on the request URI
buffer. These keywords can be used in combination with all previously
mentioned content modifiers like depth, distance, offset,
nocase and within.

The URI has two appearances in Suricata: the raw URI and the
normalized URI. A space, for example, can be written in hexadecimal
notation as %20. Converting this notation back into a space is called
normalizing it. It is still possible to match specifically on the
characters %20 in a URI; this means matching on the raw URI. The
raw URI and the normalized URI are separate buffers, so http_raw_uri
inspects only the raw URI buffer and cannot inspect the normalized buffer.
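
The difference between the two URI buffers can be sketched in Python. Here the standard library's `urllib.parse.unquote` stands in for normalization; this is an assumption for illustration, since Suricata's own normalization is done by its HTTP parser and covers more than percent-decoding. The sample URI is hypothetical.

```python
from urllib.parse import unquote

# Hypothetical request URI as it appears on the wire.
raw_uri = "/search%20results/index.php"

# Stand-in for normalization: percent-decoding only (an assumption;
# the real normalization is more involved).
normalized_uri = unquote(raw_uri)

# A pattern with a literal space is only found in the normalized form,
# while the literal characters %20 are only found in the raw form.
space_in_normalized = "search results" in normalized_uri
space_in_raw = "search results" in raw_uri
encoded_in_raw = "%20" in raw_uri
encoded_in_normalized = "%20" in normalized_uri
```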

With the http_header content modifier, it is possible to match
specifically and only on the HTTP header buffer. This buffer contains
all of the extracted headers, except for those that the documentation
indicates cannot be matched through this buffer and that have their
own content modifier (e.g. http_cookie). The modifier
can be used in combination with all previously mentioned content
modifiers, like depth, distance, offset, nocase and
within.

With the http_cookie content modifier, it is possible to match
specifically and only on the cookie buffer. The keyword can be used in
combination with all previously mentioned content modifiers like
depth, distance, offset, nocase and within.

Note that cookies are passed in HTTP headers, but are extracted to a
dedicated buffer and matched using their own specific content
modifier.

The http_user_agent content modifier makes it possible to match
specifically on the value of the User-Agent header, which is part of
the HTTP request header. The buffer is normalized in the sense that it
does not include the “User-Agent: ” header name and separator, nor does
it contain the trailing carriage return and line feed (CRLF). The keyword
can be used in combination with all previously mentioned content
modifiers like depth, distance, offset, nocase and
within. Note that the pcre keyword can also inspect this
buffer when using the /V modifier.

Normalization: leading spaces are not part of this buffer. So
“User-Agent: \r\n” will result in an empty http_user_agent buffer.

The http_user_agent buffer will NOT include the header name,
colon, or leading whitespace, i.e. it will not include
“User-Agent: ”.

The http_user_agent buffer does not include a CRLF (0x0D
0x0A) at the end. If you want to match the end of the buffer, use a
relative isdataat or a PCRE (although PCRE will be worse on
performance).

If a request contains multiple “User-Agent” headers, the values will
be concatenated in the http_user_agent buffer, in the order
seen from top to bottom, with a comma and space (“, “) between each
of them.

Example request:

GET /test.html HTTP/1.1
User-Agent: SuriTester/0.8
User-Agent: GGGG

http_user_agent buffer contents:

SuriTester/0.8, GGGG
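
The normalization rules described above (header name, colon, leading whitespace and trailing CRLF removed; multiple values joined with a comma and space) can be modelled with a short Python sketch. The function `user_agent_buffer` is a hypothetical helper written for this illustration and is not part of Suricata.

```python
def user_agent_buffer(header_lines):
    """Hypothetical model of how the http_user_agent buffer is filled."""
    values = []
    for line in header_lines:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "user-agent":
            # Drop leading whitespace and the trailing CRLF only.
            values.append(value.lstrip(" \t").rstrip("\r\n"))
    # Values are concatenated top to bottom with ", " in between.
    return ", ".join(values)


two_headers = user_agent_buffer([
    "User-Agent: SuriTester/0.8\r\n",
    "User-Agent: GGGG\r\n",
])
empty_value = user_agent_buffer(["User-Agent: \r\n"])
```

Under these assumptions `two_headers` holds the concatenated value from the example request, and `empty_value` is an empty string.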

Corresponding PCRE modifier: V

Using the http_user_agent buffer is more efficient when it
comes to performance than using the http_header buffer (~10%
better).

The http_start keyword inspects the start of an HTTP request or
response. This buffer contains the request/response line plus the
request/response headers. Use flow:to_server or flow:to_client to
force inspection of the request or the response.

With the http_client_body content modifier, it is possible to
match specifically and only on the HTTP request body. The keyword can
be used in combination with all previously mentioned content modifiers
like distance, offset, nocase, within, etc.

Note: how much of the request/client body is inspected is controlled
in the libhtp configuration section via the request-body-limit
setting.

With the http_stat_code content modifier, it is possible to match
specifically and only on the HTTP status code buffer. The keyword can
be used in combination with all previously mentioned content modifiers
like distance, offset, nocase, within, etc.

With the http_stat_msg content modifier, it is possible to match
specifically and only on the HTTP status message buffer. The keyword
can be used in combination with all previously mentioned content
modifiers like depth, distance, offset, nocase and
within.

With the http_server_body content modifier, it is possible to
match specifically and only on the HTTP response body. The keyword can
be used in combination with all previously mentioned content modifiers
like distance, offset, nocase, within, etc.

Note: how much of the response/server body is inspected is controlled
in your libhtp configuration section via the response-body-limit
setting.

Using http_server_body is similar to having content matches
that come after file_data except that it doesn’t permanently
(unless reset) set the detection pointer to the beginning of the
server response body. i.e. it is not a sticky buffer.

http_server_body will match on gzip decoded data just like
file_data does.

Since http_server_body matches on a server response, it
can’t be used with the to_server or from_client flow
directives.

The http_host and http_raw_host buffers are populated
from either the URI (if the full URI is present in the request like
in a proxy request) or the HTTP Host header. If both are present, the
URI is used.

The http_host and http_raw_host buffers will NOT
include the header name, colon, or leading whitespace if populated
from the Host header. i.e. they will not include “Host: “.

The http_host and http_raw_host buffers do not
include a CRLF (0x0D 0x0A) at the end. If you want to match the end
of the buffer, use a relative ‘isdataat’ or a PCRE (although PCRE
will be worse on performance).

The http_host buffer is normalized to be all lower case.

The content match that http_host applies to must be all lower
case or have the nocase flag set.

http_raw_host matches the unnormalized buffer so matching
will be case-sensitive (unless nocase is set).

If a request contains multiple “Host” headers, the values will be
concatenated in the http_host and http_raw_host
buffers, in the order seen from top to bottom, with a comma and space
(“, “) between each of them.
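
The rules for populating the two host buffers can be summarized in a small Python sketch. It is a simplified model under stated assumptions: `host_buffers` is a hypothetical helper, the check for a full URI is reduced to looking for `://`, and the host names are made up for the example.

```python
from urllib.parse import urlsplit


def host_buffers(request_uri, host_headers):
    """Hypothetical model of http_raw_host and http_host population."""
    # If the request carries a full URI (as in a proxy request),
    # the host from the URI takes precedence over the Host header.
    uri_host = urlsplit(request_uri).netloc if "://" in request_uri else ""
    # Multiple Host header values are joined with a comma and a space.
    raw = uri_host or ", ".join(host_headers)
    return {
        "http_raw_host": raw,      # case preserved
        "http_host": raw.lower(),  # normalized to lower case
    }


from_header = host_buffers("/index.html", ["WWW.Example.COM"])
from_uri = host_buffers("http://Proxy.Example.org/x", ["other.example"])
two_hosts = host_buffers("/", ["a.example", "B.example"])
```

This also shows why a content match against http_host must be lower case or use nocase: in the model, `from_header["http_host"]` never contains upper-case characters.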

With file_data, the HTTP response body is inspected, just like
with http_server_body. The file_data keyword works a bit
differently from the normal content modifiers; when used in a rule,
all content matches following it in the rule are affected (modified)
by it.

Example:

alert http any any -> any any (file_data; content:"abc"; content:"xyz";)

The file_data keyword affects all following content matches, until
the pkt_data keyword is encountered or it reaches the end of the
rule. This makes it a useful shortcut for applying many content
matches to the HTTP response body, eliminating the need to modify each
content match individually.
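
The difference between a content modifier (which looks back at the preceding content) and a sticky buffer such as file_data (which applies to everything after it until pkt_data) can be sketched as a toy option walker in Python. `assign_buffers` is a hypothetical helper, not Suricata's rule parser, and it assumes that patterns contain no semicolons.

```python
# Toy model of how rule options map content matches to buffers.
STICKY = {"file_data"}
MODIFIERS = {
    "http_uri", "http_raw_uri", "http_method", "http_header",
    "http_cookie", "http_user_agent", "http_client_body",
    "http_stat_code", "http_stat_msg", "http_server_body",
    "http_host", "http_raw_host",
}


def assign_buffers(options):
    """Return (pattern, buffer) pairs for the content keywords in options."""
    current = "payload"  # default buffer when nothing else is set
    matches = []
    for opt in (o.strip() for o in options.split(";")):
        if opt == "pkt_data":
            current = "payload"          # resets the sticky buffer
        elif opt in STICKY:
            current = opt                # applies to all following content
        elif opt in MODIFIERS and matches:
            pattern, _ = matches[-1]
            matches[-1] = (pattern, opt)  # looks back at previous content
        elif opt.startswith("content:"):
            pattern = opt[len("content:"):].strip().strip('"')
            matches.append((pattern, current))
    return matches


sticky = assign_buffers('file_data; content:"abc"; content:"xyz";')
modifier = assign_buffers('content:"index.php"; http_uri; sid:1;')
reset = assign_buffers('file_data; content:"abc"; pkt_data; content:"xyz";')
```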

As the body of an HTTP response can be very large, it is inspected in
smaller chunks.

If the HTTP body is a flash file compressed with ‘deflate’ or ‘lzma’,
it can be decompressed and file_data can match on the decompressed data.
Flash decompression must be enabled in the libhtp configuration.

If an HTTP body is using gzip or deflate, file_data will match
on the decompressed data.

Negated matching is affected by the chunked inspection. For example,
‘content:!"<html";’ could fail to match on the first chunk, but would
then possibly match on the second. To avoid this, use a depth setting.
The depth setting takes the body size into account.
Assuming that the response-body-minimal-inspect-size is bigger
than 1k, ‘content:!"<html"; depth:1024;’ can only match if the
pattern ‘<html’ is absent from the first inspected chunk.
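
The effect of chunked inspection on a negated match can be illustrated with a simplified Python model. The two helper functions, the 1024-byte chunk boundary, and the sample body are assumptions chosen for this sketch; the real chunk sizes depend on the configured inspection settings.

```python
def negated_match_per_chunk(chunks, pattern):
    """A negated content 'matches' a chunk when the pattern is absent."""
    return [pattern not in chunk for chunk in chunks]


def negated_match_with_depth(body, pattern, depth):
    """With depth, only the first `depth` bytes of the body are considered."""
    return pattern not in body[:depth]


# Hypothetical response body that does start with an <html tag.
body = b"<html><body>" + b"A" * 2000 + b"</body></html>"
chunks = [body[:1024], body[1024:]]

per_chunk = negated_match_per_chunk(chunks, b"<html")
with_depth = negated_match_with_depth(body, b"<html", 1024)
no_html = negated_match_with_depth(b"A" * 3000, b"<html", 1024)
```

In this model the negated match fails on the first chunk but succeeds on the second, which is the unwanted behaviour described above; with the depth limit the result depends only on the start of the body.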