Thank you for your answer!

Start sharing it with your friends and connections...

Extracting the User-Agent HTTP header from an Apache log

0

Looking at all the posts regarding User-Agent HTTP header searches, one of the commonalities is that they were told to change their format to Combined Log Format. I unfortunately cannot do that but I am still being asked to create a dashboard reports to show most common OS used and most common browser. Here is a log:

Ultimately I want separate count columns for browser type and OS type. How do I go about extracting the info I want? I believe I need to use a Regex statement, but I am unsure on how to proceed especially since both the client and browser are going to change in size?

If you want to job done right, you pretty much need an application. There is no simple way to parse a UA string. It requires either a massive lookup, or a combination of complex logic and a slightly-less-massive lookup. If you have a limited number of UA strings, your best bet is to simply enumerate them all into your own lookup, then set any others to "other" or something.

A pure regex is not going to do it alone. If you are a novice you can get some help for yourself by using the interactive field extraction creator. It is one of the options in the per-record drop down.

The difficulty is that there is no defined order or format for sub fields of the UA. I just tried myself with the following sample list culled from recent access logs for the generator to weave its magic on:

Correct. There is no way to do this just by parsing. UA strings are not strongly-specified, they are mostly suggestive. If you need great accuracy, you must use a lookup that maps known patterns to the item you want. (I mean, technically, you can probably write a regex that includes all the logic of a lookup table, but it would be an impractically enormous regex, so let's just say you can't.)