I want to store a bunch of documents in Elasticsearch, each representing a
hit to a website and including the user agent of the client that made the
original HTTP request.

Since user agent strings vary a lot, and the useful parts (OS, browser,
version etc.) need parsing out, I would like to be able to perform
aggregations on those extracted features.

The simplest way I can think of to do this would be to analyze the user
agent string before indexing the document. The downside of this approach is
that as new/different user agent strings emerge (which is not unlikely), you
would have to proactively update the parser.
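
To make the "parse before indexing" option concrete, here is a rough sketch of what I have in mind. The regex rules, the `parse_user_agent` and `enrich_hit` helpers, and the `ua` / `user_agent` field names are all made up for illustration and nowhere near a complete parser; the point is only that every new browser or OS means editing these pattern lists.

```python
import re

# Hypothetical, deliberately tiny rule sets; order matters because
# e.g. Edge UAs also contain "Chrome/" and Android UAs contain "Linux".
BROWSER_PATTERNS = [
    ("Edge", re.compile(r"Edg/(\d+)")),
    ("Chrome", re.compile(r"Chrome/(\d+)")),
    ("Firefox", re.compile(r"Firefox/(\d+)")),
    ("Safari", re.compile(r"Version/(\d+).*Safari/")),
]
OS_PATTERNS = [
    ("Windows", re.compile(r"Windows NT")),
    ("Android", re.compile(r"Android")),
    ("iOS", re.compile(r"iPhone|iPad")),
    ("macOS", re.compile(r"Mac OS X")),
    ("Linux", re.compile(r"Linux")),
]


def parse_user_agent(ua):
    """Extract coarse browser/OS features from a raw user agent string."""
    browser, version = "Other", None
    for name, pattern in BROWSER_PATTERNS:
        match = pattern.search(ua)
        if match:
            browser, version = name, match.group(1)
            break
    os_name = next((name for name, p in OS_PATTERNS if p.search(ua)), "Other")
    return {"browser": browser, "browser_version": version, "os": os_name}


def enrich_hit(hit):
    """Return a copy of the hit with parsed features added under 'ua'.

    The raw 'user_agent' string is kept alongside the parsed fields.
    """
    enriched = dict(hit)
    enriched["ua"] = parse_user_agent(hit["user_agent"])
    return enriched
```

The enriched dict is what would then be sent to the index, so aggregations could run on `ua.browser`, `ua.os` and so on. Anything the rules do not recognise falls into "Other", which is exactly the staleness problem described above.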

This may be impossible/undesirable for a number of reasons, but what I'd
really like to do is index the raw user agent string and then perform the
analysis/feature extraction post-hoc at query time. Any ideas/pointers on
how to do this?

Aggregators? Custom analyzers? (How would you handle an update to the
analyzer? Would you need to re-run it against all existing stored data?)
