Jetty configuration is insane! Also, given that we use jetty-runner to run Jetty, we would need to customize the classpath to add JMX support, which would substantially complicate the way we run Jetty. I'm not sure it is worth the added complexity.

For more context:

jetty-runner is a shaded jar containing the minimal subset of classes required to run jetty (jetty itself, javax.servlet.*, javax.el.*, ...). This supports only a minimal Jetty configuration. To enable JMX reporting, we would need to add at least jetty-jmx and potentially other supporting jars, depending on the functions we want to monitor. Note that those jars are present in the blazegraph war file (and probably unused, which is not a great idea), but the war is loaded in a webapp-specific classloader and is not visible from Jetty itself.
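For context on what jetty-jmx actually does: it registers Jetty's internal components as MBeans on the platform MBeanServer, which a JMX client (jconsole, a JMX exporter, ...) can then read. A minimal standalone sketch of that mechanism, using only the JDK — the MBean and its names are illustrative, not part of Jetty:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Illustrative standard MBean: the interface must be named <Class>MBean.
interface RequestStatsMBean {
    long getRequestCount();
}

class RequestStats implements RequestStatsMBean {
    private volatile long requestCount = 0;
    public long getRequestCount() { return requestCount; }
    public void increment() { requestCount++; }
}

public class JmxSketch {
    public static void main(String[] args) throws Exception {
        // jetty-jmx does essentially this for each Jetty component:
        // register it on the platform MBeanServer under a well-known name.
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("org.example:type=RequestStats");
        RequestStats stats = new RequestStats();
        server.registerMBean(stats, name);

        stats.increment();
        // The attribute is now readable by any JMX client attached to the JVM.
        Object count = server.getAttribute(name, "RequestCount");
        System.out.println("RequestCount = " + count);
    }
}
```

The catch described above is that with jetty-runner the classes backing those MBeans (jetty-jmx) are simply not on the classpath, so there is nothing to register.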

Instrument Blazegraph

Fairly trivial to implement. Instrumenting at the servlet/webapp level is a common way to collect metrics independently of the application server.
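At its core, this kind of instrumentation means wrapping every request in a small piece of bookkeeping: an active-request gauge, a total counter, and a latency sum. A minimal sketch of that bookkeeping — the wiring into an actual javax.servlet.Filter and the scrape endpoint are omitted, and all names here are illustrative:

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative metrics holder; in a real deployment these counters would be
// updated from a servlet Filter wrapping each request, and exposed on a
// metrics endpoint for collection.
public class RequestMetrics {
    private final AtomicLong active = new AtomicLong();
    private final AtomicLong total = new AtomicLong();
    private final AtomicLong totalNanos = new AtomicLong();

    // Wrap one request: count it as active while it runs, record its latency.
    public void measure(Runnable request) {
        active.incrementAndGet();
        long start = System.nanoTime();
        try {
            request.run();
        } finally {
            totalNanos.addAndGet(System.nanoTime() - start);
            active.decrementAndGet();
            total.incrementAndGet();
        }
    }

    public long activeCount() { return active.get(); }
    public long totalCount() { return total.get(); }
    public long totalLatencyNanos() { return totalNanos.get(); }
}
```

Because the filter lives inside the webapp, this works regardless of whether Jetty itself exposes anything.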

Nginx

This would require packaging prometheus-nginx-exporter and writing the appropriate integration in Puppet. It would only measure traffic going through nginx, so we would miss traffic related to the updater. Collecting metrics at the nginx level could be interesting for other applications as well.
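For reference, this class of exporter typically scrapes nginx's built-in stub_status endpoint, so the nginx side of the integration would need something along these lines (port and location are illustrative, not an actual config from our Puppet tree):

```nginx
server {
    # Local-only status vhost for the exporter to scrape.
    listen 127.0.0.1:8090;

    location /stub_status {
        stub_status;       # exposes active connections, accepts, handled, requests
        allow 127.0.0.1;   # restrict to localhost; the exporter runs on the same host
        deny all;
    }
}
```

This only gives connection/request level counters, so it complements rather than replaces the Blazegraph-side counters below.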

Existing Blazegraph metrics

Blazegraph exposes the number of running queries (BigdataRDFContext.getQueries()) through the StatusServlet. It does not seem to expose the same metric through the usual CountersServlet that we use to collect statistics. The CountersServlet exposes some similar metrics (/ Journal / Concurrency Manager / Read Service / Average Active Count, for example) which are based on an exponentially decaying moving average of the number of active threads in the various journal executor services. It also exposes / Query Engine / operatorActiveCount, which is based on the number of running ChunkedRunningQuery instances.
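To illustrate what "exponentially decaying moving average" means for these counters: each sample of the active count is folded into a running average with a fixed smoothing factor, so recent samples dominate and old spikes fade out. A standalone sketch — the smoothing factor and sampling are illustrative, not Blazegraph's actual parameters:

```java
// Exponentially weighted moving average of a sampled counter, in the spirit
// of the "Average Active Count" counters; alpha here is illustrative.
public class Ewma {
    private final double alpha;  // smoothing factor in (0, 1]; higher = shorter memory
    private double average = 0.0;
    private boolean initialized = false;

    public Ewma(double alpha) { this.alpha = alpha; }

    // Fold one sample (e.g. the current active thread count) into the average.
    public void sample(double value) {
        if (!initialized) {
            average = value;
            initialized = true;
        } else {
            average = alpha * value + (1 - alpha) * average;
        }
    }

    public double average() { return average; }
}
```

With alpha = 0.5, samples 0, 0, 4 yield an average of 2.0: a spike shows up quickly but decays back toward 0 over subsequent idle samples. The practical consequence is that these counters smooth out short load bursts, which matters when trying to correlate them with momentary high load.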

Conclusion

We should probably start by collecting the various active counters already exposed by Blazegraph, then see whether the collected data makes sense and whether it correlates with high load. Once we have a better understanding, we can refine.