@brancz Envoy has a very large amount of plumbing already in place for stats, export, etc. It also has a very specific threading model. We also need built-in native HDR histograms in Envoy for other features. We will definitely investigate the various libraries available and figure out the right path forward before anyone starts work on this feature. (We first need to find someone who wants to build this specific feature; having HDR histogram support in Envoy is orthogonal and I will open a separate issue on that.)

If at all possible, /metrics is the default for Prometheus. It's configurable on the Prometheus side, but when there is native Prometheus support, projects usually adopt this convention.

Description and type information about a metric is technically optional, but it is best practice to include it, as it allows Prometheus to do certain optimizations based on the type. A description can also be helpful for display when querying. (Neither of those things is implemented today, but both have been discussed, and the consensus is that this information is good to have.) If this is problematic to add, however, it can be done in follow-up improvements as well.
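
For reference, a minimal sketch of what the text exposition format with this metadata looks like (the metric name, label, and value here are illustrative, not Envoy's actual output):

```
# HELP envoy_cluster_upstream_cx_total Total upstream connections
# TYPE envoy_cluster_upstream_cx_total counter
envoy_cluster_upstream_cx_total{envoy_cluster_name="backend"} 42
```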

Update here: Once #3130 (review) lands we will be able to trivially export histogram summaries for Prometheus, and then we can consider this issue fixed. If someone wants to sign up for doing the histogram export that would be awesome!!!

I'll do this eventually if nobody else has yet, but I won't have time to start on it for a while. If/when I start working on it, I'll comment here and assign it to myself. If that hasn't happened yet, someone else can claim it.

One must be able to specify the bins for the export so that we can degrade the internal histogram into that form. Once that configuration is in place, I'm happy to provide pointers on how to munge a circllhist object into those summaries.

@postwait, aren't the histograms already calculated and exposed over /stats, so that only the Prometheus endpoint is missing? I see the histogram data there, which is why I thought maybe it was just a formatting thing to expose the interface (but I could be oversimplifying).

I'm working on exposing the log-linear histogram data serialized in a stats sink. I have been trying to wrangle Bazel into building a gRPC client to hit the existing endpoint; the default metrics service usage isn't that clear to me currently.

Decide whether we are going to output standard Prometheus fixed buckets or just output the entire histogram, which is also possible.

If fixed buckets, we will need to degrade the internal histogram data to the format Prometheus expects (see the sketch below). @postwait, @ramaraochavali, et al. can help here. This will probably involve some histogram interface additions, but nothing too complicated, as we will just be operating on the already-latched histograms, with no threading issues to contend with.
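
As a rough illustration of the fixed-bucket option, here is a minimal C++ sketch of folding an already-latched internal histogram into cumulative Prometheus bucket counts. The `Bin` struct and `toPrometheusBuckets` are hypothetical stand-ins, not circllhist's or Envoy's actual interfaces; a real implementation would work against the internal histogram API:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical internal bin: the upper edge of a log-linear bin and the
// number of samples recorded in it.
struct Bin {
  double upper_bound;
  uint64_t count;
};

// Fold latched internal bins into cumulative counts for a fixed, sorted
// list of Prometheus bucket bounds ("le" semantics). A bin is credited to
// every Prometheus bucket whose bound covers the bin's upper edge, so a
// bin that straddles a bound is pushed into the next larger bucket.
std::vector<uint64_t> toPrometheusBuckets(const std::vector<Bin>& bins,
                                          const std::vector<double>& le_bounds) {
  std::vector<uint64_t> cumulative(le_bounds.size(), 0);
  for (const Bin& bin : bins) {
    for (std::size_t i = 0; i < le_bounds.size(); ++i) {
      if (bin.upper_bound <= le_bounds[i]) {
        cumulative[i] += bin.count;
      }
    }
  }
  return cumulative;
}
```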

👋, I managed to pick this up and do most of the plumbing in #5601 to expose buckets and plumb that through into the Prometheus output. Hopefully someone can get a chance to review it. I did initially want to get the configurable buckets in, but the PR was getting quite complex as is, and I wanted to make sure it was along the right lines.

(this is my first real foray in proper C++ code so please do review it with a fine tooth comb 😃, thanks!)
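
For context, the goal output is the standard Prometheus histogram exposition, something along these lines (the metric name and numbers are made up for illustration; buckets are cumulative, and the +Inf bucket equals _count):

```
# TYPE envoy_http_downstream_rq_time histogram
envoy_http_downstream_rq_time_bucket{le="0.5"} 129
envoy_http_downstream_rq_time_bucket{le="1"} 271
envoy_http_downstream_rq_time_bucket{le="+Inf"} 342
envoy_http_downstream_rq_time_sum 1087
envoy_http_downstream_rq_time_count 342
```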

From my experience, usedonly is also respected when Envoy publishes metrics to statsd. That is a huge advantage: with 1000+ clusters but connections to only a couple of them, we don't want to publish metrics for every cluster. It would be very helpful if the Prometheus endpoint supported that too.

If you think this is easy enough, I could try to implement it (but I have very little C++ experience).

Exposing only the metrics that have been updated since the last scrape breaks Prometheus in various ways (time-series are incorrectly marked as stale; two different Prometheus servers scraping the same Envoy would interfere with each other). Printing these metrics is not a performance concern, and neither is processing them in Prometheus. Besides that, it's off topic: if you want to only collect metrics from individual Envoy servers, that's just a matter of configuring Prometheus not to scrape those servers.

Thanks for the clarification. It still wouldn't work well with Prometheus: when a process restarts and a counter resets, there is a difference between a counter being 0 and the counter not being there at all. The former is the continuation of a time-series; the latter means the time-series has ended.

@brancz I understand your point about why this could be problematic, but the lack of usedonly doesn't prevent this case from happening anyway.

For instance, something Envoy is working on is the ability to lazy-load some configuration. For example, when a connection comes in with an SNI value that hasn't been seen before, Envoy could contact the mgmt/config server and ask for config. This would cause new metrics to appear, and if Envoy were to be restarted, those metrics would disappear again until a connection caused Envoy to load that particular bit of config again.

Another related use case we hear about is configurations that are very very large, where only a small subset is expected to be used on any given Envoy instance. (Some would argue that a better control plane should be used to prevent this, but the world is complicated.) Only publishing metrics for the small portion of config that is used can hugely reduce the number of published metrics.

Given this context, do you still feel that it would always be inappropriate to have usedonly be available for prometheus metrics?

I also believe we should add usedonly (it's actually a further addition I'm wrapping up in a separate branch, to follow once the histogram changes are merged). It drastically reduces the number of metrics (especially if you have hundreds of clusters).

FWIW: In our setup, when we scrape Envoy with Prometheus, we add a label to each metric to identify the Envoy pod it came from (so we can isolate metrics for a specific Envoy pod). If a pod restarts, it gets a new label combination. That increases the cardinality, but for this use case usedonly would be perfect.
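
(For anyone curious, that per-pod label can be attached on the Prometheus side with a relabel rule along these lines; the job name and target label are just examples from a Kubernetes service-discovery setup:)

```yaml
scrape_configs:
  - job_name: envoy
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Copy the discovered pod name onto every scraped series.
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: envoy_pod
```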

There's a big difference between metrics being hidden because they were never measured/incremented at all, and metrics being hidden because they haven't been updated since the last scrape. The latter would always be problematic for use with Prometheus, but the way I read your description now @ggreenway, it feels like the former. The former I think would be ok to have, as it would be a conscious decision by the user, who knows that a non-existent metric may mean the same as a 0 metric.

@brancz To clarify, metrics disappearing after the last scrape would be a violation and would indeed break a lot of Prometheus setups. That isn't what's happening here: metrics accumulate over the lifetime of the Envoy process.

The purpose of usedonly is only to exclude metrics that haven't been touched/emitted even once during the lifetime of the process.
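
To make the intended semantics concrete, here is a minimal self-contained C++ sketch of the filtering. The `Counter` struct and `selectForExport()` are stand-ins for illustration; Envoy's real Stats interfaces are richer, though they do track whether a stat has ever been written to:

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Minimal stand-in for a counter stat.
struct Counter {
  std::string name;
  uint64_t value = 0;
  bool used = false;  // flipped the first time the counter is incremented
};

// With used_only set, emit only counters touched at least once during the
// process lifetime; otherwise emit everything. The filter never depends on
// scrape timing, so time-series remain stable across scrapes.
std::vector<const Counter*> selectForExport(
    const std::vector<std::shared_ptr<Counter>>& counters, bool used_only) {
  std::vector<const Counter*> out;
  for (const auto& c : counters) {
    if (used_only && !c->used) {
      continue;  // never incremented: omit rather than print 0
    }
    out.push_back(c.get());
  }
  return out;
}
```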