Reliable Insights

It's common to want reports from Prometheus, such as how many requests failed over an entire month.

While PromQL has some calendar functions, it's designed more for doing math over arbitrary fixed time periods than over time periods that vary due to business logic. Which is to say that, because different months have different numbers of days, it's not possible to do monthly reporting directly in PromQL. However, this is easy with a small bit of Python scripting.
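A minimal sketch of such a script follows, using only the Python standard library. The server address, the `process_cpu_seconds_total` metric, and the function names `previous_month_window` and `cpu_seconds_by_job` are assumptions chosen for illustration, so adjust them for your own setup.

```python
import datetime
import json
import urllib.parse
import urllib.request

# Assumption: address of the Prometheus server; adjust for your setup.
PROMETHEUS = "http://localhost:9090"


def previous_month_window(now):
    """Return (end_timestamp, duration_seconds) for the month before `now`."""
    # Midnight on the 1st of the current month is the end of the previous month.
    end = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    # Step back one day to land in the previous month, then go to its 1st.
    start = (end - datetime.timedelta(days=1)).replace(day=1)
    # Subtract timestamps rather than datetimes so the length is in real seconds.
    return end.timestamp(), int(end.timestamp() - start.timestamp())


def cpu_seconds_by_job(now, prometheus=PROMETHEUS):
    """Query CPU seconds used per job over the month before `now`."""
    end, duration = previous_month_window(now)
    params = urllib.parse.urlencode({
        # Assumption: process_cpu_seconds_total is the metric of interest.
        "query": f"sum by (job)(increase(process_cpu_seconds_total[{duration}s]))",
        # Evaluate the query at the moment the previous month ended.
        "time": end,
    })
    with urllib.request.urlopen(f"{prometheus}/api/v1/query?{params}") as resp:
        results = json.load(resp)["data"]["result"]
    # Each sample looks like {"metric": {...}, "value": [timestamp, "string"]}.
    return {r["metric"].get("job", ""): float(r["value"][1]) for r in results}
```

Calling `cpu_seconds_by_job(datetime.datetime.now())` would then return a dictionary mapping each job name to its CPU seconds for the previous month.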

This is intended to be run at the start of the next month in local time (though all your servers run in UTC, right?), and will return the amount of CPU time each job used in the previous month. To do this we need to know when the month ended, as we want to evaluate the query then, and how long the month was, as that's the range to use. This will produce a result like:

This is obviously a very simple example to get you going. In reality you'd probably want to work from aggregated data via a recording rule, and write this information to a database of some form for posterity.
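As one possible way to keep the results around, here is a small sketch that writes a month's figures to SQLite. The table name `monthly_cpu`, its columns, and the `store_report` helper are all hypothetical, and any other database would do just as well.

```python
import sqlite3


def store_report(conn, month, usage):
    """Store per-job CPU seconds for `month` (e.g. "2024-02")."""
    # Hypothetical schema: one row per (month, job) pair.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS monthly_cpu ("
        "month TEXT, job TEXT, cpu_seconds REAL, PRIMARY KEY (month, job))"
    )
    # INSERT OR REPLACE makes re-running the report for a month idempotent.
    conn.executemany(
        "INSERT OR REPLACE INTO monthly_cpu VALUES (?, ?, ?)",
        [(month, job, seconds) for job, seconds in usage.items()],
    )
    conn.commit()
```

Here `usage` is assumed to be a dictionary mapping job names to CPU seconds, and `conn` a connection from `sqlite3.connect(...)`.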