Understanding cluster resource usage

This page explains how to use GKE usage metering to understand the
usage profiles of GKE clusters and tie usage to
individual teams or business units within your organization.
GKE usage metering has no impact on billing for your project; it allows you
to understand resource usage at a granular level.

Data is stored in BigQuery, where you can query it directly or export
it for analysis using external tools such as
Google Data Studio.

GKE usage metering is helpful for scenarios like the following:

Tracking per-tenant resource requests and actual resource consumption in a
multitenant cluster where each tenant operates within a given Namespace.

Determining the resource consumption of a workload running in a given
cluster, by assigning a unique label to the Kubernetes objects associated with
the workload.

Identifying workloads whose resource requests differ significantly from
their actual resource consumption, so that you can more efficiently allocate
resources for each workload.

Changes from the initial Beta

GKE usage metering has the following changes since its initial Beta release:

Actual resource consumption is now tracked, in addition to resource requests.
Resource consumption is tracked for clusters running v1.12.8-gke.8 and higher,
v1.13.6-gke.7 and higher, or v1.14.2-gke.8 and higher. Resource consumption
data is stored in the gke_cluster_resource_consumption table of the
BigQuery dataset. Previously, only resource requests were tracked.

TPU requests (but not actual resource consumption) are now tracked.

You can now track resource requests and actual resource consumption on nodes
using custom machine types.

You can now enable GKE usage metering when creating or updating a cluster in
Google Cloud Platform Console.

If the BigQuery dataset is deleted, GKE usage metering now
recreates it automatically. Historical data is lost.

Upgrading

All changes are backward-compatible with the initial Beta, and data does
not need to be modified or migrated.

When you upgrade a cluster to a GKE version that supports
resource consumption metering, it is not enabled automatically. You must
explicitly enable it by using the --enable-resource-consumption-metering flag.
An additional table is then created automatically in the BigQuery
dataset. Both tables use the same schema.

Note: If you created a dashboard using the template provided by Google with the
initial Beta, you must create a new dashboard from the new template in order to
visualize actual resource consumption.

Creating the BigQuery dataset

To use GKE usage metering for clusters in your Google Cloud Platform project, you first
create the BigQuery dataset, and then configure clusters to use it. You
can use a single BigQuery dataset to store information about resource
usage for multiple clusters in the same project.

Visit Creating Datasets for more
details. Set the Default table expiration for the dataset to Never so that
the table doesn't expire. However, in the second GKE usage metering Beta, if a
table expires, it is recreated automatically (as an empty table).

Warning: If you delete a BigQuery dataset or table that a cluster is
using to log GKE usage metering data, Stackdriver Logging shows transient
warnings such as Failed to upload a record to BigQuery. To resolve
the warning, re-create the dataset or configure the cluster to use a different
dataset. Your historical data will be lost.

Enabling GKE usage metering for a cluster

You can enable GKE usage metering on a new or existing cluster, using
either the gcloud command or the GCP Console.

For clusters running GKE v1.12.8-gke.8 and higher
or v1.13.6-gke.7 and higher, enabling GKE usage metering also enables resource
consumption metering by default. To selectively disable resource consumption
metering while continuing to track resource requests, see the specific
instructions for enabling GKE usage metering using the gcloud command, in this
topic.
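
As a minimal sketch of the gcloud invocation this section describes, the following Python helper assembles the argument list for enabling GKE usage metering on a new or existing cluster. The flags are the ones named in this topic. The helper name build_metering_command, the cluster name, and the dataset name are hypothetical placeholders, and depending on your gcloud release the command may need to be run through the beta component.

```python
import shlex


def build_metering_command(action, cluster, dataset, consumption_metering=True):
    """Return the gcloud argument list that enables GKE usage metering.

    action is "create" for a new cluster or "update" for an existing one.
    """
    if action not in ("create", "update"):
        raise ValueError("action must be 'create' or 'update'")
    args = [
        "gcloud", "container", "clusters", action, cluster,
        "--resource-usage-bigquery-dataset", dataset,
    ]
    if not consumption_metering:
        # Track resource requests only, without actual consumption.
        args.append("--no-enable-resource-consumption-metering")
    return args


# Placeholder names; the command is printed for review rather than executed.
command = build_metering_command("create", "example-cluster", "example_usage_dataset")
print(shlex.join(command))
```

Passing the returned list to subprocess.run(command, check=True) would execute it. Calling the helper with action="update" and a different dataset name corresponds to pointing an existing cluster at another dataset.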

Resource consumption metering is enabled by default for clusters running
GKE v1.12.8-gke.8 and higher or v1.13.6-gke.7 and higher. To
disable it and track only resource requests, add the
--no-enable-resource-consumption-metering flag to the
gcloud container clusters create command. You also need to modify the example
queries in the rest of this topic so that they do not query for resource
consumption.

If needed, the required tables are created within the BigQuery dataset
when the cluster starts.

Console

Note: When using Google Cloud Platform Console, it is not possible to enable
GKE usage metering while selectively disabling resource consumption metering. If
you need to do this, use the gcloud instructions instead.

When you enable GKE usage metering on an existing cluster, resource
consumption metering is likewise enabled by default for clusters running
GKE v1.12.8-gke.8 and higher or v1.13.6-gke.7 and higher. To
disable it and track only resource requests, add the
--no-enable-resource-consumption-metering flag to the
gcloud container clusters update command, and modify the example queries in
the rest of this topic so that they do not query for resource consumption.

You can also change the dataset an existing cluster uses to store its usage
metering data by changing the value of the --resource-usage-bigquery-dataset
flag.

If needed, a table is created within the BigQuery dataset when the
cluster is updated.

Optional: Enabling network egress metering

By default, network egress data is not collected or exported. Measuring
network egress requires a network metering agent (NMA) running on each node. The
NMA runs as a privileged Pod, consumes some resources on the node (CPU, memory,
and disk space), and enables the
nf_conntrack_acct sysctl flag
on the kernel (for connection tracking flow accounting).

If you are comfortable with these caveats, you can enable network egress
tracking for use with GKE usage metering. To enable network egress tracking,
include the --enable-network-egress-metering option when creating or updating
your cluster, or select Enable network egress metering when enabling
GKE usage metering in the Google Cloud Platform Console.

Verifying that GKE usage metering is enabled

To verify that GKE usage metering is enabled on a cluster, and which
BigQuery dataset stores the cluster's resource usage data, use the
gcloud container clusters describe command.
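
One way to check the result programmatically is to read the JSON form of the cluster description (for example, from gcloud container clusters describe run with --format=json) and look at its usage metering settings. The sketch below assumes the field names of the GKE API's ResourceUsageExportConfig message (resourceUsageExportConfig, bigqueryDestination.datasetId, enableNetworkEgressMetering, consumptionMeteringConfig.enabled). The helper name and the sample document are hypothetical, and the sample is hand-written rather than real command output.

```python
import json


def usage_metering_status(describe_json):
    """Summarize usage metering settings from a cluster description in JSON."""
    cluster = json.loads(describe_json)
    config = cluster.get("resourceUsageExportConfig") or {}
    dataset = config.get("bigqueryDestination", {}).get("datasetId")
    return {
        # Metering is treated as enabled when a destination dataset is set.
        "enabled": dataset is not None,
        "dataset": dataset,
        "network_egress": bool(config.get("enableNetworkEgressMetering", False)),
        "consumption": bool(
            config.get("consumptionMeteringConfig", {}).get("enabled", False)
        ),
    }


# Hand-written sample with placeholder names, not real command output.
sample = json.dumps({
    "name": "example-cluster",
    "resourceUsageExportConfig": {
        "bigqueryDestination": {"datasetId": "example_usage_dataset"},
        "consumptionMeteringConfig": {"enabled": True},
    },
})
status = usage_metering_status(sample)
print(status)
```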

Disabling GKE usage metering

Console

Click the Edit button for the cluster you want to modify. It looks like
a pencil.

Disable GKE usage metering.

Click Save.

Choosing one or more BigQuery datasets

A dataset can hold GKE usage metering data for one or more clusters in your
project. Whether you use one or many datasets depends on your security needs:

A single dataset for the entire project simplifies administration.

A dataset per cluster allows you to delegate granular access to the datasets.

A dataset per related group of clusters allows you to find the right mix of
simplicity and granularity for your needs.

Visualizing GKE usage metering data using a Data Studio dashboard

You can visualize your GKE usage metering data using a
Data Studio dashboard,
which allows you to filter your data by cluster name, Namespace, or label, and
to adjust the reporting period dynamically. If you are an advanced user of
Data Studio and BigQuery, you can create a totally
customized dashboard, but you can also clone a dashboard that we created
specifically for GKE usage metering.

Note: You may see discrepancies between GKE usage metering data and
Cloud Billing data, due to upload latency. Batches of Cloud Billing data
take up to 5 hours to appear in BigQuery, while GKE usage metering
data appears in BigQuery roughly every hour.

You can
use the dashboard to visualize resource requests and
consumption on your clusters over time.

Prerequisites

Enable
Exporting Google Cloud Platform billing data to BigQuery
if it is not already enabled. During this process, you create a dataset, but
the table within the dataset takes up to 5 hours to appear and start to be
populated. When the table appears, its name is
gcp_billing_export_v1_[BILLING_ACCOUNT_ID].

Enable GKE usage metering on at least one cluster in the project. Note the name
you chose for the BigQuery dataset.

Gather the following information, which is needed to configure the dashboard:

Cloud Billing export dataset ID and data table

GKE usage metering dataset ID

Create the Data Studio dashboard

The process for creating the data source and dashboard has been streamlined.
After you meet the prerequisites, you use a custom
Data Studio connector to create the data source and the
dashboard automatically.

Use the Data Studio dashboard

The dashboard contains multiple reports:

Usage breakdown

Overall cluster usage ratio among all clusters sending usage metering data to
the same BigQuery data source, as well as detailed information
about resource type (such as CPU, memory, or network egress) by Namespace.
You can limit the report data to one or more individual clusters or Namespaces.

Usage breakdown with unallocated resources

This report is similar to the previous one, but spreads unallocated resources
proportionally across all Namespaces. Unallocated resources include idle
resources and any resources that are not currently allocated by
GKE usage metering to specific tenants.

Cost trends - drill down by namespace

Usage trends among all clusters sending usage metering data to the same
BigQuery data source, by namespace. You can select one or
more individual clusters, Namespaces, resources, or SKUs.

Cost trends - drill down by label

Cost trends among all clusters sending usage metering data to the same
BigQuery data source. You can select one or
more individual clusters, resources, label names, or label values.

Consumption-based Metering

Consumption trends among all clusters sending usage metering data to the same
BigQuery data source. You can select one or
more individual Namespaces, label keys, or label values. This report is only
populated if resource consumption metering is enabled on at least one cluster.

Note: Because of differences in the frequency of data availability
between usage metering and Cloud Billing, data shown in the
dashboard is not definitive and is informational only.

You can change pages using the arrows near the top left of the screen. You can
change the timeframe for a page using the date picker. To share the report with
members of your organization, or to revoke access, click the Share Report
link, which looks like a person with a + symbol.

After you copy the report into your project, you can customize it using the
Data Studio report editor.
Even if the report template provided by Google changes, your copy is unaffected.

Exploring GKE usage metering data using BigQuery

To view data about resource requests using BigQuery, query the
gke_cluster_resource_usage table within the relevant BigQuery dataset.

To view data about actual resource consumption, query the
gke_cluster_resource_consumption table. Network egress consumption data
remains in the gke_cluster_resource_usage table because there is no concept of
resource requests for egress.

For more information about using queries in BigQuery, see
Running queries. The fields in
the schema are stable, though more fields may be added in the future.

These queries are simple examples. Customize your query to find the data you need.

Query for resource requests
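
As a hedged illustration of such a query, the sketch below builds a Standard SQL statement against the gke_cluster_resource_usage table, selecting columns from the schema described later in this topic. The helper name, the project ID, and the dataset name are hypothetical placeholders, and the identifier check is a deliberately conservative assumption rather than the full BigQuery naming rule. The namespace is left as the @namespace query parameter so that it can be supplied when the query is run.

```python
import re

# Conservative pattern for project and dataset IDs (an assumption, not the
# complete BigQuery naming rule).
_SAFE_ID = re.compile(r"^[A-Za-z0-9_-]+$")


def build_requests_query(project, dataset):
    """Return Standard SQL that lists resource requests for one namespace."""
    for value in (project, dataset):
        if not _SAFE_ID.match(value):
            raise ValueError(f"unexpected characters in identifier: {value!r}")
    table = f"{project}.{dataset}.gke_cluster_resource_usage"
    return (
        "SELECT cluster_name, namespace, resource_name, labels, usage\n"
        f"FROM `{table}`\n"
        "WHERE namespace = @namespace"
    )


# Placeholder project and dataset names.
query = build_requests_query("example-project", "example_usage_dataset")
print(query)
```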

Query for resource consumption
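
Rows from the consumption table become most useful when set against the corresponding rows from the requests table, because the gap between the two shows which workloads are over- or under-provisioned. The following sketch assumes the rows have already been fetched into Python dictionaries that mirror the namespace, resource_name, and usage.amount fields of the schema. The function names and the sample figures are made up for illustration.

```python
from collections import defaultdict


def total_by_namespace(rows, resource_name):
    """Sum usage.amount per namespace for one resource type."""
    totals = defaultdict(float)
    for row in rows:
        if row["resource_name"] == resource_name:
            totals[row["namespace"]] += row["usage"]["amount"]
    return dict(totals)


def consumption_ratio(request_rows, consumption_rows, resource_name):
    """Return consumed / requested per namespace for one resource type."""
    requested = total_by_namespace(request_rows, resource_name)
    consumed = total_by_namespace(consumption_rows, resource_name)
    return {
        namespace: consumed.get(namespace, 0.0) / amount
        for namespace, amount in requested.items()
        if amount > 0  # skip namespaces with no requests to avoid dividing by zero
    }


# Made-up sample rows; amounts are CPU-seconds.
request_rows = [
    {"namespace": "team-a", "resource_name": "cpu", "usage": {"amount": 7200.0, "unit": "seconds"}},
    {"namespace": "team-b", "resource_name": "cpu", "usage": {"amount": 3600.0, "unit": "seconds"}},
]
consumption_rows = [
    {"namespace": "team-a", "resource_name": "cpu", "usage": {"amount": 1800.0, "unit": "seconds"}},
    {"namespace": "team-b", "resource_name": "cpu", "usage": {"amount": 3600.0, "unit": "seconds"}},
]
ratios = consumption_ratio(request_rows, consumption_rows, "cpu")
print(ratios)
```

In this sample, team-a consumes a quarter of what it requests, which marks it as a candidate for smaller resource requests, while team-b consumes exactly what it requests.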

More examples

Expand the following sections to see more sophisticated examples.

How to query costs, broken down by Namespace

These queries ignore a cluster's resource usage when the billing
information of the associated cloud resource has not yet been exported to the
GCP billing export dataset. This happens when the time window
of a cloud resource usage record is ahead of the latest record in the exported
GCP billing data (the latency for billing export can be up to 5
hours).
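
The filtering behavior described above can be sketched in Python. This is a simplified illustration, not the query this topic refers to: it assumes flattened, hypothetical row shapes (sku_id, start_time, end_time, usage_amount for metering rows; sku_id, usage_start_time, usage_end_time, cost, usage_amount for billing rows), and it assumes that both sources report amounts in the same unit. Usage rows whose time window is ahead of the latest exported billing record are skipped. The remaining rows are charged at the billing record's cost per unit.

```python
from collections import defaultdict
from datetime import datetime, timezone


def cost_by_namespace(usage_rows, billing_rows):
    """Attribute billed cost to namespaces (simplified, hypothetical row shapes)."""
    if not billing_rows:
        return {}
    latest_billing_end = max(bill["usage_end_time"] for bill in billing_rows)
    costs = defaultdict(float)
    for usage in usage_rows:
        # Ignore usage whose window is ahead of the exported billing data.
        if usage["end_time"] > latest_billing_end:
            continue
        for bill in billing_rows:
            same_sku = bill["sku_id"] == usage["sku_id"]
            in_window = (bill["usage_start_time"] <= usage["start_time"]
                         and usage["end_time"] <= bill["usage_end_time"])
            if same_sku and in_window and bill["usage_amount"] > 0:
                # Share of the billed cost proportional to the metered amount.
                costs[usage["namespace"]] += (
                    bill["cost"] * usage["usage_amount"] / bill["usage_amount"]
                )
                break
    return dict(costs)


def at(hour):
    """Build a UTC timestamp on an arbitrary sample day."""
    return datetime(2019, 6, 1, hour, tzinfo=timezone.utc)


# Made-up sample data: one billed hour of a 2-vCPU node, in CPU-seconds.
billing_rows = [
    {"sku_id": "SKU-CPU", "usage_start_time": at(10), "usage_end_time": at(11),
     "cost": 2.0, "usage_amount": 7200.0},
]
usage_rows = [
    {"namespace": "team-a", "sku_id": "SKU-CPU", "start_time": at(10),
     "end_time": at(11), "usage_amount": 1800.0},
    {"namespace": "team-b", "sku_id": "SKU-CPU", "start_time": at(10),
     "end_time": at(11), "usage_amount": 3600.0},
    # This window is ahead of the billing export, so it is ignored.
    {"namespace": "team-a", "sku_id": "SKU-CPU", "start_time": at(11),
     "end_time": at(12), "usage_amount": 1800.0},
]
costs = cost_by_namespace(usage_rows, billing_rows)
print(costs)
```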

GKE usage metering schema in BigQuery

The following table describes the schema for the GKE usage metering tables in the
BigQuery dataset. If your cluster is running a version of
GKE that supports resource consumption metering as well as
resource requests, an additional table is created with the same schema.

| Field | Type | Description |
| --- | --- | --- |
| cluster_location | STRING | The name of the Compute Engine zone or region in which the GKE cluster resides. |
| cluster_name | STRING | The name of the GKE cluster. |
| namespace | STRING | The Kubernetes namespace from which the usage is generated. |
| resource_name | STRING | The name of the resource, such as "cpu", "memory", or "storage". |
| sku_id | STRING | The SKU ID of the underlying GCP cloud resource. |
| start_time | TIMESTAMP | The UNIX timestamp of when the usage began. |
| end_time | TIMESTAMP | The UNIX timestamp of when the usage ended. |
| fraction | FLOAT | The fraction of a cloud resource used by the usage. For a dedicated cloud resource that is solely used by a single namespace, the fraction is always 1.0. For resources shared among multiple namespaces, the fraction is calculated as the requested amount divided by the total capacity of the underlying cloud resource. |
| cloud_resource_size | INTEGER | The size of the underlying GCP resource. For example, the number of vCPUs on an n1-standard-2 instance is 2. |
| labels.key | STRING | The key of a Kubernetes label associated with the usage. |
| labels.value | STRING | The value of a Kubernetes label associated with the usage. |
| project.id | STRING | The ID of the project in which the GKE cluster resides. |
| usage.amount | FLOAT | The quantity of usage.unit used. |
| usage.unit | STRING | The base unit in which resource usage is measured. For example, the base unit for standard storage is byte-seconds. |
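
To make the fraction field concrete, the following sketch applies the rule stated in its description: 1.0 for a dedicated resource, and the requested amount divided by the total capacity for a shared one. The function name and the 0.5 vCPU request are hypothetical. The capacity of 2 comes from the n1-standard-2 example given for cloud_resource_size.

```python
def usage_fraction(requested_amount, cloud_resource_size, shared=True):
    """Return the fraction of a cloud resource attributed to one namespace."""
    if not shared:
        # A dedicated resource is attributed entirely to its single namespace.
        return 1.0
    if cloud_resource_size <= 0:
        raise ValueError("cloud_resource_size must be positive")
    return requested_amount / cloud_resource_size


# A workload requesting 0.5 vCPU on a shared n1-standard-2 node (2 vCPUs).
fraction = usage_fraction(0.5, 2)
print(fraction)
```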