Access Logs & Storage Logs

This document discusses how to download and review access logs and storage
logs for your Cloud Storage buckets, and how to analyze the logs using
Google BigQuery.

Introduction

Cloud Storage offers access logs (also called usage logs) and storage logs in
the form of CSV files that you can download and view. Access logs provide
information for all of the requests made on a specified bucket and are created
hourly, while storage logs, created daily, provide information about the
storage consumption of that bucket for the previous day. Both kinds of logs
are automatically created as new objects in a bucket that you specify.

When you configure a Cloud Storage bucket to simulate the behavior of a static
website, you might want to log how resources in the website are being used.
Note that you can also configure bucket access logs and storage logs
for any Cloud Storage bucket.

Note: Timeliness of access logs delivery is not guaranteed.

Should you use access & storage logs or Cloud Audit Logging?

In most cases, Cloud Audit Logging is the recommended method for
generating logs that track API operations performed in Cloud Storage:

Cloud Audit Logging tracks access on a continuous basis.

Cloud Audit Logging produces logs that are easier to work with.

Cloud Audit Logging can monitor many of your Google Cloud Platform services, not just
Cloud Storage.

In some cases, you may want to use access & storage logs instead.

You most likely want to use access logs if:

You want to track access to public objects.

You want to track access to objects when the access is exclusively granted
because of the Access Control Lists (ACLs) set on the objects.

gsutil

Set permissions so that Cloud Storage has WRITE permission on the bucket.

Cloud Storage must have WRITE permission to create and store
your logs as new objects. To grant Cloud Storage WRITE access to
your bucket, grant the cloud-storage-analytics@google.com group
write access with the following command:
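The following sketch assumes your logs bucket is named gs://example-logs-bucket
(the name used in the XML API example below); substitute your own bucket name:

gsutil acl ch -g cloud-storage-analytics@google.com:W gs://example-logs-bucket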

Optionally, you can set an object prefix (log_object_prefix) for your log objects,
as shown in the sketch below. The object prefix forms the beginning of the log
object name. It can be at most 900 characters and must be a valid object name.
By default, the object prefix is the name of the bucket for which the logs are enabled.
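For example, the following command is a sketch of enabling logging for a
monitored bucket gs://example-bucket, delivering logs to gs://example-logs-bucket
with a custom object prefix (all names here are illustrative):

gsutil logging set on -b gs://example-logs-bucket -o my_logs gs://example-bucket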

XML API

Create a bucket to store your logs.

To do so, use the following request:

PUT /example-logs-bucket HTTP/1.1
Host: storage.googleapis.com

Set permissions so that Cloud Storage has WRITE permission on the bucket.

Cloud Storage must have WRITE permission to create and
store your logs as new objects. To grant Cloud Storage WRITE
access to your bucket, add an ACL entry for the bucket that grants the
cloud-storage-analytics@google.com group write access. Be sure
to include all existing ACLs for the bucket, in addition to the new ACL, in
the request.
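A sketch of the new entry in the bucket's ACL document, which you send with a
PUT /example-logs-bucket?acl request alongside all of the bucket's existing
entries:

<Entry>
  <Scope type="GroupByEmail">
    <EmailAddress>cloud-storage-analytics@google.com</EmailAddress>
  </Scope>
  <Permission>WRITE</Permission>
</Entry>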

JSON API

Set permissions so that Cloud Storage has WRITE permission on the bucket.

Cloud Storage must have WRITE permission to create and
store your logs as new objects. To grant Cloud Storage WRITE
access to your bucket, add an ACL entry for the bucket that grants the
cloud-storage-analytics@google.com group write access.
You can do this with the following request to the BucketAccessControls resource
for the logging bucket:
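A minimal sketch of that request, again assuming the logs bucket is named
example-logs-bucket:

POST /storage/v1/b/example-logs-bucket/acl HTTP/1.1
Host: storage.googleapis.com
Content-Type: application/json

{
  "entity": "group-cloud-storage-analytics@google.com",
  "role": "WRITER"
}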

Downloading logs

Storage logs are generated once a day and contain the storage usage for the
previous day. They are typically created before 10:00 am PST.

Usage logs are generated hourly when there is activity to report in the
monitored bucket. Usage logs are typically created 15 minutes after the end
of the hour.

Note:

Any log processing of usage logs should take into account the possibility
that they may be delivered later than 15 minutes after the end of an hour.

Usually, hourly usage log object(s) contain records for all usage that
occurred during that hour. Occasionally, an hourly usage log object contains
records for an earlier hour, but never for a later hour.

Cloud Storage may write multiple log objects for the same hour.

Occasionally, a single record may appear twice in the usage logs. While
we make our best effort to remove duplicate records, your log processing
should be able to remove them if it is critical to your log analysis. You can
use the s_request_id field to detect duplicates.

Access to your logs is controlled by the ACL on the log objects. Log objects
have the default object ACL of the log bucket.

The easiest way to download your access logs and storage logs is either through
the Google Cloud Platform Console or using the gsutil tool. Your access logs are
in CSV format and have the following naming convention:

gs://<bucket_name>/<object_prefix>_usage_<timestamp>_<id>_v0

Storage logs follow the same convention, with _storage_ in place of _usage_.

For example, the following is an access log object for a bucket named
gs://example-bucket, created on June 18, 2013 at 14:00 UTC and stored in the
bucket gs://example-logs-bucket:
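gs://example-logs-bucket/example-bucket_usage_2013_06_18_14_00_00_1702e6_v0

(The <id> portion, 1702e6 here, is an opaque identifier shown for illustration;
it will differ in your logs.)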

gsutil

Run the following command:

gsutil cp <logs_object> <destination_uri>
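For example, the following sketch downloads the log object shown earlier to the
current directory (names illustrative):

gsutil cp gs://example-logs-bucket/example-bucket_usage_2013_06_18_14_00_00_1702e6_v0 .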

Analyzing logs in BigQuery

To query your Cloud Storage usage and storage logs, you can use
Google BigQuery, which enables fast, SQL-like queries against append-only
tables. The BigQuery Command-Line Tool, bq, is a Python-based tool that
allows you to access BigQuery from the command line. For information about
downloading and using bq, see the bq Command-Line Tool reference page.
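For example, the following sketch loads hourly logs into a usage table and
daily logs into a storage table. It assumes you have created a dataset named
storageanalysis (the dataset name used in the queries below) and placed the
schema files cloud_storage_usage_schema_v0.json and
cloud_storage_storage_schema_v0.json in the working directory; the bucket and
log object names are illustrative:

bq load --skip_leading_rows=1 storageanalysis.usage "gs://example-logs-bucket/example-bucket_usage*" ./cloud_storage_usage_schema_v0.json
bq load --skip_leading_rows=1 storageanalysis.storage "gs://example-logs-bucket/example-bucket_storage*" ./cloud_storage_storage_schema_v0.json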

These load commands:

Read schema data (.json file) from the same directory where the bq command runs.

Skip the first row of each log file because it contains column descriptions.

Because this was the first time you ran the load command in the example
here, the tables usage and storage were created. You could continue
to append to these tables with subsequent load commands that use different
access log file names or wildcards. For example, the following
command appends data from all logs that start with "bucket_usage_2014"
to the usage table:
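A sketch, reusing the dataset and schema file from the load commands above:

bq load --skip_leading_rows=1 storageanalysis.usage "gs://example-logs-bucket/bucket_usage_2014*" ./cloud_storage_usage_schema_v0.json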

Modifying the access log schema

In some scenarios, you may find it useful to pre-process access logs before
loading them into BigQuery. For example, you can add information to the
access logs to make your query analysis easier in BigQuery. In this section,
we'll show how you can add the file name of each storage log to the log itself.
This requires modifying the existing schema and each log file.

Modify the existing schema,
cloud_storage_storage_schema_v0, to add a filename field as shown below. Give
the new schema a new name, for example, cloud_storage_storage_schema_custom.json,
to distinguish it from the original.
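A sketch of the custom schema, assuming the default storage log schema contains
the bucket and storage_byte_hours fields used in the query below:

[
  {"name": "bucket", "type": "string"},
  {"name": "storage_byte_hours", "type": "integer"},
  {"name": "filename", "type": "string"}
]

Next, download the storage logs and append the file name to each one. The
following shell sketch (bucket and log names illustrative) uses GNU sed to
append a "filename" column header to row 1 and the file's own name to row 2:

gsutil cp gs://example-logs-bucket/example-bucket_storage* .
for f in example-bucket_storage*; do
  sed -i -e "1s/$/,\"filename\"/" -e "2s/$/,\"$f\"/" "$f"
done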

The gsutil command copies the files into your working directory. The
second command loops through the log files and adds "filename" to the
description row (first row) and the actual file name to the data row
(second row). Here's an example of a modified log file:
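The values below are illustrative:

"bucket","storage_byte_hours","filename"
"example-bucket","5532482018","example-bucket_storage_2014_01_05_08_00_00_021fd_v0"

You can then load the modified files into BigQuery with bq load, passing the
custom schema file cloud_storage_storage_schema_custom.json.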

Querying logs in BigQuery

Once your logs are loaded into BigQuery, you can query your access logs to
return information about your logged bucket(s). The following example shows you
how to use the bq tool in a scenario where you have access logs for a bucket
over several days and you have loaded the logs as shown in
Loading access logs into BigQuery. You can also execute the queries
below using the BigQuery Browser Tool.

In the bq tool, enter interactive mode.

$ bq shell

Run a query against the storage log table.

For example, the following query shows how the storage of a logged bucket
changes over time. It assumes that you modified the storage logs as
described in Modifying the access log schema and that the log files
are named "log_storage_*".

project-name> SELECT SUBSTRING(filename, 13, 10) as day, storage_byte_hours/24 as size FROM [storageanalysis.storage] ORDER BY filename LIMIT 100
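In this query, SUBSTRING extracts the date portion of each log file name (for
names like "log_storage_2014_01_05...", characters 13 through 22 are the date),
and dividing storage_byte_hours by 24 converts the day's total byte-hours into
an average size in bytes for that day.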

Access and storage log format

The access logs and storage logs can provide an overwhelming amount of
information. You can use the following tables to help you identify all the
information provided in these logs.

Access log fields:

time_micros (integer)
The time that the request was completed, in microseconds since the Unix epoch.

c_ip (string)
The IP address from which the request was made. The "c" prefix indicates that this is information about the client.

c_ip_type (integer)
The type of IP in the c_ip field: a value of 1 indicates an IPv4 address, and a value of 2 indicates an IPv6 address.

c_ip_region (string)
Reserved for future use.

cs_method (string)
The HTTP method of this request. The "cs" prefix indicates that this information was sent from the client to the server.

cs_uri (string)
The URI of the request.

sc_status (integer)
The HTTP status code the server sent in response. The "sc" prefix indicates that this information was sent from the server to the client.

cs_bytes (integer)
The number of bytes sent in the request.

sc_bytes (integer)
The number of bytes sent in the response.

time_taken_micros (integer)
The time it took to serve the request in microseconds, measured from when
the first byte is received to when the response is sent. Note that for
resumable uploads, the ending point is determined by the response to the final
upload request that was part of the resumable upload.