Identity and access management

When you move your workloads to Cloud Platform, you can continue to manage
your end users by using common user management services and tools
like LDAP, Active Directory, or G Suite's Admin service. However, the way
you manage access to networked resources diverges slightly from what
you might be used to.

On Cloud Platform, you configure access to your virtual resources
by using Google Cloud IAM. Cloud IAM does not directly manage
identities. Instead, it allows you to assign roles, which are
collections of access permissions, to several Google identity types.
Supported types include Google accounts, service accounts, and
Google Groups.

You can assign roles to non-Google identity types as well. To do so, you
bind the non-Google type to a Google identity type, and then assign the
role to the Google identity type. See
Best Practices for Enterprise Organizations
for more information.

Service accounts

When you run an application in a traditional data center, you typically
create a separate identity that your application uses while running. Your
application then performs operations requiring authentication, such as
API calls, under that identity.

Google Groups

You can use Google Groups to apply a role to a collection of users.
You simply create a Google Group, add your Google accounts or service
accounts to the group, and then apply your Cloud IAM role to the group.
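
As a rough illustration of applying a role to a group, the sketch below edits a Cloud IAM policy document in memory. It assumes the JSON policy shape of role-to-member bindings. The `add_binding` helper, the email addresses, and the choice of `roles/logging.viewer` are illustrative, and the code does not call any Google API.

```python
import copy
import json


def add_binding(policy, role, member):
    """Return a copy of `policy` in which `member` is granted `role`."""
    updated = copy.deepcopy(policy)
    for binding in updated.setdefault("bindings", []):
        if binding["role"] == role:
            # The role already has a binding: add the member once.
            if member not in binding["members"]:
                binding["members"].append(member)
            return updated
    # No binding exists for this role yet: create one.
    updated["bindings"].append({"role": role, "members": [member]})
    return updated


# Hypothetical starting policy: a single user already holds the role.
policy = {
    "bindings": [
        {"role": "roles/logging.viewer", "members": ["user:alice@example.com"]}
    ]
}

# Grant the same role to every member of a (hypothetical) Google Group.
policy = add_binding(policy, "roles/logging.viewer", "group:ops-team@example.com")
print(json.dumps(policy, indent=2))
```

Because the role is bound to the group rather than to individual accounts, later membership changes in the group take effect without editing the policy again.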

Logging

Logging on Cloud Platform is nearly identical to logging in a traditional
data center environment. To collect logs, you install a logging agent on
each virtual machine. The logging agent logs information about the machine
and its applications, and then sends the logs to a centralized location
to be indexed. After the logs have been aggregated and indexed, they
can be processed, analyzed, or visualized.

Cloud Platform offers a powerful integrated suite of logging-oriented
services. In the Cloud Platform stack,
Stackdriver Logging
serves as the centralized collection and indexing service,
aggregating logs from your Compute Engine VM instances
and your other Cloud Platform resources. You can search and
filter your logs in the GCP Console's built-in logs viewer,
or you can stream your logs to other Cloud Platform endpoints
such as Google Cloud Storage, Google BigQuery, and Google Cloud Pub/Sub for
further processing and analysis.

Important: If you prefer your current logging tools and services,
such as Splunk, Logstash, Sensu, or Nagios, you can also set these
up on Cloud Platform. The installation and maintenance overhead will
be similar to what you are used to in a traditional data center,
only without any hardware maintenance overhead.

Collecting logs

To collect logs from your Compute Engine VM instances, you install the
Stackdriver Logging agent
on each instance. The agents then automatically start sending their log data to
Stackdriver.

If you plan to maintain a hybrid architecture, in which you will
maintain some resources on Cloud Platform and other resources
elsewhere, you can still take advantage of Cloud Platform's
logging-related services. For example, the
Fluentd community plugins page
contains a set of plugins for streaming aggregated logs to the
Stackdriver Logging API, BigQuery, Cloud Storage, and Cloud Pub/Sub.
Similarly, the
Logstash output plugins page
contains plugins for streaming logs to BigQuery and Cloud Storage.

You can also use Stackdriver Logging if your hybrid architecture
includes virtual machines on other clouds. In particular, the
Stackdriver Logging agent can be installed directly on Amazon EC2 instances.

Indexing and storing logs

As Stackdriver Logging collects logs from your Compute Engine
instances, it stores and indexes the logs. Stackdriver Logging
retains logs for 30 days. To store your logs for later processing,
analysis, or auditing, you can set up Stackdriver Logging to
automatically export the logs to other Cloud Platform services,
including Cloud Storage, BigQuery, and Cloud Pub/Sub.
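
An export is configured as a sink that pairs a destination with an optional filter. The following sketch shows roughly what a sink definition looks like for each destination type. The `build_sink` helper and the bucket, dataset, and topic names are hypothetical, and the code only assembles the definition rather than creating the sink.

```python
import json

# Destination formats used by log sinks, keyed by a short label of our own.
DESTINATION_FORMATS = {
    "storage": "storage.googleapis.com/{bucket}",
    "bigquery": "bigquery.googleapis.com/projects/{project}/datasets/{dataset}",
    "pubsub": "pubsub.googleapis.com/projects/{project}/topics/{topic}",
}


def build_sink(name, kind, log_filter="", **target):
    """Build a sink definition that exports matching logs to one destination."""
    return {
        "name": name,
        "destination": DESTINATION_FORMATS[kind].format(**target),
        # An empty filter exports every log entry.
        "filter": log_filter,
    }


# Hypothetical sink: archive error-level entries to a BigQuery dataset.
sink = build_sink(
    "errors-to-bigquery",
    "bigquery",
    log_filter="severity>=ERROR",
    project="my-project",
    dataset="logs_archive",
)
print(json.dumps(sink, indent=2))
```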

Analyzing and processing logs

In traditional data centers, intensive analysis and processing tasks
must compete with other tasks for resources, and are subject to the
same upfront capital investment and capacity constraints. On Cloud
Platform, these constraints are largely removed. You provision and pay
for what you need when you need it, so you don't have to worry about
reserving time, cores, or storage resources.

Logs analysis and processing on Cloud Platform is designed to be flexible.
You can use Cloud Platform's stack of logs analysis and processing tools,
which includes:

The Stackdriver Logs Viewer

BigQuery

Google Cloud Dataproc

Google Cloud Dataflow

If you prefer to use your current analysis and processing tools, you can
do so by setting them up on Compute Engine VM instances. You can also
integrate the Cloud Platform logging stack into your current analysis
and processing pipelines.

Logs analysis with the Stackdriver Logs Viewer and BigQuery

The GCP Console provides a built-in Stackdriver Logs Viewer that you
can use to search and filter your logged data. For large-scale
analysis across a massive dataset, you can stream or export your
logs into BigQuery. Unlike a bulky MapReduce job, which can take
minutes or hours, BigQuery is capable of performing queries on
terabytes of logs in tens of seconds in some cases, allowing
you to quickly identify application anomalies,
perform audit logs analysis,
perform trend analysis on your logs, and more.
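
Searching in the Logs Viewer comes down to writing a filter expression. The helper below is a small sketch that assembles an advanced filter from a few optional criteria. The function name and parameters are illustrative, and only a handful of the available fields and operators are covered.

```python
def build_filter(resource_type=None, min_severity=None, text=None):
    """Assemble an advanced logs filter from optional criteria."""
    clauses = []
    if resource_type:
        clauses.append('resource.type="{}"'.format(resource_type))
    if min_severity:
        clauses.append("severity>={}".format(min_severity))
    if text:
        # Escape backslashes and quotes so the text stays inside the string.
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        # The ":" operator matches entries whose field contains the text.
        clauses.append('textPayload:"{}"'.format(escaped))
    return " AND ".join(clauses)


# Example: error-level entries from Compute Engine VMs that mention "timeout".
log_filter = build_filter("gce_instance", "ERROR", "timeout")
print(log_filter)
```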

Logs processing with Cloud Dataproc and Cloud Dataflow

If you need to process your logs before analysis, you can export
them to Cloud Pub/Sub or Cloud Storage, and then use
Cloud Dataproc
or
Cloud Dataflow
to process them. After you process your logs with Cloud Dataproc or
Cloud Dataflow, you can send the results to BigQuery for
interactive or batch analysis.
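
To make the processing step concrete, here is a sketch of the kind of per-record transform such a job might apply: it parses web server access log lines into structured rows and emits newline-delimited JSON, a format that BigQuery can load. The log format (Common Log Format) and the field names are assumptions. In a real pipeline this logic would run inside a Cloud Dataproc or Cloud Dataflow job rather than as a standalone script.

```python
import json
import re

# Common Log Format: host ident user [time] "method path protocol" status size
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Parse one access log line into a dict, or return None if malformed."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    row = match.groupdict()
    row["status"] = int(row["status"])
    # A size of "-" means no response body was sent.
    row["size"] = 0 if row["size"] == "-" else int(row["size"])
    return row


def to_ndjson(lines):
    """Convert raw lines to newline-delimited JSON, skipping malformed lines."""
    rows = (parse_line(line) for line in lines)
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows if row)


sample = [
    '203.0.113.7 - - [10/Oct/2016:13:55:36 +0000] "GET /index.html HTTP/1.0" 200 2326',
    "not a log line",
]
print(to_ndjson(sample))
```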

Cloud Dataproc is Cloud Platform’s managed Apache Hadoop
and Apache Spark service. If you're already using Hadoop and
Spark, you can reuse your existing jobs on Cloud Dataproc without
worrying about acquiring hardware resources ahead of time. Because
you can store both the original and the processed logs in Cloud
Storage, you can also shut down your Cloud Dataproc cluster when
not in use. Cloud Dataproc charges you only for the virtual
CPU resources you use, and only for the length of time you use them.

You might also consider Cloud Dataflow, a fully managed stream
and batch processing service. Based on Apache Beam, Cloud Dataflow
is a true serverless solution. Cloud Dataflow dynamically provisions
and allocates resources on demand, minimizing latency while
maintaining high utilization efficiency. For more information
about how you can integrate Cloud Dataflow into your logs
processing pipeline, see
Processing Logs at Scale with Cloud Dataflow.

Visualizing logs

Google provides two managed services that you can use
to visualize your logs data:

Cloud Datalab, built on the
Jupyter notebook model,
allows you to query and visualize data stored in BigQuery and Cloud
Storage. As a bonus, because Cloud Datalab is built on Jupyter
notebooks, there is a large ecosystem already in place to help
get you started.

In addition, several Cloud Platform partner visualization services provide
native connectors to BigQuery, including
Tableau,
BIME,
and
re:dash.
See
BigQuery Partners
for more information.

Finally, for maximum customization, you can build out a dashboard on
top of BigQuery using the BigQuery API and a JavaScript charting
library of your choice. See
Creating a BigQuery Dashboard for details.

Costs

Stackdriver Logging offers a Basic Tier, which is free up to 5 GB per
month, and a Premium Tier. The Premium Tier is priced per monitored
resource per month, prorated hourly. For more information, see the
Announcing pricing for Google Stackdriver
blog post.

If you export logs to Cloud Storage, BigQuery, or Cloud Pub/Sub,
you will also be charged for
the use of those services. If you export your logs to BigQuery,
you will also be charged a small data-streaming fee. See
BigQuery Pricing
for more information.

Auditing

Cloud Platform provides built-in audit logging, and makes it easy to
integrate your current audit-logging solutions as well. For
information on which services currently generate audit logs, see
Google Cloud Audit Logging.

VM instance logging

If you use standard OS-native audit-logging solutions, such as
syslog on Linux or Windows Event Log on Windows, you can set
them up by creating a Linux or Windows VM instance on Compute
Engine. You can also deploy your preferred third-party solutions
on Compute Engine instances.

Cloud Platform resource logging

Admin Activity audit logs contain an entry for every
API call or administrative action that modifies the configuration
or metadata of a service or project. This log type is always
enabled.

Data Access audit logs contain an entry for each
instance of the following events:

An API call or administrative action that reads the
configuration or metadata of a service or project.

An API call or administrative action that creates, modifies,
or reads user-provided data managed by a service, such as data
stored in a database service.

In most Cloud Platform services, Data Access audit logs are not
enabled by default, as they can have a much higher volume
than Admin Activity logs. Data Access logs are also more restricted
than Admin Activity logs. By default, only project owners and
users with the
Private Logs Viewer
role can access Data Access logs. Admin Activity logs are visible
to all project members.
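
When you process exported entries, the two audit log types can be told apart by log name. The sketch below assumes the log IDs `cloudaudit.googleapis.com/activity` and `cloudaudit.googleapis.com/data_access`, which appear URL-encoded in full log names. The helper name and the project ID are illustrative.

```python
from urllib.parse import unquote

# Audit log IDs mapped to the log types described above.
AUDIT_LOG_IDS = {
    "cloudaudit.googleapis.com/activity": "Admin Activity",
    "cloudaudit.googleapis.com/data_access": "Data Access",
}


def audit_log_type(log_name):
    """Return the audit log type for a full log name, or None if not an audit log."""
    _, separator, log_id = log_name.partition("/logs/")
    if not separator:
        return None
    # The log ID is URL-encoded inside the log name ("/" becomes "%2F").
    return AUDIT_LOG_IDS.get(unquote(log_id))


print(audit_log_type("projects/my-project/logs/cloudaudit.googleapis.com%2Factivity"))
print(audit_log_type("projects/my-project/logs/cloudaudit.googleapis.com%2Fdata_access"))
print(audit_log_type("projects/my-project/logs/syslog"))
```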

Stackdriver Logging allows you to manage access to your audit
logs through Cloud IAM, Cloud Platform's identity and access
management layer. For more information, see the
Identity and access management
section.

As with other Stackdriver Logging logs, audit logs are retained
for 30 days by default. You can export your audit logs to BigQuery,
Cloud Storage, and Cloud Pub/Sub if you want to retain them for a
longer period of time.

Integrating with third-party services

Monitoring

As with logging, monitoring in a cloud environment uses a model
common to data center environments. You install a monitoring agent
on the virtual machines you would like to monitor. This monitoring
agent then sends metrics to a centralized location. From there, you
can define alerting configurations and create dashboards to visualize
your metrics in real time.

For monitoring tasks, Cloud Platform provides
Stackdriver Monitoring,
a full-featured monitoring
framework. As with logging, you can also use your current monitoring
tools and services, such as Splunk, DataDog, the Elastic/ELK stack,
Sensu, or Nagios.

Metric collection

To collect metrics from your Compute Engine VM instances, you install the
Stackdriver Monitoring agent
on each instance. The agents then automatically send metrics to
Stackdriver Monitoring.

By default, Stackdriver Monitoring monitors machine resources for
each instance, such as CPU load and network I/O. However, with a small
amount of additional configuration, Stackdriver Monitoring can also
monitor a number of common third-party applications, including
the Apache Web Server, MongoDB, NGINX, Redis, and Varnish. See
Monitoring Third-party Applications
for more information.

In addition to collecting metrics from Compute Engine VM instances,
Stackdriver Monitoring automatically collects metrics from several other
Cloud Platform services. The Stackdriver Monitoring agent can also be
installed directly on Amazon EC2 instances, and Stackdriver Monitoring
can be configured to collect metrics from many Amazon Web Services (AWS)
services. For a complete list of metrics available in Stackdriver
Monitoring, see
Metrics List.

You can also create custom metrics, and then instrument your applications
to send them to Stackdriver Monitoring through the Monitoring API.
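
As a sketch of what an application sends for a custom metric, the function below builds the JSON body for a Monitoring API `timeSeries.create` request. The metric name, instance ID, and zone are placeholders, and the HTTP call and authentication are omitted.

```python
import json


def build_time_series(metric_name, value, instance_id, zone, end_time):
    """Build a timeSeries.create body holding one point of a custom metric."""
    return {
        "timeSeries": [
            {
                # Custom metric types live under the custom.googleapis.com prefix.
                "metric": {"type": "custom.googleapis.com/" + metric_name},
                "resource": {
                    "type": "gce_instance",
                    "labels": {"instance_id": instance_id, "zone": zone},
                },
                "points": [
                    {
                        "interval": {"endTime": end_time},
                        "value": {"doubleValue": float(value)},
                    }
                ],
            }
        ]
    }


# Placeholder values: queue depth reported by an application on one VM.
body = build_time_series(
    metric_name="app/queue_depth",
    value=42,
    instance_id="1234567890123456789",
    zone="us-central1-f",
    end_time="2017-01-15T08:30:00Z",
)
print(json.dumps(body, indent=2))
```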

Alerting

If you'd like to target an endpoint that isn't natively supported
by Stackdriver Monitoring, you can also configure a webhook. See
Configuring Webhooks
for more information.
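
On the receiving side, a webhook endpoint gets a JSON document describing the incident. The sketch below reduces such a document to a one-line summary. The field names (`incident`, `state`, `policy_name`, `summary`) are assumptions about the payload and should be checked against the webhook documentation, and the sample payload is invented for illustration.

```python
import json


def summarize_notification(body):
    """Reduce a webhook notification body to a one-line summary."""
    if isinstance(body, bytes):
        body = body.decode("utf-8")
    payload = json.loads(body)
    # Field names are assumptions; defaults keep the function from failing
    # if the actual payload differs.
    incident = payload.get("incident", {})
    return "[{}] {}: {}".format(
        incident.get("state", "unknown"),
        incident.get("policy_name", "unknown policy"),
        incident.get("summary", "no summary"),
    )


# Invented sample payload for illustration only.
sample_body = json.dumps(
    {
        "incident": {
            "state": "open",
            "policy_name": "High CPU",
            "summary": "CPU utilization above 90%",
        }
    }
)
print(summarize_notification(sample_body))
```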

Visualization

Like most monitoring frameworks, Stackdriver Monitoring provides a
customizable dashboard UI that allows you to visualize events in a
meaningful and actionable way. You can create charts that display
specific metrics for a given resource type, aggregated metrics,
metrics by a given resource ID, and more. In addition, you can
view indexed event logs and incident lists.

The GCP Console also provides visualizations on a per-service
basis for common metrics such as CPU, disk usage, and network traffic.
As with Stackdriver Logging, you can also use Cloud Datalab to
visualize and manipulate your metrics data.

Costs

Stackdriver Monitoring offers a Basic Tier, which is free up to 5 GB
per month, and a Premium Tier. The Premium Tier is priced per monitored
resource per month, prorated hourly. For more information, see the
Announcing pricing for Google Stackdriver
blog post.

Resource provisioning

This section describes ways in which you can provision your virtual
resources on Cloud Platform, and the role of versioning in the
provisioning process.

VM instance provisioning

Compute Engine includes some built-in features that streamline
instance provisioning. You can create machine profiles called
instance templates,
and then assign them to
instance groups
to create tens or hundreds of identical instances as needed.

You can automatically scale the number of
VM instances within these groups by using Compute Engine's
built-in autoscaler. To use the autoscaler, you define the
minimum and maximum number of instances that you want to be
running at a given time, and then define the metrics against
which the autoscaler will create or destroy instances. You can set
the autoscaler to scale depending on CPU utilization, load balancer
capacity, or your own custom metrics. For more information, see
Autoscaling Groups of Instances.
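
The scaling decision can be pictured with a simplified model: size the group so that average utilization returns to the target, then clamp the result to the configured bounds. The function below is only an illustration of that idea under the assumption of evenly distributed load. It is not the algorithm that the Compute Engine autoscaler actually uses.

```python
def desired_instances(current, utilization_pct, target_pct, minimum, maximum):
    """Estimate the instance count that brings utilization back to the target.

    Simplified model: total load is current * utilization_pct, and it is
    assumed to spread evenly across instances.
    """
    if current < 1 or target_pct <= 0:
        raise ValueError("current must be >= 1 and target_pct must be > 0")
    # Integer ceiling division avoids floating-point rounding surprises.
    wanted = -(-(current * utilization_pct) // target_pct)
    # Never go below the minimum or above the maximum group size.
    return max(minimum, min(maximum, wanted))


# Four instances at 90% CPU with a 60% target: scale out.
print(desired_instances(4, 90, 60, minimum=2, maximum=10))
# Four instances at 10% CPU: scale in, but not below the minimum.
print(desired_instances(4, 10, 60, minimum=2, maximum=10))
```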

General resource provisioning

You can automate the deployment of all of your Cloud Platform
resources with
Google Cloud Deployment Manager.
As with other configuration management tools, such as Puppet
and Chef, you specify your resources in a deployment template,
and then Deployment Manager uses this template to instantiate
and manage your resources. You can specify your deployment template
in several formats, including YAML, Python, and Jinja2.
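
As an example of the Python format, the sketch below defines a minimal template for a single VM instance. It follows the Deployment Manager convention of a `GenerateConfig(context)` entry point that returns a dictionary of resources. The property names `zone`, `machineType`, and `sourceImage` are choices made for this example, and `FakeContext` exists only so that the template can be exercised locally.

```python
import json

COMPUTE_URL = "https://www.googleapis.com/compute/v1/"


def GenerateConfig(context):
    """Describe one Compute Engine VM instance for Deployment Manager."""
    project = context.env["project"]
    zone = context.properties["zone"]
    instance = {
        "name": context.env["name"] + "-vm",
        "type": "compute.v1.instance",
        "properties": {
            "zone": zone,
            "machineType": "{}projects/{}/zones/{}/machineTypes/{}".format(
                COMPUTE_URL, project, zone, context.properties["machineType"]
            ),
            "disks": [
                {
                    "deviceName": "boot",
                    "type": "PERSISTENT",
                    "boot": True,
                    "autoDelete": True,
                    "initializeParams": {
                        # Full or partial image URL supplied by the caller.
                        "sourceImage": context.properties["sourceImage"]
                    },
                }
            ],
            "networkInterfaces": [
                {
                    "network": "{}projects/{}/global/networks/default".format(
                        COMPUTE_URL, project
                    )
                }
            ],
        },
    }
    return {"resources": [instance]}


class FakeContext(object):
    """Local stand-in for the context object that Deployment Manager passes in."""

    def __init__(self, env, properties):
        self.env = env
        self.properties = properties


# Placeholder values for a local dry run of the template.
config = GenerateConfig(
    FakeContext(
        env={"project": "my-project", "name": "web"},
        properties={
            "zone": "us-central1-f",
            "machineType": "n1-standard-1",
            "sourceImage": "projects/my-project/global/images/my-image",
        },
    )
)
print(json.dumps(config, indent=2))
```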

Version control and source repositories

If you prefer to use your current version control solutions,
you can do so on Cloud Platform by hosting and running them on
Cloud Platform or by connecting to an externally hosted or
managed service such as GitHub or Bitbucket.

For users familiar with Git, Google also provides
Google Cloud Source Repositories, which
are fully featured, private Git repositories hosted on Cloud Platform.
Cloud Source Repositories can be added to a local Git repository as a
remote repository or connected to a repository hosted on GitHub or
Bitbucket. Cloud Source Repositories also provide a source editor that
you can use to browse, view, edit, and commit changes to repository
files from within the GCP Console.

Costs

Deployment Manager does not charge for use, but any billable
resources you deploy using Deployment Manager will incur charges.

Cloud Source Repositories is free of charge during its beta release.
