Google Cloud Platform

Google Cloud Platform (GCP) is a cloud computing service by Google that offers hosting on the same supporting infrastructure that Google uses internally for end-user products like Gmail, Google Search, Maps, and YouTube.

Cloud BigTable

Cloud BigTable is a fully managed, NoSQL, wide-column database service for terabyte-scale applications.

Accessed using the HBase API

Native compatibility with big data, Hadoop ecosystems

Managed, scalable storage

Data encryption in-flight and at rest

Control access with IAM

BigTable drives major applications, such as Google Search, Google Analytics, and Gmail

Learns and adjusts to access patterns

BigTable scales UP well; Datastore scales DOWN well.

If you need any of the following, consider using BigTable:

Storing > 1 TB of structured data;

Very high volume of writes;

Read/write latency < 10 milliseconds and strong consistency; and/or

HBase API compatible.
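
The checklist above can be sketched as a simple decision helper (illustrative only; the function and parameter names are my own, not any GCP API):

```python
def consider_bigtable(data_tb, high_write_volume=False,
                      needs_low_latency=False, needs_hbase_api=False):
    """Return True if any of the listed BigTable criteria apply.

    data_tb: size of the structured dataset in terabytes.
    The flags mirror the bullet points above: very high write volume,
    sub-10-ms read/write latency with strong consistency, HBase API.
    """
    return (data_tb > 1
            or high_write_volume
            or needs_low_latency
            or needs_hbase_api)

assert consider_bigtable(5) is True                      # > 1 TB of data
assert consider_bigtable(0.5) is False                   # none of the criteria
assert consider_bigtable(0.5, needs_hbase_api=True) is True
```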

BigTable access patterns

Application API

Data can be read from and written to Cloud BigTable through a data service layer, like Managed VMs, the HBase REST Server, or a Java Server using the HBase client. Typically, this will be to serve data to applications, dashboards, and data services.

Streaming

Data can be streamed in (written event-by-event) through a variety of popular stream processing frameworks, like Cloud Dataflow Streaming, Spark Streaming, and Storm.

Batch Processing

Data can be read from and written to Cloud BigTable through batch processes, like Hadoop MapReduce, Dataflow, or Spark. Often, summarized or newly calculated data is written back to Cloud BigTable or to a downstream database.

Cloud SQL

Cloud SQL is a managed RDBMS.

Offers MySQL and PostgreSQL (Beta) databases as a service (DBaaS)

Automatic replication

Managed backups (automatic or scheduled)

Vertical scaling (read and write)

Horizontal scaling (read)

Google security (network firewalls and encryption)

Use cases

App Engine

Cloud SQL can be used with App Engine, using standard drivers.

You can configure a Cloud SQL instance to follow an App Engine application.

Compute Engine

Compute Engine instances can be authorized to access Cloud SQL instances using an external IP address.

Networking

Virtual Private Cloud (VPC)

You can provision GCP resources, connect them to each other, and isolate them from one another.

Google Cloud VPC networks are global; subnets are regional (and subnets can span the zones that make up the region).

You can have resources in different zones on the same subnet.

You can dynamically increase the size of a subnet in a custom network by expanding the range of IP addresses allocated to it (without any workload shutdown or downtime).
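
Expansion works because the new, larger range is a supernet of the old one, so every existing address stays valid. Python's ipaddress module (used here purely as an illustration, not a GCP API) shows the containment:

```python
import ipaddress

old = ipaddress.ip_network("10.0.0.0/24")   # original subnet range
new = ipaddress.ip_network("10.0.0.0/20")   # expanded range (shorter prefix)

# Every address in the old range is still inside the expanded range,
# which is why expansion needs no workload shutdown or downtime.
assert old.subnet_of(new)
assert new.num_addresses > old.num_addresses
```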

Use the VPC route table to forward traffic from one instance to another within the same network, even across subnets (and zones), without requiring external IP addresses.

VPCs give you a global distributed firewall.

You can define firewall rules in terms of metadata tags on VMs (e.g., tag all of your web-server VMs with "web" and write a firewall rule stating that traffic on ports 80 and/or 443 is allowed into all VMs with the "web" tag, no matter what their IP addresses happen to be).
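
The tag-based rule in that example can be modeled as follows (a toy evaluator to show the matching logic, not the actual VPC firewall engine):

```python
def allowed(rule, vm_tags, port):
    """Return True if a rule permits inbound traffic to a VM on a port.

    Matching is by tag, not IP address: any VM carrying the rule's
    target tag is covered, wherever it lives in the network.
    """
    return rule["target_tag"] in vm_tags and port in rule["ports"]

web_rule = {"target_tag": "web", "ports": {80, 443}}

assert allowed(web_rule, {"web", "prod"}, 80)
assert not allowed(web_rule, {"db"}, 80)        # no "web" tag
assert not allowed(web_rule, {"web"}, 22)       # port not in the rule
```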

VPCs belong to GCP projects; however, if you wish to establish connections between VPCs, you can use VPC peering.

If you want to use the full power of IAM to control who and what in one project can interact with a VPC in another project, use shared VPCs.

Most quotas can be increased through a self-service form or a support ticket

IAM & admin -> Quotas

Labels

A utility for organizing GCP resources

Labels are key-value pairs

Attached to resources (e.g., VMs, disk, snapshots, images)

Can be created/applied via the Console, gcloud, or API

Example uses of labels:

Search and list all resources (inventory)

Filter resources (e.g., separate production from test)

Labels used in scripts

Label specification

A label is a key-value pair

Label keys and non-empty label values can contain lowercase letters, digits, and hyphens; must start with a lowercase letter; and must end with a letter or digit. The regular expression is: [a-z]([-a-z0-9]*[a-z0-9])?
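
That specification can be checked directly with a full-match regex (a quick validator sketch; the trailing ? allows a single-letter key or value):

```python
import re

LABEL_RE = re.compile(r"^[a-z]([-a-z0-9]*[a-z0-9])?$")

def valid_label_part(s):
    """Validate a label key or non-empty label value."""
    return bool(LABEL_RE.match(s))

assert valid_label_part("env")
assert valid_label_part("web-server-1")
assert not valid_label_part("Env")        # uppercase not allowed
assert not valid_label_part("1env")       # must start with a letter
assert not valid_label_part("env-")       # must end with a letter or digit
```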

BigQuery

Instead of using a dynamic pipeline (like Cloud Dataflow), use BigQuery for work that is more about exploring a vast sea of data, where you can run ad hoc SQL queries against that massive dataset

No cluster maintenance required

Load data from Cloud Storage or Cloud Datastore or stream it into BigQuery at up to 100,000 rows per second

In addition to SQL queries, you can read/write data in BigQuery via Cloud Dataflow, Hadoop, and Spark

Compute and storage are separated with a terabit network in between

You only pay for storage and processing use

You pay for your data storage separately from queries

Automatic discount for long-term data storage

When your data reaches 90 days of age in BigQuery, Google automatically drops the price of its storage.

Free monthly quotas

99.9% SLA

Google's infrastructure is global and so is BigQuery. BigQuery lets you specify the region where your data will be kept. For example, if you want to keep data in Europe, you do not have to set up a cluster in Europe. Simply specify "EU" as the location when you create your dataset. US and Asia locations are also available.

Cloud Dataproc

Use Spark SQL and Spark Machine Learning libraries (MLlib) to run classification algorithms

Save money with preemptible instances

Pricing is based on an hourly rate, but Dataproc is billed by the second (with a one-minute minimum)
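
That billing rule works out like this (a sketch of the rounding logic; the rate value below is a placeholder, not a real Dataproc price):

```python
import math

def billed_seconds(runtime_seconds):
    """Per-second billing with a one-minute minimum."""
    return max(60, math.ceil(runtime_seconds))

def charge(runtime_seconds, hourly_rate):
    """Hourly rate prorated to the second."""
    return billed_seconds(runtime_seconds) * hourly_rate / 3600

assert billed_seconds(10) == 60     # under a minute -> minimum applies
assert billed_seconds(90.5) == 91   # rounded up to whole seconds
assert charge(3600, 0.5) == 0.5     # one full hour costs the hourly rate
```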

The MapReduce model means that one function (traditionally called the "Map" function) runs in parallel across a massive dataset to produce intermediate results. Another function (the "Reduce" function) builds a final result set based on all those intermediate results.
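
A minimal in-memory sketch of that pattern (word count, the classic example; plain Python, not actual Hadoop code):

```python
from collections import defaultdict

def map_phase(documents):
    """Map: run over each record (in parallel on a real cluster),
    emitting intermediate (word, 1) pairs."""
    for doc in documents:
        for word in doc.split():
            yield word, 1

def reduce_phase(pairs):
    """Reduce: combine all intermediate pairs into the final counts."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

result = reduce_phase(map_phase(["to be or not", "to be"]))
assert result == {"to": 2, "be": 2, "or": 1, "not": 1}
```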

Cloud Dataflow

Cloud Dataflow is a simplified stream and batch data processing service, with equal reliability and expressiveness.

Use Cloud Dataproc when you have a dataset of known size or when you want to manage your cluster size yourself. If your data is ingested in real-time or is of an unpredictable size or rate, use Cloud Dataflow.

Command Line Interface (CLI)

The Google Cloud SDK is a set of tools that you can use to manage resources and applications hosted on the Google Cloud Platform (GCP). These include the gcloud, gsutil, and bq command line tools. The gcloud command-line tool is downloaded along with the Cloud SDK.

Deployment Manager

"Deployment Manager is an infrastructure deployment service that automates the creation and management of Google Cloud Platform resources for you. Write flexible template and configuration files and use them to create deployments that have a variety of Cloud Platform services, such as Google Cloud Storage, Google Compute Engine, and Google Cloud SQL, configured to work together". source

The above will return a signed URL (it will look something like https://storage.googleapis.com/xtof-sandbox/test.txt?x-goog-signature=23asd...), which you can send to users; it will only be valid for 3 minutes. After 3 minutes, they will get an "ExpiredToken" error.

Create the tunnels between the VPN gateways. After the tunnels exist, create a static route to enable traffic to be forwarded into the tunnel. If this is successful, you can ping a local VM in one location on its internal IP from a VM in a different location.

AWS EBS: An EBS volume can be attached to only one EC2 instance at a time. Up to 40 disk volumes can be attached to a Linux instance. Available in only one region by default.

GCP Persistent Disk: Persistent Disks in read-only mode can be attached to multiple instances simultaneously. Up to 128 disk volumes can be attached. Snapshots are global and can be used in any region without additional operations or charges.