Setting up Your Redshift Cluster

Periscope Data | August 20, 2015

You’ve done your data warehouse research and have settled on Amazon Redshift. Now you just need to get everything set up. We’re heavy users of Redshift, so this is something we have a lot of experience with.

Setting up your cluster

Setting up a Redshift cluster is extremely easy. The details of connecting to your Redshift cluster vary depending on how you set it up, but the basics are the same.

Node Type

Compute nodes have more ECU and memory per dollar than storage nodes, but come with far less storage. We value speed highly at Periscope, so we’ve found compute nodes to be the most effective choice. The more data you are querying, the more compute you need to keep queries fast.

Storage nodes can work well if you have too much data to fit on SSD-backed compute nodes within your budget, or if you want to store a lot more data than you expect to query.

Number of Nodes

Now you need to figure out how many nodes to use. This depends somewhat on your dataset, but for single query performance, the more the merrier.

The size of your data will determine the smallest cluster you can have. Compute nodes only come with 160GB drives. Even if your row count is in the low billions, you may still require 10+ nodes.
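To make the sizing arithmetic concrete, here is a minimal sketch of the calculation. The function name min_nodes is made up for illustration, and the 160GB default is simply the per-node figure quoted above. It assumes the data is spread evenly and ignores compression and the free space Redshift needs for sorting and vacuuming, so treat the result as a floor rather than a recommendation.

```shell
# Estimate the smallest cluster that can hold a dataset.
# Usage: min_nodes DATA_GB [NODE_GB]
# NODE_GB defaults to 160, the per-node figure quoted in this post.
min_nodes() {
  data_gb=$1
  node_gb=${2:-160}
  # Integer ceiling division: round up so the data always fits.
  echo $(( (data_gb + node_gb - 1) / node_gb ))
}

min_nodes 1500   # 1.5TB of data; prints 10
```

In practice you would likely want headroom on top of this number, since a cluster that is nearly full has little room for temporary query data.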

Network Setup

The last step is network setup. Clusters in US East (North Virginia) do not require a VPC, while the rest do. For any production usage, we suggest using a VPC, as you’ll get better network connectivity to your EC2 instances.

A default VPC is created if one doesn’t exist. If you want to access Redshift from outside of AWS, then add a public IP by setting Publicly Accessible to true. Whether you want a public IP on your cluster is up to you — the rest of this post explains how to connect to both public and private IPs.

In either case, take note of the VPC Security group. You’ll need to allow access to the cluster through it later.
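If you prefer scripting to the console, the choices above (node type, node count, public IP) map onto flags of the AWS CLI's create-cluster command. The sketch below is one possible wrapper, not the setup we used for this post. The function name create_cluster and the example values are hypothetical, and it assumes the AWS CLI is installed and configured and that the node count is at least 2.

```shell
# Hypothetical wrapper around `aws redshift create-cluster`.
# Usage: create_cluster CLUSTER_ID NODE_TYPE NODE_COUNT MASTER_USER MASTER_PASSWORD
# Assumes NODE_COUNT >= 2 (a multi-node cluster) and a configured AWS CLI.
create_cluster() {
  aws redshift create-cluster \
    --cluster-identifier "$1" \
    --node-type "$2" \
    --number-of-nodes "$3" \
    --master-username "$4" \
    --master-user-password "$5" \
    --publicly-accessible   # drop this flag to keep the cluster private
}

# Example call (all values are placeholders; left commented out because it
# creates billable resources):
# create_cluster periscope-test dc1.large 10 periscope 'ChangeMe123'
```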

EC2 Classic

We’ll start with the simplest cluster setup possible — a cluster in Virginia not in any VPC. This kind of setup is best used for prototyping.

Once the cluster boots, the Configuration tab in the AWS Redshift console will show you the endpoint address.
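The endpoint can also be read from the command line, which is handy in scripts. This is a small sketch assuming the AWS CLI is installed and configured. The function name cluster_endpoint is made up for illustration, and the cluster identifier in the example is a guess based on the hostname used later in this post.

```shell
# Print the endpoint address of a Redshift cluster.
# Usage: cluster_endpoint CLUSTER_ID
cluster_endpoint() {
  aws redshift describe-clusters \
    --cluster-identifier "$1" \
    --query 'Clusters[0].Endpoint.Address' \
    --output text
}

# Example (needs AWS credentials, so it is left commented out;
# the identifier is an assumption):
# cluster_endpoint periscope-test
```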

Before connecting, we need to allow your IP address in the Cluster Security Group. Click the security group’s link, then click Add Connection Type. The default is your current IP.
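The same rule can be added from the AWS CLI. The sketch below assumes a configured AWS CLI and an EC2 Classic cluster. The function name allow_ip_classic, the group name default, and the IP address are placeholders for illustration.

```shell
# Allow a single IPv4 address into a Redshift cluster security group (EC2 Classic).
# Usage: allow_ip_classic SECURITY_GROUP_NAME IP_ADDRESS
allow_ip_classic() {
  aws redshift authorize-cluster-security-group-ingress \
    --cluster-security-group-name "$1" \
    --cidrip "$2/32"   # /32 limits the rule to exactly one address
}

# Example (placeholder values; left commented out because it changes access rules):
# allow_ip_classic default 203.0.113.7
```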

Now connect directly to your cluster:

psql -h periscope-test.us-east-1.redshift.amazonaws.com \
  -p 5439 -U periscope dev

VPC

If your cluster is in a VPC with a public IP, there’s one more step: head to the VPC security group for this cluster, and whitelist port 5439 for your IP address.
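For reference, here is a sketch of the same whitelist step using the AWS CLI instead of the console. It assumes the CLI is installed and configured. The function name allow_ip_vpc, the security group ID, and the IP address are placeholders, not values from our setup.

```shell
# Open the Redshift port (5439) to a single IPv4 address in a VPC security group.
# Usage: allow_ip_vpc SECURITY_GROUP_ID IP_ADDRESS
allow_ip_vpc() {
  aws ec2 authorize-security-group-ingress \
    --group-id "$1" \
    --protocol tcp \
    --port 5439 \
    --cidr "$2/32"   # /32 limits the rule to exactly one address
}

# Example (placeholder values; left commented out because it changes access rules):
# allow_ip_vpc sg-0123456789abcdef0 203.0.113.7
```

Keeping the rule scoped to a single address, rather than opening the port to 0.0.0.0/0, keeps the cluster reachable only from the machine you are connecting from.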