Subscribe

Follow Us

Amazon Redshift Tips & Tricks

Last month’s webinar, Working with Redshift: Amazon’s Affordable MPP Analytic Database in the Cloud, was well received and sparked a lot of interest from viewers. Based on the feedback, interaction and questions received, Chun Wu was happy to create a more in depth level of Tips & Tricks for our Analytic Minute readers.

For those of you who may have missed last month’s webinar, you can see a replay of the training by following the link to the right.

Amazon Redshift is a fast, fully-managed, petabyte-scale data warehouse service that makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence tools. Redshift delivers fast query performance by using columnar storage technology to improve I/O efficiently and parallelizing queries across multiple nodes.

Below you will find 10 Tips & Tricks to help you get started working with Redshift and some sample copy commands that will help you while working with Redshift.

Below you will find 10 Tips & Tricks and some sample copy commands that will help you while working with Redshift.

10 Amazon Redshift Tips & Tricks:

#

Tips & Tricks

Additional Comments

1.

Launch the Redshift cluster in the same AWS region as the S3 bucket to improve load performance of the COPY command.

2.

An encrypted Redshift cluster provides extra security but it also impacts performance. If possible, launch an encrypted cluster with Dense Compute (SSD) nodes

-- specify the manifest file name -- if the manifest option is used, the data file name will be replaced by the manifest file name

manifest;

Amazon Redshift Example 3: Merge sample pseudo codes

-- load the data into the staging table

copy tab_stg

from 's3://my_bucket/data_files'

…

-- update the target table with new values from the staging table

update tab_tgt tgt set col1 = stg.col1

from tab_stg stg

-- explicit join on the primary key and distribution key

where tgt.pkey = stg.pkey and tgt.dkey = stg.dkey

-- insert the new rows

insert into tab_tgt

select … from tab_stg

from tab_stg stg left join tab_tgt tgt

on stg.pkey = tgt.pkey and stg.dkey = tgt.dkey

where tgt.pkey is null

We hope you’ve enjoyed the added Tips & Tricks for working with Amazon’s Redshift and trust that it will make your next data warehouse implementation more efficient. We appreciate Steve Dine & Chun Wu for putting together the advanced Tips & Tricks list. Feel free to contact us for additional tips and tricks for working with Amazon Redshift.

Datasource Consulting is a premier leader in the Enterprise Data Management and Business Intelligence industry. We specialize in delivering lean, innovative, end-to-end solutions that leverage and express the full value of data.