In the “Amazon Redshift Database Developer Guide”, there is an explanation of how tables are joined: “HASH JOIN and HASH are used when joining tables where the join columns are not both distribution keys and sort keys. MERGE JOIN is used when joining tables where the join columns are both distribution keys and…

Details about distribution styles: http://docs.aws.amazon.com/redshift/latest/dg/viewing-distribution-styles.html

How to COPY multiple files into Redshift from S3: http://docs.aws.amazon.com/redshift/latest/dg/t_loading-tables-from-s3.html

You can GROUP BY (or ORDER BY) a column's position number in the select list instead of its name:

SELECT listing.sellerid, sum(sales.qtysold)
FROM sales, listing
WHERE sales.salesid = listing.listid
AND listing.listtime > '2008-12-01'
AND sales.saletime > '2008-12-01'
GROUP BY 1
ORDER BY 1;
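
To see that grouping and ordering by position gives the same result as naming the column, here is a minimal sketch that runs the query both ways. It uses an in-memory SQLite database as a stand-in for Redshift, and the table layouts and sample rows are made up for illustration; they are assumptions, not the real sales and listing tables.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for a Redshift cluster, and the
# columns and rows below are invented sample data for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE listing (listid INTEGER, sellerid INTEGER, listtime TEXT);
    CREATE TABLE sales (salesid INTEGER, qtysold INTEGER, saletime TEXT);
    INSERT INTO listing VALUES (1, 20, '2008-12-05'), (2, 10, '2008-12-07'),
                               (3, 10, '2008-12-09'), (4, 30, '2008-11-15');
    INSERT INTO sales VALUES (1, 2, '2008-12-06'), (2, 1, '2008-12-08'),
                             (3, 4, '2008-12-10'), (4, 5, '2008-12-20');
""")

# Same query as in the post; {key} is either the position "1" or the column name.
QUERY_TEMPLATE = """
    SELECT listing.sellerid, sum(sales.qtysold)
    FROM sales, listing
    WHERE sales.salesid = listing.listid
      AND listing.listtime > '2008-12-01'
      AND sales.saletime > '2008-12-01'
    GROUP BY {key}
    ORDER BY {key};
"""

# "1" refers to the first item in the select list, i.e. listing.sellerid.
by_position = conn.execute(QUERY_TEMPLATE.format(key="1")).fetchall()
by_name = conn.execute(QUERY_TEMPLATE.format(key="listing.sellerid")).fetchall()

print(by_position)
print(by_name)
```

Both queries return one row per seller, sorted by seller id. The positional form is shorter to type, but it silently changes meaning if the select list is reordered, so the column name is the safer choice in queries that will be maintained.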

COPY with automatic compression: To apply automatic compression to an empty table, regardless of its current compression encodings, run the…

In the previous article, I created two tables in my Redshift cluster. Now I want to find out the relation between each employee's salary and their working age. Tableau is the best choice for visualizing data analysis (SAS is too expensive and has no trial version for learning). First, we connect…

Last year, I imported two datasets into Hive. Now I will load these two datasets into Amazon Redshift instead. After creating a Redshift cluster in my VPC, I couldn’t connect to it even with an Elastic IP. Then I compared the parameters of my VPC with those of AWS’s default VPC, and…

Paper reference: iShuffle: Improving Hadoop Performance with Shuffle-on-Write. Background: A job in Hadoop consists of three main stages: map, shuffle, and reduce (in practice, the shuffle stage is contained in the reduce stage). What is the problem? The shuffle phase needs to migrate a large amount of data from the nodes running the map job to…

Question 1: The Flume process reports “Expected timestamp in the Flume event headers, but it was null”. Solution 1: The Flume process expects to receive events with a timestamp, but the event doesn’t have one. To send plain text events to Flume, we need to tell it to generate a timestamp for every event by itself…