Amazon Redshift SSD - Queries on TBs of data can run in a few seconds

We have run benchmarks to compare Redshift SSD instances to Redshift HDD instances. See our blog at https://flydata.com/blog/posts/with-amazon-redshift-ssd-querying-a-tb-of-data-took-less-than-10-seconds

1.
Amazon Redshift SSD
- Queries on TBs of data can
run in a few seconds
FlyData: Amazon Redshift
BENCHMARK Series 03
www.flydata.com

2.
Amazon Redshift HDD took 33.32 seconds to run our
queries for 300GB data
Amazon Redshift SSD took 4.32 seconds to run our
queries for 300GB data
Amazon Redshift SSD performed about 8X faster
Takeaways:
• 1.2TB can now be handled in under
10 seconds.
• Use cases could spread to ad-delivery
optimization and financial trading
systems.
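
The speedup figure on this slide follows directly from the two timings quoted for the 300GB data set. A minimal sketch of that arithmetic, where the helper name `speedup` is ours for illustration and not part of the benchmark repository:

```python
def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """Return how many times faster the candidate ran than the baseline."""
    if candidate_seconds <= 0:
        raise ValueError("candidate_seconds must be positive")
    return baseline_seconds / candidate_seconds


# Timings quoted on this slide for the 300GB data set.
hdd_seconds = 33.32
ssd_seconds = 4.32

ratio = speedup(hdd_seconds, ssd_seconds)
print(f"SSD speedup over HDD: {ratio:.2f}x")
```

The ratio comes out a little under 8, which the slide rounds to 8X.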

3.
Amazon Redshift is a popular data warehouse for
big data on the cloud. AWS added the SSD instance
type on January 24, 2014.
We have run benchmarks to compare Redshift SSD
instances to Redshift HDD instances using the
following parameters:
• Data Size: 1.2TB and 300GB
• Query performance when
querying against all records in the cluster
• Loading speed
• Cost comparison

4.
1. Query Speed for similar cluster sizes
• SSD version is
faster.
• Query against
1.2TB (entire data
set) took less than
10 seconds!
• For 1.2TB of data,
comparing similar
node sizes:
query time: 9.22s
(SSD) vs 28.48s
(HDD 8XLx2)
* See Appendix for queries being used.
Chart: Comparison of query speed for dw1.xlarge (HDD) and dw2.large (SSD) with 1.2TB of data, in order of cost.

5.
1. Query Speed at similar pricing points
• Query performance comparison based
on similar pricing point.
• 4 nodes of dw2.large cost:
$0.25(/hour) * 4(nodes) = $1.00(/hour)
• 1 node of dw1.xlarge cost:
$0.85(/hour)
• Direct comparison is difficult, but we
can see much better query
performance for the dw2 (SSD)
Redshift.
* See Appendix for queries being used.
Comparison of query speed for cluster configurations with similar pricing for 300GB of data.
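
The "similar pricing point" on this slide can be checked with a few lines of arithmetic. A rough sketch using only the hourly prices quoted above, where the function name `cluster_cost_per_hour` is an illustrative assumption:

```python
def cluster_cost_per_hour(price_per_node_hour: float, node_count: int) -> float:
    """Hourly cost of a cluster made of identical nodes."""
    return price_per_node_hour * node_count


# Prices per node-hour as quoted on this slide.
dw2_cost = cluster_cost_per_hour(0.25, 4)  # 4 nodes of dw2.large (SSD)
dw1_cost = cluster_cost_per_hour(0.85, 1)  # 1 node of dw1.xlarge (HDD)

# Fraction by which the SSD cluster costs more than the HDD cluster.
premium = dw2_cost / dw1_cost - 1

print(f"dw2.large x4: ${dw2_cost:.2f}/hour")
print(f"dw1.xlarge x1: ${dw1_cost:.2f}/hour")
print(f"SSD cluster premium: {premium:.1%}")
```

The two configurations are close in price but not identical: the SSD cluster costs somewhat more per hour, which is why the slide calls a direct comparison difficult.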

6.
2. Loading Time
• For similar cost
(DW2:$1.00/hour vs
DW1:$0.85/hour),
loading time was 4.6x
faster on SSD.
• For similar node sizes
(DW2:12 nodes vs
DW1:16 nodes),
loading time was
1.65x faster on SSD.
* See Appendix for queries being used.
Chart panels: Similar Cost, Similar Node Count

15.
Appendix: Additional Information
• All resources for our benchmark are on
our github repository
– https://github.com/hapyrus/redshift-benchmark
– The dataset we use is open on S3, so you
can reproduce the benchmark

21.
Additional Comments
• SSD could be 3.5x to 5x more expensive than
HDD for the same amount of storage space
(SSD is really optimized for performance)
• DW1.8xlarge is exactly 8 times a DW1.xlarge,
but DW2.8xlarge is actually 16 times a
DW2.large. This is because DW2.large nodes
are not “xlarge”; a bit confusing… ;)
(as of Jan. 27, 2014)