Documentation

Connecting Redshift from BI tools via JDBC Driver

SummaryBrief discussion of using the Amazon Redshift JDBC driver and alternatives to connect to Redshift from BI tools.

Connecting to Redshift from BI tools and other 3rd party applications can be done using the Amazon Redshift JDBC driver. Amazon has good documentation on how to install and set this up.

However, note that the Amazon driver has a proprietary license that can be limiting in some cases. Additionally, installing and managing another JDBC driver just for Redshift can be cumbersome. We've actually found that using the PostgreSQL JDBC driver with Redshift works very well for most use-cases.

PostgreSQL drivers are licensed BSD and are bundled with many SQL querying tools like IntelliJ IDEA/DataGrip, so in most tools you can simply choose the PostgreSQL driver, and supply the connection parameters matching your Redshift cluster. Just don't forget to set the right port: Redshift uses port 5439 by default while PostgreSQL uses 5432.

If you are planning to load data into Redshift, make sure to read about the COPY command because inserting data into Redshift using INSERT commands is generally too slow for most real-life scenarios. Using the COPY command utilizes Redshift's parallel architecture to speed things up by a lot, but it does have its downsides:

It adds some complexity as there are more steps in the process: Writing the data to files, uploading the files to S3, sending COPY commands to Redshift, and verifying that the data was loaded correctly.

When a COPY command contains data that fails to load, it is sometimes difficult to locate the problematic data, fix it, and then re-emit the COPY command.

To simplify the process of loading data into Amazon Redshift, consider using Alooma's data pipeline, which was designed and built to load data at scale, while making error handling as simple as possible, and eliminating data loss.