Tuesday, March 31, 2020

How to access data of Redshift cluster from PostgreSQL RDS

We were using Amazon Redshift as our data warehouse. It is a columnar-oriented petabyte-scale,  OLAP database suitable for analytical queries and applications.

There were many times when we wanted to pull data from Amazon Redshift and power our dashboards. This is basically using an OLAP database for OLTP use case - it was obviously not going to work very well.

Amazon RDS provides a feature called DBLink. DBLink enables us to access Amazon Redshift table directly from Amazon RDS. We don't need to explicitly copy over the data from Redshift to RDS, DBLink handles it for us and moves the data at the block level.

In this post, we will review the steps needed to setup the DBLink on Amazon RDS. This post assumes that you already have a running Amazon Redshift cluster and an Amazon RDS PostgreSQL instance.

Setting up DBLink
  • First we need to enable the DBLink extensions on the PostgreSQL RDS instance.
  • Next step is the main step which creates the link to the Amazon Redshift cluster and PostgreSQL RDS instance.
  • To achieve this we need to create the server link and the user mapping
  • Please note that all the values that are enclosed within <angle brackets> need to be replaced with actual values from your environment.
  • You might have to use the private IP address of your Redshift cluster. It totally depends on your setup.
  • Now that the DBLink is created, its time to query the Redshift cluster from the PostgreSQL to check if everything is setup correctly
  • Everything enclosed within $REDSHIFT$ is the actual query that runs on Redshift and results are sent back to PostgreSQL.
  • To create the cache of the Redshift data, we need to create a materialised view. 
  • Query the view just like you query any other view.
  • To refresh the view, use the following code

Thats about it, these are the simple steps using which you can access data of Redshift cluster from PostgreSQL RDS instance.

Have some Fun!