Le Nguyen The Dat bio photo

Le Nguyen The Dat

~ Data Science & Engineering

Twitter Facebook LinkedIn Github

Ran into this request just today, in short, we need to create a downstream Database for our Data Warehouse for better scalability, security, and performance for our internal users.

Prerequisite:

Note: currently - July 2014 - there is a bug that prevents the following query to finish, when you try to do it from an RDS instance type M2, M3 or R3 - the query would run on Redshift for about 15 minutes (you can’t even kill it) and ends with a blank error message. Just choose RDS Instance type M1 and it would be just fine.

Step-by-step instructions:

  • Connect to your instance (Remember to set your instance permission / firewall properly):
psql -h your.rds.address -p port -U username -d dbname
  • Configuring dblink:
CREATE EXTENSION dblink;
SELECT dblink_connect('dbname=dbname user=user password=password
                       host=your.redshift.address port=port');
  • Copy a table from Redshift to RDS:
SELECT mytable.*
INTO rds_schema_name.rds_table_name
FROM
    dblink('SELECT field_1, field_2 FROM redshift_schema.redshift_table')
AS
    mytable(field_1 text, field_2 text);
Get data from the given table in RDS:
SELECT field_1, field_2
FROM rds_schema_name.rds_table_name;

Do it with aws-redshift-to-rds:

In case you need to solve this problem ASAP, here is a little command-line interface tool (named: aws-redshift-to-rds) that I’ve developed while working at Zalora :).