Le Nguyen The Dat bio photo

Le Nguyen The Dat

~ Data Science & Engineering

Twitter Facebook LinkedIn Github

Encountered this bug trying to integrate Tableau with Amazon Redshift the other day, figured I should note it down somewhere…


image

Root cause:

Basically stated here:

If your client application uses an ODBC connection and your query creates a result set that is too large to fit in memory, you can stream the result set to your client application by using a cursor. When you use a cursor, the entire result set is materialized on the leader node, and then your client can fetch the results incrementally.

So, the bigger your Result Set Size per Cursor is, the more time you save retrieving the query result. However, the smaller your cluster is, the lower Result Set Size, lower number of concurrent query it can support. It’s basically a trade-off between speed and convenience. Different Redshift cluster type has different constraints on Result Set, as described here:

Node Type Max RS per Cluster Max RS per Cursor Concurrent Cursors
DW1 XL single node 64000 32000 2
DW1 XL multiple nodes 1800000 450000 4
DW1 8XL multiple nodes 14400000 960000 15
DW2 Large single node 16000 16000 1
DW2 Large multiple nodes 384000 192000 2
DW2 8XL multiple nodes 3000000 750000 4

Solution:

  • Go to your Redshift cluster setting and edit your Parameter Group (You will need to create a new one if you haven’t done so before):


image


  • Update max_cursor_result_set_size to a smaller number depending on your usage and cluster size:


image


  • Modify your cluster - change to newly created Parameter Group and reboot:


image