Question

I am trying to perform a deep copy on RS. See http://docs.aws.amazon.com/redshift/latest/dg/performing-a-deep-copy.html

I created the new/temporary table using the SAME DDL with compression settings as the original table.

I have 3 nodes, each node is using 21% disk space.

When I run the copy:

insert into events_tmp (select * from events); 

I run out of disk space. Does the deep copy not compress the data?
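
(For reference, the deep-copy sequence described above amounts to roughly the following. The column names and encodings are illustrative placeholders, not the real schema, and the final drop/rename steps come from the linked AWS docs.)

-- events_tmp is created with the same DDL and column encodings as events
-- (columns below are placeholders for the real table definition)
create table events_tmp (
    event_id    bigint          encode zstd,
    event_time  timestamp       encode az64,
    payload     varchar(65535)  encode lzo
)
distkey (event_id)
sortkey (event_time);

-- the copy that runs out of disk space
insert into events_tmp (select * from events);

-- once the copy succeeds, swap the tables (per the linked docs)
drop table events;
alter table events_tmp rename to events;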

Thanks!

Crash and Burn


Solution

See vceron's post at https://discourse.snowplowanalytics.com/t/redshift-maintenance-best-practices/1124/2:

When doing a deep copy of a highly unsorted table, Redshift needs to sort the table before inserting it into the new one.

This sort operation takes place in an intermediate temporary table, which is held in memory at first but, because the data set is too large, eventually spills over to disk.

Those temporary tables are not compressed, and Redshift allocates temporary disk space for the operation, which results in a disk-full error if there is not enough space for the temporary data.
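
To see how close you are to that limit before retrying, a couple of diagnostic queries can help. This is a sketch: svv_table_info and stv_partitions are standard Redshift system views, and the table name events is taken from the question.

-- How big and how unsorted is the source table?
-- (size is reported in 1 MB blocks, unsorted is a percentage)
select "table", tbl_rows, size as size_mb, unsorted as pct_unsorted
from svv_table_info
where "table" = 'events';

-- How much raw disk is free across the cluster?
-- (capacity/used are reported in 1 MB blocks; part_begin = 0 avoids
--  counting mirrored partitions twice)
select sum(capacity) / 1024                as capacity_gb,
       sum(used) / 1024                    as used_gb,
       (sum(capacity) - sum(used)) / 1024  as free_gb
from stv_partitions
where part_begin = 0;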

OTHER TIPS

Before running a deep copy, you can VACUUM the table to free some disk space; if you have updated or deleted rows in the table, the vacuum will remove them (see the sketch below). If your table has a large unsorted region (which can't be vacuumed in a reasonable amount of time), you can use the Column Encoding Utility from Amazon's GitHub repository (amazon-redshift-utils) to perform the deep copy. The Column Encoding Utility takes care of the compression analysis, column encoding, and deep copy.
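
A minimal sketch of the vacuum step, assuming the table name events from the question:

-- Reclaim the space held by deleted/updated rows without the
-- expensive sort phase:
vacuum delete only events;

-- Or, if the unsorted region is small enough to sort in place:
vacuum full events;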

Licensed under: CC-BY-SA with attribution