Question

Production Cluster details:

  • Node Type dc1.8xlarge
  • Nodes 25
  • 2.56TB SSD storage per node

Test Cluster details:

  • Node Type ds2.xlarge
  • Nodes 6
  • 2TB HDD storage per node

When the same table, with exactly the same DDL and encoding, is unloaded from the production cluster and copied to the test cluster, its disk footprint shrinks dramatically (by a factor of roughly 70 to 100 in the examples below). This has been tested with multiple tables with different distribution styles and sort key patterns.

Example: Table A (No sort key, DISTSTYLE EVEN) - Size in production: 60GB; Size in test: 0.6 GB

Table B (Sort key, DISTSTYLE KEY) - Size in production: 96GB, 100% sorted; Size in test: 1.4 GB, 100% sorted

Any ideas what could cause this discrepancy? I have read most of the Redshift forums but have not been able to find a reason for it. I am using the admin view v_space_used_per_tbl (provided by AWS) to calculate the size of each table.

Solution

I found this in the AWS documentation, which describes this behavior clearly:

The default amount of disk storage space allocated to two tables residing on different clusters can vary significantly, even when the tables are created using identical data definition language (DDL) statements and contain the same number of rows.

Because the block size in Redshift is 1MB, every column takes up at least one 1MB block on each slice that holds data for the table. With DISTSTYLE EVEN the rows are spread across all slices, so the table tends to occupy at least one block per column on every slice in the cluster. A cluster with more slices therefore has a larger minimum footprint for the same table. The block size cannot be changed in Redshift, so the minimum size of a table cannot be reduced below what the following formulas give:

  • For tables created using the KEY or EVEN distribution style:

    Minimum Table Size = block_size (1MB) * (number_of_user_columns + 3 system columns) * number_of_populated_slices * number_of_table_segments

  • For tables created using the ALL distribution style:

    Minimum Table Size = block_size (1MB) * (number_of_user_columns + 3 system columns) * number_of_cluster_nodes * number_of_table_segments

https://aws.amazon.com/premiumsupport/knowledge-center/redshift-cluster-storage-space/
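
To make the formula concrete, here is a rough Python sketch of the KEY/EVEN case applied to the two clusters in the question. Several inputs are assumptions, not values taken from the question: the slices per node (32 for dc1.8xlarge, 2 for ds2.xlarge, as I recall from the AWS node specifications; check STV_SLICES on your own clusters), a hypothetical table with 20 user columns, and one table segment for a table without a sort key (two if a sort key is defined, as I understand the linked article). The function and variable names are made up for illustration.

```python
BLOCK_SIZE_MB = 1      # Redshift block size, fixed at 1MB
SYSTEM_COLUMNS = 3     # hidden system columns counted by the formula


def min_table_size_mb(user_columns, populated_slices, table_segments):
    """Minimum size in MB of a KEY or EVEN distributed table."""
    return (BLOCK_SIZE_MB
            * (user_columns + SYSTEM_COLUMNS)
            * populated_slices
            * table_segments)


# Assumed slices per node: 32 for dc1.8xlarge, 2 for ds2.xlarge.
PROD_SLICES = 25 * 32   # 25 production nodes
TEST_SLICES = 6 * 2     # 6 test nodes

USER_COLUMNS = 20       # hypothetical table width
SEGMENTS = 1            # assumed: no sort key, so one segment

prod_mb = min_table_size_mb(USER_COLUMNS, PROD_SLICES, SEGMENTS)
test_mb = min_table_size_mb(USER_COLUMNS, TEST_SLICES, SEGMENTS)

print(f"production minimum: {prod_mb} MB")   # 18400 MB
print(f"test minimum:       {test_mb} MB")   # 276 MB
print(f"ratio:              {prod_mb / test_mb:.1f}x")
```

Under these assumptions the minimum footprint differs by the ratio of slice counts (800 to 12, about 67x), which is in the same range as the sizes reported in the question. That would suggest the production tables are mostly made up of per-slice minimum blocks rather than actual data.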

Licensed under: CC-BY-SA with attribution