Question

I created a table in AWS Redshift like this:

CREATE TABLE exampleTableName (
  id       int identity(1,1),
  accId    varchar(16) encode zstd,
  amount   float4,

  primary key(id)
)
distkey(accId)
interleaved sortkey(accId);

An example record in the table has an amount value of 120.12.

However, when I try to export the data by performing an UNLOAD, the resulting file (essentially a CSV) has additional precision for the field value.

The unload command:

UNLOAD ('SELECT * from exampleTableName')
TO 's3://bucket/prefixFile_'
IAM_ROLE 'XXX'
HEADER
ADDQUOTES
PARALLEL OFF
MAXFILESIZE AS 5gb
DELIMITER AS ',' 
GZIP;

The field value in the resulting output is 120.120003 (i.e. four extra decimal places that aren't in the original dataset).

Why is this happening, and how can I prevent the additional precision (i.e. decimal places) from being output as part of the UNLOAD command?

Solution

Answer from AWS forum:

This happens because FLOAT is an IEEE 754 floating-point type and cannot store decimal values exactly: https://en.wikipedia.org/wiki/IEEE_754

I would generally recommend the DECIMAL data type unless an existing application has an unchangeable requirement for FLOAT, e.g. a calculation expects FLOAT and its output cannot change.
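The rounding can be reproduced outside Redshift. A minimal Python sketch of the same behaviour, assuming only that FLOAT4 is IEEE 754 single precision (which it is in Redshift): 120.12 has no exact 32-bit binary representation, so the nearest representable value is stored, and printing it with six decimal places yields exactly the figure seen in the UNLOAD output.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest IEEE 754
    single-precision value, the way a FLOAT4 column stores it."""
    return struct.unpack("f", struct.pack("f", x))[0]


stored = to_float32(120.12)
print(stored != 120.12)    # True -- 120.12 is not exactly representable
print(f"{stored:.6f}")     # 120.120003 -- the value that appears in the unload file
```

The "extra precision" was therefore already present in the stored value; UNLOAD merely prints more digits of it than the original input had.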

Additionally, with DECIMAL you can use the new AZ64 compression encoding, which reduces the storage needed and improves query performance:
https://aws.amazon.com/about-aws/whats-new/2019/10/amazon-redshift-introduces-az64-a-new-compression-encoding-for-optimized-storage-and-high-query-performance/
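A sketch of that suggestion applied to the table from the question. DECIMAL(10,2) is an assumed precision/scale, not something from the original post; choose values that fit your data.

```sql
-- Same table as in the question, with amount as DECIMAL instead of FLOAT4.
-- decimal(10,2) is an assumed precision/scale; adjust to your data.
CREATE TABLE exampleTableName (
  id       int identity(1,1),
  accId    varchar(16) encode zstd,
  amount   decimal(10,2) encode az64,  -- exact decimal storage, AZ64 compression

  primary key(id)
)
distkey(accId)
interleaved sortkey(accId);
```

With DECIMAL, a value inserted as 120.12 is stored and unloaded as exactly 120.12.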
Licensed under: CC-BY-SA with attribution