Question

This seems like it should be easy, but isn't. I'm migrating a query from MySQL to Redshift of the form:

INSERT INTO table
(...)
VALUES
(...)
ON DUPLICATE KEY UPDATE
  value = LEAST(value, VALUES(value))

For primary keys we're inserting that aren't already in the table, those are just inserted. For primary keys that are already in the table, we update the row's values based on a condition that depends on the existing and new values in the row.
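For concreteness, the MySQL statement looks something like this (hypothetical names: a table target with key id and a single value column):

insert into target (id, value)
values (1, 42), (2, 7)
on duplicate key update
  value = least(value, values(value));

New ids are simply inserted; for an existing id the row keeps the smaller of its current value and the incoming one.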

http://docs.aws.amazon.com/redshift/latest/dg/merge-replacing-existing-rows.html does not work, because filter_expression in my case depends on the current entries in the table. I'm currently creating a staging table, inserting into it with a COPY statement and am trying to figure out the best way to merge the staging and real tables.

Solution

I'm having to do exactly this for a project right now. The method I'm using involves 3 steps:

1.

Run an update that addresses changed fields (I'm updating whether or not the fields have changed, but you can certainly qualify that; see the sketch after step 3):

update table1 set col1=s.col1, col2=s.col2, ...
from stagetable s
where table1.primkey = s.primkey;

2.

Run an insert that addresses new records:

insert into table1
select s.* 
from stagetable s 
 left outer join table1 t on s.primkey=t.primkey
where t.primkey is null;

3.

Mark rows no longer in the source as inactive (our reporting tool uses views that filter inactive records):

update table1
set is_active_flag='N', last_updated=sysdate
where not exists
  (select 1 from stagetable s where s.primkey = table1.primkey);
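
To get the questioner's "keep the smaller of the existing and new value" behaviour, step 1 can also be qualified so a row is only touched when the staged value wins. A minimal sketch, assuming a single value column alongside the same table names as above:

update table1
set value = s.value
from stagetable s
where table1.primkey = s.primkey
  and s.value < table1.value;

Redshift also supports LEAST(), so the same effect can be had with set value = least(table1.value, s.value) and no extra predicate.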

OTHER TIPS

It is possible to create a temp table. In Redshift it is better to delete and then insert the records. Check this doc:

http://docs.aws.amazon.com/redshift/latest/dg/merge-replacing-existing-rows.html
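
A minimal sketch of that delete-and-insert pattern from the linked doc, reusing the staging table and key names from the answer above (stagetable, primkey):

begin transaction;

-- drop the target rows that are about to be replaced
delete from table1
using stagetable s
where table1.primkey = s.primkey;

-- load every staged row: replacements and brand-new keys alike
insert into table1
select * from stagetable;

end transaction;

Note that plain delete-and-insert discards the old values, so for the original question the smaller-of-old-and-new logic would have to be applied while building the staging table.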

Here is a fully working approach for Redshift.

Assumptions:

A. Data is available in S3 in gzip format with '|'-separated columns; it may contain some garbage rows (see maxerror below).

B. A sales fact table with two dimension tables, to keep it simple (TIME and SKU; a SKU may belong to many groups and categories).

C. You have a sales table like this:

CREATE TABLE sales (
  sku_id int encode zstd,
  date_id int encode zstd,
  quantity numeric(10,2) encode delta32k
);

1) Create a staging table; it should resemble the online table used by your app(s).

CREATE TABLE stg_sales_onetime (
 sku_number varchar(255) encode zstd,
 time varchar(255) encode zstd,
 qty_str varchar(20) encode zstd,
 quantity numeric(10,2) encode delta32k,
 sku_id int encode zstd,
 date_id int encode zstd
);

2) Copy the data from S3 (this could also be done over SSH).

copy stg_sales_onetime (sku_number,time,qty_str) from
  's3://<bucket_name>/<full_file_path>' CREDENTIALS 'aws_access_key_id=<your_key>;aws_secret_access_key=<your_secret>' delimiter '|' ignoreheader 1 maxerror as 1000 gzip;
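
Since maxerror lets up to 1000 bad rows be skipped, it's worth checking what the COPY actually rejected. Redshift records this in the STL_LOAD_ERRORS system table:

select starttime, filename, line_number, colname, err_reason
from stl_load_errors
order by starttime desc
limit 20;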

3) This step is optional. If the data isn't well formatted, this is your transformation step (e.g. converting the string quantity '12.555654' to the number 12.56):

update  stg_sales_onetime set quantity=convert(decimal(10,2),qty_str);

4) Populate the correct IDs from the dimension tables:

update stg_sales_onetime set sku_id=<your_sku_dimension_table>.sku_id from <your_sku_dimension_table>
  where stg_sales_onetime.sku_number=<your_sku_dimension_table>.sku_number;
update stg_sales_onetime set date_id=<your_time_dimension_table>.time_id from <your_time_dimension_table>
  where stg_sales_onetime.time=<your_time_dimension_table>.time;

5) Finally, the data is ready to go from the staging table to the online sales table:

insert into sales(sku_id,date_id,quantity) select sku_id,date_id,quantity from stg_sales_onetime;
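
Step 5 appends blindly, so re-running the load (or loading keys that already exist in sales) would create duplicates. If that can happen, the delete-then-insert pattern shown earlier applies here too; a sketch, assuming (sku_id, date_id) identifies a sales row:

begin transaction;

delete from sales
using stg_sales_onetime s
where sales.sku_id = s.sku_id
  and sales.date_id = s.date_id;

insert into sales (sku_id, date_id, quantity)
select sku_id, date_id, quantity from stg_sales_onetime;

end transaction;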
Licensed under: CC-BY-SA with attribution