Question

I am trying to load 73 local files into Redshift. The data do not use a common delimiter such as a comma or tab; instead, fields are separated by 13 spaces. Is there a way to treat these spaces as a delimiter?

I am using the same example from AWS documentation. The actual data looks like the following:

1          ToyotaPark          Bridgeview          IL
2          ColumbusCrewStadium          Columbus          OH
3          RFKStadium          Washington          DC
4          CommunityAmericaBallpark          KansasCity          KS
5          GilletteStadium          Foxborough          MA
6          NewYorkGiantsStadium          EastRutherford          NJ
7          BMOField          Toronto          ON
8          TheHomeDepotCenter          Carson          CA
9          Dick'sSportingGoodsPark          CommerceCity          CO
10          PizzaHutPark          Frisco          TX

Sample code:

create table venue_new(
    venueid smallint not null,
    venuename varchar(100) not null,
    venuecity varchar(30),
    venuestate char(2),
    venueseats integer not null default '1000');

copy venue_new(venueid, venuename, venuecity, venuestate) 
from 's3://mybucket/data/venue_noseats.txt' 
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
delimiter '          ';

The actual data has about 80 columns of varying widths. The good thing is that no data element contains a space. Rather than specifying a fixed width for each column, is there an easier way to split the data on the runs of spaces?


Solution

The COPY command only supports single-character delimiters, so you cannot import this data directly into your target table. Instead, create a single-column staging table that holds each raw line (varchar(200) fits this sample; for the real 80-column data, size it to your longest line, up to Redshift's varchar(65535) maximum):

create table stage_venue (venue_record varchar(200));

Run your COPY command without a delimiter clause. COPY's default delimiter is the pipe character, |, so each whole line lands in the single staging column, as long as your data contains no pipes:

copy stage_venue from 's3://mybucket/data/venue_noseats.txt' credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>';
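Before writing the INSERT, it can be worth confirming the actual width of the space runs in the raw file. A minimal check in Python, using an inline sample line built to mirror the data above rather than the real file:

```python
import re

# Sample line mirroring the data above; in practice, read a line
# from a local copy of venue_noseats.txt instead.
line = (10 * " ").join(["1", "ToyotaPark", "Bridgeview", "IL"])

# Lengths of every run of consecutive spaces between fields.
runs = {len(r) for r in re.findall(r" +", line)}
print(runs)  # → {10}
```

If this prints more than one run length, the file is not uniformly delimited and split_part on a fixed-width string will misalign columns.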

Then use the split_part function to populate your target table (note that I count only 10 spaces, not 13, in your sample):

insert into venue_new (venueid, venuename, venuecity, venuestate)
select
    split_part(venue_record, '          ', 1)::smallint,
    split_part(venue_record, '          ', 2),
    split_part(venue_record, '          ', 3),
    split_part(venue_record, '          ', 4)
from stage_venue;
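With roughly 80 columns, writing out the split_part list by hand is tedious. A short Python sketch can generate the INSERT statement instead; the column names and the 10-space delimiter here are assumptions to adapt to your schema:

```python
# Generate the INSERT ... SELECT split_part(...) statement for a wide
# table. Column names below are hypothetical placeholders.
columns = ["venueid", "venuename", "venuecity", "venuestate"]
delimiter = 10 * " "  # the run of spaces separating fields

parts = ",\n    ".join(
    f"split_part(venue_record, '{delimiter}', {i})"
    for i in range(1, len(columns) + 1)
)
sql = (
    f"insert into venue_new ({', '.join(columns)})\n"
    f"select\n    {parts}\nfrom stage_venue;"
)
print(sql)
```

Paste the printed statement into your SQL client, adding explicit casts (for example ::smallint) on any non-character columns.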
Licensed under: CC-BY-SA with attribution