Question

I am using Microsoft Azure HDInsight. My data is laid out in the following format:

container/folder/year/month/date/file1.csv

container/folder/year/month/date/file2.csv, and so on.

I created an external table with partitions using the queries below:

'drop table partition;
CREATE EXTERNAL TABLE partition (id string, event timestamp and so on)
PARTITIONED BY (year INT, month INT, day INT)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE'

The output reported that the query was submitted successfully and that the external table was created.

Then I ran an ALTER TABLE command to add the partition:

'ALTER TABLE partition ADD PARTITION(year=2014, month=1, day=1)
 LOCATION'wasb://$containerName@$storageAccountName.blob.core.windows.net/containerName/folderName/2014/01/01';'

This did not produce any errors either.

But when I ran a simple SELECT statement, it returned nothing from the data files:

'select * from partition where year=2014 AND month=01 AND day=01 limit 10;'

I also tried:

'select * from partition limit 10;'

Neither SELECT statement returned any rows. I can't figure out what went wrong. Any suggestions, please?


Solution

If your data files are stored in blob storage at a URI like:

https://<account>.blob.core.windows.net/<container>/folderName/2014/01/01

then your wasb: URI needs to be:

wasb://<container>@<account>.blob.core.windows.net/folderName/2014/01/01

or, if this is the default storage account and container for your cluster, you can just use:

wasb:///folderName/2014/01/01

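The extra containerName in the LOCATION of your ALTER TABLE statement might be what is throwing things off. In a wasb: URI the container is already named before the @, so repeating it as the first path segment points the partition at a path that is effectively empty.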
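
As a rough sketch of the fix, assuming the files really do live under folderName/2014/01/01 in the container, the existing partition could be repointed like this. <container> and <account> are placeholders for your own values, and the backticks around the table name are there only because PARTITION is a HiveQL keyword; substitute your real table name if it differs.

```sql
-- Placeholders (assumptions): <container>, <account>; folderName comes from the question.
-- Repoint the existing partition at the path without the repeated container name.
ALTER TABLE `partition` PARTITION (year=2014, month=1, day=1)
SET LOCATION 'wasb://<container>@<account>.blob.core.windows.net/folderName/2014/01/01';

-- If this is the cluster's default container, the short form should also work:
--   SET LOCATION 'wasb:///folderName/2014/01/01';

-- Check which location Hive has recorded for the partition.
DESCRIBE FORMATTED `partition` PARTITION (year=2014, month=1, day=1);

-- Re-run the original query against the corrected partition.
SELECT * FROM `partition` WHERE year=2014 AND month=1 AND day=1 LIMIT 10;
```

Dropping the partition and adding it again with the corrected LOCATION should have the same effect; since the table is external, neither approach touches the files themselves.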

Licensed under: CC-BY-SA with attribution