Question

I want to import tables that contain BLOB fields from Oracle into HBase using Sqoop. I used the command below:

sqoop import --connect jdbc:oracle:thin:@10.2.152.241:1521:QASOURCE \
  --username devsrc --table fnd_lobs --password devsrc --hbase-table fndlobs \
  --column-family cf --hbase-row-key file_id --columns file_id,file_name,file_data,upload_date,expiration_date,program_name \
  --hbase-create-table --as-sequencefile --verbose -m 1

Here file_data is a BLOB column.

The import runs, but the resulting HBase table does not show the file_data field.

HBase output:

 238883                            column=cf:program_name, timestamp=1386805999370, value=PER_ADDR_gb_UK.pdf                          
 238883                            column=cf:upload_date, timestamp=1386805999370, value=2004-07-01 04:33:40.0                        
 238884                            column=cf:file_name, timestamp=1386805999370, value=/appltop/115/per/11.5.0/patch/115/publisher/tem
                                   plates/PER_WS1_gb_UK.pdf                                                                           
 238884                            column=cf:program_name, timestamp=1386805999370, value=PER_WS1_gb_UK.pdf                           
 238884                            column=cf:upload_date, timestamp=1386805999370, value=2004-07-01 04:33:41.0                        
 238885                            column=cf:file_name, timestamp=1386805999370, value=/appltop/115/per/11.5.0/patch/115/publisher/tem
                                   plates/PER_WS3_gb_UK.pdf                                                                           
 238885                            column=cf:program_name, timestamp=1386805999370, value=PER_WS3_gb_UK.pdf                           
 238885                            column=cf:upload_date, timestamp=1386805999370, value=2004-07-01 04:33:49.0                        
 238886                            column=cf:file_name, timestamp=1386805999370, value=/appltop/115/per/11.5.0/patch/115/publisher/tem
                                   plates/PER_WS4_gb_UK.pdf       

The output does not show the file_data field. How can I fix this?

Solution

Sqoop has a --hbase-bulkload option, which enables bulk loading.

Below is the Sqoop command for a bulk load:

sqoop import -Dsqoop.hbase.add.row.key=true \
  --connect jdbc:oracle:thin:@ipaddress:portnumber:DBName \
  --username username --table fnd_lobs_dupl --password pwd \
  --hbase-table blobs --column-family cf \
  --columns file_id,file_name,file_data,upload_date --hbase-row-key file_id \
  --hbase-create-table --inline-lob-limit 0 --hbase-bulkload \
  --as-sequencefile -m 1
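
After the import, one way to check whether the BLOB column arrived is to ask the HBase shell for that single cell. The sketch below is only an illustration: hbase_get_cmd is a hypothetical helper, and the table name (blobs), row key (238883) and column (cf:file_data) are assumptions taken from the command and the sample output in this thread.

```shell
# Hypothetical helper: print an HBase shell "get" statement for one cell.
# Arguments: table name, row key, column (family:qualifier).
hbase_get_cmd() {
  printf "get '%s', '%s', '%s'\n" "$1" "$2" "$3"
}

# Show the statement that would be sent to the HBase shell.
hbase_get_cmd blobs 238883 cf:file_data

# To run it against a cluster (requires the hbase client on PATH):
#   hbase_get_cmd blobs 238883 cf:file_data | hbase shell
```

If the cell is missing, the BLOB column was probably skipped during the import.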

OTHER TIPS

It depends on which version of Sqoop you are using. As of version 1.4.4, you cannot import BLOB columns into HBase.

Sqoop’s direct mode does not support imports of BLOB, CLOB, or LONGVARBINARY columns.

However, if you compile Sqoop from trunk, following https://github.com/apache/sqoop/blob/trunk/COMPILING.txt, a new option is available:

Apache Sqoop Trunk: https://github.com/apache/sqoop/tree/trunk

The import works with this option because HBaseBulkImportMapper.java calls loadLargeObjects(lobLoader);

--hbase-bulkload     Enables bulk loading

To decrease the load on hbase, Sqoop can do bulk loading as opposed to direct writes. To use bulk loading, enable it using --hbase-bulkload.

Again, 1.4.4 does not have this argument.
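
Rather than relying on the version number, you can check whether your own build lists the option. This is a minimal sketch, assuming that sqoop help import prints the import arguments on your installation; has_bulkload_flag is a hypothetical helper name, and the sample help text is just the option line quoted above.

```shell
# Hypothetical helper: succeed if the help text on stdin lists --hbase-bulkload.
has_bulkload_flag() {
  grep -q -e '--hbase-bulkload'
}

# Example using a captured line of help text instead of a live sqoop call.
sample_help='--hbase-bulkload     Enables bulk loading'

if printf '%s\n' "$sample_help" | has_bulkload_flag; then
  echo "bulk load supported"
else
  echo "bulk load not supported"
fi

# Against a real installation (requires sqoop on PATH):
#   sqoop help import 2>&1 | has_bulkload_flag && echo "bulk load supported"
```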

Licensed under: CC-BY-SA with attribution