Question

I am trying to map and insert data to cassandra using hive. Having only little experience in both, I tried to read and follow this: http://www.datastax.com/docs/datastax_enterprise3.0/solutions/about_hive

From my understanding, this is what I do:

  • I create an external table in hive, using CassandraStorageHandler, mapping to the right keyspace and to the cassandra column using :key, :column and :value special names:

    CREATE EXTERNAL TABLE test_table (myproductid INT , mydate TIMESTAMP , quantity BIGINT )
    STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
    WITH SERDEPROPERTIES ( "cassandra.ks.name" = "test",
    "cassandra.columns.mapping" = ":key,:column,:value");
    
  • I insert data in hive (nothing specific):

    INSERT OVERWRITE TABLE test_table
    SELECT anId, aTimestamp, COUNT(*)
    FROM myDataTable
    GROUP BY anId, aTimestamp;
    

Everything seems find from Hive, the table description is right and data appear in the table as expected. From Cassandra, DESCRIBE test_table shows:

CREATE TABLE test_table (
key blob,
column1 blob,
value blob,
PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=864000 AND
read_repair_chance=0.100000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'SnappyCompressor'};

Data is not typed, and if I SELECT it, it appears as binary data:

0x3238373639 | 0x323031332d30312d30322030303a30303a3030 |   0x31

I tried to follow the documentation and add to SERDEPROPERTIES:

"cassandra.cf.validatorType" = "Int32Type, DateType, LongType"

and also :

"cassandra.cql3.type" = "int, timestamp, bigint"

But no change.

I read some more documentation, specifically about UDFs to convert data types, but I'd like to avoid this overhead if it is not essential.

Am I missing something? Am I missing a step or things are not done right?

Thanks!

Was it helpful?

Solution

"cassandra.cf.validatorType" = "Int32Type, DateType, LongType" should match the Column Family/Table column data type. Your test_table columns are blob, so you have to encode it back to binary when you insert data to the table.

CassandraStorageHandler is depreciated. You can refer to http://www.datastax.com/docs/datastax_enterprise3.2/solutions/about_hive for the latest CqlStorageHandler.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top