Question

I've been trying to achieve this without success. I tried to use the Hive distribution included in DSE with Shark; however, Shark ships with a patched, older version of Hive (0.9, I believe), and the incompatibilities make running Shark impossible. I also tried to use Shark's patched Hive instead of DSE's, reusing DSE's Hive configuration (to make CFS available to Shark's Hive distribution), only to discover a long list of dependencies on the full DSE classpath (Hive, Cassandra, Hadoop, etc.).

It is possible to achieve this with C* by following the instructions on this blog.

Am I being stubborn by trying to use CFS? Is there a way to do this on DSE, with or without CFS?

Thanks!

Here are some shark-env.sh highlights:

export HIVE_HOME="/home/cassserv/hive-0.9.0-bin/" #choosing this when using hive distro.
#export HIVE_HOME="/usr/share/dse/hive/" #choosing this when using dse distro.
export HIVE_CONF_DIR="/home/cassserv/hive-0.9.0-bin/conf" #edited dse hive-site.xml conf file
#export HIVE_CONF_DIR="/etc/dse/hive" #original dse hive-site.xml conf file
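The two setups above are switched by commenting lines in and out. A minimal sketch of the same switch as one conditional in shark-env.sh; `USE_DSE_HIVE` is a hypothetical variable name (neither Shark nor DSE reads it), and the paths are the ones from the question:

```shell
# Hypothetical toggle: USE_DSE_HIVE=1 points Shark at DSE's bundled Hive,
# unset (or any other value) uses the standalone Hive 0.9 that Shark 0.8.1 expects.
if [ "${USE_DSE_HIVE:-0}" = "1" ]; then
  export HIVE_HOME="/usr/share/dse/hive/"
  export HIVE_CONF_DIR="/etc/dse/hive"                        # original DSE hive-site.xml
else
  export HIVE_HOME="/home/cassserv/hive-0.9.0-bin/"
  export HIVE_CONF_DIR="/home/cassserv/hive-0.9.0-bin/conf"   # edited copy of DSE's hive-site.xml
fi
```

Running `USE_DSE_HIVE=1 ./bin/shark` (or your usual launcher) would then pick the DSE paths without editing the file.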

Edited hive-site.xml highlights:

<property>
  <name>hive.hwi.war.file</name>
  <!--<value>lib/hive-hwi.war</value>-->
  <value>lib/hive-hwi-0.9.0-shark-0.8.1.war</value><!--edited to use Shark's distro-->
  <description>This sets the path to the HWI war file, relative to ${HIVE_HOME}</description>
</property>

<property>
  <name>hadoop.bin.path</name>
  <!--<value>${dse.bin}/dse hadoop</value>-->
  <value>/usr/share/dse hadoop</value><!--edited to override variable-->
</property>

Here's Shark's output when I try to use Shark's patched Hive distro with DSE's Hive configuration. The missing class is in the dse.jar file:

Exception in thread "main" org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:com.datastax.bdp.hadoop.hive.metastore.CassandraHiveMetaStore class not found)
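To confirm which jar provides the missing metastore class, and to gather DSE's jars if you decide to put them on Shark's classpath instead of cutting the dependency, a bash sketch like the following may help. `find_class_jar` and `build_classpath` are hypothetical helper names; the `/usr/share/dse` layout and the use of `SPARK_CLASSPATH` as the variable Shark picks up are assumptions, so check both against your install:

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, paths in the usage lines are assumed.

# Print each jar under ROOT whose entry list contains CLASS (dotted class name).
# Zip entry names are stored uncompressed, so a fixed-string grep finds them
# without unpacking; `unzip -l` or `jar tf` give an exact listing if needed.
find_class_jar() {
  local root="$1" class="$2" entry
  entry="$(printf '%s' "$class" | tr '.' '/').class"   # a.b.C -> a/b/C.class
  find "$root" -type f -name '*.jar' -exec grep -lF -- "$entry" {} +
}

# Join every *.jar directly inside DIR into one colon-separated classpath.
build_classpath() {
  local dir="$1" cp="" jar
  for jar in "$dir"/*.jar; do
    [ -e "$jar" ] || continue   # unmatched glob: directory missing or holds no jars
    cp="${cp:+$cp:}$jar"
  done
  printf '%s\n' "$cp"
}

# Usage (assumed DSE layout and assumed SPARK_CLASSPATH hook):
#   find_class_jar /usr/share/dse com.datastax.bdp.hadoop.hive.metastore.CassandraHiveMetaStore
#   export SPARK_CLASSPATH="$SPARK_CLASSPATH:$(build_classpath /usr/share/dse/lib)"
```

Adding only dse.jar is likely to surface the next missing class, which matches the long dependency list described in the question.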

I'm trying to figure out if I can do something like this in the edited hive-site.xml:

<property>
  <name>fs.cfs.impl</name>
  <value>org.apache.cassandra.hadoop.fs.CassandraFileSystem</value>
</property>
<property>
  <name>hive.metastore.rawstore.impl</name>
  <!--<value>com.datastax.bdp.hadoop.hive.metastore.CassandraHiveMetaStore</value>-->
  <value>org.apache.hadoop.hive.metastore.ObjectStore</value>
  <description>Use Hive's default JDO-based ObjectStore instead of the Cassandra Hive RawStore</description>
</property>

in order to remove any dependency on the DSE libraries. I also might not use DSE's Hadoop distribution.


Solution

DSE 4.5 ships with Spark and Shark 0.9 integrated. You don't need to set anything up: it works out of the box, the same way Pig and Hive did in earlier releases.

Licensed under: CC-BY-SA with attribution