Question

Is there a programmatic way to validate HiveQL statements for errors like basic syntax mistakes? I'd like to check statements before sending them off to Elastic MapReduce to save debugging time.


Solution

Yes there is!

It's pretty easy actually.

Steps:

1. Get a Hive Thrift client for your language.

I'm using Ruby, so I use this wrapper: https://github.com/forward/rbhive (gem install rbhive)

If you're not using Ruby, you can download the Hive source and run the Thrift compiler on the included .thrift definition files to generate client code for most languages.

2. Connect to Hive on port 10001 and run a DESCRIBE query.

In Ruby, it looks like this:

require 'rbhive'
RBHive.connect(host, port) do |connection|
    # fetch raises an exception if Hive cannot parse the statement
    connection.fetch("describe select * from categories limit 10")
end

If the query is invalid, the client throws an exception explaining why the syntax is invalid. If the syntax IS valid, DESCRIBE returns a query tree, which you can ignore in this case.
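If you want a reusable yes/no check, the DESCRIBE call can be wrapped in a small helper. This is a minimal sketch, assuming an rbhive connection whose fetch raises on invalid syntax as described above; the method name valid_hiveql_describe? is hypothetical, and rescuing StandardError is an assumption about which exception classes rbhive raises.

```ruby
# Minimal sketch: true if Hive accepts the statement, false otherwise.
# Assumes `connection` is an rbhive connection (anything responding to
# #fetch) and that a parse failure surfaces as a StandardError subclass.
def valid_hiveql_describe?(connection, query)
  connection.fetch("describe #{query}")
  true
rescue StandardError => e
  # Print Hive's explanation so you can see what went wrong
  warn "Invalid HiveQL: #{e.message}"
  false
end

# Usage (host and port are placeholders for your own values):
# RBHive.connect(host, port) do |connection|
#   valid_hiveql_describe?(connection, "select * from categories limit 10")
# end
```

Returning a boolean keeps the call site simple; the error message still goes to stderr via warn.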

Hope that helps.

OTHER TIPS

"describe select * from categories limit 10" didn't work for me.

Maybe this is related to the Hive version you're using; I'm on Hive 0.8.1.4.

After doing some research I found a similar solution to the one Matthew Rathbone provided:

Hive provides an EXPLAIN command that shows the execution plan for a query. The syntax for this statement is as follows:

EXPLAIN [EXTENDED] query

So, for anyone else using rbhive:

require 'rbhive'
RBHive.connect(host, port) do |c|
    # raises an exception on syntax errors or unknown tables/columns
    c.execute("explain select * from categories limit 10")
end

Note that you have to use c.execute instead of c.fetch: explain doesn't return any results when it succeeds, so with c.fetch rbhive throws an exception whether or not your syntax is correct.

execute throws an exception if you have a syntax error or if the table or column you are querying doesn't exist. If everything is fine, no exception is thrown, but you also receive no results, which is fine for this purpose.
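Since the goal is to catch mistakes before a job goes to Elastic MapReduce, the same EXPLAIN check can be run over every statement in a script. This is a minimal sketch, assuming an rbhive connection whose execute raises on errors as described above; hiveql_errors is a hypothetical helper name, and splitting on ';' is naive (it will break on semicolons inside string literals or comments).

```ruby
# Minimal sketch: run EXPLAIN on each statement of a HiveQL script and
# collect the failures. Assumes `connection` responds to #execute and
# raises a StandardError subclass on a bad statement (as rbhive does).
def hiveql_errors(connection, script)
  # Naive split: does not handle ';' inside strings or comments
  statements = script.split(";").map(&:strip).reject(&:empty?)

  # Map each failing statement to Hive's error message
  statements.each_with_object({}) do |statement, errors|
    begin
      connection.execute("explain #{statement}")
    rescue StandardError => e
      errors[statement] = e.message
    end
  end
end
```

Call it inside the RBHive.connect block, for example hiveql_errors(c, File.read("job.hql")) with a hypothetical script file; an empty hash means every statement passed. Because EXPLAIN also checks that tables and columns exist, a statement that uses a table created earlier in the same script will be reported as an error until that table actually exists.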

Hive 2.0 and later ship with the HPL/SQL tool (hplsql), which lets you validate Hive commands without actually running them.

Configuration: add the following hplsql-site.xml file to the hive/conf folder and restart Hive:

https://github.com/apache/hive/blob/master/hplsql/src/main/resources/hplsql-site.xml

To run hplsql and validate a query, use one of the following commands. To validate a single query:

hplsql -offline -trace -e 'select * from sample'

Or, to validate an entire file:

hplsql -offline -trace -f samplehql.sql

If the query syntax is correct, the response from hplsql looks something like this:

Ln:1 SELECT // type
Ln:1 select * from sample // command
Ln:1 Not executed - offline mode set // execution status

If the query syntax is wrong, hplsql reports the syntax issue in the query.
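If you want to call hplsql from a Ruby script instead of by hand, you can shell out to it with Open3. This is a minimal sketch that reuses only the flags shown above and assumes hplsql is on your PATH; hplsql_check and its runner parameter are hypothetical names (runner only exists so the command can be swapped out, e.g. in tests). I haven't verified whether hplsql exits with a non-zero status on a syntax error, so check the returned output as well as the exit status.

```ruby
require "open3"

# Minimal sketch: run hplsql in offline mode on a single query or a file
# and return its combined stdout/stderr plus whether it exited cleanly.
# Assumes `hplsql` is on the PATH; flags are the ones shown above.
def hplsql_check(query: nil, file: nil, runner: Open3.method(:capture2e))
  raise ArgumentError, "pass exactly one of query: or file:" unless query.nil? ^ file.nil?

  args = ["hplsql", "-offline", "-trace"]
  args += query ? ["-e", query] : ["-f", file]

  # Passing an argument list (not one string) avoids shell quoting issues
  output, status = runner.call(*args)
  { output: output, success: status.success? }
end

# Usage:
# result = hplsql_check(query: "select * from sample")
# puts result[:output]
```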

If your Hive version is older, you need to manually place the hplsql JARs in hive/lib before proceeding.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow