Question
Is there a programmatic way to validate HiveQL statements for errors like basic syntax mistakes? I'd like to check statements before sending them off to Elastic Map Reduce in order to save debugging time.
Solution
Yes there is!
It's pretty easy actually.
Steps:
1. Get a hive thrift client in your language.
I'm in ruby so I use this wrapper - https://github.com/forward/rbhive (gem install rbhive)
If you're not in ruby, you can download the hive source and run thrift on the included thrift configuration files to generate client code in most languages.
2. Connect to hive on port 10001 and run a describe query
In ruby this looks like this:
RBHive.connect(host, port) do |connection|
connection.fetch("describe select * from categories limit 10")
end
If the query is invalid the client will throw an exception with details of why the syntax is invalid. Describe will return you a query tree if the syntax IS valid (which you can ignore in this case)
Hope that helps.
OTHER TIPS
"describe select * from categories limit 10" didn't work for me.
Maybe this is related to the Hive version one is using. I'm using Hive 0.8.1.4
After doing some research I found a similar solution to the one Matthew Rathbone provided:
Hive provides an EXPLAIN command that shows the execution plan for a query. The syntax for this statement is as follows:
EXPLAIN [EXTENDED] query
So for everyone who's also using rbhive:
RBHive.connect(host, port) do |c|
c.execute("explain select * from categories limit 10")
end
Note that you have to substitute c.fetch with c.execute, since explain won't return any results if it succeeds => rbhive will throw an exception no matter if your syntax is correct or not.
execute will throw an exception if you've got an syntax error or if the table / column you are querying doesn't exist. If everything is fine, no exception is thrown but also you'll receive no results, which is not an evil thing
In the latest version hive 2.0 comes with hplsql tool which allows us to validate hive commands without actually running them.
Configuration: add the below XML in hive/conf folder and restart hive
https://github.com/apache/hive/blob/master/hplsql/src/main/resources/hplsql-site.xml
To Run the hplsql and validate the query , please use the below command: To validate Singe Query
hplsql -offline -trace -e 'select * from sample'
(or) To Validate Entire File
hplsql -offline -trace -f samplehql.sql
If the query syntax is correct , the response from hplsql would be something like this:
Ln:1 SELECT // type
Ln:1 select * from sample // command
Ln:1 Not executed - offline mode set // execution status
if the query Syntax is wrong , the syntax issue in the query will be reported
If the hive version is older, we need to manually place the hplsql jars inside the hive/lib and proceed.