Question

I've been able to connect to a hiveserver (1) created with

hive --service hiveserver -v -p 10001

using the following Java:

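import org.apache.hadoop.hive.service.ThriftHive;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;

TSocket transport = new TSocket("hive.example.com", 10001);
transport.setTimeout(999999999); // socket timeout in milliseconds
TBinaryProtocol protocol = new TBinaryProtocol(transport);
ThriftHive.Client client = new ThriftHive.Client(protocol);

transport.open();
client.execute("SHOW TABLES");
System.out.println(client.fetchOne()); // prints the first row of the result
transport.close();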

Does an equivalent exist for hiveserver2, and if so, what is it? The best I have found is a design proposal, and I have yet to find any documentation. It looks like Cloudera has something set up for Python.

Alternatively, what is the best way to run arbitrary Hive queries from Java? If it is relevant, I'm running on Hortonworks Data Platform 1.2.


Solution 2

Have you considered using the HiveClient JDBC interface?
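As a rough illustration of the JDBC route, here is a minimal sketch. It assumes that the hive-jdbc jar from your distribution is on the classpath and that it includes the HiveServer2 driver (`org.apache.hive.jdbc.HiveDriver`, with URLs of the form `jdbc:hive2://host:port/database`). The class name `HiveJdbcSketch`, the user name `hive`, the empty password, and the `default` database are placeholders for illustration, not values taken from the question.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

class HiveJdbcSketch {

    // Driver class for HiveServer2; assumed to be provided by the hive-jdbc jar.
    static final String DRIVER = "org.apache.hive.jdbc.HiveDriver";

    /** Builds a HiveServer2 JDBC URL of the form jdbc:hive2://host:port/database. */
    static String buildUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    /** Runs SHOW TABLES and returns the first column of every row. */
    static List<String> showTables(String url, String user)
            throws SQLException, ClassNotFoundException {
        Class.forName(DRIVER); // fails here if hive-jdbc is not on the classpath
        List<String> tables = new ArrayList<>();
        // Placeholder credentials: user as given, empty password.
        try (Connection conn = DriverManager.getConnection(url, user, "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                tables.add(rs.getString(1));
            }
        }
        return tables;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: HiveJdbcSketch <host> <port>");
            return;
        }
        String url = buildUrl(args[0], Integer.parseInt(args[1]), "default");
        for (String table : showTables(url, "hive")) {
            System.out.println(table);
        }
    }
}
```

Going through JDBC means the driver takes care of the Thrift transport and session handling, so none of the generated Thrift classes need to be used directly.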

OTHER TIPS

The server process is expecting a SASL handshake from the client (which is why TSaslServerTransport appears in the stack trace). Wrap your TSocket connection in a TSaslClientTransport; you will also need to pass an appropriately configured SaslClient instance to its constructor. Alternatively, you can modify hive-site.xml to turn SASL authentication off:

<property>
  <name>hive.server2.authentication</name>
  <value>NOSASL</value>
</property>
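If you keep SASL enabled, the SaslClient mentioned above can be created with the JDK alone. The following is only a sketch under assumptions: it assumes the server accepts the PLAIN mechanism (which depends on how hive.server2.authentication is configured), and the class name `PlainSaslSketch`, the user `hive`, the password `secret`, and the protocol string are placeholders. The Thrift wrapping step is described in a comment rather than executed, so the block has no dependency on libthrift.

```java
import java.nio.charset.StandardCharsets;
import javax.security.auth.callback.Callback;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.callback.NameCallback;
import javax.security.auth.callback.PasswordCallback;
import javax.security.auth.callback.UnsupportedCallbackException;
import javax.security.sasl.Sasl;
import javax.security.sasl.SaslClient;
import javax.security.sasl.SaslException;

class PlainSaslSketch {

    /** Creates a PLAIN SaslClient that answers name/password callbacks with fixed values. */
    static SaslClient plainClient(final String user, final String password, String host)
            throws SaslException {
        CallbackHandler handler = new CallbackHandler() {
            @Override
            public void handle(Callback[] callbacks) throws UnsupportedCallbackException {
                for (Callback cb : callbacks) {
                    if (cb instanceof NameCallback) {
                        ((NameCallback) cb).setName(user);
                    } else if (cb instanceof PasswordCallback) {
                        ((PasswordCallback) cb).setPassword(password.toCharArray());
                    } else {
                        throw new UnsupportedCallbackException(cb);
                    }
                }
            }
        };
        // "hive" as the protocol name is a placeholder; PLAIN does not use it.
        SaslClient client = Sasl.createSaslClient(
                new String[] {"PLAIN"}, null, "hive", host, null, handler);
        if (client == null) {
            throw new SaslException("No SASL provider supports PLAIN");
        }
        return client;
    }

    /** Returns the first message a PLAIN client sends: NUL, user, NUL, password. */
    static byte[] initialResponse(String user, String password) {
        try {
            SaslClient client = plainClient(user, password, "hive.example.com");
            return client.hasInitialResponse()
                    ? client.evaluateChallenge(new byte[0])
                    : new byte[0];
        } catch (SaslException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] response = initialResponse("hive", "secret");
        // Show the NUL separators visibly instead of printing raw control characters.
        String printable = new String(response, StandardCharsets.UTF_8).replace("\0", "\\0");
        System.out.println("PLAIN initial response: " + printable);
        // With libthrift on the classpath, the client from plainClient(...) would be
        // handed to TSaslClientTransport together with the TSocket, and the resulting
        // transport passed to TBinaryProtocol in place of the bare socket.
    }
}
```

The point of the sketch is that the server waits for this negotiation before it reads any Thrift request, which is why a bare TSocket client appears to hang.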

After a bit of searching, I managed to generate a Java Thrift server and client for Hiveserver2 using the cli_service.thrift found in Hortonworks Data Platform 1.2. If anyone is interested, you can find it in this tarball. Once I did that and imported the resulting files, my IDE let me know that the Hiveserver2 client API was in the jars I had all along. Unfortunately, I was not able to find it in the Apache Hive jars, so in Maven, adding this to your pom.xml doesn't quite cut it:

<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-service</artifactId>
  <version>0.10.0</version>
</dependency>

I added the 0.10.0.21 version of hive-server for the HDP 1.2 release to my repository and referenced that instead. Then I manually added all of its dependencies to my pom.xml, including several other 0.10.0.21 Hive jars from HDP. Since this process is somewhat tangential to my answer, I won't go into more detail about it unless someone asks.

Actually getting the API to work is another thing entirely. Through a combination of poking around the dozens of files generated by thrift, looking at cli_service.thrift, and looking at the Apache JDBC implementation (which is the only example I know of for writing against the Hiveserver2 thrift API), I came up with the following code which is almost a direct translation of the Hiveserver (1) example:

import java.util.List;
// Generated Thrift classes; this is the package used by the Hive jars and may
// differ if you generate the code from cli_service.thrift yourself.
import org.apache.hive.service.cli.thrift.*;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;

TSocket transport = new TSocket("hive.example.com", 10002);
transport.setTimeout(999999999); // socket timeout in milliseconds
TBinaryProtocol protocol = new TBinaryProtocol(transport);
TCLIService.Client client = new TCLIService.Client(protocol);

// Open the connection and start a session.
transport.open();
TOpenSessionReq openReq = new TOpenSessionReq();
TOpenSessionResp openResp = client.OpenSession(openReq);
TSessionHandle sessHandle = openResp.getSessionHandle();

// Submit the statement within that session.
TExecuteStatementReq execReq = new TExecuteStatementReq(sessHandle, "SHOW TABLES");
TExecuteStatementResp execResp = client.ExecuteStatement(execReq);
TOperationHandle stmtHandle = execResp.getOperationHandle();

// Fetch at most one row, starting from the first.
TFetchResultsReq fetchReq = new TFetchResultsReq(stmtHandle, TFetchOrientation.FETCH_FIRST, 1);
TFetchResultsResp resultsResp = client.FetchResults(fetchReq);

TRowSet resultsSet = resultsResp.getResults();
List<TRow> resultRows = resultsSet.getRows();
for (TRow resultRow : resultRows) {
    System.out.println(resultRow.toString());
}

// Release the operation, then the session, then the socket.
TCloseOperationReq closeReq = new TCloseOperationReq();
closeReq.setOperationHandle(stmtHandle);
client.CloseOperation(closeReq);
TCloseSessionReq closeConnectionReq = new TCloseSessionReq(sessHandle);
client.CloseSession(closeConnectionReq);

transport.close();

This was run against a Hiveserver2 server launched with:

export HIVE_SERVER2_THRIFT_PORT=10002;hive --service hiveserver2

Unfortunately, I'm getting the same behavior as when I tried to run a Hiveserver (1) client against Hiveserver2. transport.open() works, but the first request (client.OpenSession() in Hiveserver2's case, as opposed to client.execute() for Hiveserver (1)) hangs. Wireshark shows that the TCP segment is ACK'd. Nothing appears on the console or in the logs until I kill my client or the request times out, at which point I get:

13/03/14 11:15:33 ERROR server.TThreadPoolServer: Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset
    at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:189)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
    at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:182)
    at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
    at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
    at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
    at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
    ... 4 more
Caused by: java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:168)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
    ... 10 more

Someone seems to have run into a similar problem with a Python client. I lack the reputation to post a link, so if you want to see their (unresolved) question, search Google for "hiveserver2 thrift client python grokbase".

Since it doesn't work, this is only a partial answer to my question. However, now that I have the API, I will post a new question about getting it to work. I won't be able to link to that either, so if you want to see the follow-up, look in my user history.

Licensed under: CC-BY-SA with attribution