Question

I'm trying to run a Hive query in the Cloudera Hue interface. It works fine for a few hundred records, but fails when I run it on a bigger dataset. I searched online and found many similar errors, but not the exact solution I'm looking for. I'm using regexp_replace in my Hive query, but I don't think it is the cause of the exception (my impression is that it handles string and NULL types without trouble).

The error I get is java.util.regex.PatternSyntaxException: Unmatched closing ')' near index 12

UPDATE: This is the record causing the problem:

columnA:ReadData (or ListDirectory)

columnB:ListDirectory)

columnC: NULL

columnD: NULL

My query: regexp_replace(columnA, columnB, "") as columnA, columnB, regexp_replace(columnC, columnD, "") as columnC,

Please let me know where I'm going wrong.

Here is the interesting part of the log (record details elided):

    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:565)
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: DFSOutputStream is closed
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:620)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:546)
    ... 9 more
Caused by: java.io.IOException: DFSOutputStream is closed
    at org.apache.hadoop.hdfs.DFSOutputStream.isClosed(DFSOutputStream.java:1239)
    at org.apache.hadoop.hdfs.DFSOutputStream.writeChunk(DFSOutputStream.java:1407)
    at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:161)
    at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:104)
    at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:90)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:54)
    at java.io.DataOutputStream.write(DataOutputStream.java:90)
    at org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat$1.write(HiveIgnoreKeyTextOutputFormat.java:86)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:606)
    ... 18 more

2013-05-31 16:35:20,090 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 3 finished. closing...
2013-05-31 16:35:20,090 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 3 forwarded 90478 rows
2013-05-31 16:35:20,090 INFO org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0
2013-05-31 16:35:20,090 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 finished. closing...
2013-05-31 16:35:20,090 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarded 90478 rows
2013-05-31 16:35:20,090 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 finished. closing...
2013-05-31 16:35:20,090 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 forwarded 90478 rows
2013-05-31 16:35:20,090 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 2 finished. closing...
2013-05-31 16:35:20,090 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 2 forwarded 0 rows
2013-05-31 16:35:20,090 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: TABLE_ID_1_ROWCOUNT:90478
2013-05-31 16:35:20,090 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 Close done
2013-05-31 16:35:20,090 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 Close done
2013-05-31 16:35:20,090 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 3 Close done
2013-05-31 16:35:20,090 INFO ExecMapper: ExecMapper: processed 90477 rows: used memory = 10815536
2013-05-31 16:35:20,097 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-05-31 16:35:20,099 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:java.io.IOException: Filesystem closed
2013-05-31 16:35:20,099 WARN org.apache.hadoop.mapred.Child: Error running child
java.io.IOException: Filesystem closed
    at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:552)
    at org.apache.hadoop.hdfs.DFSInputStream.close(DFSInputStream.java:522)
    at java.io.FilterInputStream.close(FilterInputStream.java:155)
    at org.apache.hadoop.util.LineReader.close(LineReader.java:149)
    at org.apache.hadoop.mapred.LineRecordReader.close(LineRecordReader.java:195)
    at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doClose(CombineHiveRecordReader.java:72)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.close(HiveContextAwareRecordReader.java:96)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.close(HadoopShimsSecure.java:273)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.close(MapTask.java:223)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:422)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
2013-05-31 16:35:20,102 WARN org.apache.hadoop.mapred.Task: Parent died.  Exiting attempt_201305300036_0011_m_000000_1

Solution

I don't fully understand what you're trying to accomplish, but I can tell you that the second argument to regexp_replace must be a valid regular expression (Hive compiles it with java.util.regex).

You are passing values that are not valid expressions. Specifically, ListDirectory) fails because ) is a reserved symbol in regular expressions: with no matching ( the pattern cannot compile, which is exactly the "Unmatched closing ')'" PatternSyntaxException you see. It only surfaces on the bigger dataset because that is where this record appears. You need to escape or quote regex metacharacters in columnB and columnD before using them as patterns.
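Since Hive hands the second argument to java.util.regex, the failure can be reproduced in plain Java. The sketch below hard-codes the column values from the record above and shows both the exception and Pattern.quote as one way to treat the value as a literal string rather than a pattern:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexEscapeDemo {
    public static void main(String[] args) {
        String columnB = "ListDirectory)";  // the offending value from the record

        // Compiling the raw value fails, just as Hive's regexp_replace does internally.
        try {
            Pattern.compile(columnB);
        } catch (PatternSyntaxException e) {
            System.out.println("raw pattern failed: " + e.getDescription());
        }

        // Pattern.quote() wraps the value in \Q...\E so every character is literal.
        String quoted = Pattern.quote(columnB);  // -> \QListDirectory)\E
        String columnA = "ReadData (or ListDirectory)";
        System.out.println(columnA.replaceAll(quoted, ""));
    }
}
```

In HiveQL itself there is no Pattern.quote, so one workaround (an approach, not the only one) is to pre-escape metacharacters in the pattern column, for example with a nested regexp_replace that puts a backslash before characters like ( ) [ ] before using the column as a pattern, or to filter out rows whose pattern column is not a valid expression.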

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow