Question

I am looking for a built-in string split function in Hive. For example, if the string is

A|B|C|D|E

then I want a function like array split(string input, char delimiter)

so that I get back [A,B,C,D,E].

Does such a built-in split function exist in Hive? I can only see regexp_extract and regexp_replace. I would love to see indexOf() and split() string functions.

Thanks

Ajay


Solution

There is a split function based on regular expressions. It is not listed in the tutorial, but it is listed in the language manual on the wiki:

split(string str, string pat)
   Split str around pat (pat is a regular expression) 

In your case, the delimiter "|" is a metacharacter in regular expressions (it means alternation), so it must be escaped as "\\|".
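To show why the escaping matters, here is a minimal Java sketch of a regex split. It assumes, without confirmation from the answer, that Hive's split hands the pattern to java.lang.String.split with a negative limit, so trailing empty fields are kept. The class and method names (HiveSplitSketch, hiveSplit) are hypothetical. It compares three ways to make the pipe literal: escaping it, putting it in a character class, and Pattern.quote.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical sketch of Hive's split(str, pat). Assumption: Hive passes pat to
// java.lang.String.split with a negative limit, so trailing empty fields are kept.
class HiveSplitSketch {

    static String[] hiveSplit(String str, String pat) {
        return str.split(pat, -1);
    }

    public static void main(String[] args) {
        String input = "A|B|C|D|E";

        // Escaped metacharacter: the Java literal "\\|" is the regex \| (a literal pipe).
        System.out.println(Arrays.toString(hiveSplit(input, "\\|")));               // [A, B, C, D, E]

        // Character class: a pipe inside [...] loses its special meaning.
        System.out.println(Arrays.toString(hiveSplit(input, "[|]")));               // [A, B, C, D, E]

        // Pattern.quote turns any delimiter string into a literal regex.
        System.out.println(Arrays.toString(hiveSplit(input, Pattern.quote("|")))); // [A, B, C, D, E]
    }
}
```

In HiveQL the first two patterns are written as split('A|B|C|D|E', '\\|') and split('A|B|C|D|E', '[|]').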

OTHER TIPS

Another useful case for split in Hive: suppose a column ipname in the table holds a value such as "abc11.def.ghft.com" and you want to extract "abc11":

SELECT split(ipname,'[\.]')[0] FROM tablename;
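The query above takes element [0] of the array returned by split. The Java sketch below mirrors it. It assumes Hive's split behaves like String.split(pat, -1), and the names FirstLabelSketch and firstLabel are hypothetical. Inside a character class a dot is already literal, so "[.]" and "[\.]" match the same thing. An unbracketed "." matches any character, which would leave every field empty.

```java
// Hypothetical sketch of split(ipname, '[\.]')[0]: the first dot-separated label
// of a hostname. Assumes Hive's split behaves like String.split(pat, -1).
class FirstLabelSketch {

    static String firstLabel(String hostname) {
        // "[.]" is a character class holding a literal dot.
        return hostname.split("[.]", -1)[0];
    }

    public static void main(String[] args) {
        System.out.println(firstLabel("abc11.def.ghft.com")); // abc11

        // Unbracketed "." matches every character, so the first field is empty.
        System.out.println("abc11.def.ghft.com".split(".", -1)[0].isEmpty()); // true
    }
}
```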

Just a clarification on the answer given by Bkkbrad.

I tried this suggestion and it did not work for me.

For example,

split('aa|bb','\\|')

produced:

["","a","a","|","b","b",""]

But,

split('aa|bb','[|]')

produced the desired result:

["aa","bb"]

Placing the metacharacter '|' inside square brackets (a character class) causes it to be interpreted literally, as intended, rather than as a metacharacter.

For more on this regex behaviour, see: http://www.regular-expressions.info/charclass.html
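The output ["","a","a","|","b","b",""] is what a bare | pattern produces. A bare | is an alternation of two empty patterns, so it matches the empty string at every position. That suggests the backslashes in '\\|' were stripped somewhere before the pattern reached Hive, for example by a shell or client. This is an inference, not something the answer states. The Java sketch below reproduces both cases, assuming Hive's split behaves like String.split(pat, -1). PipePatternSketch is a hypothetical name.

```java
import java.util.Arrays;

// Hypothetical sketch contrasting a bare "|" pattern with the bracketed "[|]".
class PipePatternSketch {

    // Assumes Hive's split behaves like String.split(pat, -1).
    static String[] split(String str, String pat) {
        return str.split(pat, -1);
    }

    public static void main(String[] args) {
        // Bare "|" matches the empty string at every position, so the input falls
        // apart into single characters. Whether a leading "" appears depends on the
        // JDK version (Java 8+ drops a zero-width match at the start of the input).
        System.out.println(Arrays.toString(split("aa|bb", "|")));

        // "[|]" is a character class containing only the pipe, matched literally.
        System.out.println(Arrays.toString(split("aa|bb", "[|]"))); // [aa, bb]
    }
}
```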

Licensed under: CC-BY-SA with attribution