Question

I have two arrays of strings in Hive, like:

{'value1','value2','value3'}
{'value1', 'value2'}

I want to merge the arrays without duplicates, with this result:

{'value1','value2','value3'}

How can I do this in Hive?

Solution

You will need a UDF for this. Klout has open-sourced a collection of Hive UDFs under the package brickhouse, available on GitHub. It includes UDFs that serve exactly this purpose. Download and build the project, then add the JAR. Here is an example:

CREATE TEMPORARY FUNCTION combine AS 'brickhouse.udf.collect.CombineUDF';
CREATE TEMPORARY FUNCTION combine_unique AS 'brickhouse.udf.collect.CombineUniqueUDAF';

select combine_unique(combine(array('a','b','c'), array('b','c','d'))) from reqtable;

OK
["d","b","c","a"]

Other tips

A native solution could be:

SELECT id, collect_set(item)
FROM table
LATERAL VIEW explode(list) lTable AS item
GROUP BY id;

First explode the array with LATERAL VIEW, then group by id and remove duplicates with collect_set.
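
The query above explodes a single array column. To merge two array columns as in the question, the same idea can be applied to each column and the results combined before grouping. The following is only a sketch under assumptions: the table name arrays_tbl and the columns id, list_a and list_b are hypothetical and not taken from the question.

```sql
-- Hypothetical table (assumption):
--   arrays_tbl(id INT, list_a ARRAY<STRING>, list_b ARRAY<STRING>)
SELECT id, collect_set(item) AS merged
FROM (
  -- One row per element of the first array
  SELECT id, item_a AS item
  FROM arrays_tbl
  LATERAL VIEW explode(list_a) a AS item_a
  UNION ALL
  -- One row per element of the second array
  SELECT id, item_b AS item
  FROM arrays_tbl
  LATERAL VIEW explode(list_b) b AS item_b
) exploded
GROUP BY id;  -- collect_set drops duplicate elements per id
```

Note that collect_set does not guarantee element order, and a row whose arrays are both empty or NULL produces no exploded rows, so that id will not appear in the result.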

License: CC-BY-SA with attribution
Not affiliated with StackOverflow