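The solution is simple: add the condition Item1 > Item2 to the WHERE clause.
Each matching pair comes back once in each order, so keeping only the rows where the first item is greater than the second leaves exactly one row per pair.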
How to write a self-join query in Hive that avoids custom duplicates
Question
I need to find pairs of items that share the same value in a table with the schema (Item, Value). I can get the pairs with a self-join, but the result contains duplicates. Sample data:
Item Value
---------------
item1 value1
item2 value1
item3 value3
item4 value2
When I run the self-join with DISTINCT, I get rows like these:
Item1 Item2 Value
------------------------
item1 item2 value1
item2 item1 value1
For my purposes, these two rows are duplicates and I need only one of them. How can I achieve this? Thanks in advance for your help.
Note:
Since I am using my own definition of a duplicate for this requirement, I called them custom duplicates in the question. Please let me know if there is a standard name for them.
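The question does not include the query itself. As a hedged illustration, here is a minimal sketch of the kind of self-join that produces the mirrored rows. It assumes a table named items with columns item and value (hypothetical names), assumes the query excludes an item paired with itself (the output above contains no such rows), and runs the SQL through Python's built-in sqlite3 module as a local stand-in for Hive.

```python
import sqlite3

# Sample rows from the question. The table name "items" and the column
# names "item" and "value" are hypothetical stand-ins.
ROWS = [
    ("item1", "value1"),
    ("item2", "value1"),
    ("item3", "value3"),
    ("item4", "value2"),
]

# Self-join on value, skipping an item paired with itself (assumed).
# DISTINCT removes exact repeats only: (item1, item2) and (item2, item1)
# are different rows, so both survive.
NAIVE_QUERY = """
SELECT DISTINCT a.item AS item1, b.item AS item2, a.value
FROM items a
JOIN items b ON a.value = b.value
WHERE a.item <> b.item
"""


def load_items(rows):
    """Build an in-memory SQLite table standing in for the Hive table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (item TEXT, value TEXT)")
    conn.executemany("INSERT INTO items VALUES (?, ?)", rows)
    return conn


conn = load_items(ROWS)
naive = sorted(conn.execute(NAIVE_QUERY).fetchall())
for row in naive:
    print(row)
# ('item1', 'item2', 'value1')
# ('item2', 'item1', 'value1')
```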
Solution
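A minimal sketch of the suggested condition Item1 > Item2, assuming a hypothetical items table (columns item and value) loaded with the sample data from the question, and again using Python's sqlite3 module in place of Hive so the query can be run locally. The condition is written against the joined columns (a.item > b.item) because Hive, like standard SQL, does not resolve SELECT aliases such as item1 in the WHERE clause.

```python
import sqlite3

# Sample rows from the question; "items", "item" and "value" are
# hypothetical names.
ROWS = [
    ("item1", "value1"),
    ("item2", "value1"),
    ("item3", "value3"),
    ("item4", "value2"),
]

# Equality join on value. Each pair appears once in each order, so
# keeping only a.item > b.item leaves exactly one of the two rows.
# Self-pairs (a.item = b.item) fail the comparison too, so no separate
# a.item <> b.item test is needed.
PAIR_QUERY = """
SELECT a.item AS item1, b.item AS item2, a.value
FROM items a
JOIN items b ON a.value = b.value
WHERE a.item > b.item
"""


def load_items(rows):
    """Build an in-memory SQLite table standing in for the Hive table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (item TEXT, value TEXT)")
    conn.executemany("INSERT INTO items VALUES (?, ?)", rows)
    return conn


conn = load_items(ROWS)
pairs = conn.execute(PAIR_QUERY).fetchall()
print(pairs)
# [('item2', 'item1', 'value1')]
```

Using a.item < b.item instead keeps the other ordering, (item1, item2, value1); either works as long as the comparison is applied consistently. In Hive the inequality belongs in WHERE rather than ON, since older Hive releases accepted only equality conditions in a join's ON clause. If the source table can contain repeated (Item, Value) rows, keep DISTINCT in the SELECT.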
Licensed under: CC-BY-SA with attribution