Question

I'm trying a simple INNER JOIN between two tables in Hive. I have one table of ORDERS and the other one is a LOG table. This is the structure of both:

ORDERS:

id_operacion string               
fecha string                
id_usuario string                
id_producto string                
unidades int                
id_bono string                
precio float            
precio_total float                
ip string

LOGS:

host STRING
identity STRING
user STRING
time STRING
request STRING
status STRING
size STRING
referer STRING
agent STRING

Not all the 'ip' in the order table have a value, some of them are null.

I would like to get the values order.id_usuario,order.id_producto,logs.host,logs.agent. In the cases that the order.ip=log.host.

I'm trying this query:

SELECT order.id_producto,order.id_usuario,log.host,log.agent
FROM order JOIN log ON order.ip=log.host;

I don't know why but the query responds me with all the agents the table of logs has, and not only with the ones that match with the IP address of both tables.

I hope I've explained well the issue. Any ideas?

Was it helpful?

Solution

It sounds like a problem with the join. It could be to do with the nulls. It could also be the data in the tables. Try this?

SELECT order.id_producto,order.id_usuario,log.host,log.agent
FROM order 
JOIN log ON order.ip=log.host
WHERE order.ip is not null
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top