Question

I am working with visitor log data and need to summarize it by IP address. The data looks like this:

id        | ip_address     | type     | message  | ...
----------+----------------+----------+----------------
1         | 1.2.3.4        | purchase | ...
2         | 1.2.3.4        | visit    | ...
3         | 3.3.3.3        | visit    | ...
4         | 3.3.3.3        | purchase | ...
5         | 4.4.4.4        | visit    | ...
6         | 4.4.4.4        | visit    | ...

And should summarize with:

type="purchase" DESC, type="visit" DESC, id DESC

The yield:

chosenid  | ip_address     | type     | message  | ...
----------+----------------+----------+----------------
1         | 1.2.3.4        | purchase | ...
4         | 3.3.3.3        | purchase | ...
6         | 4.4.4.4        | visit    | ...

Is there an elegant way to get this data?


An ugly approach follows:

set @row_num = 0; 
CREATE TEMPORARY TABLE IF NOT EXISTS tt AS 
SELECT *,@row_num:=@row_num+1 as row_index FROM log ORDER BY type="purchase" DESC, type="visit" DESC, id DESC
ORDER BY rating desc;

Then get the minimum row_index and id for each ip_address (https://stackoverflow.com/questions/121387/fetch-the-row-which-has-the-max-value-for-a-column)

Then join those id's back to the original table

Was it helpful?

Solution

I think this should be what you need:

SELECT yourtable.*
FROM
  yourtable INNER JOIN (
    SELECT   ip_address,
             MAX(CASE WHEN type='purchase' THEN id END) max_purchase,
             MAX(CASE WHEN type='visit' THEN id END) max_visit
    FROM     yourtable
    GROUP BY ip_address) m
  ON yourtable.id = COALESCE(max_purchase, max_visit)

Please see fiddle here.

My subquery will return the maximum purchase id (or null if there's no purchase) and the maximum visit id. Then I'm joining the table with COALESCE, if max_purchase is not null the join will be on max_purchase, otherwise it will be on max_visit.

OTHER TIPS

You can use Bill Karwin's approach here:

SELECT t1.*
FROM (SELECT *, CASE WHEN type = 'purchase' THEN 1 ELSE 0 END is_purchase FROM myTable) t1
LEFT JOIN (SELECT *, CASE WHEN type = 'purchase' THEN 1 ELSE 0 END is_purchase FROM myTable) t2
  ON t1.ip_address = t2.ip_address
  AND (t2.is_purchase > t1.is_purchase
     OR (t2.is_purchase = t1.is_purchase AND t2.id > t1.id))
WHERE t2.id IS NULL 

SQL Fiddle here

The following query gets the most recent id based on your rules by using a correlated subquery:

select t.ip_adddress,
       (select t2.id
        from table t2
        where t2.ip_address = t1.ip_address
        order by type = 'purchase' desc, id desc
        limit 1
       ) as mostrecent
from (select distinct t.ip_address
      from table t
     ) t;

The idea is to sort the data first by purchases (with id descending too) and then by visits and choose the first one in the list. If you have a table of ipaddresses, then you don't need the distinct subquery. Just use that table instead.

To get the final results, we can join to this or use in or exists. This uses in.

select t.*
from table t join
     (select id, (select t2.id
                  from table t2
                  where t2.ip_address = t1.ip_address
                  order by type = 'purchase' desc, id desc
                  limit 1
                 ) as mostrecent
      from (select distinct t.ip_address
            from table t
           ) t
     ) ids
     on t.id = ids.mostrecent;

This query will work best if there is an index on table(ip_address, type, id).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top