Question

I am trying to use SQLite3 to form correlations between two tables that have similar data. Here's what I have so far:

CREATE TABLE a (date TEXT, user TEXT, ip TEXT);
CREATE INDEX a_index ON a (date, user, ip);
CREATE TABLE b (date TEXT, ip TEXT);
CREATE UNIQUE INDEX b_index ON b (date, ip);

INSERT INTO a VALUES('2014-03-01 03:15:16', 'a', '127.0.0.1');
INSERT INTO a VALUES('2014-03-01 03:15:18', 'b', '127.0.0.2');
INSERT INTO a VALUES('2014-03-01 03:15:21', 'c', '127.0.0.3');
INSERT INTO a VALUES('2014-03-01 03:15:21', 'd', '127.0.0.4');
INSERT INTO a VALUES('2014-03-01 03:15:29', 'e', '127.0.0.5');
INSERT INTO a VALUES('2014-03-01 03:16:32', 'f', '127.0.0.6');

INSERT INTO b VALUES('2014-03-01 03:15:16', '127.0.0.1');
INSERT INTO b VALUES('2014-03-01 03:15:17', '127.0.0.1');
INSERT INTO b VALUES('2014-03-01 03:15:19', '127.0.0.1');
INSERT INTO b VALUES('2014-03-01 03:15:22', '127.0.0.4');
INSERT INTO b VALUES('2014-03-01 03:16:32', '127.0.0.5');

I know I could simply use an inner join to combine these two sets, like this:

SELECT *
FROM a
JOIN b ON a.ip = b.ip AND a.date = b.date;

and it would return

2014-03-01 03:15:16|a|127.0.0.1|2014-03-01 03:15:16|127.0.0.1

as expected. However, because of clock drift in the recorded times, I would like to match any entries within ±3 seconds of each other. For that, I have used:

SELECT *
FROM a
JOIN b ON a.ip = b.ip AND a.date BETWEEN DATETIME(b.date, '-3 seconds') AND DATETIME(b.date, '+3 seconds');

This works, but it returns more rows than I want:

2014-03-01 03:15:16|a|127.0.0.1|2014-03-01 03:15:16|127.0.0.1
2014-03-01 03:15:16|a|127.0.0.1|2014-03-01 03:15:17|127.0.0.1
2014-03-01 03:15:16|a|127.0.0.1|2014-03-01 03:15:19|127.0.0.1
2014-03-01 03:15:21|d|127.0.0.4|2014-03-01 03:15:22|127.0.0.4

I am wondering if it's possible to return at most one row per entry in table a when a matching entry is found in table b. The expected result would look something like this:

2014-03-01 03:15:16|a|127.0.0.1|2014-03-01 03:15:16|127.0.0.1
2014-03-01 03:15:21|d|127.0.0.4|2014-03-01 03:15:22|127.0.0.4

How should / could this be accomplished?


Solution

Further to my comment above (“Explicitly SELECT the fields you want, leave out the 2nd timestamp (which you don't appear to be interested in anyway) that's making the “duplicate” rows different, and use SELECT DISTINCT so that you only get unique rows.”), you could try the following:

SELECT DISTINCT a.date, a.user, a.ip
FROM a
JOIN b ON a.ip = b.ip
  AND a.date BETWEEN DATETIME(b.date, '-3 seconds')
                 AND DATETIME(b.date, '+3 seconds');
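
For anyone who wants to try this against the sample data, here is a minimal sketch using Python's built-in sqlite3 module. The first query is the SELECT DISTINCT approach from this answer. The second is my own assumption about what the question's expected output implies (keeping the closest b.date for each row of a): it relies on SQLite's documented behaviour that, when a query uses a single MIN() aggregate, bare columns take their values from the row that holds the minimum. The names WINDOW_JOIN, distinct_rows, closest_rows and drift are illustrative only.

```python
import sqlite3

# In-memory database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (date TEXT, user TEXT, ip TEXT);
CREATE TABLE b (date TEXT, ip TEXT);
INSERT INTO a VALUES
  ('2014-03-01 03:15:16', 'a', '127.0.0.1'),
  ('2014-03-01 03:15:18', 'b', '127.0.0.2'),
  ('2014-03-01 03:15:21', 'c', '127.0.0.3'),
  ('2014-03-01 03:15:21', 'd', '127.0.0.4'),
  ('2014-03-01 03:15:29', 'e', '127.0.0.5'),
  ('2014-03-01 03:16:32', 'f', '127.0.0.6');
INSERT INTO b VALUES
  ('2014-03-01 03:15:16', '127.0.0.1'),
  ('2014-03-01 03:15:17', '127.0.0.1'),
  ('2014-03-01 03:15:19', '127.0.0.1'),
  ('2014-03-01 03:15:22', '127.0.0.4'),
  ('2014-03-01 03:16:32', '127.0.0.5');
""")

# Shared join: same ip, timestamps within 3 seconds of each other.
WINDOW_JOIN = """
FROM a
JOIN b ON a.ip = b.ip
  AND a.date BETWEEN DATETIME(b.date, '-3 seconds')
                 AND DATETIME(b.date, '+3 seconds')
"""

# Approach from the answer: drop b's columns and de-duplicate.
distinct_rows = conn.execute(
    "SELECT DISTINCT a.date, a.user, a.ip" + WINDOW_JOIN + "ORDER BY a.date, a.user"
).fetchall()

# Assumed alternative: one output row per row of a, paired with the b row
# whose timestamp is closest. With a single MIN() aggregate, SQLite fills
# the bare column b.date from the row that produced the minimum drift.
closest_rows = conn.execute(
    """
    SELECT a.date, a.user, a.ip, b.date,
           MIN(ABS(strftime('%s', a.date) - strftime('%s', b.date))) AS drift
    """
    + WINDOW_JOIN
    + "GROUP BY a.rowid ORDER BY a.date, a.user"
).fetchall()

for row in distinct_rows:
    print("|".join(row))
for row in closest_rows:
    print("|".join(str(col) for col in row))
```

On the sample data, the second query pairs user a with the 03:15:16 entry in b and user d with the 03:15:22 entry, which matches the two rows the question asks for. Note that the bare-column behaviour is specific to SQLite; other databases would need a window function or a correlated subquery instead.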
Licensed under: CC-BY-SA with attribution