I have a data frame in which I want to identify all pairs of rows whose time value t differs by a fixed amount, say diff.
In [8]: df.t
Out[8]:
0 143.082739
1 316.285739
2 344.315561
3 272.258814
4 137.052583
5 258.279331
6 114.069608
7 159.294883
8 150.112371
9 181.537183
...
For example, if diff = 22.2423, then we would have a match between rows 4 and 7.
The obvious way to find all such matches is to iterate over each row and apply a filter to the data frame:
for t in df.t:
    matches = df[abs(df.t - (t + diff)) < EPS]
    # log matches
But as I have a lot of values (10,000+), this will be quite slow.
Further, I want to check whether any differences of a multiple of diff exist; for instance, rows 4 and 9 differ by 2 * diff in my example. So my code takes even longer.
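For concreteness, the brute-force version with multiples looks roughly like this (max_multiple and EPS are just placeholder values for illustration):

EPS = 1e-6            # matching tolerance (placeholder)
max_multiple = 5      # how many multiples of diff to check (placeholder)

for t in df.t:
    for k in range(1, max_multiple + 1):
        # one pass over the whole frame for every (row, multiple) pair
        matches = df[abs(df.t - (t + k * diff)) < EPS]
        # log (t, k, matches)

That is a full filter of the frame for every row and every multiple.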
Does anyone have any suggestions on a more efficient technique for this?
Thanks in advance.
Edit: Thinking about it some more, the question boils down to finding an efficient way to locate the floating-point numbers that appear in both of two lists/Series objects, to within some tolerance. If I can do that, then I can simply compare df.t, df.t - diff, df.t - 2 * diff, etc.
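To make that concrete, this is the kind of thing I have in mind: sort df.t once, then use np.searchsorted for the tolerance lookup (matches_within_tol, EPS and max_multiple are just names and values I picked for illustration):

import numpy as np

def matches_within_tol(a_sorted, b, tol):
    # return (i, j) pairs with abs(a_sorted[i] - b[j]) <= tol;
    # a_sorted must be sorted ascending, b can be in any order
    lo = np.searchsorted(a_sorted, b - tol, side='left')
    hi = np.searchsorted(a_sorted, b + tol, side='right')
    return [(i, j)
            for j, (l, h) in enumerate(zip(lo, hi))
            for i in range(l, h)]

t_sorted = np.sort(df.t.values)
EPS = 1e-6          # matching tolerance (placeholder)
max_multiple = 5    # how many multiples of diff to check (placeholder)

for k in range(1, max_multiple + 1):
    pairs = matches_within_tol(t_sorted, df.t.values - k * diff, EPS)
    # each (i, j): t_sorted[i] is within EPS of df.t[j] - k * diff,
    # i.e. the two underlying rows differ by approximately k * diff

That way each multiple costs one vectorised searchsorted pass instead of one filter per row, but I'm not sure whether this is the best or most idiomatic way to do it in pandas.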