How to select first record prior/after a given timestamp in KDB?
سؤال
I am currently just pulling in all records 1min leading up to the timestamp (e.g. if the timestamp I'm interested in is 2014.04.14T09:30
):
select from Prices where timestamp within 2014.04.14T09:29 2014.04.14T09:30, stock=`GOOG
However, this is clearly not very robust. Sometimes the previous record may be at 09:25am
and then the query returns nothing. Sometimes the query may return hundreds of records if there have been a lot of price changes, even though all I need is the last record returned.
I know this can be done with an asof join, but want to avoid it for the time being as Prices
is simply too big at present.
I am also interested in doing the same, but in finding the first record after a given timestamp.
Note also that Prices
is a splayed table
المحلول
Select last record before the given timestamp:
q)select from Price where stock=`GOOG,i=last i,timestamp<2014.04.14T09:30
Select first record after the given timestamp:
q)select from Price where stock=`GOOG,i=first i,timestamp>2014.04.14T09:30
نصائح أخرى
Use asof
or aj
to get the performance kdb+ is known for. The bigger Prices is, the more reason for doing so.
I would question your logic for avoiding aj
. aj
and asof
use the bin
operator which is binary search and hence more performant than scanning the timestamp column.
Let's create your table and run the solution from the other answer:
Prices:([]stock:`g#1000000?`GOOG,9?`4;timestamp:asc 2014.04.14+1000000?0t;price:1000000?100f,size:1000000?100j)
q)\t do[1000;select from Prices where timestamp<2014.04.14T09:30,stock=`GOOG,i=last i]
10205
We can make this a lot better by reordering the constraints:
q)\t do[1000;select from Prices where stock=`GOOG,timestamp<2014.04.14T09:30,i=last i]
2030
But nothing will beat this:
q)\t do[1000;Prices asof `stock`timestamp!(`GOOG;2014.04.14D09:30)]
9
By the way, you are using datetime in your question, which is deprecated, so I've replaced it with timestamp. This has no impact on performance.
Few more things to remember while using aj
:
- in-memory prices - the table should be
`g#sym
andtime
sorted withinsym
- on-disk prices -
`p#sym
andtime
sorted withinsym
Also in case of partitioned/splayed tables, using the where
constraints (except the date
in the date-partitioned table) can severely impact the performance.