Question

I am currently just pulling in all records from the 1 minute leading up to the timestamp (e.g. if the timestamp I'm interested in is 2014.04.14T09:30):

select from Prices where timestamp within 2014.04.14T09:29 2014.04.14T09:30, stock=`GOOG

However, this is clearly not very robust. Sometimes the previous record may be at 09:25am and then the query returns nothing. Sometimes the query may return hundreds of records if there have been a lot of price changes, even though all I need is the last record returned.

I know this can be done with an asof join, but want to avoid it for the time being as Prices is simply too big at present.

I am also interested in doing the same, but in finding the first record after a given timestamp.

Note also that Prices is a splayed table


Solution

Select last record before the given timestamp:

q)select from Prices where stock=`GOOG,timestamp<2014.04.14T09:30,i=last i

Select first record after the given timestamp:

q)select from Prices where stock=`GOOG,timestamp>2014.04.14T09:30,i=first i
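
The order of the where constraints matters here, because each constraint is applied only to the rows that survived the previous ones. A minimal sketch on a toy table (the table `toy`, the variable `T` and all values are made up for illustration) shows that `i=last i` or `i=first i` has to come after the timestamp constraint:

```q
/ Toy table, sorted by time; values are illustrative only
toy:([]stock:`GOOG`GOOG`GOOG;timestamp:2014.04.14D09:00+0D00:01*20 25 31;price:10 11 12f)
T:2014.04.14D09:30

/ Filter by time first, then keep the last surviving row: the 09:25 record
rBefore:select from toy where stock=`GOOG,timestamp<T,i=last i

/ Keep the last GOOG row first (09:31), then test it against T: nothing survives
rWrong:select from toy where stock=`GOOG,i=last i,timestamp<T

/ First record strictly after T: the 09:31 record
rAfter:select from toy where stock=`GOOG,timestamp>T,i=first i
```

With this data, `rBefore` holds the 09:25 row, `rWrong` is empty, and `rAfter` holds the 09:31 row.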

Other tips

Use asof or aj to get the performance kdb+ is known for. The bigger Prices is, the more reason for doing so.

I would question your logic for avoiding aj. aj and asof use the bin operator, which performs a binary search and is therefore faster than scanning the whole timestamp column.
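
As a rough illustration of what bin does, here is a minimal sketch on a small sorted vector (the names `ts`, `px`, `k` and `n` and all values are made up for illustration; this is not how aj is implemented internally, only the lookup idea it relies on):

```q
/ Sorted timestamps 09:20 09:25 09:31 09:40 with matching prices
ts:2014.04.14D09:00+0D00:01*20 25 31 40
px:10 11 12 13f

/ bin returns the index of the last item <= the argument (-1 if there is none)
k:ts bin 2014.04.14D09:30     / 1, i.e. the 09:25 record
px k                          / 11f, the price as of 09:30

/ The next index is the first record strictly after 09:30
n:1+k                         / 2, i.e. the 09:31 record
px n                          / 12f
```

Note that bin treats an exact match as a hit, so `k` is the last record at or before the timestamp rather than strictly before it. When `k` is the last index, `n` points past the end of the vector and indexing returns a null.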

Let's create your table and run the solution from the other answer:

q)Prices:([]stock:`g#1000000?`GOOG,9?`4;timestamp:asc 2014.04.14+1000000?0t;price:1000000?100f;size:1000000?100j)
q)\t do[1000;select from Prices where timestamp<2014.04.14T09:30,stock=`GOOG,i=last i]
10205

We can make this a lot better by reordering the constraints:

q)\t do[1000;select from Prices where stock=`GOOG,timestamp<2014.04.14T09:30,i=last i]
2030

But nothing will beat this:

q)\t do[1000;Prices asof `stock`timestamp!(`GOOG;2014.04.14D09:30)]
9

By the way, you are using datetime in your question, which is deprecated, so I've replaced it with timestamp. This has no impact on performance.

A few more things to remember when using aj:

  • in-memory prices - the table should be `g#sym and time sorted within sym
  • on-disk prices - `p#sym and time sorted within sym
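
A minimal sketch of an in-memory aj call that follows these rules, assuming a toy table (`toy`, `req`, `r` and all values are hypothetical and only for illustration):

```q
/ Toy table, time sorted within each stock
toy:([]stock:`GOOG`GOOG`GOOG`MSFT`MSFT;timestamp:2014.04.14D09:00+0D00:01*20 25 31 10 29;price:10 11 12 20 21f)

/ In-memory table: apply the grouped attribute to the stock column
update `g#stock from `toy;

/ One row per (stock, timestamp) we want the prevailing price for
req:([]stock:`GOOG`MSFT;timestamp:2#2014.04.14D09:30)

/ For each row of req, pick the last toy row for that stock at or before its timestamp
r:aj[`stock`timestamp;req;toy]
```

Here `r` keeps the timestamps from `req` and picks up the prices 11 (GOOG, from 09:25) and 21 (MSFT, from 09:29). The same call handles many lookups at once, which is where aj tends to pay off compared with running one select per timestamp.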

Also, for partitioned or splayed tables, adding where constraints to the table passed to aj (other than the date constraint on a date-partitioned table) can severely hurt performance.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow