How to create date ranges of events coming from an API where open state represents the date range and close state the gaps between the ranges?

dba.stackexchange https://dba.stackexchange.com/questions/265527

Question

I would like to store date/time ranges in PostgreSQL of some arbitrary events that have a state of open or closed and a date/time of when the state changed.

Events coming from an API will have the following data for a single event:

Request 1:
{
  id: 1,
  state: 'open',
  date: '2020-02-17T10:00:00Z'
}

Request 2:
{
  id: 1,
  state: 'close',
  date: '2020-02-17T10:10:00Z'
}

Request 3:
{
  id: 1,
  state: 'open',
  date: '2020-02-17T11:00:00Z'
}

The requests can arrive in any order: later dates can arrive before earlier ones, and the states do not always alternate open -> close -> open -> close. For example, the API could send several open states for the same event one after the other.

I was thinking of using the tstzrange type to store this data in the database in the following form:

CREATE TABLE events (
  id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
  event_id int,
  validity tstzrange
);

The open states are captured in the validity column and the close states are the gaps between the validity ranges. For example, suppose a single event has the following states and date/times (using only times for simplicity), arriving in this order:

state          date/time
close          20:30
open           18:00
close          16:00
open           15:00
open           20:00
close          19:30

The validity rows should look like:

id          event_id           validity
1           1                  [15:00, 16:00)
2           1                  [18:00, 19:30)
3           1                  [20:00, 20:30)

Event 1 has the state open from 15:00 to 16:00, the state close from 16:00 to 18:00, the state open from 18:00 to 19:30, and so on.

To illustrate this visually:

[Image: validity visualized]

My problem is that the events do not arrive in order, so I don't know how to manipulate the individual validity ranges when inserting / updating these rows.

Solution

I figured out how to insert/update this kind of data into PostgreSQL date ranges from an algorithmic perspective; this may be helpful to you too.

As this data is a time series, once the full dataset is known there can't be any overlaps between the open and close states of a single event over time.
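The no-overlap property can be stated as a small check. The following is only an illustrative Python sketch: `has_overlap` is a hypothetical helper, and times are written as minutes since midnight for brevity. In the database itself, an exclusion constraint over the event id and the validity range would be the more usual way to enforce this.

```python
def has_overlap(ranges):
    """Return True if any two half-open [lower, upper) ranges overlap."""
    ordered = sorted(ranges)
    # After sorting by lower bound, any overlap shows up between neighbours.
    return any(prev[1] > nxt[0] for prev, nxt in zip(ordered, ordered[1:]))

# Minutes since midnight: 15:00-16:00, 18:00-19:30, 20:00-20:30
ok = [(900, 960), (1080, 1170), (1200, 1230)]
# Second range starts before the first one ends
bad = [(900, 960), (950, 1000)]

print(has_overlap(ok), has_overlap(bad))  # False True
```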

To keep the algorithm flexible enough for the case where we don't yet know the full range of events and open/close states, there are a couple of rules that need to be taken into consideration:

If there's a validity range containing the event's date and:

  1. if the event state is open then:

1.1. if the containing validity's lower bound is negative infinity = update the lower bound of the containing validity to the event's date.

1.2. if there is another validity row directly before the containing validity, meaning that the containing validity's lower bound equals the other row's upper bound = update the lower bound of the containing validity to the event's date.

1.3. if neither of the above applies = split the containing validity into two separate rows:

1.3.1. update the containing validity's lower bound to the event's date.

1.3.2. insert a new row where the lower bound is the containing validity's original lower bound and the upper bound is the event's date.

  2. if the event state is close then (reverse of the open state rules):

2.1. if the containing validity's upper bound is positive infinity = update the upper bound of the containing validity to the event's date.

2.2. if there is another validity row directly after the containing validity, meaning that the containing validity's upper bound equals the other row's lower bound = update the upper bound of the containing validity to the event's date.

2.3. if neither of the above applies = split the containing validity into two separate rows:

2.3.1. update the containing validity's upper bound to the event's date.

2.3.2. insert a new row where the upper bound is the containing validity's original upper bound and the lower bound is the event's date.

If there's no validity range containing the event's date then:

  1. if the event state is open then insert a new validity row with the lower bound set to the event's date and the upper bound set to positive infinity.

  2. if the event state is close then insert a new validity row with the upper bound set to the event's date and the lower bound set to negative infinity.
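To make the rules above concrete, here is a minimal in-memory sketch in Python that replays the event order from the question. It is not the SQL needed against the events table. The names `apply_event`, `at`, `NEG_INF` and `POS_INF` are made up for illustration, a plain list of `[lower, upper)` pairs stands in for the validity rows of one event, and the smallest/largest representable datetimes stand in for negative/positive infinity.

```python
from datetime import datetime, timezone

# Assumption: sentinel datetimes stand in for -infinity / +infinity bounds.
NEG_INF = datetime.min.replace(tzinfo=timezone.utc)
POS_INF = datetime.max.replace(tzinfo=timezone.utc)

def apply_event(ranges, state, ts):
    """Apply one open/close event to a list of [lower, upper) pairs, in place."""
    hit = next((r for r in ranges if r[0] <= ts < r[1]), None)
    if hit is None:
        # No containing range: open -> [ts, +inf), close -> (-inf, ts)
        ranges.append([ts, POS_INF] if state == "open" else [NEG_INF, ts])
        return
    lower, upper = hit
    if state == "open":
        adjacent = any(r is not hit and r[1] == lower for r in ranges)
        if lower != NEG_INF and not adjacent:
            ranges.append([lower, ts])  # 1.3.2: earlier part becomes its own row
        hit[0] = ts                     # 1.1, 1.2, 1.3.1: move the lower bound
    else:
        adjacent = any(r is not hit and r[0] == upper for r in ranges)
        if upper != POS_INF and not adjacent:
            ranges.append([ts, upper])  # 2.3.2: later part becomes its own row
        hit[1] = ts                     # 2.1, 2.2, 2.3.1: move the upper bound

def at(hhmm):
    """Hypothetical helper: build a UTC timestamp on 2020-02-17 from 'HH:MM'."""
    hour, minute = map(int, hhmm.split(":"))
    return datetime(2020, 2, 17, hour, minute, tzinfo=timezone.utc)

# Same arrival order as the table in the question.
events = [("close", "20:30"), ("open", "18:00"), ("close", "16:00"),
          ("open", "15:00"), ("open", "20:00"), ("close", "19:30")]

ranges = []
for state, hhmm in events:
    apply_event(ranges, state, at(hhmm))

result = [(lo.strftime("%H:%M"), hi.strftime("%H:%M")) for lo, hi in sorted(ranges)]
print(result)
# [('15:00', '16:00'), ('18:00', '19:30'), ('20:00', '20:30')]
```

The output matches the validity rows expected in the question. Events that land exactly on an existing bound are not treated specially in this sketch; a real implementation would probably want to skip the empty range that a split would otherwise produce.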

These rules give you the best possible picture of the event states, where the open states are captured in the validity ranges themselves and the close states are the gaps between the validity rows.
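Reading a state back out of such ranges could look like the sketch below. It is again illustrative Python rather than a database query: `state_at` is a hypothetical helper, and times are minutes since midnight to keep the example short.

```python
def state_at(ranges, ts):
    """Return 'open' if ts falls inside any [lower, upper) range, else 'close'."""
    return "open" if any(lo <= ts < hi for lo, hi in ranges) else "close"

# Minutes since midnight: 15:00-16:00, 18:00-19:30, 20:00-20:30
ranges = [(900, 960), (1080, 1170), (1200, 1230)]

# 15:30 is inside a range, 16:40 is in a gap, 19:30 is an excluded upper bound.
print(state_at(ranges, 930), state_at(ranges, 1000), state_at(ranges, 1170))
# open close close
```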

As more data comes in, whether historical or current, the validity ranges will give you a clearer picture of the event states over time.

Licensed under: CC-BY-SA with attribution