Logic rules representation and storage in DB column

https://dba.stackexchange.com/questions/259741

24-02-2021

Question

I have two kinds of "elements": one kind generates time series (let's call them generator elements), and the other becomes "activated" based on logic rules applied to those time series (activable elements).

This simplified table represents the time series:

CREATE TABLE data.time_series
(
    generator_element_id uuid NOT NULL,
    epoch bigint NOT NULL, -- UNIX epoch
    level integer NOT NULL, -- values from 0 to 3
    -- other fields
    PRIMARY KEY (generator_element_id, epoch)
    -- FOREIGN KEY to the table with the generator elements
);

The idea is that the activable elements are activated following rules like this:

Legend:

Generator element GE
Activable element AE

IF (GE1 has level > 1 AND GE2 has level > 1) OR (GE3 has level > 2) for a given instant T1 THEN AE1 becomes activated

I'm trying to find the best way to represent these logic rules in the database, as they will be configured by the user and may include multiple combinations of "AND" and "OR" conditions.

The issue is precisely this combination of ANDs and ORs: if only one of the two were used, I could create another table with the relations between the different kinds of elements and the thresholds. But with a combination of ANDs and ORs, I'm at a loss.
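For comparison, one common way to store arbitrarily nested AND/OR conditions relationally is an adjacency list: one row per node, where operator nodes (AND/OR) and leaf conditions (element plus threshold) each point to their parent node. The sketch below is a minimal illustration under assumed names (the row layout, `RULE_AE1` and `evaluate_rows` are hypothetical, and short labels stand in for the UUIDs): it rebuilds the tree from such rows and evaluates it against the levels of a single instant.

```python
from collections import defaultdict

# Hypothetical rule_node rows: (node_id, parent_id, op, ge_id, threshold).
# "AND"/"OR" rows are inner nodes; "GT" rows are leaves meaning
# "ge_id has level > threshold". Short labels stand in for the UUIDs.
RULE_AE1 = [
    (1, None, "OR",  None,  None),
    (2, 1,    "AND", None,  None),
    (3, 2,    "GT",  "GE1", 1),
    (4, 2,    "GT",  "GE2", 1),
    (5, 1,    "GT",  "GE3", 2),
]


def evaluate_rows(rows, levels):
    """Rebuild the tree from adjacency-list rows and evaluate it
    against {ge_id: level} for a single instant."""
    nodes, children, root = {}, defaultdict(list), None
    for node_id, parent_id, op, ge_id, threshold in rows:
        nodes[node_id] = (op, ge_id, threshold)
        if parent_id is None:
            root = node_id
        else:
            children[parent_id].append(node_id)

    def ev(node_id):
        op, ge_id, threshold = nodes[node_id]
        if op == "GT":
            return levels[ge_id] > threshold
        if op == "AND":
            return all(ev(child) for child in children[node_id])
        if op == "OR":
            return any(ev(child) for child in children[node_id])
        raise ValueError(f"unknown operator: {op!r}")

    return ev(root)


# Levels at one instant: GE1 = 1, GE2 = 2, GE3 = 0
print(evaluate_rows(RULE_AE1, {"GE1": 1, "GE2": 2, "GE3": 0}))  # False
```

The relational form works, but every evaluation needs the whole tree reassembled, which is one reason storing the rule as a single document is often simpler.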

My first thought was to store those relations as a plain string, similar to the rule described above but using the corresponding UUIDs for the generator elements, and then have a service that processes those rules and decides whether to activate the elements.

I'm looking for alternatives to this approach, as I don't think it would scale well enough, nor do I like it personally.

For context, I'm using PostgreSQL 12.0 with Docker.

Edit:

To clarify the epochs, since that was not clear: the epoch value is not part of the logic rule "definition", which only includes the elements and their levels (it actually includes other IDs, but to keep it simple I only show these two variables).

As for the actual calculation, the records will be retrieved and separated into "bins" by epoch. This way the rule is applied to the records of each epoch (assuming there will always be records for all the generator elements at the given epoch).
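The binning step can be sketched in a few lines, assuming the records have already been fetched as (generator_element_id, epoch, level) tuples. This is a minimal illustration, not the question's actual service: the function names are hypothetical, short labels replace the UUIDs, the rule is hard-coded as a Python callable, and the second epoch of data is invented for the example. It relies on the stated assumption that every generator element has a record in every epoch; a missing record raises KeyError.

```python
from collections import defaultdict


def bin_by_epoch(rows):
    """Group (generator_element_id, epoch, level) rows into {epoch: {ge_id: level}}."""
    bins = defaultdict(dict)
    for ge_id, epoch, level in rows:
        bins[epoch][ge_id] = level
    return dict(bins)


def rule_ae1(levels):
    # (GE1 > 1 AND GE2 > 1) OR (GE3 > 2), hard-coded for illustration;
    # raises KeyError if an element has no record in the bin
    return (levels["GE1"] > 1 and levels["GE2"] > 1) or levels["GE3"] > 2


def activated_epochs(rows, rule):
    """Return, in ascending order, the epochs whose bin satisfies `rule`."""
    return sorted(epoch for epoch, levels in bin_by_epoch(rows).items() if rule(levels))


rows = [
    ("GE1", 1581638400, 1),
    ("GE2", 1581638400, 2),
    ("GE3", 1581638400, 0),
    # hypothetical second epoch (2020-02-14 01:00:00Z), not in the question's data
    ("GE1", 1581642000, 2),
    ("GE2", 1581642000, 3),
    ("GE3", 1581642000, 0),
]
print(activated_epochs(rows, rule_ae1))  # [1581642000]
```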

I'm also adding some test data as an example. In this case I'll use a single epoch value, 1581638400, which corresponds to 2020-02-14 00:00:00Z:

INSERT INTO data.time_series (generator_element_id, level, epoch)
VALUES (
    '00000000-0000-0000-0000-000000000001', -- uuid for GE1
    1, -- level
    1581638400 -- epoch for 2020-02-14 00:00:00Z
);

INSERT INTO data.time_series (generator_element_id, level, epoch)
VALUES (
    '00000000-0000-0000-0000-000000000002', -- uuid for GE2
    2, -- level
    1581638400 -- epoch for 2020-02-14 00:00:00Z
);

INSERT INTO data.time_series (generator_element_id, level, epoch)
VALUES (
    '00000000-0000-0000-0000-000000000003', -- uuid for GE3
    0, -- level
    1581638400 -- epoch for 2020-02-14 00:00:00Z
);

With my proposal, the rule for the activable element AE1 would be stored as:

-- Omitting the 0s for simplicity
"('000...01' > 1 AND '000...02' > 1) OR ('000...03' > 2)"

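With the dummy data as written, AE1 is not activated: GE2 satisfies level > 1, but GE1 has level 1, so the first AND clause fails, and GE3 has level 0, so the second clause fails too. If GE1 had a level of 2 or more, the first part of the rule would be satisfied and AE1 would be activated.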

But with the rule saved this way, I'll have to build a specific processing system for it...
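For a sense of scale, such a processing system for the string format can be fairly small. The sketch below is a minimal illustration, assuming only `>` comparisons with integer thresholds, single-quoted element ids, and AND binding tighter than OR (the usual convention; the question does not specify precedence). It uses a regex tokenizer and a recursive-descent parser that produces a nested tuple tree, then evaluates that tree; all names are hypothetical.

```python
import re

# Tokens: parentheses, single-quoted element ids, '>', integers, AND, OR
TOKEN_RE = re.compile(r"\s*(\(|\)|'[^']*'|>|\d+|AND\b|OR\b)")


def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}: {text[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(text):
    """Parse a rule string into ("OR"|"AND", left, right) / ("GT", ge_id, threshold)."""
    tokens, pos = tokenize(text), 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected!r}, got {tok!r}")
        pos += 1
        return tok

    def expr():  # expr := term ('OR' term)*
        node = term()
        while peek() == "OR":
            take("OR")
            node = ("OR", node, term())
        return node

    def term():  # term := factor ('AND' factor)*
        node = factor()
        while peek() == "AND":
            take("AND")
            node = ("AND", node, factor())
        return node

    def factor():  # factor := '(' expr ')' | 'id' '>' integer
        if peek() == "(":
            take("(")
            node = expr()
            take(")")
            return node
        tok = take()
        if not tok.startswith("'"):
            raise ValueError(f"expected a quoted element id, got {tok!r}")
        take(">")
        return ("GT", tok.strip("'"), int(take()))

    tree = expr()
    if peek() is not None:
        raise ValueError(f"trailing input: {tokens[pos:]}")
    return tree


def evaluate(node, levels):
    """Evaluate a parsed rule against {ge_id: level} for one epoch."""
    if node[0] == "GT":
        return levels[node[1]] > node[2]
    left, right = evaluate(node[1], levels), evaluate(node[2], levels)
    return (left and right) if node[0] == "AND" else (left or right)


tree = parse("('GE1' > 1 AND 'GE2' > 1) OR ('GE3' > 2)")
print(evaluate(tree, {"GE1": 1, "GE2": 2, "GE3": 0}))  # False
```

Parsing the string once into a tree and caching the tree avoids re-parsing on every evaluation.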

Any help is appreciated.

Solution

I'll write this as an answer, even though it's just a few ideas. Breaking the logical rules down into relational form is probably a waste of time; I would simply store the rules as XML or JSON. A parser-friendly format like:

{ OR 
    { AND 
          { GE1 has level > 1 }
          { GE2 has level > 1 }
    }
    { GE3 has level > 2 }
}

may be beneficial. Note that this is just an example.
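As one possible concrete shape, the same tree could be written as JSON and evaluated directly in the service. This is a minimal sketch: the schema (`op`, `args`, `ge`, `value`) is an assumption rather than any standard, and `RULE_AE1_JSON` and `evaluate` are hypothetical names. In PostgreSQL such a document could be kept in a `jsonb` column of the rules table.

```python
import json

# Hypothetical schema: inner nodes have "op" ("AND"/"OR") and "args";
# leaves have "ge" (generator element UUID), "op": ">" and "value".
RULE_AE1_JSON = """
{"op": "OR", "args": [
    {"op": "AND", "args": [
        {"ge": "00000000-0000-0000-0000-000000000001", "op": ">", "value": 1},
        {"ge": "00000000-0000-0000-0000-000000000002", "op": ">", "value": 1}
    ]},
    {"ge": "00000000-0000-0000-0000-000000000003", "op": ">", "value": 2}
]}
"""


def evaluate(node, levels):
    """Evaluate a rule node against {generator_element_id: level} for one epoch."""
    op = node["op"]
    if op == "AND":
        return all(evaluate(arg, levels) for arg in node["args"])
    if op == "OR":
        return any(evaluate(arg, levels) for arg in node["args"])
    if op == ">":
        return levels[node["ge"]] > node["value"]
    raise ValueError(f"unknown operator: {op!r}")


rule = json.loads(RULE_AE1_JSON)
levels = {
    "00000000-0000-0000-0000-000000000001": 1,
    "00000000-0000-0000-0000-000000000002": 2,
    "00000000-0000-0000-0000-000000000003": 0,
}
print(evaluate(rule, levels))  # False: GE1 is not > 1 and GE3 is not > 2
```

Because `args` is a list, an AND or OR node can take any number of operands, which keeps deeply nested user-defined rules flat where possible.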

This format can easily be transformed into SQL like:

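-- evaluated per epoch (GROUP BY epoch): each condition must hold for
-- some row of that epoch, so it is wrapped in the bool_or() aggregate
HAVING (bool_or(ge=1 AND level>1) AND bool_or(ge=2 AND level>1))
    OR  bool_or(ge=3 AND level>2)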
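A minimal sketch of that transformation step, assuming a JSON rule schema with `op`/`args` for AND/OR nodes and `ge`/`value` for `>` leaves (hypothetical, as are the function names). As a variation on the ge row number, it compares generator_element_id directly and passes the UUIDs and thresholds as `%s` query parameters (the placeholder style of DB-API drivers such as psycopg2), so user-configured values are never spliced into the SQL text. Each leaf becomes a bool_or() aggregate so that conditions on different generator elements can be combined within one epoch.

```python
def to_sql(node, params):
    """Compile a rule node into a PostgreSQL boolean expression usable in
    HAVING; appends the bound values to `params` in placeholder order."""
    op = node["op"]
    if op in ("AND", "OR"):
        # op is one of two fixed keywords here, so interpolating it is safe
        parts = [to_sql(arg, params) for arg in node["args"]]
        return "(" + f" {op} ".join(parts) + ")"
    if op == ">":
        params.extend([node["ge"], node["value"]])
        # true if any row of the epoch matches this element and level
        return "bool_or(generator_element_id = %s::uuid AND level > %s)"
    raise ValueError(f"unknown operator: {op!r}")


def build_query(rule):
    """Return (sql, params) selecting the epochs at which `rule` holds."""
    params = []
    condition = to_sql(rule, params)
    sql = ("SELECT epoch FROM data.time_series "
           "GROUP BY epoch "
           f"HAVING {condition}")
    return sql, params


rule_ae1 = {"op": "OR", "args": [
    {"op": "AND", "args": [
        {"ge": "00000000-0000-0000-0000-000000000001", "op": ">", "value": 1},
        {"ge": "00000000-0000-0000-0000-000000000002", "op": ">", "value": 1},
    ]},
    {"ge": "00000000-0000-0000-0000-000000000003", "op": ">", "value": 2},
]}
sql, params = build_query(rule_ae1)
print(sql)
print(params)
```

The resulting pair would then be handed to the driver, for example `cursor.execute(sql, params)` with psycopg2.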

The ge attribute can be constructed as:

WITH T (uid, level, epoch, ge) AS (
    SELECT generator_element_id, level, epoch
         , row_number() over (partition by epoch
                              order by generator_element_id) as ge
    FROM data.time_series
)

All in all, you would end up with a query like:

WITH T (uid, level, epoch, ge) AS (
    SELECT generator_element_id, level, epoch
         , row_number() over (partition by epoch
                              order by generator_element_id) as ge
    FROM data.time_series
)
SELECT epoch -- epochs at which the rule holds
FROM T
GROUP BY epoch
HAVING (bool_or(ge=1 AND level>1) AND bool_or(ge=2 AND level>1))
    OR  bool_or(ge=3 AND level>2);

The idea, then, is to read all the rules into memory; when a rule is about to be evaluated, it is transformed into SQL and the query is executed against the time_series table.
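An end-to-end sketch of that flow: rules held in memory as parsed JSON, each compiled to SQL and run against the time series. To stay runnable without a PostgreSQL server, it uses Python's built-in sqlite3 with an in-memory table, and SQLite's MAX() over 0/1 comparison results in place of PostgreSQL's bool_or(); with PostgreSQL you would keep bool_or() and the driver's own placeholder style. The rule schema, the `rules` dictionary, the short element labels and the second epoch of data are all hypothetical.

```python
import json
import sqlite3


def to_sql(node, params):
    # SQLite has no bool_or(); MAX() over 0/1 comparison results plays the same role
    op = node["op"]
    if op in ("AND", "OR"):
        return "(" + f" {op} ".join(to_sql(a, params) for a in node["args"]) + ")"
    if op == ">":
        params.extend([node["ge"], node["value"]])
        return "MAX(generator_element_id = ? AND level > ?)"
    raise ValueError(f"unknown operator: {op!r}")


def activated_epochs(conn, rule):
    """Compile `rule` and return the epochs at which it holds."""
    params = []
    sql = ("SELECT epoch FROM time_series GROUP BY epoch "
           f"HAVING {to_sql(rule, params)} ORDER BY epoch")
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE time_series "
             "(generator_element_id TEXT, epoch INTEGER, level INTEGER)")
conn.executemany("INSERT INTO time_series VALUES (?, ?, ?)", [
    ("GE1", 1581638400, 1), ("GE2", 1581638400, 2), ("GE3", 1581638400, 0),
    # hypothetical second epoch, not part of the question's data
    ("GE1", 1581642000, 2), ("GE2", 1581642000, 3), ("GE3", 1581642000, 0),
])

# Rules as they might be loaded into memory from a (hypothetical) rules table
rules = {
    "AE1": json.loads("""{"op": "OR", "args": [
        {"op": "AND", "args": [{"ge": "GE1", "op": ">", "value": 1},
                               {"ge": "GE2", "op": ">", "value": 1}]},
        {"ge": "GE3", "op": ">", "value": 2}]}"""),
}
for ae, rule in rules.items():
    print(ae, activated_epochs(conn, rule))  # AE1 [1581642000]
```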

Licensed under: CC-BY-SA with attribution
