Time-based Task scheduler engine

https://softwareengineering.stackexchange.com/questions/378999

13-02-2021

Question

I am working on an enterprise-level application with an event table in the database. I need to update my events' states in the database at a precise date and time, with a delay of at most 200 milliseconds. Since the event table will grow over time, I think frequent polling may become a performance issue. In my application, which is a .NET Core Web API, I want to build a structure that accepts a ScheduledTask and sleeps until the next TimeToExecute arrives.

public class ScheduledTask
{
    public string CurrentState {get; set;}
    public string NextState {get; set;}
    public DateTime TimeToExecute {get; set;}
}

To handle this, I am considering two options:

IHostedService: With this approach I need no third-party tool. I can inject the IHostedService, check the list every 150 ms, and execute the update process asynchronously.
Hangfire: As far as I can tell from my research, this tool works more or less like the IHostedService.

My question: as my event table grows and the number of events increases (to around 500 per minute), do I have any architecture options other than the two above?

Solution

Trying to do this with any kind of lightweight process (asynchronous processes, threads or anything else like that) is a recipe for failure unless you're in an environment that gives you hard guarantees about when the code will be executed. It also doesn't help that if your program dies and is restarted, any pending status updates will be lost unless you have a way to reconstitute them and make updates that should have happened while the program wasn't running.

The advance notice you get of a new status and when it takes effect are data, and what better way to deal with it than to put it in the database? You were, after all, going to do a transaction to update the status anyway once your lightweight process (LWP) decided it was time. Doing so gets you all of the ACIDity your database has to offer and will scale to whatever your database can store rather than to the number of jobs the LWP environment you're using can muster.

Doing this requires that you break the status of your events out into a separate table that lists the time ranges and what status is in effect during each one.

This example is nominally PostgreSQL, but you can do the equivalent in any relational database:

CREATE TABLE event (
    id BIGSERIAL PRIMARY KEY
    -- All other columns except the status go here,
    -- each preceded by a comma
);

CREATE TABLE event_status (
    event BIGINT REFERENCES event(id),  -- Which event
    times TSTZRANGE,    -- Range of times this status is valid
    status INTEGER      -- The status (doesn't have to be INTEGER)
);

That's the table structure.* You'll need to set up constraints and triggers to make sure that rows in event_status for a given event express a contiguous range of times that covers the dawn of time to infinity and that insertion of a new row properly adjusts the end times of surrounding rows.
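
One piece of this can be sketched with an exclusion constraint, which rejects overlapping ranges for the same event. This is only a sketch under assumptions: it relies on the btree_gist extension being available, the constraint name event_status_no_overlap is made up for illustration, and it prevents overlaps only. It does not by itself guarantee that the ranges are contiguous or cover all of time, so triggers or application logic would still be needed for that. The transaction that follows shows one possible way to schedule a status change by hand, assuming event 1 already has an open-ended row and no later change is scheduled; the event id, status value, and timestamp are placeholders.

```sql
-- btree_gist lets a GiST index compare plain BIGINT values with "=".
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- Reject any two rows for the same event whose time ranges overlap.
ALTER TABLE event_status
    ADD CONSTRAINT event_status_no_overlap
    EXCLUDE USING gist (event WITH =, times WITH &&);

-- Schedule event 1 to switch to status 2 at a known future time.
BEGIN;

-- End the row that currently covers the switch time at that time.
UPDATE event_status
SET times = tstzrange(lower(times), TIMESTAMPTZ '2021-03-01 12:00:00+00')
WHERE event = 1
  AND times @> TIMESTAMPTZ '2021-03-01 12:00:00+00';

-- Add the new status from the switch time onward (NULL = unbounded).
INSERT INTO event_status (event, times, status)
VALUES (1, tstzrange(TIMESTAMPTZ '2021-03-01 12:00:00+00', NULL), 2);

COMMIT;
```

Because tstzrange defaults to an inclusive lower bound and an exclusive upper bound, the two rows meet exactly at the switch time without overlapping, and no process has to wake up when that time arrives.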

With that in place, getting the current status is a JOIN of the two tables that weeds out rows that are either not in effect anymore or not in effect yet. This works nicely in a view:

CREATE VIEW event_current_status AS
    SELECT event.*, event_status.status
    FROM event JOIN event_status ON event_status.event = event.id
    -- The @> operator means containment, i.e., the current time
    -- falls within the range of the 'times' column.
    WHERE event_status.times @> now()
    ;
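
As a usage sketch, reading the status then needs no scheduler at all. The event id 42 and the timestamp below are placeholder values. The second query applies a containment test (the @> operator) against an arbitrary point in time, for example to see what status a pending change will produce.

```sql
-- Current status of one event, via the view.
SELECT id, status
FROM event_current_status
WHERE id = 42;

-- Status the same event will have (or had) at a chosen instant.
SELECT status
FROM event_status
WHERE event = 42
  AND times @> TIMESTAMPTZ '2021-03-01 12:00:00+00';
```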

Properly indexed (event and times in the same index on event_status should do), you should have no performance concerns.
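
One way to build such an index, assuming PostgreSQL with the btree_gist extension available (it is needed to combine a BIGINT column with a range column in a single GiST index); the index name is made up for illustration. If an exclusion constraint over the same two columns is already in place, the index backing it covers the same columns and a separate one is probably unnecessary.

```sql
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- One GiST index covering both the event id and its time range.
CREATE INDEX event_status_event_times_idx
    ON event_status USING gist (event, times);
```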

* You mentioned in a comment that this information is available elsewhere in the database; I'm going to hand-wave that for now. That said, it might be a good idea to investigate the merits of pulling it directly from the tables where it originates instead of having a duplicate status table. There are trade-offs for both that will depend on your situation.

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange