cron script to act as a queue OR a queue for cron?

https://stackoverflow.com/questions/226473

03-07-2019
|

Question

I'm betting that someone has already solved this and maybe I'm using the wrong search terms for google to tell me the answer, but here is my situation.

I have a script that I want to run, but I want it to run only when scheduled and only one at a time. (can't run the script simultaneously)

Now the sticky part is that say I have a table called "myhappyschedule" which has the data I need and the scheduled time. This table can have multiple scheduled times even at the same time, each one would run this script. So essentially I need a queue of each time the script fires and they all need to wait for each one before it to finish. (sometimes this can take just a minute for the script to execute sometimes its many many minutes)

What I'm thinking about doing is making a script that checks myhappyschedule every 5 min and gathers up those that are scheduled, puts them into a queue where another script can execute each 'job' or occurrence in the queue in order. Which all of this sounds messy.

To make this longer - I should say that I'm allowing users to schedule things in myhappyschedule and not edit crontab.

What can be done about this? File locks and scripts calling scripts?

Solution

add a column exec_status to myhappytable (maybe also time_started and time_finished, see pseudocode)

run the following cron script every x minutes

pseudocode of cron script:

[create/check pid lock (optional, but see "A potential pitfall" below)]
get number of rows from myhappytable where (exec_status == executing_now)
if it is > 0, exit
begin loop
  get one row from myhappytable
    where (exec_status == not_yet_run) and (scheduled_time <= now)
    order by scheduled_time asc
  if no such row, exit
  set row exec_status to executing_now (maybe set time_started to now)
  execute whatever command the row contains
  set row exec_status to completed
  (maybe also store the command output/return as well, set time_finished to now)
end loop
[delete pid lock file (complementary to the starting pid lock check)]

This way, the script first checks if none of the commands is running, then runs first not-yet run command, until there are no more commands to be run at the given moment. Also, you can see what command is executing by querying the database.

A potential pitfall: if the cron script is killed, a scheduled task will remain in "executing_now" state. That's what the pid lock at beginning and end is for: to see if the cron script terminated properly. pseudocode of create/check pidlock:

if exists pidlockfile then
  check if process id given in file exists
  if not exists then
    update myhappytable set exec_status = error_cronscript_died_while_executing_this   
      where exec_status == executing_now
    delete pidlockfile
  else (previous instance still running)
    exit
  endif
endif
create pidlockfile containing cron script process id

OTHER TIPS

You can use the at(1) command inside your script to schedule its next run. Before it exits, it can check myhappyschedule for the next run time. You don't need cron at all, really.

I came across this question while researching for a solution to the queuing problem. For the benefit of anyone else searching here is my solution.

Combine this with a cron that starts jobs as they are scheduled (even if they are scheduled to run at the same time) and that solves the problem you described as well.

Problem

At most one instance of the script should be running.
We want to cue up requests to process them as fast as possible.

ie. We need a pipeline to the script.

Solution:

Create a pipeline to any script. Done using a small bash script (further down).

The script can be called as
./pipeline "<any command and arguments go here>"

Example:

./pipeline sleep 10 &
./pipeline shabugabu &
./pipeline single_instance_script some arguments &
./pipeline single_instance_script some other_argumnts &
./pipeline "single_instance_script some yet_other_arguments > output.txt" &
..etc

The script creates a new named pipe for each command. So the above will create named pipes: sleep, shabugabu, and single_instance_script

In this case the initial call will start a reader and run single_instance_script with some arguments as arguments. Once the call completes, the reader will grab the next request off the pipe and execute with some other_arguments, complete, grab the next etc...

This script will block requesting processes so call it as a background job (& at the end) or as a detached process with at (at now <<< "./pipeline some_script")

#!/bin/bash -Eue

# Using command name as the pipeline name
pipeline=$(basename $(expr "$1" : '\(^[^[:space:]]*\)')).pipe
is_reader=false

function _pipeline_cleanup {
        if $is_reader; then
                rm -f $pipeline
        fi
        rm -f $pipeline.lock

        exit
}
trap _pipeline_cleanup INT TERM EXIT

# Dispatch/initialization section, critical
lockfile $pipeline.lock
        if [[ -p $pipeline ]]
        then
                echo "$*" > $pipeline
                exit
        fi

        is_reader=true
        mkfifo $pipeline
        echo "$*" > $pipeline &
rm -f $pipeline.lock

# Reader section
while read command < $pipeline
do
        echo "$(date) - Executing $command"
        ($command) &> /dev/null
done

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow