Question

I have a Perl script that takes in unique parameters (one of the parameters being --user=username_here). Users can start these processes using a web interface I am developing.

A MySQL table, transactions, keeps track of the users who run the Perl script:

id   user   script_parameters                 execute   last_modified
23   alex   --user=alex --keywords=thisthat   0         2014-05-06 05:49:01
24   alex   --user=alex --keywords=thisthat   0         2014-05-06 05:49:01
25   alex   --user=alex --keywords=lg         0         2014-05-06 05:49:01
26   alex   --user=alex --keywords=lg         0         2014-04-30 04:31:39

The execute value for a given row will be "1" if the process should be running. It is set to "0" if the process should be ended.

My Perl script constantly checks this value to make sure it's not "0"; if it is, the script terminates.
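
Simplified, the check inside my script looks something like this (just a sketch; the DSN, credentials and polling interval are placeholders):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    # Placeholders: real DSN/credentials differ, and the row id is passed in.
    my $dbh = DBI->connect( 'DBI:mysql:database=mydb;host=localhost',
                            'dbuser', 'dbpass', { RaiseError => 1 } );
    my $row_id = $ARGV[0];    # id of this run's row in `transactions`

    while (1) {
        my ($execute) = $dbh->selectrow_array(
            'SELECT `execute` FROM transactions WHERE id = ?', undef, $row_id );
        last if !defined $execute || $execute == 0;    # told to stop (or row gone)

        # ... do one unit of the real work here ...
        sleep 5;                                       # then check again
    }

    $dbh->disconnect;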

However, I need to manage these processes to protect against this problem:

  1. What if my server abruptly crashes and restarts, OR the script itself crashes? I will need something running in the background that reads the transactions table and makes sure it restarts the Perl script as many times as needed, using the appropriate parameters.

And so, I'm having trouble figuring out how to balance giving users control over their own transaction(s) while also making sure that the transactions that SHOULD be running ARE running, and those that SHOULDN'T be, AREN'T.

Hope that makes sense and I appreciate any help!


Solution

It seems you're trying to launch long-running processes from a web server and then track those processes in a database. That's not impossible, but not a recommended practice.

The main problem is that your web server code can only do anything (including track processes running on the system) while it is actively handling an HTTP request; you need something that can run all the time...

Instead, a better idea would be to have a separate, daemonized "manager" process (since you mention Perl, that would be a good language to write it in) spawn and track the long-running tasks (by PID and signals), and have that process update your SQL database.

You can then have your "manager" process listen for requests from your web server to start a new process. There are various IPC mechanisms you could use (e.g. signals, SysV shared memory, Unix domain sockets, message queues like ZeroMQ, etc.).
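
A rough sketch of what that manager could look like, using a Unix domain socket (one of the IPC options above) to accept start requests, plus Daemon::Daemonize and Proc::Spawn from the update below. The socket path, the one-line "start <id> <parameters>" message format and the script path are illustrative assumptions, not a prescribed protocol:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Daemon::Daemonize qw( :all );
    use IO::Socket::UNIX;
    use Proc::Spawn;

    # Illustrative paths only; pick whatever suits your deployment.
    my $socket_path = '/var/run/script-manager.sock';
    my $pidfile     = '/var/run/script-manager.pid';

    daemonize();                 # detach from the terminal and keep running
    write_pidfile($pidfile);     # so start/stop scripts can find us later

    unlink $socket_path;
    my $listener = IO::Socket::UNIX->new(
        Type   => SOCK_STREAM(),
        Local  => $socket_path,
        Listen => 5,
    ) or die "Cannot listen on $socket_path: $!";

    my %children;                # spawned PID => transaction id

    while ( my $client = $listener->accept ) {
        # Assumed one-line protocol: "start <transaction_id> <parameters...>"
        my $request = <$client>;
        close $client;
        next unless defined $request;

        if ( $request =~ /^start\s+(\d+)\s+(.+)$/ ) {
            my ( $id, $params ) = ( $1, $2 );
            my ( $pid, $in_fh, $out_fh, $err_fh ) =
                spawn("/path/to/long_running.pl $params");
            $children{$pid} = $id;
            # ...update the transactions table with $pid for row $id here...
        }
    }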

This has multiple benefits:

  • If your spawned scripts need to run with user/group based isolation (either from the system or each other), then your webserver doesn't need to run as root, nor be setgid.
  • If a spawned process "crashes", a signal will be delivered to the "manager" process, so it can track failed executions without issues.
  • If you use message queues (e.g. ZeroMQ) to deliver requests to the "manager" process, it can "throttle" requests from the web server (so that users cannot intentionally or accidentally cause a denial of service).
  • Whether or not the spawned process ends well, you don't need an 'active' HTTP request to the web server in order to update your tracking database.

As to whether something that should be running is running, that's really up to your semantics. (i.e: is it based on a known run time? based on data consumed? etc).

The check as to whether it is running can be two-fold:

  1. The "manager" process updates the database as appropriate, including the spawned PID.
  2. Your web-server-hosted code can list processes to determine whether the PID in the database is actually running, and even how much time it's been doing something useful!
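
Checking a recorded PID can be as simple as sending it signal 0. A sketch, assuming the manager stores the spawned PID in an extra pid column on the transactions table:

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect( 'DBI:mysql:database=mydb;host=localhost',
                            'dbuser', 'dbpass', { RaiseError => 1 } );

    # Assumes the manager records the spawned PID in a `pid` column.
    my $rows = $dbh->selectall_arrayref(
        'SELECT id, pid FROM transactions WHERE `execute` = 1',
        { Slice => {} } );

    for my $row (@$rows) {
        # kill 0 delivers no signal; it only tests whether the PID exists
        # (it can fail with EPERM for processes owned by another user).
        my $alive = $row->{pid} && kill( 0, $row->{pid} );
        printf "transaction %d: %s\n", $row->{id},
               $alive ? 'running' : 'NOT running';
    }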

The check for whether it is not running would have to be based on convention:

  1. Name the spawned processes something you can predict.
  2. Get a process list to determine what's still running (or defunct) that shouldn't be (see the sketch below).

In either case, you could either inform the users who requested the processes be spawned and/or actually do something about it.
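
For that convention-based check, Proc::ProcessTable (pointed to in the update below) lets you walk the process list from Perl instead of parsing ps output. A rough sketch, assuming the spawned scripts are recognizable by their command line:

    use strict;
    use warnings;
    use Proc::ProcessTable;

    my $table = Proc::ProcessTable->new;

    # Collect processes whose command line matches our (assumed) naming convention.
    my %running;    # command line => pid
    for my $proc ( @{ $table->table } ) {
        next unless $proc->cmndline =~ /long_running\.pl/;
        $running{ $proc->cmndline } = $proc->pid;
    }

    # Compare %running against the `transactions` rows with execute = 1 to find
    # entries that should be running but are not (and vice versa).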

One approach might be to have a cron job which reads from the SQL database and runs ps to determine which spawned processes need to be restarted, and then re-requests that the "manager" process do so using the same IPC mechanism used by the web server. How you differentiate starts vs. restarts in your tracking/monitoring/logging is up to you.
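
The cron job only has to speak the same protocol as the web server. With the Unix-domain-socket sketch above (same illustrative socket path and message format), the re-request could be as small as:

    use strict;
    use warnings;
    use IO::Socket::UNIX;

    # Same illustrative socket path and message format as the manager sketch above.
    my $sock = IO::Socket::UNIX->new(
        Type => SOCK_STREAM(),
        Peer => '/var/run/script-manager.sock',
    ) or die "Cannot reach manager: $!";

    # Ask the manager to (re)start transaction 25 with its stored parameters.
    print $sock "start 25 --user=alex --keywords=lg\n";
    close $sock;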

If the server itself loses power or crashes, then you could have the "manager" process perform cleanup when it first runs, e.g:

  1. Look for entries in the database for spawned processes that were allegedly running before the server was shut down.
  2. Check for those processes by PID and run time (this is important).
  3. Either re-spawn the spawned processes that didn't complete, or store something in the database to indicate to the web server that this was the case.
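
A sketch of that startup cleanup, under the same assumption as above that the spawned PID is stored in a pid column:

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect( 'DBI:mysql:database=mydb;host=localhost',
                            'dbuser', 'dbpass', { RaiseError => 1 } );

    # Rows that claim to be running from before the crash/restart.
    my $stale = $dbh->selectall_arrayref(
        'SELECT id, pid, script_parameters FROM transactions WHERE `execute` = 1',
        { Slice => {} } );

    for my $row (@$stale) {
        # NOTE: to guard against PID reuse, also compare the process start time
        # (e.g. Proc::ProcessTable's `start` field) against last_modified.
        next if $row->{pid} && kill( 0, $row->{pid} );    # still alive, leave it

        # Not running any more: either re-spawn it via the normal "start" path,
        # or flag it so the web interface can tell the user what happened.
        $dbh->do( 'UPDATE transactions SET pid = NULL WHERE id = ?',
                  undef, $row->{id} );
    }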

Update #1

Per your comment, here are some pointers to get started:

You mentioned perl, so presuming you have some proficiency there -- here are some perl modules to help you on your way to writing the "manager" process script:

If you're not already familiar with it, CPAN is the repository for Perl modules that do basically anything.

Daemon::Daemonize - Daemonizes a process so that it will continue running after you log out. Also provides helpers for writing scripts to start/stop/restart the daemon.
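
A minimal start/stop wrapper using its pidfile helpers might look like this (the pidfile path is just an example):

    use strict;
    use warnings;
    use Daemon::Daemonize qw( :all );

    my $pidfile = '/var/run/script-manager.pid';    # example path
    my $command = shift(@ARGV) || 'start';

    if ( $command eq 'start' ) {
        die "Already running\n" if check_pidfile($pidfile);
        daemonize();
        write_pidfile($pidfile);
        # ... manager main loop goes here ...
    }
    elsif ( $command eq 'stop' ) {
        if ( my $pid = check_pidfile($pidfile) ) {
            kill 'TERM', $pid;
        }
        unlink $pidfile;
    }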

Proc::Spawn - Helps with 'spawning' child scripts. Basically does fork() then exec(), but also handles the STDIN/STDOUT/STDERR (or even a tty) of the child process. You could use this to launch your long-running Perl scripts.
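
For example (the worker path and parameters are placeholders; if I recall its interface correctly, spawn() returns the child's PID plus handles for its stdin/stdout/stderr):

    use strict;
    use warnings;
    use Proc::Spawn;

    # Launch the long-running worker (placeholder path/parameters) and keep
    # its PID for tracking; the handles let you capture or log its output.
    my ( $pid, $in_fh, $out_fh, $err_fh ) =
        spawn('/path/to/long_running.pl --user=alex --keywords=lg');

    print "spawned worker as PID $pid\n";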

If your web server front-end code is not already written in Perl, you'll want an IPC mechanism for message-passing and queuing that's reasonably portable across languages (ZeroMQ, mentioned above, is one of several possibilities); I'd probably write the web front end in something easy to deploy (like PHP).

A few more modules that will help:

Proc::ProcessTable - You can use this to check on running processes (and get all sorts of stats, as discussed above).
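
For instance, to pull the stats for one tracked PID (the PID here is a placeholder you'd read back from the database):

    use strict;
    use warnings;
    use Proc::ProcessTable;

    my $wanted_pid = 12345;    # placeholder: a PID read back from the database
    my $table      = Proc::ProcessTable->new;

    for my $proc ( @{ $table->table } ) {
        next unless $proc->pid == $wanted_pid;
        printf "cmd=%s state=%s started=%s\n",
               $proc->cmndline, $proc->state, scalar localtime( $proc->start );
    }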

Time::HiRes - Use the high-granularity time functions from this package to implement your 'throttling' framework. Basically just limit the number of requests you de-queue per unit of time.
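
A trivial sketch of that kind of throttle (the one-request-per-quarter-second limit is an arbitrary choice):

    use strict;
    use warnings;
    use Time::HiRes qw( time sleep );

    my $min_interval = 0.25;    # arbitrary: at most ~4 de-queued requests per second
    my $last_handled = 0;

    sub throttle {
        my $elapsed = time() - $last_handled;
        sleep( $min_interval - $elapsed ) if $elapsed < $min_interval;
        $last_handled = time();
    }

    # Call throttle() before de-queuing each request in the manager's loop.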

DBI (with DBD::mysql) - Update your MySQL database from the "manager" process.
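
For example, recording the spawned PID and clearing it again when the child exits (the pid column is the same assumed addition as in the sketches above):

    use strict;
    use warnings;
    use DBI;

    my ( $row_id, $child_pid ) = ( 25, 12345 );    # placeholders for a real spawn

    my $dbh = DBI->connect( 'DBI:mysql:database=mydb;host=localhost',
                            'dbuser', 'dbpass', { RaiseError => 1 } );

    # After a successful spawn: remember the PID and mark the row as running.
    $dbh->do( 'UPDATE transactions SET pid = ?, `execute` = 1 WHERE id = ?',
              undef, $child_pid, $row_id );

    # When the manager reaps the child (SIGCHLD) or stops it: clear the state.
    $dbh->do( 'UPDATE transactions SET pid = NULL, `execute` = 0 WHERE id = ?',
              undef, $row_id );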
