Question

I have a Python script that watches a web service request workspace. Each time a client submits a job to my web service, a unique job folder is created in a well-known location. My script polls this location for folders without a "flag" (a blank text file with a specific name that indicates processing is complete on this job).

Right now my script can call a worker script to process the contents of the new folder, but it has to wait until the worker script finishes before it can proceed to hand out more folders.

My question is: what are my options for instantiating new instances of the worker script while returning control to the manager? Would creating a Python executable that takes the worker script's parameters, and having the manager script call it via the command line, work? Or would turning the worker script into a class that can have numerous instances processing work?

Once the worker script is done, it does not need to message the manager script that the job is complete; it does this by dropping a text file into the directory. Though now that I think about it, I will have to record somewhere that each job directory has been handed out, because it takes about 1.5 minutes for the worker script to process one.

Any advice/links would be much appreciated.


Solution

First of all, I agree that you need to put a flag in your directories indicating that a directory is being processed. The master script should be the only one to set the flag, or you will risk race conditions (two worker scripts taking the same directory at the same time). You can use the same file for both states: the master script creates it empty (meaning "in progress") and the worker script writes one byte into it (meaning "done"). That way, the master script only has to check the existence (and size) of the flag.
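A minimal sketch of that flag protocol might look like this (the flag file name `status.flag` is an assumption, not something from your setup):

```python
import os

FLAG_NAME = "status.flag"  # hypothetical flag file name

def claim_job(job_dir):
    """Master only: mark a directory as "in progress" with an empty flag.

    Returns False if the directory was already claimed (or finished),
    so the master never hands the same folder out twice.
    """
    flag = os.path.join(job_dir, FLAG_NAME)
    if os.path.exists(flag):
        return False
    open(flag, "w").close()  # empty file == "in progress"
    return True

def finish_job(job_dir):
    """Worker: write one byte into the flag, turning it into "done"."""
    with open(os.path.join(job_dir, FLAG_NAME), "w") as f:
        f.write("1")

def is_done(job_dir):
    """A non-empty flag means the worker has finished this directory."""
    flag = os.path.join(job_dir, FLAG_NAME)
    return os.path.exists(flag) and os.path.getsize(flag) > 0
```

Because only the master ever creates the flag, the "has this directory been handed out?" question you raise is answered by the flag's mere existence, and "is it done?" by its size.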

Back to your question:

  • you can indeed make your worker script into a standalone program, and call it via the subprocess module [1];

  • you can make it a thread (with the threading module [2]), which is somewhat easier to code; this may be inefficient because of the GIL, but if your worker script is highly I/O-bound, it should not be too much of a problem;

  • if you are on Python 2.6+ or Python 3, you may want to look at the multiprocessing module [3], which I have never used but which seems to offer the usability of threading without being vulnerable to the GIL; it seems it is not completely portable, though.
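For the first option, the key point is that `subprocess.Popen` returns immediately, so the manager keeps control while the worker runs. A sketch, assuming a hypothetical standalone `worker.py` that takes the job directory as its only argument:

```python
import subprocess
import sys

def launch_worker(worker_script, job_dir):
    """Start the worker in a separate process and return immediately.

    worker_script is a hypothetical standalone script that takes the
    job directory as its only command-line argument.  Popen does not
    block, so the manager can go straight back to polling for new
    folders while the worker runs on its own.
    """
    return subprocess.Popen([sys.executable, worker_script, job_dir])
```

If you ever do want to reap finished workers (e.g. to cap how many run at once), keep the returned `Popen` objects in a dict keyed by job directory and call `.poll()` on them during each polling pass.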

Hope this helps.

  • [1] http://docs.python.org/library/subprocess.html
  • [2] http://docs.python.org/library/threading.html
  • [3] http://docs.python.org/dev/library/multiprocessing.html
Licensed under: CC-BY-SA with attribution