Question

I'm running a Python script using Apache and mod_python in a Linux environment. It is part of a web tool that allows file processing. The file-processing part of my script can have a very long execution time. In the original version of my code, the script waits for the files to be processed and, at the end, returns some HTML with a link to download the resulting file.

submit.html

<html>
    <body>
        <form enctype="multipart/form-data" action="./my_script.py/main" method="post">
            <div> <input type="file" name="file"> </div>
            <div> <input type="submit" value="Upload"> </div>
        </form>
    </body>
</html>

my_script.py

def main(file):
    process(file)
    return "<p>Download your file <a href='./%s'>here</a></p>" % file

def process(file):
    # some file treatment here; the resulting file is stored in the current directory
    pass

I want to add a feature that would allow the user to receive the resulting file by email. In that case, once he has uploaded his file, I would like to redirect him to a page where he can keep on using the web tool while his file is being processed server-side, so that the user is not blocked. I have looked at threading, multiprocessing, and Unix forks, and I have made several tests with these 3 options, but I always get blocked by the running script. From what I understand, multiprocessing is best suited to my case, so I have tried this:

my_script.py

from multiprocessing import Process

def main(file, receiver_mail_address):
    p = Process(target=process_and_email, args=(file, receiver_mail_address))
    p.start()
    return "<p>The resulting files will be emailed to you at %s.</p>" % receiver_mail_address

def process_and_email(file, receiver_mail_address):
    # some file processing and emailing here; these functions work perfectly as expected
    pass

In this situation I have skipped the p.join() step, which the Python docs say will:

"Block the calling thread until the process whose join() method is called terminates or until the optional timeout occurs."

But in my case, it is still blocked: I have to wait for my process p to be over before reaching the return statement. How can I avoid this?


Edit:

I have tried switching to the subprocess module: I moved the process_and_email function into a new file called process_and_email.py and modified the main script:

my_script.py

import os
import subprocess

def main(file, receiver_mail_address):
    directory = os.path.dirname(__file__)
    path = os.path.join(directory, 'process_and_email.py')

    # shell=True combined with an argument list was dropped here: on POSIX it
    # would make the shell ignore everything after 'python2.7'.
    subprocess.Popen(['python2.7', path, file, receiver_mail_address])

    return "<p>The resulting files will be emailed to you at %s.</p>" % receiver_mail_address

I still have the same problem: I cannot reach the return statement before process_and_email.py has finished executing.


Solution

This is happening because your parent process won't exit until all non-daemon child processes have completed their work. So in your case, process_and_email needs to complete before the script can exit, even though main has already returned. You could make the child process a daemon, which would let the parent script exit right away, but the parent would then kill the worker process on exit, which isn't what you want either.
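
To see why daemonizing doesn't help, here is a minimal, self-contained sketch; the process_and_email body is a stand-in using sleep, not the asker's real code:

from multiprocessing import Process
import time

def process_and_email(file, receiver_mail_address):
    time.sleep(60)  # stand-in for the long-running file processing
    print("emailed %s" % receiver_mail_address)

if __name__ == '__main__':
    p = Process(target=process_and_email, args=('data.txt', 'user@example.com'))
    p.daemon = True  # the parent no longer joins p at exit...
    p.start()
    # ...but when the parent exits here, the daemonic child is terminated,
    # so the email is never sent.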

I think a better option for you is to use the subprocess module to spawn a separate Python script to do your processing in the background. That way your parent script can exit, and leave the worker process running.
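
A minimal sketch of that idea on a POSIX system, reusing the process_and_email.py script from the question (the detaching flags here are one possible choice, not the only one):

import os
import subprocess

def main(file, receiver_mail_address):
    path = os.path.join(os.path.dirname(__file__), 'process_and_email.py')

    # Popen returns immediately as long as we never call wait() or communicate().
    # preexec_fn=os.setsid puts the worker in its own session, and close_fds=True
    # keeps it from inheriting Apache's open descriptors, so the worker can keep
    # running after the request handler returns.
    subprocess.Popen(['python2.7', path, file, receiver_mail_address],
                     close_fds=True,
                     preexec_fn=os.setsid)

    return "<p>The resulting files will be emailed to you at %s.</p>" % receiver_mail_address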

OTHER TIPS

A common pattern in web applications is to maintain a global job queue, for example beanstalkd, which has a nice Python interface called beanstalkc. You submit jobs to that queue and have a separate program/process watch the queue and work on its items.
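
For illustration, a minimal producer/worker pair using beanstalkc; the host/port are the beanstalkd defaults, and the 'path|email' payload format is made up for this sketch:

import beanstalkc

# Web side: enqueue the job and return to the user immediately.
queue = beanstalkc.Connection(host='localhost', port=11300)
queue.put('data.txt|user@example.com')

# Worker side, in a separate long-running process:
queue = beanstalkc.Connection(host='localhost', port=11300)
while True:
    job = queue.reserve()  # blocks until a job is available
    file, receiver_mail_address = job.body.split('|')
    # process the file and email the result here...
    job.delete()           # remove the completed job from the queue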

Licensed under: CC-BY-SA with attribution