Question

I'm running several simulations using Condor and have coded the program to output a progress status to the console. This is done at the end of a loop, where it simply prints the current time (this could also be a percentage or elapsed time). The code looks something like this:

printf("START");
while (programNeedsToRun) {

   // Run repetitive code...

   // Print program status update
   printf("[%i:%i:%i]\r\n", hours, minutes, seconds);
}
printf("FINISH");

When executed normally (i.e. in the terminal/cmd/bash) this works fine, but the Condor nodes don't seem to printf() the status. Only once the simulation has finished do all the status updates appear in the output file, by which point they are no longer of use. My *.sub file that I submit to Condor looks like this:

universe = vanilla
executable = program
output = out/out-$(Process)
error = out/err-$(Process)
queue 100

When submitted the program executes (this is confirmed in condor_q) and the output files contain this:

START

Only once the program has finished running does its corresponding output file show (example):

START
[0:3:4]
[0:8:13]
[0:12:57]
[0:18:44]
FINISH

While the program executes, the output file contains only the START text. So I came to the conclusion that the file is not updated while the node executing the program is busy. My question is: is there a way of updating the output files manually, or a better way of gathering information on the program's progress?

Thanks already

Max


Solution

What you want to do is use the streaming output options. See the stream_error and stream_output options you can pass to condor_submit as outlined here: http://research.cs.wisc.edu/htcondor/manual/current/condor_submit.html

By default, HTCondor stores stdout and stderr locally on the execute node and transfers them back to the submit node on job completion. Setting stream_output to TRUE asks HTCondor to instead stream the output back to the submit node as it is produced, so you can inspect it as it happens.
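Applied to the submit file in the question, that would mean adding the two streaming options (a sketch; stream_output and stream_error are the option names documented for condor_submit):

```
universe   = vanilla
executable = program
output     = out/out-$(Process)
error      = out/err-$(Process)
stream_output = TRUE
stream_error  = TRUE
queue 100
```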

Other tips

Here's something I used a few years ago to solve this problem. It uses condor_chirp, which can transfer files from the execute host back to the submit host. I have a Python script that executes the program I really want to run and redirects its output to a file. Then, periodically, I send the output file back to the submit host.

Here's the Python wrapper, stream.py:

 #!/usr/bin/python
 import os, sys, time

 os.environ['PATH'] += ':/bin:/usr/bin:/cygdrive/c/condor/bin'
 # make sure the output file exists before the first chirp
 open(sys.argv[1], 'w').close()

 pid = os.fork()
 if pid == 0:
     # child: run the real program, redirecting its stdout to the file
     os.system('%s >%s' % (' '.join(sys.argv[2:]), sys.argv[1]))
 else:
     # parent: every 10 seconds, send the file back to the submit host
     while True:
         time.sleep(10)
         os.system('condor_chirp put %s %s' % (sys.argv[1], sys.argv[1]))
         try:
             # raises OSError once the child has exited and been reaped
             os.wait4(pid, os.WNOHANG)
         except OSError:
             break

And my submit script. The program ran sh hello.sh, and redirected the output to myout.txt:

 universe                = vanilla
 executable              = C:\cygwin\bin\python.exe
 requirements            = Arch=="INTEL" && OpSys=="WINNT60" && HAS_CYGWIN==TRUE
 should_transfer_files   = YES
 transfer_input_files    = stream.py,hello.sh
 arguments               = stream.py myout.txt sh hello.sh
 transfer_executable     = false

It does send the output in its entirety each time, so take that into account if you have a lot of jobs running at once. Currently, it's sending the output every 10 seconds; you may want to adjust that.
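The wrapper above is Python 2 era; the same idea can be sketched in current Python with subprocess instead of fork/os.system. This is only a sketch: condor_chirp is assumed to be on PATH on the execute node, and the push command is parameterized so the function can be exercised without a Condor pool.

```python
#!/usr/bin/env python3
"""Run a command, redirect its stdout to a file, and periodically push
that file back to the submit host (by default via condor_chirp, assumed
to be on PATH on the execute node)."""
import subprocess
import sys
import time


def run_and_stream(outfile, cmd, interval=10, push=('condor_chirp', 'put')):
    with open(outfile, 'w') as out:
        # child writes directly into the output file
        proc = subprocess.Popen(cmd, stdout=out, stderr=subprocess.STDOUT)
        while proc.poll() is None:  # child still running
            time.sleep(interval)
            subprocess.call(list(push) + [outfile, outfile])
    # one last push so the tail of the output is not lost
    subprocess.call(list(push) + [outfile, outfile])
    return proc.returncode


if __name__ == '__main__' and len(sys.argv) > 2:
    # usage: stream.py <outfile> <command> [args...]
    sys.exit(run_and_stream(sys.argv[1], sys.argv[2:]))
```

Invoked the same way as the original wrapper, e.g. `stream.py myout.txt sh hello.sh`.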

With condor_tail you can view the output of a running job. To see stdout, just add the job ID (and -f if you want to follow the output and see updates immediately). Example:

condor_tail 314.0 -f
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow