Question

I'm using the xz zipping utility on a PBS cluster; I've just realised that the time I've allowed for my zipping jobs won't be long enough, and so would like to restart them (and then, presumably, I'll need to include the .xz that has already been created in the new archive file?). Is it safe to kill the jobs, or is this likely to corrupt the .xz files that have already been created?


Solution

I am not sure about the implications of using xz in a cluster, but in general killing an xz process (or any decent compression utility) should only affect the file being compressed at the time the process terminates. More specifically:

  • Any output files from input files that have already been compressed should not be affected. The resulting .xz compressed files should remain perfectly usable.

  • Any input files that have not been processed yet should not be altered at all.

  • The input file that was being compressed at the time of termination should not be affected.

  • Provided that the process is terminated using the SIGTERM signal, rather than a signal that cannot be caught like SIGKILL, xz should clean up after itself before exiting. More specifically, it should not leave any partial output files around.

  • If xz is killed violently, the worst that should (as opposed to might) happen is for a partial compressed file to remain on the disk, right alongside its corresponding input file. You may want to ensure that such files are cleaned up properly - a good way is to have xz work in a separate directory from the actual storage area and move files in and out for compression.
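As a rough illustration of the last point, the sketch below lists .xz files whose uncompressed source is still sitting next to them, which is what a partial output left behind by a violent kill would look like. It assumes xz was run without its keep option (-k), so a finished job would normally have removed the input; the function name find_suspect_outputs is made up for this example.

```python
from pathlib import Path


def find_suspect_outputs(directory):
    """Return .xz files whose uncompressed source file is still present.

    Assumption: xz was run without -k, so a completed job removes its
    input. A surviving input/output pair therefore hints at an
    interrupted job and a possibly partial .xz file.
    """
    suspects = []
    for xz_path in sorted(Path(directory).glob("*.xz")):
        source = xz_path.with_suffix("")  # "data.tar.xz" -> "data.tar"
        if source.is_file():
            suspects.append(xz_path)
    return suspects
```

Anything it returns is only a candidate: check the file before deleting it, and re-run the compression from the untouched input if in doubt.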

That said, depending on the importance of the compressed data, you may still want to incorporate measures to detect and deal with any corrupt files. There can be a lot of pathological situations where things do not happen as they are supposed to...
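One possible way to detect damaged files, sketched here with Python's standard lzma module rather than xz itself, is to decompress each file fully and discard the output. The helper names xz_is_intact and find_damaged are hypothetical; on the command line, xz's own test option (-t) serves the same purpose.

```python
import lzma
from pathlib import Path


def xz_is_intact(path, chunk_size=1 << 20):
    """Return True if the whole .xz file decompresses without error."""
    try:
        with lzma.open(path, "rb") as f:
            # Read and discard the data in chunks to keep memory use low.
            while f.read(chunk_size):
                pass
    except (lzma.LZMAError, EOFError):
        # LZMAError: corrupt data; EOFError: the file ends too early.
        return False
    return True


def find_damaged(directory):
    """Return the .xz files in a directory that fail the check above."""
    return sorted(p for p in Path(directory).glob("*.xz") if not xz_is_intact(p))
```

This only shows that a file decompresses cleanly; it does not compare the result against the original input.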

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow