Question

I'm running MATLAB on a cluster. When I run my .m script from an interactive MATLAB session on the cluster, my results are reproducible. But when I run the same script via qsub, as part of an array job away from my watchful eye, I get believable but unreproducible results. The .m files do exactly the same thing, including saving the results as .mat files.

Does anyone know why the scripts give reproducible results when run one way and unreproducible results when run the other way?

Is this only a problem with reproducibility, or does it indicate inaccurate results?

%%%%% Thanks to spuder for a helpful answer. In case anyone stumbles upon this and is interested, here is some further information. If you use more than one thread in MATLAB jobs, they may steal resources from other jobs, which plays havoc with the results. So you have two options:

1. Request exclusive access to a node. The cluster I am using does not currently allow parallel array jobs, so this was very wasteful for me: I took a whole node but used it serially.
2. Ask MATLAB to run with -singleCompThread. This may make your script take longer to complete, but it gets jobs through the queue faster.
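For anyone wanting to try the single-thread option, here is a rough sketch of what the job script might look like. The `-singleCompThread`, `-nodisplay`, `-nosplash` and `-r` startup options are standard MATLAB ones, but everything else is an assumption on my part: the `#PBS` lines are Torque-style directives, the array index variable differs between schedulers (`PBS_ARRAYID` and `SGE_TASK_ID` are the two guessed at here), and `my_script` is a placeholder for your own .m file. Check your site's documentation before relying on any of it.

```shell
#!/bin/sh
# Sketch of an array job script that pins MATLAB to one computational thread.
# ASSUMPTION: Torque-style directives; other schedulers use different syntax.
#PBS -l nodes=1:ppn=1
#PBS -t 1-10

# ASSUMPTION: the scheduler exports the array index as PBS_ARRAYID (Torque)
# or SGE_TASK_ID (Grid Engine). Fall back to 1 when neither is set.
TASK_ID="${PBS_ARRAYID:-${SGE_TASK_ID:-1}}"

# MATLAB code to run. my_script is a hypothetical placeholder for your .m file.
# The try/catch makes MATLAB exit on an error instead of waiting at the prompt.
MATLAB_CODE="task_id=${TASK_ID}; try, my_script; catch e, disp(getReport(e)); exit(1); end; exit(0)"

if command -v matlab >/dev/null 2>&1; then
    # -singleCompThread limits MATLAB to a single computational thread.
    matlab -nodisplay -nosplash -singleCompThread -r "${MATLAB_CODE}"
else
    # Off the cluster (no matlab on PATH): just show what would be run.
    echo "matlab not found; would run:"
    echo "matlab -nodisplay -nosplash -singleCompThread -r \"${MATLAB_CODE}\""
fi
```

Requesting one core per task (the `ppn=1` line) is meant to match the single thread MATLAB will actually use, so the job does not ask for more than it needs.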


Solution

There are a lot of variables at play. Ruling out transient issues such as network performance and load, here are a couple of possible explanations:

You are getting assigned a different batch of nodes when you run an interactive job than when you use qsub.

I've seen some sites that assign a policy of 'exclusive' to the nodes that run interactive jobs and 'shared' to the nodes that run queued qsub jobs. If that is the case, then you will almost always see better performance on the exclusive nodes.

Another answer might be that the interactive jobs are assigned to nodes that have less network congestion.

Additionally, if you are requesting multiple nodes and you happen to land on nodes that are multiple network hops apart, then you could be seeing significant network slowdowns. The solution would be for the cluster administrator to set up nodesets.

Are you using multiple nodes for the job? How are you requesting the resources?

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow