I am fairly new to SLURM. The grid I use has many users, and while they are submitting or canceling jobs, other users seem unable to query partition status and the like. This is extremely frustrating, especially for jobs that spawn other jobs, since those end up failing because the controller is busy. Does anyone know a workaround?


Solution

With the default settings, Slurm can become slow or hang when many users submit, modify, or cancel many jobs at the same time, especially with backfill scheduling and accounting enabled.

For tips on improving this, see these slides from the 2012 Slurm User Group Meeting.
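
Tuning the controller needs admin access, so as a user-side stopgap for jobs that spawn other jobs, one option is to retry Slurm commands with exponential backoff instead of failing on the first busy response. The following is only a minimal sketch of that idea, not something taken from the slides: the helper name `run_with_retry`, its parameters, and the default delays are all assumptions made for illustration, and it treats any non-zero exit code as "try again".

```python
import random
import subprocess
import time


def run_with_retry(cmd, attempts=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Run cmd (a list of strings) and return its stdout.

    Hypothetical helper: retries on any non-zero exit code, doubling the
    wait each time (capped at max_delay) and adding random jitter so that
    many clients do not hit the controller again at the same moment.
    """
    for attempt in range(attempts):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result.stdout
        if attempt == attempts - 1:
            break  # no point sleeping after the final attempt
        delay = min(max_delay, base_delay * 2 ** attempt)
        sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError(
        f"{cmd[0]} failed after {attempts} attempts: {result.stderr.strip()}"
    )
```

Inside a parent job this could be called as `run_with_retry(["sbatch", "--parsable", "child_job.sh"])`, where `child_job.sh` is a placeholder script name. Since the sketch also retries genuine errors such as a malformed batch script, keep `attempts` small, and keep the delays generous so the retries do not add to the load on the controller.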

Licensed under: CC-BY-SA with attribution