Question

I am fairly new to SLURM. The grid I use has many different users, and while they are submitting or canceling jobs, other users often cannot query partition status and similar information. This is extremely frustrating, especially for jobs that spawn other jobs, because those end up failing when the controller is busy. Does anyone know a workaround?


Solution

With the default settings, Slurm can become slow or unresponsive when many users submit, modify, or cancel many jobs at the same time, especially with backfill scheduling and accounting enabled.

See the tips for improving this in the slides from the 2012 Slurm User Group Meeting.
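On the client side, one workaround (not taken from the slides) is to make scripts that query Slurm retry with exponential backoff instead of failing on the first error. The minimal sketch below assumes that a busy controller shows up as a non-zero exit code or a hang from client commands such as `sinfo` or `squeue`; the function name `run_with_retry` and its parameters are hypothetical, chosen only for illustration.

```python
import random
import subprocess
import time
from typing import Callable, Optional, Sequence


def run_with_retry(
    cmd: Sequence[str],
    attempts: int = 5,
    base_delay: float = 2.0,
    max_delay: float = 60.0,
    timeout: float = 30.0,
    jitter: bool = True,
    sleep: Callable[[float], None] = time.sleep,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> str:
    """Run a Slurm client command, retrying with exponential backoff.

    Assumption: a busy controller makes the command exit non-zero or hang.
    Returns stdout from the first attempt that exits with code 0 and
    raises RuntimeError if every attempt fails.
    """
    last_error: Optional[str] = None
    for attempt in range(attempts):
        try:
            result = runner(list(cmd), capture_output=True, text=True, timeout=timeout)
            if result.returncode == 0:
                return result.stdout
            # Keep the command's own error text for the final exception.
            last_error = result.stderr.strip() or f"exit code {result.returncode}"
        except subprocess.TimeoutExpired:
            # The client hung longer than our own deadline; treat it as a failure.
            last_error = f"timed out after {timeout} s"
        if attempt < attempts - 1:
            # Double the wait on every failure, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            if jitter:
                # Randomize so many scripts do not all retry at the same instant.
                delay *= random.uniform(0.5, 1.0)
            sleep(delay)
    raise RuntimeError(f"{' '.join(cmd)} failed after {attempts} attempts: {last_error}")
```

For example, `run_with_retry(["sinfo", "--noheader", "-o", "%P %a"])` would poll partition availability. Retrying is safe for read-only queries like `sinfo` and `squeue`. For `sbatch`, a timeout may hide a submission that actually succeeded, so blind retries can create duplicate jobs. The `sleep` and `runner` parameters exist only so the retry logic can be exercised without a live cluster.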

License: CC-BY-SA with attribution
Not affiliated with StackOverflow