Question

I am fairly new to SLURM. The grid I use has many users, and while they are submitting or canceling jobs, other users seem unable to query partition status and similar information. This is extremely frustrating, especially for jobs that spawn other jobs, because those child submissions fail when the controller is busy. Does anyone know a workaround?


Solution

With the default settings, the Slurm controller (slurmctld) can become slow or stop responding when many users submit, modify, or cancel many jobs at the same time, especially when backfill scheduling and accounting are enabled.

See the tips for improving this in these slides from the 2012 Slurm User Group Meeting.
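Independently of the controller tuning tips in the slides, a common client-side workaround for the job-spawning-jobs case in the question is to retry Slurm commands with exponential backoff when they fail because slurmctld is busy (client commands then exit non-zero, for example with a "Socket timed out on send/recv operation" error). Below is a minimal Python sketch, assuming that any non-zero exit status is worth retrying. The function `run_with_retry` and its parameters are hypothetical names chosen for illustration, not part of any Slurm API.

```python
import subprocess
import time
from typing import Callable, Optional, Sequence


def run_with_retry(
    cmd: Sequence[str],
    max_attempts: int = 5,
    base_delay: float = 2.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> subprocess.CompletedProcess:
    """Run a Slurm client command, retrying with exponential backoff.

    Assumption: every non-zero exit status is treated as retryable (e.g. a
    timeout while slurmctld is overloaded). The delay doubles after each
    failed attempt, capped at max_delay. If all attempts fail, the last
    CalledProcessError is re-raised so the caller can handle it.
    """
    last_error: Optional[subprocess.CalledProcessError] = None
    for attempt in range(max_attempts):
        try:
            # check=True turns a non-zero exit status into CalledProcessError.
            return runner(list(cmd), check=True, capture_output=True, text=True)
        except subprocess.CalledProcessError as err:
            last_error = err
            if attempt == max_attempts - 1:
                break  # no point sleeping after the final attempt
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay)
    assert last_error is not None
    raise last_error
```

For example, `run_with_retry(["sinfo", "-h", "-o", "%P %a"])` lists partitions and their availability, retrying after 2, 4, 8, ... seconds while the controller is unresponsive. Be careful when wrapping `sbatch` this way: if a request timed out on the client but actually reached the controller, a retry can submit the job twice, so check `squeue` before resubmitting. The `sleep` and `runner` parameters exist only so the backoff logic can be tested without a Slurm installation.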

Licensed under: CC-BY-SA with attribution