Handling a queue of GPU jobs without a resource manager
How do you execute a lot of experiments? Say you want to run as many as you can overnight and examine them in the morning. With a single GPU, the jobs can be executed one after the other as easily as `bash run.sh`. The problems arise when you have more than one GPU: you want each GPU to be occupied with exactly one task at a time, and a new task to be fetched as soon as the previous one finishes. A cleaner way would be to use a resource manager such as slurm or torque, but I leave that as future work for myself. This post shows a possible workaround for a local workstation using a simple Python script.
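To give an idea of the approach before the details, here is a minimal sketch, not the actual script from this post: one worker process per GPU, each pinned to its device via `CUDA_VISIBLE_DEVICES`, pulling the next command from a shared queue as soon as its previous job finishes. The job list, the GPU count, and the `train.py` command are placeholders.

```python
import itertools
import os
import subprocess
from multiprocessing import Process, Queue

NUM_GPUS = 2  # assumed number of GPUs on the workstation

# Hypothetical experiment commands; in practice they might be read from a file.
JOBS = [
    ["python", "train.py", "--lr", str(lr), "--seed", str(seed)]
    for lr, seed in itertools.product([1e-3, 1e-4], [0, 1, 2])
]


def gpu_worker(gpu_id, job_queue):
    """Pull jobs off the queue and run them one at a time on a fixed GPU."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
    while True:
        cmd = job_queue.get()
        if cmd is None:  # sentinel: no more jobs left
            break
        subprocess.run(cmd, env=env)


if __name__ == "__main__":
    queue = Queue()
    for cmd in JOBS:
        queue.put(cmd)
    for _ in range(NUM_GPUS):
        queue.put(None)  # one sentinel per worker so each can shut down

    workers = [Process(target=gpu_worker, args=(i, queue)) for i in range(NUM_GPUS)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```

The key point is that each worker owns one GPU for its whole lifetime, so no two jobs ever share a device, while the shared queue guarantees that a GPU picks up new work the moment it becomes idle.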