Handling a queue of GPU jobs without resource manager
How do you execute a lot of experiments? Say you want to run as many as you can over night, and examine them in the morning. The jobs can be executed one after the other as easy as bash run.sh
when you have a single GPU. The problems arise when you have more than one so you want to make sure each GPU is occupied with exactly one task at a time and new task is fetched when previous finished. A clever way would be to use a resource manager such as slurm
or torque
, but I leave it as a future work for myself. This post shows a possible workaround for a local workstation using a simple python script.
So here it is:
Just adjust the command-line, N_GPU
and parameter loop.
The most non-trivial part is to use backend="threading"
because Queue seems to be thread-safe only with it (see test). You may also notice printing issues, they can be fixed with print locks. Find a fixed version here.