Some hints for running batch jobs on JUDGE after the switch from Maui to Moab.

* Use msub instead of qsub.
The qsub command is still operational. However, it won't allow you
to request GPUs. msub has a man page.

* Replace your #PBS directives in the scripts by #MOAB directives.

* Unless you specify a jobname with -n, your stdout/err will land in

* To request GPUs: -l nodes=X:ppn=Y:gpus=<1|2>
You will get one or two GPUs per node. New environment variables:
- $PBS_GPUFILE (like $PBS_NODEFILE) contains a list of entries
- $CUDA_VISIBLE_DEVICES lists the GPUs at your disposal. This
variable restricts your job to the requested GPUs.
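
To illustrate the two variables, a job script could report what it was
given before starting the real work. This is a minimal sketch, not part
of the official setup: the function names count_gpus and report_gpus are
hypothetical, and it assumes that $CUDA_VISIBLE_DEVICES holds a
comma-separated list of device indices (the usual CUDA convention) and
that $PBS_GPUFILE, like $PBS_NODEFILE, names a readable text file.

```shell
#!/bin/sh
# Sketch only: the helper names are hypothetical, not provided by Moab.

# Print the number of GPUs listed in $CUDA_VISIBLE_DEVICES.
# Assumes a comma-separated list such as "0,1"; prints 0 if unset or empty.
count_gpus() {
    if [ -z "${CUDA_VISIBLE_DEVICES:-}" ]; then
        echo 0
    else
        printf '%s\n' "$CUDA_VISIBLE_DEVICES" | tr ',' '\n' | grep -c .
    fi
}

# Print a short summary of the GPU-related job environment.
report_gpus() {
    echo "GPUs visible to this job: $(count_gpus)"
    # Assumption: PBS_GPUFILE points to a plain text file, one entry per line.
    if [ -n "${PBS_GPUFILE:-}" ] && [ -r "$PBS_GPUFILE" ]; then
        echo "Entries in PBS_GPUFILE:"
        cat "$PBS_GPUFILE"
    fi
}

report_gpus
```

If the job was submitted with -l nodes=1:ppn=2:gpus=2 and the list
format is as assumed, the first line should report two GPUs.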

* Node Exclusivity:
Has been removed. Serial jobs should now be able to squeeze
into the gaps left by the GPU jobs. Please let us know if this
results in longer waiting times.

* Standing reservation:
The standing reservation has been kept for now. Hopefully we can get
rid of it thanks to better scheduling policies.

* Why isn't my job running?
Please use the "checkjob -v <job id>" command to see the reason.

* What other commands are there?
The moab commands live in /opt/moab/bin. However, not all commands
are accessible to users.

* Where can I find documentation?
The official documentation is at

* Test License:
We're running under a test license. This license expires 2011-09-15
01:25:14. If moab stops working after this date, please see your
procurement department.

* Memory Limits (starting 2011-08-18):
Please specify the _total_ memory your job uses with the
-l mem=<size>
option. It defaults to 4 gb; the maximum is 5184 gb. If there's
not enough memory available, your job won't start.
The mem resource ensures that the memory requirements of
different jobs don't interfere with each other.
Additionally, specify your per-process memory requirement with
-l pvmem=<size>
It defaults to 3.5 gb; the maximum is 92 gb. If your job violates the
pvmem limit, its processes are killed.
To learn how to specify a size, see the pbs_resources man page.
The checkjob command, called after job completion, reports
memory violations:
Message[0] job cancelled - job 23239 exceeded MEM usage hard limit (62 > 44)
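
The memory options and the checkjob message can both be handled by a
small submission wrapper. The following is a hedged sketch: the function
names mem_args and parse_mem_violation are hypothetical, the limits of
5184 gb and 92 gb are the ones quoted in this section, and the message
format is assumed to match the single sample line shown here. Other
Moab versions may word the message differently.

```shell
#!/bin/sh
# Sketch only: the helper names are hypothetical, not part of Moab.

# Print the -l options for a total memory and a per-process memory,
# both given as whole numbers of gb.
# Refuses values above the maxima quoted in the text (5184 gb and 92 gb).
mem_args() {
    total_gb=$1
    per_proc_gb=$2
    if [ "$total_gb" -gt 5184 ]; then
        echo "mem=${total_gb}gb exceeds the 5184 gb maximum" >&2
        return 1
    fi
    if [ "$per_proc_gb" -gt 92 ]; then
        echo "pvmem=${per_proc_gb}gb exceeds the 92 gb maximum" >&2
        return 1
    fi
    echo "-l mem=${total_gb}gb -l pvmem=${per_proc_gb}gb"
}

# Read checkjob output on stdin and print "<job id> <used> <limit>"
# for each MEM hard-limit message of the form shown in the sample.
parse_mem_violation() {
    sed -n 's/.*job \([0-9][0-9]*\) exceeded MEM usage hard limit (\([0-9][0-9]*\) > \([0-9][0-9]*\)).*/\1 \2 \3/p'
}

mem_args 8 2
echo 'Message[0] job cancelled - job 23239 exceeded MEM usage hard limit (62 > 44)' \
    | parse_mem_violation
```

The options printed by mem_args can be appended to an msub command
line. For the sample message, parse_mem_violation prints the job id
followed by the used and allowed values (23239 62 44).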

* For questions: Please mail to