===== PBS =====
The Portable Batch System (PBS) is the batch-job scheduler for Tiger. It also allocates processors for interactive parallel jobs. This document provides information for getting started with the batch facilities of PBS.
===== Queues =====
Different users may have access to different queues, and different queues may allow different job limits and have different priorities.
Use the ''qstat -q'' command to see the current list of queues. Jobs can be submitted only to the routing queues, which are as follows:
| Queue | Purpose |
| ''work'' | Batch work. |
| ''interactive'' | For interactive batch work. |
===== Job Command Files =====
To run a batch job under PBS, first write a job command file. PBS command files have two components: PBS submission options and shell commands. The PBS submission options are preceded by ''#PBS'', making them appear as comments to a shell. The shell commands follow the last ''#PBS'' option and represent the executable content of the batch job. Any ''#PBS'' lines that follow executable statements are treated as comments only.
If your login shell is ''csh'', the following message may appear in the standard output of a job:
  Warning: no access to tty, thus no job control in this shell
Short of modifying ''csh'', there is no way to eliminate the message. It is just an informative message and should have no other effect on the job.
Here is an example of a command file, specifying some typical PBS
keywords.
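A minimal sketch of such a command file, built from the options this section describes. The job name ''example-job'', the one-hour ''walltime'' value, and the commands at the end are illustrative assumptions, not settings taken from Tiger. The script runs a plain shell command rather than an MPI program so that it also works outside PBS.

```shell
#!/bin/bash
# Name the job (hypothetical name).
#PBS -N example-job
# Merge standard error into the standard output file.
#PBS -j oe
# Submit to the "work" routing queue.
#PBS -q work
# Request one hour of wall-clock time (illustrative value).
#PBS -l walltime=01:00:00
# Run under the "users" group.
#PBS -W group_list=users

# Start in the submission directory; fall back to the current
# directory when the script is run outside PBS.
cd "${PBS_O_WORKDIR:-$PWD}"

# Shell commands that make up the job.
JOB_HOST=$(uname -n)
echo "Job ${PBS_JOBID:-unset} started on ${JOB_HOST}"
```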
The job file must start with ''#!/bin/bash'' in order to use ''mpiexec''. You can use another shell, but then you cannot use ''mpiexec'': with other shells you must currently use ''mpirun'' and pass it the ''-np # -hostfile $PBS_NODEFILE'' arguments.
The ''-N'' option names the job.
The ''-j oe'' option includes standard error in the standard output file, which is named ''<job name>.o<job number>'' (the job name defaults to the batch script name when ''-N'' is not given). If the option were changed to ''-j eo'', then standard output would instead be added to standard error (yielding a ''.e'' file instead of ''.o'').
The ''-q'' option specifies the queue the job will be submitted to.
The ''-l'' option specifies resource limits, such as walltime, memory, and the number of processors.
The ''-W group_list=users'' option specifies the group to run under. At this time, this option is required for jobs to run.
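==== Common "qsub" Parameters ====
| Parameter | Meaning |
| ''#PBS -a <date>'' | Earliest date and time at which the job may run |
| ''#PBS -q <queue>'' | Queue to submit the job to |
| ''#PBS -j {eo,oe}'' | Join standard output and standard error into one file |
| ''#PBS -l <resource>=<value>,<resource>=<value>...'' | Resource limits |
| ''#PBS -o <name>'' | Path of the standard output file |
| ''#PBS -e <name>'' | Path of the standard error file |
| ''#PBS -m {a,b,e}'' | Send mail when the job aborts (''a''), begins (''b''), or ends (''e'') |
| ''#PBS -N <name>'' | Job name |
| ''#PBS -S <shell>'' | Shell used to run the job script |
| ''#PBS -V'' | Export the submission environment's variables to the job |
| ''#PBS -W ...'' | Additional job attributes, such as ''group_list'' |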
==== $PBS_O_WORKDIR ====
PBS sets the environment variable ''$PBS_O_WORKDIR'' to the directory where the batch job was submitted. By default, a job starts in your home directory. Include the following command in your script if you want it to start in the submission directory.
  cd $PBS_O_WORKDIR
==== MPI Jobs ====
Here is an example command file for a parallel MPI job.
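A minimal sketch of such a command file. The job name ''mpi-test'', the executable ''./test'', and in particular the ''ncpus'' resource keyword are assumptions; the keyword Tiger expects for the processor count may differ (see ''man pbs''). The check before ''mpiexec'' is only there so that the script reports a clear message when it is run outside the cluster.

```shell
#!/bin/bash
#PBS -N mpi-test
#PBS -j oe
#PBS -q work
# One hour of wall-clock time and 4 CPUs. The "ncpus" keyword is an
# assumption; the site may use a different processor-count resource.
#PBS -l walltime=01:00:00,ncpus=4
#PBS -W group_list=users

# Start in the submission directory (current directory outside PBS).
cd "${PBS_O_WORKDIR:-$PWD}"

# Hypothetical 4-process MPI executable.
EXE=./test

# Per the text, mpiexec needs no -np or -hostfile arguments here.
if command -v mpiexec >/dev/null 2>&1 && [ -x "$EXE" ]; then
    mpiexec "$EXE"
else
    echo "mpiexec or $EXE not available; nothing was launched" >&2
fi
```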
This job requests up to an hour of runtime and 4 CPUs, and would be used to run a 4-process executable. Note that the script must start with ''#!/bin/bash'' in order to use ''mpiexec''. If you wish to use another shell, you cannot use ''mpiexec''; use ''mpirun'' with the ''-np # -hostfile $PBS_NODEFILE'' arguments instead.
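The following is a minimal sketch of the ''mpirun'' variant, with ''/bin/sh'' standing in for the other shell. The job name, the executable ''./test'', and the ''ncpus'' resource keyword are assumptions; the ''-np'' and ''-hostfile $PBS_NODEFILE'' arguments come from the text above.

```shell
#!/bin/sh
#PBS -N mpi-test
#PBS -j oe
#PBS -q work
# "ncpus" is an assumed resource keyword for the processor count.
#PBS -l walltime=01:00:00,ncpus=4
#PBS -W group_list=users

# Start in the submission directory (current directory outside PBS).
cd "${PBS_O_WORKDIR:-$PWD}"

# Number of MPI processes and the (hypothetical) executable.
NPROCS=4
EXE=./test

# PBS_NODEFILE names the file listing the nodes assigned to the job;
# it exists only inside a running PBS job.
if [ -r "${PBS_NODEFILE:-}" ] && command -v mpirun >/dev/null 2>&1; then
    mpirun -np "$NPROCS" -hostfile "$PBS_NODEFILE" "$EXE"
else
    echo "not inside a PBS job, or mpirun missing; nothing was launched" >&2
fi
```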
If the executable ''test'' were a hybrid MPI-OpenMP code and you wanted 4 MPI processes, each with 2 threads, use a command file that looks like the following.
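A minimal sketch of the hybrid case. It assumes that 4 processes with 2 threads each need 8 CPUs, that ''ncpus'' is the right resource keyword, that this ''mpiexec'' accepts ''-n'' to set the process count, and that ''OMP_NUM_THREADS'' is passed on to the MPI processes. None of these is confirmed for Tiger.

```shell
#!/bin/bash
#PBS -N hybrid-test
#PBS -j oe
#PBS -q work
# 4 MPI processes x 2 OpenMP threads = 8 CPUs ("ncpus" keyword assumed).
#PBS -l walltime=01:00:00,ncpus=8
#PBS -W group_list=users

# Start in the submission directory (current directory outside PBS).
cd "${PBS_O_WORKDIR:-$PWD}"

# Each MPI process starts 2 OpenMP threads.
export OMP_NUM_THREADS=2
MPI_PROCS=4

# "-n" limits the launch to 4 processes (assumed to be supported).
if command -v mpiexec >/dev/null 2>&1 && [ -x ./test ]; then
    mpiexec -n "$MPI_PROCS" ./test
else
    echo "mpiexec or ./test not available; nothing was launched" >&2
fi
```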
==== xd1launcher ====
The standard Linux scheduler was designed to manage many
independent processes. The Linux Synchronized Scheduler (LSS) modifies
the standard Linux process scheduler to efficiently manage
multi-process parallel applications.
By default, all jobs use the standard Linux process scheduler. Because this scheduler is inefficient for parallel applications, the LSS should be used for all parallel jobs.
The ''xd1launcher'' command gives users the ability to use the LSS; it should be used for all jobs executed on Tiger's compute nodes.
For a given job the xd1launcher command:
  * Enables the job's processes to use the LSS
  * Binds the job's processes to a CPU
The ''XD1LAUNCHER'' environment variable is set by default for all users to the current tested xd1launcher version. Use ''$XD1LAUNCHER'' to invoke xd1launcher so that you always run the most recent tested version.
The following example runs the executable a.out using the LSS:
  mpiexec $XD1LAUNCHER a.out
More information on xd1launcher can be found through ''man xd1launcher'' on Tiger.
===== Environment Variables =====
All PBS-provided environment-variable names start with ''PBS_''. Some are then followed by ''O'' (''PBS_O_''), indicating that the variable is from the job's "originating" environment (from which it was submitted). The following short example lists some of the more useful variables, and typical values.
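A short sketch that prints such variables from inside a job. The selection of names is an assumption based on standard PBS variables, and the helper name ''print_pbs_env'' is hypothetical. Values are printed as found rather than listed here, since they depend on the job.

```shell
#!/bin/bash
# Print a selection of PBS-provided variables. Outside a PBS job each
# one is reported as "unset".
print_pbs_env() {
    for name in PBS_JOBID PBS_JOBNAME PBS_QUEUE PBS_NODEFILE PBS_ENVIRONMENT \
                PBS_O_WORKDIR PBS_O_HOME PBS_O_HOST PBS_O_QUEUE PBS_O_LOGNAME
    do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-unset}"
        printf '%-16s %s\n' "$name" "$value"
    done
}

print_pbs_env
```

Placing the call in a job command file writes the list to the job's standard output file.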
===== Submitting Jobs =====
Use ''qsub'' to submit a job command file for batch execution. The job shell will **not** inherit the working directory from which you submitted the job, so you might want to use ''$PBS_O_WORKDIR'' to reference that directory. Unless you use full path names, the standard output and standard error files will be saved in ''$PBS_O_WORKDIR''.
If you do not specify a ''walltime'' limit, your job will get the default limit, regardless of queue.
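A minimal sketch of a submission, assuming a hypothetical command file named ''myjob.pbs''. It passes the wall-clock limit on the ''qsub'' command line, which is one way to avoid falling back to the default; the one-hour value is illustrative.

```shell
# Hypothetical job command file name.
JOBFILE=myjob.pbs

if command -v qsub >/dev/null 2>&1 && [ -r "$JOBFILE" ]; then
    # Submit with an explicit one-hour limit; qsub prints the job ID.
    JOBID=$(qsub -l walltime=01:00:00 "$JOBFILE")
    echo "submitted as $JOBID"
else
    echo "qsub or $JOBFILE not available; nothing was submitted" >&2
fi
```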
===== Job Status =====
Use ''qstat -a'' to check the status of submitted jobs.
                                                              Req'd  Req'd   Elap
  Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
  --------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
  1211.ch328-n6   user1    work     bad-test    25876   4   4     -- 06:00 R 00:28
  1239.ch328-n6   user2    work     good-test   20722  32  32     -- 06:00 R 00:20
The first column is the ID of each job (which has been truncated), and the second column is the owner. The ''S'' column gives the status of each job. Here are some common job-status values.
|Status value| Meaning |
| ''E'' | Exiting after having run |
| ''H'' | Held |
| ''Q'' | Queued, eligible to run |
| ''R'' | Running |
| ''S'' | Suspended |
| ''T'' | Being moved to new location |
| ''W'' | Waiting for its execution time |
===== Stopping Jobs =====
Use ''qdel'' with a job ID to cancel that job. The command removes
waiting jobs and aborts running jobs.
  $ qdel 12816
You can also keep jobs from running without removing them from PBS by using ''qhold'' with a list of job IDs. You can then use ''qrls'' to release held jobs and allow them to run.
You can also use ''qorder'' to change the relative order of your jobs in the queue.
===== Documentation =====
Tiger has ''man'' pages for each of the PBS commands. See ''man pbs'' for an overview.