=====  PBS  =====

The Portable Batch System (PBS) is the batch-job scheduler for Tiger. It also allocates processors for interactive parallel jobs. This document provides information for getting started with the batch facilities of PBS.

===== Queues =====

Different users may have access to different queues, and different queues may allow different job limits and have different priorities.

Use the ''qstat -q'' command to see the current list of queues.


ch328-n6:~> qstat -q
server: ch328-n6

Queue            Memory CPU Time Walltime Node   Run   Que   Lm  State
---------------- ------ -------- -------- ---- ----- ----- ----  -----
fpga               --      --       --     --      0     0   --   E R
test               --      --       --     --      0     0   --   E R
work               --      --       --     --      0     0   --   E R
                                               ----- -----
                                                   0     0




Jobs can be submitted only to the routing queues, which are as follows:

| ''work'' | work. |
| ''interactive'' | For interactive batch work. |

=====  Job Command Files  =====

To run a batch job under PBS, first write a job command file. PBS command files have two components: PBS submission options and shell commands. The PBS submission options are preceded by ''#PBS'', making them appear as comments to a shell. The shell commands follow the last ''#PBS'' option and represent the executable content of the batch job. If any ''#PBS'' lines follow executable statements, they will be treated as comments only.

If your login shell is "''csh''", the following message may appear in the standard output of a job:

Warning: no access to tty, thus no job control in this shell

Short of modifying ''csh'', there is no way to eliminate the message. It is just an informative message and should have no other effect on the job.

Here is an example of a command file, specifying some typical PBS keywords.

#!/bin/bash
#PBS -N test
#PBS -j oe
#PBS -q work
#PBS -l walltime=1:00:00,nodes=4:ppn=2
#PBS -W group_list=users
cd $PBS_O_WORKDIR
mpiexec $XD1LAUNCHER ./test


The job file must begin with ''#!/bin/bash'' in order to use ''mpiexec''. You can use another shell, but then ''mpiexec'' is not available: at this time, other shells must use ''mpirun'', and you must supply the ''-np # -hostfile $PBS_NODEFILE'' arguments to ''mpirun'' yourself.

The ''-N'' option names the job.

The ''-j oe'' option includes standard error in the standard output file, which is named ''<batch script name>.o$PBS_JOBID''. If the option were changed to ''-j eo'', then standard output would instead be added to standard error (yielding a "''.e''" file instead of "''.o''").

The ''-q'' option specifies the queue the job will be submitted to.

The ''-l'' option specifies resource limits, such as walltime, memory, and the number of processors.

The ''-W group_list=users'' option specifies the group to run under. At this time, this option is required for a job to run.

====  Common "qsub" Parameters  ====

''#PBS -A <account>''

Causes the job time to be charged to ''<account>''.

''#PBS -a <date>''

Declares the time after which the job is eligible for execution.

''#PBS -q <queue>''

Directs the job to a specified queue.

''#PBS -j {eo,oe}''

Causes the standard error and standard output to be combined in one file.
''eo'' --  Standard output is added to standard error.
''oe'' -- Standard error is added to standard output.

''#PBS -l <resource>=<value>,<resource>=<value>...''

Specifies resource limits.
''nodes=n:ppn=p:cpp=c'' --  Number of nodes n, number of (MPI) processes per node p, and number of CPUs (threads) per node c. Note that ''ppn'' can be at most 2.
''walltime'' --  Wall-clock time.

''#PBS -o <name>''

Writes standard output to ''<name>'' instead of ''<job script>.o$PBS_JOBID''. ''$PBS_JOBID'' is an environment variable created by PBS that contains the PBS job identifier.

''#PBS -e <name>''

Writes standard error to ''<name>'' instead of ''<job script>.e$PBS_JOBID''.

''#PBS -m {a,b,e}''

Sends E-mail to the submitter when the job aborts (''a''), begins (''b''), or ends (''e'').

''#PBS -N <name>''

Sets the job name to ''<name>'' instead of the name of the job script.

''#PBS -S <shell>''

Uses the shell ''<shell>'' to run the script. Make sure the full path to the shell is correct.

''#PBS -V''

Exports all environment variables from the submitting shell into the batch shell.

''#PBS -W ...''

Sets additional job attributes, such as dependencies between two or more jobs (''depend=...'') or the group to run under (''group_list=...''). See ''man qsub'' for details.
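
As a rough illustration of a job dependency, the sketch below builds a ''-W depend=afterok:<job ID>'' argument, which asks PBS to start one job only after another has finished successfully. The helper ''depend_arg'', the file name ''second.pbs'', and the job ID are placeholders chosen for this example, and the command is printed rather than executed so the sketch can be tried without PBS.

```shell
#!/bin/bash
# Build the argument for "qsub -W" that makes a job wait until the job
# with the given ID has finished successfully.
# (depend_arg is a hypothetical helper, not a PBS command.)
depend_arg() {
    printf 'depend=afterok:%s' "$1"
}

# Placeholder job ID; in practice it would come from an earlier submission.
first_id="1211.ch328-n6"

# Print the command instead of running it (second.pbs is a hypothetical file).
echo qsub -W "$(depend_arg "$first_id")" second.pbs
```

In a real session the ID would typically be captured from the first submission, for example with ''first_id=$(qsub first.pbs)''.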

====  $PBS_O_WORKDIR  ====

PBS sets the environment variable ''$PBS_O_WORKDIR'' to the directory where the batch job was submitted. By default, a job starts in your home directory. Include the following command in your script if you want it to start in the submission directory.

cd $PBS_O_WORKDIR
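
A slightly more defensive variant is sketched here, assuming you want the job to stop rather than continue in the wrong directory. The fallback to the current directory is an addition for this example; it only lets the same lines run outside a PBS job, where ''$PBS_O_WORKDIR'' is not set.

```shell
#!/bin/bash
# PBS sets PBS_O_WORKDIR inside a job. Fall back to the current directory
# so the same lines can also be tried outside the batch system.
workdir="${PBS_O_WORKDIR:-$PWD}"

# Quote the path (it may contain spaces) and stop early if the directory
# cannot be entered, instead of running the job in the wrong place.
cd "$workdir" || { echo "cannot cd to $workdir" >&2; exit 1; }

echo "running in $PWD"
```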

====  MPI Jobs  ====

Here is an example command file for a parallel MPI job.

#!/bin/bash
#PBS -N test
#PBS -j oe
#PBS -l walltime=1:00:00,nodes=2:ppn=2
#PBS -W group_list=users
cd $PBS_O_WORKDIR
mpiexec $XD1LAUNCHER ./test


This job requires up to an hour of runtime and 4 cpus. This batch script would be used to run a 4-process executable. Note that you must include ''#!/bin/bash'' in order to use mpiexec. If you wish to use another shell, then you cannot use ''mpiexec''. In that case, your job script might look like

#!/bin/ksh
#PBS -N test
#PBS -j oe
#PBS -l walltime=1:00:00,nodes=2:ppn=2
#PBS -W group_list=users
cd $PBS_O_WORKDIR
mpirun -np 4 -hostfile $PBS_NODEFILE $XD1LAUNCHER ./test
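
Rather than hard-coding ''-np 4'', the process count can be derived from the node file. The sketch below assumes the file lists one line per allocated process slot (nodes times ppn), which is the usual PBS layout but worth confirming on Tiger. The helper ''count_slots'' and the host names ''node1'' and ''node2'' are invented for this example, and the ''mpirun'' command is printed rather than executed.

```shell
#!/bin/bash
# Count the non-empty lines of a node file. Assumption: PBS writes one
# line per allocated process slot, so nodes=2:ppn=2 gives four lines.
count_slots() {
    grep -c . "$1"
}

# Inside a job PBS_NODEFILE is set by PBS. Outside a job, build a
# stand-in file with made-up host names for nodes=2:ppn=2.
nodefile="${PBS_NODEFILE:-}"
if [ -z "$nodefile" ]; then
    nodefile=$(mktemp)
    printf '%s\n' node1 node1 node2 node2 > "$nodefile"
fi

NP=$(count_slots "$nodefile")

# Show the command that the job script would run.
echo "mpirun -np $NP -hostfile $nodefile \$XD1LAUNCHER ./test"
```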


If the executable ''test'' were a hybrid MPI-OpenMP code and you wanted 4 MPI processes, each with 2 threads, you could use a command file like the following.

#!/bin/bash
#PBS -N test
#PBS -j oe
#PBS -l walltime=1:00:00,nodes=4:ppn=1:cpp=2
#PBS -W group_list=tiger
cd $PBS_O_WORKDIR
export OMP_NUM_THREADS=2
mpiexec $XD1LAUNCHER ./test


====  xd1launcher  ====

The standard Linux scheduler was designed to manage many independent processes. The Linux Synchronized Scheduler (LSS) modifies the standard Linux process scheduler to efficiently manage multi-process parallel applications.

By default, all jobs use the standard Linux process scheduler. Because this scheduler is inefficient for parallel applications, the LSS should be used for all parallel jobs.

The xd1launcher command gives users the ability to use the LSS; it should be used for all jobs executed on Tiger's compute nodes.

For a given job the xd1launcher command:
- Enables the job's processes to use the LSS
- Binds the job's processes to a CPU

The //XD1LAUNCHER// environment variable is set by default for all users to the current tested xd1launcher version. To ensure that you use the most recent tested version, invoke the launcher through ''$XD1LAUNCHER'' rather than by a fixed path.

The following example runs the executable a.out using the LSS:
mpiexec $XD1LAUNCHER a.out

More information on xd1launcher can be found through //man xd1launcher// on tiger.

=====  Environment Variables  =====

All PBS-provided environment-variable names start with ''PBS_''. Some are then followed by ''O'' (''PBS_O_''), indicating that the variable is from the job's "originating" environment (from which it was submitted). The following short example lists some of the more useful variables, and typical values.


PBS_NODEFILE=<hostlist of nodes you can run on>
PBS_O_HOME=/spin/home/<username>
PBS_O_LOGNAME=<username>
PBS_O_SHELL=/bin/ksh
PBS_O_HOST=tiger-302-6.ornl.gov
PBS_O_WORKDIR=<directory from which you submitted the job>
PBS_O_QUEUE=batch
PBS_O_TZ=EST5EDT
PBS_JOBNAME=INTERACTIVE
PBS_JOBID=149.tiger-302-6.ornl.gov
PBS_QUEUE=batch
PBS_ENVIRONMENT=PBS_INTERACTIVE
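
To see which of these variables a particular job actually receives, a few lines like the following can be placed in a job script. This is only a sketch; ''list_pbs_vars'' is a name made up for this example, and the fallback message covers the case where the lines are run outside PBS.

```shell
#!/bin/bash
# Print every exported variable whose name starts with PBS_, sorted.
# (list_pbs_vars is a hypothetical helper name.)
list_pbs_vars() {
    env | grep '^PBS_' | sort
}

vars=$(list_pbs_vars)
if [ -n "$vars" ]; then
    printf '%s\n' "$vars"
else
    echo "no PBS_ variables set (not running under PBS?)"
fi
```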


=====  Submitting Jobs  =====

Use ''qsub'' to submit a job command file for batch execution. The job shell will **not** inherit the working directory from which you submitted the job, so you might want to use ''$PBS_O_WORKDIR'' to reference that directory. Unless you use full path names, the standard output and standard error files will be saved in ''$PBS_O_WORKDIR''.

If you do not specify a ''walltime'' limit, your job will get the default limit, regardless of queue.
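
A submission might be scripted roughly as follows. ''qsub'' prints the new job's identifier on standard output, so it can be captured for later use with ''qstat'' or ''qdel''. The file name ''test.pbs'' is a placeholder, and the guard around ''qsub'' exists only so the sketch does nothing harmful on a machine without PBS.

```shell
#!/bin/bash
# Hypothetical job command file name; replace with your own.
job_script="test.pbs"
job_id=""

# Submit only when qsub is installed and the job file exists.
if command -v qsub >/dev/null 2>&1 && [ -f "$job_script" ]; then
    # qsub writes the job identifier to standard output.
    job_id=$(qsub "$job_script") || exit 1
    echo "submitted $job_id"
else
    echo "qsub or $job_script not available; nothing submitted"
fi
```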

=====  Job Status  =====

Use ''qstat -a'' to check the status of submitted jobs.

                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
1211.ch328-n6   user1    work     bad-test    25876   4   4     -- 06:00 R 00:28
1239.ch328-n6   user2    work     good-test   20722  32  32     -- 06:00 R 00:20

The first column is the ID of each job (which has been truncated), and the second column is the owner. The ''S'' column gives the status of each job. Here are some common job-status values.

|Status value| Meaning |
| ''E'' | Exiting after having run |
| ''H'' | Held |
| ''Q'' | Queued, eligible to run |
| ''R'' | Running |
| ''S'' | Suspended |
| ''T'' | Being moved to new location |
| ''W'' | Waiting for its execution time |
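
When many jobs are listed, a short filter can summarize them by status. The sketch below assumes the ''qstat -a'' layout shown above, in which job lines start with a numeric job ID and the status is the next-to-last field. ''count_states'' is a name invented for this example, and the two sample lines are the jobs from the listing above.

```shell
#!/bin/bash
# Count jobs per status value in "qstat -a" style output read from stdin.
# Job lines begin with "<number>."; the status is the next-to-last field.
count_states() {
    awk '$1 ~ /^[0-9]+\./ { n[$(NF-1)]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# Sample job lines taken from the listing in this section. In a real
# session, use:  qstat -a | count_states
sample='1211.ch328-n6   user1    work     bad-test    25876   4   4     -- 06:00 R 00:28
1239.ch328-n6   user2    work     good-test   20722  32  32     -- 06:00 R 00:20'

printf '%s\n' "$sample" | count_states
```

For the two sample jobs, both of which are running, this prints ''R 2''.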

=====  Stopping Jobs  =====

Use ''qdel'' with a job ID to cancel that job. The command removes waiting jobs and aborts running jobs.
$ qdel 12816

You can also keep jobs from running without removing them from PBS using "''qhold''" with a list of job IDs. You can then use ''qrls'' to release held jobs and allow them to run.

You can also use ''qorder'' to change the relative order of your jobs in the queue.
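
The hold, release, and delete commands can be combined as in the sketch below. To keep the example safe to try, the wrapper ''run'' (a name made up here) only prints each command unless ''DRY_RUN'' is set to 0; the job ID is the one used in the ''qdel'' example above.

```shell
#!/bin/bash
# By default only print the commands; set DRY_RUN=0 to really run them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical wrapper: print or execute the given command.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

job_id=12816            # example job ID from this section

run qhold "$job_id"     # keep the job from running
run qrls "$job_id"      # release the hold so it can run again
run qdel "$job_id"      # cancel the job altogether
```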

=====  Documentation  =====

Tiger has ''man'' pages for each of the PBS commands. See ''man pbs'' for an overview.