When using the RPN environment you have access to several tools, some of
which are explained below.
On our UQAM servers you are using this RPN environment by default; on the
Compute Canada clusters you have access to it once you have followed the steps to set up the SSM environment.
A little reminder:
If you reach a "collaboration" web page asking for login and
password information, try using 'science' for both.
Usually, the way to submit, check, and kill a job depends on the
scheduling system and on the way it is installed.
To make your life easier, the RPN environment contains a set of tools that
make all these "adjustments" for you and that you can always use in the
same way. They basically do a "translation" for you.
To submit any job on our UQAM servers, always
use the command 'soumet'.
On the Compute Canada clusters it is up to you whether you want to use
'soumet' or not.
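In its simplest form you just pass 'soumet' the name of your script. For
example, to submit a (hypothetical) script called 'myjob.sh' with all the
default settings:
soumet myjob.sh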
"Soumet" can only get used to submit shell
scripts! If you want to submit anything else, for example
an executable, you need to create a script first which executes the
executable.
The scripts that get executed do not know from which directory the
job was submitted. Therefore, if you want to run a script in a certain
directory, make sure you 'cd' into this directory first!
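For example, a minimal wrapper script (here called 'run_model.sh'; the
directory and executable names are just placeholders) could look like this:
#!/bin/bash
# Jobs do not start in the submission directory, so change
# into the directory containing the executable first
cd ${HOME}/my_experiment
# Run the (hypothetical) executable
./my_model.exe
You would then submit this script with:
soumet run_model.sh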
The command to submit a job/script could look like this:
soumet jobname [ -t time_in_seconds -listing listing-directory -jn listing_name -cpus number_of_cpus -mpi ]
Where:
'jobname' is the name of the shell script, also called job, you want to submit.
'-t time_in_seconds'
specifies the wallclock time, in seconds, you request for the job. If the
job has not finished when this time expires, the job will get
terminated. So better always ask for enough time. However, on larger
clusters like the Compute Canada clusters, jobs asking for more time will be
queued longer.
On our UQAM systems the default wallclock time for single-CPU/core jobs is
10 days. For multi-core jobs the default time is 1 minute.
'-jn listing_name'
specifies the name of the listing or log file of the job. Everything that
would appear on the screen when running the job interactively will instead
get written into this file. The default listing name is the basename of the
job.
'-listing listing-directory'
specifies the directory in which the listing will get written. The default
directory is:
~/listings/${TRUE_HOST}
If you want to use the default directory, you should first create the
directory ~/listings and then create a symbolic link inside that
directory, pointing to a place where you have more space, for example to a
directory (that you have to create!) under your data space:
mkdir -p /dataspace/Listings
mkdir ~/listings
ln -s /dataspace/Listings ~/listings/${TRUE_HOST}
Replace 'dataspace' with the full name of your data directory.
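The variable ${TRUE_HOST} should already be set by the environment. If you
want to check its value on the machine you are currently on, simply run:
echo ${TRUE_HOST}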
'-cpus number_of_cpus'
specifies the number of cpus you want to use when running the job in
parallel using MPI and/or OpenMP. The syntax is the following: MPIxOpenMP
For example, if you want to use 4 MPI processes with 2 OpenMP threads each
you would write: -cpus 4x2
If you want to use pure MPI with 4 MPI processes you would write:
-cpus 4x1 or simply -cpus 4
If you want to use pure OpenMP with 2 threads you would
write: -cpus 1x2
The default is 1x1.
'-mpi' needs to be added when running a script (or the executable that gets executed within the script) with MPI.
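Putting it all together, submitting a (hypothetical) MPI script called
'run_model.sh', asking for 2 hours (7200 s) of wallclock time, 4 MPI
processes with 2 OpenMP threads each, and the listing name 'run_model',
could look like this:
soumet run_model.sh -t 7200 -cpus 4x2 -mpi -jn run_model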
To get more information about the command simply execute:
soumet -h
As said above, the way to check (and kill) jobs also depends on the queueing system. Therefore, we created a script called 'qs', which you can find in my ovbin:
~winger/ovbin/qs
On our UQAM servers as well as on Beluga (when using the RPN environment) you already have an alias pointing to the above script, simply called:
qs
Depending on the machine, 'qs' may in turn just be an alias for another command.
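If the alias does not exist on the machine you are working on, you can, for
example, define it yourself (a sketch, assuming a bash-like shell):
alias qs='~winger/ovbin/qs'
After that, simply running 'qs' will show your jobs in the queue.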