How to partition a grid

To run the model on multiple CPUs with MPI, the model domain has to be divided into x*y "tiles". Each MPI CPU then works on one tile.
By default, each MPI CPU also writes the requested output fields for its own tile. Depending on the machine, this is not always the fastest approach, either for the output itself or for later post-processing. It is therefore possible to "block" the output (see below).

The variables that set the division into tiles (the topology) and the variables that set the blocking can be found in the file 'gemclim_settings.nml'.

Set the topology with: 'Ptopo_npex' and 'Ptopo_npey'
Set the blocking with: 'Ptopo_nblocx' and 'Ptopo_nblocy'
Set OpenMP with:        'Ptopo_smtdyn' and 'Ptopo_smtphy'
    with SMT (simultaneous multithreading): Ptopo_smtphy=Ptopo_smtdyn*2
    otherwise Ptopo_smtphy=Ptopo_smtdyn
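
The rule for 'Ptopo_smtphy' can be written as a small helper. This is only a sketch in Python; the function name smtphy_for and its arguments are hypothetical and not part of the model or its scripts.

```python
def smtphy_for(smtdyn: int, smt_enabled: bool) -> int:
    """Return the Ptopo_smtphy value that goes with a given Ptopo_smtdyn.

    Hypothetical helper: with SMT the physics uses twice as many threads
    as the dynamics, otherwise the same number.
    """
    if smtdyn < 1:
        raise ValueError("Ptopo_smtdyn must be at least 1")
    return smtdyn * 2 if smt_enabled else smtdyn


print(smtphy_for(2, smt_enabled=True))   # 4
print(smtphy_for(2, smt_enabled=False))  # 2
```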

Topology

(Dividing the model domain)

     
In LAM mode, ensure that the tiles are as square as possible so that the borders over which the tiles communicate are as short as possible.
When running a global grid, uniform or stretched, ensure that you have as few divisions as possible in the x-direction. That means keeping 'Ptopo_npex' small and increasing 'Ptopo_npey' instead. Some calculations in the model (near the poles) work very well in the x-direction when the domain is not cut.
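
The two guidelines above can be sketched in Python. The function names are hypothetical, and "as square as possible" is approximated here by the shortest tile border (tile width plus tile height in grid points), which is an assumption about how to rank candidates. The result is only a candidate; whether a topology is actually allowed still has to be checked separately.

```python
def factor_pairs(ntiles):
    """All (npex, npey) pairs with npex * npey == ntiles, smallest npex first."""
    return [(px, ntiles // px) for px in range(1, ntiles + 1) if ntiles % px == 0]


def lam_topology(gni, gnj, ntiles):
    """Pick the pair whose tiles have the shortest border (closest to square).

    gni, gnj: number of grid points along X and Y.
    """
    return min(factor_pairs(ntiles), key=lambda p: gni / p[0] + gnj / p[1])


# LAM grid of 400 x 200 points on 8 tiles: 4 x 2 gives 100 x 100 point tiles.
print(lam_topology(400, 200, 8))   # (4, 2)

# Global grid: candidates are listed with the smallest Ptopo_npex first,
# so the preferred choices are at the start of the list.
print(factor_pairs(8))             # [(1, 8), (2, 4), (4, 2), (8, 1)]
```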

Also have a look at the general introduction to GEMDM.

Checktopo

Unfortunately, it is not possible to split a grid into an arbitrary number of tiles; there are several restrictions. To check whether a certain topology is possible, you can use 'checktopo'.

   checktopo
      -gni    :  Number of points along axis X
      -gnj    :  Number of points along axis Y
      -gnk    :  Number of levels
      -npx    :  Number of CPUs along axis X
      -npy    :  Number of CPUs along axis Y
      -cfl    :  Number of points for piloting - (Pil_maxcfl/Step_maxcfl) (LAM only)
      -hblen  :  Number of points for blending - (Pil_hblen/Hblen_x/Hblen_y) (LAM only)
      -vspng  :  Vertical sponge - (Vspng_nk)

                                                                                                                             
 

Block the output

The idea behind the blocking is that one CPU outputs the fields for its own tile and for the neighbouring tiles. This way the writing CPUs do not interfere with each other, and each output file is larger.
In general, the output files are written fastest if all tiles/CPUs/cores that get blocked (grouped) together are on the same node!
For this, one has to know how the CPUs are distributed and how many CPUs (cores) there are on one node. The latter depends on the machine.

  
Machine          Cores per node
AIX in Dorval    16
marvin           4
st1/st2/st3      8
colosse          8

The cores are distributed over the tiles from left to right and from bottom to top, starting with the lower left tile.
Therefore, when OpenMP=1, the first core on the first node computes the lower left tile, the second core on the same node computes the tile to its right, ...
With, e.g., OpenMP=2, the first and second core of the first node compute the lower left tile, the third and fourth core on the same node compute the tile to its right, ...
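
The distribution described above can be sketched in Python to see which node computes which tile. The function name tile_nodes is hypothetical, and the sketch assumes that the number of cores per node is a multiple of the number of OpenMP threads, so that a tile never straddles two nodes.

```python
def tile_nodes(npex, npey, openmp, cores_per_node):
    """Return rows[j][i] = index of the node computing tile (i, j).

    Row j = 0 is the bottom row and i = 0 the leftmost column. Cores are
    handed out left to right and bottom to top, 'openmp' cores per tile.
    """
    if cores_per_node % openmp != 0:
        raise ValueError("cores_per_node must be a multiple of openmp")
    rows = []
    for j in range(npey):
        row = []
        for i in range(npex):
            tile = j * npex + i              # tile number, lower left is 0
            first_core = tile * openmp       # first core used by this tile
            row.append(first_core // cores_per_node)
        rows.append(row)
    return rows


# 4x4 topology, OpenMP=1, 8 cores per node; top row is printed first.
for row in reversed(tile_nodes(4, 4, 1, 8)):
    print(row)
# [1, 1, 1, 1]
# [1, 1, 1, 1]
# [0, 0, 0, 0]
# [0, 0, 0, 0]
```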

Here are some examples showing which tile is handled by which node and core.
Blocking should be done in such a way that there is no more than one node (color) per block!


OpenMP: 1  (Ptopo_smtdyn=1)
Topology: 4x4
Cores per node: 8
Blocking: Ptopo_nblocx = 1, 2 or 4
          Ptopo_nblocy = 2 or 4


OpenMP: 1  (Ptopo_smtdyn=1)
Topology: 4x4
Cores per node: 4
Blocking: Ptopo_nblocx = 1, 2 or 4
          Ptopo_nblocy = 4


OpenMP: 2  (Ptopo_smtdyn=2)
Topology: 4x4
Cores per node: 8
Blocking: Ptopo_nblocx = 1, 2 or 4
          Ptopo_nblocy = 4


OpenMP: 4  (Ptopo_smtdyn=4)
Topology: 2x2
Cores per node: 8
Blocking: Ptopo_nblocx = 1 or 2
          Ptopo_nblocy = 2


OpenMP: 4  (Ptopo_smtdyn=4)
Topology: 2x2
Cores per node: 4
Blocking: Ptopo_nblocx = 2
          Ptopo_nblocy = 2
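
The rule of one node per block can be checked with a short Python sketch. All names are hypothetical. It assumes that the blocks are equal rectangles of tiles (so the number of blocks divides the number of tiles in each direction) and that cores are handed out left to right and bottom to top as described above.

```python
def node_of(i, j, npex, openmp, cores_per_node):
    """Node index of tile (i, j); (0, 0) is the lower left tile."""
    return (j * npex + i) * openmp // cores_per_node


def blocking_ok(npex, npey, nblocx, nblocy, openmp, cores_per_node):
    """True if every block contains tiles from a single node only."""
    if npex % nblocx or npey % nblocy:
        return False                      # assumed: blocks must be equal in size
    bw, bh = npex // nblocx, npey // nblocy
    for bj in range(nblocy):
        for bi in range(nblocx):
            nodes = {node_of(i, j, npex, openmp, cores_per_node)
                     for j in range(bj * bh, (bj + 1) * bh)
                     for i in range(bi * bw, (bi + 1) * bw)}
            if len(nodes) > 1:
                return False
    return True


def valid_blockings(npex, npey, openmp, cores_per_node):
    """All (Ptopo_nblocx, Ptopo_nblocy) pairs with one node per block."""
    return [(bx, by)
            for bx in range(1, npex + 1)
            for by in range(1, npey + 1)
            if blocking_ok(npex, npey, bx, by, openmp, cores_per_node)]


# First example: topology 4x4, OpenMP=1, 8 cores per node.
print(valid_blockings(4, 4, 1, 8))
# [(1, 2), (1, 4), (2, 2), (2, 4), (4, 2), (4, 4)]

# Last example: topology 2x2, OpenMP=4, 4 cores per node.
print(valid_blockings(2, 2, 4, 4))
# [(2, 2)]
```

Under these assumptions the sketch reproduces the blocking values listed in the examples.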




Practice:
   
OpenMP: 2  (Ptopo_smtdyn=2)  
Topology: 4x4
Cores per node: 4
Blocking: ???

   
OpenMP: 2  (Ptopo_smtdyn=2)  
Topology: 6x6
Cores per node: 8
Blocking: ???






Author: Katja Winger
Last update: January 2010