Instantaneous Performance Monitor (IPM)
User's Guide
The Instantaneous Performance Monitor (IPM) was designed to give the SX-3
performance programmer a way to analyze the performance profile of a small
segment of code with the same accuracy as the Analyzer but with less
overhead. It is meant for small sections of Fortran, such as DO loops and
IF-THEN-ENDIF blocks. The Analyzer provides line-by-line cost data for
Fortran but does not provide performance numbers for user-selected segments
of code. IPM provides precisely this service.
IPM consists of three subroutines. To measure performance, the user calls
these routines explicitly at the points of interest in the application
code. IPM maintains interval timings and prints the performance data on
request. The following routines are provided:
ipm_bgn - Commences data collection
ipm_end - Ends data collection
ipm_out - Prints performance characteristics
IPM Rules:
1) Performance of the section of source contained between a call
to ipm_bgn and a corresponding call to ipm_end is measured.
2) After a call to ipm_end, ipm_out may be called to print the measured
statistics. The program can continue execution after this call.
3) Up to 1024 sections of code can be measured in a single application.
Each section is differentiated by a user-specified IPM identifier.
4) Nested calls to IPM routines are allowed, as long as no conflicting
IPM identifiers are used. Overlapping calls are also permitted.
5) Repeated calls to IPM with the same identifier (such as having an
ipm_bgn and ipm_end in a subroutine that is called many times) will
result in average statistics being generated.
6) IPM reads the hardware registers directly and is therefore
independent of the Unix kernel. IPM overheads are extremely low.
7) The following performance data are displayed:
Mflops, Real time, CPU time, Vector time,
Bank Conflict time, Cache Mishit time,
Total Instruction count, Vector Instruction Count
The number of executions of the instrumented section of code
is also displayed.
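Rules 1 and 4 can be illustrated with a short sketch of nested and
overlapping sections. The identifiers, labels, array sizes, and the
program name IPMNST are made up for illustration. The sketch assumes the
IPM library is linked with the program and has not been run on an SX-3.

```fortran
      PROGRAM IPMNST
C     Illustrative only: IDs, labels and sizes are assumptions.
      INTEGER*8 IDOUT, IDIN, IDOVL
      INTEGER N, I
      PARAMETER (N = 1000)
      REAL A(N), B(N)
      IDOUT = 1
      IDIN  = 2
      IDOVL = 3

C     Nested: section 2 lies entirely inside section 1.
      CALL IPM_BGN(IDOUT, 'Init + update')
      CALL IPM_BGN(IDIN, 'Init loop')
      DO 10 I = 1, N
         A(I) = REAL(I)
         B(I) = 0.0
   10 CONTINUE
      CALL IPM_END(IDIN)

C     Overlapping: section 3 starts inside section 1 and
C     ends after section 1 has been closed.
      CALL IPM_BGN(IDOVL, 'Update + scale')
      DO 20 I = 1, N
         B(I) = B(I) + 2.0*A(I)
   20 CONTINUE
      CALL IPM_END(IDOUT)
      DO 30 I = 1, N
         B(I) = 0.5*B(I)
   30 CONTINUE
      CALL IPM_END(IDOVL)

C     Print the statistics for all three sections.
      CALL IPM_OUT()
      END
```

Each identifier is opened once and closed once, so no two active
sections share an identifier, as rule 4 requires.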
IPM Subroutines:
1) To commence the gathering of performance data on a section of code,
   call subroutine IPM_BGN:

       CALL IPM_BGN(id,'string')

   ID     - integer*8
            This is a number between 1 and 1024. It uniquely
            identifies the section of code that is being
            evaluated.
   STRING - character*(*)
            This is a character string of up to 32 bytes that is
            printed along with the id when the performance data is
            displayed. It makes it easier for the user to identify
            the section of code that was instrumented.
2) To terminate the gathering of performance data on a section of code,
   call subroutine IPM_END:

       CALL IPM_END(id)

   ID     - integer*8
            This is a number between 1 and 1024. It must correspond
            to the id that was specified at the start of the
            code section in the call to IPM_BGN. Repeated calls to
            IPM_BGN and IPM_END with the same identifier will cause
            a running average to be maintained.
3) To display the performance data gathered using IPM_BGN and IPM_END,
   call subroutine IPM_OUT:

       CALL IPM_OUT()

   This subroutine has no arguments. Performance data for all active
   identifiers is displayed. IPM_OUT merely displays the acquired
   data; it does not modify it. For the IPM_OUT output to be
   meaningful, each call to IPM_BGN with a particular identifier must
   have a corresponding call to IPM_END with the same identifier.
   Output goes to Fortran unit 6 (F_FF06).
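The three calls can be put together as in the following sketch, which
instruments a loop inside a subroutine that is called many times. The
program IPMDEM, the subroutine FILL, the identifier 5, and the label are
hypothetical. The sketch assumes the IPM library is linked and has not
been run on an SX-3. The identifier is held in a declared INTEGER*8
variable because the size of a bare integer literal depends on the
compiler's defaults.

```fortran
      PROGRAM IPMDEM
C     Hypothetical driver: names, sizes and labels are assumptions.
      INTEGER N, K
      PARAMETER (N = 512)
      REAL X(N)
      DO 10 K = 1, 50
         CALL FILL(X, N, REAL(K))
   10 CONTINUE
C     Report once, after every IPM_BGN has had its IPM_END.
      CALL IPM_OUT()
      END

      SUBROUTINE FILL(X, N, S)
      INTEGER N, I
      REAL X(N), S
C     The ID is declared INTEGER*8 to match the documented
C     interface; a bare literal would have the default size.
      INTEGER*8 ID
      ID = 5
      CALL IPM_BGN(ID, 'Fill loop')
      DO 20 I = 1, N
         X(I) = S*REAL(I)
   20 CONTINUE
      CALL IPM_END(ID)
      END
```

Because FILL is entered 50 times with the same identifier, the report
for that identifier would be expected to show 50 executions, with the
statistics averaged as described in rule 5.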
IPM Output:
The following is a sample output of IPM (output of the example):
IPM -- Performance data at: 93-02-04 16:26:02

                                 REAL        CPU     VECTOR
  ID     NCALLS     MFLOPS    Seconds    Seconds    Seconds
---- ---------- ---------- ---------- ---------- ----------
   1          1      0.020      0.007      0.007      0.004
   2          1      0.343      0.000      0.000      0.000
   3          1      4.298      0.000      0.000      0.000
   4          1    834.133      7.604      2.440      1.653
   5         20   1028.992      0.004      0.004      0.003
   6          1     13.218      3.642      1.559      1.480
   7          1    514.515     11.250      4.004      3.137

           BANK      CACHE     INSTR.   V.INSTR.
  ID    Seconds    Seconds   Millions   Millions
---- ---------- ---------- ---------- ----------
   1      0.000      0.000      0.280      0.020
   2      0.000      0.000      0.000      0.000
   3      0.000      0.000      0.003      0.000
   4      0.000      0.010     71.750     28.750
   5      0.000      0.000      0.147      0.057
   6      1.181      0.000      5.186      1.285
   7      1.181      0.011     77.092     30.092

  ID DESCRIPTION
---- --------------------------------
   1 Loop 10
   2 Loop 20
   3 Subroutine Settri
   4 50000 x Msoltri
   5 5 x Msoltri
   6 Bank Conflicts
   7 Msoltri + Bank Conflict