This chapter provides a tutorial that shows you how to gather and analyze performance data in a C program, using SpeedShop tools. The tutorial covers these topics:
“Tutorial Overview”, introduces the sample program and explains the different scenarios in which it will be used.
“Tutorial Setup”, steps you through the necessary setup for running the experiment.
“Analyzing Performance Data”, steps you through five different experiments, discussing first how to do the experiments, then how to interpret the results.
![]() | Note: Because of inherent differences between systems and because of concurrent processes that may be running on your system, your experiment will produce different results from the one in this tutorial. However, the basic structure of the results should be the same. |
The C tutorial demonstrates the following experiments:
usertime
pcsamp
basic block count (bbcounts)
floating point exception (fpe)
For an example of an MPI tracing tutorial and a hardware counter experiment, see Chapter 3, “Tutorial for Fortran Users”.
This tutorial uses a sample program called generic. There are three versions of the program:
generic directory : contains files for the n32-bit ABI
generico32 directory: contains files for the (old) 32-bit ABI
generic64 directory: contains files for the 64-bit ABI
When you work with the tutorial, choose the version of generic most appropriate for your system. A good guideline is to choose the version that corresponds to the way you expect to develop your programs.
This tutorial was written and tested using the version of generic in the generic directory.
The generic program was designed as a test and demonstration application. It contains code to run scenarios that each test a different area of SpeedShop. The version of generic used in this tutorial performs scenarios that:
Build a linked list of structures
Use a lot of user time
Scan a directory and run the stat command on each file
Perform file I/O
Generate a number of floating-point exceptions
Load and call a routine in a DSO
Output from the program looks like the following:
0:00:00.000 ======== (27173) Begin script Fri 06 Feb 1998 15:03:31. begin script `ll.u.cvt.d.i.f.dso' 0:00:00.002 ======== (27173) start of linklist Fri 06 Feb 2002 15:03:31. linklist completed. 0:00:00.003 ======== (27173) start of usrtime Fri 06 Feb 2002 15:03:31. usertime completed. 0:00:25.572 ======== (27173) start of cvttrap Fri 06 Feb 2002 15:03:57. cvttrap completed, y = 2.60188e+14, z = 2.60188e+14. 0:00:25.806 ======== (27173) start of dirstat Fri 06 Feb 2002 15:03:57. dirstat of /usr/include completed, 304 files. 0:00:26.618 ======== (27173) start of iofile -- stdio Fri 06 Feb 2002 15:03:58. stdio iofile on /unix completed, 7307988 chars. 0:00:26.864 ======== (27173) start of fpetraps Fri 06 Feb 2002 15:03:58. fpetraps completed. 0:00:26.865 ======== (27173) start of libdso Fri 06 Feb 2002 15:03:58. dlslave_init executed dlslave_routine executed slaveusertime completed, x = 5000000.000000. libdso: dynamic routine returned 13 end of script `ll.u.cvt.d.i.f.dso' 0:00:27.972 ======== (27173) End script Fri 06 Feb 2002 15:03:59. |
Copy the program to a directory where you have write permission and compile it so that you can use it in the tutorial.
Change to the /usr/demos/SpeedShop directory.
Copy the appropriate generic directory and its contents to a directory where you have write permission:
% cp -r generic your_dir |
Change to the directory you just created:
% cd your_dir/generic |
Compile the program, by entering:
% make all |
This provides an executable for the experiment.
This section explains how to run the following experiments on the generic program, generate the experiment's results, and interpret the results:
usertime. As a first cut at optimization, this may be the most useful experiment. It breaks down a program into its functions and returns the CPU time used in each. See “A usertime Experiment”.
pcsamp. This experiment uses a different method to return the CPU time. See “A pcsamp Experiment”.
dsc_hwc. This experiment counts the number of times a required data item was not in secondary data cache. If the data item is not in secondary data cache, it must be fetched from memory, which requires more time. See “A Hardware Counter Experiment”.
bbcounts. This experiment counts basic block usage and estimates a linear time. It also maps out a complete call graph. See “A Basic Block Experiment”.
fpe. This experiment counts the number of floating-point exceptions in each function. See “An fpe Trace”.
You can follow the tutorial from start to finish, or you can choose the experiment you want to perform.
This section explains how to perform a usertime experiment. The usertime experiment allows you to gather data on the amount of CPU time spent in each function in your program.
![]() | Note: Due to statistical sampling of the call stack, not all functions may appear in the experiment output. |
From the command line, enter the following:
% ssrun -usertime generic |
This command starts the experiment. Output from generic and from ssrun is printed to stdout, as shown in the following example. A data file is also generated. The name consists of the process name (generic), the experiment type (usertime), and the experiment ID. In this example, the file name is generic.usertime.m10981.
0:00:00.000 ======== (16957) Begin script Mon 18 Mar 2002 06:56:38. begin script `ll.u.cvt.d.i.f.dso' 0:00:00.004 ======== (16957) start of linklist Mon 18 Mar 2002 06:56:38. linklist completed. 0:00:00.005 ======== (16957) start of usrtime Mon 18 Mar 2002 06:56:38. usertime completed. 0:00:18.736 ======== (16957) start of cvttrap Mon 18 Mar 2002 06:56:57. cvttrap completed, y = 2.60188e+14, z = 2.60188e+14. 0:00:18.906 ======== (16957) start of dirstat Mon 18 Mar 2002 06:56:57. dirstat of /usr/include completed, 264 files. 0:00:18.941 ======== (16957) start of iofile -- stdio Mon 18 Mar 2002 06:56:57. stdio iofile on /unix completed, 7965088 chars. 0:00:20.426 ======== (16957) start of fpetraps Mon 18 Mar 2002 06:56:59. fpetraps completed. 0:00:20.428 ======== (16957) start of libdso Mon 18 Mar 2002 06:56:59. dlslave_init executed dlslave_routine executed slaveusertime completed, x = 5000000.000000. libdso: dynamic routine returned 13 end of script `ll.u.cvt.d.i.f.dso' 0:00:21.217 ======== (16957) End script Mon 18 Mar 2002 06:56:59. |
To generate a report on the data collected, enter the following at the command line:
% prof your_output_file_name > usertime.results |
In this example, your_output_file_name would be generic.usertime.m10981. The prof command prints results to stdout.
![]() | Note: Because of line width restrictions, the DSO, file name, and line number information at the end of each line is wrapped to the next line in the following sample output. |
------------------------------------------------------------------------- SpeedShop profile listing generated Mon Mar 18 07:00:30 2002 prof generic.usertime.m16957 generic (n32): Target program usertime: Experiment name ut:cu: Marching orders R5000 / R5000: CPU / FPU 1: Number of CPUs 180: Clock frequency (MHz.) Experiment notes-- From file generic.usertime.m16957: Caliper point 0 at target begin, PID 16957 /speedtest/generic/generic Caliper point 1 at exit(0) ------------------------------------------------------------------------- Summary of statistical callstack sampling data (usertime)-- 664: Total Samples 0: Samples with incomplete traceback 19.920: Accumulated Time (secs.) 30.0: Sample interval (msecs.) ------------------------------------------------------------------------- Function list, in descending order by exclusive time ------------------------------------------------------------------------- [index] excl.secs excl.% cum.% incl.secs incl.% samples procedure (dso: file, line) [3] 18.570 93.2% 93.2% 18.570 93.2% 619 anneal (generic: generic.c, 1570) [5] 0.750 3.8% 97.0% 0.750 3.8% 25 slaveusrtime (dlslave.so: dlslave.c, 22) [8] 0.420 2.1% 99.1% 0.420 2.1% 14 __read (libc.so.1: read.s, 20) [12] 0.150 0.8% 99.8% 0.150 0.8% 5 cvttrap (generic: generic.c, 318) [13] 0.030 0.2% 100.0% 0.030 0.2% 1 _ngetdents (libc.so.1: ngetdents.s, 16) [1] 0.000 0.0% 100.0% 19.920 100.0% 664 main (generic: generic.c, 102) [2] 0.000 0.0% 100.0% 19.920 100.0% 664 Scriptstring (generic: generic.c, 185) [4] 0.000 0.0% 100.0% 18.570 93.2% 619 usrtime (generic: generic.c, 1385) [14] 0.000 0.0% 100.0% 0.030 0.2% 1 dirstat (generic: generic.c, 349) [15] 0.000 0.0% 100.0% 0.030 0.2% 1 _readdir (libc.so.1: readdir.c, 135) [9] 0.000 0.0% 100.0% 0.420 2.1% 14 iofile (generic: generic.c, 462) [10] 0.000 0.0% 100.0% 0.420 2.1% 14 fread (libc.so.1: fread.c, 27) [11] 0.000 0.0% 100.0% 0.420 2.1% 14 _read (libc.so.1: readSCI.c, 27) [6] 0.000 0.0% 100.0% 0.750 3.8% 25 libdso (generic: generic.c, 623) [7] 0.000 0.0% 100.0% 0.750 3.8% 25 dlslave_routine (dlslave.so: dlslave.c, 7) |
The report shows information for each function. The meanings of the column headings are described below:
The index column assigns a reference number to each function.
The excl.secs column shows how much time, in seconds, was spent in the function itself (exclusive time). Routines that begin with an underscore, such as __start, are internal routines that you cannot change.
The excl.% column shows the percentage of a program's total time that was spent in the function.
The cum.% column shows the percentage of the complete program time that has executed in the routines listed so far.
The incl.secs column shows how much time, in seconds, was spent in the function and descendents of the function.
The incl.% column shows the cumulative percentage of inclusive time spent in each function and its descendents.
The samples column shows how many samples were taken when the process was executing in the function and in all of the function's descendants.
The procedure (dso:file,line) columns list the function name, its DSO name, its file name, and its line number. For example, the top line reports statistics for the function anneal, the DSO generic, in the file generic.c, at line 1570.
This section explains how to perform a pcsamp experiment. The pcsamp experiment allows you to gather information on actual CPU time for each function in your program. For more information on pcsamp, see “PC Sampling Experiment (pcsamp)” in Chapter 4.
From the command line, enter the following:
% ssrun -fpcsamp generic |
This starts the experiment. The f prefix is added to pcsamp for this program because the program runs quickly and does not gather much data using the default pcsamp experiment name; adding the f prefix results in more data samples. Output from generic and from ssrun is printed to stdout, as shown in the following example.
A data file is also generated. The name consists of the process name (generic), the experiment type (fpcsamp), and the experiment ID. In this example, the file name is generic.fpcsamp.m11140.
0:00:00.000 ======== (16969) Begin script Mon 18 Mar 2002 07:02:19. begin script `ll.u.cvt.d.i.f.dso' 0:00:00.005 ======== (16969) start of linklist Mon 18 Mar 2002 07:02:19. linklist completed. 0:00:00.008 ======== (16969) start of usrtime Mon 18 Mar 2002 07:02:19. usertime completed. 0:00:18.260 ======== (16969) start of cvttrap Mon 18 Mar 2002 07:02:37. cvttrap completed, y = 2.60188e+14, z = 2.60188e+14. 0:00:18.430 ======== (16969) start of dirstat Mon 18 Mar 2002 07:02:37. dirstat of /usr/include completed, 264 files. 0:00:18.464 ======== (16969) start of iofile -- stdio Mon 18 Mar 2002 07:02:37. stdio iofile on /unix completed, 7965088 chars. 0:00:18.813 ======== (16969) start of fpetraps Mon 18 Mar 2002 07:02:38. fpetraps completed. 0:00:18.815 ======== (16969) start of libdso Mon 18 Mar 2002 07:02:38. dlslave_init executed dlslave_routine executed slaveusertime completed, x = 5000000.000000. libdso: dynamic routine returned 13 end of script `ll.u.cvt.d.i.f.dso' 0:00:19.577 ======== (16969) End script Mon 18 Mar 2002 07:02:39. |
To generate a report on the data collected, and to redirect the output to a file, enter the following:
% prof your_output_file_name > pcsamp.results |
Output similar to the following is generated:
------------------------------------------------------------------------- SpeedShop profile listing generated Mon Mar 18 07:03:54 2002 prof generic.fpcsamp.m16969 generic (n32): Target program fpcsamp: Experiment name pc,2,1000,0:cu: Marching orders R5000 / R5000: CPU / FPU 1: Number of CPUs 180: Clock frequency (MHz.) Experiment notes-- From file generic.fpcsamp.m16969: Caliper point 0 at target begin, PID 16969 /speedtest/generic/generic Caliper point 1 at exit(0) ------------------------------------------------------------------------- Summary of statistical PC sampling data (fpcsamp)-- 19329: Total samples 19.329: Accumulated time (secs.) 1.0: Time per sample (msecs.) 2: Sample bin width (bytes) ------------------------------------------------------------------------- Function list, in descending order by time ------------------------------------------------------------------------- [index] secs % cum.% samples function (dso: file, line) [1] 18.084 93.6% 93.6% 18084 anneal (generic: generic.c, 1570) [2] 0.716 3.7% 97.3% 716 slaveusrtime (dlslave.so: dlslave.c, 22) [3] 0.329 1.7% 99.0% 329 __read (libc.so.1: read.s, 20) [4] 0.147 0.8% 99.7% 147 cvttrap (generic: generic.c, 318) [5] 0.031 0.2% 99.9% 31 _xstat (libc.so.1: xstat.s, 12) [6] 0.012 0.1% 99.9% 12 __write (libc.so.1: write.s, 20) [7] 0.004 0.0% 100.0% 4 fread (libc.so.1: fread.c, 27) [8] 0.002 0.0% 100.0% 2 iofile (generic: generic.c, 462) [9] 0.001 0.0% 100.0% 1 fprintf (libc.so.1: fprintf.c, 23) [10] 0.001 0.0% 100.0% 1 _cerror (libc.so.1: cerror.s, 30) 0.002 0.0% 100.0% 2 **OTHER** (includes excluded DSOs, rld, etc.) 19.329 100.0% 100.0% 19329 TOTAL |
The report has the following columns:
The [index] column assigns a reference number to each function.
The secs column shows the amount of CPU time, in seconds, that was spent in the function.
The % column shows the percentage of the total program time that was spent in the function.
The cum.% column shows the percentage of the complete program time in functions that have been listed so far.
The samples column shows how many samples were taken when the process was executing in the function.
The function (dso: file, line) columns list the function, its DSO name, its file name, and its line number.
![]() | Note: This experiment can be performed only on systems that have built-in hardware counters (machines with the R10000, R12000, R14000, or R16000 class of CPU). |
This section takes you through the steps to perform a hardware counter experiment. There are a number of hardware counter experiments, but this tutorial describes the steps involved in performing the dsc_hwc experiment. This experiment captures information about secondary data cache misses. For more information on hardware counter experiments, see “Hardware Counter Experiments (*_hwc, *_hwctime)” in Chapter 4.
From the command line, enter:
% ssrun -dsc_hwc generic |
This starts the experiment. Output from generic and from ssrun is printed to stdout. A data file is also generated. The name consists of the process name (generic), the experiment type (dsc_hwc), and the experiment ID. In this example, the file name is generic.dsc_hwc.m294398.
To generate a report on the data collected and redirect the output to a file, enter the following:
% profyour_output_file_name > dsc_hwc.results |
The report should look similar to the following listing:
------------------------------------------------------------------------- SpeedShop profile listing generated Mon Feb 2 11:11:44 1998 prof generic.dsc_hwc.m294398 generic (n32): Target program dsc_hwc: Experiment name hwc,26,131:cu: Marching orders R10000 / R10010: CPU / FPU 16: Number of CPUs 195: Clock frequency (MHz.) Experiment notes-- From file generic.dsc_hwc.m294398: Caliper point 0 at target begin, PID 294398 /usr/demos/SpeedShop/linpack.demos/c/generic Caliper point 1 at exit(0) ------------------------------------------------------------------------- Summary of R10K perf. counter overflow PC sampling data (dsc_hwc)-- 6: Total samples Sec cache D misses (26): Counter name (number) 131: Counter overflow value 786: Total counts ------------------------------------------------------------------------- Function list, in descending order by counts ------------------------------------------------------------------------- [index] counts % cum.% samples function (dso: file, line) [1] 131 16.7% 16.7% 1 init2da (generic: generic.c, 1430) [2] 131 16.7% 33.3% 1 genLog (generic: generic.c, 1686) [3] 131 16.7% 50.0% 1 _write (libc.so.1: writeSCI.c, 27) 393 50.0% 100.0% 3 **OTHER** (includes excluded DSOs, rld, etc.) 786 100.0% 100.0% 6 TOTAL |
The information immediately preceding the function list displays the following:
The Total samples is the number of times the program counter was sampled. It is sampled once for each overflow, or once each time the hardware counter exceeds the specified value.
The Counter name (number) indicates the hardware counter used in the experiment. In this case, hardware counter 26 counts the number of times a value required in a calculation was not available in secondary cache. For a complete list of the hardware counters and their numbers, see Table 4-2.
The Counter overflow value is the number at which the hardware counter overflows or exceeds its preset value. In this case, the value is 131, which is the default. The fdsc_hwc experiment runs the same hardware counter experiment with the preset value of 29. You can change the overflow value by setting the _SPEEDSHOP_HWC_COUNTER_OVERFLOW environment variable to a value larger than 0, the _SPEEDSHOP_HWC_COUNTER_NUMBER environment variable to 26, and running the prof_hwc experiment instead of dsc_hwc. See “_hwc Hardware Counter Experiments” in Chapter 4 to learn how to choose a counter overflow value.
The Total counts is the total number of times a value was not in secondary cache when needed. This value is determined by multiplying the total number of samples by the overflow value; extra counts that do not cause an overflow are not recorded.
The function list has the following columns:
The index column assigns a reference number to each function.
The counts column shows the number of times a data item was not in secondary cache when needed for a calculation during the execution of the function. As with Total counts, a function's counts value is determined by multiplying its samples value by the overflow value.
The % column shows the percentage of the program's overflows that occurred in the function.
The cum.% column shows the percentage of the program's overflows that occurred in the functions listed so far. A function might have a low number in its % column but a high value in its cum.% column if it executed late in the program.
The samples column shows the number of times the program counter was sampled during execution of the function. A sample is taken for each overflow of the hardware counter.
The function (dso: file, line) columns list the function name, the DSO, the file name, and line number of the function.
This section takes you through the steps to perform an bbcounts experiment. The times returned represent an idealized computation. This experiment ignores potential floating-point interlocks and memory latency time (cache misses and memory bus contention). The times returned will always be lower than the times for an actual run. For more information on the bbcounts experiment, see “Basic Block Counting Experiment (bbcounts)” in Chapter 4.
From the command line, enter
% ssrun -bbcounts generic |
This starts the experiment. First the executable, rld, and the DSOs are instrumented. This entails making copies of the libraries and executables, giving the copies an extension of .pixie.
Output from generic and from ssrun is printed to stdout. A data file is also generated. The name consists of the process name (generic), the experiment type (bbcounts), and the experiment ID. In this example, the file name is generic.bbcounts.m10966, and the following is written to stdout:
instrumenting /lib32/rld instrumenting /usr/lib32/libssrt.so instrumenting /usr/lib32/libss.so instrumenting /usr/lib32/libm.so instrumenting /usr/lib32/libc.so.1 instrumenting /speedtest/generic/generic 0:00:00.001 ======== (16991) Begin script Mon 18 Mar 2002 07:05:46. begin script `ll.u.cvt.d.i.f.dso' 0:00:00.016 ======== (16991) start of linklist Mon 18 Mar 2002 07:05:46. linklist completed. 0:00:00.025 ======== (16991) start of usrtime Mon 18 Mar 2002 07:05:46. usertime completed. 0:00:19.943 ======== (16991) start of cvttrap Mon 18 Mar 2002 07:06:06. cvttrap completed, y = 2.60188e+14, z = 2.60188e+14. 0:00:20.201 ======== (16991) start of dirstat Mon 18 Mar 2002 07:06:06. dirstat of /usr/include completed, 264 files. 0:00:20.245 ======== (16991) start of iofile -- stdio Mon 18 Mar 2002 07:06:06. stdio iofile on /unix completed, 7965088 chars. 0:00:20.623 ======== (16991) start of fpetraps Mon 18 Mar 2002 07:06:07. fpetraps completed. 0:00:20.631 ======== (16991) start of libdso Mon 18 Mar 2002 07:06:07. instrumenting /speedtest/generic/./dlslave.so dlslave_init executed dlslave_routine executed slaveusertime completed, x = 5000000.000000. libdso: dynamic routine returned 13 end of script `ll.u.cvt.d.i.f.dso' 0:00:22.058 ======== (16991) End script Mon 18 Mar 2002 07:06:08. |
The output statements beginning with “instrumenting declares that ssrun is instrumenting first the libraries and then the generic executable itself.
To generate a report on the data collected, enter the following at the command line:
% prof your_output_file_name > bbcounts.results |
This command redirects output to a file called bbcounts.results. The file contains results that look similar to the following partial listing. The number of functions and their names may also vary.
------------------------------------------------------------------------- SpeedShop profile listing generated Mon Mar 18 07:07:06 2002 prof generic.bbcounts.m16991 generic (n32): Target program bbcounts: Experiment name it:cu: Marching orders R5000 / R5000: CPU / FPU 1: Number of CPUs 180: Clock frequency (MHz.) Experiment notes-- From file generic.bbcounts.m16991: Caliper point 0 at target begin, PID 16991 /speedtest/generic/generic Caliper point 1 at exit(0) ------------------------------------------------------------------------- Summary of ideal time data (bbcounts)-- 2048459432: Total number of instructions executed 3266522347: Total computed cycles 18.147: Total computed execution time (secs.) 1.595: Average cycles / instruction ------------------------------------------------------------------------- Function list, in descending order by exclusive ideal time ------------------------------------------------------------------------- [index] excl.secs excl.% cum.% cycles instructions calls function (dso: file, line) [1] 17.304 95.4% 95.4% 3114690027 1956780024 1 anneal (generic.pixbb: generic.c, 1570) [2] 0.694 3.8% 99.2% 125000842 75000732 1 slaveusrtime (dlslave.so: dlslave.c, 22) [3] 0.139 0.8% 99.9% 25000068 15000054 1 cvttrap (generic.pixbb: generic.c, 318) [4] 0.002 0.0% 100.0% 348146 324941 1286 general_find_symbol (rld: rld.c, 2038) [5] 0.002 0.0% 100.0% 337349 305518 2874 resolve_relocations (rld: rld.c, 2636) [6] 0.001 0.0% 100.0% 148338 148338 1301 elfhash (rld: obj.c, 1184) [7] 0.001 0.0% 100.0% 95076 95076 4138 obj_dynsym_got (rld: objfcn.c, 46) [8] 0.000 0.0% 100.0% 88459 88459 1663 strcmp (rld: strcmp.s, 34) [9] 0.000 0.0% 100.0% 78410 69944 1 fix_all_defineds (rld: rld.c, 3419) [10] 0.000 0.0% 100.0% 76859 74309 1289 resolve_symbol (rld: rld.c, 1828) [11] 0.000 0.0% 100.0% 75590 58123 1 init2da (generic.pixbb: generic.c, 1427) [12] 0.000 0.0% 100.0% 72817 71565 1256 resolving (rld: rld.c, 1499) [13] 0.000 0.0% 100.0% 67753 62361 487 fread (libc.so.1: fread.c, 27) [14] 0.000 0.0% 100.0% 48270 45600 53 _doprnt (libc.so.1: doprnt.c, 227) [15] 0.000 0.0% 100.0% 48000 35200 1600 _drand48 (libc.so.1: drand48.c, 116) [16] 0.000 0.0% 100.0% 38783 27864 628 __sinf (libm.so: fsin.c, 97) [17] 0.000 0.0% 100.0% 34408 34320 6 search_for_externals (rld: rld.c, 3987) . . . . |
The columns in the report provide the following information:
The index column assigns a reference number to each function.
The excl.secs column shows the minimum number of seconds that might be spent in the function under ideal conditions.
The excl.% column shows how much of the program's total time was spent in the function.
The cum.% column shows the cumulative percentage of time spent in the functions listed so far.
The cycles column shows the total number of machine cycles used by the function.
The instructions column shows the total number of instructions executed by a function.
The calls column shows the total number of calls made to the function.
The function (dso:file, line) columns list the function, its DSO name, its file name, and the line number.
This section takes you through the steps to perform a floating-point exception (fpe) trace, which identifies functions in which floating-point exceptions have occurred. For more information on the fpe trace, see “Floating-Point Exception Trace Experiment (fpe)” in Chapter 4.
From the command line, enter:
% ssrun -fpe generic |
Output from generic and from ssrun is printed to stdout. A data file is created with a name generated by concatenating the process name (generic), the experiment type (fpe), and the experiment ID. In this example, the file name is generic.fpe.m12213.
To generate a report on the data collected, enter the following at the command line:
% prof your_output_file_name > fpe.results |
The report should look similar to the following partial listing:
------------------------------------------------------------------------- SpeedShop profile listing generated Mon Mar 18 07:09:16 2002 prof generic.fpe.m17002 generic (n32): Target program fpe: Experiment name fpe:cu: Marching orders R5000 / R5000: CPU / FPU 1: Number of CPUs 180: Clock frequency (MHz.) Experiment notes-- From file generic.fpe.m17002: Caliper point 0 at target begin, PID 17002 /speedtest/generic/generic Caliper point 1 at exit(0) ------------------------------------------------------------------------- Summary of FPE callstack tracing data (fpe)-- 4: Total FPEs 0: Samples with incomplete traceback ------------------------------------------------------------------------- Function list, in descending order by exclusive FPEs ------------------------------------------------------------------------- [index] excl.FPEs excl.% cum.% incl.FPEs incl.% function (dso: file, line) [1] 4 100.0% 100.0% 4 100.0% fpetraps (generic: generic.c, 406) [2] 0 0.0% 100.0% 4 100.0% main (generic: generic.c, 102) [3] 0 0.0% 100.0% 4 100.0% Scriptstring (generic: generic.c, 185) |
The report shows information for each function:
The index column assigns a reference number to each function.
The excl.FPEs column shows how many floating-point exceptions were found in the function. .
The excl.% column shows the percentage of the total number of floating-point exceptions that were found in the function.
The cum.% column shows the percentage of exclusive floating-point exceptions in the functions that have been listed so far. The list is sorted by the number of floating-point exceptions, with the most in the top line and the least in the bottom line. Because all of the exceptions are in the first function listed in this example, all entries in this column are 100%.
The incl.FPEs column shows how many floating-point exceptions were generated by the function and the functions it called.
The incl.% column shows the percentage of the program's total number of floating-point exceptions in this function and the functions it called. Because fpetraps is called through all of the other functions, they are all listed as 100%.
The function (dso:file, line) columns list the routine name, its DSO name, its file name, and its line number.