Performance monitoring and management in complex systems demands the ability to accurately capture performance characteristics for subsequent review, analysis, and comparison. Performance Co-Pilot (PCP) provides extensive support for the creation and management of archive logs that capture a user-specified profile of performance information to support retrospective performance analysis.
The following major sections are included in this chapter:
“Introduction to Archive Logging”, presents the concepts and issues involved with creating and using archive logs.
“Using Archive Logs with Performance Visualization Tools”, describes the interaction of the PCP tools with archive logs.
“Archive Log File Management”, provides information to assist you in PCP archive log file management.
“Other Archive Logging Features and Services”, provides information about other archive logging features and services.
“Cookbook for Archive Logging”, provides a checklist of tasks that may be performed to enable PCP archive logging with minimal effort.
“Archive Logging Troubleshooting”, presents helpful directions if your archive logging implementation is not functioning correctly.
Within PCP, the pmlogger utility may be configured to collect archives of performance metrics. The archive creation process is easy and very flexible, incorporating the following features:
Archive log creation at either a PCP collector (typically a server) or a PCP monitor system (typically a workstation), or at some designated PCP archive logger host.
Concurrent independent logging, both local and remote. The performance analyst can activate a private pmlogger instance to collect only the metrics of interest for the problem at hand, independent of other logging on the workstation or remote host.
Record mode in various GUI monitoring tools to create archives as needed from the current visualization.
Independent determination of logging frequency for individual metrics or metric instances. For example, you could log the “5 minute” load average every half hour, the write I/O rate on the DBMS log spindle every 10 seconds, and aggregate I/O rates on the other disks every minute.
Dynamic adjustment of what is to be logged, and how frequently, via pmlc. This feature may be used to disable logging or to increase the sample interval during periods of low activity or chronic high activity (to minimize logging overhead and intrusion). A local pmlc may interrogate and control a remote pmlogger, subject to the access control restrictions implemented by pmlogger.
Self-contained logs that include all system configuration and metadata required to interpret the values in the log. These logs can be kept for analysis at a much later time, potentially after the hardware or software has been reconfigured and the logs have been stored as discrete, autonomous files for remote analysis.
Archive folios as a convenient aggregation of multiple archive logs. Archive folios may be created with the mkaf utility and processed with the pmafm tool.
Critical to the success of the PCP archive logging scheme is the fact that the library routines providing access to real-time feeds of performance metrics also provide access to the archive logs.
Live (or real-time) sources of performance metrics and archives are literally interchangeable, with a single Performance Metrics Application Programming Interface (PMAPI) that preserves the same semantics for both styles of metric source. In this way, applications and tools developed against the PMAPI can automatically process either live or historical performance data.
The only restriction is that both live and historical data cannot be monitored simultaneously with the same invocation of a visualization tool.
One of the most important applications of archive logging services provided by PCP is in the area of retrospective analysis. In many cases, understanding today's performance problems can be assisted by side-by-side comparisons with yesterday's performance. With routine creation of performance archive logs, you can concurrently replay pictures of system performance for two or more periods in the past.
Archive logs are also an invaluable source of intelligence when trying to diagnose what went wrong, as in a performance postmortem. Because the PCP archive logs are entirely self-contained, this analysis can be performed off-site if necessary.
Each archive log contains metric values from only one host. However, many PCP tools can simultaneously visualize values from multiple archives collected from different hosts.
The archives can be replayed against the inference engine (pmie is an application that uses the PMAPI). This allows you to automate the regular, first-level analysis of system performance.
Such analysis can be performed by constructing suitable expressions to capture the essence of common resource saturation problems, then periodically creating an archive and playing it against the expressions. For example, you may wish to create a daily performance audit (run by the cron command) to detect performance regressions.
For more about pmie, see Chapter 5, “Performance Metrics Inference Engine”.
By collecting performance archives with relatively long sampling periods, or by reducing the daily archives to produce summary logs, the capacity planner can collect the base data required for forward projections, and can estimate resource demands and explore “what if” scenarios by replaying data using visualization tools and the inference engine.
Most PCP tools default to real-time display of current values for performance metrics from PCP collector host(s). However, most PCP tools also have the capability to display values for performance metrics retrieved from PCP archive log(s). The following sections describe plans, steps, and general issues involving archive logs and the PCP tools.
Most commonly, a PCP tool would be invoked with the -a option to process an archive log some time after pmlogger had finished creating the archive. However, a tool such as oview that uses a Time Control dialog (see “Time Duration and Control” in Chapter 3) stops when the end of archive is reached, but could resume if more data is written to the PCP archive log.
PCP archive log files can occupy a great deal of disk space, and management of archive logs can be a large task in itself. The following sections provide information to assist you in PCP archive log file management.
When a PCP archive is created by pmlogger, an archive basename must be specified and several physical files are created, as shown in Table 6-1.
Table 6-1. Filenames for PCP Archive Log Components (archive.*)
Filename | Contents |
---|---|
archive.index | Temporal index for rapid access to archive contents. |
archive.meta | Metadata descriptions for performance metrics and instance domains appearing in the archive. |
archive.N | Volumes of performance metrics values, for N = 0,1,2,... |
The PCP archive management tools support a consistent scheme for selecting the basenames for the files in a collection of archives and for mapping these files to a suitable directory hierarchy.
Once configured, the PCP tools that manage archive logs employ a consistent scheme for selecting the basename for an archive each time pmlogger is launched, namely the current date and time in the format YYYYMMDD.HH.MM. Typically, at the end of each day, all archives for a particular host on that day would be merged to produce a single archive with a basename constructed from the date, namely YYYYMMDD. The pmlogger_daily script performs this action and a number of other routine housekeeping chores.
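The naming scheme described above can be sketched in a few lines of Python. This is only an illustration of the YYYYMMDD.HH.MM and YYYYMMDD patterns given in the text; the function names are hypothetical and are not part of PCP, and the year format actually produced by a given PCP release may differ (the listings later in this chapter show a two-digit year).

```python
from datetime import datetime


def session_basename(started: datetime) -> str:
    """Basename for an archive started at the given time (YYYYMMDD.HH.MM)."""
    return started.strftime("%Y%m%d.%H.%M")


def daily_basename(day: datetime) -> str:
    """Basename for the merged archive covering one day (YYYYMMDD)."""
    return day.strftime("%Y%m%d")


if __name__ == "__main__":
    start = datetime(1996, 8, 8, 14, 33)
    print(session_basename(start))
    print(daily_basename(start))
```

Because the fields run from most to least significant, a plain lexical sort of such basenames is also a chronological sort, which keeps directory listings in time order.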
If you are using a deployment of PCP tools and daemons to collect metrics from a variety of hosts and storing them all at a central location, you should develop an organized strategy for storing and naming your log files.
Typically, the IRIX filesystem structure can be used to reflect the number of hosts for which a pmlogger instance is expected to be running locally, obviating the need for lengthy and cumbersome filenames. It makes considerable sense to place all logs for a particular host in a separate directory named after that host. Because each instance of pmlogger can only log metrics fetched from a single host, this also simplifies some of the archive log management and administration tasks.
For example, consider the filesystem and naming structure shown in Figure 6-1.
The specification of where to place the archive log files for particular pmlogger instances is encoded in the configuration file /var/pcp/config/pmlogger/control, and this file should be customized on each host running an instance of pmlogger.
If many archives are being created, and the associated PCP collector systems form peer classes based upon service type (for example, Web servers, DBMS servers, NFS servers, and so on), then it may be appropriate to introduce another layer into the directory structure, or use symbolic links to group together hosts providing similar service types.
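The directory layout discussed above, one directory per host with an optional extra layer for the service type, might be modelled as follows. This is a minimal sketch: the helper name and the service class names are hypothetical, and the root directory is simply the one used in this chapter's examples.

```python
import posixpath

LOG_ROOT = "/var/adm/pcplog"  # archive root used in this chapter's examples


def archive_dir(host, service_class=None, root=LOG_ROOT):
    """Return the directory holding all archives for one host.

    service_class is an optional extra directory layer (for example a
    hypothetical "web" or "nfs" grouping) placed between root and host.
    """
    parts = [root]
    if service_class:
        parts.append(service_class)
    parts.append(host)
    return posixpath.join(*parts)


if __name__ == "__main__":
    print(archive_dir("pluto"))
    print(archive_dir("bert", service_class="nfs"))
```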
A single PCP archive may be partitioned into a number of volumes. These volumes may expedite management of the archive; however, the metadata file and at least one volume must be present before a PCP tool can process the archive.
You can control the size of an archive log volume by using the -v command line option to pmlogger. This option specifies how large a volume should become before pmlogger starts a new volume. Archive log volumes retain the same base filename as other files in the archive log, and are differentiated by a numeric suffix that is incremented with each volume change. For example, you might have a log volume sequence that looks like this:
netserver.log.0
netserver.log.1
netserver.log.2
You can also cause the current log volume to be closed and a new one to be opened by sending a SIGHUP signal to pmlogger, or by using the pmlc command to change the pmlogger instructions dynamically, without interrupting pmlogger operation. Complete information on log volumes is found in the pmlogger(1) man page.
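To make the volume numbering concrete, the following sketch picks the data volumes out of a list of file names and works out the name the next volume would receive. It is an illustration of the suffix convention described above, not code taken from pmlogger; both function names are hypothetical.

```python
import re


def volume_numbers(basename, filenames):
    """Return the sorted volume numbers found for an archive basename."""
    pattern = re.compile(re.escape(basename) + r"\.(\d+)")
    numbers = []
    for name in filenames:
        match = pattern.fullmatch(name)
        if match:  # skips basename.index and basename.meta
            numbers.append(int(match.group(1)))
    return sorted(numbers)


def next_volume_name(basename, filenames):
    """Name of the volume that would follow the highest existing one."""
    numbers = volume_numbers(basename, filenames)
    following = numbers[-1] + 1 if numbers else 0
    return f"{basename}.{following}"


if __name__ == "__main__":
    files = ["netserver.log.0", "netserver.log.1", "netserver.log.2",
             "netserver.log.index", "netserver.log.meta"]
    print(volume_numbers("netserver.log", files))
    print(next_volume_name("netserver.log", files))
```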
The configuration files used by pmlogger describe which metrics are to be logged. Different groups of metrics may be logged at different intervals. Two states, mandatory and advisory, also apply to each group of metrics, defining whether metrics definitely should be logged or not logged, or whether a later advisory definition may change that state.
The mandatory state takes precedence if it is on or off, causing any subsequent request for a change in advisory state to have no effect. If the mandatory state is maybe, then the advisory state determines if logging is enabled or not.
The mandatory states are on, off, and maybe. The advisory states, which only affect metrics that are mandatory maybe, are on and off. Therefore, a metric that is mandatory maybe in one definition and advisory on in another definition would be logged at the advisory interval. Metrics that are not specified in the pmlogger configuration file are mandatory maybe and advisory off by default and are not logged.
A complete description of the pmlogger configuration format can be found on the pmlogger(1) man page.
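The precedence between the mandatory and advisory states can be summarized as a small decision function. This is a sketch of the rules stated above, written for illustration only; it does not reflect how pmlogger stores these states internally, and the function name is hypothetical.

```python
def is_logged(mandatory="maybe", advisory="off"):
    """Decide whether a metric is logged, given its two states.

    The defaults match the text: an unspecified metric is
    mandatory maybe and advisory off, so it is not logged.
    """
    if mandatory not in ("on", "off", "maybe"):
        raise ValueError("mandatory must be on, off, or maybe")
    if advisory not in ("on", "off"):
        raise ValueError("advisory must be on or off")
    if mandatory == "maybe":
        # Only in this case does the advisory state matter.
        return advisory == "on"
    # Mandatory on or off takes precedence over any advisory request.
    return mandatory == "on"


if __name__ == "__main__":
    for m in ("on", "off", "maybe"):
        for a in ("on", "off"):
            print(f"mandatory {m:5} advisory {a:3} -> logged: {is_logged(m, a)}")
```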
Once a PCP archive log has been created, the pmdumplog utility may be used to display various information about the contents of the archive. For example, start with the following command:
pmdumplog -l /var/adm/pcplog/www.sgi.com/960731
It might produce the following output:
Log Label (Log Format Version 1)
Performance metrics from host www.sgi.com
    commencing Wed Jul 31 00:16:34.941 1996
    ending     Thu Aug  1 00:18:01.468 1996
The simplest way to discover what performance metrics are contained within an archive is to use pminfo as shown in Example 6-1:
Example 6-1. Using pminfo to Obtain Archive Information
pminfo -a /var/adm/pcplog/www.sgi.com/960731 network.mbuf
network.mbuf.alloc
network.mbuf.typealloc
network.mbuf.clustalloc
network.mbuf.clustfree
network.mbuf.failed
network.mbuf.waited
network.mbuf.drained
Other archive logging features and services include PCP archive folios, manipulating archive logs, primary logger, and using pmlc.
A collection of one or more PCP archive logs may be combined with a control file to produce a PCP archive folio. Archive folios are created using either mkaf or the interactive record mode services of various PCP GUI monitoring tools. Once a folio exists, the pmafm tool can process it to provide the following services:
Checking the integrity of the archives in the folio.
Displaying information about the component archives.
Executing PCP tools with their source of performance metrics assigned concurrently to all of the component archives (where the tool supports this), or serially executing the PCP tool once per component archive.
If the folio was created by a single PCP monitoring tool, replaying all of the archives in the folio with that monitoring tool.
Restricting the processing to particular archives, or the archives associated with particular hosts.
You may tailor pmlogger dynamically with the pmlc command. Normally, the pmlogger configuration is read at startup. If you choose to modify the configuration file to change the parameters under which pmlogger operates, you must stop and restart the program for your changes to take effect. Alternatively, you may change parameters whenever required by using the pmlc interface.
To run the pmlc tool, enter:
pmlc
By default, pmlc acts on the primary instance of pmlogger on the current host. See the pmlc(1) man page for a description of command line options. When it is invoked, pmlc presents you with a prompt:
pmlc>
You may obtain a listing of the available commands by entering a question mark (?) and pressing Enter. You see output similar to that in Example 6-2:
Example 6-2. Listing Available Commands
show loggers [@<host>]             display <pid>s of running pmloggers
connect _logger_id [@<host>]       connect to designated pmlogger
status                             information about connected pmlogger
query metric-list                  show logging state of metrics
new volume                         start a new log volume
flush                              flush the log buffers to disk
log { mandatory | advisory } on <interval> _metric-list
log { mandatory | advisory } off _metric-list
log mandatory maybe _metric-list
timezone local|logger|'<timezone>' change reporting timezone
help                               print this help message
quit                               exit from pmlc
_logger_id   is  primary | <pid> | port <n>
_metric-list is  _metric-spec | { _metric-spec ... }
_metric-spec is  <metric-name> | <metric-name> [ <instance> ... ]
Here is an example:
pmlc
pmlc> show loggers @babylon
The following pmloggers are running on babylon:
    primary (1892)
pmlc> connect 1892 @babylon
pmlc> log advisory on 2 secs disk.dev.read
pmlc> query disk.dev
disk.dev.read
    adv on nl 5 min [131073 or "dks0d1"]
    adv on nl 5 min [131074 or "dks0d2"]
pmlc> quit
Note: Any changes to the set of logged metrics made via pmlc are not saved, and are lost the next time pmlogger is started with the same configuration file. Permanent changes are made by modifying the pmlogger configuration file(s).
Refer to the pmlc(1) and pmlogger(1) man pages for complete details.
The following sections present a checklist of tasks that may be performed to enable PCP archive logging with minimal effort. For a complete explanation, refer to the other sections in this chapter and the man pages for pmlogger and related tools.
Assume you wish to activate primary archive logging on the PCP collector host pluto. Execute all of the following tasks while logged into pluto as the superuser (root).
Create the directory to hold the archive logs:
mkdir /var/adm/pcplog/pluto
Choose a suitable pmlogger configuration file. Here are some examples:
The default configuration: /var/pcp/config/pmlogger/config.default.
A broad summary configuration, sufficient to be used with dkvis, mpvis, nfsvis, and pmkstat: /var/pcp/config/pmlogger/config.Summary.
One of the other config.* files in the /var/pcp/config/pmlogger directory, tailored for an application, a PCP add-on product, a pmchart view, or a PCP monitor tool.
Copy the chosen configuration file to /var/adm/pcplog/pluto/config.default (possibly after some customization).
Edit /var/pcp/config/pmlogger/control. Using the line for the “local primary logger” as a template, add the following line to the file:
pluto y n /var/adm/pcplog/pluto -c config.default
Make sure PMCD and pmlogger are enabled and running:
chkconfig pmcd on
chkconfig pmlogger on
/etc/init.d/pcp start
Performance Co-Pilot PMCD started (logfile is .... /pmcd.log)
Performance Co-Pilot Primary Logger started
Verify that the primary pmlogger instance is running:
pmlc
pmlc> connect primary
pmlc> status
pmlogger [primary] on host pluto is logging metrics from host pluto
log started      Thu Aug  8 14:33:01 1996 (times in local time)
last log entry   Thu Aug  8 14:34:11 1996
current time     Thu Aug  8 14:36:54 1996
log volume       0
log size         284
Verify that the archive files are being created in the correct place:
ls /var/adm/pcplog/pluto
960808.14.33.0 960808.14.33.index 960808.14.33.meta Latest pmlogger.log
Assume you wish to create archive logs on the local host for performance metrics collected from the remote host bert. Execute all of the following tasks while logged into the local host as the superuser (root).
Procedure 6-1. Creating Archive Logs
Create the directory to hold the archive logs:
mkdir /var/adm/pcplog/bert
Choose a suitable pmlogger configuration file. Here are three examples:
The default configuration: /var/pcp/config/pmlogger/config.default.
A broad summary configuration, sufficient to be used with dkvis, mpvis, nfsvis, and pmkstat: /var/pcp/config/pmlogger/config.Summary.
One of the other config.* files in the /var/pcp/config/pmlogger directory, tailored for an application, a PCP add-on product, a pmchart view, or a PCP monitor tool.
Copy the chosen configuration file to /var/adm/pcplog/bert/config.default (possibly after some customization).
Edit /var/pcp/config/pmlogger/control. Using the line for remote as a template, add the following line to the file:
bert n n /var/adm/pcplog/bert -c ./config.default
Start pmlogger:
/usr/pcp/bin/pmlogger_check
Restarting pmlogger for host "bert" ..... done
Verify that the pmlogger instance is running:
pmlc
pmlc> show loggers
The following pmloggers are running on bert:
    primary (19144)
pmlc> connect 19144
pmlc> status
pmlogger [19144] on host ernie is logging metrics from host bert
log started      Thu Aug  8 10:10:10 1996 (times in local time)
last log entry   Thu Aug  8 14:50:54 1996
current time     Thu Aug  8 14:55:48 1996
log volume       0
log size         256
To create archive logs on the local host for performance metrics collected from multiple remote hosts, repeat the steps in Procedure 6-1 for each remote host.
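The control file lines used in the two procedures above share one layout, which the following sketch splits into named fields. The field meanings are an assumption: the text only shows the values (for example y n for the local primary logger and n n for a remote host), so reading the second field as a "primary logger" flag and the third as a SOCKS flag should be checked against the comments in your own control file. The class and function names are hypothetical, and treating lines that start with # as comments is also an assumption.

```python
from typing import NamedTuple, Optional


class ControlEntry(NamedTuple):
    host: str        # host whose metrics are logged
    primary: bool    # assumed meaning of the second field (y/n)
    socks: bool      # assumed meaning of the third field (y/n)
    directory: str   # directory that receives the archive files
    args: str        # remaining text, passed to pmlogger as options


def parse_control_line(line: str) -> Optional[ControlEntry]:
    """Split one control line into fields; skip blanks and # comments."""
    text = line.strip()
    if not text or text.startswith("#"):
        return None
    fields = text.split(None, 4)  # keep the pmlogger options together
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 fields: {line!r}")
    host, primary, socks, directory = fields[:4]
    args = fields[4] if len(fields) > 4 else ""
    return ControlEntry(host, primary == "y", socks == "y", directory, args)


if __name__ == "__main__":
    for sample in ("pluto y n /var/adm/pcplog/pluto -c config.default",
                   "bert n n /var/adm/pcplog/bert -c ./config.default"):
        print(parse_control_line(sample))
```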
Assume the local host has been set up to create archive logs of performance metrics collected from one or more hosts (which may be either the local host or a remote host).
To activate the maintenance and housekeeping scripts for a collection of archive logs, execute the following tasks while logged into the local host as the superuser (root):
Augment the crontab file for root. For example:
crontab -l >/tmp/foo
Edit /tmp/foo, adding lines similar to those from /var/pcp/config/pmlogger/crontab for pmlogger_daily and pmlogger_check; for example:
# daily processing of archive logs
10 0 * * * /usr/pcp/bin/pmlogger_daily
# every 30 minutes, check pmlogger instances are running
25,55 * * * * /usr/pcp/bin/pmlogger_check
Make these changes permanent with this command:
crontab </tmp/foo
The pmlogextract tool takes a number of PCP archive logs from a single host and performs the following tasks:
Merges the archives into a single log, while maintaining the correct time stamps for all values.
Extracts all metric values within a temporal window that could encompass several archive logs.
Extracts only a configurable subset of metrics from the archive logs.
See the pmlogextract(1) man page for full information on this command, which replaces the functionality of the pmlogmerge tool.
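The three tasks listed above (merging by time stamp, selecting a temporal window, and selecting a subset of metrics) can be illustrated with an in-memory model. This sketch is not how pmlogextract is implemented and does not read real archive files; each "archive" here is simply a hypothetical list of (timestamp, metric, value) records already sorted by time.

```python
import heapq


def extract(archives, start=None, end=None, metrics=None):
    """Merge time-sorted record lists and filter the result.

    archives: iterable of lists of (timestamp, metric, value) tuples,
              each list already in time order.
    start, end: optional inclusive bounds of the temporal window.
    metrics: optional set of metric names to keep.
    """
    merged = heapq.merge(*archives, key=lambda record: record[0])
    selected = []
    for timestamp, metric, value in merged:
        if start is not None and timestamp < start:
            continue
        if end is not None and timestamp > end:
            continue
        if metrics is not None and metric not in metrics:
            continue
        selected.append((timestamp, metric, value))  # time stamp unchanged
    return selected


if __name__ == "__main__":
    morning = [(10, "disk.dev.read", 5), (20, "network.mbuf.alloc", 100)]
    afternoon = [(30, "disk.dev.read", 7), (40, "network.mbuf.alloc", 120)]
    print(extract([morning, afternoon], start=20, end=30))
    print(extract([afternoon, morning], metrics={"disk.dev.read"}))
```

The window in the first call spans the boundary between the two inputs, which corresponds to a temporal window that encompasses several archive logs.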
On each system for which PMCD is active (each PCP collector system), there is an option to have a distinguished instance of the archive logger pmlogger (the “primary” logger) launched each time PMCD is started. This may be used to ensure the creation of minimalist archive logs required for ongoing system management and capacity planning in the event of failure of a system where a remote pmlogger may be running, or because the preferred archive logger deployment is to activate pmlogger on each PCP collector system.
Run the following command as superuser on each PCP collector system where you want to activate the primary pmlogger:
chkconfig pmlogger on
The primary logger launches the next time PMCD is started. If you wish this to happen immediately, follow up with this command:
/etc/init.d/pcp start
When it is started in this fashion, the /etc/config/pmlogger.options file provides command line options for pmlogger. In the default setup, this in turn means that the initial logging state and configuration is specified in the file /var/pcp/config/pmlogger/config.default. Either one or both of these files may be modified to tailor pmlogger operation to the local requirements.
The following issues concern the creation and use of logs using pmlogger.
Symptom: The pmlogger utility does not start, and you see this message:
Cause: Archive logs are considered sufficiently precious that pmlogger does not empty or overwrite an existing set of archive log files. The log named foo actually consists of the physical files foo.index, foo.meta, and at least one file foo.N, where N is in the range 0, 1, 2, 3, and so on. A message similar to the one above is produced when a new pmlogger instance encounters one of these files already in existence.
Resolution: If you are sure, remove all of the parts of the archive log. For example, use the following command:
Then rerun pmlogger.
Symptom: The pmdumplog utility, or any tool that can read an archive log, displays this message:
Cause: An archive consists of at least three physical files. If the base name for the archive is mylog, then the archive actually consists of the physical files mylog.index, mylog.meta, and at least one file mylog.N, where N is in the range 0, 1, 2, 3, and so on. The above message is produced if one or more of the files is missing.
Resolution: Use this command to check which files the utility is trying to open:
Turn on the internal debug flag DBG_TRACE_LOG (-D 128) to see which files are being inspected by the _pmOpenLog routine as shown in the following example:
Locate the missing files and move them all to the same directory, or remove all of the files that are part of the archive, and recreate the archive log.
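As a complement to the checks described above, the following sketch reports which of the expected component files are absent from a directory. It is a hypothetical helper, not a PCP tool, and it only looks at file names; it does not verify that the files are valid or mutually consistent.

```python
import os
import re
import tempfile


def missing_components(directory, basename):
    """List the archive components that cannot be found in directory."""
    names = set(os.listdir(directory))
    missing = [f"{basename}.{suffix}"
               for suffix in ("index", "meta")
               if f"{basename}.{suffix}" not in names]
    volume = re.compile(re.escape(basename) + r"\.\d+")
    if not any(volume.fullmatch(name) for name in names):
        missing.append(f"{basename}.N")  # no data volume at all
    return missing


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        # Create an incomplete archive: metadata and one volume, no index.
        for name in ("mylog.meta", "mylog.0"):
            open(os.path.join(scratch, name), "w").close()
        print(missing_components(scratch, "mylog"))
```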
Symptom: You have a PCP archive log that is demonstrably growing, but do not know the identity of the associated pmlogger process.
Cause: The PID is not obvious from the log, or the archive name may not be obvious from the output of the ps command.
Resolution: If the archive basename is foo, run the following commands:
All of the information describing the creator of the archive is revealed and, in particular, the instance identifier for the PMCD metrics (10728 in the example above) is the PID of the pmlogger instance, which may be used to control the process via pmlc.
Symptom:
Cause: Either you are attempting to read a Version 2 archive with a PCP 1.x tool, or the archive log has become corrupted.
Resolution: By default, pmlogger in PCP release 2.0 and later generates Version 2 archives that PCP 1.0 to 1.3 tools cannot interpret. If you must use older tools, pass the -V 1 option to pmlogger, forcing it to generate Version 1 archives.
Symptom: Archive log files are zero size, requested metrics are not being logged, or pmlogger exits immediately with no error messages.
Cause: There are several possibilities: pmlogger encountered errors in the configuration file; pmlogger has not yet flushed its output buffers; or some (or all) of the metrics specified in the pmlogger configuration file have had their state changed to advisory off or mandatory off via pmlc. It is also possible that the logging interval specified in the pmlogger configuration file for some or all of the metrics is longer than the period of time you have been waiting since pmlogger started.
Resolution: If pmlogger exits immediately with no error messages, check the pmlogger.log file in the directory pmlogger was started in for any error messages. If pmlogger has not yet flushed its buffers, enter the following command:
Otherwise, use the status and query commands of pmlc to interrogate the internal pmlogger state and the logging state of specific metrics.