If you are experiencing problems with your Silicon Graphics® Tezro™ visual workstation, please review the material in this chapter. If you are unable to resolve the problem, contact your service provider as follows:
If you are located in North America, contact the Customer Support Center at 1-800-800-4SGI. SGI personnel will guide you through the troubleshooting process.
If you are located outside of North America, contact your local SGI subsidiary or authorized distributor.
The workstation monitors its environment to ensure proper operation. It will automatically power off if any of the following faults are found:
Any fan spins at less than 80% of nominal speed.
Any temperature sensor registers 158 °F (70 °C) or above.
Any voltage deviates from its nominal value by 20% or more (±20%).
If your workstation is powering off unexpectedly, check for these conditions.
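The three shutdown conditions above can be sketched as a simple threshold check. This is an illustrative sketch only, not the workstation's actual monitoring firmware: the function names, the parameter names, and the reading of "reaches ±20%" as "deviates by 20% or more" are assumptions made for the example.

```python
# Hypothetical sketch of the shutdown thresholds described above.
# Names and structure are illustrative; they are not SGI firmware APIs.

FAN_MIN_FRACTION = 0.80    # fault if a fan spins below 80% of nominal speed
TEMP_LIMIT_C = 70.0        # fault at or above 70 degrees C (158 degrees F)
VOLTAGE_TOLERANCE = 0.20   # fault if a voltage deviates 20% or more from nominal


def fahrenheit_to_celsius(deg_f):
    """Convert a Fahrenheit reading to Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0


def shutdown_faults(fan_rpm, fan_nominal_rpm, temp_c, volts, volts_nominal):
    """Return the list of fault categories that would trigger a power-off."""
    faults = []
    if fan_rpm < FAN_MIN_FRACTION * fan_nominal_rpm:
        faults.append("fan")
    if temp_c >= TEMP_LIMIT_C:
        faults.append("temperature")
    if abs(volts - volts_nominal) >= VOLTAGE_TOLERANCE * abs(volts_nominal):
        faults.append("voltage")
    return faults
```

For example, a fan running at 2300 rpm against a nominal 3000 rpm (about 77%) would be reported as a fan fault, while a 12 V rail reading 12.1 V would not be reported.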
The LEDs in the workstation bezel can provide important troubleshooting information. Table 4-1 shows a list of LED signals and what they mean.
LED Signal | Explanation |
---|---|
Blinking white | Power button pressed (on or off) |
Solid white | Successful PROM boot/OS running |
Solid yellow | L1 has detected a problem. Check the L1 display for more information. |
Blinking red | General system failure |
Solid red | System node board failure |
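If you log LED observations during troubleshooting, the table above can be kept as a small lookup. The following is a minimal sketch; the dictionary and function names are hypothetical and exist only for this example.

```python
# Hypothetical lookup built from the LED table above (Table 4-1).
LED_SIGNALS = {
    ("white", "blinking"): "Power button pressed (on or off)",
    ("white", "solid"): "Successful PROM boot/OS running",
    ("yellow", "solid"): "L1 has detected a problem; check the L1 display",
    ("red", "blinking"): "General system failure",
    ("red", "solid"): "System node board failure",
}


def explain_led(color, pattern):
    """Return the documented meaning of an LED signal, if it is listed."""
    key = (color.strip().lower(), pattern.strip().lower())
    return LED_SIGNALS.get(key, "Unknown signal; not listed in Table 4-1")
```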
The Silicon Graphics Tezro visual workstation is equipped with diagnostics to test the system hardware and diagnose part failures. These diagnostics are grouped into three categories:
Power-on diagnostics (POD)
Power-on diagnostics are PROM-resident tests that run automatically when you power on the system. As the boot process discovers hardware components, it runs power-on diagnostics to verify that each component that is needed to boot the system is working correctly. Refer to “Power-on Diagnostics” for more information about POD.
Offline diagnostics
Offline diagnostics use a standalone diagnostic environment to test the system hardware; the operating system cannot be running while you use offline diagnostics. Refer to “Offline Diagnostics” for more information.
Online diagnostics
Online diagnostics are tests that verify system hardware while the operating system is running. To prevent data loss, you should use the online diagnostics only when the system is idle. Refer to “Online Diagnostics” for more information.
All diagnostics are loaded on your workstation when you receive it. To upgrade to future revisions of the diagnostics, download the appropriate Customer Diagnostics package from Supportfolio (http://support.sgi.com). Contact your service representative for more information.
Note: The diagnostics described in this document run only on Silicon Graphics Tezro visual workstations. They will not work on any other SGI systems.
The power-on diagnostics run automatically when you power on or reset the system. As the boot process discovers hardware, it verifies that each component is functional enough to load the operating system.
The power-on diagnostics test the hardware in the following order:
CPU
Bedrock ASIC
PROM
Memory DIMMs
Secondary cache
PIC ASICs
PCI slots
Serial ports
SCSI controller
VPro graphics
If the power-on diagnostics complete successfully, the System Maintenance menu appears or the system automatically boots, depending on how the system is configured.
If the power-on diagnostics detect errors, the diagnostics disable the failing hardware and continue testing. When testing completes, the system may or may not be able to boot, depending on the hardware that has been disabled. If the system does not boot, contact your service representative.
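The "disable and continue" behavior described above can be pictured with a short simulation. This is only a conceptual sketch, not the PROM code: `test_component` is a hypothetical callback standing in for the real hardware tests, and the component names are taken from the list above.

```python
# Conceptual simulation of the power-on diagnostic flow described above.
# The real tests are PROM-resident; test_component is a hypothetical stand-in.
POD_ORDER = [
    "CPU", "Bedrock ASIC", "PROM", "Memory DIMMs", "Secondary cache",
    "PIC ASICs", "PCI slots", "Serial ports", "SCSI controller",
    "VPro graphics",
]


def run_power_on_diagnostics(test_component):
    """Test each component in order; collect failures instead of stopping."""
    disabled = []
    for component in POD_ORDER:
        if not test_component(component):
            # A failing component is disabled and testing continues.
            disabled.append(component)
    return disabled
```

Whether the system can still boot afterward depends on which components end up in the disabled list, which is why the sketch returns the list rather than a single pass/fail value.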
Offline diagnostics run a sequence of tests on the system hardware under a standalone diagnostic environment; the operating system cannot be running while the offline diagnostics test the system.
The offline diagnostics include a “launcher” that automatically runs a sequence of tests. In most cases, you should run the offline diagnostics automatically with the launcher. Use the following procedure to run the launcher:
Power on the system.
Wait until the System Maintenance menu appears.
Note: If the Autoload PROM variable is set to Yes, you must click the Stop for Maintenance button to access the System Maintenance menu.
Select the Run Diagnostics option.
Note: You can also start the launcher by entering the following command at the command monitor (PROM) prompt (>>): boot -f dksc(0,1,0)/stand/smdk/smdk --a
The launcher automatically runs the offline diagnostics on system components in the following order:
CPU
Note: The CPU test supports single-CPU systems; if a system has more than one CPU, the CPU test does not run.
Secondary cache
Memory DIMMs
I/O components: IO9 card and audio and I/O daughtercard (including the SCSI controller, serial ports, Ethernet port, mouse port, keyboard port, and RTO/RTI connectors)
Note: The offline diagnostics test the simpler components first and then proceed to the more complex components.
Table 4-2 shows the approximate time required (in minutes and seconds format) to automatically run the offline diagnostics on various workstation configurations. (Your testing time will vary, depending on your hardware configuration.)
Table 4-2. Time Required to Run Offline Diagnostics
Testing Progress | Total Elapsed Time: 1-CPU Workstation with 512 MB Memory | Total Elapsed Time: 2-CPU Workstation with 1 GB Memory | Total Elapsed Time: 4-CPU Workstation with 1 GB Memory |
---|---|---|---|
CPU testing completes | 0:26 | N/A[a] | N/A[a] |
Secondary cache testing completes | 1:18 | 0:25 | 1:54 |
Memory DIMM testing completes | 4:47 | 4:32 | 5:07 |
I/O testing completes | 6:15 | 5:34 | 6:09 |
[a] CPU testing is not performed on systems that have more than one CPU.
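Because Table 4-2 lists cumulative elapsed times, the time spent in each individual stage is the difference between consecutive rows. The sketch below works this out for the 1-CPU column; the helper names are hypothetical, and the values are copied from the table.

```python
# Derive per-stage durations (in seconds) from the cumulative times in Table 4-2.

def to_seconds(mmss):
    """Convert a 'minutes:seconds' string such as '4:47' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def stage_durations(cumulative):
    """Map each stage to its own duration; skip stages marked None (N/A)."""
    durations = {}
    previous = 0
    for stage, elapsed in cumulative:
        if elapsed is None:
            continue
        now = to_seconds(elapsed)
        durations[stage] = now - previous
        previous = now
    return durations


# 1-CPU workstation with 512 MB memory (values from Table 4-2).
ONE_CPU = [
    ("CPU", "0:26"),
    ("Secondary cache", "1:18"),
    ("Memory DIMM", "4:47"),
    ("I/O", "6:15"),
]

print(stage_durations(ONE_CPU))
# {'CPU': 26, 'Secondary cache': 52, 'Memory DIMM': 209, 'I/O': 88}
```

On this configuration, memory DIMM testing accounts for 209 of the 375 total seconds, which is more than half of the run.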
The offline diagnostics display test status information as they run. If the diagnostics complete testing without detecting errors, the output is similar to the following example:
    Starting diagnostic program...
    Press <Esc> to return to the menu.
    SMDK SGI Version 6.152 TEST built 08:41:26 AM Mar 6, 2003
    smdk loading io discovery code...
    smdk loading launcher code...
    smdk> sMDK Diagnostic Launcher: Version 2.0 Built 00:42:56 Mar 6 2003
    Setting up diagnostics.....
    term none
    Starting diagnostics.....
    Testing CACHE.......... PASSED
    Testing DIMM................................................. PASSED
    Testing IO................... PASSED
    FINISHED
    Reseting...
    resetting the system...
If the launcher detects an error, it displays a FAILED status message for the hardware it is testing and stops testing. If any of the components do not pass the offline diagnostics, contact your service representative.
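If you capture the launcher output (for example, over a serial console), the per-component status lines can be extracted with a short script. This is a sketch under assumptions: the function name is hypothetical, and the exact layout of a FAILED line is assumed to mirror the PASSED lines shown in the example output, which the text does not confirm.

```python
import re

# Matches lines such as "Testing CACHE.......... PASSED".
# The FAILED form is assumed to use the same layout as the PASSED form.
STATUS_LINE = re.compile(r"^Testing\s+(\w+)\.+\s*(PASSED|FAILED)\s*$")


def parse_launcher_output(text):
    """Return a dict mapping each tested component to PASSED or FAILED."""
    results = {}
    for line in text.splitlines():
        match = STATUS_LINE.match(line.strip())
        if match:
            results[match.group(1)] = match.group(2)
    return results


sample = """Starting diagnostics.....
Testing CACHE.......... PASSED
Testing DIMM................................................. PASSED
Testing IO................... PASSED
FINISHED
"""
print(parse_launcher_output(sample))
# {'CACHE': 'PASSED', 'DIMM': 'PASSED', 'IO': 'PASSED'}
```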
Caution: The runalldiags script should be run while the system is idle. If you run the online diagnostics while the system is in use, data may be lost.
Online diagnostics are tests that verify system hardware while the operating system is running. When you run the online diagnostics from the IRIX operating system prompt, each diagnostic runs a set of tests for a certain number of loops. The online diagnostics test the following areas of the system:
CPU
Memory
I/O
Graphics
Storage devices
Network devices
The online diagnostics also run a system stress test, which tests all areas of the system under heavy load.
The runalldiags script automatically runs a sequence of online diagnostics. It runs in three modes:
Basic mode verifies memory and performs 30 minutes of stress testing. (If you want to perform regularly scheduled testing, use basic mode.)
Normal mode performs the same tests as basic mode and also performs I/O testing. (The I/O testing may disrupt any serial port and USB devices.)
Extensive mode performs more disruptive I/O testing. (Ethernet is unavailable, and USB operations are disrupted.) It also performs more intensive CPU, memory, and stress testing. Use this mode only if you suspect there is a problem with the system.
Follow these steps to run the runalldiags script:
Note: You must have root level access to the system to run online diagnostics.
Enter the following command at the IRIX command prompt to change to the directory that contains the diagnostics:
#>cd /usr/diags/bin
Enter the following command to start the script:
#>./runalldiags [options]
Note: When you run runalldiags in -normal or -extensive modes, you should run it from the console. The Ethernet testing that runalldiags performs in -normal and -extensive modes disrupts any telnet sessions on the system.
Refer to Table 4-3 for descriptions of the command-line options.
Table 4-3. runalldiags Command-line Options
Option | Description |
---|---|
-h, -help | Displays help information |
-basic | Runs the script in basic mode |
-normal | Runs the script in normal mode (default) |
-extensive | Runs the script in extensive mode |
-host <host> | Specifies a system to target for network tests |
-d <directory> | Specifies the directory that contains the online diagnostics |
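If you drive runalldiags from a wrapper, the options in Table 4-3 can be assembled into an argument list. The following is a minimal sketch that only builds the command and does not run it; the function name and parameters are hypothetical, and only the flags listed in the table are used.

```python
# Build a runalldiags argument list from the options in Table 4-3.
# This helper is illustrative; it does not execute the script.
MODES = ("basic", "normal", "extensive")


def build_runalldiags_command(mode="normal", host=None, diag_dir=None):
    """Return the argument list for one runalldiags invocation."""
    if mode not in MODES:
        raise ValueError("mode must be one of: " + ", ".join(MODES))
    argv = ["./runalldiags", "-" + mode]
    if host is not None:
        argv += ["-host", host]      # system to target for network tests
    if diag_dir is not None:
        argv += ["-d", diag_dir]     # directory containing the diagnostics
    return argv
```

For instance, `build_runalldiags_command("basic")` yields `["./runalldiags", "-basic"]`, which corresponds to the basic-mode invocation used in the example output in this section.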
If a diagnostic fails, the script saves the output from the diagnostic in a file in the /tmp directory (for example, /tmp/diagTestOutput.1.olenet). Output from the script indicates the actual name of the file. When a diagnostic fails, the script continues to run the remaining diagnostics.
Note: If you have USB devices connected to your workstation, you must disconnect the USB cables from the rear of the enclosure after the online diagnostics have finished running. Then reconnect the cables to restore the USB devices.
Online diagnostics display PASS(testname) when a test passes and FAIL(testname) when a test fails. If any component does not pass the online diagnostics, contact your service representative.
The following example shows output from running runalldiags in basic mode with no errors:
    olab1 12# ./runalldiags -basic
    Running online diagnostics at Basic level
    Time: Tue Jun 24 16:25:36 CDT 2003
    System Information: IRIX64 olab1 6.5 6.5.20m 04091957 IP35
    Plan on running: olmem pandora
    olmem - Online Memory Diagnostic (Check /var/adm/SYSLOG for error message)
    PASS(olmem)
    pandora - System Stress Test
    PASS(pandora)
    Finished running at Tue Jun 24 17:00:05 CDT 2003
    Ran: 2 Failed: 0
The following example shows output from running runalldiags in basic mode with one error:
    olab1 3# ./runalldiags -basic
    Running online diagnostics at Basic level
    Time: Tue Jun 24 10:55:36 CDT 2003
    System Information: IRIX64 olab1 6.5 6.5.20m 04091957 IP35
    Plan on running: olmem pandora
    olmem - Online Memory Diagnostic (Check /var/adm/SYSLOG for error message)
    PASS(olmem)
    pandora - System Stress Test
    FAIL(pandora): see /tmp/diagFailure.0.pandora
    Time: Tue Jun 24 11:35:38 CDT 2003
    Ran: 1 Failed: 1
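When you save runalldiags output to a file, the PASS and FAIL markers shown in the examples above can be collected with a short script so that the failure log files are easy to find. This is a sketch that assumes the output follows the format of the two examples; the function name is hypothetical.

```python
import re

# Matches "PASS(olmem)" and "FAIL(pandora): see /tmp/diagFailure.0.pandora".
RESULT = re.compile(r"\b(PASS|FAIL)\((\w+)\)(?::\s*see\s+(\S+))?")


def summarize_runalldiags(text):
    """Return (passed test names, {failed test name: log file or None})."""
    passed = []
    failed = {}
    for status, name, logfile in RESULT.findall(text):
        if status == "PASS":
            passed.append(name)
        else:
            failed[name] = logfile or None
    return passed, failed


sample = """olmem - Online Memory Diagnostic (Check /var/adm/SYSLOG for error message)
PASS(olmem)
pandora - System Stress Test
FAIL(pandora): see /tmp/diagFailure.0.pandora
"""
print(summarize_runalldiags(sample))
# (['olmem'], {'pandora': '/tmp/diagFailure.0.pandora'})
```

The failed-test dictionary gives the file names to send to your service representative along with the full script output.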