If you are experiencing problems with your Silicon Graphics Fuel workstation, contact your service provider:
If you are located in North America, contact the Technical Assistance Center at 1-800-800-4SGI. SGI personnel will guide you through the troubleshooting process.
If you are located outside of North America, contact your local SGI subsidiary or authorized distributor.
This chapter includes the following sections:
This section covers the following topics:
Environmental Fault Monitoring
LED Lightbar
The workstation monitors its environment to ensure proper operation. It will automatically power off if any of the following faults are found:
Any fan spins at less than 80% of nominal speed.
Any temperature sensor registers 158 °F (70 °C) or above.
Any voltage deviates from its nominal value by 20% or more, in either direction.
If your workstation is powering off unexpectedly, check for these conditions.
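The three power-off conditions above can be written as a simple check. The following Python sketch is only an illustration of the thresholds: the actual monitoring is done by the workstation itself, and the function names, parameter names, and the choice to express readings as percentages of nominal are assumptions made for this example.

```python
def fahrenheit(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def environmental_faults(fan_percent, temp_c, voltage_percent):
    """Return the fault conditions that would trigger an automatic power-off.

    fan_percent and voltage_percent are readings expressed as a percentage
    of their nominal values (100 means exactly nominal). These parameter
    names are hypothetical and exist only for this sketch.
    """
    faults = []
    if fan_percent < 80:                    # fan below 80% of nominal speed
        faults.append("fan")
    if temp_c >= 70:                        # 70 °C (158 °F) or above
        faults.append("temperature")
    if abs(voltage_percent - 100) >= 20:    # 20% or more away from nominal
        faults.append("voltage")
    return faults


print(fahrenheit(70))
print(environmental_faults(95, 45, 105))
print(environmental_faults(75, 72, 100))
```

The second call reports no faults; the third reports both a fan and a temperature fault.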
The LED lightbar on the workstation bezel can provide important troubleshooting information. Table 5-1 shows a list of LED signals and what they mean.
Table 5-1. LED Lightbar Signals
LED Lightbar Signal | Explanation |
---|---|
Blinking white | Power button pressed (On or Off) |
Solid white | Successful PROM boot / OS running |
Solid red | System board failure |
Blinking red | During boot sequence: memory error. While OS is running: kernel panic. |
Blinking red and white | Graphics configuration error |
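For quick reference, the signals in Table 5-1 can be kept in a small lookup table. This is a minimal Python sketch; the dictionary layout, the function name, and the fallback message are hypothetical and are not part of any SGI tool.

```python
# Signals and explanations copied from Table 5-1.
LED_SIGNALS = {
    "blinking white": "Power button pressed (On or Off)",
    "solid white": "Successful PROM boot / OS running",
    "solid red": "System board failure",
    "blinking red": ("During boot sequence: memory error. "
                     "While OS is running: kernel panic."),
    "blinking red and white": "Graphics configuration error",
}


def explain_led(signal):
    """Look up a lightbar signal, ignoring case and extra whitespace."""
    key = " ".join(signal.lower().split())
    return LED_SIGNALS.get(key, "Signal not listed in Table 5-1")


print(explain_led("Solid red"))
print(explain_led("Blinking red and white"))
```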
The Silicon Graphics Fuel visual workstation is equipped with diagnostics to test the system hardware and diagnose part failures. These diagnostics are grouped into three categories:
Power-on diagnostics (POD)
Power-on diagnostics are PROM-resident tests that run automatically when you power on the system. As the boot process discovers hardware components, it runs power-on diagnostics to verify that each component that is needed to boot the system is working correctly. Refer to “Power-on Diagnostics” for more information about POD.
Offline diagnostics
Offline diagnostics use a standalone diagnostic environment to test the system hardware; the operating system cannot be running while you use offline diagnostics. Refer to “Offline Diagnostics” for more information.
Online diagnostics
Online diagnostics are tests that verify system hardware while the operating system is running. To prevent data loss, you should use the online diagnostics only when the system is idle. Refer to “Online Diagnostics” for more information.
Note: The diagnostics described in this document run only on Silicon Graphics Fuel visual workstations. They will not work on any other SGI systems.
The power-on diagnostics run automatically when you power on or reset the system. As the boot process discovers hardware, it verifies that each component is functional enough to load the operating system.
The power-on diagnostics test the hardware in the following order:
CPU
Bedrock ASIC
PROM
Memory DIMMs
Secondary cache
Xbridge ASIC
PCI slots
Serial ports
SCSI controller
Keyboard and mouse
VPro graphics
Ethernet port
If the power-on diagnostics complete successfully, the System Maintenance menu appears or the system automatically boots, depending on how the system is configured.
If the power-on diagnostics detect errors, the diagnostics disable the failing hardware and continue testing. When testing completes, the system may or may not be able to boot, depending on the hardware that has been disabled. If the system does not boot, contact your service representative. For more information about product support, refer to “Product Support”.
Offline diagnostics run a sequence of tests on the system hardware under a standalone diagnostic environment; the operating system cannot be running while the offline diagnostics test the system.
The offline diagnostics include a “launcher” that automatically runs a sequence of tests. In most cases, you should run the offline diagnostics automatically with the launcher. Use the following procedure to run the launcher:
Power on the system.
Wait until the System Maintenance menu appears.
Note: If the Autoload PROM variable is set to Yes, you must click the Stop for Maintenance button to access the System Maintenance menu.
Select the Run Diagnostics option.
Note: You can also start the launcher by entering the following command at the command monitor (PROM) prompt (>>): boot -f dksc(0,1,0)/stand/smdk/smdk --a
The launcher automatically runs the offline diagnostics on system components in the following order:
CPU
Secondary cache
Memory DIMMs
Motherboard (including the USB ports, serial ports, Ethernet port, parallel port, mouse port, keyboard port, Xbridge ASIC, and PCI slots)
Note: The offline diagnostics test the simpler components first and then proceed to the more complex components.
Table 5-2 shows the approximate time required (in minutes and seconds format) to automatically run the offline diagnostics on a workstation with a 500-MHz processor and 512 MB of memory. (Your testing time will vary, depending on your hardware configuration.)
Table 5-2. Time Required to Run Offline Diagnostics
Testing Progress | Total Elapsed Time |
---|---|
The launcher boot-up sequence starts | 0:00 |
The launcher boot-up sequence completes | 0:10 |
PIMM testing completes | 0:40 |
Secondary cache testing completes | 1:17 |
Memory DIMM testing completes | 5:05 |
Motherboard testing completes | 7:30 |
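Because Table 5-2 lists cumulative elapsed times, the time spent in each stage is the difference between consecutive rows. The following Python sketch works that arithmetic out; the helper names and the shortened stage labels are hypothetical, and the figures apply only to the reference configuration described above.

```python
def to_seconds(mmss):
    """Convert a 'minutes:seconds' string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def to_mmss(total_seconds):
    """Convert a number of seconds back to 'minutes:seconds'."""
    return f"{total_seconds // 60}:{total_seconds % 60:02d}"


# Cumulative elapsed times copied from Table 5-2.
ELAPSED = [
    ("Launcher boot-up", "0:10"),
    ("PIMM testing", "0:40"),
    ("Secondary cache testing", "1:17"),
    ("Memory DIMM testing", "5:05"),
    ("Motherboard testing", "7:30"),
]


def stage_durations(elapsed):
    """Return (stage, duration) pairs from cumulative elapsed times."""
    durations = []
    previous = 0
    for stage, stamp in elapsed:
        current = to_seconds(stamp)
        durations.append((stage, to_mmss(current - previous)))
        previous = current
    return durations


for stage, duration in stage_durations(ELAPSED):
    print(f"{stage}: {duration}")
```

On the reference configuration, memory DIMM testing takes 3:48 of the 7:30 total, which is roughly half of the run.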
The offline diagnostics display test status information as they run. If the diagnostics complete testing without detecting errors, the output is similar to the following example:
SMDK SGI Version 6.93 TEST built 10:20:12 AM Sep 21, 2001 smdk loading io discovery code... smdk loading launcher code... smdk>term none Setting up diagnostics..... Starting diagnostics..... Testing PIMM........ PASSED Testing CACHE................ PASSED Testing DIMM........................................................................................................................................................................................................................................................................................................ PASSED Testing Mother Board... FINISHED All diagnostics passed. resetting the system... |
If the launcher detects an error, it displays a FAILED status message for the hardware it is testing and stops testing. If any of the components do not pass the offline diagnostics, contact your service representative.
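If you capture the launcher output (for example, from a serial console log), the status lines can be scanned for the first failure. The following Python sketch assumes that a failing component is reported on a line of the same shape as the passing lines shown above, with FAILED in place of PASSED; that line format, the sample text, and the function names are assumptions, not documented behavior.

```python
import re

# Matches lines such as "Testing PIMM........ PASSED".
STATUS_LINE = re.compile(
    r"^Testing (.+?)\.+[ \t]+(PASSED|FAILED|FINISHED)[ \t]*$",
    re.MULTILINE,
)


def launcher_results(text):
    """Return (component, status) pairs in the order they were reported."""
    return [(m.group(1), m.group(2)) for m in STATUS_LINE.finditer(text)]


def first_failure(text):
    """Return the first component reported as FAILED, or None."""
    for component, status in launcher_results(text):
        if status == "FAILED":
            return component
    return None


# Hypothetical capture: the FAILED line is an assumed format.
sample = """Testing PIMM........ PASSED
Testing CACHE................ PASSED
Testing DIMM............ FAILED
"""

print(launcher_results(sample))
print(first_failure(sample))
```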
Caution: The runalldiags script should be run while the system is idle. If you run the online diagnostics while the system is in use, data may be lost.
Online diagnostics are tests that verify system hardware while the operating system is running. When you run the online diagnostics from the IRIX operating system prompt, each diagnostic runs a set of tests for a certain number of loops. The online diagnostics test the following areas of the system:
CPU
Memory
I/O
Graphics
Storage devices
Network devices
The online diagnostics also run a system stress test, which tests all areas of the system under heavy load.
The Customer Diagnostics 1.0 CD, SGI part number 812-1122-001, includes the online diagnostics that are available for customer use. This CD ships with all Silicon Graphics Fuel visual workstations. You need to install files from the CD on a system before you can run the online diagnostics. The CD booklet includes installation procedures.
The runalldiags script automatically runs a sequence of online diagnostics. It runs in three modes:
Basic mode verifies memory and performs 30 minutes of stress testing. (If you want to perform regularly scheduled testing, use basic mode.)
Normal mode performs the same tests as basic mode and also performs I/O testing. (The I/O testing may disrupt the serial port and USB devices.)
Extensive mode performs more disruptive I/O testing. (Ethernet is unavailable, and USB operations are disrupted.) It also performs more intensive CPU, memory, and stress testing. Use this mode only if you suspect there is a problem with the system.
Follow these steps to run the runalldiags script:
Note: You must have root level access to the system to run online diagnostics.
Enter the following command at the command prompt to change to the directory that contains the diagnostics:
cd /usr/diags/bin
Enter the following command to start the script:
./runalldiags [options]
Note: When you run runalldiags in -normal or -extensive mode, you should run it from the console. The Ethernet testing that runalldiags performs in these modes disrupts any telnet sessions on the system.
Refer to Table 5-3 for descriptions of the command-line options.
Table 5-3. runalldiags Command-line Options
Option | Description |
---|---|
-h, -help | Displays help information |
-basic | Runs the script in basic mode |
-normal | Runs the script in normal mode (default) |
-extensive | Runs the script in extensive mode |
-host <host> | Specifies a system to target for network tests |
-d <directory> | Specifies the directory that contains the online diagnostics |
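The options in Table 5-3 can be combined on one command line. The following Python sketch builds such a command line as an argument list; the function name, its parameters, and the example host name are hypothetical, and only the option strings and the /usr/diags/bin path come from this chapter.

```python
MODES = ("basic", "normal", "extensive")


def runalldiags_command(mode="normal", host=None, diag_dir=None,
                        bin_dir="/usr/diags/bin"):
    """Build the argument list for a runalldiags invocation.

    mode     -- one of the modes in Table 5-3 (normal is the default)
    host     -- optional target system for network tests (-host)
    diag_dir -- optional directory containing the diagnostics (-d)
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    command = [f"{bin_dir}/runalldiags", f"-{mode}"]
    if host is not None:
        command += ["-host", host]
    if diag_dir is not None:
        command += ["-d", diag_dir]
    return command


print(" ".join(runalldiags_command("basic")))
# "testhost" is a placeholder host name used only for illustration.
print(" ".join(runalldiags_command("extensive", host="testhost")))
```

The sketch only assembles the command; running it still requires root access on the workstation and the precautions described in this section.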
If a diagnostic fails, the script saves the output from the diagnostic in a file in the /tmp directory (for example, /tmp/diagTestOutput.1.olenet). Output from the script indicates the actual name of the file. When a diagnostic fails, the script continues to run the remaining diagnostics.
Note: If you have USB devices connected to your workstation, you must disconnect the USB cables from the rear of the enclosure after the online diagnostics have finished running. Then reconnect the cables to restore the USB devices.
Online diagnostics display PASS(testname) when a test passes and FAIL(testname) when a test fails.
The following example shows output from running runalldiags in basic mode with no errors:
shad# ./runalldiags -basic
Running online diagnostics at Basic level
Time: Mon Oct 1 10:55:53 CDT 2001
System Information: IRIX64 shad 6.5-wolfi-root-SN1O 6.5.10m 07171440 IP35
Plan on running: olmem pandora
olmem - Online Memory Diagnostic (Check /var/adm/SYSLOG for error message)
/usr/diags/bin/olmem
PASS(olmem)
pandora - System Stress Test
/usr/diags/bin/pandora -runtime 30
PASS(pandora)
Finished running at Mon Oct 1 11:35:38 CDT 2001
Ran: 2 Failed: 0
The following example shows output from running runalldiags in basic mode with one error:
shad# ./runalldiags -basic
Running online diagnostics at Basic level
Time: Mon Oct 1 10:55:53 CDT 2001
System Information: IRIX64 shad 6.5-wolfi-root-SN1O 6.5.10m 07171440 IP35
Plan on running: olmem pandora
olmem - Online Memory Diagnostic (Check /var/adm/SYSLOG for error message)
/usr/diags/bin/olmem
PASS(olmem)
pandora - System Stress Test
/usr/diags/bin/pandora -runtime 30
FAIL(pandora): see /tmp/diagFailure.0.pandora
Finished running at Mon Oct 1 11:35:38 CDT 2001
Ran: 1 Failed: 1
If any of the components do not pass the online diagnostics, contact your service representative.
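If you save the output of runalldiags to a file, the PASS and FAIL lines and the final summary can be collected before you contact your service representative. The following Python sketch assumes the output has the same shape as the examples in this section; the function name and the returned data layout are hypothetical.

```python
import re

# "PASS(olmem)" or "FAIL(pandora): see /tmp/diagFailure.0.pandora"
RESULT_LINE = re.compile(
    r"^(PASS|FAIL)\((\w+)\)(?::\s*see\s+(\S+))?", re.MULTILINE
)
# "Ran: 2 Failed: 0"
SUMMARY_LINE = re.compile(r"Ran:\s*(\d+)\s+Failed:\s*(\d+)")


def parse_runalldiags(text):
    """Return ({test: {"status", "output"}}, (ran, failed) or None)."""
    results = {}
    for status, name, path in RESULT_LINE.findall(text):
        results[name] = {"status": status, "output": path or None}
    summary = SUMMARY_LINE.search(text)
    counts = None
    if summary:
        counts = (int(summary.group(1)), int(summary.group(2)))
    return results, counts


# Lines taken from the failing example in this section.
sample = """PASS(olmem)
FAIL(pandora): see /tmp/diagFailure.0.pandora
Ran: 1 Failed: 1
"""

results, counts = parse_runalldiags(sample)
for name, info in results.items():
    print(name, info["status"], info["output"])
print(counts)
```

For each failed test, the output field holds the file name reported by the script, which is the file to have at hand when you report the problem.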