This chapter provides the following sections to help you troubleshoot your system:
Table 13-1 lists recommended actions for problems that can occur. To solve problems that are not listed in this table, use the SGI Electronic Support system or contact your SGI system support engineer (SSE). For more information about the SGI Electronic Support system, see the “SGI Electronic Support” section in this chapter.
Table 13-1. Troubleshooting Chart
Problem Description | Recommended Action |
---|---|
The system will not power on. | Ensure that the power cord of the PDU is seated properly in the power receptacle. Ensure that the PDU circuit breaker is on. If the power cord is plugged in and the circuit breaker is on, contact your SSE. |
An individual brick will not power on. | Ensure that the power switch (if applicable) at the rear of the brick is on (1 position). View the L1 display; see Table 13-2 if an error message is present. If the L1 controller is not running, contact your SSE. Check the connection between the brick and its power source. |
The system will not boot the operating system. | Contact your SSE. |
The Service Required LED illuminates on a CR-brick, an R-brick, an IX-brick, or a PX-brick. | View the L1 display of the failing brick; see Table 13-2 for a description of the error message. |
The Failure LED illuminates on a CR-brick, an R-brick, an IX-brick, or a PX-brick. | View the L1 display of the failing brick; see Table 13-2 for a description of the error message. |
The green or yellow LED of a NUMAlink port (rear of R-brick) is not illuminated. | Ensure that the NUMAlink cable is seated properly on the R-brick and the destination brick. |
The PWR LED of a populated PCI slot is not illuminated. | Reseat the PCI card. |
The Fault LED of a populated PCI slot is illuminated (on). | Reseat the PCI card. If the fault LED remains on, replace the PCI card. |
The System Status LED of the TP900 is amber. | Contact your SSE. |
The Power Status LED of the TP900 is amber. | Contact your SSE to replace the power supply module. The power supply module also has an amber LED that indicates a fault. |
The Cooling Status LED of the TP900 is amber. | Contact your SSE to replace the cooling module. The cooling module also has an amber LED that indicates a fault. |
The amber LED of a disk drive is on. | Replace the disk drive. |
Table 13-2 lists error messages that the L1 controller generates and shows on the L1 display. This display is located on the front of the CR-bricks, R-bricks, IX-bricks, and PX-bricks.
Note: In Table 13-2, a voltage warning occurs when a supplied voltage level is 10 percent above or below the nominal (normal) voltage. A voltage fault occurs when a supplied voltage level is 20 percent above or below the nominal voltage.
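To make the two voltage bands in the note concrete, the following Python sketch classifies a single reading against its nominal voltage. It only illustrates the 10 percent and 20 percent rule; it is not the L1 controller's firmware. The function name `classify_voltage` and the 3.3 V example rail are hypothetical, and the sketch assumes that a deviation of exactly 10 (or 20) percent already counts as a warning (or fault), which the note does not state explicitly.

```python
def classify_voltage(nominal_v: float, measured_v: float) -> str:
    """Classify a reading with the 10% warning / 20% fault bands.

    Hypothetical helper for illustration; not part of the L1 controller.
    """
    if nominal_v <= 0:
        raise ValueError("nominal voltage must be positive")
    # Deviation from nominal in percent. Rounding absorbs floating-point
    # noise so that, for example, 0.9 V on a 1.0 V rail counts as 10%.
    deviation_pct = round(abs(measured_v - nominal_v) / nominal_v * 100, 6)
    if deviation_pct >= 20:
        return "fault"    # corresponds to "... fault limit reached"
    if deviation_pct >= 10:
        return "warning"  # corresponds to "... warning limit reached"
    return "normal"


# Example readings on an assumed 3.3 V rail.
for reading in (3.30, 3.60, 3.63, 2.60):
    print(f"{reading:.2f} V -> {classify_voltage(3.3, reading)}")
```

Working in percent of nominal, rather than fixed volt offsets, lets the same rule apply to every monitored rail regardless of its nominal level.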
Table 13-2. L1 Controller Messages
L1 System Controller Message | Message Meaning and Action Needed |
---|---|
Internal voltage messages: | |
ATTN: <power VRM description> high fault limit reached @ x.xxV | The L1 controller starts a 30-second power-off sequence for the brick. |
ATTN: <power VRM description> low fault limit reached @ x.xxV | The L1 controller starts a 30-second power-off sequence for the brick. |
ATTN: <power VRM description> high warning limit reached @ x.xxV | A higher than nominal voltage condition is detected. |
ATTN: <power VRM description> low warning limit reached @ x.xxV | A lower than nominal voltage condition is detected. |
ATTN: <power VRM description> level stabilized @ x.xxV | A monitored voltage level has returned to within acceptable limits. |
Fan messages: | |
ATTN: FAN <fan description> fault limit reached @ xx RPM | A fan has reached its maximum RPM level. The ambient temperature may be too high. Check to see if a fan has failed. |
ATTN: FAN <fan description> warning limit reached @ xx RPM | A fan has increased its RPM level. Check the ambient temperature. Check to see if the fan stabilizes. |
ATTN: FAN <fan description> stabilized @ xx RPM | An increased fan RPM level has returned to normal. |
Temperature messages: low altitude | |
ATTN: <temp sensor description> advisory temperature reached @ xxC xxF | The ambient temperature at the brick's air inlet has exceeded 30 °C. |
ATTN: <temp sensor description> critical temperature reached @ xxC xxF | The ambient temperature at the brick's air inlet has exceeded 35 °C. |
ATTN: <temp sensor description> fault temperature reached @ xxC xxF | The ambient temperature at the brick's air inlet has exceeded 40 °C. |
Temperature messages: high altitude | |
ATTN: <temp sensor description> advisory temperature reached @ xxC xxF | The ambient temperature at the brick's air inlet has exceeded 27 °C. |
ATTN: <temp sensor description> critical temperature reached @ xxC xxF | The ambient temperature at the brick's air inlet has exceeded 31 °C. |
ATTN: <temp sensor description> fault temperature reached @ xxC xxF | The ambient temperature at the brick's air inlet has exceeded 35 °C. |
Temperature stable message: | |
ATTN: <temp sensor description> stabilized | The ambient temperature at the brick's air inlet has returned to an acceptable level. |
Power-off messages: | |
Auto power down in xx seconds | The L1 controller has registered a fault and is shutting down. The message displays every five seconds until shutdown. |
Brick appears to have been powered down | The L1 controller has registered a fault and has shut down. |
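Table 13-2 uses two sets of inlet-temperature thresholds: 30/35/40 °C for the advisory, critical, and fault messages, and 27/31/35 °C for high-altitude operation. The Python sketch below shows which message level a given inlet reading would trigger under each set. It is an illustration only: the names `inlet_temperature_level` and `celsius_to_fahrenheit` are hypothetical, this section does not define what counts as "high altitude," and "exceeded" is read as strictly greater than the threshold.

```python
# (advisory, critical, fault) inlet thresholds in degrees Celsius, from Table 13-2.
STANDARD_THRESHOLDS_C = (30, 35, 40)
HIGH_ALTITUDE_THRESHOLDS_C = (27, 31, 35)


def inlet_temperature_level(inlet_c: float, high_altitude: bool = False) -> str:
    """Return the temperature message level a reading would trigger.

    Hypothetical helper for illustration; not part of the L1 controller.
    """
    thresholds = HIGH_ALTITUDE_THRESHOLDS_C if high_altitude else STANDARD_THRESHOLDS_C
    advisory, critical, fault = thresholds
    # "Exceeded" is interpreted as strictly greater than the threshold.
    if inlet_c > fault:
        return "fault"
    if inlet_c > critical:
        return "critical"
    if inlet_c > advisory:
        return "advisory"
    return "normal"


def celsius_to_fahrenheit(c: float) -> float:
    # The L1 messages report both units ("@ xxC xxF").
    return c * 9 / 5 + 32


for temp_c in (29, 33, 36):
    print(temp_c, round(celsius_to_fahrenheit(temp_c)),
          inlet_temperature_level(temp_c),
          inlet_temperature_level(temp_c, high_altitude=True))
```

The same inlet reading can be "normal" under one threshold set and "advisory" or worse under the other, which is why the two sets of messages are listed separately.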
SGI Electronic Support provides system support and problem-solving services that function automatically, helping to resolve problems before they can affect system availability or develop into actual failures. SGI Electronic Support integrates several services so they work together to monitor your system, notify you if a problem exists, and search for solutions to problems.
Figure 13-1 shows the sequence of events that occurs if you use all of the SGI Electronic Support capabilities.
The sequence of events can be described as follows:
1. Embedded Support Partner (ESP) monitors your system 24 hours a day.
2. When a specified system event is detected, ESP notifies SGI via e-mail (plain text or encrypted).
3. Applications that are running at SGI analyze the information, determine whether a support case should be opened, and open a case if necessary. You and SGI support engineers are contacted (via pager or e-mail) with the case ID and problem description.
4. SGI Knowledgebase searches thousands of tested solutions for possible fixes to the problem. Solutions that are located in SGI Knowledgebase are attached to the service case.
5. You and the SGI support engineers can use Supportfolio Online to view and manage the case, search for additional solutions, or schedule maintenance.
6. Implement the solution.
Most of these actions occur automatically, and you may receive solutions to problems before they affect system availability. You also may be able to return your system to service sooner if it is out of service.
In addition to the event monitoring and problem reporting, SGI Electronic Support monitors both system configuration (to help with asset management) and system availability and performance (to help with capacity planning).
The following three components compose the integrated SGI Electronic Support system:
- SGI Embedded Support Partner (ESP) is a set of tools and utilities that are embedded in the SGI Linux ProPack release. ESP can monitor a single system or a group of systems for system events, software and hardware failures, availability, performance, and configuration changes, and then perform actions based on those events. ESP can detect system conditions that indicate potential problems, and then alert appropriate personnel by pager, console messages, or e-mail (plain text or encrypted). You also can configure ESP to notify an SGI call center about problems; ESP then sends e-mail to SGI with information about the event.
- SGI Knowledgebase is a database of solutions to problems and answers to questions that can be searched by sophisticated knowledge management tools. You can log on to SGI Knowledgebase at any time to describe a problem or ask a question. Knowledgebase searches thousands of possible causes, problem descriptions, fixes, and how-to instructions for the solutions that best match your description or question.
- Supportfolio Online is a customer support resource that includes the latest information about patch sets, bug reports, and software releases.
The complete SGI Electronic Support services are available to customers who have a valid SGI Warranty, FullCare, FullExpress, or Mission-Critical support contract. To purchase a support contract that allows you to use the complete SGI Electronic Support services, contact your SGI sales representative. For more information about the various support contracts, see the following Web page:
http://www.sgi.com/support/customerservice.html
For more information about SGI Electronic Support, see the following Web page: