Chapter 3. System Overview

This chapter provides an overview of the physical and architectural aspects of your SGI Altix 3000 series system. The major components of the Altix 3000 series systems are described and illustrated.

The Altix 3000 series is a family of multiprocessor distributed shared memory (DSM) computer systems that scale from 4 to 64 Intel Itanium 2 processors as a cache-coherent single system image (SSI). In a DSM system, each processor contains memory that it shares with the other processors in the system. Because the DSM system is modular, it combines the advantages of low entry-level cost with global scalability in processors, memory, and I/O. You can install and operate the Altix 3000 series system in a rack in your lab or server room.

This chapter consists of the following sections:

  • System Models

  • System Architecture

  • System Features

  • System Components

  • Bay (Unit) Numbering

  • Rack Numbering

  • Optional System Components

Figure 3-1 shows the front views of a single-rack system (the Altix 3300 system) and a multiple-rack system (the Altix 3700 system).

Figure 3-1. SGI Altix 3000 Series Systems


System Models

The C-brick contains the processors (zero or four processors per C-brick) for the server system. The number of processors and whether or not a router (R-brick) is configured determine the Altix 3000 server model. The following two models, discussed in the sections that follow, are available:

  • Altix 3300 server. A 17U rack is used to house the power bay, up to three C-bricks, and one IX-brick.

  • Altix 3700 server. The 40U rack in this server houses all bricks, drives, and other components.

SGI Altix 3300 Server System

The Altix 3300 server system has up to 12 Intel Itanium 2 processors, one I/O brick (an IX-brick), no routers, and a single power bay. The system is housed in a short 17U rack enclosure with a single power distribution system (PDS). Although the L2 controller is optional with the Altix 3300 server, the L2 controller touch display is not an option. You can also add additional racks containing D-brick2s and TP900 storage modules to your Altix 3300 server system.

Figure 3-2 shows one possible configuration of an Altix 3300 server system.

Figure 3-2. SGI Altix 3300 Server System (Example Configuration)


SGI Altix 3700 Server System

The Altix 3700 server system has up to 512 Intel Itanium 2 processors, a minimum of one IX-brick for every 64 processors, and a minimum of four 8-port routers for every 32 processors. The system requires a minimum of one 40U tall rack with at least one power bay and one single-phase PDU per rack. (The single-phase PDU has two openings with three cables that extend from each opening to connect to the power bay. The three-phase PDU has two openings with six cables that extend from each opening to connect to two power bays.)
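As a rough illustration of these configuration ratios, the minimum brick counts for a given processor count can be sketched as follows (a hypothetical helper for this manual, not an SGI tool; it assumes four processors per C-brick, as described later in this chapter):

```python
import math

def min_altix3700_bricks(processors):
    """Minimum brick counts implied by the ratios above:
    one IX-brick per 64 processors, four routers per 32 processors,
    and four processors per C-brick (illustrative only)."""
    c_bricks = math.ceil(processors / 4)
    ix_bricks = max(1, math.ceil(processors / 64))
    r_bricks = max(4, 4 * math.ceil(processors / 32))
    return {"C-bricks": c_bricks, "IX-bricks": ix_bricks, "R-bricks": r_bricks}

# A 128-processor system needs at least 32 C-bricks, 2 IX-bricks, 16 R-bricks.
print(min_altix3700_bricks(128))
```

Actual configurations also depend on rack, power bay, and PDU requirements, which this sketch ignores.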

Each tall rack enclosure containing C-bricks comes with an L2 controller. An L2 controller touch display is provided in the first rack of the system (rack 001).

You can also add additional racks containing C-bricks, R-bricks, I/O bricks, and disk storage to your server system.

Figure 3-3 shows an example configuration of a 32-processor Altix 3700 server.

Figure 3-3. SGI Altix 3700 Server System (Example Configuration)


System Architecture

The Altix 3000 computer system is based on a distributed shared memory (DSM) architecture. It is a global-address-space, cache-coherent multiprocessor that scales to 64 Intel Itanium 2 processors in a cache-coherent domain. Because the DSM system is modular, it combines the advantages of low entry cost with the ability to scale processors, memory, and I/O independently.

The system architecture for the Altix 3000 system is a third-generation NUMAflex DSM architecture known as NUMA 3. In the NUMA 3 architecture, all processors and memory are tied together into a single logical system with special crossbar switches (R-bricks). This combination of processors, memory, and crossbar switches constitutes the interconnect fabric called NUMAlink.

The basic building block for the NUMAlink interconnect is the C-brick, which is sometimes referred to as the compute node. A C-brick contains two processor nodes; each processor node consists of a Super-Bedrock ASIC and two processors with large on-chip secondary caches. The two Intel Itanium 2 processors are connected to the Super-Bedrock ASIC via a single high-speed front side bus. The two Super-Bedrock ASICs are then interconnected internally by a single 6.4-GB/s NUMAlink 4 channel.

The Super-Bedrock ASIC is the heart of the C-brick. This specialized ASIC acts as a crossbar between the processors, local SDRAM memory, the network interface, and the I/O interface. The Super-Bedrock ASIC has a total aggregate peak bandwidth of 6.4 GB/s. Its memory interface enables any processor in the system to access the memory of all processors in the system. Its I/O interface connects processors to system I/O, which allows every processor in a system direct access to every I/O slot in the system.

Another component of the NUMA 3 architecture is the router ASIC, a custom-designed 8-port crossbar ASIC in the R-brick. Combined with specialized NUMAlink cables, the R-bricks provide a high-bandwidth, extremely low-latency interconnect between all C-bricks in the system. This interconnection creates a single contiguous system memory of up to 2 TB (terabytes).

Figure 3-4 shows a functional block diagram of the Altix 3000 series system.

Figure 3-4. Functional Block Diagram of Basic System


System Features

The main features of the Altix 3000 series server systems are introduced in the following sections:

Modularity and Scalability

The Altix 3000 series systems are modular systems. The components are housed in building blocks referred to as bricks. You can add different brick types to a system to achieve the desired system configuration. You can easily configure systems around processing capability, I/O capability, memory size, and storage size. You place individual bricks that create the basic functionality (compute/memory, I/O, and power) into custom 19-inch racks. The air-cooled system has redundant, hot-swap fans at the brick level and redundant, hot-swap power supplies at the rack level.

Distributed Shared Memory (DSM)

In the Altix 3000 series server, memory is physically distributed among the C-bricks (compute nodes); however, it is accessible to and shared by all compute nodes. Note the following types of memory:

  • If a processor accesses memory that is physically located on a compute node, the memory is referred to as the node's local memory.

  • The total memory within the system is referred to as global memory.

  • If processors access memory located in other C-bricks, the memory is referred to as remote memory.

Memory latency is the amount of time required for a processor to retrieve data from memory. Memory latency is lowest when a processor accesses local memory.

Distributed Shared I/O

Like DSM, I/O devices are distributed among the compute nodes (each compute node has an I/O port that can connect to an I/O brick) and are accessible by all compute nodes through the NUMAlink interconnect fabric.

ccNUMA Architecture

As the name implies, the cache-coherent non-uniform memory access (ccNUMA) architecture has two parts, cache coherency and non-uniform memory access, which are discussed in the sections that follow.

Cache Coherency

The Altix 3000 server series uses caches to reduce memory latency. Although data exists in local or remote memory, copies of the data can exist in various processor caches throughout the system. Cache coherency keeps the cached copies consistent.

To keep the copies consistent, the ccNUMA architecture uses a directory-based coherence protocol. In this protocol, each block of memory (128 bytes) has an entry in a table that is referred to as a directory. Like the blocks of memory that they represent, the directories are distributed among the compute nodes. A block of memory is also referred to as a cache line.

Each directory entry indicates the state of the memory block that it represents. For example, when the block is not cached, it is in an unowned state. When only one processor has a copy of the memory block, it is in an exclusive state. And when more than one processor has a copy of the block, it is in a shared state; a bit vector indicates which caches contain a copy.

When a processor modifies a block of data, the processors that have the same block of data in their caches must be notified of the modification. The Altix 3000 server series uses an invalidation method to maintain cache coherence. The invalidation method purges all unmodified copies of the block of data, and the processor that wants to modify the block receives exclusive ownership of the block.
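The directory states and the invalidation method described above can be illustrated with a toy model of a single directory entry (this is a simplified sketch for illustration only, not the actual Super-Bedrock protocol; the class and method names are hypothetical):

```python
class DirectoryEntry:
    """Toy model of the directory entry for one 128-byte cache line.

    Tracks which processors hold a cached copy and derives the
    unowned/exclusive/shared states described in the text.
    """

    def __init__(self):
        self.sharers = set()  # processors that currently cache this line

    @property
    def state(self):
        # Unowned: no cached copies; exclusive: one copy; shared: several.
        if not self.sharers:
            return "unowned"
        return "exclusive" if len(self.sharers) == 1 else "shared"

    def read(self, cpu):
        # A read adds the processor to the sharer set (the bit vector).
        self.sharers.add(cpu)

    def write(self, cpu):
        # Invalidation method: purge all other copies, then grant the
        # writing processor exclusive ownership of the block.
        self.sharers = {cpu}

entry = DirectoryEntry()
print(entry.state)   # unowned: no processor has cached the line
entry.read(0)
entry.read(1)
print(entry.state)   # shared: two processors hold copies
entry.write(0)
print(entry.state)   # exclusive: other copies invalidated
```

A real directory entry also records ownership and pending-transaction state, which this sketch omits.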

Non-uniform Memory Access (NUMA)

In DSM systems, memory is physically located at various distances from the processors. As a result, memory access times (latencies) are different or “non-uniform.” For example, it takes less time for a processor to reference its local memory than to reference remote memory.

Reliability, Availability, and Serviceability (RAS)

The Altix 3000 server series components have the following features to increase the reliability, availability, and serviceability (RAS) of the systems.

  • Power and cooling:

    • Power supplies are redundant and can be hot-swapped.

    • Bricks have overcurrent protection.

    • Fans are redundant and can be hot-swapped.

    • Fans run at multiple speeds in all bricks except the R-brick. Speed increases automatically when temperature increases or when a single fan fails.

  • System monitoring:

    • System controllers monitor the internal power and temperature of the bricks, and automatically shut down bricks to prevent overheating.

    • Memory, L2 cache, L3 cache, and all external bus transfers are protected by single-bit error correction and double-bit error detection (SECDED).

    • The NUMAlink interconnect network is protected by cyclic redundancy check (CRC).

    • The L1 primary cache is protected by parity.

    • Each brick has failure LEDs that indicate the failed part; LEDs are readable via the system controllers.

    • Systems support Embedded Support Partner (ESP), a tool that monitors the system; when a condition occurs that may cause a failure, ESP notifies the appropriate SGI personnel.

    • Systems support remote console and maintenance activities.

  • Power-on and boot:

    • Automatic testing occurs after you power on the system. (These power-on self-tests or POSTs are also referred to as power-on diagnostics or PODs).

    • Processors and memory are automatically de-allocated when a self-test failure occurs.

    • Boot times are minimized.

  • Further RAS features:

    • Systems support partitioning.

    • Systems have a local field-replaceable unit (FRU) analyzer.

    • All system faults are logged in files.

    • Memory can be scrubbed when a single-bit error occurs.

System Components

The Altix 3000 series system features the following major components:

  • 17U rack. This deskside rack is used for the Altix 3300 systems.

  • 40U rack. This is a custom rack used for both the compute rack and I/O rack in the Altix 3700 system. The power bays are mounted vertically on one side of the rack.

  • C-brick. This contains the compute power and memory for the Altix 3000 series system. The C-brick is 3U high and contains two Super-Bedrock ASICs, four Intel Itanium 2 processors, and up to 32 memory DIMMs.

  • IX-brick. This 4U-high brick provides the boot I/O functions and 12 PCI-X slots.

  • PX-brick. This 4U-high brick provides 12 PCI-X slots on 6 buses for PCI expansion.

  • R-brick. This is a 2U-high, 8-port router brick.

  • Power bay. The 3U-high power bay holds a maximum of six power supplies that convert 220 VAC to 48 VDC. The power bay has eight 48-VDC outputs.

  • D-brick2. This is a 3U-high disk storage enclosure that holds a maximum of 16 low-profile Fibre Channel disk drives.

  • TP900 disk storage module. This is a 2U-high disk storage enclosure that holds a maximum of eight low-profile Ultra160 SCSI disk drives.

  • SGIconsole. This is a combination of hardware and software that allows you to manage multiple SGI servers.

Figure 3-5 shows the Altix 3300 system components, and Figure 3-6 shows the Altix 3700 system components.

Figure 3-5. Altix 3300 System Components


Figure 3-6. Altix 3700 System Components


Bay (Unit) Numbering

Bays in the racks are numbered using standard units. A standard unit (SU) or unit (U) is equal to 1.75 inches (4.445 cm). Because bricks occupy multiple standard units, brick locations within a rack are identified by the bottom unit (U) in which the brick resides. For example, in a tall 40U rack, the C-brick positioned in U05, U06, and U07 is identified as C05. In a short 17U rack, the IX-brick positioned in U13, U14, U15, and U16 is identified as I13.
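The location identifiers described above can be derived mechanically from the brick's type letter and the bottom unit it occupies, as this small sketch shows (a hypothetical helper, not part of any SGI software):

```python
def brick_location(brick_type, bottom_unit):
    """Identify a brick by its type letter and the bottom unit (U)
    it occupies, zero-padded to two digits as in the rack labels."""
    return f"{brick_type}{bottom_unit:02d}"

# A C-brick occupying U05-U07 is identified by its bottom unit:
print(brick_location("C", 5))    # C05
# An IX-brick occupying U13-U16 in a short rack:
print(brick_location("I", 13))   # I13
```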

Rack Numbering

A rack is numbered with a three-digit number. Compute racks are numbered sequentially beginning with 001. A compute rack is a rack that contains C-bricks. I/O racks are numbered sequentially and by the physical quadrant in which the I/O rack resides. Figure 3-7 shows the rack numbering scheme for multi-rack Altix 3700 systems. The Altix 3300 system is a single compute rack system; therefore, the rack number is 001.

Figure 3-7. Rack Numbering


Optional System Components

The Altix 3000 series system has the following external storage options:

  • Host bus adapter interfaces (HBA)

    • 2 Gbit Fibre Channel, 200 MB/s peak bandwidth

    • Ultra160 SCSI, 160 MB/s peak bandwidth

    • Gigabit Ethernet copper and optical

  • JBOD (just a bunch of disks)

    • SGI TP900, Ultra160 SCSI

  • RAID

    • D-brick2, 2 Gbit Fibre Channel (Model 3700 only)

    • SGI TP9500, 2 Gbit Fibre Channel

  • Data servers

    • SGI File Server 830, Gigabit Ethernet interface

    • SGI File Server 850, Gigabit Ethernet interface

    • SGI SAN Server 1000, 2 Gbit Fibre Channel interface

  • Tape libraries

    • STK L20, L40, L80, L180, and L700

  • Tape drives

    • STK 9840B, 9940B, LTO, SDLT, and DLT

    • ADIC Scalar 100, Scalar 1000, Scalar 10000, and AIT