This chapter discusses the following:
Also see “Communication Paths in a Coexecution Cluster” in Appendix A.
CXFS allows groups of computers to coherently share large amounts of data while maintaining high performance.
The SGI IRIS FailSafe product provides a general facility for highly available services. If one of the IRIX nodes in the cluster, or one of a node's components, fails, a different IRIX node in the cluster restarts the highly available services of the failed node. To CXFS clients, the services on the replacement node are indistinguishable from the original services before the failure occurred. It appears as if the original node has crashed and rebooted quickly; the CXFS clients notice only a brief interruption in the highly available service.
You can therefore use FailSafe in a CXFS cluster (known as coexecution) to provide highly available services (such as NFS or web) running on a CXFS filesystem. This combination provides high-performance shared data access for highly available applications in a clustered system.
CXFS 6.5.10 or later and IRIS FailSafe 2.1 or later (plus relevant patches) may be installed and run on the same system.
A subset of IRIX nodes in a coexecution cluster can be configured to be used as FailSafe nodes; a coexecution cluster can have up to eight nodes that run FailSafe.
All nodes in a CXFS cluster will run CXFS, and up to eight of those IRIX nodes can also run FailSafe. Even when you are running CXFS and FailSafe, there is still only one pool, one cluster, and one cluster configuration.
It is recommended that a production cluster be configured with a minimum of three server-capable nodes. A cluster can have a maximum of 32 nodes, of which a maximum of 16 can be CXFS administration nodes. (A cluster with serial hardware reset cables and only two server-capable nodes is supported, but there are inherent issues with this configuration; see “CXFS Recovery Issues in a Cluster with Only Two Server-Capable Nodes” in Appendix B.)
The cluster can be one of three types:
FailSafe. In this case, all nodes will also be of type FailSafe. The nodes must all be IRIX nodes.
CXFS. In this case, all nodes will be of type CXFS. The nodes can be either IRIX nodes or nodes running other operating systems, such as Solaris or Windows.
CXFS and FailSafe (coexecution). In this case, the nodes will be a mix of type CXFS (IRIX nodes or nodes running other operating systems) and type CXFS and FailSafe (IRIX nodes), using FailSafe for application-level high availability and CXFS for clustered filesystems.
Note: Although it is possible to configure a coexecution cluster with nodes of type FailSafe only, SGI does not support this configuration.
Figure 7-1 describes some of the various legal and illegal combinations.
All potential metadata server nodes must be of one of the following types:
CXFS
CXFS and FailSafe
There is one cmgr(1M) (cluster_mgr) command but separate graphical user interfaces (GUIs) for CXFS and for FailSafe. You must manage CXFS configuration with the CXFS GUI and FailSafe configuration with the FailSafe GUI; you can manage both with cmgr.
Using the CXFS GUI or cmgr(1M), you can convert an existing FailSafe cluster and nodes to type CXFS or to type CXFS and FailSafe. You can perform a parallel action using the FailSafe GUI. A converted node can be used by FailSafe to provide application-level high-availability and by CXFS to provide clustered filesystems. See “Set Up an Existing FailSafe Cluster for CXFS with the GUI” in Chapter 4.
However:
You cannot change the type of a node if the respective high availability (HA) or CXFS services are active. You must first stop the services for the node.
The cluster must support all of the functionalities (FailSafe and/or CXFS) that are turned on for its nodes; that is, if your cluster is of type CXFS, then you cannot modify a node that is already part of the cluster so that it is of type FailSafe. However, the nodes do not have to support all the functionalities of the cluster; that is, you can have a CXFS node in a CXFS and FailSafe cluster.
See “Convert a Node to CXFS or FailSafe with cmgr” in Chapter 5, and “Convert a Cluster to CXFS or FailSafe with cmgr” in Chapter 5.
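In outline, a conversion with cmgr is therefore: stop the active services on the node, convert the node, and restart services. The following session sketch is illustrative only; the node name cxfs6 and cluster name test-cluster are hypothetical, and the exact convert subcommand syntax is the one documented in the Chapter 5 sections cited above:

cmgr> stop cx_services on node cxfs6 for cluster test-cluster
cmgr> convert node cxfs6
cmgr> start cx_services on node cxfs6 for cluster test-cluster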
For FailSafe, you must have at least two network interfaces. However, CXFS uses only one interface for both heartbeat and control messages. (The CXFS GUI appears to let you select only heartbeat or only control for a network, but you must not choose these selections.)
When using FailSafe and CXFS on the same node, only the priority 1 network will be used for CXFS and it must be set to allow both heartbeat and control messages.
Note: CXFS will not fail over to the second network. If the priority 1 network fails, CXFS will fail, but FailSafe services may move to the second network if the node is of type CXFS and FailSafe.

If CXFS resets the node due to the loss of the priority 1 network, it will cause FailSafe to remove the node from the FailSafe membership; this in turn will cause resource groups to fail over to other FailSafe nodes in the cluster.
Do not use a CXFS tie-breaker node if you have only two FailSafe nodes.
The metadata server list must exactly match the failover domain list (the names and the order of names).
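For example (node names hypothetical), if a CXFS filesystem lists its potential metadata servers in the order node1 then node2, the failover domain of the failover policy used by the corresponding resource group must name the same nodes in the same order:

    CXFS metadata server list:  node1, node2
    FailSafe failover domain:   node1 node2

A domain of node2 node1, or one that adds or omits a node, would not match.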
FailSafe provides a CXFS resource type that can be used to fail over applications that use CXFS filesystems. A CXFS resource must be added to any resource group that contains resources depending on a CXFS filesystem. The name of a CXFS resource is the mount point of the CXFS filesystem.
The CXFS resource type has the following characteristics:
It does not start the resources that depend on a CXFS filesystem until that filesystem is mounted on the local node.
The start and stop action scripts for the CXFS resource type do not mount and unmount CXFS filesystems, respectively. (The start script waits for the CXFS filesystem to become available; the stop script does nothing but its existence is required by FailSafe.) Users should use the CXFS GUI or cmgr(1M) command to mount and unmount CXFS filesystems.
It monitors the CXFS filesystem for failures.
Optionally, for applications that must run on a CXFS metadata server, the CXFS resource type relocates the CXFS metadata server when there is an application failover. In this case, the application failover domain (AFD) for the resource group should consist of the CXFS metadata server and the metadata server backup nodes.
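For example, to have the metadata server relocated on application failover, set the relocate-mds attribute to true when defining the CXFS resource. The session sketch below mirrors the full example later in this chapter (which uses /FC/lun0_s6 as the mount point and the value false); only the attribute value differs:

cmgr> define resource /FC/lun0_s6 of resource_type CXFS in cluster test-cluster
Enter commands, when finished enter either "done" or "cancel"
resource /FC/lun0_s6 ? set relocate-mds to true
resource /FC/lun0_s6 ? done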
The CXFS filesystems that an NFS server exports should be mounted on all nodes in the failover domain using the CXFS GUI or the cmgr(1M) command.
For example, following are the commands used to create resources NFS, CXFS, and statd_unlimited based on a CXFS filesystem mounted on /FC/lun0_s6. (This example assumes that you have defined a cluster named test-cluster and have already created a failover policy named cxfs-fp and a resource group named cxfs-group based on this policy. Line breaks added for readability.)
cmgr> define resource /FC/lun0_s6 of resource_type CXFS in cluster test-cluster
Enter commands, when finished enter either "done" or "cancel"

Type specific attributes to create with set command:

Type Specific Attributes - 1: relocate-mds

No resource type dependencies to add

resource /FC/lun0_s6 ? set relocate-mds to false
resource /FC/lun0_s6 ? done

============================================

cmgr> define resource /FC/lun0_s6 of resource_type NFS in cluster test-cluster
Enter commands, when finished enter either "done" or "cancel"

Type specific attributes to create with set command:

Type Specific Attributes - 1: export-info
Type Specific Attributes - 2: filesystem

No resource type dependencies to add

resource /FC/lun0_s6 ? set export-info to rw
resource /FC/lun0_s6 ? set filesystem to /FC/lun0_s6
resource /FC/lun0_s6 ? done

============================================

cmgr> define resource /FC/lun0_s6/statmon of resource_type statd_unlimited in cluster test-cluster
Enter commands, when finished enter either "done" or "cancel"

Type specific attributes to create with set command:

Type Specific Attributes - 1: ExportPoint

Resource type dependencies to add:

Resource Dependency Type - 1: NFS

resource /FC/lun0_s6/statmon ? set ExportPoint to /FC/lun0_s6
resource /FC/lun0_s6/statmon ? add dependency /FC/lun0_s6 of type NFS
resource /FC/lun0_s6/statmon ? done

==============================================

cmgr> define resource_group cxfs-group in cluster test-cluster
Enter commands, when finished enter either "done" or "cancel"
resource_group cxfs-group ? set failover_policy to cxfs-fp
resource_group cxfs-group ? add resource /FC/lun0_s6 of resource_type NFS
resource_group cxfs-group ? add resource /FC/lun0_s6 of resource_type CXFS
resource_group cxfs-group ? add resource /FC/lun0_s6/statmon of resource_type statd_unlimited
resource_group cxfs-group ? done
For more information about resource groups and failover domains, see the IRIS FailSafe Version 2 Administrator's Guide.