Fujitsu SPARC Enterprise M4000 Features Manual

SPARC Enterprise
M4000/M5000/M8000/M9000 Servers
Dynamic Reconfiguration (DR) User's Guide
Part No.: E27809-01,
Manual Code: C120-E335-09EN
January 2012

Summary of Contents for Fujitsu SPARC Enterprise M4000

  • Page 5: Table Of Contents

    Contents Preface ix Overview of Dynamic Reconfiguration 1–1 DR 1–1 Basic DR Functions 1–5 1.2.1 Adding a System Board 1–6 1.2.2 Deleting a System Board 1–6 1.2.3 Moving a System Board 1–6 1.2.4 Replacing a System Board 1–7 Security 1–7 Overview of DR User Interfaces 1–7 What You Must Know Before Using DR 2–1 System Configuration 2–1...
  • Page 6 2.1.4 Checklists for System Configuration 2–11 2.1.5 Reservation of Domain Configuration Changes 2–12 Conditions and Settings Using XSCF 2–13 2.2.1 Conditions Using XSCF 2–13 2.2.2 Settings Using XSCF 2–13 2.2.2.1 Configuration Policy Option 2–14 2.2.2.2 Floating Board Option 2–14 2.2.2.3 Omit-memory Option 2–15 2.2.2.4 Omit-I/O Option 2–16...
  • Page 7 2.5.5 Capacity on Demand (COD) 2–29 2.5.6 XSCF Failover 2–29 2.5.7 Kernel Memory Board Deletion 2–29 2.5.8 Deletion of Board with CD-RW/DVD-RW Drive 2–30 2.5.9 SPARC64 VII+, SPARC64 VII, and SPARC64 VI Processors and CPU Operational Modes 2–30 2.5.9.1 CPU Operational Modes 2–31 DR User Interface 3–1 How To Use the DR User Interface 3–1 3.1.1...
  • Page 8 Example: Adding a System Board 4–7 Example: Deleting a System Board 4–9 Example: Moving a System Board 4–11 Examples: Replacing a System Board 4–13 4.5.1 Example: Replacing a Uni-XSB System Board 4–13 4.5.2 Example: Replacing a Quad-XSB System Board 4–16 Examples: Reserving Domain Configuration Changes 4–20 4.6.1 Example: Reserving a System Board Add 4–20...
  • Page 9: Preface

    Preface This guide describes the Dynamic Reconfiguration (DR) feature of SPARC Enterprise M4000/M5000/M8000/M9000 servers from Oracle and Fujitsu. DR enables users to add, remove or exchange system boards in the M4000/M5000 (midrange) and M8000/M9000 (high-end) servers while the domains that contain these boards remain up and running.
  • Page 10: Related Documentation

    SPARC Enterprise M4000/M5000 Servers Overview Guide SPARC Enterprise M8000/M9000 Servers Overview Guide SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers Important Legal and Safety Information SPARC Enterprise M4000/M5000 Servers Safety and Compliance Guide SPARC Enterprise M8000/M9000 Servers Safety and Compliance Guide External I/O Expansion Unit Safety and Compliance Guide SPARC Enterprise M4000 Server Unpacking Guide SPARC Enterprise Mx000 Servers Dynamic Reconfiguration User’s Guide •...
  • Page 11 SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF Reference Manual SPARC Enterprise M4000/M5000/M8000/M9000 Servers Dynamic Reconfiguration (DR) User’s Guide SPARC Enterprise M4000/M5000/M8000/M9000 Servers Capacity on Demand (COD) User’s Guide † SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers Product Notes...
  • Page 12: Text Conventions

    Text Conventions
    This manual uses the following fonts and symbols to express specific types of information.
    Font/Symbol | Meaning | Example
    AaBbCc123 | What you type, when contrasted with on-screen computer output. This font represents the example of command input in the frame. | XSCF> adduser jsmith
    AaBbCc123 | The names of commands, files, and... | XSCF>...
  • Page 13: Documentation Feedback

    If you have any comments or requests regarding this document, go to the following websites:
    ■ For Oracle users: http://www.oracle.com/goto/docfeedback
      Include the title and part number of your document with your feedback: SPARC Enterprise M4000/M5000/M8000/M9000 Servers Dynamic Reconfiguration (DR) User’s Guide, part number E27809-01
    ■ For Fujitsu users: http://www.fujitsu.com/global/contact/computing/sparce_index.html
  • Page 15: Overview Of Dynamic Reconfiguration

    CHAPTER 1 Overview of Dynamic Reconfiguration
    This chapter provides an overview of Dynamic Reconfiguration, which is controlled by the eXtended System Control Facility (XSCF). This chapter includes these sections:
    ■ Section 1.1, “DR” on page 1-1...
  • Page 16 I/O resources between domains. SPARC Enterprise M4000/M5000/M8000/M9000 servers have a unique partitioning feature that can divide one physical system board (PSB) into one logical board (undivided status) or four logical boards. A PSB that is logically divided into one board (undivided status) is called a Uni-XSB, whereas a PSB that is logically divided into four boards is called a Quad-XSB.
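The two division types can be made concrete with a small sketch. The Python below is illustrative only: the helper name `xsb_ids` and its `quad` flag are assumptions and not part of any XSCF interface, and the label format follows the XSB labels that appear in this guide's figures (for example, XSB 00-0 through XSB 00-3).

```python
def xsb_ids(psb: int, quad: bool) -> list[str]:
    """Return the XSB labels for one PSB (hypothetical helper, not an XSCF API)."""
    count = 4 if quad else 1  # Quad-XSB: four logical boards; Uni-XSB: one
    return [f"{psb:02d}-{n}" for n in range(count)]


print(xsb_ids(0, quad=False))  # ['00-0']
print(xsb_ids(1, quad=True))   # ['01-0', '01-1', '01-2', '01-3']
```

Keeping the PSB number in the label makes it easy to see which logical boards share one physical board.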
  • Page 17 FIGURE 1-2 Uni-XSB and Quad-XSB (High-end Servers)
    TABLE 1-1 and TABLE 1-2 list DR-related terms.
    TABLE 1-1 Basic DR Terms
    Term | Definition
    To connect a system board to a domain and configure it into the Oracle Solaris OS of the domain....
  • Page 18 TABLE 1-1 Basic DR Terms (Continued)
    Term | Definition
    Unconfigure | To unconfigure a system board in the Oracle Solaris OS.
    Reserve | To reserve a system board such that it is assigned to or unassigned from a domain on the next reboot or power-cycle.
    Install | To insert a system board into a system....
  • Page 19: Basic Dr Functions

    TABLE 1-2 Terms Related to Hardware Configurations (Continued)
    Term | Definition
    System board | The hardware resources of a PSB or an XSB. A system board is used to describe the hardware resources for operations such as domain construction and identification. In this manual, this refers to the XSB.
    Uni-XSB | One of the division types of a PSB....
  • Page 20: Adding A System Board

    In the example shown in FIGURE 1-3, system board #2 is deleted from domain A and added to domain B. In this way, the physical configuration of the hardware (mounting locations) is not changed but the logical configuration is changed for management of the system boards....
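To illustrate that a move changes only the logical assignment, the following sketch models domains as sets of board labels. It is a toy model under stated assumptions: `move_board`, the domain names, and the label "02-0" standing in for system board #2 are all hypothetical, and nothing here talks to XSCF.

```python
def move_board(domains: dict[str, set[str]], xsb: str, src: str, dst: str) -> None:
    """Delete xsb from domain src, then add it to domain dst (logical change only)."""
    if xsb not in domains[src]:
        raise ValueError(f"{xsb} is not assigned to domain {src}")
    domains[src].remove(xsb)  # delete from the move-source domain
    domains[dst].add(xsb)     # add to the move-destination domain


domains = {"A": {"00-0", "01-0", "02-0"}, "B": {"03-0"}}
move_board(domains, "02-0", "A", "B")
print(sorted(domains["A"]), sorted(domains["B"]))  # ['00-0', '01-0'] ['02-0', '03-0']
```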
  • Page 21: Replacing A System Board

    1.2.4 Replacing a System Board You can use DR to remove a system board from a domain and either add it back later, or replace it with another system board, provided both boards satisfy DR requirements as described in this document. You can do so without stopping the Oracle Solaris OS running in either domain.
  • Page 22 For details of XSCF shell commands provided for DR, see Section 3.1, “How To Use the DR User Interface” on page 3-1. XSCF Web is beyond the scope of this document. See the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF User’s Guide for further information.
  • Page 23: What You Must Know Before Using Dr

    CHAPTER 2 What You Must Know Before Using DR
    This chapter provides information you must know to successfully use the DR functions. This chapter includes these sections:
    ■ Section 2.1, “System Configuration” on page 2-1
    ■ Section 2.2, “Conditions and Settings Using XSCF” on page 2-13...
  • Page 24 Note – Due to diagnostic requirements, the DR function works only on boards that have at least one CPU and memory.
    FIGURE 2-1 Example of Hardware Configuration (with Uni-XSB of Midrange Server); board shown: XSB 00-0...
  • Page 25 FIGURE 2-2 Example of Hardware Configuration (with Quad-XSBs of Midrange Server); boards shown: XSB 00-0 to 00-3 and XSB 01-0 to 01-3
  • Page 26: Cpu

    FIGURE 2-3 Example of a Hardware Configuration (with Uni-XSBs of High-end Server); board shown: XSB 00-0
    FIGURE 2-4 Example of a Hardware Configuration (with Quad-XSBs of High-end Server)...
  • Page 27: Memory

    A CPU to be deleted must meet the following conditions:
    ■ No running process is bound to the CPU to be deleted. If a running process is bound to the target CPU, you must unbind or stop the process.
    ■ The CPU to be deleted does not belong to any processor set. If the target...
  • Page 28 (1) Kernel Memory Board A kernel memory board is a system board on which kernel memory (memory internally used by the Oracle Solaris OS and containing an OpenBoot PROM program) is loaded. Kernel memory cannot be removed from the system. But the location of kernel memory can be controlled, and kernel memory can be copied from one board to another.
  • Page 29 (1.3) Kernel Memory Assignment When a domain is powered on, the Power On Self Test (POST) initially assigns an address space to each system board in that domain. The order in which address spaces are assigned depends on the LSB number and floating board option of each system board.
  • Page 30 ■ If more than one system board satisfies all the selection criteria to the same degree of satisfaction, the one with the lowest LSB number is selected as the copy-destination board.
    Note – If no system boards meet the selection criteria, the DR operation to delete the kernel memory board will fail....
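The tie-breaking rule and the failure case can be sketched as follows. This is a minimal illustration, not the actual selection logic: the function name is hypothetical, and `score` is an assumed numeric stand-in for the "degree of satisfaction" (higher is better), since the full selection criteria are not reproduced in this excerpt.

```python
def pick_copy_destination(candidates: list[tuple[int, int]]) -> int:
    """Pick the copy-destination board among boards that meet the criteria.

    candidates: (lsb_number, score) pairs; `score` is a hypothetical stand-in
    for the degree to which a board satisfies the selection criteria.
    Returns the LSB number of the chosen board.
    """
    if not candidates:
        # No board meets the criteria: the DR delete of the kernel memory board fails.
        raise RuntimeError("no board meets the selection criteria")
    best = max(score for _, score in candidates)
    # Among equally good boards, the lowest LSB number wins.
    return min(lsb for lsb, score in candidates if score == best)


print(pick_copy_destination([(3, 2), (1, 2), (0, 1)]))  # 1
```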
  • Page 31: I/O Device

    Although such moving of memory (called save processing) requires a certain length of time, system operations can continue during save processing because it is executed as a background task. Note – The Dynamic Intimate Shared Memory (DISM) is a feature that allows applications to dynamically resize their ISM segments.
  • Page 32: System Board Configuration Requirements

    configuration software to make the access path to each requisite I/O device redundant. For a disk drive unit, you can make the unit redundant by using disk mirroring software. If a device driver that does not support DR is used in the domain, all access to I/O devices controlled by the device driver must be stopped, and the device driver must be unloaded by using the modunload(1M) command.
  • Page 33: Checklists For System Configuration

    This function can be effectively used to move a system board among multiple domains as needed. For example, a system board can be added from the system board pool to a domain where CPU or memory has a high load. When the added system board becomes unnecessary, the system board can be returned to the system board pool.
  • Page 34: Reservation Of Domain Configuration Changes

    4. Allocation of Sufficient Memory and Distributed Swap Areas - You must allocate sufficient memory resources to be used when the memory on a system board is disconnected. Performing a DR operation with a high load already applied to memory may significantly lower job process performance and DR operability. 5.
  • Page 35: Conditions And Settings Using Xscf

    Conditions and Settings Using XSCF This section describes the operating conditions required for XSCF to start DR operations and the settings that are established by XSCF. 2.2.1 Conditions Using XSCF The DR operation to add a system board cannot be executed when the system board has only been mounted.
  • Page 36: Configuration Policy Option

    2.2.2.1 Configuration Policy Option DR operations involve automatic hardware diagnosis to add or move a system board safely. Degradation of components occurs when the components are set according to the configuration of this option, and a hardware error is detected. This option specifies the range of degradation.
  • Page 37: Omit-Memory Option

    Kernel memory is allocated to the non-floating boards in a domain by priority in ascending order of LSB number. When only floating boards are set in the domain, one of them is selected and used as a kernel memory board. In that case, the status of the board is changed from floating board to non-floating board.
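The allocation priority described above can be sketched in a few lines. The `Board` class and function name are hypothetical, and one point is an explicit assumption: when only floating boards exist, the text says one of them is selected but not which, so the sketch picks the lowest LSB number.

```python
from dataclasses import dataclass


@dataclass
class Board:
    lsb: int
    floating: bool


def kernel_memory_candidates(boards: list[Board]) -> list[Board]:
    """Return boards in the order kernel memory would be allocated to them."""
    fixed = sorted((b for b in boards if not b.floating), key=lambda b: b.lsb)
    if fixed or not boards:
        return fixed
    # Only floating boards exist: one is selected and becomes non-floating.
    # ASSUMPTION: choose the lowest LSB number; the text does not say which.
    chosen = min(boards, key=lambda b: b.lsb)
    chosen.floating = False
    return [chosen]


boards = [Board(2, False), Board(0, True), Board(1, False)]
print([b.lsb for b in kernel_memory_candidates(boards)])  # [1, 2]
```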
  • Page 38: Omit-I/O Option

    2.2.2.4 Omit-I/O Option The omit-I/O option disables the PCI cards, disk drives, and basic local-area network (LAN) ports on a system board to prevent the target domain from using them. Set this option to “true” if the domain needs to use only the system board’s CPU and memory.
  • Page 39: Settings Of Kernel Cage Memory

    Even if the DDI_DETACH interface is supported, DDI_DETACH processing fails when the relevant driver is in use. Before starting the deletion of a system board, you must stop using all devices on the system board to be deleted. The device drivers that do not support DR must be unloaded before a system board is deleted.
  • Page 40: Setting Of Oracle Solaris Service Management Facility (Smf)

    2.3.3 Setting of Oracle Solaris Service Management Facility (SMF)
    Certain DR operations succeed only when the following Oracle Solaris Service Management Facility (SMF) services are active on the domain:
    ■ Domain SP Communication Protocol (dscp)
    ■ Domain Configuration Server (dcs)
    ■ Sun Cryptographic Key Management Daemon (sckmd)...
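A pre-check for these services can be scripted against the text output of the Oracle Solaris `svcs` command. The sketch below is a hedged illustration: the function name is hypothetical, the sample text and its FMRI paths are assumptions made for the example, and it treats the SMF state `online` as the meaning of "active".

```python
REQUIRED = ("dscp", "dcs", "sckmd")


def inactive_services(svcs_output: str, required=REQUIRED) -> list[str]:
    """Return the required services that are not reported as 'online'."""
    online = set()
    for line in svcs_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "online":
            # FMRI looks like svc:/.../<name>:<instance>; keep only <name>.
            name = fields[-1].rsplit("/", 1)[-1].split(":")[0]
            online.add(name)
    return [s for s in required if s not in online]


# Illustrative sample only; real output on a domain may differ.
sample = """\
STATE          STIME    FMRI
online         10:15:02 svc:/platform/sun4u/dscp:default
online         10:15:03 svc:/platform/sun4u/dcs:default
disabled       10:14:55 svc:/platform/sun4u/sckmd:default
"""
print(inactive_services(sample))  # ['sckmd']
```

An empty result suggests the three services named above are online; any name returned should be investigated before the DR operation is attempted.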
  • Page 41: System Board Status

    TABLE 2-2 Domain Status (Continued)
    Status | Description
    Booting | Oracle Solaris OS is being booted or, due to the domain being shut down or reset, the system is in the OpenBoot PROM running state or is suspended in the OpenBoot PROM (ok prompt) state.
    Running | Oracle Solaris OS is running....
  • Page 42 System Board Management Items (Continued) TABLE 2-4 Management item Status Description Test Unmount The system board is not mounted or cannot be recognized, perhaps because it is faulty. Unknown The system board is not being diagnosed. Testing Testing Passed Passed Failed A system board error was detected and the board has been deconfigured.
  • Page 43: Flow Of Dr Processing

    2.4.3 Flow of DR Processing This section describes the flow of DR processing and the changes in system board status during individual DR operations. 2.4.3.1 Flowchart: Adding a System Board The flow of DR operations and the transition of system board status when a system board has been added or reserved for addition are described in the schematic flowchart, below.
  • Page 44: Flowchart: Deleting A System Board

    FIGURE 2-5 Flow of System Board Addition Processing
  • Page 45: Flowchart: Moving A System Board

    FIGURE 2-6 Flow of System Board Deletion Processing
  • Page 46 Each system board status indicated in FIGURE 2-7 is the main status that is changed. For the flow of system board addition processing or deletion processing and the related system board status, see Section 2.4.3.1, “Flowchart: Adding a System Board” on page 2-21 and Section 2.4.3.2, “Flowchart: Deleting a System Board”...
  • Page 47: Flowchart: Replacing System Board

    2.4.3.4 Flowchart: Replacing System Board
    The flow of DR operations and the transition of system board status when a system board has been replaced are described using the schematic flowchart. Each system board state indicated in FIGURE 2-8 is the main status that is changed. The sample statuses before and after replacement, as shown in the figure, are explained below....
  • Page 48 FIGURE 2-8 Flow of System Board Replacement Processing
  • Page 49: Operation Management

    Operation Management
    This section describes the premises and the actions for DR operations.
    2.5.1 I/O Device Management
    Upon the addition of a system board, device information is reconfigured automatically. However, addition of the system board and the reconfiguration of device information do not end at the same time. Sometimes, device links in the /dev directory are not automatically cleaned up by the devfsadmd(1M) daemon.
  • Page 50: Real-Time Processes

    memory contents. Be aware that some of the total swap space may be supplied by disks that are attached to the board to be deleted. When making your assessment, be certain to also account for the swap space that will be lost. If the size of available memory (e.g., 1.5 gigabytes) is larger than the size of ■...
  • Page 51: Capacity On Demand (Cod)

    For example, when a kernel memory board with memory mirror mode enabled is deleted or moved, kernel memory is moved from the kernel memory board to another system board. Kernel memory is moved normally even if memory mirror mode is disabled for the move-destination system board. However, this operation results in lowered reliability of memory on the new kernel memory board.
  • Page 52: Deletion Of Board With Cd-Rw/Dvd-Rw Drive

    2.5.8 Deletion of Board with CD-RW/DVD-RW Drive
    To delete the system board to which the server’s CD-RW/DVD-RW drive is connected, execute the following steps:
    1. Stop the vold(1M) daemon by disabling the volfs service.
       # /usr/sbin/svcadm disable volfs
    2. Execute the DR operation.
    3....
  • Page 53: Operational Modes

    FIGURE 2-9 CPUs on CPU/Memory Board Unit (CMU) and Domain Configuration. Labels shown: CMU#0 to CMU#3; Domain 0 to Domain 2; CMU mounted with SPARC64 VI only; CMU mounted with SPARC64 VII only; CMU of mixed CPU configuration...
  • Page 54 To check the CPU operational mode, execute the prtdiag(1M) command on the Oracle Solaris OS. If the domain is in SPARC64 VII Enhanced Mode, the output will display SPARC64-VII on the System Processor Mode line. If the domain is in SPARC64 VI Compatible Mode, nothing is displayed on that line....
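The check described above can be applied to captured prtdiag text with a short function. This is a sketch under assumptions: the function name is hypothetical, and the sample lines only imitate the "System Processor Mode" line, whose exact layout on a real domain may differ.

```python
def cpu_mode(prtdiag_output: str):
    """Classify the domain CPU mode from prtdiag text; None if the line is absent."""
    for line in prtdiag_output.splitlines():
        if line.strip().startswith("System Processor Mode"):
            if "SPARC64-VII" in line:
                return "SPARC64 VII Enhanced Mode"
            # Nothing displayed on the line: compatible mode.
            return "SPARC64 VI Compatible Mode"
    return None


# Illustrative inputs only, not captured from a real domain.
print(cpu_mode("System Processor Mode: SPARC64-VII mode"))  # SPARC64 VII Enhanced Mode
print(cpu_mode("System Processor Mode:"))                   # SPARC64 VI Compatible Mode
```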
  • Page 55: Dr User Interface

    CHAPTER 3 DR User Interface
    This chapter describes the user interfaces for DR.
    ■ Section 3.1, “How To Use the DR User Interface” on page 3-1
    ■ Section 3.2, “Command Reference” on page 3-26
    ■ Section 3.3, “XSCF Web” on page 3-27...
  • Page 56: Displaying Domain Information

    XSCF shell commands for DR operations are classified into two types: DR display and DR operation commands.
    TABLE 3-1 DR Display Commands
    Command name | Function
    showdcl | Display the DCL and domain status.
    showdomainstatus | Display domain status.
    showboards | Display system board information.
    showdevices | Display information about the CPUs, memory, and I/O devices on...
  • Page 57 The showdcl(8) command is used before a DR operation to determine whether the domain status permits DR operation, and confirm the registration of the DR-target system board in the DCL. The showdcl(8) command is also used after a DR operation to confirm domain status and configuration. To change domain settings or register a system board in the DCL, use the setdcl(8) command.
  • Page 58 TABLE 3-4 Items of Domain Information to be Displayed (Continued)
    Display items | Description
    Status | Domain Status
    Powered Off | Domain power is off.
    Initialization Phase | POST processing or OpenBoot PROM initialization is in progress.
    OpenBoot Executing Completed | Initialization of OpenBoot PROM is completed.
    Running...
  • Page 59: Displaying Domain Status

    The following shows examples of displays by the showdcl(8) command. Example 1: Display of information on domain #0 ■ XSCF> showdcl -d 0 Status Running 00-0 01-0 01-01 01-2 01-3 02-0 Example 2: Display of detailed information on domain #0 ■...
  • Page 60 Options of the showdomainstatus Command TABLE 3-5 Option Description Displays the status of all domains. -d domain_id Displays information about the specified domain, where domain_id is the domain number, possibly 0 to 23, depending on your server. Only one domain ID can be specified. Displays usage information.
  • Page 61: Displaying System Board Information

    3.1.3 Displaying System Board Information The showboards(8) command displays system board information including the domain ID of the domain to which the target system board belongs and various kinds of system board status in list format. Use the showboards(8) command before a DR operation to determine whether the system board status permits DR operations, and to confirm the domain ID of the domain to which the target system board belongs.
  • Page 62 The table below lists the items displayed by the showboards(8) command. Items of System Board Information to be Displayed TABLE 3-8 Display items Description System board number. Reservation status of a system board. “*” is displayed for a system board when the board is reserved for addition, deletion, or a move.
  • Page 63 Items of System Board Information to be Displayed (Continued) TABLE 3-8 Display items Description Test Diagnostic status of system board Unmount The system board is not mounted or cannot be recognized because it is faulty. Unknown The system board is not being diagnosed. Testing testing.
  • Page 64: Displaying Device Information

    Example 2: Display of detailed information on all system boards ■ XSCF> showboards -v -a DID(LSB) Assignment Conn Conf Test Fault -------------------------------------------------------------------------- 00-0 00(00) Assigned Passed Normal 00-1 00(01) Assigned Passed Degraded 00-2 Available Unknown Normal 00-3 01(15) Assigned Passed Normal Example 3: Display of information on the system board in the system board pool ■...
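When showboards output is captured to text, the leading fields of each row can be picked apart with a regular expression. The sketch below is deliberately narrow and hedged: `parse_row` is a hypothetical helper, it reads only the XSB, the optional DID(LSB) pair, and the next field, and it is modeled on the rows as they appear in this excerpt, which has lost some columns.

```python
import re

# XSB label, optional DID(LSB) pair, then the next whitespace-delimited field.
ROW = re.compile(r"^(\d{2}-\d)\s+(?:(\d{2})\((\d{2})\)\s+)?(\S+)")


def parse_row(line: str):
    """Return (xsb, domain_id, lsb, assignment) or None for non-data lines."""
    m = ROW.match(line.strip())
    if not m:
        return None
    xsb, did, lsb, assignment = m.groups()
    return xsb, (int(did) if did else None), (int(lsb) if lsb else None), assignment


for line in ["00-1 00(01) Assigned Passed Degraded", "00-2 Available Unknown Normal"]:
    print(parse_row(line))
# ('00-1', 0, 1, 'Assigned')
# ('00-2', None, None, 'Available')
```

Header and separator lines do not start with an XSB label, so the function returns None for them.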
  • Page 65 Note – (Note 2) The showdevices(8) command will succeed only if the following Oracle Solaris Service Management Facility (SMF) services are active on that domain:
    - Domain SP Communication Protocol (dscp)
    - Domain Configuration Server (dcs)
    - Oracle Sun Cryptographic Key Management Daemon (sckmd)
    TABLE 3-9 Options of the showdevices Command
    Option...
  • Page 66 Domain Information Displayed by the showdevices command TABLE 3-10 Display items Description CPU information. Domain ID. System board number. CPU ID. state CPU status. speed CPU frequency (MHz). ecache CPU cache size (Megabyte: MB). usage Description of instance using resources. Memory Memory information.
  • Page 67: Displaying System Board Configuration Information

    Example: Display of device information on XSB00-0 ■ XSCF> showdevices 00-0 CPU: ---- state speed ecache 00-0 on-line 2048 00-0 on-line 2048 Memory: ------- board perm base domain target deleted remaining mem MB mem MB address mem MB mem MB mem MB 00-0 8192...
  • Page 68 Options of the showfru Command TABLE 3-11 Option Description Specifies that the command display all configuration information on devices of the type specified by devtype. Displays usage information. device Specifies a device type. Specify “sb” for DR. location Specifies a device name. Specifies a physical system board (PSB) number.
  • Page 69: Adding A System Board

    3.1.6 Adding a System Board Use the addboard(8) command to add a system board to a domain or reserve the addition of a system board to a domain based on the DCL. The system board must already be registered in the target domain’s DCL. Use the showdcl(8) command to check whether a system board is registered in the DCL.
  • Page 70 TABLE 3-13 Options of the addboard Command (Continued)
    Option | Description
     | Displays the usage information.
    -c configure | Specifies that the command add a system board to the domain. If no other -c option is specified, -c configure is the default.
    -c assign | Specifies that the command assign a system board to the domain. With this option specified, the command assigns the target system board to the domain....
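Scripts that drive XSCF often assemble the addboard command line before sending it. The sketch below only builds and validates the argument list and runs nothing. Several points are assumptions: the helper name is hypothetical, the `-d domain_id` form and the `reserve` function come from the XSCF reference documentation rather than this excerpt, and the 0 to 23 domain range is the upper bound mentioned for showdomainstatus, which depends on the server.

```python
import re

FUNCTIONS = ("configure", "assign", "reserve")
XSB = re.compile(r"^\d{2}-[0-3]$")  # label format such as 00-0 or 01-3


def addboard_argv(domain_id: int, xsbs: list[str], function: str = "configure") -> list[str]:
    """Build an addboard argument list without running anything."""
    if function not in FUNCTIONS:
        raise ValueError(f"unsupported -c function: {function}")
    if not 0 <= domain_id <= 23:
        raise ValueError("domain ID out of range")
    bad = [x for x in xsbs if not XSB.match(x)]
    if not xsbs or bad:
        raise ValueError(f"invalid XSB list: {xsbs}")
    return ["addboard", "-c", function, "-d", str(domain_id), *xsbs]


print(" ".join(addboard_argv(0, ["00-0"])))            # addboard -c configure -d 0 00-0
print(" ".join(addboard_argv(1, ["01-2"], "assign")))  # addboard -c assign -d 1 01-2
```

Defaulting `function` to `configure` mirrors the documented default when no other -c option is given.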
  • Page 71: Deleting A System Board

    Note – (Note 3) If a system board has been forcibly added to a domain by the addboard(8) command with the -f option specified, normal operation of all added hardware resources may be disabled. For this reason, you should avoid using the -f option for normal DR operations.
  • Page 72 Options of the deleteboard Command TABLE 3-14 Option Description Specifies the suppression of output message display. The -y or -n option determines how output messages are automatically answered, whether or not the messages themselves are suppressed (with the -q option) or displayed. Specifies that a response of "yes"...
  • Page 73: Moving A System Board

    Note – (Note 1) The time required for system board deletion processing depends on the amount of hardware resources mounted on the target system board. For this reason, much time may be required for the command to end its operation. If the system board contains kernel memory, the OS is suspended for a while.
  • Page 74 Use the showdcl(8) command to check whether a system board is registered in the DCL. To register a system board in the DCL, use the setdcl(8) command. Before executing the moveboard(8) command, check the status of the move-source and move-destination domains and move-target system board, and the device usage status on the system board.
  • Page 75 TABLE 3-15 Options of the moveboard Command (Continued)
    Option | Description
    -c configure | Specifies that the command delete a system board from the move-source domain and add it to the move-destination domain. If no other -c option is specified, -c configure is the default. The move operation from the move-source domain is performed when the domain power is off or the Oracle Solaris OS is running in the move-source domain....
  • Page 76: Replacing A System Board

    Note – (Note 1) The time required for system board deletion processing in the move-source domain depends on the amount of hardware resources mounted on the target system board. Moreover, in the system board addition processing in the move-destination domain, the system board to be added is first diagnosed, and then added to the domain.
  • Page 77 Note – In a midrange server, you cannot use DR commands to replace a system board. Instead, turn off the power of all domains, and then replace the target system board. To replace a system board in a domain, first delete the target system board from the domain by using the deleteboard(8) command to make the PSB replaceable.
  • Page 78 Note – (Note 4) To execute the addboard(8) command to add a system board by DR, the system board must already be registered in DCL. Use the showdcl(8) command to check whether a system board is registered in the DCL. To register a system board in the DCL, use the setdcl(8) command.
  • Page 79: Reserving A Domain Configuration Change

    3.1.10 Reserving a Domain Configuration Change Use the addboard(8), deleteboard(8), or moveboard(8) command to reserve a domain configuration change. A domain configuration change is reserved when a system board cannot be added, deleted, or moved immediately for operational reasons. The reserved addition, deletion, or move of the system board is executed when the power of the target domain is turned on or off, or the domain rebooted.
  • Page 80: Command Reference

    Command Reference This section lists the DR commands and other commands related to DR. For details of the commands, see the SPARC Enterprise M3000/M4000/M5000/M8000/M9000 Servers XSCF Reference Manual. For the DR commands, see Section 3.1, “How To Use the DR User Interface” on page 3-1.
  • Page 81: Xscf Web

    TABLE 3-18 DR-related Commands
    Command name | Function
    poweron | Turns on the power of all domains or a specified domain.
    poweroff | Turns off the power of all domains or a specified domain.
    setdscp | Configures DSCP network.
    showdscp | Displays the DSCP network configuration.
    Installs a Field Replaceable Unit (FRU)....
  • Page 82 Note – (Note 1) An RCM script can only automate actions performed to prepare for the deletion of a system board. When a system board is added to a domain, any actions required for use of the added resources must be manually performed. Note –...
  • Page 83: Practical Examples Of Dr

    CHAPTER 4 Practical Examples of DR
    This chapter provides examples of DR operations, such as the addition, deletion, move, and replacement of system boards. Each example shows an operation procedure using the command line interface of the XSCF shell....
  • Page 84: Flow Of Dr Operation

    Flow of DR Operation This section provides the flows of basic DR operations to add, delete, move, and replace system boards, along with flow diagrams.
  • Page 85: Flow: Adding A System Board

    4.1.1 Flow: Adding a System Board
    FIGURE 4-1 Flow: Adding a System Board. First step shown: checking operation and selecting a DR operation (operation status and configuration of a domain; judgment of whether the DR operation can be performed)...
  • Page 86: Flow: Deleting A System Board

    4.1.2 Flow: Deleting a System Board
    FIGURE 4-2 Flow: Deleting a System Board. First step shown: checking operation and selecting a DR operation (operation status and configuration of a domain; judgment of whether the DR operation can be performed)...
  • Page 87: Flow: Moving A System Board

    4.1.3 Flow: Moving a System Board
    FIGURE 4-3 Flow: Moving a System Board. First step shown: checking operation and selecting a DR operation (operation status and configuration of the move-source domain; operation status and configuration of the move-destination domain; judgment of whether the DR operation can be performed)...
  • Page 88: Flow: Replacing A System Board

    4.1.4 Flow: Replacing a System Board
    FIGURE 4-4 Flow: Replacing a System Board. First step shown: checking operation and selecting a DR operation (operation status and configuration of a domain; adjustment between other domains; configuration of the system board to be replaced; checking the device status)...
  • Page 89: Example: Adding A System Board

    Example: Adding a System Board This section provides an example of the DR operation to add a system board to a domain. In the example, a procedure conforming to Section 4.1.1, “Flow: Adding a System Board,” is used, and the system board shown in the figure is added by using the XSCF shell....
  • Page 90 If you need to change the PSB configuration, use the setupfru(8) command. If the system board to be added is not registered in the DCL, register the system board in the DCL of the target domain by using the setdcl(8) command. XSCF>...
  • Page 91: Example: Deleting A System Board

    Example: Deleting a System Board This section provides an example of an operation to delete a system board from a domain. In the example, a procedure conforming to Section 4.1.2, “Flow: Deleting a System Board” on page 4-4, is used, and the system board shown in the figure is deleted using the XSCF shell.
  • Page 92 3. Check the status of the system board to be deleted. Execute the showboards(8) command to display system board information, and then check the status of the system board to be deleted. XSCF> showboards -a DID(LSB) Assignment Conn Conf Test Fault ------------------------------------------------------------------- 00-0...
  • Page 93: Example: Moving A System Board

    Example: Moving a System Board This section provides an example of an operation to move a system board between domains. In the example, a procedure conforming to Section 4.1.3, “Flow: Moving a System Board” on page 4-5, is used, and the system board shown in the figure is moved using the XSCF shell.
  • Page 94 4. Check the status of the system board to be moved. Execute the showboards(8) command to display system board information, and then check the status of the system board to be moved. XSCF> showboards 00-1 DID(LSB) Assignment Conn Conf Test Fault ---- -------- ----------- ---- ---- ---- ------- --------------- 00-1...
  • Page 95: Examples: Replacing A System Board

    Examples: Replacing a System Board This section provides examples of operations to replace a system board in a domain. The examples illustrate replacement of a system board in a Uni-XSB environment and a system board in a Quad-XSB environment. In each sample operation, a procedure conforming to Section 4.1.4, “Flow: Replacing a System Board”...
  • Page 96 2. Check the status of the domain. Execute the showdcl(8) command to display domain information, and then check the operation status of the domain. Based on the operation status of the domain, determine whether to perform the DR operation or replace the system board after stopping the domain.
  • Page 97 7. Check the status of the replaced system board. Execute the showboards(8) command to display system board information, and then check the status of all related system boards and confirm their registration in the DCL. If it is necessary to change the system board configuration (e.g., number of divisions), do so by using the setupfru(8) command.
  • Page 98: Example: Replacing A Quad-Xsb System Board

    XSCF> showboards 01-0 DID(LSB) Assignment Conn Conf Test Fault ----------------------------------------------------------------- 01-0 00(01) Assigned Passed Normal 4.5.2 Example: Replacing a Quad-XSB System Board. [FIGURE 4-9: Example: Replacing a Quad-XSB System Board. Diagram labels: Domain#0, Domain#1, XSB#00-0, XSB#01-0, XSB#01-1, XSB#01-2, XSB#01-3; the faulty system board is deleted and replaced...]
  • Page 99 2. Check the configurations and status of all domains to which the relevant system boards belong. Execute the showdcl(8) command to display domain information, and then check the configurations and operation status of all domains to which the relevant XSBs belong.
  • Page 100 5. Power off Domain #1 so the CMU can be replaced. Execute the poweroff(8) command so that the CMU being replaced will not be in use by domain #1. XSCF> poweroff -d 1 6. Check the status of all related system boards. Execute the showboards(8) command to display system board information, and then check the status of all related system boards.
  • Page 101 9. Check the status of all related domains. Execute the showdcl(8) command to display domain information, and then check the operation status of all related domains. Based on the operation status of the domain, determine whether to perform the DR operation or reboot the domains. XSCF>...
  • Page 102: Examples: Reserving Domain Configuration Changes

    XSCF> showboards -a DID(LSB) Assignment Conn Conf Test Fault ------------------------------------------------------------------ 00-0 00(00) Assigned Passed Normal 01-0 00(01) Assigned Passed Normal 01-1 00(02) Assigned Passed Normal 01-2 01(00) Assigned Passed Normal 01-3 01(01) Assigned Passed Normal Examples: Reserving Domain Configuration Changes This section provides examples of operations to reserve a change in domain configuration by DR.
  • Page 103 2. Check the status of the system board to be added. Execute the showboards(8) command to display system board information, and then check the status of the system board to be added and confirm its registration in the DCL. If you need to change the PSB configuration, use the setupfru(8) command. If the system board is not registered in the DCL, register the system board in the DCL for the target domain by using the setdcl(8) command.
  • Page 104: Example: Reserving A System Board Delete

    4.6.2 Example: Reserving a System Board Delete. [FIGURE 4-11: Example: Reserving a System Board Delete. Diagram: XSB#01-0 is deleted from Domain#0, leaving XSB#00-0.] 1. Log in to XSCF. 2. Check the status of the domain. Execute the showdcl(8) command to display domain information, and then check the operation status of the domain.
  • Page 105: Example: Reserving A System Board Move

    5. Check the reserved status of the system board. Execute the showboards(8) command with the -v option specified to display system board information, and then confirm that deletion of the system board has been reserved. XSCF> showboards -v 01-0 DID(LSB) Assignment Conn Conf...
  • Page 106 3. Check the status of the move-destination domain. Execute the showdcl(8) command to display domain information, and then check the operation status of the move-destination domain. Based on the operation status of the move-source and move-destination domains, determine whether to perform the DR operation or change the domain configuration.
  • Page 107 8. Check the status of the move-destination domain and moved system board. Execute the showdcl(8) command to check the operation status of the move-destination domain, and then execute the showboards(8) command to check the status of the system board and confirm that addition of the system board has been reserved in the move-destination domain.
  • Page 109: Message Meaning And Handling

    APPENDIX A: Message Meaning and Handling. This appendix explains the meaning and handling of DR-related messages. This appendix includes these sections: ■ Section A.1, “Oracle Solaris OS Messages” on page A-1 ■ Section A.2, “Command Messages” on page A-24 ...
  • Page 110 OS unconfigure dr@0:SBX::cpuY [Explanation] Unconfigure CPU Y on system board X. OS unconfigure dr@0:SBX::memory [Explanation] Unconfigure memory on system board X. OS unconfigure dr@0:SBX::pciY [Explanation] Unconfigure PCI Y on system board X. suspending @ (aka ) [Explanation] Suspending the device suspending @...
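The messages above identify components with strings of the form dr@0:SBX::cpuY, dr@0:SBX::memory, and dr@0:SBX::pciY. When scanning console logs for these messages, it can help to pull the board number and component out of each line. The following is a minimal Python sketch, assuming that X and Y appear as decimal numbers in real output; the `AttachmentPoint` type and the `parse_attachment_point` function are hypothetical names used for illustration and are not part of Oracle Solaris or XSCF.

```python
import re
from typing import NamedTuple, Optional


class AttachmentPoint(NamedTuple):
    board: int            # X in dr@0:SBX::...
    kind: str             # "cpu", "pci", or "memory"
    unit: Optional[int]   # Y for cpu/pci; None for memory


# Assumption: X and Y are decimal digits in real log lines.
_AP_RE = re.compile(r"dr@0:SB(\d+)::(cpu|pci|memory)(\d*)")


def parse_attachment_point(text: str) -> Optional[AttachmentPoint]:
    """Return the first dr@0:SBX::... identifier found in text, or None."""
    match = _AP_RE.search(text)
    if match is None:
        return None
    board, kind, unit = match.groups()
    return AttachmentPoint(int(board), kind, int(unit) if unit else None)


if __name__ == "__main__":
    print(parse_attachment_point("OS unconfigure dr@0:SB1::cpu3"))
    print(parse_attachment_point("OS unconfigure dr@0:SB1::memory"))
```

The function returns None rather than raising an error when no identifier is present, so it can be applied to every line of a log without pre-filtering.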
  • Page 111: Panic Messages

    [Explanation] User command requests DR operation without checking for unsafe conditions DR: suspending drivers [Explanation] Suspending device drivers DR: in-kernel unprobe board [Explanation] Unprobing the board. A.1.2 PANIC Messages URGENT_ERROR_TRAP is detected during FMA. [Explanation] A fatal HW error was encountered during copy-rename. [Remedy] Please contact customer service.
  • Page 112: Warning Messages

    [Explanation] Internal error during kernel migration [Remedy] Please contact customer service. scf_fmem_end() failed rv=0x [Explanation] Internal error during kernel migration [Remedy] Please contact customer service. CPU nn hang during Copy Rename [Explanation] A fatal HW error was encountered during copy-rename. [Remedy] Please contact customer service.
  • Page 113 dr_cancel_cpu: failed to disable interrupts on cpu X [Explanation] Failed to disable interrupts on CPU X. [Remedy] Disable interrupts on CPU X with psradm -i. If this command fails again, respond in the manner directed by the command message. dr_cancel_cpu: failed to online cpu X [Explanation] Failed to bring CPU X online.
  • Page 114 [Remedy] Please contact customer service. dr_copyout_iocmd: failed to copyout sbdcmd-struct [Explanation] There may be inconsistency in the system. [Remedy] Please contact customer service. dr_status: failed to copyout status for board # [Explanation] There may be inconsistency in the system. [Remedy] Please contact customer service. dr_status: unknown dev type (#) [Explanation] There may be inconsistency in the system.
  • Page 115 [Explanation] There may be inconsistency in the system. [Remedy] Please contact customer service. dr_pre_release_cpu: thread(s) bound to cpu X [Explanation] A thread in the process is bound to the CPU X being detached. [Remedy] Use the pbind(1M) command to check whether a process is bound to the CPU. If one is, unbind it from the CPU and repeat the action.
  • Page 116 dr_reserve_mem_spans memory reserve failed. Unexpected kphysm_del_span return value #; basepfn=# npages=# [Explanation] The selected target board can no longer fit all the kernel memory of the source board since it was last selected. [Remedy] Please repeat the action. If the problem remains, please contact customer service.
  • Page 117 dr_stop_user_threads: failed to stop thread: process=, pid=# [Explanation] Cannot stop the user thread. [Remedy] Please contact customer service. Cannot stop user thread: ... [Explanation] The DR driver cannot stop all the user processes in the list. [Remedy] Please contact customer service. [Output] Console and Standard Output Cannot setup memory node [Explanation] DR is unable to read the HW information for the memory device.
  • Page 118 I/O error: dr@0:SBX::memory [Explanation] There may be inconsistency in the system. [Remedy] Please contact customer service. [Output] Console and Standard Output Invalid argument [Explanation] Invalid argument is passed to the driver or there may be inconsistency in the system. [Remedy] Repeat the action. If this error message appears again, please contact customer service.
  • Page 119 [Remedy] Repeat the action. If this error message appears again, please contact customer service. [Output] Console and Standard Output Bad address: dr@0:SBX::memory [Explanation] There may be inconsistency in the system. [Remedy] Please contact customer service. [Output] Console and Standard Output Cannot read property value: device node XXXXXX property: name [Explanation] Failed to get the property from OBP.
  • Page 120 [Remedy] Repeat the action. If this error message appears again, please contact customer service. [Output] Console and Standard Output Error setting up FMEM buffer [Explanation] DR failed to allocate enough memory to perform copy-rename. [Remedy] Retry and if the problem persists, contact customer service. Failed to off-line: dr@0:SBX::cpuY [Explanation] Failed to off-line CPU Y on board X.
  • Page 121 [Remedy] Respond in the manner directed by the other message. [Output] Console and Standard Output Insufficient memory: dr@0:SBX::memory [Explanation] Detected lack of memory resource. [Remedy] Check the size of memory, detach the board and attach again. If the problem still exists, please contact customer service. [Output] Console and Standard Output Internal error: dr.c # [Explanation] There may be inconsistency in the system.
  • Page 122 Memory operation failed: dr@0:SBX::memory [Explanation] There may be inconsistency in the system. [Remedy] Please contact customer service. [Output] Console and Standard Output Memory operation refused: dr@0:SBX::memory [Explanation] The DR operation is refused. [Remedy] Respond in the manner directed by the other message. Memory operation cancelled: dr@0:SBX::memory [Explanation] The DR operation is canceled.
  • Page 123 [Output] Console and Standard Output Device busy: dr@0:SBX::cpuY [Explanation] CPU Y on system board X is busy during release operation. [Remedy] Repeat the action. If this error message appears again, please contact customer service. [Output] Console and Standard Output Insufficient memory: dr@0:SBX::cpuY [Explanation] Lack of memory resources detected.
  • Page 124 No such device: dr@0:SBX::cpuY [Explanation] There may be inconsistency in the system. [Remedy] Please contact customer service. [Output] Console and Standard Output Operation already in progress: dr@0:SBX::cpuY [Explanation] The operation on cpu Y on system board X is in progress. [Remedy] Repeat the action.
  • Page 125 [Output] Console and Standard Output Operation not supported: ERROR [Explanation] Invalid operation. [Remedy] Repeat the action. If this error message appears again, please contact customer service. [Output] Console and Standard Output Cannot setup resource map opl-fcodemem [Explanation] Resource memory mapping cannot be set up. [Remedy] Please contact customer service.
  • Page 126 opl_claim_memory - unable to allocate contiguous memory of size zero [Explanation] A claim request with size zero was issued by the fcode interpreter. [Remedy] If DR failed after this message, please contact customer service. opl_claim_memory - vhint is not zero vhint=0x - Ignoring Argument [Explanation] A claim request with a nonzero hint came from the fcode interpreter.
  • Page 127 IKP: destroy pci (--) failed [Explanation] The node was not destroyed. [Remedy] Please contact customer service. IKP: destroy pseudo-mc () failed [Explanation] The node was not destroyed. [Remedy] Please contact customer service. IKP: destroy chip (-) failed [Explanation] The node was not destroyed. [Remedy] Please contact customer service.
  • Page 128 [Explanation] IKP initialization failed [Remedy] Please contact customer service. I/O callback failed in pre-release [Explanation] I/O callback failed in pre-release [Remedy] Please contact customer service. I/O callback failed in post-attach [Explanation] I/O callback failed in post-attach [Remedy] Please contact customer service. Kernel Migration fails.
  • Page 129 [Explanation] Internal error during DR operation. [Remedy] Please contact customer service. drmach_node_ddi_get_parent: NULL parent dip [Explanation] Internal error during DR operation. [Remedy] Please contact customer service. Failed to remove CMP xx on board n [Explanation] Internal error during DR operation. [Remedy] Please contact customer service.
  • Page 130 opl_fc_ops_free_handle: DMA seen! [Explanation] A DMA resource was found in the resource list that is being freed while the board is unprobed. [Remedy] Please contact customer service. opl_fc_ops_free: unknown resource type [Explanation] An unknown resource type was found in the resource list that is being freed while the board is unprobed.
  • Page 131 [Remedy] Please contact customer service. Memory copy error [Explanation] Memory copy error happened during kernel migration. [Remedy] Retry and if the problem persists, contact customer service.
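Each entry in this appendix follows the same layout: the message text, followed by bracketed [Explanation], [Remedy], and (for some entries) [Output] fields. A minimal Python sketch of splitting one such entry into a record is shown below. The function name `parse_entry` and the dictionary keys are illustrative choices, not anything defined by the manual, and the sketch assumes the caller has already isolated a single entry, because the boundary between consecutive entries cannot be recovered from the tags alone.

```python
import re

# The three field tags used by the message entries in this appendix.
_TAG_RE = re.compile(r"\[(Explanation|Remedy|Output)\]")


def parse_entry(entry: str) -> dict:
    """Split one message entry into its message text and tagged fields.

    re.split with a capturing group yields
    [message, tag1, body1, tag2, body2, ...].
    """
    parts = _TAG_RE.split(entry)
    record = {"message": parts[0].strip()}
    for tag, body in zip(parts[1::2], parts[2::2]):
        record[tag.lower()] = body.strip()
    return record


if __name__ == "__main__":
    sample = (
        "Memory copy error "
        "[Explanation] Memory copy error happened during kernel migration. "
        "[Remedy] Retry and if the problem persists, contact customer service."
    )
    for key, value in parse_entry(sample).items():
        print(f"{key}: {value}")
```

Fields that are absent from an entry, such as [Output] in the sample, are simply missing from the returned dictionary.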
  • Page 132: Command Messages

    SCF error [Explanation] Internal error happened during kernel migration. [Remedy] Please contact customer service. Cannot add SPARC64-VI to domain booted with all SPARC64-VII CPUs [Explanation] System board with SPARC64-VI cannot be added into a domain booted with all SPARC64-VII CPUs when the domain's CPU mode is set as 'auto' via XSCF.
  • Page 133 XSB#XX-X is already assigned to another domain. [Explanation] The specified system board (XSB#XX-X) has already been assigned to another domain. [Remedy] The XSB has already been assigned to another domain. Confirm the XSB with showboards(8). XSB#XX-X is not installed. [Explanation] System board (XSB#XX-X) is not installed. [Remedy] The wrong XSB was specified.
  • Page 134 IP address of DSCP path is not specified. [Explanation] DR cannot communicate with the domain because the DSCP IP Address is not set up or registered. [Remedy] Register the DSCP IP Address. An internal error has occurred. This may have been caused by a DR library error.
  • Page 135: Deleteboard

    [Remedy] Confirm the current hardware configuration and support status. A hardware error occurred. Please check the error log for details. [Explanation] A hardware error occurred. Check the monitoring message and the error log. [Remedy] Find the cause of the DR failure by referring to the monitoring message and the error log.
  • Page 136 DR operation canceled by operator. [Explanation] The DR operation was canceled by the operator. XSB#XX-X is not installed. [Explanation] System board (XSB#XX-X) is not installed. [Remedy] The wrong XSB was specified. Confirm the XSB with showboards(8). XSB#XX-X is currently unavailable for DR. Try again later. [Explanation] Another operation is already being executed on the specified system board (XSB#XX-X).
  • Page 137: Moveboard

    [Remedy] Confirm whether the domain is powered off, and check the DSCP setting and any DSCP error by using the monitoring message and the error log. XSB#XX-X could not be unconfigured from DomainID X due to operating system error. [Explanation] An error occurred in the DR library of the domain OS during DR processing. The error occurred in the configuration management of the domain OS.
  • Page 138 [Explanation] Confirming whether DR operation is going to be executed or not. Input "y" to execute it and "n" to stop it. XSB#XX-X will be assigned to DomainID X immediately. Continue? [y|n]: [Explanation] Confirming whether DR operation is going to be executed or not. Input "y"...
  • Page 139 Another DR operation is in progress. Try again later. [Explanation] Another session is already executing an operation on the specified system board (XSB#XX-X). [Remedy] A DR operation is in progress in another session. Wait for a while, confirm the XSB status, and then try again. XSB#XX-X is the last LSB for DomainID X, and this domain is still running.
  • Page 140 [Remedy] Confirm whether the domain is powered off, and check the DSCP setting and any DSCP error by using the monitoring message and the error log. XSB#03-0 could not be unconfigured from DomainID 1 due to operating system error, or XSB#03-0 could not be configured into DomainID 0 due to operating system error. [Explanation] An error occurred in the DR library of the domain OS during DR processing.
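Command messages such as the one above embed the affected board as XSB#XX-X and the affected domain as DomainID X. A short Python sketch for extracting both from a message follows, which can be handy when collecting failures from several command runs. The function name `extract_ids` is hypothetical, and the patterns assume the two-digit PSB number and single-digit XSB number shown in this section's examples (for instance XSB#03-0).

```python
import re

# Assumption: XSB identifiers look like XSB#03-0 (two-digit PSB number,
# single-digit XSB number), as in the examples in this section.
_XSB_RE = re.compile(r"XSB#(\d{2})-(\d)")
_DOMAIN_RE = re.compile(r"DomainID (\d+)")


def extract_ids(message: str):
    """Return ([(psb, xsb), ...], [domain_id, ...]) in order of appearance."""
    xsbs = [(int(psb), int(xsb)) for psb, xsb in _XSB_RE.findall(message)]
    domains = [int(d) for d in _DOMAIN_RE.findall(message)]
    return xsbs, domains


if __name__ == "__main__":
    msg = (
        "XSB#03-0 could not be unconfigured from DomainID 1 "
        "due to operating system error"
    )
    print(extract_ids(msg))
```

Placeholder forms from the manual, such as XSB#XX-X or DomainID X, intentionally do not match, so only concrete identifiers from real output are returned.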
  • Page 141: Setdcl

    Timeout detected during self-test of XSB#XX-X. [Explanation] Because the hardware diagnosis in DR did not complete, a timeout occurred. There is a possibility that a hardware error occurred. [Remedy] Find out the cause of the DR failure referring to the monitoring message and error log.
  • Page 142: Setupfru

    DomainID X does not exist. [Explanation] No LSB was set up on the domain when the configuration-policy of the DCL was changed. [Remedy] Set up the domain and LSB. Try again. Invalid parameter. [Explanation] There is an error in the specified argument or operand. [Remedy] Confirm the specified argument or operand and execute the command once again.
  • Page 143: Showdevices

    [Explanation] Although the configuration of the PSB was changed, a configuration error occurred on the resulting system board. Confirm the CPU module and DIMM slot on the specified PSB and the status of Memory Mirror Mode. [Remedy] Confirm the CPU module and DIMM slot on the PSB and the status of Memory Mirror Mode.
  • Page 144 [Explanation] The system was not able to get some parameter for the XSB. [Remedy] Confirm the information for the XSB via the showboards command. cannot get device information from DomainID. [Explanation] The system was unable to collect the requested information from the domain.
  • Page 145: Example: Confirm Swap Space Size

    Example: Confirm Swap Space Size This example shows one way to analyze the physical memory on a system board in a SPARC Enterprise M4000/M5000/M8000/M9000 server from Oracle and Fujitsu to determine whether the system has enough swap space to support deletion of a board.
  • Page 146 XSCF> showdevices 00-0 CPU: ---- state speed ecache 00-0 on-line 2048 00-0 on-line 2048 00-0 on-line 2048 00-0 on-line 2048 Memory: ------- board perm base domain target deleted remaining mem MB mem MB address mem MB mem MB mem MB 00-0 2048 0x0000000000000000...
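The comparison this example builds toward can be sketched in a few lines of Python. Everything below is an illustration under stated assumptions rather than the manual's procedure: it assumes the available swap figure is taken from the Oracle Solaris `swap -s` summary line (which reports kilobytes as "NNNk available"), that the board's memory size is the "board mem MB" value reported by showdevices, and that deletion is considered supportable when available swap is at least as large as the board's memory. The function names are hypothetical.

```python
import re

# Assumption: the swap figure comes from the summary line printed by
# "swap -s", which ends with "<number>k available".
_AVAILABLE_RE = re.compile(r"(\d+)k available")


def available_swap_mb(swap_s_output: str) -> float:
    """Extract the available swap space, in MB, from `swap -s` output."""
    match = _AVAILABLE_RE.search(swap_s_output)
    if match is None:
        raise ValueError("no 'available' figure found in swap -s output")
    return int(match.group(1)) / 1024


def swap_covers_board(board_mem_mb: float, swap_mb: float) -> bool:
    """Assumed rule: available swap must be at least the board's memory."""
    return swap_mb >= board_mem_mb


if __name__ == "__main__":
    # Illustrative values only; 2048 MB matches the board size shown above.
    line = ("total: 31760k bytes allocated + 1952k reserved = 33712k used, "
            "4194304k available")
    swap_mb = available_swap_mb(line)
    print(swap_mb, swap_covers_board(2048, swap_mb))
```

Consult the swap-area requirements in the main body of the guide for the authoritative sizing rule before relying on a check like this.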
  • Page 147: Index

    Index domain component list, 1-3 domain status, 2-18, 3-2, 3-5 Add, 1-3 DR functions, 1-1, 1-5 addboard, 3-2, 3-15, 3-22 addfru, 3-27 addition, 1-6, 2-21, 2-27, 3-15, 4-3, 4-7 eXtended System Board, 1-4 Assign, 1-3 eXtended System Control Facility (XSCF), 1-7 Basic DR Terms, 1-3 Floating Boards, 2-6, 2-14 Capacity on Demand, 2-29...
  • Page 148 memory mirror mode, 2-28 showdomainstatus, 3-2, 3-5 memory mirroring mode, 3-13 showdscp, 3-27 Move, 1-3 showfru, 3-2, 3-13 move, 1-6, 2-23, 3-19, 4-5, 4-11 SPARC64 VI Compatible Mode, 2-31 moveboard, 3-2, 3-19 SPARC64 VII Enhanced Mode, 2-31 swap area, 2-12, 2-27 system board, 1-5 omit-I/O, 2-16 system board pool, 2-10...
