10.2 Powering On and Off Oracle Big Data Appliance

10.2.1 Nonemergency Power Procedures

This section contains the procedures for powering on and off the components of Oracle Big Data Appliance in an orderly fashion.

10.2.1.1 Powering On Oracle Big Data Appliance

Oracle Big Data Appliance is powered on either by pressing the power button on the front of the servers or by logging in to the Oracle ILOM interface and applying power to the system.

To power on Oracle Big Data Appliance:

Turn on all 12 breakers on both PDUs.

Allow 4 to 5 minutes for Oracle ILOM to start.

Power up the servers.

10.2.1.2 Powering On Servers Remotely Using Oracle ILOM

You can power on the servers remotely using the Oracle ILOM interface. You can access Oracle ILOM using the web console, the command-line interface (CLI), the intelligent platform management interface (IPMI), or the simple network management protocol interface (SNMP). For example, to apply power to server bda1node01 using IPMI, run the following command as root from a server that has ipmitool installed:

ipmitool -H bda1node01-c -U root chassis power on

In this example, bda1node01-c is the host name of Oracle ILOM for the server to be powered on. You are prompted for the password.
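As a sketch, the IPMI power-on command can be scripted across several nodes. The loop below only echoes the ipmitool invocations rather than executing them, and the host names bda1node02-c and bda1node03-c are hypothetical extensions of the naming pattern shown above:

```shell
# Echo (rather than execute) the IPMI power-on command for several ILOMs.
# The bda1node0N-c names follow the naming pattern described in the text.
for n in 01 02 03; do
  host="bda1node${n}-c"
  echo ipmitool -H "$host" -U root chassis power on
done
```

In practice you would drop the echo and supply the root password at each prompt, or use the -P option if storing the password is acceptable at your site.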

10.2.1.3 Powering Off Oracle Big Data Appliance

10.2.1.3.1 Powering Off the Servers

Use the Linux shutdown command to power off or restart the servers. Enter this command as root to shut down a server immediately:

# shutdown -hP now

The following command restarts a server immediately:

# shutdown -r now

See Also:

Linux SHUTDOWN manual page for details

10.2.1.3.2 Powering Off Multiple Servers at the Same Time

Use the dcli utility to run the shutdown command on multiple servers at the same time. Do not run the dcli utility from a server that will be shut down. Set up passwordless SSH for root, as described in "Setting Up Passwordless SSH".

The following shows the syntax of the command:

# dcli -l root -g group_name shutdown -hP now

In this command, group_name is a file that contains a list of servers.

The following example shuts down all Oracle Big Data Appliance servers listed in the server_group file:

# dcli -l root -g server_group shutdown -hP now
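As a sketch, the group file is simply a list of host names, one per line. The example below builds a hypothetical server_group file in /tmp and echoes the dcli invocation rather than executing it; the node names are illustrative:

```shell
# Build a hypothetical group file (one server name per line) and show the
# dcli command that would shut those servers down. Names are illustrative.
cat > /tmp/server_group <<'EOF'
bda1node02
bda1node03
bda1node04
EOF
echo dcli -l root -g /tmp/server_group shutdown -hP now
```

Remember to exclude from the group file the server on which you run dcli, as noted above.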

10.2.1.4 Powering On and Off Network Switches

The network switches do not have power switches. They power off when power is removed by turning off a PDU or a breaker in the data center.

10.2.2 Emergency Power-Off Considerations

In an emergency, halt power to Oracle Big Data Appliance immediately. The following emergencies may require powering off Oracle Big Data Appliance:

Natural disasters such as earthquake, flood, hurricane, tornado, or cyclone

Abnormal noise, smell, or smoke coming from the system

Threat to human safety

10.2.2.1 Emergency Power-Off Procedure

To perform an emergency power-off procedure for Oracle Big Data Appliance, turn off power at the circuit breaker or pull the emergency power-off switch in the computer room. After the emergency, contact Oracle Support Services to restore power to the system.

10.2.2.2 Emergency Power-Off Switch

Emergency power-off (EPO) switches are required when computer equipment contains batteries capable of supplying more than 750 volt-amperes for more than 5 minutes. Systems that have these batteries include internal EPO hardware for connection to a site EPO switch or relay. Use of the EPO switch removes power from Oracle Big Data Appliance.

10.2.3 Cautions and Warnings

The following cautions and warnings apply to Oracle Big Data Appliance:

WARNING:

Do not touch the parts of this product that use high-voltage power. Touching them might result in serious injury.

10.3.1 Adding Memory to Sun Server X3-2L Servers

You can mix DIMM sizes, but they must be installed in order from largest to smallest. You achieve the best performance by preserving symmetry. For example, add four DIMMs of the same size, one for each memory channel, to each processor, and ensure that both processors have the same size DIMMs installed in the same order.

To add memory to a Sun Server X3-2L server:

If you are mixing DIMM sizes, then review the DIMM population rules in the Sun Server X3-2L Service Manual at

Install the new DIMMs. If you are installing 16 or 32 GB DIMMs, then replace the existing 8 GB DIMMs first, and then replace the plastic fillers. You must install the largest DIMMs first, then the next largest, and so forth. You can reinstall the original 8 GB DIMMs last.

10.3.2 Adding Memory to Sun Fire X4270 M2 Servers

Oracle Big Data Appliance ships from the factory with 48 GB of memory in each server. Six of the 18 DIMM slots are populated with 8 GB DIMMs. You can populate the empty slots with 8 GB DIMMs to bring the total memory to either 96 GB (12 x 8 GB) or 144 GB (18 x 8 GB). An upgrade to 144 GB may slightly reduce performance because of lower memory bandwidth; memory frequency drops from 1333 MHz to 800 MHz.

10.4.1 Verifying the Server Configuration

The 12 disk drives in each Oracle Big Data Appliance server are controlled by an LSI MegaRAID SAS 9261-8i disk controller. Oracle recommends verifying the status of the RAID devices to avoid possible performance degradation or an outage. The effect on the server of validating the RAID devices is minimal. The corrective actions may affect operation of the server and can range from simple reconfiguration to an outage, depending on the specific issue uncovered.

10.4.1.1 Verifying Disk Controller Configuration

Enter this command to verify the disk controller configuration:

# MegaCli64 -AdpAllInfo -a0 | grep "Device Present" -A 8

The following is an example of the output from the command. There should be 12 virtual drives, no degraded or offline drives, and 14 physical devices. The 14 devices are the controllers and the 12 disk drives.
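To compare the reported counts against the expected values, the key lines can be pulled out with awk. The sample text below is an assumed illustration of the "Device Present" block, not output captured from a live system:

```shell
# Assumed sample of the "Device Present" block from MegaCli64 -AdpAllInfo.
cat > /tmp/adpinfo.txt <<'EOF'
                Device Present
                ================
Virtual Drives    : 12
  Degraded        : 0
  Offline         : 0
Physical Devices  : 14
  Disks           : 12
  Critical Disks  : 0
  Failed Disks    : 0
EOF
# Extract the counts to check against the expected 12 and 14.
awk -F': *' '/Virtual Drives/   {print "virtual="  $2}
             /Physical Devices/ {print "physical=" $2}' /tmp/adpinfo.txt
```

For the sample above, this prints virtual=12 and physical=14; any degraded or offline count other than 0 warrants investigation.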

10.5.2 About Disk Drive Identifiers

The Oracle Big Data Appliance servers contain a disk enclosure cage that is controlled by the host bus adapter (HBA). The enclosure holds 12 disk drives that are identified by slot numbers 0 to 11. The drives can be dedicated to specific functions, as shown in Table 10-1.

Oracle Big Data Appliance uses symbolic links, which are defined in /dev/disk/by-hba-slot, to identify the slot number of a disk. The links have the form snpm, where n is the slot number and m is the partition number. For example, /dev/disk/by-hba-slot/s0p1 initially corresponds to /dev/sda1.
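The naming scheme can be simulated with an ordinary symbolic link; the /tmp path below is purely illustrative and stands in for /dev/disk/by-hba-slot:

```shell
# Simulate the by-hba-slot scheme: s0p1 (slot 0, partition 1) -> /dev/sda1.
mkdir -p /tmp/by-hba-slot
ln -sf /dev/sda1 /tmp/by-hba-slot/s0p1
readlink /tmp/by-hba-slot/s0p1   # prints /dev/sda1
```

On a live server, readlink against the real /dev/disk/by-hba-slot entries shows the current kernel device name behind each slot.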

When a disk is hot swapped, the operating system cannot reuse the kernel device name. Instead, it allocates a new device name. For example, if you hot swap /dev/sda, then the disk corresponding to /dev/disk/by-hba-slot/s0 might link to /dev/sdn instead of /dev/sda. Therefore, the links in /dev/disk/by-hba-slot/ are automatically updated when devices are added or removed.

The command output lists device names as kernel device names instead of symbolic link names. Thus, /dev/disk/by-hba-slot/s0 might be identified as /dev/sda in the output of a command.

10.5.2.1 Standard Disk Drive Mappings

Table 10-1 shows the mappings between the RAID logical drives and the operating system identifiers, and the dedicated function of each drive in an Oracle Big Data Appliance server. Nonetheless, you must confirm that these mappings are correct on your system.

Table 10-1 Disk Drive Identifiers

Symbolic Link to Physical Slot    Initial Operating System Location    Dedicated Function
/dev/disk/by-hba-slot/s0          /dev/sda                             Operating system
/dev/disk/by-hba-slot/s1          /dev/sdb                             Operating system
/dev/disk/by-hba-slot/s2          /dev/sdc                             HDFS
/dev/disk/by-hba-slot/s3          /dev/sdd                             HDFS
/dev/disk/by-hba-slot/s4          /dev/sde                             HDFS
/dev/disk/by-hba-slot/s5          /dev/sdf                             HDFS
/dev/disk/by-hba-slot/s6          /dev/sdg                             HDFS
/dev/disk/by-hba-slot/s7          /dev/sdh                             HDFS
/dev/disk/by-hba-slot/s8          /dev/sdi                             HDFS
/dev/disk/by-hba-slot/s9          /dev/sdj                             HDFS
/dev/disk/by-hba-slot/s10         /dev/sdk                             HDFS or Oracle NoSQL Database
/dev/disk/by-hba-slot/s11         /dev/sdl                             HDFS or Oracle NoSQL Database

10.5.2.2 Standard Mount Points

Table 10-2 shows the mappings between HDFS partitions and mount points.

Table 10-2 Mount Points

Symbolic Link to Physical Slot and Partition    HDFS Partition    Mount Point
/dev/disk/by-hba-slot/s0p4                      /dev/sda4         /u01
/dev/disk/by-hba-slot/s1p4                      /dev/sdb4         /u02
/dev/disk/by-hba-slot/s2p1                      /dev/sdc1         /u03
/dev/disk/by-hba-slot/s3p1                      /dev/sdd1         /u04
/dev/disk/by-hba-slot/s4p1                      /dev/sde1         /u05
/dev/disk/by-hba-slot/s5p1                      /dev/sdf1         /u06
/dev/disk/by-hba-slot/s6p1                      /dev/sdg1         /u07
/dev/disk/by-hba-slot/s7p1                      /dev/sdh1         /u08
/dev/disk/by-hba-slot/s8p1                      /dev/sdi1         /u09
/dev/disk/by-hba-slot/s9p1                      /dev/sdj1         /u10
/dev/disk/by-hba-slot/s10p1                     /dev/sdk1         /u11
/dev/disk/by-hba-slot/s11p1                     /dev/sdl1         /u12

10.5.2.3 Obtaining the Physical Slot Number of a Disk Drive

Use the following MegaCli64 command to verify the mapping of virtual drive numbers to physical slot numbers. See "Replacing a Disk Drive."

# MegaCli64 LdPdInfo a0 | more

10.5.3 Prerequisites for Replacing a Working Disk

If you plan to replace an HDFS disk or an operating system disk before it fails, then you should first dismount the HDFS partitions. You must also turn off swapping before replacing an operating system disk.

Note:

Only dismount HDFS partitions. For an operating system disk, ensure that you do not dismount operating system partitions. Only partition 4 (sda4 or sdb4) of an operating system disk is used for HDFS.

To dismount HDFS partitions:

Log in to the server with the failing drive.

If the failing drive supported the operating system, then turn off swapping:

10.5.4 What If a Server Fails to Restart?

The server may restart during the disk replacement procedures, either because you issued a reboot command or made an error in a MegaCli64 command. In most cases, the server restarts successfully, and you can continue working. However, in other cases, an error occurs so that you cannot reconnect using ssh. In this case, you must complete the reboot using Oracle ILOM.

To restart a server using Oracle ILOM:

Use your browser to open a connection to the server using Oracle ILOM. For example:

http://bda1node12-c.example.com

Note:

Your browser must have a JDK plug-in installed. If you do not see the Java coffee cup on the log-in page, then you must install the plug-in before continuing.

Log in using your Oracle ILOM credentials.

Select the Remote Control tab.

Click the Launch Remote Console button.

Enter Ctrl+d to continue rebooting.

If the reboot fails, then enter the server root password at the prompt and attempt to fix the problem.

After the server restarts successfully, open the Redirection menu and choose Quit to close the console window.

For example, [20:5] repairs the disk identified by enclosure 20 in slot 5.

Verify the disk is recognized by the operating system.

# lsscsi

The disk may appear with its original device name (such as /dev/sdc) or under a new device name (such as /dev/sdn). If the operating system does not recognize the disk, then the disk is missing from the list generated by the lsscsi command.

This example output shows two disks with new device names: /dev/sdn in slot 5, and /dev/sdo in slot 10.
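For illustration, the kernel device name is the last field of each lsscsi line, so it can be extracted with awk. The two sample lines below are assumed for the sketch, not real output:

```shell
# Assumed sample lsscsi lines for the two hot-swapped disks.
cat > /tmp/lsscsi.txt <<'EOF'
[0:0:20:0]   disk    HITACHI  H7220AA30SUN2.0T  JKAO  /dev/sdn
[0:0:21:0]   disk    HITACHI  H7220AA30SUN2.0T  JKAO  /dev/sdo
EOF
# The kernel device name is the last whitespace-separated field.
awk '{print $NF}' /tmp/lsscsi.txt
```

On a live server you would pipe the real command instead: lsscsi | awk '{print $NF}'.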

10.5.6 Identifying the Function of a Disk Drive

Most disks are used for HDFS, as shown in Table 10-1. Nonetheless, you should verify that the failed disk was not used for either the operating system or Oracle NoSQL Database before configuring it for a particular function.

10.5.6.1 Checking for Use by the Operating System

Oracle Big Data Appliance is configured with the operating system on the first two disks.

To confirm that a failed disk supported the operating system:

Check whether the replacement disk corresponds to /dev/sda or /dev/sdb, which are the operating system disks.

10.5.6.2 Checking for Use by Oracle NoSQL Database

Oracle Big Data Appliance can be configured to allocate the last 0, 1, or 2 disks for the exclusive use of Oracle NoSQL Database. HDFS data does not reside on the same disks.

To discover whether a failed disk supported Oracle NoSQL Database:

Open an SSH connection to the first server in the rack and log in as the root user.

Obtain the value of NOSQLDB_DISKS from the mammoth-rack_name.params configuration file:

# cat /opt/oracle/BDAMammoth/mammoth-rack_name.params | grep NOSQL

Use the value of NOSQLDB_DISKS to determine whether the replacement disk is allocated to Oracle NoSQL Database:

0: No disks are allocated to Oracle NoSQL Database.

1: The /dev/sdl disk is allocated to Oracle NoSQL Database.

2: The /dev/sdk and /dev/sdl disks are allocated to Oracle NoSQL Database.

To verify that the disks are part of a logical volume, you can run either pvscan or pvdisplay. All disks allocated for use by Oracle NoSQL Database are presented to it as a single logical volume named lvg1.
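The mapping of NOSQLDB_DISKS values to reserved devices can be sketched as a small shell helper; the function name is hypothetical and the device names follow the list above:

```shell
# Hypothetical helper: map a NOSQLDB_DISKS value to the devices reserved
# for Oracle NoSQL Database, per the list in the text.
nosqldb_devices() {
  case "$1" in
    0) echo "none" ;;
    1) echo "/dev/sdl" ;;
    2) echo "/dev/sdk /dev/sdl" ;;
    *) echo "unexpected value: $1" >&2; return 1 ;;
  esac
}
nosqldb_devices 2   # prints /dev/sdk /dev/sdl
```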

10.5.7 Configuring an Operating System Disk

The first two disks support the Linux operating system. These disks store a copy of the mirrored operating system, a swap partition, a mirrored boot partition, and an HDFS data partition.

To configure an operating system disk, you must copy the partition table from the surviving disk, create an HDFS partition (ext4 file system), and add the software RAID partitions and boot partitions for the operating system.

Complete these procedures after replacing the disk in either slot 0 or slot 1.

You can use this command to restart an operating system disk configuration, if you make a mistake.

Create the partition table:

# parted /dev/disk/by-hba-slot/sn -s mklabel gpt print

List the Cylinder, Head, Sector (CHS) partition information of the surviving disk. Thus, if you are partitioning /dev/disk/by-hba-slot/s0, then enter /dev/disk/by-hba-slot/s1 for /dev/disk/by-hba-slot/sm in the following command:
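As an illustration only, the two parted invocations for a replacement disk in slot 0 with the surviving disk in slot 1 might look as follows. They are echoed rather than executed here to avoid touching real disks, and the "unit chs print" form of the second command is an assumption for the sketch:

```shell
# Echoed illustration (not executed): label the replacement disk in slot 0,
# then print the surviving disk's CHS layout. Slot assignments are assumed.
REPLACEMENT=/dev/disk/by-hba-slot/s0
SURVIVOR=/dev/disk/by-hba-slot/s1
echo parted "$REPLACEMENT" -s mklabel gpt print
echo parted "$SURVIVOR" -s unit chs print
```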

10.5.7.3 Formatting the HDFS Partition of an Operating System Disk

Partition 4 (sda4) on an operating system disk is used for HDFS. After you format the partition and set the correct label, HDFS rebalances the job load to use the partition if the disk space is needed.

Repeat these steps until restarting the server results in a BDA_REBOOT_SUCCEEDED file.

10.6 Changing InfiniBand IP Addresses

You may need to change the InfiniBand network information on an existing Oracle Big Data Appliance. The change may be needed to support a media server with multiple InfiniBand cards, or to keep InfiniBand traffic on a distinct InfiniBand network, such as when production, test, and quality assurance (QA) environments share the same rack.

All InfiniBand addresses must be in the same subnet, with a minimum subnet mask of 255.255.240.0 (or /20). Choose a subnet mask wide enough to accommodate possible future expansion of the Oracle Big Data Appliance and InfiniBand network.
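A /20 mask leaves 12 host bits, so the subnet spans 2^12 addresses; the quick arithmetic check below confirms this:

```shell
# Number of addresses in a /20 subnet: 2^(32-20) = 4096.
prefix=20
echo $(( 1 << (32 - prefix) ))   # prints 4096
```

Widening the prefix (for example, /19) doubles the address space for each bit removed, which is the sense in which a wider mask accommodates future expansion.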

You cannot change the host names after running the Mammoth Utility.

To change the InfiniBand IP addresses:

Log in to an Oracle Big Data Appliance server as the root user.

Change to the /etc/sysconfig/network-scripts directory.

Copy the ifcfg-bondib0 file, using a name that does not start with ifcfg:

10.7.1 Backing Up and Restoring Oracle ILOM Settings

Oracle ILOM supports remote administration of the Oracle Big Data Appliance servers. This section explains how to back up and restore the Oracle ILOM configuration settings, which are set by the Mammoth Utility.

Connect to the switch as ilom_admin and open the Fabric Management shell:

-> show /SYS/Fabric_Mgmt

The prompt changes from -> to FabMan@hostname->.

Disable the Subnet Manager:

FabMan@bda1sw-02-> disablesm

Connect the cables to the new switch, being careful to connect each cable to the correct port.

Verify that there are no errors on any links in the fabric:

FabMan@bda1sw-02-> ibdiagnet -c 1000 -r

Enable the Subnet Manager:

FabMan@bda1sw-02-> enablesm

Note:

If the replaced switch was the Sun Datacenter InfiniBand Switch 36 spine switch, then manually fail the master Subnet Manager back to the switch by disabling the Subnet Managers on the other switches until the spine switch becomes the master. Then reenable the Subnet Manager on all the other switches.

10.7.3 Verifying InfiniBand Network Operation

Verify that the InfiniBand network is operating properly whenever a component of the network has required maintenance, such as replacement of an InfiniBand Host Channel Adapter (HCA) on a server, an InfiniBand switch, or an InfiniBand cable, or whenever operation of the network is suspected to be substandard. The following procedure describes how to verify network operation:

Note:

Use this procedure any time the InfiniBand network is performing below expectations.

To verify InfiniBand network operation:

Enter the ibdiagnet command to verify InfiniBand network quality:

# ibdiagnet -c 1000

Investigate all errors reported by this command. It generates a small amount of network traffic and can run during a normal workload.

10.7.4 Understanding the Network Subnet Manager Master

The Subnet Manager manages all operational characteristics of the InfiniBand network, such as the ability to:

Discover the network topology

Assign a local identifier to all ports connected to the network

Calculate and program switch forwarding tables

Monitor changes in the fabric

The InfiniBand network can have multiple Subnet Managers, but only one Subnet Manager is active at a time. The active Subnet Manager is the Master Subnet Manager. The other Subnet Managers are the Standby Subnet Managers. If a Master Subnet Manager is shut down or fails, then a Standby Subnet Manager automatically becomes the Master Subnet Manager.

Each Subnet Manager has a configurable priority. When multiple Subnet Managers are on the InfiniBand network, the Subnet Manager with the highest priority becomes the master Subnet Manager. On Oracle Big Data Appliance, the Subnet Managers on the leaf switches are configured as priority 5, and the Subnet Managers on the spine switches are configured as priority 8.

The following guidelines determine where the Subnet Managers run on Oracle Big Data Appliance:

Run the Subnet Managers only on the switches in Oracle Big Data Appliance. Running a Subnet Manager on any other device is not supported.

When the InfiniBand network consists of one, two, or three racks cabled together, all switches must run a Subnet Manager. The master Subnet Manager runs on a spine switch.

When the InfiniBand network consists of four or more racks cabled together, then only the spine switches run a Subnet Manager. The leaf switches must disable the Subnet Manager.
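The placement rule above can be sketched as a small function; the function is hypothetical and is shown only to make the rack-count threshold explicit:

```shell
# Hypothetical sketch of the Subnet Manager placement rule by rack count.
sm_placement() {
  if [ "$1" -le 3 ]; then
    echo "all switches run a Subnet Manager; master on a spine switch"
  else
    echo "spine switches only; leaf switches disable the Subnet Manager"
  fi
}
sm_placement 4   # prints the four-or-more-racks rule
```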

10.9 Changing the NTP Servers

The configuration information for Network Time Protocol (NTP) servers can be changed after the initial setup. The following procedure describes how to change the NTP configuration information for InfiniBand switches, Cisco switches, and Sun servers. Oracle recommends changing each server individually.

To update the Oracle Big Data Appliance servers:

Stop NTP services on the server.

Update the /etc/ntp.conf file with the IP address of the new NTP server.
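As a sketch, the server line in /etc/ntp.conf can be updated with sed. The example works on a throwaway copy in /tmp, and the 192.0.2.x addresses are reserved documentation addresses used here as stand-ins:

```shell
# Work on a throwaway copy; 192.0.2.x are reserved documentation addresses.
cat > /tmp/ntp.conf <<'EOF'
server 192.0.2.10
EOF
# Point the copy at the new NTP server.
sed -i 's/^server .*/server 192.0.2.20/' /tmp/ntp.conf
cat /tmp/ntp.conf   # prints: server 192.0.2.20
```

On a real server you would edit /etc/ntp.conf itself, then restart the NTP service so the change takes effect.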