Inside the Oracle Database Appliance – Part 1

We’ve had a few weeks to play around with the ODA in our office, and I’ve been able to crack it open and get into the software and hardware that powers it.

For starters, the system runs a new model of Sun Fire – the X4370 M2. The 4U chassis is basically 2 separate 2U blades (Oracle is calling them system controllers – SCs) that have direct attached storage on the front. Here’s a listing of the hardware in each SC:

Sun X4370M2 System Controller Components (2 SCs per X4370M2)

CPU: 2x 6-core Intel Xeon X5675 3.06GHz
Memory: 96GB 1333MHz DDR3
Network: 2x 10GbE (SFP+) PCIe card; 4x 1GbE PCIe card; 2x 1GbE onboard
Internal Storage: 2x 500GB SATA for operating system; 1x 4GB USB internal
RAID Controller: 2x SAS-2 LSI HBA
Shared Storage: 20x 600GB 3.5″ SAS 15,000 RPM hard drives; 4x 73GB 3.5″ SSDs
External Storage: 2x external MiniSAS ports
Operating System: Oracle Enterprise Linux 5.5 x86-64

Pictures of a real live ODA after the break.

If you’re anything like me, the first thing you wanted to know about the ODA is what’s inside? Follow along as we walk through the hardware involved.

What’s an SC?

The SC is essentially Oracle’s term for the blades that sit inside the Sun Fire X4370 M2.

From this view, the back of the SC is at the bottom. At the top (front) are 2 connections that plug into the chassis of the X4370 M2. The SCs slide out from the back of the chassis. Looking closer at the back of the SC, we have the external connections.

From the back, you can see the PCI cards on the left and the onboard ports on the right. In the middle are the fan modules. On the left, we have 4 gigabit ethernet ports, 2 10GbE (SFP+) ports, and the 2 external SAS connections. On the right are the dual gigabit ethernet (onboard) ports, the serial and network ports for the ILOM, and your standard VGA and USB ports. Above the onboard ports are the 500GB SATA hard drives used for the operating system.

What’s Inside?

As mentioned above, there are 2 Seagate 500GB serial ATA hard drives that are used for the operating system:

Along with the serial ATA drives is a 4GB USB flash stick that can be used to create a bootable rescue installation of Oracle Enterprise Linux. Also, this drive is used for some firmware updates.

As for the disk controllers, each SC has 2 LSI controllers. One is in the internal PCIe slot, and the other is in a standard PCIe slot. Both are SAS9211-8i controllers.

Operating system

One of the many things that I found interesting on the box was that OEL 5.5 was installed, not one of the newer releases. Also, the server is running the Red Hat compatible kernel, and not the Unbreakable Enterprise Kernel (UEK), which is the default on newer releases of OEL 5.

Disk configuration

As for the storage, Oracle has removed one of my biggest peeves with the Exadata storage servers. On Exadata, the operating system resides on 30GB partitions on the first 2 hard disks. Because of this, a 30GB griddisk has to be created on the remaining 10 disks, which becomes the DBFS_DG diskgroup (formerly SYSTEMDG). With Exadata, this diskgroup becomes the location for OCR/voting files, unless DATA or RECO is high redundancy. In that case, DBFS_DG is just wasted space. Anyways, going back to the ODA (that was the point of this post, wasn’t it?), this problem is no longer present thanks to the 2 500GB 2.5″ SATA drives in the back. These drives utilize software RAID (just like the Exadata storage servers), but don’t take advantage of the active/inactive partition scheme:

Let’s look at the LVM setup a little closer. We have a physical volume with a 465GB volume group (VolGroupSys). From here, we have LogVolRoot (30GB), LogVolOpt (60GB), and LogVolU01 (100GB). That leaves us with more than 250GB free on each SC for either adding new filesystems, growing existing filesystems, or taking LVM snapshots.
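To make the free-space figure concrete, here is a small sketch of the arithmetic using the sizes quoted above. The function and variable names are mine, for illustration only, and the sizes are the rounded values from the post rather than exact extents reported by LVM.

```python
# Sizes in GB, as quoted in the post (rounded, not exact LVM extents).
VOLUME_GROUP_GB = 465  # VolGroupSys

logical_volumes = {
    "LogVolRoot": 30,
    "LogVolOpt": 60,
    "LogVolU01": 100,
}


def free_space_gb(vg_size_gb, lvs):
    """Return the space in the volume group not allocated to any logical volume."""
    return vg_size_gb - sum(lvs.values())


allocated = sum(logical_volumes.values())
free = free_space_gb(VOLUME_GROUP_GB, logical_volumes)
print(f"allocated: {allocated} GB, free: {free} GB")
# prints "allocated: 190 GB, free: 275 GB"
```

So the three logical volumes use 190GB, which leaves roughly 275GB unallocated in VolGroupSys on each SC.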

Network Configuration

The ODA has 8 (6 GbE and 2 10GbE) physical ethernet ports available to you, along with 2 internal fibre ports that are used for a built-in cluster interconnect. Here’s the output of running “ethtool” on the internal NICs:

Surprisingly, these NICs aren’t bonded, so we have 2 separate cluster interconnects, which means that we also have 2 HAIP devices. Also, eth1 and eth2 (the onboard NICs) were used to create a bond for the public traffic.

Note that the default configuration of the ODA doesn’t include a management network, like the Exadata does. That doesn’t mean that you can’t set up a management network, just that it’s not part of the initial setup process.

There’s a little overview of what’s inside the ODA. The next piece in the series will go into a little more detail on the disks, as well as the Oracle configuration.

Nice post. Questions:
1) Can you share the cluster interconnect cables and internal card that has two ge and
2) There are 4 more slots open; can one add more SSD?
3) Why the name oakcli?
4) What is the function of the two UART ports?
5) Any picture of the two SAS HBAs connected to both servers?
thx

@laotsao – The 2 UART ports are used for the internal cluster interconnect. Because the ODA is designed to only be used as a 2-node RAC, they eliminated the need for a cluster interconnect that is cabled. As for adding more SSD, there’s no room. It’s got 24 disk slots, and has 20 hard disks and 4 SSDs. oakcli comes from “Oracle Appliance Kit CLI.”

When I’m back in the office, I’ll try to get some more pics of the inside. We’ll see if I can get the guys to let me take one of the nodes down.

Great post, Andy!
Only one thing: The DBFS_DG is not wasted space if you implement the DBFS database (hence the new name of the diskgroup) with its tablespaces there. That is the recommended way to host flat files (that you may use for SQL*Loader or External Tables) on Exadata.

The shared storage is configured to be used within ASM diskgroups. By default, there are 3 diskgroups created: DATA, RECO, and REDO. If you want a shared filesystem between the 2 SCs, you will use either external NFS storage, or create a volume using ACFS. It is my understanding that the only protection for the ASM disks is through ASM redundancy, which we have seen to be very resilient. High redundancy is how the box was set up by the configurator, and there was not an option through the GUI to change that. If you are running this in a production environment, I would definitely recommend running with high redundancy. There is no hardware RAID used on the ASM disks. The 500GB disks that are in the back are isolated to each SC.
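As a rough illustration of what high redundancy costs in capacity, the sketch below divides the raw hard-disk capacity by the number of mirror copies ASM keeps (2 for normal redundancy, 3 for high). It assumes all 20 of the 600GB hard disks sit in high-redundancy diskgroups, leaves the SSDs out, and ignores ASM metadata and any space reserved for rebalancing, so treat the result as an upper bound rather than what the appliance will actually report.

```python
# Disk counts and sizes from the hardware listing; everything else is illustrative.
HDD_COUNT = 20
HDD_SIZE_GB = 600

# Number of copies ASM keeps of each extent per redundancy level.
MIRROR_COPIES = {"normal": 2, "high": 3}


def usable_gb(disk_count, disk_size_gb, redundancy):
    """Upper bound on usable space: raw capacity divided by mirror copies."""
    raw_gb = disk_count * disk_size_gb
    return raw_gb // MIRROR_COPIES[redundancy]


raw = HDD_COUNT * HDD_SIZE_GB
print(f"raw: {raw} GB")
print(f"high redundancy: {usable_gb(HDD_COUNT, HDD_SIZE_GB, 'high')} GB")
print(f"normal redundancy: {usable_gb(HDD_COUNT, HDD_SIZE_GB, 'normal')} GB")
```

Under these assumptions the 12,000GB of raw hard-disk space comes down to at most about 4,000GB usable with high redundancy.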

One thing that I mentioned above was the 2 external SAS ports. I’ve heard from a couple of people at Oracle that the ODA does not support using these external connections. It sounds like (no official confirmation) the only supported methods of storage expansion are NFS (preferably direct NFS) and iSCSI. We’re working on iSCSI in our lab, and it’s not as straightforward as you would expect. Results on that in a future post.

No matter what your configuration is, you will need 8 IP addresses. That includes 2 for the ILOMs (each SC has an ILOM), 2 for the SCs, 2 for VIPs (each SC will have a VIP), and 2 for the SCAN (because the cluster will only have 2 nodes, it only needs 2 IPs for the SCAN). While it isn’t required to have the ILOMs connected to the network, it is definitely recommended. Also, even if you choose not to run RAC on the ODA, you will still get a clustered grid infrastructure, which will utilize the VIPs and SCAN. This is included free of charge when you license Enterprise Edition.
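The address count above can be tallied with a short sketch. The dictionary keys and the `include_ilom` flag are just labels I picked for illustration; the counts are the ones listed in the paragraph.

```python
# IP addresses needed per role, as listed above (2 SCs per appliance).
ip_plan = {
    "ILOM": 2,  # one per SC; optional but recommended
    "SC": 2,    # one host address per SC
    "VIP": 2,   # one virtual IP per SC
    "SCAN": 2,  # a 2-node cluster only needs 2 SCAN addresses
}


def total_ips(plan, include_ilom=True):
    """Sum the addresses in the plan, optionally leaving the ILOMs off the network."""
    return sum(count for role, count in plan.items()
               if include_ilom or role != "ILOM")


print(total_ips(ip_plan))                      # prints 8
print(total_ips(ip_plan, include_ilom=False))  # prints 6
```

With the ILOMs cabled the total is 8; leaving them off the network brings it down to 6.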

Hi,
There is a requirement that the interconnect use a switched network and not a crossover cable.
How is it implemented in the ODA? Do they have an internal switch, or have they somehow changed the concept and are using a crossover cable?
Hadar

They’re not really switched interfaces, but the internal NICs use the onboard Intel 82576 chip. From the RAC FAQ (note #220970.1):

——————————————————————————————
Is crossover cable supported as an interconnect with RAC on any platform ?

NO. CROSS OVER CABLES ARE NOT SUPPORTED. The requirement is to use a switch:

Detailed Reasons:

1) cross-cabling limits the expansion of RAC to two nodes

2) cross-cabling is unstable:

a) Some NIC cards do not work properly with it. They are not able to negotiate the DTE/DCE clocking, and will thus not function. These NICS were made cheaper by assuming that the switch was going to have the clock. Unfortunately there is no way to know which NICs do not have that clock.

b) Media sense behaviour on various OS’s (most notably Windows) will bring a NIC down when a cable is disconnected. Either of these issues can lead to cluster instability and lead to ORA-29740 errors (node evictions).

Due to the benefits and stability provided by a switch, and their affordability ($200 for a simple 16 port GigE switch), and the expense and time related to dealing with issues when one does not exist, this is the only supported configuration.

From a purely technology point of view Oracle does not care if the customer uses cross over cable or router or switches to deliver a message. However, we know from experience that a lot of adapters misbehave when used in a crossover configuration and cause a lot of problems for RAC. Hence we have stated on certify that we do not support crossover cables to avoid false bugs and finger pointing amongst the various parties: Oracle, Hardware vendors, Os vendors etc…

——————————————————————————————

It’s my understanding that Oracle has tested against these chips and has verified that the issues above are not present on this particular chip.

Um, well. Depending on your configuration you could run with anything from two IP addresses up to dozens or hundreds. The default configuration aims at 6 IP addresses for a DNS based configuration, or 5 addresses if you choose to go without DNS round robin. The ILOM requires 2 IP addresses (and 2 switch ports) if connected to the network, but thankfully this is optional.

One solution-in-a-box design I’ve been working on uses only 3 IP addresses, one for each physical node and one for a virtual router that acts as a gateway for the (mostly) virtual network infrastructure.

As for switch-less network configurations, as far as I know it was never a matter of Oracle not supporting RAC clusters with interconnect on crossover cables, just that they would not certify such an implementation. The difference is significant.

With the ODA, Oracle have changed their views on a number of things, including redo multiplexing, crossover cables and a couple of physical laws. ODA does use crossover copper cables (although, these days such cables are not actually crossed) or twinax cables with integrated SFP+ interfaces if you prefer to use the copper ports for the public interfaces.

Are you talking about RAC One Node? When running the deployment, simply choose the advanced option, and you can choose between RAC, RAC One Node (active/passive), and Enterprise Edition (only single-instance databases).

Yes, nasmel. Oracle Database Appliance Virtualized Platform is supported on the original ODA. Although, depending on the use case, the low amount of memory (when compared to ODA X3-2, X4-2 and the X5-2) may be a limiting factor.