Just before Christmas 2017 I had the honour of expanding the compute node storage on a virtualised Exadata, or better said, OVM on Exadata. There can be several reasons to do this. The most logical one, and the reason this customer bought the disk expansion kit, is that you want to create more virtual machines but have run out of space on /EXAVMIMAGES.

Add the disk expansion kit to the database server.
The kit consists of 4 additional hard drives to be installed in the 4 available slots. Remove the filler panels and install the drives. The drives may be installed in any order.

Below I describe how I did it. Oracle Support also verified that this approach should be OK, so here we go.

Gotcha 1

Preparation

You need to set aside some important information about how the system looks.

First of all, ensure that reclaimdisks.sh was run correctly after installation. As I did this installation myself, I can confirm it was done correctly, so this step can be skipped here.
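For anyone following along who did not do the original installation themselves, a quick way to verify this is the check option of the script. A small sketch, assuming the standard Exadata script location; it only reports, it changes nothing:

```shell
# Verify that the factory disk layout was reclaimed after imaging.
# /opt/oracle.SupportTools/reclaimdisks.sh is the standard location on
# Exadata compute nodes; the guard keeps this harmless elsewhere.
CHECK=/opt/oracle.SupportTools/reclaimdisks.sh
if [ -x "$CHECK" ]; then
    "$CHECK" -check
else
    echo "reclaimdisks.sh not found at $CHECK (not an Exadata compute node?)"
fi
```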

Then on to the next step: physically adding the disks to the servers. This is really not difficult, but it needs to be done with care and according to the safety measures Oracle prescribes; also watch out for ESD. The drives are FRUs anyway, but this way you know when the engineer should install them.
Once the disks are in the server, the RAID starts to rebuild automatically. In our case it took around 14 hours to finish. Dbmcli records an entry in the alert history when the rebuild is done.

[root@demoexa01db01 ~]# dbmcli -e list alerthistory
         23_1   2017-12-18T13:54:04+00:00   warning   "A disk expansion kit was installed. The additional physical drives were automatically added to the existing RAID5 configuration, and reconstruction of the corresponding virtual drive was automatically started."
         23_2   2017-12-19T04:14:05+00:00   clear     "Virtual drive reconstruction due to disk expansion was completed."
[root@demoexa01db01 ~]#

It’s also good to know what the partitions look like:

[root@demoexa01db01 ~]# cat /proc/partitions | grep sda
   8        0 4094720000 sda
   8        1     524288 sda1
   8        2  119541760 sda2
   8        3 1634813903 sda3
[root@demoexa01db01 ~]#

While preparing the steps, I saw that the documentation has you recreate the partition by specifying a size rather than exact sectors. Personally I do not like this, so it’s good to also gather the start and end sectors using parted.

[root@demoexa01db01 ~]# parted /dev/sda 'unit s print'
Model: LSI MR9361-8i (scsi)
Disk /dev/sda: 8189440000s
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start       End          Size         File system  Name     Flags
 1      64s         1048639s     1048576s     ext3         primary  boot
 2      1048640s    240132159s   239083520s                primary  lvm
 3      240132160s  3509759965s  3269627806s               primary

[root@demoexa01db01 ~]#

Check how parted sees the disk:

[root@demoexadb01 ~]# parted -s /dev/sda print
Model: LSI MR9361-8i (scsi)
Disk /dev/sda: 4193GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size    File system  Name     Flags
 1      32.8kB  537MB   537MB   ext3         primary  boot
 2      537MB   123GB   122GB                primary  lvm
 3      123GB   1797GB  1674GB               primary

[root@demoexadb01 ~]#

Also copy aside the output of df -h and the list of virtual machines from xm list.

[root@demoexa01db01 ~]# df -h /EXAVMIMAGES
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3       1.6T  1.5T  117G  93% /EXAVMIMAGES
[root@demoexa01db01 ~]# xm list
Name                                  ID     Mem VCPUs State      Time(s)
Domain-0                               0    7309     4 r----- 11323224.3
demoexaadm01db01vm01.mydomain.demo    32   16387     4 -b----   141175.1
demoexaadm01db01vm02.mydomain.demo    33   77827    12 r-----  2422336.9
demoexaadm01db01vm03.mydomain.demo    34  122883    20 -b----  4851298.7
demoexaadm01db01vm04.mydomain.demo    35   61443     8 r-----  1203861.5
demoexaadm01db01vm05.mydomain.demo    36   98307     8 r-----  1998874.0
demoexaadm01db01vm06.mydomain.demo    37   71683    10 r-----  2530870.0
demoexaadm01db01vm07.mydomain.demo    38   61443     4 r-----  1316172.4
demoexaadm01db01vm08.mydomain.demo    39  122883     8 r-----  2759130.6
demoexaadm01db01vm09.mydomain.demo    40   65639     4 r-----  1240107.5
[root@demoexa01db01 ~]#
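The preparation steps above can be wrapped in a small capture script, so everything is in one place before any change is made. A sketch; the file names and backup location are just a suggestion:

```shell
# Baseline capture before touching anything: partition table in sectors,
# kernel partition sizes, filesystem usage and the running domains.
BACKUP=/root/expansion-baseline-$(date +%Y%m%d)
mkdir -p "$BACKUP"
parted -s /dev/sda 'unit s print' > "$BACKUP/parted-sectors.txt" 2>&1 || true
cat /proc/partitions              > "$BACKUP/proc-partitions.txt"
df -h                             > "$BACKUP/df.txt"
xm list                           > "$BACKUP/xm-list.txt" 2>&1 || true
echo "Baseline saved in $BACKUP"
```

Having the sector-accurate parted output in that directory is what saves the day later on.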

Expansion

Partition enlargement

Now all domains can be shut down. Do this on only one node at a time, so that your databases don’t go down.

[root@demoexadb01 ~]# xm shutdown -a -w
Domain demoexaadm01vm01.mydomain.demo terminated
Domain demoexaadm01vm04.mydomain.demo terminated
Domain demoexaadm01vm07.mydomain.demo terminated
Domain demoexaadm01vm05.mydomain.demo terminated
Domain demoexaadm01vm09.mydomain.demo terminated
Domain demoexaadm01vm06.mydomain.demo terminated
Domain demoexaadm01vm03.mydomain.demo terminated
Domain demoexaadm01vm08.mydomain.demo terminated
Domain demoexaadm01vm02.mydomain.demo terminated
All domains terminated
[root@demoexadb01 ~]#
[root@demoexadb01 ~]# xm list
Name                                  ID     Mem VCPUs State      Time(s)
Domain-0                               0    7309     4 r----- 11325897.9
[root@demoexadb01 ~]#

Make sure now to unmount the /EXAVMIMAGES filesystem.

[root@demoexadb01 ~]# umount /EXAVMIMAGES/
[root@demoexadb01 ~]#

It MIGHT be necessary to stop the xen daemon and the ocfs2 service (one of my nodes needed it, another one didn’t). You can do this by running:

service xend stop
service xendomains stop
service ocfs2 stop

Then the filesystem unmounts cleanly. This is necessary because we will remove the partition in the next step.

[root@demoexadb01 ~]# parted /dev/sda
GNU Parted 2.1
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print
Model: LSI MR9361-8i (scsi)
Disk /dev/sda: 4193GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size    File system  Name     Flags
 1      32.8kB  537MB   537MB   ext3         primary  boot
 2      537MB   123GB   122GB                primary  lvm
 3      123GB   1797GB  1674GB               primary

(parted) rm 3
Warning: WARNING: the kernel failed to re-read the partition table on /dev/sda (Device or resource busy). As a result, it may not reflect all of your changes until after reboot.
(parted) quit
Information: You may need to update /etc/fstab.
[root@demoexadb01 ~]#

We are aware of the warning. It appears because we are working on a disk that is actually in use and we cannot unmount everything on it; the change has been made, we just cannot see it (yet). Next we need to create the new partition. In this case we follow the Oracle documentation:

[root@demoexadb01 ~]# parted /dev/sda
GNU Parted 2.1
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) mkpart primary 123gb 4193gb
Warning: WARNING: the kernel failed to re-read the partition table on /dev/sda (Device or resource busy). As a result, it may not reflect all of your changes until after reboot.
(parted) print
Model: LSI MR9361-8i (scsi)
Disk /dev/sda: 4193GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size    File system  Name     Flags
 1      32.8kB  537MB   537MB   ext3         primary  boot
 2      537MB   123GB   122GB                primary  lvm
 3      123GB   4193GB  4070GB               primary

(parted) quit
[root@demoexadb01 ~]#

So far so good: we should be able to mount the partition without errors, although it won’t be bigger yet. So let’s do that.

And here is where the journey begins. Debugging it is not so difficult. For your interest: I already had a service request open, and the engineer responded to just drop everything, restore the partition, and try to recover using -r. I’m not a fan of this, because it’s a risky operation, and it’s always better to know why something happened. So join me in the reasoning towards the solution, which makes me feel more comfortable than just “restoring things”.

First confirm the parted output. Remember this?

(parted) print
Model: LSI MR9361-8i (scsi)
Disk /dev/sda: 4193GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size    File system  Name     Flags
 1      32.8kB  537MB   537MB   ext3         primary  boot
 2      537MB   123GB   122GB                primary  lvm
 3      123GB   4193GB  4070GB               primary

(parted)

It matches the Oracle documentation exactly. So let’s look a bit deeper and check /proc/partitions.

[root@demoexadb01 ~]# cat /proc/partitions | grep sda
   8        0 4094720000 sda
   8        1     524288 sda1
   8        2  119541760 sda2
   8        3 3974651904 sda3   <<<<---------
[root@demoexadb01 ~]#

That doesn’t match the documentation: the size of sda3 should end in ...3903, not ...1904. So the partition does not start on the spot we expect, and it’s exactly that part of the disk that contains very interesting information: the start of our filesystem.
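To put a number on the mismatch, here is my own back-of-the-envelope arithmetic, using the two sda3 sizes from /proc/partitions (which are in 1 KiB blocks) and assuming the end of the partition stayed in place:

```shell
# Sizes of sda3 in 1 KiB blocks, taken from the /proc/partitions output.
EXPECTED_KIB=3974653903   # what the documentation says it should be
ACTUAL_KIB=3974651904     # what "mkpart primary 123gb 4193gb" produced
# If the end sector is unchanged, a smaller partition means a later start.
DELTA_SECTORS=$(( (EXPECTED_KIB - ACTUAL_KIB) * 2 ))   # 512-byte sectors
echo "start of sda3 shifted by ${DELTA_SECTORS} sectors"
```

On these numbers that comes out to 3998 sectors, roughly 2 MB, which is more than enough for the ocfs2 superblock to no longer sit at the start of the partition.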

The first step is to get rid of the wrongly created partition:

[root@demoexadb01 ~]# parted /dev/sda
GNU Parted 2.1
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) mkpart primary 123gb 4193gb
Warning: WARNING: the kernel failed to re-read the partition table on /dev/sda (Device or resource busy). As a result, it may not reflect all of your changes until after reboot.
(parted) print
Model: LSI MR9361-8i (scsi)
Disk /dev/sda: 4193GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size    File system  Name     Flags
 1      32.8kB  537MB   537MB   ext3         primary  boot
 2      537MB   123GB   122GB                primary  lvm
 3      123GB   4193GB  4070GB               primary

(parted) rm 3
Warning: WARNING: the kernel failed to re-read the partition table on /dev/sda (Device or resource busy). As a result, it may not reflect all of your changes until after reboot.
(parted) quit
[root@demoexadb01 ~]#

Now that the partition is gone, we can recreate it the way we want: starting from the exact sector we retrieved earlier. That way we know for sure it starts at the same spot, and it should turn out fine:
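Sketched as a command, with the start sector taken from the 'unit s print' output gathered during preparation. The 240132160 value is from this system, so substitute your own recorded start sector; the command is only echoed here so you can review it before running it:

```shell
# Recreate sda3 at its ORIGINAL start sector; parted takes an exact
# position when the value carries the "s" (sector) suffix, and 100%
# resolves to the last usable sector of the GPT disk.
START_SECTOR=240132160    # start of the old sda3, from 'parted unit s print'
echo parted /dev/sda mkpart primary "${START_SECTOR}s" 100%
```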

Warning: WARNING: the kernel failed to re-read the partition table on /dev/sda (Device or resource busy). As a result, it may not reflect all of your changes until after reboot.
(parted) print
Model: LSI MR9361-8i (scsi)
Disk /dev/sda: 4193GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size    File system  Name     Flags
 1      32.8kB  537MB   537MB   ext3         primary  boot
 2      537MB   123GB   122GB                primary  lvm
 3      123GB   4193GB  4070GB               primary

(parted) quit
[root@demoexadb01 ~]#

This way the partition looks EXACTLY the same as the output in the documentation, but … it did before as well.

[root@demoexadb01 ~]# mount /EXAVMIMAGES/
[root@demoexadb01 ~]#

That’s good! Remember that we still have to reboot the server to make the system reread the partition table, so the reboot must be done now.

When the system is back online, first verify if all went well:

[root@demoexadb01 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VGExaDb-LVDbSys3
                       30G   17G   12G  58% /
tmpfs                 7.8G     0  7.8G   0% /dev/shm
/dev/sda1             480M   59M  396M  13% /boot
/dev/sda3             1.6T  1.5T  117G  93% /EXAVMIMAGES
none                  3.9G   40K  3.9G   1% /var/lib/xenstored
[root@demoexadb01 ~]#

So far so good. The filesystem mounts, but it’s still not expanded. The partition should be bigger now, let’s check:

[root@demoexadb01 ~]# cat /proc/partitions | grep sda
   8        0 4094720000 sda
   8        1     524288 sda1
   8        2  119541760 sda2
   8        3 3974653903 sda3
[root@demoexadb01 ~]#

And it matches the documentation. This means the expansion of the partition was executed successfully. Now the filesystem has to be enlarged.

Expand filesystem

/EXAVMIMAGES is an ocfs2 filesystem. We can expand it using tunefs.ocfs2. This command should not give any output.

[root@demoexadb01 ~]# tunefs.ocfs2 -S /dev/sda3
[root@demoexadb01 ~]#

Looks fine. Check with df -h:

[root@demoexadb01 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VGExaDb-LVDbSys3
                       30G   17G   12G  58% /
tmpfs                 7.8G     0  7.8G   0% /dev/shm
/dev/sda1             480M   59M  396M  13% /boot
/dev/sda3             3.8T  1.5T  2.3T  39% /EXAVMIMAGES
none                  3.9G   40K  3.9G   1% /var/lib/xenstored
[root@demoexadb01 ~]#

Yay, this is OK. Due to the reboot the user domains are already back online. If they weren’t started automatically, it’s time to start them now. After they are fully booted, you can repeat the actions on the other node.
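If the domains do not come back by themselves, here is a sketch of starting them from their vm.cfg files. The GuestImages path is the usual OVM-on-Exadata layout; verify it matches your system before relying on it:

```shell
# Start every guest that has a vm.cfg under /EXAVMIMAGES/GuestImages.
# Harmless on a machine without that directory: the loop finds nothing.
STARTED=0
for CFG in /EXAVMIMAGES/GuestImages/*/vm.cfg; do
    [ -f "$CFG" ] || continue   # glob did not match anything
    echo "starting domain from $CFG"
    xm create "$CFG"
    STARTED=$((STARTED + 1))
done
echo "started $STARTED domain(s)"
```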

Gotcha 2

This one worries me a bit, to be honest. This customer has a number of Exadatas. They stepped in at the X2-2 and are currently in the X6-2 range. They also try to keep up with patch levels and upgrade regularly, which is, in my opinion, a good thing. Recently some of the racks were extended to an elastic configuration, and I discovered something odd with the 12.2 image on the compute nodes. By default, a virtualised Exadata dom0 filesystem layout looks like this:

[root@demoexa01db01 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VGExaDb-LVDbSys3
                       30G   17G   12G  58% /
tmpfs                 7.8G  4.0K  7.8G   1% /dev/shm
none                  3.9G  840K  3.9G   1% /var/lib/xenstored
/dev/sda1             480M   59M  396M  13% /boot
/dev/sda3             1.6T  1.5T  117G  93% /EXAVMIMAGES
[root@demoexa01db01 ~]#

But on a newly imaged node (or one newly deployed with the Oracle-provided image that comes from the factory), it looks like this:

[root@demoexa02db01 ~]# df -h /EXAVMIMAGES
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VGExaDb-LVDbExaVMImages
                      1.6T   86G  1.5T   6% /EXAVMIMAGES
[root@demoexa02db01 ~]#

When I highlighted this to support, I got an amusing conversation. Apparently the disk layout is not (yet?) included in the patching/upgrading of these systems. So I asked them: when you want to expand a compute node that is already on LVM, which naming should you use, and what are the standards? This was not covered in the latest EIS DVD (November 2017), nor in the Oracle documentation (December 2017). The answer I got was:

“From a patching perspective we don’t care about the pv names as the work is done on a much higher level. For pv names, we recommend you use the same approach as for the existing disks.”

For reference: by default the PV used for VGExaDb is /dev/sda2. So this story is to be continued. If you read this and know the answer, please let me know.

Lessons learned

Copy more information aside from the system than Oracle tells you to do and do not forget to use common sense.

Use sectors instead of other units when recreating partitions, or even better, use LVM.
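For the LVM case the lesson pays off immediately: if /EXAVMIMAGES lives on a logical volume like the LVDbExaVMImages one shown earlier, the expansion reduces to a standard LVM grow followed by the same tunefs.ocfs2 step. A speculative sketch, based on generic LVM usage rather than any Oracle documentation; the VGExaDb and LVDbExaVMImages names are the defaults seen on the newer image, and everything is echoed only:

```shell
# Hypothetical LVM-based expansion: make the new space a PV, extend the
# volume group and the logical volume, then grow the ocfs2 filesystem.
# Echoed only, so nothing is executed by accident.
LV=/dev/VGExaDb/LVDbExaVMImages
echo pvcreate /dev/sda3
echo vgextend VGExaDb /dev/sda3
echo lvextend -l +100%FREE "$LV"
echo tunefs.ocfs2 -S "$LV"
```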

In case of doubt, open a service request to verify things so you can be sure before continuing.

I’ve included the full output from the commands, mainly for my own reference, but in case you end up in trouble, at least you know where the default partitions in a virtualised Exadata start and end.