Pythian - Data Experts Blog » Don Seiler
http://www.pythian.com/blog
Official Pythian Blog - Love Your Data

Remembering RMOUG 2012
http://www.pythian.com/blog/remembering-rmoug-2012/
Tue, 28 Feb 2012 17:22:02 +0000

Last week I returned to Denver, Colorado for the RMOUG Training Days conference. I had gone last year as an attendee, but this year I was invited to give my presentation on the Oracle 11g ADR (slides and info at the end of this post). While I haven't attended very many conferences yet, I can't imagine a bigger bang for your buck than RMOUG Training Days. This year's conference had a great number of Oracle ACEs presenting, including Jonathan Lewis, Cary Millsap, Kerry Osborne, Karl Arao, Debra Lilley, Tim Gorman, Kellyn Pot'Vin, and my Pythian colleagues Alex Gorbachev and Gwen Shapira, just to name a few. It was definitely an honor to be listed on the speaker roster alongside (or at least very far below) these giants. It was also a great pleasure to meet many of the folks in the Oracle community that I had previously only "met" on Twitter.

I presented an updated version of the Oracle 11g ADR presentation which I gave at the NoCOUG 2011 Summer conference. While this session wasn't as technically meaty as some of the others, the audience was definitely engaged and I enjoyed some great Q&A both during and after the session. I've included those questions in a newly updated set of slides which I'm attaching here, and I plan to turn the whole thing into a nice blog post in the not-too-distant future. Until then, feel free to email me at seiler@pythian.com with any questions, and I'll try to include them in my upcoming blog post.

A NoCOUG to Remember
http://www.pythian.com/blog/a-nocoug-to-remember/
Thu, 01 Sep 2011 18:25:19 +0000

This post is long overdue, as I was supposed to blog about my appearance at NoCOUG before I left (sorry, Vanessa!). However, in my efforts to rehearse and adjust my presentation, blogging about it fell by the wayside. Now that the NoCOUG 2011 Summer Conference is in the books, I'd like to take a few minutes to share my experience not only as an attendee, but also as a first-time speaker.

When I found out that NoCOUG had accepted my abstract, “Oracle 11g: Learning to Love the ADR”, I was both ecstatic and terrified. This meant that I actually had to prepare the presentation and speak in front of peers. Surely they would throw me into San Francisco Bay if I didn’t bring my A-game, so I set out to do just that.

Step 1: Learning to Love the ADR

The fact is that I chose to speak on the ADR because I knew very little about it myself. This would not only get me into the glorious world of Oracle conference speakers, complete with a lifestyle rivalled only by Caligula, but it would also ensure that I knew as much as I could about the ADR. Until this project, I had only known where to find the human-readable alert log so I could continue on with my Oracle 10g style of monitoring! I had heard tales of creating “packages” for Oracle support, but surely this was a complex, Herculean feat that would take many months of meditation to even begin to understand.

So I scoured Oracle Support notes and the public documentation, gathering notes on every aspect of the ADR I could find. Once I felt I had enough source material, it was time to build an outline for my slides to follow. After that, it was time to eat my own dog food, so I got busy poking around the ADR and playing with adrci on my Oracle 11.2 sandbox, while concurrently applying some new-found knowledge on client installations. The latter helped me find a bug involving listener alert log purging!

Step 2: The Slides

The next step was actually creating a deck of slides. Using the Pythian template and LibreOffice 3.4.2, I slowly figured out how this is done in the simplest format possible. This means it was all text, baby. No cool stock photos or cute puppy clip-art. That's something I'll work on for next time and/or another topic.

Once I had a first draft of my slides and did a self-rehearsal, I enlisted the help of my Pythian colleagues in our Sydney office to be my test audience to not only judge my presenting pace & speech, but also criticize the content of my presentation. And they did the latter wonderfully, each asking great questions that I hadn’t previously considered and making some great suggestions on where to provide a little more info and offering anecdotes from their own ADR experiences. I can definitely say that they made my presentation twice as good as it was before! After implementing those changes, I did another dress-rehearsal for some more colleagues around the world, got a few more notes, and I felt like it was ready.

Step 3: The Conference

When I got to the conference in San Ramon, I was especially keen to meet Jonathan Lewis for the first time. Jonathan is one of the people in the Oracle world that I respect the most, and it was a joy not only to be in his presence but to also later share a lunch table with him. I was also delighted to finally meet my colleague Chen Shapira, one of the most brilliant DBAs we have at Pythian, which is saying quite a lot!

After the keynote, I stayed for Jonathan's second session, which was titled "A Beginner's Guide to Becoming an Expert", with emphasis on becoming, rather than being. This was another great session, and it emphasized having easily reproducible tests. Jonathan noted that the script he was using could build a set of tables with over 6.5 million rows on his laptop in under a minute, showing that it doesn't take a lot of effort to have such scripts written and readily available whenever you need to test something. His scripts directory contained over 2600 SQL scripts for all manner of scenarios. One needs to be in the mindset to test repeatedly, making slight changes between tests and understanding the difference in results.
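For the curious, a quick-build test table along those lines can be a single statement. This sketch is my own illustration of the idea, not Jonathan's actual script (table and column names are invented):

```sql
-- Generate a million rows of test data in one CTAS statement.
create table t1 as
select rownum              as id,
       mod(rownum, 100)    as n1,
       rpad('x', 100, 'x') as padding
from   dual
connect by level <= 1000000;
```

Keep a file like this handy and you can rebuild your test case from scratch between every experiment.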

He included a demonstration of how Oracle actually fails the ACID test, which he blogged about afterward; definitely give it a read.

After an outdoor lunch, I decided to pass on the next session to go over my slides again and ensure that everything was working on my laptop as expected. One other session I was really interested in seeing was Kyle Hailey's NFS Tuning session, but unfortunately it was scheduled at the same time as my own.

Step 4: Spreading the Love

When the time came for my session, I felt very prepared. I did catch myself rushing through words once in a while, but otherwise felt my pace was good. There were a few questions from the audience that I couldn’t immediately answer, but was able to find the answer for during the post-session ad-hoc demo that I went through for those that stuck around and had extra questions. Having a live instance to work on was incredibly valuable to the presentation and helped not only to reinforce the information that I was sharing, but also to test scenarios that audience members came up with that I hadn’t previously illustrated.

There was some great Q&A at the end of the session, as well as after it. As I said earlier, I stayed around answering questions and going through new scenarios for some time afterward. Even after packing up and moving out of the room, I was speaking with people who had more questions about the ADR, some of whom hadn't even attended my session! I spent so much time talking, in fact, that I was late for the last sessions, so I decided to just catch up on email rather than burst in late and disturb the presenters. As you can see, I ended up attending only Jonathan Lewis's two sessions in addition to my own. Not a bad day altogether, except for losing my jacket (black Outdoor Research spring jacket, if anyone finds one!).

Unfortunately I was taking the red-eye back to the midwest that night so I didn’t get to do any sightseeing aside from what I got to see from the BART train. But I did enjoy this change from home:

I could get used to the San Francisco consistency.

It's like a year-round Wisconsin springtime. For those interested, I did provide some Twitter updates between sessions. I didn't want to live-tweet because it would have been harder for my simple mind to follow along.

Overall, I think it was a very worthwhile experience. I accomplished my goal of learning about the ADR well enough to speak confidently and competently on it. I also accomplished my year’s goal of submitting to 3 conferences and my career goal of becoming an Oracle speaker. I definitely look forward to speaking more on this and other topics at future conferences as well as presenting internally to the rest of my Pythian colleagues.

Upgrading Standalone ASM to Oracle Grid Infrastructure 11.2.0.2? Beware Bug 10283819!
http://www.pythian.com/blog/upgrading-standalone-asm-to-oracle-grid-infrastructure-11-2-0-2-beware-bug-10283819/

The bug was first brought to my attention about four days after completing the Grid Infrastructure upgrade. The client system administrator (SA) noticed that the disk holding the Oracle home directories was slowly filling, at the rate of about 1 GB per day. We identified that core dump files were being created under the new GRID_HOME/log/<hostname>/diskmon/ directory, at the rate of about one every 10 minutes, each about 8 MB in size. That adds up to 1152 MB (just over 1 GB) per 24-hour day. Add that to the 8 GB being held in GRID_HOME/.patch_storage (we had to roll back the 11.2.0.1 April 2010 PSU and apply the 11.2.0.1 July 2010 PSU just to upgrade to 11.2.0.2), and that put a bit of a squeeze on the free disk.
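The growth arithmetic is easy to sanity-check from the shell:

```shell
# 8 MB per core file, six files per hour, 24 hours per day:
echo "$(( 8 * 6 * 24 )) MB per day"   # prints "1152 MB per day"
```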

The good ol' OTN forums led me to bug 10283819. The original poster there also shared that removing the old (11.2.0.1) grid home directory and restarting the diskmon services stopped the core dump creation. The poster then went on to raise a second issue with increased diskmon.log writing. After a solution was found for that, Oracle Support closed the bug for some reason, without ever addressing the core dump creation.

I can verify that removing the old 11.2.0.1 grid home (I made a tar+bz2 backup first) and restarting the services did stop the core dump creation, and I am pushing Oracle Support to get the bug re-opened or a new bug filed to specifically address this. In the meantime, if you are unable to remove the old Grid Infrastructure home, or unsure about doing so, it should be safe to have a regularly scheduled script remove the diskmon core dumps and save you a full-disk surprise late some night.
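A minimal cleanup job might look like the following. This is my own sketch, not an Oracle-blessed procedure; it runs against a throwaway mock directory tree so you can try it safely, and you would point GRID_HOME at your real Grid home (and probably add an age filter like -mmin +60) before putting it in cron:

```shell
# Mock layout standing in for a real Grid Infrastructure home.
GRID_HOME=$(mktemp -d)
mkdir -p "$GRID_HOME/log/myhost/diskmon"
: > "$GRID_HOME/log/myhost/diskmon/core_1234"   # fake 8 MB core dump

# The cleanup itself: remove diskmon core files before they fill the disk.
find "$GRID_HOME"/log/*/diskmon -type f -name 'core*' -exec rm -f {} +
```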

Using a Custom Timezone? Beware Oracle 11.2.0.2 Grid Infrastructure!
http://www.pythian.com/blog/using-a-custom-timezone-beware-oracle-11-2-0-2-grid-infrastructure/
Thu, 16 Jun 2011 20:41:20 +0000

We have a client that runs an application that, for whatever reason, does NOT like daylight saving time. For that reason, the Oracle server is kept in Eastern Standard Time year-round and does not change with the rest of the eastern United States when DST begins and ends. They accomplish this with a custom /etc/localtime file. However, they left /etc/sysconfig/clock set to "TZ=America/New_York", which would prove fateful, as I shall point out. So, with the custom localtime file, the "date" command, as well as selecting sysdate or systimestamp, would always return the current time in Eastern Standard Time. When Daylight Saving Time is in effect, as it is right now, this is one hour behind "real" time as we consider it.
Now, we recently upgraded this client's Oracle Grid Infrastructure (for single-instance ASM) from 11.2.0.1 to 11.2.0.2. The next business day, the client alerted us that their date fields were coming back in Eastern Daylight Time. While this time was technically correct, they needed the time in the EST timezone.

I first set about trying to duplicate the problem. I was able to see the same incorrect results when I connected to the database remotely (e.g. via sqlplus or Oracle SQL Developer over TNS), but not locally (i.e. “sqlplus / as sysdba”). Then I duplicated the problem when connecting locally via TNS, meaning I was going through the listener. So we had narrowed it down to only connections going through the listener. I hadn’t considered the listener to be aware of timezones, so this was rather mind-boggling for me.
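The behavior makes more sense once you remember that every process reports time through its own TZ environment variable, so two processes on the same server can disagree about local time. A quick illustration from the shell (TZ=EST5 is a fixed-offset POSIX spec, which is roughly what the client's custom /etc/localtime emulated):

```shell
# Same instant, two TZ settings.
TZ=America/New_York date
TZ=EST5 date    # fixed UTC-5, no DST rule: one hour behind New York in summer
```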

Here is an example of the incorrect results we saw:
SQL> select systimestamp from dual;

Marc and I believe that the Grid Infrastructure installer grabs the value in /etc/sysconfig/clock when setting up the env file in question. We’ve asked the client to ensure that /etc/sysconfig/clock is always properly set in the future.

We imagine that most places don't try to fight Daylight Saving Time this way, but the bug also applies if you are doing any kind of timezone sleight-of-hand, like telling your database it is in US Central time when the server is actually in US Pacific time. So if your organization is doing this, be sure to double-check the crsconfig file after 11.2.0.2 installation!
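To make the check concrete: in my 11.2 install the per-node environment file lives under $GRID_HOME/crs/install, named s_crsconfig_<node>_env.txt (that name may vary by version, so treat it as an assumption). The grep below runs against a mock file so it is safe to paste anywhere; swap in your real Grid home:

```shell
# Mock of $GRID_HOME/crs/install for illustration; use your real Grid home.
GRID_HOME=$(mktemp -d)
mkdir -p "$GRID_HOME/crs/install"
echo 'TZ=America/New_York' > "$GRID_HOME/crs/install/s_crsconfig_myhost_env.txt"

# The actual check: does the recorded TZ match what you intend?
grep '^TZ=' "$GRID_HOME"/crs/install/s_crsconfig_*_env.txt
```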

Upgrading Standalone ASM to Oracle Grid Infrastructure? Beware Bug 1233183.1!
http://www.pythian.com/blog/upgrading-standalone-asm-to-oracle-grid-infrastructure-11-2-0-2-beware-bug-1233183-1/
Tue, 07 Dec 2010 23:27:58 +0000

The past four days have found me very frustrated and at wits' end while testing upgrades of standalone Oracle Grid Infrastructure (ASM) 11.2.0.1 to 11.2.0.2 on RHEL/OEL 5 VMs. The upgrade would seem to go fine, but after rebooting, I would see ASM and the LISTENER running under the old (11.2.0.1) grid home directories again.

grid_1 is the old grid home; I expected to see grid_2. The comment about being added by Agent led me down a path where I eventually took a look at /etc/init.d/ohasd, which is basically the master script that starts everything up. I noticed that this file hadn't been updated as part of the patching, and contained this:

I then ran some web searches for “oracle upgrade 11.2.0.1 ohasd” and found a blog post that had the same problem. Searching My Oracle Support then turned up DocID 1233183.1, titled “Standalone GI: init.ohasd/ohasd not updated after 11201 to 11202 upgrade”.

The bug is basically what the title says: those files are not updated during upgrades of standalone Grid Infrastructure, due to a logic bug in roothas.pl. I suggest reading the document for details.

The workaround is to manually copy those two files after the upgrade finishes. First, back up the old files:

Now ensure that the two new scripts have the same ownership and permissions as the old ones. Then reboot to ensure that everything takes effect. After the server comes back up, ensure that all services and oratab are still pointing to the new grid home. Be sure to check “srvctl config” for the asm and listener services, and check the paths in /etc/oratab.
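Comparing ownership and permissions is easy to script with stat. This sketch demonstrates the check on temp files so it runs anywhere; substitute the real /etc/init.d paths for the old and new scripts:

```shell
# Demo files standing in for the backed-up script and its replacement.
old=$(mktemp); new=$(mktemp)
chmod 755 "$old" "$new"

# Owner, group, and mode should match between the backup and the new copy.
stat -c '%U:%G %a' "$old" "$new"
```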

This is the second bug with the 11.2.0.2 upgrade process that I’ve encountered. The first one requires patching the 11.2.0.1 Grid Infrastructure with the July 2010 PSU just to be able to upgrade to 11.2.0.2. Let’s hope for some stronger QA in the future.

Beware the /var/tmp/.oracle Hidden Directory!
http://www.pythian.com/blog/beware-the-vartmp-oracle-hidden-directory/
Fri, 02 Jul 2010 19:03:19 +0000

A few months ago, we had a test instance complaining that it couldn't write to ASM. This was an 11.1.0.7 single (non-RAC) instance on Oracle Enterprise Linux 5, using ASM for the storage. We first saw these errors in the alert log:

ORA-15032: not all alterations performed
ORA-29702: error occurred in Cluster Group Service operation
ORA-29702: error occurred in Cluster Group Service operation
ERROR: error ORA-15032 caught in ASM I/O path

Uh-oh, that doesn’t look good. So I log into the ASM instance and try to see if the disks are OK:

SQL> select path, mount_status from v$asm_disk;
select path, mount_status from v$asm_disk
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-29702: error occurred in Cluster Group Service operation
ORA-29702: error occurred in Cluster Group Service operation

I can't even query that. As Ted would say, "strange things are afoot at the Circle K." To be safe, I thought I'd try to shut down the DBMS instance, which also failed short of a shutdown abort:

We ended up rebooting the server altogether, after which everything came up nicely. We filed an SR with Oracle Support, who directed us to Note 391790.1 (Unable To Connect To Cluster Manager Ora-29701). This note lists the cause, quite simply, as:

The hidden directory ‘/var/tmp/.oracle’ was removed while instances & the CRS stack were up and running. Typically this directory contains a number of “special” socket files that are used by local clients to connect via the IPC protocol (sqlnet) to various Oracle processes including the TNS listener, the CSS, CRS & EVM daemons or even database or ASM instances. These files are created when the “listening” process starts.

The solution is to restart CRS or reboot the machine. Our /var/tmp/.oracle directory looked like this:

I did some sandbox testing, and found that only the Oracle and root OS users could delete that directory, and was able to duplicate the error every time when doing so.

However, I really was dumbstruck that Oracle would have so critical a directory in /var/tmp! I politely noted this to Oracle Support, who justified the location with a few solid reasons:

It has always been in this location (and still is in 11gR2).

/var/tmp/.oracle is a hidden directory, so it probably won’t be noticed by any miscreants looking to cause trouble.

OK, I was being sarcastic: these reasons are awful. The only safeguard they gave was "make sure no one deletes it." We scoured the server for cron jobs that would automatically clean out /var/tmp but didn't find any, nor any bash history suggesting malice. The only thing we could think of was that this test server was a VM (Citrix Xen), although one would hope that it doesn't happen at all, regardless. We certainly could not find an explanation, but now we know not to delete /var/tmp/.oracle while the instances are running (even though we never did before).
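If you do run automated /var/tmp cleanups, pruning the socket directory is straightforward. This is my own sketch, demonstrated against a mock tree so it is safe to run anywhere (the socket file name is a stand-in, and note that find's -delete implies -depth, which silently defeats -prune, hence the -exec form):

```shell
# Mock /var/tmp with the hidden socket directory and some genuine junk.
TMPROOT=$(mktemp -d)
mkdir -p "$TMPROOT/.oracle" "$TMPROOT/junk"
: > "$TMPROOT/.oracle/sCRSD_UI_SOCKET"   # stand-in for a CRS socket file
: > "$TMPROOT/junk/old.log"

# Clean everything except the .oracle directory and its contents.
find "$TMPROOT" -path "$TMPROOT/.oracle" -prune -o -type f -exec rm -f {} +
```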

Installing Oracle 11gR2 Enterprise Edition on Ubuntu 10.04 (Lucid Lynx)
http://www.pythian.com/blog/installing-oracle-11gr2-enterprise-edition-on-ubuntu-10-04-lucid-lynx/
Fri, 11 Jun 2010 21:08:48 +0000

I recently reformatted my laptop with the latest Ubuntu LTS release, 10.04, aka Lucid Lynx. Since I like to have a native client installation as well as a portable sandbox server, I decided to install the latest version of Oracle EE, 11.2.0.1.

Rather than re-invent the wheel, I'm going to direct you to the previous Oracle-on-Ubuntu post by my colleague Augusto Bott. Many of the directions there hold true here (even with 32-bit vs 64-bit), with a few exceptions.

Download the Software

First and foremost, download the 32- or 64-bit installer files (there are two of them) from OTN.

Ubuntu 10.04 comes with libstdc++6 installed. However, Oracle 11gR2 requires libstdc++5. If you do not install libstdc++5, you will see errors as described in this OTN thread. The fix, as described in that thread, is to download and manually shoehorn the libstdc++5 library files onto Ubuntu 10.04:

Now we activate the new settings from /etc/sysctl.conf with this command:

# sysctl -p

You should see the new settings in the output of that command.
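For reference, the settings in question are the standard kernel parameters from the 11gR2 install guide; something like this fragment goes in /etc/sysctl.conf (these are the documented minimums, except that kernel.shmmax/kernel.shmall should be sized to your own RAM, so treat this as a sandbox-sized sketch):

```
fs.aio-max-nr = 1048576
fs.file-max = 6815744
kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
net.ipv4.ip_local_port_range = 9000 65500
net.core.rmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 1048576
```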

Unpack and Install the Software
From here we go into the directory containing our two 11gR2 database zip files. I'm using the 64-bit edition, so my files are named linux.x64_11gR2_database_Xof2.zip, where X is 1 or 2. I unzip both of these files, which creates a "database" subdirectory:

Now you simply cd into the database directory and run the runInstaller program:

$ cd database
$ ./runInstaller

This will launch the Oracle Universal Installer (OUI) program. The rest is pretty straightforward. Since this is just a sandbox, I chose not to provide any email info and declined to receive updates. I chose to first install the software only, which went perfectly well. OUI will complain about missing packages, since it is checking for RPMs; on Ubuntu we can safely ignore these and proceed with installation.

Once installation is done I set these variables in my bash environment:
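Roughly like this (the paths and SID are illustrative, from an OFA-style layout, not requirements):

```shell
# Adjust ORACLE_HOME and ORACLE_SID to match your installation.
export ORACLE_BASE=/u01/app/oracle
export ORACLE_HOME=$ORACLE_BASE/product/11.2.0/dbhome_1
export ORACLE_SID=sandbox
export PATH=$ORACLE_HOME/bin:$PATH
```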

I launched dbca to create an Oracle 11gR2 instance. The one problem I encountered here was that, at the end of dbca prompts, the “Confirmation” dialog window was blank. Other blog posts on the internet suggested blindly clicking around until you magically hit the “OK” button, but more often than not I would hit the “Cancel” button. Turns out this is a not-so-uncommon problem with Java Swing and Gnome’s visual effects. The solution for Gnome users is to disable Visual Effects in the Appearance preferences before launching dbca. If you are using a lightweight desktop environment such as LXDE, you shouldn’t have this problem.

And that’s that. The rest is nothing new to those who have done installations before. You can optionally configure TNS names or Listener with the netca tool, or log into your new instance and enjoy.

Special thanks to Augusto Bott for not only authoring the previous Oracle-on-Ubuntu articles but also for suggesting the Gnome Visual Effects conflict!

Applying Oracle 11.2 April 2010 PSU for Single-Instance ASM and DBMS
http://www.pythian.com/blog/applying-oracle-11-2-april-2010-psu-for-single-instance-asm-and-dbms/
Wed, 05 May 2010 03:48:21 +0000

When news of the April 2010 PSU for Oracle 11.2 came out, I was excited to see it, since it marked the first non-one-off patch release for the 11.2 database software. I happened to have an 11gR2 test system running on 11gR2 ASM via standalone Grid Infrastructure. I applied PSU 9352237 to the DBMS home and fired it up, only to see the folly of my ways when any ASM file operation like disk resizing (or auto-extending) failed with ORA-1653. This was due to the DBMS component now having a higher version number than the ASM component, which ASM does not allow. The Grid Infrastructure PSU would need to be applied to bring the ASM component up to snuff, but that patch (9343627) was, at that time, only "announced" with no ETA. Alas, the patch was rolled back and we continued testing without it.

Then this week I checked again, saw that PSU 9343627 had been released, and gave it a whirl. I was a little confused that the README seemed to contain a lot of instructions that assumed a clustered RAC install. My setup was a single-instance Grid Infrastructure installation just to provide ASM. I soon met problem upon problem, starting with this setup step:

# GRID_HOME/crs/install/rootcrs.pl -unlock
2010-05-03 11:40:42: Parsing the host name
2010-05-03 11:40:42: Checking for super user privileges
2010-05-03 11:40:42: User has super user privileges
Using configuration parameter file: GRID_HOME/crs/install/crsconfig_params
CRS-4013: This command is not supported in a single-node configuration.
CRS-4000: Command Stop failed, or completed with errors.
You must kill crs processes or reboot the system to properly
cleanup the processes started by Oracle clusterware
The Oracle Clusterware stack failed to stop.
You should stop the stack with 'crsctl stop crs' and rerun the command

I decided to try the PSU anyway, thinking Oracle would certainly have noted in the README if this was a problem for single-node installations. Turns out I was wrong:

I created an SR with Oracle Support, and after reviewing my case they agreed that the instructions were not right for single-instance Grid Infrastructure installations. They will be updating the README, but in the meantime I'd like to share the revised instructions that got me through two successful GI/DBMS PSU applications today.

Naming Conventions
First a few naming conventions to avoid confusion:

DBMS_HOME – The Oracle DBMS installation directory.

GRID_HOME – The Oracle Grid Infrastructure installation directory.

I’ve replaced all references to the directories in my examples with DBMS_HOME and GRID_HOME. Whenever you see those strings, substitute the actual path to your DBMS install or GRID install, whichever the case may be.

Download the PSU and OPatch 11.2
First, you’ll obviously need to download patch 9343627 that contains the PSU for your platform. In my case that’s Linux x86_64. As of this writing, I believe it’s only available for Linux x86 and x86_64. Note that patch 9343627 also includes patch 9352237. You do not have to, and should not, download that PSU separately. Everything you need is in the 9343627 download.

Second, choose a working directory to unzip that file. In my case, this is /home/oracle/software/11gR2/psu/gi, which looked like this after unzipping the PSU zip file:

$ ls
9343627 9352237 p9343627_112010_Linux-x86-64.zip

Unlike normal one-off patches, where you would descend into the individual patch-number-named directories, you’ll stay at this level.

Note also that you'll need OPatch 11.2 or higher. See MOS patch 6880880 and get the latest version of OPatch for your platform. Then just unzip the file in the DBMS_HOME and GRID_HOME directories, overwriting the old OPatch directory.

Pre-Patch Steps
First, I'm assuming that you have shut down all DBMS instances on this server. We will shut down ASM later in this process, but the DBMS instances must be down now.

Now we’ll start the pre-patch steps. First we stop the database on this server, saving the state of the database configuration in /tmp/oracle_home.stat:

Repeat this for any additional Oracle database home installs on the server.

Next we save the database home configuration. This only needs to be done if we’re also going to patch the DBMS home in addition to the GI home. This is done from the directory where we unzipped the PSU:

Applying PSU 9343627 and 9352237 to Oracle 11.2 Grid Infrastructure
As I mentioned earlier, the PSU contains both 9343627 and 9352237, and it is advised that you apply both. I did so successfully with these commands:

For each one, you'll go through the normal opatch [y|n] confirmation prompts, and you'll also be prompted for an email address. I entered my MOS login email and left the password blank (it is optional). The rest was automatic and finished without error in both cases.

Applying PSU 9343627 and 9352237 to Oracle 11.2 DBMS
There is one slight difference when applying the 9343627 patch to the DBMS_HOME: we specify a subdirectory after the "napply" parameter:

Note that I use OPatch from DBMS_HOME when patching DBMS_HOME, and from GRID_HOME when patching GRID_HOME. I can’t say if it is actually necessary, but I like to keep things tidy. This is why I advised installing the new OPatch into both homes earlier.

As of this moment, both homes are patched, but we aren’t done yet!

Post-Patch Steps
First we run the postpatch.sh script included in patch 9343627 to reset some file permissions on the DBMS_HOME:

As I said, Oracle support will be updating the README for PSU 9343627, but in the meantime I hope this guide helps you as much as it helped me. I’d like to thank my Pythian colleague Alex Gorbachev for his help in diagnosing some of the GI problems after the first broken patching, and also Esteban B. at Oracle Support for working closely with us to get a new single-node action plan.

Upgrading to Fedora 12? You might need more /boot space!
http://www.pythian.com/blog/upgrading-to-fedora-12-you-might-need-more-boot-space/
Thu, 19 Nov 2009 22:04:14 +0000

Today, I had a spare Fedora 11 machine sitting next to me, so I thought I'd try the upgrade to the newly-released Fedora 12, aka "Constantine." Fedora support cycles are rather short compared to Ubuntu's, so Fedora 11 will likely be de-supported in six to seven months. Normally I'd wait a little longer into the Fedora 12 cycle for others to find the fun upgrade bugs and have them fixed for me, but I didn't mind having to re-install from scratch on this machine if it came to that.

Following the Fedora documentation, I decided to use the “preupgrade” tool. Everything was going smoothly until the machine restarted to begin installation of the new packages. I got a message that there wasn’t enough space in my /boot partition. Specifically, the message claimed that there was insufficient disk space in /mnt/sysimage/boot. I found this rather odd and troubling, since I had let the Fedora installer determine the /boot partition size when I originally installed Fedora 11.

Turns out that this is a known problem with the preupgrade tool. A kind soul in #fedora on IRC directed me to the list of common Fedora 12 bugs, in particular the preupgrade free space check. I installed the updated preupgrade package as directed, but again got the error. That’s when I followed the next link for additional tips to free up space in /boot. The first was to remove obsolete kernels, which I had already done. The next was to run tune2fs on /boot filesystem to free up reserved blocks, which aren’t needed for /boot. I strongly suggest you visit the links provided for helpful screenshots and commands to follow.

After making these changes, the upgrade worked, and I'm now the proud owner of a Fedora 12 Constantine laptop, with a slightly brighter shade of blue on the desktop than that crusty old Fedora 11. ;)

HOWTO: Oracle Cross-Platform Migration with Minimal Downtime
http://www.pythian.com/blog/howto-oracle-cross-platform-migration-with-minimal-downtime/
Thu, 20 Aug 2009 19:49:02 +0000

I recently performed a migration from Oracle 10gR2 on Solaris to the same version on Linux, immediately followed by an upgrade to 11g. Both platforms were x86-64. Migrating to Linux also included migrating to ASM, whereas we had been using ZFS to hold the datafiles on Solaris. Restoring files into ASM meant we would have to use RMAN (which we would probably have chosen to use anyway).

As with many databases, the client wanted minimal downtime. It was obvious to us that the most time-consuming operation would be the restore and recovery into the new instance, essentially a restore and recovery from production backups and archived redo logs. It quickly dawned on me that we could start this operation well before the scheduled cutover and downtime window, chopping at least six hours from it. The client would only need to keep the new instance in mount mode after the initial restore/recovery finished, periodically re-catalog the source instance's FRA (which was mounted via NFS), and then re-run the RECOVER DATABASE command in RMAN. Once the time came to cut over, we would simply archive the current log on the original instance, do a SHUTDOWN IMMEDIATE, then open the new instance with the RESETLOGS option, and voila! Migration complete!
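In sketch form, the refresh cycle boils down to a few commands. This is a from-memory illustration rather than our exact runbook; the FRA path matches the NFS mount described in this post, everything else is generic:

```sql
-- On the target instance (mounted), repeat as new backups/archivelogs arrive:
RMAN> CATALOG START WITH '/solfra/prod';
RMAN> RECOVER DATABASE;
-- RECOVER stops with an error once it exhausts the available logs;
-- between refreshes that is expected.

-- At cutover, on the source:
SQL> ALTER SYSTEM ARCHIVE LOG CURRENT;
SQL> SHUTDOWN IMMEDIATE

-- Then catalog/recover one last time on the target, and finally:
RMAN> ALTER DATABASE OPEN RESETLOGS;
```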

I’ll try to recreate a simple example here.

On the new instance, assume that you have ASM set up and configured with two disk groups, DATA and REDO. Let’s also assume that you have the FRA directory (or the directory where you hold backups and archivelogs) mounted from the Solaris host to the Linux host under /solfra/prod/. Here’s the command we used:
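The mount itself was nothing fancy; a hedged sketch, assuming the Solaris host exports its FRA as /u02/fra/prod (the hostname, export path, and options here are illustrative):

```shell
# Mount the Solaris FRA read-only on the Linux host.
mount -t nfs -o ro,hard,intr solaris-host:/u02/fra/prod /solfra/prod
```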

We then need to get a pfile and edit it for ASM-ification. You can either restore the spfile from backup or just export a pfile from the current running production instance. To do the former, you’d need to go into RMAN and run an operation similar to this (best to ensure that the ORACLE_HOME, ORACLE_SID, and other Oracle-related environment variables are set properly):
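A sketch of that RMAN session, restoring from the NFS-mounted FRA — the DBID, paths, and DB name are all illustrative, not the actual production values:

```
RMAN> SET DBID 1234567890;    # hypothetical DBID of the source database
RMAN> STARTUP NOMOUNT;
RMAN> RESTORE SPFILE TO PFILE '/tmp/initPROD.ora'
        FROM AUTOBACKUP RECOVERY AREA '/solfra/prod' DB_NAME 'PROD';
RMAN> RESTORE CONTROLFILE FROM AUTOBACKUP
        RECOVERY AREA '/solfra/prod' DB_NAME 'PROD';
RMAN> ALTER DATABASE MOUNT;
```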

Note that if you restored the spfile from backup, you don’t need to specify the recovery area in the restore controlfile from autobackup command. In this case, because I chose to create a pfile from the source instance, I needed to supply the recovery area location. Note that this is the recovery area that holds the source backups, not the recovery area that your new instance will use once it is opened.

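The restore-and-recover script itself follows the standard SET NEWNAME pattern; a sketch, with the RESTORE/SWITCH/RECOVER steps exactly as RMAN expects them:

```
RUN {
  # Repeat the SET NEWNAME lines for every datafile and tempfile,
  # substituting the actual file numbers for X and Y.
  SET NEWNAME FOR DATAFILE X TO '+DATA';
  SET NEWNAME FOR TEMPFILE Y TO '+DATA';
  RESTORE DATABASE;
  SWITCH DATAFILE ALL;
  SWITCH TEMPFILE ALL;
  RECOVER DATABASE;
}
```

A query such as `SELECT 'SET NEWNAME FOR DATAFILE ' || file# || ' TO ''+DATA'';' FROM v$datafile;` against the source instance can generate the bulk of the SET NEWNAME lines for you.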
Note that X and Y are the file# of the individual datafile and tempfile, respectively. You will probably want to generate this file if you have a large number of datafiles.

If you are performing a migration to a different platform with the same endianness, it is important that you read MetaLink Doc 415884.1. Oracle suggests that you perform a CONVERT DATABASE, but later admits that you only need to run CONVERT on tablespaces containing UNDO segments:

The convert database command is a no operation command for all files that do not contain undo but they are scanned regardless during the conversion. Using the convert tablespace/datafile for those containing undo will suffice and all other source files not containing undo can be moved to the target (same endian) platform without the Recovery Manager conversion. This will greatly reduce the time required to move a database with thousands of files from source to target platform also decreasing the amount of time the database is unavailable.

We had an internal debate over whether or not it is truly necessary even to do this. However, I would advise you to read this document and discuss it with Oracle Support Services. They would most likely try to convince you to do it or threaten to not support your database.
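If you do go down that road, the conversion of the undo-containing tablespaces (typically SYSTEM plus your undo tablespace) is run on the source platform; a hedged example, where the tablespace names and staging path are illustrative:

```
RMAN> CONVERT TABLESPACE system, undotbs1
        TO PLATFORM 'Linux x86 64-bit'
        FORMAT '/solfra/prod/convert/%U';
```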

Once the script finishes with the recover database portion, this query should indicate that datafile headers and controlfiles are in sync:
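The check compares the checkpoint SCN recorded in the controlfile (v$datafile) with the one in each file header (v$datafile_header); one way to express it, where no rows returned means everything is in sync:

```
SQL> SELECT f.file#,
            f.checkpoint_change# AS ctl_scn,
            h.checkpoint_change# AS hdr_scn
       FROM v$datafile f
       JOIN v$datafile_header h ON h.file# = f.file#
      WHERE f.checkpoint_change# <> h.checkpoint_change#;
```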

The database is still in mount mode. If we were ready to cut over now, we could open the database with the RESETLOGS option. However, we are still a few hours or a day away from the cutover. What we want to do is apply more logs from the current production instance as they get archived. To minimize the babysitting, we decided to run this in a script once every few hours:

CATALOG START WITH '/solfra/PROD' NOPROMPT;
RECOVER DATABASE;

As you can guess, this re-catalogs the NFS-mounted FRA, ignoring files that were already cataloged. So now the controlfile knows about the new archivelogs. The RECOVER DATABASE command then beautifully applies logs from where it left off until it runs out of logs again. We re-run this periodically to keep our new instance as close to current as possible until it’s time to cut over.

When the cutover time comes, we run this on the current production instance:
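This amounts to forcing one final log archive and then shutting the source down cleanly, along the lines of:

```
SQL> ALTER SYSTEM ARCHIVE LOG CURRENT;
SQL> SHUTDOWN IMMEDIATE
```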

Then, on the new instance, we do one last catalog and recover. This should bring us completely up to date with the original prod instance. Now we just need to open with resetlogs:

SQL> alter database open resetlogs;
Database altered.

In my case, I manually renamed the online and standby redo logs to +REDO (they still had the old filesystem names in v$logfile), but I believe that OPEN RESETLOGS does this automatically anyway.
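If you do end up relocating them by hand, one approach is to drop and re-create each log group once it is INACTIVE (group numbers and sizes here are illustrative; use ALTER SYSTEM SWITCH LOGFILE and checkpoints to cycle a CURRENT or ACTIVE group out of the way first):

```
SQL> ALTER DATABASE DROP LOGFILE GROUP 1;
SQL> ALTER DATABASE ADD LOGFILE GROUP 1 ('+REDO') SIZE 512M;
```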

And so, here we are on Linux, on the same version (10.2.0.2) as Solaris with the appropriate one-off patches, using ASM. At this point, I was just under 30 minutes into my downtime window since the original prod instance was shut down. I don’t think that’s too bad for moving a 1.3T database!

After this, I immediately went into upgrading to 11gR1 (11.1.0.7) and then building a new standby instance, but that is outside the scope of the initial migration that I wanted to share with you today. Cheers!