Musings on my family, work and things I find interesting. Mainly, this was / is intended to record things for my kids so that they can get an insight into how I feel about them, plus I haven't done any writing for a long time, so would like to get some skill in that sphere again.

Thursday, September 13, 2012

How to convert an Oracle .dmp into a more portable format

One of the things that I've come across has been legacy applications which use Oracle; where we don't have access to the database, but do get provided with Oracle .dmp files. These aren't very helpful when all you have (or are used to!) is MySQL and PostgreSQL.
One approach that I've had success with is to download a VirtualBox image of Oracle, and then play with the data in there. I chose the Database App Development VM since I wasn't sure what parts of Oracle I'd need, having strenuously, pretty successfully, avoided Oracle for most of my time in the industry.
I then imported this into VirtualBox (running on OSX Lion) and configured networking:

One adapter running NAT so that I can browse the internet in the guest OS. This let me download the .dmp file to the guest OS filesystem.

Set up Port Forwarding for the NAT interface so that I can ssh to port 2022 on the host which will go to port 22 on the guest, thus allowing ssh access.

Optionally, I set up the VM to have another adapter (Host-only) so that I can set up NFS shares to mount part of the host filesystem under the guest.

Next, I needed to get the .dmp data onto the guest OS (and later get the transformed data off the guest). ssh-copy-id is good for this, to put an SSH public key into the authorized_keys for the oracle user on the guest. You can also get data into and out of the guest using python -m SimpleHTTPServer ran in the appropriate directory, which let me browse the host filesystem or guest filesystem as needed. ifconfig in the host or guest lets me know which IP address to use.
Now, I needed to create a tablespace and user to allow me to import the data. I advise doing this, since (for me at least!) importing the data is an iterative process, and creating a separate tablespace (with separate data files) is a good practice since it avoids bloating the system tablespace and means that disk space can be reclaimed. Pretty much the only Oracle knowledge I have! Before you create the tablespace, it's a good idea to check the size of your .dmp and available space on the filesystem. I had a 1.4GB .dmp which didn't fit into the space left on the fs and I burnt a bit of time figuring out Oracle error messages for the failed import before I worked out the filesystem wasn't big enough. In this case, I created a symlink in $ORACLE_HOME/dbs/ which pointed to a large enough partition and set the owner / permissions as required. Creating the tablespace was just a case of running:

Then re-create the tablespace and the new user and try the import again.
Once the import has succeeded, you want to get the data out of the database into a less proprietary format. One way is to use SQL Developer (a GUI tool included in the VM image).
Open SQL Developer and define a new database connection:

Connection name

myuser

User name

myuser

Password

password

Save Password?

Checked

SID is orcl rather than xe, in the Developer Days VM that I used.
Test the connection. It should work. Then open the connection and examine the tables.

In the menu, click Tools | Database Export

Want to export the data only, into CSV.

Choose the connection, choose the tables, choose the destination file.

For large databases, this can take a while to process (2.5 hours for my case). It may be faster to write your own export routine using Perl, PL/SQL or similar. In the end, that's what I did, so that I could script the entire process like so:

A very helpful article. These are a few tweaks I had to use in my situation:

1) in my copy of the virtual machine, i had to run sqlplus sys as sysdba instead of sqlplus / as sysdba2) after tablespace and user creation, i had to execute:ALTER USER myuser QUOTA UNLIMITED ON mytablespace;to prevent this error on import:ORA-01950: no privileges on tablespace 'MYTABLESPACE'3) i got IMP-00038: Could not convert to environment character set's handle. After googling I decided to use impdp instead of imp.4) I got ORA-39087: directory name DATA_PUMP_DIR is invalid although the directory was defined, so I had to run in sqlplus also:CREATE DIRECTORY DUMPDIR AS '/u01/oracle/dumpdir';GRANT READ ON DIRECTORY DUMPDIR TO myuser;GRANT WRITE ON DIRECTORY DUMPDIR TO myuser; and specify DIRECTORY=DUMPDIR on impdp command line