This blog posting is about XML namespaces and how to deal with them when
processing XML with SQL functions. This topic often leads to confusion and questions -
the following sections will try to shed some light on it. We'll start with a simple example ...
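A simple setup could look like the following sketch. The table and element names here are made up for illustration: a table MYTABLE with an XMLTYPE column DOC, and a first, naive XMLTABLE query that references the tags by name only.

```sql
-- Hypothetical document stored in MYTABLE.DOC:
--
--   <document xmlns="a">
--     <item>
--       <name>First</name>
--     </item>
--   </document>
--
-- Naive query: tag names only, no namespace declaration
select x.name
from   mytable t,
       xmltable('/document/item'
                passing t.doc
                columns name varchar2(100) path 'name') x;
```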

But, although all tag names are correct, it does not work. The reason
is the XML namespace declaration at the beginning of the document.

<document xmlns="a">...</document>

XML has the concept of namespaces. An XML tag is identified not only by its
name (here: document), but also by its namespace. This allows XML to have
tags with the same name but different semantics: by considering both name and
namespace, an XML engine is able to differentiate between the tags. In practice, namespaces
are typically named with URLs like http://mycompany.com/myproject/mynamespace
(these URLs don't have to exist physically - it's just a name).
Instead of a URL, we can also use any character string as a namespace, like in the
example above: "a" (which is not likely to be globally unique, of course).
If a URL or another long string is used as a namespace, we seem to have a problem: we
would need to add this string to each and every XML tag, wouldn't we ...?
Fortunately, there are two concepts that avoid this:

Namespace prefixes map long namespaces to short prefixes

The default namespace applies to all tags without a prefix

Both are declared with the xmlns attribute: xmlns="a" declares "a" as
the default namespace, whereas xmlns:pr="a" maps "a" to the namespace prefix "pr".
In our example, a default namespace is used.
As an alternative, we can declare a namespace prefix. Then we need to add it to the
XML tags - like in the following XML document.
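The same document as above, rewritten with a namespace prefix (element names carried over from the earlier sketch):

```xml
<pr:document xmlns:pr="a">
  <pr:item>
    <pr:name>First</pr:name>
  </pr:item>
</pr:document>
```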

Both XML documents have exactly the same semantics - for an XML parser there is
no difference. And that's not all: XML tags can also have no namespace at all. Some refer
to this as the null or empty namespace. This is
the case when the xmlns attribute is not present, or when the
XML document has no default namespace but contains XML tags without a namespace prefix.
Therefore we can have XML tags with and without a namespace - and, of course,
multiple different namespaces - all within the same XML document.

We need to keep this in mind when authoring XMLTABLE queries. In the above example,
all XML tags were part of the "a" namespace. But the XMLTABLE query did not
contain a namespace declaration - so it looked for tags without a namespace. Since
the XML document contains no such tags, the empty query result is actually correct.
To get the query working, we need to add the XMLNAMESPACES clause. A working solution can look
like this ...
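A sketch of the corrected query, again using the hypothetical MYTABLE from above - the only change is the XMLNAMESPACES clause declaring "a" as the default namespace:

```sql
select x.name
from   mytable t,
       xmltable(xmlnamespaces(default 'a'),
                '/document/item'
                passing t.doc
                columns name varchar2(100) path 'name') x;
```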

The XMLNAMESPACES clause within an XMLTABLE query has the same meaning
as the xmlns attribute within an XML document: a namespace
is mapped to a prefix or declared as the default.
The prefixes
and defaults within a query are independent of the prefixes and defaults in the document -
only the namespaces themselves (here "a") are important.
With this in mind,
we can now also imagine documents with tags from multiple namespaces.
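Such a document could look as follows (element names again carried over from the earlier sketch):

```xml
<document xmlns="a" xmlns:ns1="b" xmlns:ns2="c">
  <ns1:item>
    <ns2:name>First</ns2:name>
  </ns1:item>
</document>
```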

This XML document is semantically different from the previous ones, since each XML tag
belongs to a different namespace: a, b and c. a is
declared as the default (the XML tag has no prefix); b and
c are mapped to the prefixes ns1 and ns2. A working XMLTABLE
query can look as follows.
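A sketch of such a query - note that the prefixes chosen here (x1, x2) deliberately differ from those in the document (ns1, ns2):

```sql
select x.name
from   mytable t,
       xmltable(xmlnamespaces(default 'a', 'b' as "x1", 'c' as "x2"),
                '/document/x1:item'
                passing t.doc
                columns name varchar2(100) path 'x2:name') x;
```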

As already said, the prefixes used in the XMLTABLE query can be different from the prefixes
in the document: the namespace itself is important - not the prefix. We also do not need to
use a default namespace - another possible solution would be to use three different namespace prefixes.
Keeping this in mind, XML documents with namespaces are no problem at all.

In this blog posting, I'd like to write something about my first experiences with the
Oracle database and node.js. This will not be a tutorial on how to install
and configure node.js, or how to install new packages - there are plenty of tutorials
of this kind available on the internet. This posting will focus on how to
work with the Oracle database in node.js programs.

I like to describe node.js as "JavaScript on the server". It is based on Chrome's
JavaScript engine: a node.js script is placed in a plain file and then
executed by node.js - typically launched from the command line. The programming
language is JavaScript; node.js is extensible, provides a package manager (npm) and
has a vibrant community which drives and develops the function library. Compared to
other programming environments, there is one huge difference:
node.js is asynchronous, event-driven and non-blocking for I/O bound
functions; I'll go into more detail during this blog posting.

To connect to the Oracle database, a program uses a driver - in the Java world this
is JDBC, in the .NET world this is ODP.NET, and so on. For node.js, Oracle does not (yet)
provide a driver - so we have to look at what the community provides: and there is
the module node-oracle by
Joe Ferner, which is
available from GitHub.

The download contains a README which describes the installation of
the node-oracle module into an existing node.js environment pretty well. On my Linux system
this went smoothly and without problems.
After that we can author the first node.js program which does something with
the Oracle database. So, let's select the EMP table and print out the results.

As said earlier, node.js is event-driven. The oracle.connect function, which
opens the connection to the database, is
I/O bound (network communication with the Oracle database). So node.js executes it
asynchronously. During its execution, node.js does not block and
continues with the next statement. Upon completion, oracle.connect
executes the callback function which has been provided as a parameter (in the example this is an
anonymous function). So node.js code looks different from Java, C or PHP code: due to
the asynchronous and non-blocking architecture, developers work with callback
functions.

console.log("done") is the very last statement within the script - but in the
output it seems to have been executed first.
That is due to the non-blocking characteristics of node.js. If you examine
the script carefully, you'll notice that at the top level there are only two statements. The first
is oracle.connect, the second is console.log. The other lines are callback
functions. So the script starts by invoking oracle.connect; the database connection is being
established - and while this is being done, node.js continues executing the console.log statement.
So we first see "done" and then the actual data from the database.
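The same effect can be illustrated without a database. In the following sketch, asyncWork stands in for an I/O bound call like oracle.connect: the asynchronous call returns immediately, so "done" is printed before the callback output.

```javascript
var order = [];

function asyncWork(callback) {
  // setImmediate defers the callback to a later event-loop turn,
  // like a non-blocking I/O call would
  setImmediate(function () {
    callback(null, "row data");
  });
}

asyncWork(function (err, result) {
  order.push(result);
  console.log(result);   // printed second
});

order.push("done");
console.log("done");     // printed first
```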

The next node.js script is supposed to load files into the database. I want to
provide a directory - the node.js script loads all text files within that directory
into a given database table, DOCUMENT_TABLE.

create table document_table (filename varchar2(200), document clob);

The node.js script is based on the previous one. The nature of the task
is a bit more complex - and so is the script. The filesystem calls for
reading a directory and reading a file are I/O bound as well - so they
have the same characteristics as the Oracle calls, and we need to provide a
callback in all cases. For that reason we get this highly nested code. There
are approaches in the node.js community to
keep such code simpler, but that is another topic.

The script opens the connection to the Oracle database, walks through the directory,
reads each file and performs a SQL INSERT into the Oracle database. So far, so good.
Here all the work is being done within one database connection, so on the
database side we utilize only one CPU. Most database servers have more resources
available - so what can we do to make this more scalable ...?

We need more database connections - so we change the order within the script. We
first walk through the directory, read each file, and then open a database
connection in order to perform the INSERT operation. The code now looks like this.

In the "traditional" world, this would work without any problems - perhaps it would
be slower, because all these database connections are being opened and closed - but it
would work. But the node.js script outputs nothing ... and after a while
it stops with an error message (note that the script might run successfully on a
database with huge resources). What happens here ...?

Again, node.js is event-driven. Creating an Oracle connection is a
very expensive task - while the first one is still being established, the script continues
walking through the directory, reading files and spawning new Oracle connections.
Before the first file has been stored in the database, node.js has
walked completely through the directory and requested a database connection for
each file. And as soon as the database limits have been
reached, the script stops.

We need a connection pool. But node.js does not provide a connection pool for
database connections. Luckily, the database does: Oracle 11.1 introduced the
Database Resident Connection Pool, which has been developed for exactly
these cases. This connection pool (which is not the Shared Server architecture)
is a pool of dedicated server connections. When a client establishes a database
connection, no new server process is created - one is taken from
the pool. And when the client closes the connection, the server process is not
destroyed - it is placed back into the pool. As with all connection pools,
one can configure settings like the minimum or maximum number of connections.

The Database Resident Connection Pool is administered with the
PL/SQL package DBMS_CONNECTION_POOL. You need DBA privileges in order to work
with it, and if you are using Oracle 12c and the Multitenant architecture, you
need to work in the container database. Now, start the connection pool with default
settings.

To actually use the connection pool in our node.js script, we just need to
make a tiny change within the database connection parameters. Note that
this works in SQL*Plus, Java or any other Oracle client as well.

The key to scalable node.js scripts working with the Oracle database is
a largely unknown database feature: the Database Resident Connection Pool. Every node.js
program working with the Oracle database should use it.

I did some further experiments with node.js - but for today this should be
enough information. I think node.js can be used nicely as a scripting language.
For instance, converting a folder of XML documents to JSON and uploading them
to the database is just a minor change to the above script and will
lead to only a few more lines of code ...

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle Corporation or Oracle Germany. The views and opinions expressed by visitors to this blog are theirs and do not necessarily reflect mine.