This blog posting is about "executing operating system commands" from within
the Oracle database. Generally, there are two approaches for this.

You can use the built-in package
DBMS_SCHEDULER to create a job which launches an
operating system executable or a shell script. DBMS_SCHEDULER is built-in and
available in all database editions (including XE), so there are no special
prerequisites. A disadvantage is that you cannot access STDIN, STDOUT or
STDERR - so there is no "handle" for you to get the console output of your
operating system command.

As an alternative, you can use the
package for filesystem access and operating system commands,
which is based on the Java VM inside the database. It allows you to do the things
the built-in packages cannot do - like accessing console input and output or
retrieving folder contents as a virtual table.

In both variants, the actual operating system call is executed as the user
which also runs the Oracle processes. DBMS_SCHEDULER has allowed changing this for
a long time, but this is widely unknown. With the Java-based approach, it
was impossible to change the operating system user in the past - but the latest
database versions 11.2.0.4 and 12.1 allow this as a new feature. In this blog
posting I'll show how to execute an operating system command as a different
user - based on the most current 12.1 release of the Oracle database.

Change the operating system user for the DBMS_SCHEDULER approach

Beginning with version 12.1, the Oracle database contains the new package
DBMS_CREDENTIAL. Before Oracle 12.1, the corresponding
functions and procedures were part of DBMS_SCHEDULER itself (they are deprecated with 12.1).
With the CREATE_CREDENTIAL procedure, a new credential (a username/password pair)
is created in the database. A database user needs the CREATE CREDENTIAL
system privilege for this. The following PL/SQL call creates a credential named
CRED_TESTUSER containing username
and password for the Unix user testuser.
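A minimal sketch of such a call (the password is, of course, a placeholder):

```sql
begin
  dbms_credential.create_credential(
    credential_name => 'CRED_TESTUSER',
    username        => 'testuser',
    password        => 'secret'       -- placeholder
  );
end;
/
```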

This credential can then be used with DBMS_SCHEDULER. But in order to create
a job executing an operating system call, the system privilege CREATE EXTERNAL JOB
is needed - the simple CREATE JOB is not sufficient. The following PL/SQL code
creates the scheduler job JOB_TOUCH_FILE which executes some shell code - as
the user testuser.
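A sketch of such a job; the actual shell code (here: a simple touch, matching the job name) and the file path are made up for illustration:

```sql
begin
  dbms_scheduler.create_job(
    job_name            => 'JOB_TOUCH_FILE',
    job_type            => 'EXECUTABLE',
    job_action          => '/bin/bash',
    number_of_arguments => 2,
    credential_name     => 'CRED_TESTUSER',
    enabled             => false
  );
  -- pass the shell code as arguments to /bin/bash -c
  dbms_scheduler.set_job_argument_value('JOB_TOUCH_FILE', 1, '-c');
  dbms_scheduler.set_job_argument_value('JOB_TOUCH_FILE', 2, 'touch /tmp/testfile');
  dbms_scheduler.enable('JOB_TOUCH_FILE');
end;
/
```

The credential_name parameter refers to the credential CRED_TESTUSER created before - this is what makes the job run as testuser.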

Change the operating system user for calls done with the database JVM

I'll show the Java example based on the above-mentioned
package for filesystem access and operating system commands.
Within that package, OS_COMMAND allows launching an executable. As already
mentioned, this package allows access to STDOUT, STDIN and STDERR - the following
example with the ls command illustrates this.
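A sketch, assuming the package exposes a function EXEC_CLOB which returns the command's console output as a CLOB (the function name and the listed path are illustrative and may differ in your version of the package):

```sql
-- run "ls" and get its console output back as a CLOB
select os_command.exec_clob('ls -la /tmp') from dual;
```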

As you can see, there is the directory tmpdir - which is owned by
testuser and which other Unix users cannot access. By default, an
operating system command executed via OS_COMMAND cannot access it either (it is run as the
owner of the Oracle database processes).

To change the user under which the external calls are made, there is the
SET_RUNTIME_EXEC_CREDENTIALS procedure within the DBMS_JAVA package.
It also stores a username/password pair. But, as opposed to DBMS_CREDENTIAL,
this credential is linked to a database user and not stored under its own name. Therefore,
only a DBA (SYS) can use that procedure. The following PL/SQL code does two things: First,
it maps the database user SCOTT to the Unix user testuser - so all external
calls done by SCOTT with Java in the database are executed as testuser.
Second, SCOTT needs the privilege to execute an external call at all - the
call to DBMS_JAVA.GRANT_PERMISSION takes care of this. Both calls have to be executed
as DBA (as SYS).
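A sketch of the two calls (the password is a placeholder; the very permissive &lt;&lt;ALL FILES&gt;&gt; should be narrowed to the executables you actually need):

```sql
begin
  -- map database user SCOTT to Unix user testuser
  dbms_java.set_runtime_exec_credentials('SCOTT', 'testuser', 'secret');
  -- allow SCOTT to execute operating system commands at all
  dbms_java.grant_permission(
    'SCOTT', 'SYS:java.io.FilePermission', '<<ALL FILES>>', 'execute');
end;
/
```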

As already pointed out, the mapping between SCOTT and the Unix credentials
for testuser is persisted in the database. It will remain even when
the session ends. So one database user can have only one Unix credential, for external
calls by the Java VM, at a time. That is an important difference between
the Java-based approach and DBMS_SCHEDULER.

This blog posting is about "analyzing" XML documents - in the database, with the
SQL language. I have blogged several times about how to extract information from
XML documents using the SQL functions XMLTABLE or EXTRACTVALUE (in earlier database versions). But
to work with these functions, one must know where the information of interest is located
within the XML document: the name of the XML tag, its position within
the hierarchy, its namespace and so on. Of course, there is always good and comprehensive
documentation (or an XML Schema) which contains this kind of information ... but
I'll proceed anyway ...

So, today you'll see SQL queries on XML documents which extract not the actual
content, but information about the XML structure. Perhaps this is useful for somebody -
sometime.

Within the SQL functions XMLQUERY or
XMLTABLE
(I hardly use XMLQUERY myself), XML tags are selected. For this selection, either
the XQuery or the simpler XPath language can be used. Beyond the pure navigational
features, XPath also contains some functions to obtain information about the XML structure:
name() retrieves the name of the current XML tag, namespace-uri() gets the
full XML namespace information, and there are some more. With this in mind, we can build
our first example: a SQL query which lists the names of all XML tags.
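A sketch with a small inline XML document (the document content is made up for illustration):

```sql
select x.tagname
  from xmltable(
         '//*'
         passing xmltype('<order><item><name>Pencil</name></item><item/></order>')
         columns tagname varchar2(30) path 'name(.)'
       ) x;
```

Wrapping this into SELECT DISTINCT tagname, or replacing the select list with COUNT(*), gives the distinct tag names or the tag count.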

For clarity, we pass the XML document directly into the XMLTABLE function as a literal. The
XPath expression //* selects all XML tags on all hierarchy levels. This
"intermediate XML result" then goes to the COLUMNS clause, which maps it to
the relational result set of the SQL query. In this case, the COLUMNS clause applies the
name() function to each selected node. You'll see a row for each XML tag - including
duplicates. To get only distinct tag names, we can utilize SQL features (SELECT DISTINCT); and
counting the XML tags is also a very easy task now (these are the things I really like
in the Oracle database - combine functionality and do awesome things).

That is very useful when you don't know all the namespaces used within your XML document. In most
cases these are declared at the beginning, but this is not mandatory. A namespace declaration can
occur at any position in your XML document, so perhaps there is an XML tag from an unknown namespace - at
the very end of the XML document ... you can't be sure. Of course, in practice we always have an XML schema or a documentation, so this is all theoretical ...

Next, we'll extract information about the context of an XML tag. XPath allows
navigating not only from a parent to a child, but also from a child to a parent - or from
a tag to its sibling. Whereas the child axis is the default, there are other axes to
navigate along. The following SQL query will use the parent axis in order to retrieve
the name of each XML tag's parent. Since this is a bit more complex, we don't use
pure XPath any more - we'll utilize XQuery syntax instead.
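A sketch, again with a small made-up inline document:

```sql
select x.tagname, x.parent
  from xmltable(
         'for $i in //* return
            <r><t>{local-name($i)}</t><p>{local-name($i/..)}</p></r>'
         passing xmltype('<order><item><name>Pencil</name></item></order>')
         columns tagname varchar2(30) path 't',
                 parent  varchar2(30) path 'p'
       ) x;
```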

I don't want to provide a full explanation of XQuery here (there are plenty of tutorials on the web),
but you might see how it works. The XQuery expression "builds" another XML document (for each
XML tag) with
the simple structure "<r><t>{tag-name}</t><p>{parent-tag-name}</p></r>".
This is passed to the COLUMNS clause. The XPath expressions
in the COLUMNS clause now select items from the "intermediate" XML document, so they
navigate within the <r>, <t> and <p> tags. Looking at this query's results and
knowing that we are within the Oracle database, we should know what the next step is ...

Another nice instance of combining technology: a SQL query gives an overview of the
structure of an XML document. Other technologies would require you to install tools or
to author complex procedural code - and here we are doing all this with a query language.
Let's test this on a more complex XML document - I'll take one which is already present
in an Oracle database: the configuration of the XML DB protocol servers (FTP, HTTP, WebDAV)
is stored as XML within the database. We can retrieve it with the PL/SQL function
DBMS_XDB.CFG_GET.
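For example, the tag-name query from above, now applied to the protocol server configuration (local-name() is used here, since the configuration document lives in its own namespace):

```sql
select x.tagname
  from xmltable(
         '//*'
         passing dbms_xdb.cfg_get()
         columns tagname varchar2(64) path 'local-name(.)'
       ) x;
```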

We now get an overview of the hierarchy of the XML DB protocol server configuration; without
a tool - just with SQL. Using the SYS_CONNECT_BY_PATH function, we can even let the database generate a full path to each XML tag.
And with a WHERE clause we can look for specific tags of interest. The following query shows this:
we are interested in the full path to a tag named session-timeout.
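A sketch of such a query. Note that the hierarchy is joined via tag names here, so if the same tag name occurs under different parents, additional paths may show up - a more robust solution would need unique node ids:

```sql
with tags as (
  select x.tagname, x.parent
    from xmltable(
           'for $i in //* return
              <r><t>{local-name($i)}</t><p>{local-name($i/..)}</p></r>'
           passing dbms_xdb.cfg_get()
           columns tagname varchar2(64) path 't',
                   parent  varchar2(64) path 'p'
         ) x
)
select distinct sys_connect_by_path(tagname, '/') as tagpath
  from tags
 where tagname = 'session-timeout'
 start with parent is null
connect by nocycle prior tagname = parent;
```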

OK folks, enough for today. When playing with these features, keep in mind that
the database does not optimize these kinds of XML queries for performance - so they might take longer
for large XML documents or large document sets. But with that in mind, nothing should
prevent you from analyzing your XML documents - with SQL!

One of the new things in Oracle 12c is that more pure PL/SQL types (compared to Oracle 11g)
are accessible from SQL queries or DML statements. Pure PL/SQL types are record, boolean
or "INDEX BY" types; the latter are also called associative arrays. So, beginning with 12.1,
we can access PL/SQL INDEX BY tables with a SQL query - according to
the documentation. This blog posting describes how it works (and what does not work).

First, the setup. We need (of course) a PL/SQL function returning a PL/SQL INDEX BY table.
And in order to create such a function, we need a package specification with the
INDEX BY table definition. To make it easy, we put the function into the package as well.
Reading the documentation carefully, we observe the first restriction for this feature:
the INDEX BY table must be declared as INDEX BY PLS_INTEGER.
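A sketch of the setup (all names and contents are made up), followed by a first attempt to query the function from plain SQL:

```sql
create or replace package idx_pkg as
  -- INDEX BY PLS_INTEGER is required for this feature
  type t_tab is table of varchar2(30) index by pls_integer;
  function get_tab return t_tab;
end idx_pkg;
/
create or replace package body idx_pkg as
  function get_tab return t_tab is
    l_tab t_tab;
  begin
    l_tab(1) := 'SALES';
    l_tab(2) := 'ACCOUNTING';
    return l_tab;
  end;
end idx_pkg;
/
-- now try to query it directly from plain SQL:
select column_value from table(idx_pkg.get_tab());
```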

Does not work. Bummer. OK, read the documentation again. Aah, there is the other
restriction: A PL/SQL function cannot return a value of a PL/SQL-only type to SQL.
But what's the value of that new feature then ...? And how can it be used?

So, PL/SQL INDEX BY tables are accessible to a SQL statement
within a PL/SQL procedure, function or anonymous block (use the TABLE() operator).
This can be very useful: think about joining your PL/SQL INDEX BY table to a
database table, as follows ...
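A sketch, assuming a package idx_pkg with an INDEX BY PLS_INTEGER table type t_tab and a function get_tab returning it (names made up), plus the classic DEPT demo table:

```sql
declare
  l_tab idx_pkg.t_tab;
begin
  l_tab := idx_pkg.get_tab();
  -- inside PL/SQL, the TABLE() operator accepts the INDEX BY table
  for r in (
    select d.dname
      from dept d
      join table(l_tab) t on d.dname = t.column_value
  ) loop
    dbms_output.put_line(r.dname);
  end loop;
end;
/
```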

In Oracle 12.1, PL/SQL and SQL work more closely together. We can now use INDEX BY tables within
SQL queries, as long as the query is executed within a PL/SQL context. So
INDEX BY tables are still PL/SQL, not SQL data types. We still have some barriers
for PL/SQL data types in SQL - but the development direction is good.

This blog posting is about XML namespaces, and how to deal with them when
processing XML with SQL functions. This often leads to confusion and questions -
the following sections will try to shed some light on this. We'll start with a simple example ...
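A sketch: a small document with a default namespace, and an XMLTABLE query which (seemingly correctly) addresses the tags by name:

```sql
select t.val
  from xmltable(
         '/document/item'
         passing xmltype('<document xmlns="a"><item>Hello</item></document>')
         columns val varchar2(30) path 'text()'
       ) t;

-- no rows selected
```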

But, although all tag names are correct, it does not work. The reason
is the XML namespace declaration at the beginning of the document.

<document xmlns="a">...</document>

XML has the concept of namespaces. An XML tag is identified not only by its
name (here: document), but also by its namespace. Using this, XML allows
tags having the same name, but different semantics. By considering both name and
namespace, an XML engine is able to differentiate between the tags. In practice, namespaces
are named with URLs like http://mycompany.com/myproject/mynamespace
(these URLs don't have to exist physically - it's just a name).
Instead of a URL, we can also use any character string for a namespace, like in the
example above: "a" (which is not likely to be globally unique, of course).
If a URL or a long string is used for a namespace, we seem to have a problem: we
need to add this string to each and every XML tag, don't we ...?

Namespace prefixes map long namespaces to short prefixes

The default namespace is used for tags without a prefix

Both are declared with the xmlns attribute: xmlns="a" declares "a" as
the default namespace, whereas xmlns:pr="a" maps "a" to the namespace prefix "pr".
In our example, a default namespace is used.
As an alternative, we can declare a namespace prefix. Then we need to add it to the
XML tags - like in the following XML document.
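For instance, using the prefix pr from above:

```xml
<pr:document xmlns:pr="a"><pr:item>Hello</pr:item></pr:document>
```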

Both XML documents have exactly the same semantics - for an XML parser there is
no difference. And that's not all: XML tags can also have no namespace at all. Some refer
to this as the null or empty namespace. This is
the case when the xmlns attribute is not present, or when the
XML document has no default namespace but contains XML tags without a namespace prefix.
Therefore we can have XML tags with and without a namespace and - of course -
multiple different namespaces - all within the same XML document.

We need to take care of this when authoring XMLTABLE queries. In the above example,
all XML tags were part of the "a" namespace. But the XMLTABLE query did not
contain a namespace declaration - so it looked for tags without a namespace. Since
the XML document does not contain any such tags, the empty query result is actually
correct. To get the query working, we need to add the XMLNAMESPACES clause.
A working solution can look like this ...
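A sketch of a working query for the default-namespace document:

```sql
select t.val
  from xmltable(
         xmlnamespaces(default 'a'),
         '/document/item'
         passing xmltype('<document xmlns="a"><item>Hello</item></document>')
         columns val varchar2(30) path 'text()'
       ) t;
```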

The XMLNAMESPACES clause within an XMLTABLE query has the same meaning
as the xmlns attribute within an XML document: a namespace
is mapped to a prefix or declared as default. The prefixes
and defaults within a query are independent from the prefixes and defaults in the document -
only the namespaces themselves (here "a") are important. With this in mind,
we can now also imagine documents with tags from multiple namespaces.
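For instance, a document like this (content made up):

```xml
<document xmlns="a" xmlns:ns1="b" xmlns:ns2="c">
  <ns1:item>
    <ns2:value>Hello</ns2:value>
  </ns1:item>
</document>
```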

This XML document is semantically different from the previous ones, since the XML tags
belong to different namespaces: a, b and c. a is
declared as the default (the XML tag has no prefix); b and
c are mapped to the prefixes ns1 and ns2. A working XMLTABLE
query can look as follows.
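A sketch - the query deliberately uses prefixes (x1, x2) different from those in the document, to show that only the namespaces themselves matter:

```sql
select t.val
  from xmltable(
         xmlnamespaces(default 'a', 'b' as "x1", 'c' as "x2"),
         '/document/x1:item/x2:value'
         passing xmltype('<document xmlns="a" xmlns:ns1="b" xmlns:ns2="c">' ||
                         '<ns1:item><ns2:value>Hello</ns2:value></ns1:item></document>')
         columns val varchar2(30) path 'text()'
       ) t;
```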

As already said, the prefixes used in the XMLTABLE query can be different from the prefixes
in the document: the namespace itself is important - not the prefix. We also do not need to
use a default namespace - another possible solution would be to have three different namespace prefixes.
Taking care of this, XML documents with namespaces are no problem at all.

In this blog posting, I'd like to write something about my first experiences with the
Oracle database and node.js. This will not be a tutorial on how to install
and configure node.js, or how to install new packages - there are plenty of tutorials
of this kind available on the internet. This posting will focus on how to
work with the Oracle database in node.js programs.

I like to describe node.js as "JavaScript on the server". It is based on Chrome's
JavaScript engine: a node.js script is placed in a normal file and then
executed by node.js - typically launched from the command line. The programming
language is JavaScript; node.js is extensible, provides a package manager (npm) and
has a vibrant community which drives and develops the function library. Compared to
other programming languages, there is a huge difference:
node.js is asynchronous, event-driven and non-blocking for I/O-bound
functions; I'll go into more detail during the blog posting.

To connect to the Oracle database, a program uses a driver - in the Java world this
is JDBC, in the .NET world this is ODP.NET, and so on. For node.js, Oracle does not (yet)
provide a driver - so we have to look at what the community provides: and there is
the module node-oracle by
Joe Ferner, which is
available from GitHub.

The download contains a README which describes the installation of
the node-oracle module into an existing node.js environment pretty well. On my Linux system
this went smoothly and without problems.
After that we can author the first node.js program which does something with
the Oracle database. So, let's select the EMP table and print out the results.
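A sketch of such a script, assuming the node-oracle API with oracle.connect and connection.execute (the connection parameters are placeholders; column key casing follows what the driver returns for Oracle, here assumed uppercase):

```javascript
var oracle = require("oracle");

var connectData = {
  hostname: "localhost", port: 1521, database: "orcl",
  user: "scott", password: "tiger"          // placeholders
};

oracle.connect(connectData, function(err, connection) {
  if (err) { console.log("connect error:", err); return; }
  connection.execute("select empno, ename, sal from emp", [],
    function(err, results) {
      if (err) { console.log("query error:", err); return; }
      results.forEach(function(row) {
        console.log(row.EMPNO, row.ENAME, row.SAL);
      });
      connection.close();
    });
});

console.log("done");
```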

As said earlier, node.js is event-driven. The oracle.connect function, which
opens the connection to the database, is
I/O-bound (network communication with the Oracle database). So node.js executes it
asynchronously. During its execution, node.js does not block and
continues with the next statement. Upon completion, oracle.connect
executes the callback function which has been provided as a parameter (in the example this is an
anonymous function). So, node.js code looks different from Java, C or PHP code. Due to
the asynchronous and non-blocking architecture, developers work with callback
functions.

console.log("done") is the very last statement within the script - but in the
output it seems to have been executed first.
That is due to the non-blocking characteristics of node.js. If you examine
the script carefully, you'll notice that at the top level there are only two statements. The first
is oracle.connect, the second is console.log. The other lines are callback
functions. So the script starts by invoking oracle.connect, the database connection is being
established - and while this is being done, node.js continues by executing the console.log statement.
So we first see "done" and then the actual data from the database.

The next node.js script is supposed to load files into the database. I want to
provide a directory - the node.js script loads all text files within that directory
into a given database table, DOCUMENT_TABLE.

create table document_table (filename varchar2(200), document clob);

The node.js script is based on the previous one: the nature of the task
is a bit more complex - and so is the script. The filesystem calls for
reading a directory and reading a file are I/O-bound as well - so they
have the same characteristics as the Oracle calls. We need to provide a
callback in all these cases, and for that reason we get this highly nested code. There
are approaches in the node.js community to
keep the code simpler, but this is another topic.
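A sketch of the script (directory taken from the command line; connection parameters are placeholders; error handling kept minimal):

```javascript
var oracle = require("oracle");
var fs = require("fs");
var path = require("path");

var connectData = {
  hostname: "localhost", port: 1521, database: "orcl",
  user: "scott", password: "tiger"          // placeholders
};
var dir = process.argv[2];

oracle.connect(connectData, function(err, connection) {
  if (err) { console.log("connect error:", err); return; }
  fs.readdir(dir, function(err, files) {
    if (err) { console.log("readdir error:", err); return; }
    files.forEach(function(file) {
      fs.readFile(path.join(dir, file), "utf8", function(err, content) {
        if (err) { console.log("read error:", err); return; }
        connection.execute(
          "insert into document_table (filename, document) values (:1, :2)",
          [file, content],
          function(err) {
            if (err) { console.log("insert error:", err); }
            else     { console.log(file + " loaded."); }
          });
      });
    });
  });
});
```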

The script opens the connection to the Oracle database, walks through the directory,
reads each file and performs a SQL INSERT on the Oracle database. So far, so good.
All the work is being done within one database connection, so on the
database side we utilize one CPU. The database server typically has more resources
available - so what can we do to make this more scalable ...?

We need more database connections - so we change the order within the script. We
first walk through the directory and read each file, then open a database
connection in order to perform the INSERT operation. The code now looks like this.
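A sketch of that variant - a database connection is now opened per file:

```javascript
var oracle = require("oracle");
var fs = require("fs");
var path = require("path");

var connectData = {
  hostname: "localhost", port: 1521, database: "orcl",
  user: "scott", password: "tiger"          // placeholders
};
var dir = process.argv[2];

fs.readdir(dir, function(err, files) {
  if (err) { console.log("readdir error:", err); return; }
  files.forEach(function(file) {
    fs.readFile(path.join(dir, file), "utf8", function(err, content) {
      if (err) { console.log("read error:", err); return; }
      // one database connection per file ...
      oracle.connect(connectData, function(err, connection) {
        if (err) { console.log("connect error:", err); return; }
        connection.execute(
          "insert into document_table (filename, document) values (:1, :2)",
          [file, content],
          function(err) {
            if (err) { console.log("insert error:", err); }
            else     { console.log(file + " loaded."); }
            connection.close();
          });
      });
    });
  });
});
```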

In the "traditional" world, this would work without any problems - perhaps it would
be slower, because all these database connections are being opened and closed - but it
would work. But the node.js script outputs nothing ... and after a while
it stops with an error message (note that the script might run successfully on a
database with huge resources). What happens here ...?

Again, node.js is event-driven. Creating an Oracle connection is a
very expensive task - while the first one is being established, the script continues
walking through the directory, reading files and spawning new Oracle connections.
Before the first file has been stored in the database, node.js has
walked completely through the directory and requested a database connection for
each file. And as soon as the database limits have been
reached, the script stops.

We need a connection pool. But node.js does not provide a connection pool for
database connections. Luckily, the database does: Oracle 11.1 introduced the
Database Resident Connection Pool, which has been developed for exactly
these cases. This connection pool (which is not the Shared Server architecture)
is a pool of dedicated server connections. When a client establishes a database
connection, no new server process is created - one is taken from
the pool. And when the client closes the connection, the server process is not
destroyed; it is placed back into the pool. And as with all connection pools,
one can configure settings like the minimum or maximum number of connections.

The Database Resident Connection Pool is administered with the
PL/SQL package DBMS_CONNECTION_POOL. You need DBA privileges in order to work
with it, and if you are using Oracle 12c and the Multitenant architecture, you
need to work in the container database. Now, start the connection pool with default
settings.
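With default settings, this is a single call (as SYS):

```sql
begin
  dbms_connection_pool.start_pool();
end;
/
```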

To actually use the connection pool in our node.js script, we just need to
make a tiny change to the database connection parameters. Note that
this works in SQL*Plus, Java or any other Oracle client as well.
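A sketch - assuming the driver passes the database field through as the EZConnect service name, appending :POOLED (the EZConnect syntax for requesting a DRCP connection) is all that is needed:

```javascript
// the only change compared to before: request a pooled (DRCP) connection
var connectData = {
  hostname: "localhost", port: 1521,
  database: "orcl:POOLED",                // ":POOLED" requests a DRCP connection
  user: "scott", password: "tiger"        // placeholders
};
```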

The key to scalable node.js scripts working with the Oracle database is
a largely unknown database feature: the Database Resident Connection Pool. Every node.js
program working with the Oracle database should use it.

I did some further experiments with node.js - but for today this should be
enough information. I think node.js can be nicely used as a scripting language.
For instance, converting a folder of XML documents to JSON and uploading them
to the database is just a minor change to the above script and will
lead to only a few more lines of code ...

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle Corporation or Oracle Germany. The views and opinions expressed by visitors to this blog are theirs and do not necessarily reflect mine.