The Anti-Kyte

Is there any such thing as ANSI Standard SQL ?
Lots of databases claim to conform to this standard. Recent experience tends to make me wonder whether it's more of just a basis for negotiation.
This view is partly the result of having to juggle SQL between three different SQL parsers in the Cloudera Hadoop infrastructure, each with their own “quirks”.
It’s worth remembering however, that SQL differs across established Relational Databases as well, as a recent question from Simon (Teradata virtuoso and Luton Town Season Ticket Holder) demonstrates :

Is there an Oracle equivalent of the Teradata LIKE ANY operator when you want to match against a list of patterns, for example :

like any ('%a%', '%b%')

In other words, can you do a string comparison, including wildcards, within a single predicate in Oracle SQL ?

The short answer is yes, but the syntax is a bit different….

The test table

We’ve already established that we’re not comparing apples with apples, but I’m on a bit of a health kick at the moment, so…

create table fruits as
select 'apple' as fruit from dual
union all
select 'banana' from dual
union all
select 'orange' from dual
union all
select 'lemon' from dual
/
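
For the record, one way of matching against a list of patterns in a single predicate is REGEXP_LIKE. I should stress that this is just one possible approach rather than necessarily the syntax the rest of the post settles on, but for the '%a%' and '%b%' patterns above it does the job :

select fruit
from fruits
where regexp_like( fruit, 'a|b')
/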

The stand-off between Apple and the FBI has moved on. In essence both sides have taken it in turns to refuse to tell each other how to hack an iPhone.

Something else that tends to tell little or nothing in the face of repeated interrogation is SYS_CONTEXT('userenv', 'current_sql').
If you're fortunate enough to be running on Enterprise Edition however, a Fine Grained Auditing Policy will loosen its tongue.

Consider the following scenario.
You’ve recently got a job as a database specialist with Spectre.
They’ve been expanding their IT department recently as the result of their “Global Surveillance Initiative”.

There’s not much of a view from your desk as there are no windows in the hollowed out volcano that serves as the Company’s HQ.
The company is using Oracle 12c Enterprise Edition.

Everything seems to be going along nicely until you suddenly get a “request” from the Head of Audit, a Mr Goldfinger.
The requirement is that any changes to employee data in the HR system are recorded, together with the statement executed to change each record.
Reading between the lines, you suspect that Mr White – head of HR – is not entirely trusted by the hierarchy.

Whilst journalling triggers are common enough, capturing the actual SQL used to make DML changes is a bit more of a challenge.
Explaining this to Mr Goldfinger is unlikely to be a career-enhancing move. You’re going to have to be a bit creative if you want to avoid the dreaded “Exit Interview” (followed by a visit to the Piranha tank).

First of all though….

Fine Grained Auditing Configuration

You need to do a quick check to make sure that Fine Grained Auditing is available and configured in the way you would expect.

Access to Fine Grained Auditing

FGA is a feature of Oracle Enterprise Edition.
If you were working on any other edition of the database, Oracle would tell you that FGA is not enabled. For example, running the following on Oracle Express Edition 11g…
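
The sort of thing that falls over is an attempt to create an FGA policy with DBMS_FGA. This is only a sketch – the object and policy names are placeholders of mine :

begin
    dbms_fga.add_policy(
        object_schema => 'HR',
        object_name => 'EMPLOYEES',
        policy_name => 'SNOOP_ON_EMPLOYEES',
        statement_types => 'INSERT, UPDATE, DELETE');
end;
/

…which simply results in an error telling you that the feature is not enabled.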

If you don’t happen to work for a worldwide crime syndicate and/or don’t have access to an Enterprise Edition database, you can still have a play around by means of a Developer Day Virtual Box image.

Unified Auditing

The other thing you need to check is just where audit records are going to be written to. This is not so much a requirement for the solution being implemented here, but it is relevant to some of the examples that follow.

By default, unified auditing is not implemented in 12c and you can confirm this by running :

select value
from v$option
where parameter = 'Unified Auditing'
/

If the query returns FALSE, then Unified Auditing has not been enabled.
Otherwise, it’s probably worth taking a look at the documentation to see how this affects auditing behaviour in the database.

Initialization Parameters

Assuming Unified Auditing has not been configured, the location of the audit records will be dictated by the AUDIT_TRAIL initialization parameter. You can check this value as follows :

select value
from v$parameter
where name = 'audit_trail'
/

If the value is set to DB or DB, EXTENDED, then any FGA policies should write to the tables mentioned below.

Now to take a closer look at FGA…

How long before SYS_CONTEXT cracks ?

To test exactly when you will be able to retrieve the DML statement you’re interested in, you can knock up a quick test.
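
For this you need a throwaway table to hang a trigger off. I'm guessing at the shape of the original here – any small table will do :

create table trigger_messages(
    message varchar2(100))
/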

create or replace trigger trg_msg
    for insert or update or delete on trigger_messages
    compound trigger

    l_action varchar2(10);

    before statement is
    begin
        l_action := case when inserting then 'INSERT' when updating then 'UPDATE' else 'DELETE' end;
        dbms_output.put_line('Before Statement '||l_action);
        dbms_output.put_line( nvl( sys_context('userenv', 'current_sql'), 'My lips are sealed'));
    end before statement;

    before each row is
    begin
        dbms_output.put_line('Before Row '||l_action);
        dbms_output.put_line( nvl( sys_context('userenv', 'current_sql'), 'My lips are sealed'));
    end before each row;

    after each row is
    begin
        dbms_output.put_line('After Row '||l_action);
        dbms_output.put_line( nvl( sys_context('userenv', 'current_sql'), 'My lips are sealed'));
    end after each row;

    after statement is
    begin
        dbms_output.put_line('After Statement '||l_action);
        dbms_output.put_line( nvl( sys_context('userenv', 'current_sql'), 'My lips are sealed'));
    end after statement;
end trg_msg;
/
/

Next up, you need a procedure to serve as a handler for a Fine Grained Auditing event. The reason for this will become apparent when we run the test. Note that the signature for an FGA handler procedure is mandated :
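
In other words, whatever you call the procedure, it has to accept these three parameters. As a minimal sketch (the name and the DBMS_OUTPUT calls are just mine, for the purposes of the test) :

create or replace procedure fga_whistleblower(
    object_schema in varchar2,
    object_name in varchar2,
    policy_name in varchar2)
is
begin
    dbms_output.put_line('FGA policy '||policy_name||' has fired');
    dbms_output.put_line( nvl( sys_context('userenv', 'current_sql'), 'My lips are sealed'));
end fga_whistleblower;
/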

It’s worth pausing at this point to note that SYS_CONTEXT can report up to 32k of a statement.
It does this by splitting the statement into eight 4k chunks, available in the USERENV context variables CURRENT_SQL, CURRENT_SQL1…CURRENT_SQL7.
It also provides the length of the statement it currently holds in the CURRENT_SQL_LENGTH variable.
Therefore, you may consider having a 32k varchar statement column in the audit table (if extended data types are enabled on your database), or even a column for the contents of each of these variables.
For the sake of simplicity, plus the fact that none of the examples here are very large, you decide to stick with just the one 4k varchar column to hold the statement.

In the real world this would probably be in a package, but hey, you’re working for Spectre.

Now we need a handler for the FGA policy that we’re going to implement. In order for the context values that are captured to be accessible to the trigger, this handler is going to be part of a package which includes a couple of package variables :
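
Something along these lines, say. The package, variable and procedure names are all invented for illustration, but the shape is the important bit : a couple of package variables to hold what the handler captures, plus the handler itself with its mandated signature.

create or replace package spectre_audit as
    -- package variables - populated by the FGA handler, read later by the trigger
    g_statement varchar2(4000);
    g_statement_length number;

    procedure capture_statement(
        object_schema in varchar2,
        object_name in varchar2,
        policy_name in varchar2);
end spectre_audit;
/

create or replace package body spectre_audit as
    procedure capture_statement(
        object_schema in varchar2,
        object_name in varchar2,
        policy_name in varchar2)
    is
    begin
        -- stash the statement (and its length) so that the trigger can pick them up
        g_statement := sys_context('userenv', 'current_sql');
        g_statement_length := sys_context('userenv', 'current_sql_length');
    end capture_statement;
end spectre_audit;
/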

Due to the fact that the FGA policy is not fired until after an AFTER ROW trigger for a DELETE, we are only guaranteed to capture the CURRENT_SQL value in an AFTER STATEMENT trigger.
The upshot is that we’re left with a PL/SQL array which is not constrained by a LIMIT clause. In these circumstances it’s not too much of an issue, Spectre has quite a small number of employees…er…associates, so you’re not likely to end up with an array large enough to cause memory issues.
On a potentially larger volume of records you may well consider splitting the INSERT and UPDATE portions of the trigger so that you can limit the size of the arrays generated by these operations. For DELETEs however, it appears that we may well be stuck with this approach.
On a not entirely unrelated subject, Jeff Kemp has an interesting method of speeding up Journalling Triggers.

We live in interesting times. As I write, the political life of a great nation is currently in thrall to a wealthy right-wing populist with a rather distinctive hairstyle.
But enough about Boris Johnson.

For someone used to the way things are done in Oracle, Hadoop can be something of a culture shock.
My initial introduction to Hadoop and its "vibrant ecosystem" was some internally written documentation.
As with most technical documentation, it was written with the assumption of a certain level of prior knowledge on the part of the reader. For this particular reader, it proved to be an erroneous assumption.

After half an hour of struggling through this text, I was left wondering what you would use a Khafka Banana Oozie for.

Maybe it’s for killing zombie processes ? Let’s face it, with all that Java running on your system there are bound to be a few knocking around.
I’m a little hazy on my Zombie Lore, so I’m not entirely clear as to why a rapid application of fresh fruit would pose an existential threat to your average zombie. Maybe they’re potassium intolerant ?

There are a bewildering array of tools associated with Hadoop, many of which appear to provide almost identical functionality.
For example, a relational database traditionally requires only one SQL Engine…and I’ll be taking a look at two of them.

Moving from Oracle to Hadoop can feel rather like trading your car in for a box of Lego.
If the box does contain any instructions they seem to have been translated from Java into English…by someone who doesn't speak either. Note to reader : please substitute your own language as appropriate.

Fortunately, there are Hadoop distributions available which bundle the core tools required to get up and running. We'll gloss over the fact that these distributions – Cloudera and Hortonworks – don't contain exactly the same tools.

In my search to find a coherent explanation of how to use Hadoop, I’ve found that the VM provided by Cloudera, together with the introductory tutorial, is a valuable aid to familiarising myself with the basics.

To start with then, I’ll cover getting hold of the Cloudera VM and setting it up in VirtualBox.
Then I’ll go through some of the tools available and what they do.
I’ll do this from the perspective of an Oracle developer (mainly because I don’t have much option) and will point out the driftwood of familiarity that you might be able to cling to in order to stay afloat in your Data Lake.

What I’ll cover is :

The core components of Hadoop

HDFS commands

Transferring data between a relational database and hadoop using SQOOP

Querying structured data using Hive and Impala

Uploading data using Kite

Ready to Dive in ?

Cloudera VM setup for Virtualbox

Before we head off to get the VM, there are a couple of points worth mentioning.

First of all, the VM is running on a 64-bit version of Centos 6.4. Therefore you need to make sure that your host operating system ( i.e. the one that you’ll be running Virtual Box on) is also 64-bit.

Secondly, the minimum recommended memory allocation for the VM is 4GB.

Finally, I'm using the Cloudera Quickstart 5.5 VM. The behaviour of some of the tools covered here differs between 5.4 and 5.5.
The Cloudera VM can be downloaded from here.

You’ll need to fill in some details about yourself before moving on.

Once you’re at the downloads page, select VirtualBox from the Platform drop-down list.
VMs for other platforms are also available.

Once you hit the “Download Now” button you’ll be asked for some further details. However, these do not appear to be validated.

The download is quite chunky, just over 5GB.

Once it’s completed you should have a zip file called :

cloudera-quickstart-vm-5.5.0-0-virtualbox.zip

You can use a standard unzip utility to extract this file (e.g. Winzip on Windows, plain old unzip on Linux).
Once extracted, you’ll see a directory called cloudera-quickstart-vm-5.5.0-0-virtualbox which contains two files :

The MapReduce framework consists of a Mapper and a Reducer.
In SQL terms, the Mapper program applies a query predicate against the data you are querying – essentially, it does the filtering.
The Reducer then does any aggregation on the result set produced from the Mapper process.

Yarn

Yarn is the default job scheduler and resource manager. It facilitates parallel execution of MapReduce jobs.

HDFS

HDFS – Hadoop File System – is a distributed file system. The idea is that datafiles are replicated across multiple nodes (physical servers) in a cluster. Essentially, any program can run on any node. By replicating the data to each node, network latency is minimised for these programs.
It comes with its own set of commands which you can use interactively.
These appear to be largely a subset of those you’d find on a Linux OS.

The format of these commands is :

hadoop fs -command [some arguments]

However, the way that the VM is configured, we need to run these commands as the hdfs user.
Therefore, in the examples that follow the commands will follow the format..

sudo -u hdfs hadoop fs -command [some arguments]
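
For example, listing the top-level directories and creating a new one would look like this (the directory name is just for illustration) :

sudo -u hdfs hadoop fs -ls /
sudo -u hdfs hadoop fs -mkdir /user/cloudera/stuff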

To start with, let’s see what we’d need to do to upload a csv file to HDFS.

The file in question is called tools.csv, which I’ve created in the cloudera user’s home directory on the vm.
It contains the following :

There's a fair amount going on here : we're connecting to MySQL, then outputting the data as a compressed file onto HDFS in the /user/hive/warehouse directory.
The compression library being used is Snappy.
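
The command itself is along the lines of the following sketch. The connection URL, credentials and file format are the ones from the Cloudera tutorial's retail_db setup, so treat them as assumptions rather than gospel :

sqoop import-all-tables \
    -m 1 \
    --connect jdbc:mysql://quickstart:3306/retail_db \
    --username=retail_dba \
    --password=cloudera \
    --compression-codec=snappy \
    --as-parquetfile \
    --warehouse-dir=/user/hive/warehouse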

It’s instructive to see the output when we run this command as it shows both MapReduce and Yarn in action. You’ll probably see lines like :

NOTE – the first time I ran this, I shut down the VM after I got to here. When I started it again, Impala (see below) refused to see any databases. I had to trash and re-create the VM to get it to work. I'm not sure why this happened (I did have a hunt around) but just thought I'd mention it in case you're thinking of doing the same.

Anyway, now we have data in Hadoop, it would be good if we could interrogate it…

Not Quite SQL – Hive and Impala

Cloudera comes with a Web based UI for Hadoop in the form of Hue.
Note that Hortonworks seem to be standardising on a different UI tool – Apache Ambari.

In the Cloudera VM, there is a link to Hue on the Bookmarks bar in Firefox.
Click on this link and then connect as cloudera (password cloudera).

The favoured query engine for interactive queries in Cloudera is called Impala.
I believe that, at the time of writing, Hortonworks are sticking with Hive.

As far as I can tell, it seems that Hive has been retained by Cloudera to handle what, in RDBMS terms, would be called the catalog.
In Oracle terms this would be the Data Dictionary.

This is essentially the metadata for the tables in the database.
This metadata seems to be read by all of the SQL Engines irrespective of where it is updated from.

Once you’re connected via Hue you can select either of these tools from the Query Editors drop-down.

Initially, the tutorial directs you to Impala.

The Data Dictionary

The first thing to note about Impala is that it doesn't bother reading the table metadata unless you tell it to. Therefore, if you make any DML or DDL changes, you need to tell Impala to re-read the metadata before its query results will pick up these changes.
For example, we know that we’ve got six tables in our database which we’ve created via SQOOP. However, if you ask Impala about it :

show tables;

…you get the not entirely helpful :

The operation has no results

By contrast, if you try this in Hive (Query Editors/Hive), the tables are all present and correct.

To persuade Impala to see these changes you need to run :

invalidate metadata;

Note that you can also run this command for individual tables should the need arise, e.g. :

invalidate metadata categories;

Anyway, now we can see the tables in Impala, we can run some queries against them.
Whilst we’re at it, we can do a simple comparison between Impala and Hive in terms of how they process the same query.

Comparative performance

The query in question (taken from the Getting Started Tutorial provided with the VM) is :
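
I won't swear to having reproduced it exactly, but it's an aggregation over a three-table join, something like this (table and column names are from the tutorial's retail_db schema, so treat the details as approximate) :

select c.category_name, count(oi.order_item_quantity) as order_count
from order_items oi
inner join products p
    on oi.order_item_product_id = p.product_id
inner join categories c
    on c.category_id = p.product_category_id
group by c.category_name
order by order_count desc
limit 10;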

As well as being somewhat shorter, the Impala plan appears more familiar to someone looking at an Oracle Query Plan.

To check for the possible effect of caching, I then ran these queries again.
For Impala, the runtime dropped to around 13 seconds.
For Hive, the runtime was the same as for the first run.

The Impala plan was unchanged, despite the fact that the runtime was drastically reduced.
From this behaviour I would infer that there is some caching effect for Impala, although a cache similar to the Buffer Cache in Oracle is not shared between the two Query Engines.

The explanation offered in the tutorial is that Hive compiles SQL queries into MapReduce jobs whereas Impala was designed as a SQL engine.
It would appear that Impala is better for small-scale interactive queries whereas Hive is more suited to large-scale ETL.

You’ll notice that, at the start of the Impala query plan, there’s a warning about missing statistics.

If you roll the mouse over one of the tables on the left-hand side of the Hue Query Editor Window, you’ll see an icon appear which enables you to explore the table’s metadata.

Looking at one of these tables we can see that there are no stats present :

We can fix this easily enough in Impala by gathering stats for each of our tables. For example :

compute stats categories;

If we now check the metadata for the CATEGORIES table, we can see that stats are present :

If we now re-run the original query, the plan will no longer show the warning. However, the rest of the plan remains unchanged.
Given that these are quite small tables, this is probably not surprising.

Once again, it looks familiar for the most part.
One point of interest is the INSERT OVERWRITE command used to insert data into the table.
This has the effect of removing any pre-existing data in the table before inserting the new data.
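
In Hive terms, the statement looks something like this. Both table names come from the tutorial, but the SELECT list here is simplified for illustration :

insert overwrite table tokenized_access_logs
select * from intermediate_access_logs;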

The values in the url column contain %20 characters instead of spaces.
Let’s see what Impala can do in terms of the standard SQL string manipulation functions.

Unlike Oracle, there's no REPLACE function; there is, however, a REGEXP_REPLACE…

select regexp_replace(url, '%20', ' ')
from tokenized_access_logs;

When we run this, we can see that the REGEXP_REPLACE has done the job :

The logs we’re really interested in are where a specific product has been viewed.
If we can get the name of the product from the url, then maybe that will help when relating this data back to the most popular items in terms of sales…

There is no such option in the Impala equivalent. It simply searches the string from the start and reports the first occurrence.
Fortunately, we want to strip out these results for the stuff we’re going to do in a minute. Therefore, I’ve just amended the query to be :

The Getting Started Tutorial goes on to cover various other tools available in the Cloudera distribution for doing data manipulation and analysis.
Additionally, you may find this presentation on Hadoop internals for Oracle Devs by Tanel Poder useful.

The recent passing of Lemmy has caused me to reflect on on the career of one of the bands who made my growing up (and grown-up) years that much…well…louder.

Yes, I know that serious Python documentation should employ a sprinkling of Monty Python references but, let’s face it, what follows is more of a quick trawl through some basic Python constructs that I’ve found quite useful recently.
If I put them all here, at least I’ll know where to look when I need them again.

In any case, Michael Palin made a guest appearance on the album Rock 'n' Roll so that's probably enough of a link to satisfy the Monty Python criteria.

I find Python a really good language to code in…especially when the alternative is writing a Windows Batch Script. However, there is a “but”.
Python 3 is not backward compatible with Python 2. This can make life rather interesting on occasion.

The argparse module makes handling arguments passed to the program fairly straightforward.
It allows you to provide a short or long switch for the argument, specify a default value, and even write some help text.
The program is called no_remorse.py and looks like this :
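
What follows is a sketch of the sort of thing no_remorse.py does, rather than a verbatim copy – the help text and the age cut-off are taken from the output further down :

#!/usr/bin/env python
import argparse

# -a/--age : a short and a long switch, a default value and some help text
parser = argparse.ArgumentParser()
parser.add_argument('-a', '--age', type=int, default=40,
                    help='How old are you ? (defaults to 40 - nothing personal)')
args = parser.parse_args()

# Anyone under 40 is, apparently, too young to remember...
if args.age >= 40:
    print('I remember when Motorhead had a number 1 album !')
else:
    print('Who are Motorhead ?')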

argparse gives us a couple of things. First of all, if we want to know more about the required parameters, we can simply invoke the help :

python no_remorse.py -h
usage: no_remorse.py [-h] [-a AGE]
optional arguments:
-h, --help show this help message and exit
-a AGE, --age AGE How old are you ? (defaults to 40 - nothing personal)

If we run it without specifying a value for age, it will pick up the default….

python no_remorse.py
I remember when Motorhead had a number 1 album !

…and if I’m tempted to lie about my age (explicitly, as opposed to by omission in the previous example)…

python no_remorse.py -a 39
Who are Motorhead ?

As well as using the single-letter switch for the parameter, we can use the long version …

python no_remorse.py --age 48
I remember when Motorhead had a number 1 album !

One other point to note : the program will not accept arguments passed by position; either the long or short switch for the argument must be specified. Either that or Python comes with its own outrageous lie detector…

There are times when you need a program to run on multiple environments, each with slightly different details ( machine name, directory paths etc).
Rather than having to pass these details in each time you run the program, you can dump them all into a file for your program to read at runtime.
Usually, you’ll pass in an argument to point the program at the appropriate section of your config file. A config file will look something like this :

[DEV]
db_name = dev01
[TEST]
db_name = test01
[PROD]
db_name = prod

In this example, your program will probably accept an argument specifying which environment it needs to run against and then read the appropriate section of the config file to set variables to the appropriate values.

My working example is slightly different and is based on cover versions that Motorhead have done of other artists’ tracks, together with a couple of my favourite covers of Motorhead songs by other bands :
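
covers.cfg follows the same section-and-key layout as the environments example above. Here's a cut-down sketch – the track list is illustrative rather than a faithful copy of my original file :

[CORDUROY]
Motorhead = motorhead

[MOTORHEAD]
the kingsmen = louie louie
sex pistols = god save the queen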

Now, you could spend a fair amount of time trying to figure out how to read this file and get the appropriate values…or you could just use the configparser module…

Conditional Import – making sure you find Configparser

The configparser module was renamed in Python 3 so the import statement for it is different depending on which version of Python you're using.
Fortunately, Python offers the ability to conditionally import modules as well as allowing you to alias them.
Therefore, this should solve your problem…

So, if we’re running Python 3 the first import statement succeeds.
If we’re running Python2 we’ll get an ImportError, in which case we import the version 2 ConfigParser and alias it as configparser.
The alias means that we can refer to the module in the same way throughout the rest of the program without having to check which version we’ve actually imported.
As a result, our code should now run on either Python version :

try:
    import configparser
except ImportError :
    import ConfigParser as configparser

config = configparser.ConfigParser()
config.read('covers.cfg')

# Get a single value from the [CORDUROY] section of the config file
cover_artist = 'CORDUROY'

# Find the track they covered, originally recorded by Motorhead.
# Pass the config section and the original artist (the entry on the left-hand side of the "="
# in the config file)
track = config.get(cover_artist, 'Motorhead')

# cover_artist and track are string objects so we can use the title method to initcap the output
print(cover_artist.title() + ' covered ' + track.title() + ' by Motorhead')

# Loop through all of the entries in the [MOTORHEAD] section of the config file
for original_artist in config.options('MOTORHEAD') :
    print('Motorhead covered ' + config.get('MOTORHEAD', original_artist) + ' by ' + original_artist.upper())

As with configparser, the logging module means there's no need to write lots of code to open and write to a log file.
There are five levels of logging message supported :

DEBUG

INFO

WARNING – the default

ERROR

CRITICAL

There is a separate call to write each message type. The message itself can be formatted to include information such as a timestamp and the program from which the message was written. There’s a detailed how-to on logging here.

For now though, we want a simple program (logger.py) to write messages to a file wittily and originally titled logger.log…
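
A minimal version looks something like this (the message text is mine; the format string is one of the standard examples) :

import logging

# Write to logger.log, including a timestamp and the message level.
# The level is set to DEBUG so that all five message types get through.
logging.basicConfig(
    filename='logger.log',
    level=logging.DEBUG,
    format='%(asctime)s %(levelname)s %(message)s')

logging.debug('This is a DEBUG message')
logging.info('This is an INFO message')
logging.warning('This is a WARNING message')
logging.error('This is an ERROR message')
logging.critical('This is a CRITICAL message')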

One evening recently, whilst climbing the wooden hills with netbook in hand, I encountered a cat who had decided that halfway up the stairs was a perfect place to catch forty winks.
One startled moggy later, I had become the owner of what I can only describe as…an ex-netbook.

Now, finally, I’ve managed to get a replacement (netbook, not cat).

As usual when I get a new machine, the first thing I did was to replace Windows with Linux Mint…with the immediate result being that the wireless card stopped working.

The solution ? Don’t (kernel) panic, kernel upgrade !

Support for most of the hardware out there is included in the Linux Kernel. The kernel is enhanced and released every few months. However, distributions, such as Mint, tend to stick on one kernel version for a while in order to provide a stable base on which to develop.
This means that, if Linux is not playing nicely with your Wireless card/web-cam/any other aspect of your machine’s hardware, a kernel upgrade may resolve your problem.
Obviously it’s always good to do a bit of checking to see if this might be the case.
It’s also good to have a way of putting things back as they were should the change we’re making not have the desired effect.

What I’m going to cover here is the specific issue I encountered with my new Netbook and the steps I took to figure out what kernel version might fix the problem.
I’ll then detail the kernel upgrade itself.

Machine details

The machine in question is an Acer TravelMate-B116.
It has an 11.6 inch screen, 4GB RAM and a 500GB HDD.
For the purposes of the steps that follow, I was able to connect to the internet via a wired connection to my router. Well, up until I got the wireless working.
The Linux OS I’m using is Linux Mint 17.3 Cinnamon.
Note that I have disabled UEFI and am booting the machine in Legacy mode.

Standard Warning – have a backup handy !

In my particular circumstances, I was trying to configure a new machine. If it all went wrong, I could simply re-install Mint and be back where I started.
If you have stuff on your machine that you don’t want to lose, it’s probably a good idea to back it up onto separate media ( e.g. a USB stick).
Additionally, if you are not presented with a grub menu when you boot your machine, you may consider running the boot-repair tool.
This will ensure that you have the option of which kernel to use if you have more than one to choose from ( which will be the case once you’ve done the kernel upgrade).

It is possible that upgrading the kernel may cause issues with some of the hardware that is working fine with the kernel you currently have installed, so it’s probably wise to be prepared.

Identifying the card

The first step then, is to identify exactly which wireless network card is in the machine.
From a terminal window …
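
…run lspci, which lists the PCI devices in the machine :

lspci

In my case, the entry of interest read something like :

Network controller: Intel Corporation Device 3165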

It looks like the penultimate entry is our wireless card.
It is possible to get details of the card you have by using “Intel Corporation Device 3165” as a search term. However, we may be able to get the name of the card by running ….
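
My guess at the original incantation here is to refresh the local PCI id database and then ask lspci again :

sudo update-pciids
lspci | grep -i network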

At this point, we now know that an upgrade to the kernel may well solve our wireless problem. The question now is, which packages do we need to install to effect the upgrade ?

If you look in the repositories, there appear to be at least two distinct versions of kernel packages, the generic and something called low-latency.
In order to be confident of which packages we want to get, it’s probably a good idea to work out what we have now.
This can be achieved by searching the installed packages for the version number of the current kernel.
We can do this in the terminal :
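
Something like this, for example – the version string will obviously be whatever uname reports on your machine :

# current kernel version, e.g. 3.19.0-32-generic
uname -r

# installed packages matching that version (dropping the -generic suffix)
dpkg -l | grep linux | grep $(uname -r | cut -d'-' -f1,2)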

As an alternative, you could use the graphical Synaptic Package Manager.
You can start this from the menu ( Administration/Synaptic Package Manager).

Now we know what we’ve got, the next step is to find the kernel version that we need…

Getting the new kernel packages

It may well be the case that the kernel version you’re after has already been added to the distro’s repository.
To see if this is the case, use Synaptic Package Manager to search as follows :

Start Synaptic Package Manager from the System Menu.
You will be prompted for your password.

Click the Status button and select Not Installed

In the Quick filter bar, enter the text : linux-headers-4.2*-generic

This should give you a list of any kernel 4.2 versions available in the repository.

If, as I did, you find the version you’re looking for, you need to select the packages that are equivalent to the ones you already have installed on your system.
Incidentally, there are a number of 4.2 kernel versions available, so I decided to go for the latest.
In my case then, I want to install :

linux-headers-4.2.0-25

linux-headers-4.2.0-25-generic

linux-image-4.2.0-25-generic

linux-image-extra-4.2.0-25-generic

NOTE – If you don’t find the kernel version you are looking for, you can always download the packages directly using these instructions.

Assuming we have found the version we want, we need to now search for the relevant packages.
In the Quick filter field in Synaptic, change the search string to : linux-*4.2.0-25

To Mark the packages for installation, right-click each one in turn and select Mark for Installation

Once you’ve selected them all, hit the Apply button.

Once the installation is completed, you need to re-start your computer.

On re-start, you should find that the Grub menu has an entry for Advanced Options.
If you select this, you’ll see that you have a list of kernels to choose to boot into.
This comes in handy if you want to go back to running the previous kernel version.

For now though, we’ll boot into the kernel we’ve just installed.
We can confirm that the installation has been successful, once the machine starts, by opening a Terminal and running :

uname -r

If all has gone to plan, we should now see…

4.2.0-25-generic

Even better in my case, my wireless card has now been recognised.
Opening the systray icon, I can enable wireless and connect to my router.

Backing out of the Kernel Upgrade

If you find that the effects of the kernel upgrade are undesirable, you can always go back to the kernel you started with.
If at all possible, I’d recommend starting Mint using the old kernel before doing this.

If you’re running on the kernel for which you are deleting the packages, you may get some alarming warnings. However, once you re-start, you should be back to your original kernel version.

The command then, is :

sudo apt-get remove linux-headers-4.2* linux-image-4.2*

…where 4.2 is the version of the kernel you want to remove.
Run this and the output looks like this…

The following packages will be REMOVED
linux-headers-4.2.0-25 linux-headers-4.2.0-25-generic
linux-image-4.2.0-25-generic linux-image-extra-4.2.0-25-generic
linux-signed-image-4.2.0-25-generic
0 to upgrade, 0 to newly install, 5 to remove and 7 not to upgrade.
After this operation, 294 MB disk space will be freed.
Do you want to continue? [Y/n]

Once the packages have been removed, the old kernel will be in use on the next re-boot.
After re-starting, you can check this with :

uname -r

Thankfully, these steps proved unnecessary in my case and the kernel upgrade has saved me from hardware cat-astrophe.

I’m determined to adopt a positive mental attitude this year.
When the train company explains delays by saying we have the wrong kind of sunshine, I prefer to marvel at the fact that the sun is shining at all in the depths of an English Winter. Let’s face it, it’s a rare enough phenomenon in the summer.
The slow-running of the train caused by this rare natural phenomenon also gives me more time to write this post.
There’s more “good” news – Datapump Import tends to be rather slow when it comes to applying optimizer statistics.
This is because it insists on doing it one row at a time.
All of which provides us with an opportunity to optimize our import job… by not bothering to import the stats.
“Hang on”, you’re thinking, “won’t that mean you have to re-gather stats after the import, which probably won’t be that quick either ?”

Not necessarily. You just need to think positive…

What I’m going to cover here is :

How to save stats to a table

Export without the stats

Import without stats

Applying stats from a table

I’m using 11gR2 Express Edition in the examples that follow.
We’ll start by exporting the HR schema and then import the tables into the HR_DEV schema.

As there are overhead-line problems in the Watford Junction area, we’ve also got time to choose between running the datapump export and import on the command line or via the DBMS_DATAPUMP package.

Saving Stats to a Table

Let’s start by making sure that we have some optimizer stats on the tables in the HR schema :
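
A query along these lines will do (I'm using DBA_TABLES here; USER_TABLES works just as well if you're connected as HR) :

select table_name, last_analyzed
from dba_tables
where owner = 'HR'
order by table_name
/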

I can see that all of the tables in the schema have stats, which is good enough for my purposes here.
If you find that the LAST_ANALYZED value is null for the tables in your database, or if you just decide that you want to take a less cavalier approach to the relevance of your Optimizer stats, you can update them by running :

begin
dbms_stats.gather_schema_stats('HR');
end;
/

Now we know we’ve got some stats, we need to save them to a table. This process is made fairly straightforward by DBMS_STATS. To create an appropriately structured table in the HR schema, we simply need to run :
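
The sketch below creates the stats table and then exports the schema stats into it – the table name EXP_STATS is the one referred to later on :

begin
    -- create a table with the structure required to hold optimizer stats...
    dbms_stats.create_stat_table(
        ownname => 'HR',
        stattab => 'EXP_STATS');

    -- ...and save the current HR schema stats into it
    dbms_stats.export_schema_stats(
        ownname => 'HR',
        stattab => 'EXP_STATS');
end;
/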

Now for the final touch, apply the stats that we have in the EXP_STATS table. Should be easy enough…

Applying stats from a table

If we were importing into a schema of the same name as we saved stats for, this would be straightforward.
However, in this case, we’re importing into a different schema – HR_DEV.
Therefore, if we want to avoid “leaves-on-the-line”, we need to do a little light hacking.

To make things a bit clearer, let’s have a look at the contents of the C5 column of our EXP_STATS table…

Yes, the table owner (for that is what the C5 column contains) is set to HR. This is reasonable enough as it was the stats for this schema which we saved to the table in the first place. However, this means that the stats will not be applied to the tables in the HR_DEV schema unless we do this…
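
…namely, point the saved stats at the new schema and then import them. A sketch, assuming the EXP_STATS table has come across into HR_DEV along with everything else :

update exp_stats
set c5 = 'HR_DEV'
where c5 = 'HR'
/

commit;

begin
    dbms_stats.import_schema_stats(
        ownname => 'HR_DEV',
        stattab => 'EXP_STATS');
end;
/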

Whilst importing stats separately does entail a few more steps, it does mean that there is rather less hanging around while datapump import does its impression of a train trying to get through "the wrong kind of snow".

Last year we got a kitten. Little Cleo was full of the joys of…well…being a cat. Then, one day, she just disappeared.
Several months later, having given up hope of ever seeing her again, we adopted Nutmeg.
Then, just before Christmas, Cleo suddenly re-appeared.
It’s a complete mystery as to where she had been for the last year and she has not condescended to comment on the matter.
The end result is that we are now a two cat family.
This brings with it certain complications.
When they aren’t studiously ignoring each other, the cats sit there giving each other hard stares for hours on end.
I think there may be some tension over exactly just who owns that fluffy ball.
To ensure that our sleep is not disturbed by these two fighting like cats in a sack, it’s necessary to ensure that they are in separate rooms before we retire for the evening.
As a result we’ve become rather expert at the art of Cat Herding, which largely consists of bribery with cat-nip, among other things.

Whilst acquiring a reputation as a “dealer” among the feline population of Milton Keynes, I have had cause to reflect on the similarity of our new hobby with the trials and tribulations of persuading DBMS_METADATA.GET_DDL that you do actually have permissions to see the source code you’ve asked for.

This is regularly a fairly tortuous process. In what follows I will be examining just why SELECT_CATALOG_ROLE is DBMS_METADATA cat-nip…and why SELECT ANY DICTIONARY isn’t.
I’ll also look at how you can stop chasing your tail and get this function to behave itself when invoked from within a stored program unit…

“Query any data dictionary object in the SYS schema. This privilege lets you selectively override the default FALSE setting of the O7_DICTIONARY_ACCESSIBILITY initialization parameter.”

Some explanation is required here. Prior to Oracle 8, a user with the SELECT ANY TABLE privilege had access to any table in the database. From 8 onwards, this privilege was restricted to exclude the Data Dictionary. This is where SELECT ANY DICTIONARY came in.

Before we go any further, let’s just make sure that the O7_DICTIONARY_ACCESSIBILITY parameter is not set to TRUE…
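
A quick check against V$PARAMETER does the trick (it should come back as FALSE) :

select value
from v$parameter
where upper(name) = 'O7_DICTIONARY_ACCESSIBILITY'
/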

By default, EXECUTE on the DBMS_METADATA package is granted to PUBLIC. So, invoking the GET_DDL function for the same program unit should return the DDL statement required to re-create it. We already know we have access to the source so this should be no problem, right ?

set long 5000
select dbms_metadata.get_ddl( schema => 'HR', name => 'ADD_JOB_HISTORY', object_type => 'PROCEDURE')
from dual
/
ERROR:
ORA-31603: object "ADD_JOB_HISTORY" of type PROCEDURE not found in schema "HR"
ORA-06512: at "SYS.DBMS_METADATA", line 4018
ORA-06512: at "SYS.DBMS_METADATA", line 5843
ORA-06512: at line 1
no rows selected
SQL>

Has Oracle forgotten about this procedure ? Maybe it’s rolled under the sofa ?
Either way, further investigation would seem to be in order…

To try to determine what's happening, we could do some tracing and look through the recursive statements to see precisely what is causing the error. However, there is a quicker way.
Let’s start by looking at the comments in the DBMS_METADATA package header :

-- SECURITY
-- This package is owned by SYS with execute access granted to PUBLIC.
-- It runs with invokers rights, i.e., with the security profile of
-- the caller. It calls DBMS_METADATA_INT to perform privileged
-- functions.
-- The object views defined in catmeta.sql implement the package's security
-- policy via the WHERE clause on the public views which include syntax to
-- control user access to metadata: if the current user is SYS or has
-- SELECT_CATALOG_ROLE, then all objects are visible; otherwise, only
-- objects in the schema of the current user are visible.

Unless you are either the owner of the stored program unit you’re trying to retrieve, or connected as USER_ID 0 ( i.e. SYS), the only way that you’re going to get anything back from a query on this view is if you have been granted a role called SELECT_CATALOG_ROLE.

It would seem then, that only a user granted SELECT_CATALOG_ROLE can usefully use DBMS_METADATA.GET_DDL in a stored program unit to retrieve DDL for objects not owned by them.

Something you might want to consider at this point is that SELECT_CATALOG_ROLE is itself granted the HS_ADMIN_SELECT_ROLE role. Between them, these roles have SELECT access on quite a large number of SYS objects :
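
You can get a feel for the scale of this with a quick count against DBA_TAB_PRIVS (just a sketch) :

select grantee, count(*)
from dba_tab_privs
where grantee in ('SELECT_CATALOG_ROLE', 'HS_ADMIN_SELECT_ROLE')
and privilege = 'SELECT'
group by grantee
/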

At this point, you may well ask if there is any way for users to utilise our function without having this role granted.
After all, a common approach to application security is to bestow execute access to users on a stored program unit without them having any visibility of the underlying tables and views.

Well, there is…

Doing without the SELECT_CATALOG_ROLE

Before we go any further I think I should point out that there are several issues with creating objects in the SYS schema.

The objects in the schema are effectively part of the Oracle software. The schema can be seen as being analogous to root on a *nix system.

There are various admonitions against performing DDL in the SYS schema. These include

I suppose the best reason for avoiding this sort of thing is that it would only take one moment of inattention to cause potentially irreparable damage to your Oracle installation. Even with a backup, re-installing Oracle is no trivial matter. Therefore, I strongly suggest that you consider carefully whether the benefits of the approach I'm about to take here outweigh the issues of granting SELECT_CATALOG_ROLE.

Getting it up and running was a bit more of a challenge than I had anticipated.
The Pi uses HDMI for Video output and my ageing monitor is not equipped for HDMI…

The best program on TV – NOOBS doing its thing.

In the end, I had to “borrow” the TV.
This arrangement was, of necessity, extremely temporary. The TV had to be back in its usual place ready for The Strictly-TOWIE-Dancing-Get-Me-Out-Of-Here Christmas Special, on pain of pain.
Therefore, my first Pi project was to connect to it remotely from another machine, namely, my Linux Mint Laptop.
This will enable me to run the Pi headless (i.e. without a monitor/keyboard/mouse attached to it).

I’m going to cover two different methods of connecting to the Pi.
The first is using ssh to connect to the command line.
The second is to connect remotely to the Raspbian desktop itself.

Just to avoid any confusion, I will be referring to the Raspberry Pi as “the Pi” and the machine I’m connecting from as “Mint”.

About the Environment
The Pi

The Pi I’m using for this is running the Raspbian Jessie OS.
It is set to start the desktop on boot.
The Pi is up and the desktop is running.
The Pi is connected to the router via a network cable.

The Mint Machine

The version of Mint I’m running on is 17.2.

The Network

I’m running on a standard home network with all devices connecting to a router.
The router assigns IP addresses to each connected machine dynamically via DHCP.
Apart from the router itself, no device on the network has a fixed IP address.

SSH – Setup

We need to perform these steps on the Pi.

First of all, we need to know what the machine name of the Pi is.

As the devices connecting to the network are dynamically allocated an IP address, it's simpler to address a specific machine by name.

So, on the Pi, open a terminal and type :

uname -n

This returns the name of the computer. In my case :

raspberrypi

The next thing we need to do is to make sure that the Pi will accept connections via SSH.

On the Raspbian Desktop, click on the Menu and select Preferences/Raspberry Pi Configuration

Let’s face it, this is the closest I’m going to get to fresh fruit and veg for the next couple of weeks.

Next click on the Interfaces tab and make sure that ssh is enabled.

Once the steps have been completed, we’re ready to test…

SSH – from the remote machine

We want to connect to the Pi from a remote machine as a user that exists on the Pi.
Note that this user does not need to exist on the remote machine.

When we run the ssh command, we need to specify the user we’re connecting as, and the name of the machine we’re connecting to (i.e. the Raspberry Pi itself).

I’m going to connect as the user pi. We’ve already found out that the name of the Raspberry Pi is “raspberrypi”.

So, I just need to open a terminal window on my remote machine and type :

ssh pi@raspberrypi

The first time you ssh to another computer on your network, you’ll get this warning :

The authenticity of host 'raspberrypi (192.168.1.144)' can't be established.
ECDSA key fingerprint is 03:72:d9:84:58:c8:a6:cc:37:bc:c3:47:8f:1c:90:e0.
Are you sure you want to continue connecting (yes/no)?

Type “yes” and Hit Return…

Warning: Permanently added 'raspberrypi,192.168.1.144' (ECDSA) to the list of known hosts.
pi@raspberrypi's password:

Enter the password and…

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.

Whilst ssh is handy if you just need command line access to the Pi, if you want access to the Desktop, you’ll need to try something a bit different.

Using RDP to run the Pi Desktop Remotely

Now, there is more than one way to do this. I’ve chosen to use RDP as it’s quite simple to setup.

Installing xrdp on the Pi

To start with, we need to install the xrdp package on the Pi. At this point, you can either do this on the machine itself ( by opening a Terminal window), or connect via ssh.
Either way, the command you need to enter is :

sudo apt-get install xrdp

You will be prompted for your password and should then get some output that looks like this :

Once that little lot has scrolled up your screen, you can exit the session ( just type “exit”).
There are some guides which suggest that you need to re-boot the Pi at this point. I found that this was not necessary. However, if things don’t quite work as described from this point on, it may be worth doing this. After all, “Have you tried turning it off and on again ?” is a cliche for a reason !

It’s probably worth mentioning that, at this point, you should be able to connect from any Windows ( Windows 7 or above) remote machine using the built-in Remote Desktop app.
That's not what we're after though. Oh no. We want to be able to do this from Mint…

Installing rdesktop on Linux Mint

Back on Mint, open a Terminal window and…

sudo apt-get install rdesktop

Once again you should be prompted for your password ( remember this is for your user on the Mint machine, not on the Pi). You should then see something like …

Again, I found that no re-boot was required on the Mint machine. Once again, it might be different for you.

Either way, we should now be able to prove that the Pi will run happily without all of those wires plugged into it…

Connecting via rdesktop

…but I’d recommend keeping it all plugged in until you’ve tested first.

To connect from Mint, open a Terminal window and type :

rdesktop raspberrypi -u username_on_pi -p password -g 90%

…where username_on_pi is the name of the user on the Pi and password is the password for that user.

The -g switch tells rdesktop what size to create the window ( in our case, 90% of the screen size on our main machine)

You should now see (after a short interlude whilst the desktop is rendered)….

Alternatively, if you want to run in a full screen, you can use the -f switch for rdesktop instead – i.e. :

rdesktop raspberrypi -u pi -p raspberry -f

Once you’re in full screen mode, you can toggle between full screen and window mode at any point by pressing CTRL+ALT+ENTER.
It's worth noting that if you do this and then minimize the rdesktop window, when you maximize it again the desktop can appear blank and/or the re-drawing might not be complete. I'm not sure why this is, or what the fix is.

One other point to note, it’s more secure to run rdesktop without specifying the password. In this way, you’ll be prompted for it when you connect.
So, if you run…

rdesktop raspberrypi -u pi -g 90%

…you will be presented with

A Python Program to automate connection

To save us typing in the rdesktop command each time we want to connect to the Pi, we could write a simple bash script to automate our rdesktop command. However, in the circumstances, Python seems a more appropriate medium…

The program is saved as rdp_to_pi.py.
At the time of writing, the default Python version on Mint (17.2) is 2.7.6 ( although you can invoke a Python 3.4.3 interpreter by typing python3 at the prompt).
Therefore, this program is written for Python v2.
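
What follows is a sketch of the program rather than a definitive listing – the machine name and user are the ones used throughout this post, and the validation is deliberately simple :

#!/usr/bin/env python
# rdp_to_pi.py - connect to the Pi desktop via rdesktop
import sys
import subprocess

# "Constants" - the name of the Pi on the network and the user to connect as
PI_HOST = 'raspberrypi'
PI_USER = 'pi'

def main():
    # Validate that we've been passed a sensible screen size percentage
    if len(sys.argv) != 2 or not sys.argv[1].isdigit() or not 1 <= int(sys.argv[1]) <= 100:
        print('Usage : ./rdp_to_pi.py screen_size_percentage (a whole number between 1 and 100)')
        sys.exit(1)

    # Build the rdesktop command line...
    cmd = 'rdesktop {0} -u {1} -g {2}%'.format(PI_HOST, PI_USER, sys.argv[1])

    # ...and execute it (rdesktop will prompt for the password on connection)
    subprocess.call(cmd, shell=True)

if __name__ == '__main__':
    main()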

The first line of the program tells Linux to use the Python interpreter when this program is executed.

The program then does the following :

validates that it’s been passed a sensible argument value on the command line for the screen size percentage

builds the appropriate rdesktop command line using "constant" values for the machine name of the Pi and the name of the user to connect as

executes the command

To run the program, you first need to set the appropriate file permissions…

chmod u+x rdp_to_pi.py

…and then run it using a single “.” followed by a “/” ( not the usual “. ./” to run a bash script)…

./rdp_to_pi.py 90

Keyboard Mapping Issues

There are some other issues as well, most notably, the Pi seems to have forgotten where it is and has somehow adopted settings for a US keyboard.
If you want to test this and happen to have a UK keyboard, try opening a text editor and typing any of the following :

” – will print @

@ – will print “

£ – will print #

| – will print ???

~ – will print |

I’ve tried various things to fix this, but to no avail.
Despite the fact that both the Pi and the remote machine are configured with a UK keyboard, rdesktop seems to ignore this and insist on using US keyboard mappings.
I suspect that this is something to do with an X-Server configuration setting somewhere but I just can’t figure out where.