Saturday, August 10, 2013

This article is written in Portuguese and English (original article here)
Este artigo está escrito em Inglês e Português (artigo original aqui)

English version:

Introduction

It should not be a surprise to anyone that I'm a big fan of PAM (pluggable authentication modules). I've written several articles about it in this blog. This time I'll pick up a Google project and show you how to glue it together with Informix to achieve more security for your connections. And this happens thanks to PAM, obviously.
Google Authenticator is a project that implements one-time password (OTP) generators for several mobile platforms. In practice you'll use your smartphone as a token generator. This avoids the need for a specific piece of hardware and has the advantages of being open source and of providing a server-side PAM library. This means you can integrate it with anything that supports PAM: your SSH daemon and, naturally, your favorite database server software.
Google Authenticator can be seen and used as a component of a multi-factor authentication mechanism. This implies that a user must present (and I quote) "two or more of the three authentication factors: a knowledge factor ("something the user knows"), a possession factor ("something the user has"), and an inherence factor ("something the user is")". This article will show you how to configure Informix for two-factor authentication. In our scenario we'll use a traditional password (something the user knows) and Google Authenticator as the second factor (something the user has). In the future, if the rumors about the introduction of biometric readers (like fingerprint readers) on mobile phones become a reality, it may be possible to extend this to three factors (something the user is).
It's possible that we'll see more services using this kind of technology. Just recently Twitter introduced two-factor authentication that sends a request to its app when you try to log in to the web site. The user needs to authorize that connection by using the Twitter app on an authorized phone. Essentially this implements the same concept, but in an easier way.

Setup

We have the PAM library and the google-authenticator binary that we'll use to generate a secret key for our user. So this is the next step. I'll not do this with user "informix", because the engine always ignores PAM for the informix user when connecting locally. But first, let's check the other fundamental component of the solution: your smartphone. Currently I use Android, but the app is also available for iOS and BlackBerry. You should know how to find and install the app. For Android devices it's available in the Google Play store. The apps for Android and iOS support both reading a QR code and manual entry of the secret key generated by the google-authenticator binary. The BlackBerry app, according to the Google Authenticator website, only supports manual entry.

Assuming the app is properly installed, we can proceed with the server-side configuration. As mentioned above, the next step is to generate the secret key shared between the server account and the mobile app. For that we use the google-authenticator binary installed before. When we run it, it outputs the information needed and also asks some questions. I will not dig into those, as you can find out more in the documentation:
-bash-3.2$ google-authenticator

Do you want to disallow multiple uses of the same authentication
token? This restricts you to one login about every 30s, but it increases
your chances to notice or even prevent man-in-the-middle attacks (y/n) n

By default, tokens are good for 30 seconds and in order to compensate for
possible time-skew between the client and the server, we allow an extra
token before and after the current time. If you experience problems with poor
time synchronization, you can increase the window from its default
size of 1:30min to about 4min. Do you want to do so (y/n) n

If the computer that you are logging into isn't hardened against brute-force
login attempts, you can enable rate-limiting for the authentication module.
By default, this limits attackers to no more than 3 login attempts every 30s.
Do you want to enable rate-limiting (y/n) y
-bash-3.2$

It shows us a URL and the key. If the libqrencode library is installed, it will also show a QR code that we can scan directly with the phone. If we open the URL in a browser, it will show a QR code that we can use in the app to install the key. Otherwise we need to enter the key manually. Either way it's pretty simple, and in a minute you'll have a token (or one-time password) generator on your mobile phone:

After this we need to configure our Informix instance to take advantage of it. For that I'll create a new listener port using PAM. As you probably already know, this is done by editing the $INFORMIXSQLHOSTS file and adding a line similar to this:
tpch_pam onsoctcp kimball.onlinedomus.net 1527 s=4,pam_serv=(ids_pam_service),pamauth=(challenge)

Field by field:

tpch_pam
The INFORMIXSERVER name for this alias

onsoctcp
The protocol to be used

kimball.onlinedomus.net
The hostname

1527
The unique TCP port number for this listener

s=4,pam_serv=(ids_pam_service),pamauth=(challenge)
The options field. s=4 forces PAM usage. The PAM service name is "ids_pam_service" and the PAM mode will be "challenge"
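
The field-by-field breakdown above can be sketched as a small parser. This is only an illustration of the line layout, not an Informix tool: the names SqlhostsEntry, parse_options and parse_sqlhosts_line are hypothetical, and the option parsing assumes simple key=value items separated by commas, which is enough for the entry used in this article.

```python
from typing import NamedTuple


class SqlhostsEntry(NamedTuple):
    server: str      # INFORMIXSERVER name or alias
    protocol: str    # e.g. onsoctcp
    host: str        # hostname
    service: str     # TCP port number or service name
    options: dict    # parsed options field (may be empty)


def parse_options(field: str) -> dict:
    """Turn 's=4,pam_serv=(x),pamauth=(challenge)' into a dict.

    Assumption: values never contain commas. Surrounding parentheses
    are dropped so that '(challenge)' becomes 'challenge'.
    """
    options = {}
    for item in field.split(","):
        if not item.strip():
            continue
        key, _, value = item.partition("=")
        options[key.strip()] = value.strip().strip("()")
    return options


def parse_sqlhosts_line(line: str) -> SqlhostsEntry:
    """Split one sqlhosts line into its four fixed fields plus options."""
    parts = line.split(None, 4)
    if len(parts) < 4:
        raise ValueError("expected at least 4 whitespace-separated fields")
    options = parse_options(parts[4]) if len(parts) == 5 else {}
    return SqlhostsEntry(parts[0], parts[1], parts[2], parts[3], options)


entry = parse_sqlhosts_line(
    "tpch_pam onsoctcp kimball.onlinedomus.net 1527 "
    "s=4,pam_serv=(ids_pam_service),pamauth=(challenge)"
)
print(entry.server, entry.service, entry.options)
```

Something like this can be handy to check which aliases in a long sqlhosts file are PAM enabled (options["s"] equal to "4") and which PAM service each of them points to.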

After editing $INFORMIXSQLHOSTS we must make sure that the service name "ids_pam_service" is configured in the PAM configuration. On Linux, this means having a file called ids_pam_service in /etc/pam.d. The content of the file will be:
auth required pam_unix.so
auth required pam_google_authenticator.so
account required pam_unix.so

This will be used for our "basic by the book" configuration. Later I'll explain some issues with it, and will show you a better way to use it. As mentioned in other articles, we just need "auth" and "account" configuration lines. And we're "stacking" modules (pam_unix and Google Authenticator) so that we achieve the two-factor authentication. First we'll test the Unix password and then the Google Authenticator token or code (the one-time password).

Next we need to make sure "tpch_pam" is configured in the $INFORMIXDIR/etc/$ONCONFIG file in the parameter DBSERVERALIASES, and after that we can start the listener with:
onmode -P start tpch_pam

The first impression is good. It works. But there's something weird. If you take a close look you'll see that once I do a CONNECT, dbaccess asks me for "ENTER PASSWORD:". Then it asks me for "Password:" and finally for "Verification code:". Three prompts... I hope you understand two of them, but not the repeated request for a password. Let me explain. By default, whenever we try to CONNECT, dbaccess will ask for a password. That's the first prompt. Actually, with this configuration (challenge mode) we can enter whatever we want here... it will be ignored. Then we enter the PAM stack. The first module is pam_unix.so. Since we're not sending it any password (more about this later) it asks us for one ("Password:" prompt). Here you have to enter the system user password. After that we move to the second module, the Google Authenticator module, and it behaves the same way. Since it doesn't have a password it asks for one ("Verification code:" prompt). Here we need to take a look at our smartphone and copy the currently generated token (or one-time password). After that we're logged in. Both passwords were verified (one for each module in the stack).
This behavior justifies the "challenge" mode. The module sends a challenge to the engine, the engine sends it on to the client (dbaccess in our case), and the client must have registered a callback function to handle the challenges and the user responses. Basically, what dbaccess does is echo the challenge, read the response and send it back to the engine, which passes it to the module for verification.
Although it works, it's a bit awkward and needs a client-side function to handle the challenges. This means we might have to change the application. dbaccess already knows how to handle it, but other clients wouldn't know what to do.
Informix APIs have functions to register a callback function. But again, code changes are never welcome. So, let's see what we can do...

Improvement

As you probably know (I mentioned it in previous PAM articles), Informix PAM can be configured in "password" or "challenge" mode. The documentation sometimes leads us to think that for this kind of usage we need to use challenge mode. But in fact it really depends on the modules you use and the options they provide. In our case I noticed that the Google Authenticator module supports two interesting options:

try_first_pass
makes it check the PAM framework for a previously supplied password

forward_pass
makes it smart... if you provide a password composed of the concatenation of the system password and the verification token, it will try to split them, verify the code and send the rest through the PAM stack of modules

I also noticed pam_unix.so supports both try_first_pass and use_first_pass. What all this suggests is that, assuming we have a system password like "mypasswd" and a verification code like "096712", we could use a composed password "mypasswd096712", give it to the dbaccess password request, and not be bothered by each module's prompt. Let's change the configuration and test again. The file /etc/pam.d/ids_pam_service becomes this:
auth required pam_google_authenticator.so try_first_pass forward_pass
auth required pam_unix.so use_first_pass
account required pam_unix.so

And voilà... The first module is now Google Authenticator. It gets the composed password from the stack, extracts its part (it knows it's the last 6 digits), verifies it, and sends the rest to the second module (pam_unix), which validates the password against the system.
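
To make the splitting step concrete, here is a minimal sketch of the idea in Python. It is not the module's actual code (the real module is written in C); the function name split_forwarded_password is hypothetical and the only rule taken from the text above is that the verification code is the last 6 digits of what the user typed.

```python
def split_forwarded_password(composed: str, code_digits: int = 6):
    """Split 'password + code' into (password, code).

    Mirrors the behaviour described for forward_pass: the trailing
    code_digits characters are treated as the verification code and
    everything before them is passed on as the system password.
    """
    if len(composed) <= code_digits:
        raise ValueError("input too short to hold a password and a code")
    code = composed[-code_digits:]
    if not (code.isascii() and code.isdigit()):
        raise ValueError("the last characters are not a numeric code")
    return composed[:-code_digits], code


password, code = split_forwarded_password("mypasswd096712")
print(password, code)
```

Note that the order matters: typing the code first and the password last makes the trailing characters fail the numeric check (or, worse, be treated as a wrong code).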

Now... we've seen that dbaccess does some magic... Because it knows we're dealing with a PAM port. So, to be absolutely sure this is transparent for the applications, let's try an external JDBC connection that has no knowledge that it's a PAM enabled port:

Success!
There's not much more we can do. It's basic and very simple to set up. It shows Informix's flexibility. And because it's PAM you can of course enrich it with additional modules if you like.

Considerations

There are many things to note about this subject. First we could wonder about the usage cases for something like this. A few ideas come to mind:

Added security for privileged users. You could assume that applications only connect through a safer network, using normal authentication, but DBSAs may need to connect from the "external" world, and that requires added security

You can use it to construct a "double" password, with part of it available to the application code while the user enters the verification code. This would prevent a user from authenticating outside the application (because the user would never know the first password component), while an application manager would not be able to impersonate the user even if he got to know the user's password

Other extended uses could be achieved by tweaking the module code.

It's important to keep in mind that, as with any security-related component, you should consider carefully what you need and discuss the possibilities with security-conscious people. I did a sort of brainstorming with two ex-Informix DBAs who now work in the security team, and some interesting points were raised. Among them, here are a few:

This sort of token generator as opposed to specialized hardware like RSA tokens

Possible advantages of this method:

You would probably notice that you lost your smartphone sooner than you would notice the loss of a specialized token

It's cheaper

If the device needs renewal, this method looks simpler (a user can do it once he gets the new phone)

You don't depend on any external supplier

Possible disadvantages of this method:

It seems easier to remotely "hack" a smartphone than to compromise the security of a specialized hardware token

The application could provide some security measure to prevent anybody with physical access to the phone from seeing the generated codes. Note that the same exposure exists with a hardware token, and a phone can always be protected with a PIN or pattern code. This is not exactly a disadvantage compared to hardware tokens, but it could be something to improve

This would be hard to use for non-interactive processes. Unless, and I believe this could be a possibility, we work the other way around... Meaning we have the code generator inside the application servers, and we set up a callback function to answer the module's challenge. This could prevent the application user from being used outside the application server environment

A generic issue with any two factor authentication mechanism is that ideally the second factor should use a different communication channel from the first. That doesn't happen here, and this allows for man in the middle attacks

Another point, which can be related to the previous one, is whether the same token may be used more than once within a certain time interval (even a short one). The codes generated by Google Authenticator are valid by default for 30 seconds (to allow some time for the user to enter the code). To compensate for small clock differences between client and server, the module also accepts the codes immediately before and after the current one, so the effective interval becomes 1m30s. All this can be configured, and we even have the possibility of not allowing the same code to be used twice. This, however, limits us to one login every 30 seconds
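
The 30 second validity and the "one code before and one after" window can be illustrated with a short sketch of the standard time-based OTP scheme (HOTP from RFC 4226 driven by a time counter, as in RFC 6238). I'm assuming here that the app and the module follow those RFCs with SHA-1 and 6 digits; the function names are mine and this is not the module's code.

```python
import base64
import hashlib
import hmac
import struct


def decode_secret(base32_key: str) -> bytes:
    """Decode a base32 secret key, tolerating spaces and missing padding."""
    key = base32_key.replace(" ", "").upper()
    key += "=" * (-len(key) % 8)
    return base64.b32decode(key)


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226: HMAC-SHA1 of the counter, dynamically truncated."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def totp(secret: bytes, unix_time: float, step: int = 30) -> str:
    """The counter is the number of 30 second steps since the Unix epoch."""
    return hotp(secret, int(unix_time) // step)


def verify(secret: bytes, code: str, unix_time: float,
           step: int = 30, window: int = 1) -> bool:
    """Accept the current code plus 'window' codes before and after it."""
    current = int(unix_time) // step
    for delta in range(-window, window + 1):
        counter = current + delta
        if counter < 0:
            continue
        expected = hotp(secret, counter)
        if hmac.compare_digest(expected.encode(), code.encode()):
            return True
    return False
```

With window=1 three consecutive codes are accepted at any moment, which is where the 1m30s figure comes from. A server-side check would call verify(decode_secret(key), typed_code, time.time()); refusing to accept the same code twice would additionally require remembering the last counter value that was accepted.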

Issues found

During the preparation and testing for this article I've faced two main issues:

On first attempts I got the "Invalid verification code" error from the module. As the documentation says this is usually caused by clock synchronization issues between the server side and the mobile app side. After some checking I've found that I was mixing clock time with timezone offsets and it caused too much difference (the module allows configuration for some small difference)

As usual with PAM and PAM modules the hardest part was debugging. Most modules tend to be very quiet about the errors. Not sure why, but on first attempts of concatenating user password with the verification code I was attempting the wrong order (code + password) instead of the proper order (password + code). I ended up looking into the code and changing it to be much more verbose.

Another possible issue is that at the moment there is no app for Windows phones. And apparently they are growing in the market (according to recent studies, they shipped more units than BlackBerry)
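
On the clock issue mentioned above: assuming the codes are derived from the number of 30 second steps since the Unix epoch (UTC), as in the standard time-based OTP scheme, a timezone mistake is far outside what the default window tolerates. A small sketch, with hypothetical function names and an arbitrary timestamp:

```python
def steps_apart(server_epoch: int, phone_epoch: int, step: int = 30) -> int:
    """How many time steps the phone is ahead of (or behind) the server."""
    return phone_epoch // step - server_epoch // step


def within_window(server_epoch: int, phone_epoch: int,
                  step: int = 30, window: int = 1) -> bool:
    """True if the phone's code would still be accepted by the server."""
    return abs(steps_apart(server_epoch, phone_epoch, step)) <= window


server_now = 1376128800              # arbitrary example timestamp
phone_small_drift = server_now + 20  # 20 seconds ahead
phone_wrong_offset = server_now + 3600  # clock set one hour off

print(within_window(server_now, phone_small_drift))
print(steps_apart(server_now, phone_wrong_offset))
print(within_window(server_now, phone_wrong_offset))
```

A one hour error is 120 steps, so no reasonable window size will hide it. The wall clock shown on both sides can look right while the underlying epoch time is wrong, which is why comparing "date -u" on the server with the phone's UTC time is a quicker check than comparing local times.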

Acknowledgements

It's not unusual that I discuss some aspects while working on new articles. I recall having some help from several IBM colleagues, some of them already mentioned here once or twice. In this case I had the pleasure to discuss this subject with two ex-Informix DBAs, with whom I worked for several years on a customer team. Now they're working in the security team and the first time I heard about Google authenticator was during a chat with one of them. During the writing of this article and before it was published we interacted a couple of times. Several aspects of the post should be credited to them. So for that, and for the good time we spent working together (we still do occasionally because they have a long history in this customer and there are always subjects where we can work together) a big "Thank you!" to Daniel Valente and Rui Mourão.

Monday, August 05, 2013

This article is written in English and Portuguese (original article is here)
Este artigo está escrito em Inglês e Português (artigo original aqui)

English version:

Another recent customer situation triggered an investigation that may be helpful to others. We had a system with two instances, but only one was working. The database configuration was almost identical to that of another system, but the query performance was horrible in comparison. Session status showed lots of "IO Wait", but also lots of "yield bufwait". All the monitoring with OS tools showed that we were facing severe IO performance issues (disks showed up at 100% utilization and data throughput was much lower than on similar systems).

So the DBA team worked with the system administration team, and they identified several hardware and configuration issues that caused a severe IO bottleneck. But even considering that those issues were being addressed by the system admin team, the DBA team was still concerned with the fact that we were seeing bufwaits. According to the manual, a bufwait means our session was trying to access a buffer that was locked by another session. But we could get those bufwaits even when only one query was running. So it was a bit strange.
As usual, if something doesn't look obvious we have to follow the details. The first step was to run "onstat -b" during the query execution and try to find the owner of the buffer that we wanted to access. The output was similar to this:
IBM Informix Dynamic Server Version 11.50.UC9W2X1 -- On-Line -- Up 00:04:28 -- 111240 Kbytes

address userthread flgs pagenum memaddr nslots pgflgs xflgs owner waitlist
4421ec80 0 84 3:230791 4440e000 87 801 80 ffffffff 44d0bc90

The pagenum was constantly changing, but the owner was always "ffffffff". Needless to say, we don't have any thread with an rstcb address equal to "ffffffff". So we were stuck. But a bit of internal IBM investigation taught me that this means we're waiting for an IO (AIO or KAIO) thread. This didn't answer our question: if we're waiting on IO, why doesn't it always show "IO Wait" but instead shows "yield bufwait"?
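
If you want to repeat this check while a query runs, a small script can pick out the buffers whose owner is "ffffffff" and that have a waiter. This is only a sketch: it assumes the ten-column layout of the sample output above (other versions may print different columns), and the function names are hypothetical.

```python
COLUMNS = ["address", "userthread", "flgs", "pagenum", "memaddr",
           "nslots", "pgflgs", "xflgs", "owner", "waitlist"]


def parse_onstat_b(text: str) -> list:
    """Return one dict per buffer line found in 'onstat -b' output."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data lines have 9 or 10 fields (waitlist may be empty) and a
        # pagenum of the form chunk:offset; banner and header lines do not.
        if len(fields) not in (9, 10) or ":" not in fields[3]:
            continue
        row = dict(zip(COLUMNS, fields))
        row.setdefault("waitlist", "")
        rows.append(row)
    return rows


def waiting_on_io(rows: list) -> list:
    """Buffers owned by 'ffffffff' that some thread is waiting for."""
    return [r for r in rows if r["owner"] == "ffffffff" and r["waitlist"]]


sample = """\
IBM Informix Dynamic Server Version 11.50.UC9W2X1 -- On-Line -- Up 00:04:28 -- 111240 Kbytes

address userthread flgs pagenum memaddr nslots pgflgs xflgs owner waitlist
4421ec80 0 84 3:230791 4440e000 87 801 80 ffffffff 44d0bc90
"""

for row in waiting_on_io(parse_onstat_b(sample)):
    print(row["pagenum"], "waited on by", row["waitlist"])
```

The waitlist value can then be matched against the thread addresses reported by the other onstat options to confirm which session is the one waiting.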

Well, it appears the answer is pretty simple and it's related to the read-ahead functionality. After some trials, I noticed that if I reduce the RA_PAGES and RA_THRESHOLD parameters, the thread shows much more "IO Wait" than "yield bufwait". If I increase them I get the opposite. Apparently the engine uses read-ahead for the query. And because the disk access is slow, the sqlexec thread reaches some buffers before the IO request that fills them has completed. So when the sqlexec thread identifies the buffer slot that it needs, that buffer is already in the process of being read from the disk.
Later one of the DBAs used this to benchmark the best RA_PAGES and RA_THRESHOLD values to use. They were around 128 and 100 respectively. But beginning with 11.70 you are probably better off using the AUTO_READAHEAD functionality instead.

Friday, August 02, 2013

English version:
Recently I've been asked to set up some connection redirection for Informix clusters. In most situations we'll want to use the connection manager, but in this post I'm not going that deep. I'll just review something that is also related to the use of the connection manager, but that has a lot more uses and that I've found many customers are not aware of. I'm referring to the INFORMIXSQLHOSTS file options field. You can find the full documentation here in the InfoCenter. I'll just make some remarks about several options that can be extremely useful, and that many people never thought about using.

r option (0|1)
Only used on the client side. If it's set to 1, the client will look for a .netrc file in the user's home directory, which allows you to specify a user and password to connect to the database server

s=6 restricts the port to replication connections only (HDR, RSS and ER)

k option (0|1)
Activates (1) the TCP keep-alive functionality. Note that the keep-alive interval, timeout and retries must be set up at the OS level. This can be very important when there are firewalls between the clients and the servers. In some cases, if the functionality is not activated (which can be controlled at the OS level), the firewalls may break the connection and an idle client will receive an error when it attempts to send a query after a long period of inactivity

As you can see in the documentation there are many more. But for the purpose of this post, I'm particularly interested in the group-related options. The concept of groups was created for Enterprise Replication many years ago, but we can use them for other purposes. Let's first see how to create a group:

So, here I have a group called "my_group", defined because I've used "group" in the protocol field. I don't use the hostname and service name or port (so I replace them with a "-") and I use two options:

i=100
This is just a group id. We can use any unique number for each group

c=0
This controls the way the clients trying to access INFORMIXSERVER=my_group handle the several servers in the group. This option can be set to "0" or "1":

0 means that it will always try to connect to the first server, and if it's not available it will try the second one and so on

1 means it will randomly choose one of the servers that belong to the group and start trying from there until it finds one server available
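
The difference between the two values of the c option can be sketched as follows. This only models the behaviour described above, it is not client library code: the function names and the server names srv_a, srv_b and srv_c are hypothetical, and for c=1 I'm assuming the client walks the list in order from the random starting point and wraps around.

```python
import random


def connection_order(members, c=0, rng=None):
    """Order in which a client would try the servers of a group."""
    members = list(members)
    if c == 0 or not members:
        return members                      # always start with the first
    rng = rng or random
    start = rng.randrange(len(members))     # c=1: random starting point
    return members[start:] + members[:start]


def first_available(members, is_up, c=0, rng=None):
    """Return the first server that answers, or None if all are down."""
    for name in connection_order(members, c, rng):
        if is_up(name):
            return name
    return None


group = ["srv_a", "srv_b", "srv_c"]
down = {"srv_a"}

# c=0: srv_a is tried first; it is down, so the client ends up on srv_b
print(first_available(group, lambda name: name not in down, c=0))
```

With c=0 the list order expresses a preference; with c=1 the load is spread across the members, at the cost of not being able to say which one is preferred.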

Now, how is this useful? I'll give you two usage scenarios. The first one is a cluster with a primary server, an HDR server and an RSS server. Both secondary nodes are in read-only mode, we don't plan to change the roles between the servers, and naturally the HDR is more likely to be up to date than the RSS. In this case we may want to redirect read-only clients preferably to the HDR instance, but in case of failure or planned maintenance we want the clients to be shifted to the RSS server. The following INFORMIXSQLHOSTS will do the trick:

And then just use INFORMIXSERVER=my_read_only_group. Clients will attempt to connect to my_hdr_instance first, but if that fails they'll retry the connection against my_rss_instance.
And there you have it... The most simple client redirection you can find. This has advantages and disadvantages:

It works for clients with very old versions that can't handle connection managers in REDIRECT mode (maybe in a future post about connection manager I'll explain this)

It will not work if you change your instances roles (like primary to secondary and vice-versa)

Although you could set "c=1" to randomly distribute the connections between the two servers, it would not allow more sophisticated redirection rules allowed by the connection manager

Let's now see the second scenario. For improved flexibility you want to implement the connection manager. But you know that once you do it, it will be a single point of failure. If it's down, you won't be able to connect to your servers, even if there is nothing wrong with them. Naturally the solution would be to set up two connection managers on different machines. But then how would you configure your clients? Well, very similarly to the example above. But this time we would have two listeners (one on each machine where the connection manager is running) and we'd want the clients to see and try to connect through both of them. So, let's assume we create the cm_1 and cm_2 services on two different hosts using the connection manager. On each of them, cm_* will be an SLA that redirects to the secondary servers. So, our client-side INFORMIXSQLHOSTS would be:

Our clients would be set up with INFORMIXSERVER=my_read_only_group and they would try each connection manager randomly. After reaching a connection manager service, they would be redirected to the appropriate secondary server. The configuration on the CM will not be covered in this post.
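
A minimal sketch of how client-side entries for such a group could be generated. The host names, port numbers and group id are hypothetical; the group line layout follows the description earlier in this post (the word "group" in the protocol field, "-" for host and service, and the i= and c= options), and the g= option is what ties each member entry to its group.

```python
def render_group(group, group_id, choice, members):
    """Build sqlhosts text for one group and its member entries.

    members is a list of (server_name, host, port) tuples; all members
    are assumed to use the onsoctcp protocol.
    """
    lines = [f"{group} group - - i={group_id},c={choice}"]
    for name, host, port in members:
        lines.append(f"{name} onsoctcp {host} {port} g={group}")
    return "\n".join(lines)


# Hypothetical hosts and ports for the two connection manager listeners
sqlhosts_text = render_group(
    "my_read_only_group",
    group_id=200,
    choice=1,          # c=1: pick the starting connection manager randomly
    members=[
        ("cm_1", "cm-host-1.example.com", 1530),
        ("cm_2", "cm-host-2.example.com", 1530),
    ],
)
print(sqlhosts_text)
```

The same helper with choice=0 and the HDR and RSS instances as members would describe the first scenario, where the order of the member lines expresses the preferred server.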

There is only one more piece of information that may be important. By default, Informix clients wait a very long time (60 seconds) for a connection attempt. When you're dealing with connection redirection you probably don't want to waste so much time... Let's face it, if a server does not reply within 10s (I'm accommodating DNS problems and so on) it will never answer. We can set two environment variables to configure this:

INFORMIXCONTIME
The time (in seconds) the client will wait for a connection to be established

INFORMIXCONRETRY
The number of attempts the client will make within the INFORMIXCONTIME period
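
As a small illustration, this is how a wrapper script could prepare the environment for a client process with a shorter timeout. The helper name informix_client_env is hypothetical and the retry value of 1 is just an example; only the two variable names and the 10 second figure come from the text above.

```python
import os


def informix_client_env(contime=10, conretry=1, base=None):
    """Return a copy of the environment with the two timeout variables set.

    contime  -> INFORMIXCONTIME  (seconds to wait for the connection)
    conretry -> INFORMIXCONRETRY (attempts within that period)
    """
    if contime <= 0 or conretry < 0:
        raise ValueError("contime must be positive and conretry non-negative")
    env = dict(os.environ if base is None else base)
    env["INFORMIXCONTIME"] = str(int(contime))
    env["INFORMIXCONRETRY"] = str(int(conretry))
    return env


env = informix_client_env(contime=10, conretry=1)
print(env["INFORMIXCONTIME"], env["INFORMIXCONRETRY"])
```

The resulting dictionary can be passed as the env argument of subprocess.run when launching dbaccess or any other client, so the shorter timeout applies only to that process and not to the whole session.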

About Me

I'm an IBMer and I've been working with IDS since I joined Informix in 1998.
The ideas and opinions expressed in this blog are personal and in no way represent IBM positions, strategy or opinions.
I chose to write this blog in English so that I could reach the maximum number of Informix users. Take notice that English is not my native language, so there are probably many mistakes.
I appreciate any comments, corrections and topic suggestions.
I can be reached at domusonline at gmail dot com.