Blogroll

SSAS: Kerberos kills ‘large’ MDX queries on Windows Server 2008

Kerberos is nasty! I have had several encounters with Kerberos and none of them have been pleasant. I try to avoid Kerberos like try to avoid the dentist. However, some projects demand that Kerberos authentication be used to cater for the 'third hop' and then the pain begins.

The symptoms of my latest encounter were somewhat odd. Local MDX queries worked fine, but queries performed over the network failed with the message:

Executing the query ...The connection either timed out or was lost.Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host.An existing connection was forcibly closed by the remote hostExecution complete

Of course the error message does not tell us anything useful. Just that someone does not like us today!

At first we thought there was a fault with the MDX query. But no, the MDX query ran fine against the local development server and also ran fine against the production server when run in SQL Server Management Studio in local mode (i.e. not across the network). But as soon as we ran the query across the network, we got the error.

Suspecting a hardware fault on the brand new server, the IT support team checked out the memory and changed the network cards. No change; the queries still failed.

Anyone familiar with SQL Server Analysis Services (SSAS) knows that it uses a Windows-based security model (i.e. the user must have Windows account in order to connect to the OLAP database). Unfortunately, this becomes a problem when the user's credentials need to be passed from machine to machine before finally arriving at the OLAP server. Basically the NTLM protocol does not support delegation across more than two hops.

To cater for the third 'hop' you have to co-opt the help of the 'dark side': i.e. Kerberos and SPNs. Servers have to be 'registered for delegation' in Active Directory and SPNs have to be set up to configure Kerberos. This is a black art and my colleague and I have the battle scars to prove it!

Our configuration was not unusual. Users connected to a web server running SharePoint 2007 and SQL Server Reporting Services (SSRS). SSRS then queries SQL Server Analysis Services (SSAS) database which resides on the backend 'data server'. The complication occurs when users connect over the internet; they are authenticated by Microsoft ISA server which passes their credentials on to the web server which in turn passes their credentials to SSRS which in turn passes their credentials on to SSAS.

Now, this infrastructure was working fine in the current SQL Server 2005 / Windows Serve 2003 implementation. However, the new SQL Server 2008 / Windows Server 2008 implementation was having trouble with a single hop, so we were in big trouble!

Suspecting a corrupt installation of SSAS, we did a re-installation of the whole SQL Server 2008 suite, but the MDX queries still failed. So next we did a fresh install of the Windows Server 2008 operating system along with SQL Server 2008, but the MDX queries still failed!

At this point we still believed that all MDX queries failed. But then I observed that the filters on the SSRS reports were been populated, whereas running the main report query generated the same ugly error message.

Been convinced my MDX was at fault, I started a trace of activity on the Analysis Services service which showed that small MDX queries worked fine (all the relevant log entries were present) whereas, large MDX queries failed and the only log entry was an "Audit Logout notification" event.

I started to play with the MDX and found that if I padded out a very simple query with comments it started to fail around the 1096 character mark i.e. simply adding /******/ until the whole statement was over 1kB caused the query to fail! Clearly something major was going wrong!

Still believing it to be a hardware or software implementation error, we proceeded to install SQL Server 2008 OLAP database on another identical box. No problem! MDX of any size worked fine. So it must be a hardware or software implementation fault????

After much scratching of his head, my good friend and colleague Eric Moutell eventually announced that the only difference between the two machines is that one had been set up with SPN entries to allow Kerberos authentication. So he deleted the entries, rebooted the world, rebooted the world again and finally any size MDX query worked on the original box. Ah ha! We were getting somewhere. So now we knew that nasty old Kerberos was having a laugh at our expense!

We got in contact with Microsoft support and after sending detailed information about the bug and several gigabytes of server logs, they eventually admitted we had found a bug in Windows Server 2008 which they have agreed to fix!

For those of you experiencing a similar problem, I reproduce the email from Microsoft support which you may find useful. In our environment, the server is running Windows Server 2008 64-bit with Microsoft SQL Server 2008 with the latest cumulative update patch CU3. The client machines we tried were either Windows Server 2008 64-bit or 32-bit running Microsoft SQL Server 2008 CU3.

Escalation engineers have confirmed that is a problem on Kerberos.dll related to encryption with AES on Windows 2008 and Vista. They have reported to the Product Group and, after analyzing it, they have accepted to fix it.

The estimated time for the fix is May (it is required two month cycle hotfix test pass).

I am sending you the Workarounds if you cannot wait until May:

If Kerberos authentication is a requirement, run Analysis Services on a Windows 2003 Server, since Windows 2003 Server is not AES aware.

Use Windows 2003 Server, Windows XP, or Windows 2000 Server to run client applications that will be connecting to the Analysis Server configured for Kerberos authentication and running on Windows 2008. Since these operating systems are not AES aware, that will avoid use of AES for encryption/decryption.

Avoid configuring the Analysis Server to use Kerberos Authentication, since this will result in the AES not being used for encryption/decryption

Add ";SSPI=NTLM" or ";Integrated Security=SSPI" to the connection string, which will force use of NTLM and avoid use of AES for encryption/decryption