The certificate that was used has a trust chain that cannot be verified. Replace the certificate or change the certificateValidationMode.

The revocation function was unable to check revocation because the revocation server was offline.

. ---> System.ServiceModel.Security.SecurityNegotiationException: The X.509 certificate CN=servicebus.windows.net, OU=WindowsAzure, O=Microsoft, L=Redmond, S=WA, C=US chain building failed. The certificate that was used has a trust chain that cannot be verified. Replace the certificate or change the certificateValidationMode. The revocation function was unable to check revocation because the revocation server was offline.

---> System.IdentityModel.Tokens.SecurityTokenValidationException: The X.509 certificate CN=servicebus.windows.net, OU=WindowsAzure, O=Microsoft, L=Redmond, S=WA, C=US chain building failed. The certificate that was used has a trust chain that cannot be verified. Replace the certificate or change the certificateValidationMode. The revocation function was unable to check revocation because the revocation server was offline.

We had come across this previously in the test and development environments, but it only ever happened very occasionally. Normally we had been able to clear the credential cache or restart the app pools and it had always just worked. We had also reviewed some of the other articles online about similar errors and possible fixes, but none of them had ever seemed to work. Since the problem didn't really affect us in test/dev and always went away easily, it had never been given too much airtime.

This week we had a bigger issue: the production service had been running fine for months but suddenly stopped, and none of the old workarounds made any difference. Skipping over some of the diagnostic steps to keep this article short, we changed the user account running the app pool on one server and that server started working. On the other server the same steps didn't work.

We had the production service restored but were unable to get it working with the expected configuration, and we were still getting the above error on one server.

At this point we engaged with Microsoft support through our Azure support agreement. While working with one of their engineers we found, using netmon and the CAPI 2.0 logging available via Event Viewer, that some of the certificates could not be verified and that there were some errors. This corresponded with entries in our proxy server logs about some URLs being blocked. The blocked URLs were:

We configure the proxy server for access to our Azure Service Bus namespace's ACS endpoint.

We configure the proxy server for access to a couple of other URLs which seem to be required. We used the ones from the general guidance online and also looked for any others which might be required during our early stages.

From the CAPI log and the netmon trace we could see that there were issues accessing these certificate related resources which we assume would be updates to certs or revocation lists. We were seeing things like:

HTTP 403 Forbidden error code

Proxy returning error 'X-Squid-Error: ERR_ACCESS_DENIED 0'. So the proxy is not allowing traffic to the above URL.

In addition to the configuration above, our WCF service, which has been in production for a while, uses the 1.6 SDK. That SDK has since been superseded by several other releases. The service hadn't really changed for a while, but it hadn't needed to.

Based on the support call, your experience with this error could be slightly different depending on which version of the SDK you are using. This is outlined below.

Service Bus SDK 1.8 or Above

You should no longer get this issue because the SDK no longer checks for certificate revocation.

Service Bus SDK 1.7

This can be worked around by using the following snippet in the configuration file.

<configuration>
  <appSettings>
    <add key="Microsoft.ServiceBus.X509RevocationMode" value="NoCheck"/>
  </appSettings>
</configuration>

You should probably still consider looking into your proxy server to check what is being blocked.

Service Bus SDK 1.6

In our case this was related to blocked addresses on our proxy server. We modified the proxy server settings to have the following as allowed on our Squid proxy.

<My Namespace Goes here>-sb.accesscontrol.windows.net

mscrl.microsoft.com

crl.microsoft.com

public-trust.com

verisign.com

windowsupdate.com

msftncsi.com

crl.omniroot.com
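
As a rough way to sanity-check which hosts a list like this covers, here is a minimal Python sketch. It assumes the proxy treats each entry as a domain plus all of its subdomains, which may not match how your proxy is actually configured. The function name `is_allowed` and the constant `ALLOWED_DOMAINS` are made up for illustration, and the namespace-specific ACS entry is left out because it differs per deployment.

```python
# Domains taken from the list above (the namespace-specific ACS
# endpoint is omitted because it is different for every deployment).
ALLOWED_DOMAINS = [
    "mscrl.microsoft.com",
    "crl.microsoft.com",
    "public-trust.com",
    "verisign.com",
    "windowsupdate.com",
    "msftncsi.com",
    "crl.omniroot.com",
]


def is_allowed(host, allowed=ALLOWED_DOMAINS):
    """Return True if host is an allowed domain or a subdomain of one.

    Assumption: entries match the domain itself and any subdomain.
    """
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in allowed)


if __name__ == "__main__":
    for candidate in ["crl.microsoft.com", "ocsp.verisign.com", "microsoft.com"]:
        print(candidate, is_allowed(candidate))
```

Running a list of hosts seen in a CAPI log or network trace through a check like this may help spot addresses the allow list does not yet cover.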

Lessons Learnt

There are a couple of lessons we can take away from this.

We need to set something up to report blocked addresses from our proxy server for this kind of situation. This had been working fine for ages, and then this week the certificate network retrieval was blocked; we need to know if this ever happens in the future before it affects service. In terms of our solution, when it's configured correctly we don't expect any URLs to be blocked, as they should all relate to the solution, but it seems we were not aware of all of them.

We need to agree a standard for updating the SDK. This component hadn't changed for months, yet it is already 5 versions behind the latest SDK, which is now 2.1.
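
For the first lesson, reporting blocked addresses from the proxy, something along these lines might be a starting point. It is only a sketch: it assumes Squid is writing its default native access log format (time, elapsed, client, result code/status, bytes, method, URL, ...) and that denied requests are logged with a result code beginning `TCP_DENIED`. The function name `denied_hosts` and the sample log lines are invented for illustration and are not taken from our real logs.

```python
from collections import Counter
from urllib.parse import urlsplit


def denied_hosts(log_lines):
    """Count denied requests per destination host.

    Assumes Squid's native access log format, where field 4 is the
    result code (e.g. TCP_DENIED/403) and field 7 is the URL.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip blank or truncated lines
        if not fields[3].startswith("TCP_DENIED"):
            continue
        url = fields[6]
        if "://" in url:
            host = urlsplit(url).hostname
        else:
            # CONNECT requests are logged as host:port
            host = url.split("/")[0].split(":")[0]
        if host:
            counts[host.lower()] += 1
    return counts


if __name__ == "__main__":
    # Illustrative lines only; paths and addresses are placeholders.
    sample = [
        "1375100000.123 0 10.0.0.5 TCP_DENIED/403 3800 GET "
        "http://crl.microsoft.com/example.crl - NONE/- text/html",
        "1375100001.456 12 10.0.0.5 TCP_MISS/200 512 GET "
        "http://mscrl.microsoft.com/example.crl - DIRECT/192.0.2.1 text/plain",
    ]
    for host, count in denied_hosts(sample).most_common():
        print(host, count)
```

Scheduling a script like this against the access log and alerting on any non-empty result would give early warning before a blocked address affects service.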