UAG DirectAccess unable to connect to resources via Teredo

Recently, as part of a fairly complicated client UAG DirectAccess implementation, I came across several rather puzzling access issues. After rebuilding the solution from scratch and still seeing the same symptoms, I had to raise a call with Microsoft Product Support Services (PSS), and together with the excellent engineer, Balint, we uncovered some fairly simple (if infuriating) solutions! I’ll cover the first problem in this post and do a follow-up for the second.

Symptoms

UAG DirectAccess clients are able to connect to internal resources intermittently.

Running ipconfig on the client machines shows that when they are using IP-HTTPS they are always able to access internal servers (if a client has an IP-HTTPS address as well as a Teredo one, it is generally using IP-HTTPS to transmit). When the client is running via Teredo, certain servers are not accessible.
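Since the transport in use can be read off the client's IPv6 addresses, here is a minimal sketch of how a Teredo address can be recognised in code. It assumes Python and its standard `ipaddress` module, neither of which was part of the original troubleshooting. The sample address is the documentation example from RFC 4380, not one from this deployment. Teredo addresses sit under the 2001::/32 prefix and embed the Teredo server's IPv4 address plus the client's (obscured) external IPv4 address.

```python
import ipaddress

def describe_teredo(addr):
    """Return (server_ipv4, client_external_ipv4) for a Teredo address, else None."""
    ip = ipaddress.IPv6Address(addr)
    # .teredo is None unless the address falls inside 2001::/32
    return ip.teredo

# Example address taken from RFC 4380, used here purely for illustration
sample = "2001:0:4136:e378:8000:63bf:3fff:fdd2"

info = describe_teredo(sample)
if info:
    server, client = info
    print(f"Teredo address: server {server}, client external address {client}")
else:
    print("Not a Teredo address")
```

An address that returns None here is simply not Teredo. The sketch does not try to identify IP-HTTPS addresses, whose prefix depends on the deployment.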

I’ll run through some of the troubleshooting steps, but feel free to skip straight to the conclusion!

Overview

The UAG deployment consisted of two UAG servers configured in an array with the required public-facing IP addresses.

The certificates had all been configured (including local machine certs) and infrastructure servers defined for AD, Update servers and the NAP server. The group policies had been generated including a correctly configured NLS server and applied to the UAG servers and clients. The UAG activation monitor showed that everything was up and running…

The client would turn on, connect to the internet and then bring up its DA tunnels. During initial testing everything looked great. The client could access the domain, browse folder shares and get its anti-virus updates. NAP also seemed to be functioning as expected with the health certificates disappearing when AV was disabled & then doing auto-remediation to bring themselves back into compliance.

The last stage of testing before project completion was to run through a list of internal services that most clients would be using when out and about. This list included some websites (SharePoint) that were published internally using an ISA server array (ISA was used to manage the SSL connections & load balance). This is where things started taking a turn for the worse…

Access to the SharePoint site was intermittent and there did not seem to be any pattern to when it would or wouldn’t work. I tried various internet and broadband connections but they didn’t seem to make any difference. Eventually, to rule out the onsite internet connection, I used my mobile phone as a modem to do some more testing from another laptop. Suddenly the SharePoint sites were working and everything looked great.

Troubleshooting

At this point I ran ipconfig and noticed that, due to the unreliable nature of my mobile phone internet link, the client had swapped over to IP-HTTPS. This got me thinking & I used a local Windows Firewall rule on my original test laptop to block outbound UDP 3544 (the Teredo port). Again, when the client swapped over to IP-HTTPS, DirectAccess would come up and the client would have full (albeit slightly slower) connectivity to the internal network.
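For anyone wanting to repeat this test, the sketch below shows one way such a rule could be created. It is an illustration rather than the exact rule I used: it assumes Python on the test laptop, an elevated prompt, and the standard `netsh advfirewall` command. The rule name `BlockTeredoTest` is made up for the example.

```python
import subprocess

TEREDO_PORT = 3544  # UDP port blocked in the test described above

def block_teredo_command(rule_name="BlockTeredoTest"):
    """Build a netsh command that adds an outbound block rule for UDP 3544.

    The rule name is hypothetical; pick whatever suits your environment.
    """
    return [
        "netsh", "advfirewall", "firewall", "add", "rule",
        f"name={rule_name}", "dir=out", "action=block",
        "protocol=UDP", f"remoteport={TEREDO_PORT}",
    ]

def unblock_teredo_command(rule_name="BlockTeredoTest"):
    """Build the matching netsh command that removes the test rule again."""
    return ["netsh", "advfirewall", "firewall", "delete", "rule", f"name={rule_name}"]

if __name__ == "__main__":
    # Print the commands; uncomment the run line to apply (needs admin rights).
    print(" ".join(block_teredo_command()))
    print(" ".join(unblock_teredo_command()))
    # subprocess.run(block_teredo_command(), check=True)
```

Remember to delete the rule afterwards, otherwise the client will stay pinned to IP-HTTPS.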

The other strange thing was that I could quite happily browse the SharePoint farm when logged on to the UAG servers, which meant there weren’t any internal network routing issues. The problem looked like it had to lie somewhere inside UAG. I suspected some sort of routing issue in the UAG array, as I had read various posts about how Teredo and IP-HTTPS can get routed differently. I did consider deploying ISATAP, but the client wasn’t keen on this option, and as the servers I couldn’t access were being published via ISA server there was very little chance of this helping (as ISA very firmly does not know anything about IPv6!).

During the initial phases of the PSS call we stumbled across another interesting side issue that I’ll cover in my next post. By the time we had resolved the side issue we noticed that Teredo was fully functional and could access the SharePoint sites. Reviewing the configuration changes we had made, about the only one had been to allow pings from the UAG servers to the ISA server…

During testing I had noticed that each time a DA (Teredo) client tried to access the SharePoint resources, the ISA firewall log showed a ping request originating from the UAG servers & then no further traffic. As an experiment we blocked pings again on ISA & that stopped SharePoint access.

I then went on to do some testing using our lab environment & drew the following table of behaviour:

As you can see from the results, it was fairly conclusive that in order for Teredo to work the client MUST be able to ping the server / resource it is connecting to. If the ping was blocked at any stage then Teredo would not function (for any protocol). On the other hand, IP-HTTPS was oblivious and ploughed on regardless.

Conclusion

It turns out that Teredo, by its protocol definition, uses ping as part of its connection process:

IPv6 Internet destination

For packets destined for the IPv6 Internet, the Teredo tunneling interface uses ICMPv6 Echo Request and Echo Reply messages as substitute for the address resolution process of Neighbor Discovery. An ICMPv6 Echo Request message is sent to the destination. The ICMP Echo Reply message that is returned contains the IPv4 address of the Teredo relay closest to the IPv6 host on the IPv6 Internet. For more information, see “Initial communication from a Teredo client to a Teredo host-specific relay” and “Initial communication from a Teredo client to an IPv6-only host” in this article.

In our case, because UAG was doing NAT64 & DNS64 in the middle of the connection, the UAG servers must be able to ping the resource you are trying to access so that the client gets a reply and will continue the conversation.
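A practical consequence is that it is worth checking, from each UAG server, that every published internal resource answers a ping. The following is a rough sketch of such a check, assuming Python is available on the server and using the operating system's own ping tool. The function names are my own, and host names are passed on the command line.

```python
import platform
import subprocess
import sys

def ping_once(host, timeout_s=2):
    """Send one echo request with the OS ping tool; True if it exits with code 0.

    Caveat: on Windows, ping can exit 0 for some 'unreachable' replies,
    so treat a pass as a hint and confirm by eye if in doubt.
    """
    if platform.system() == "Windows":
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]
    else:
        # Linux-style flags: one packet, timeout in seconds
        cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0

def find_unpingable(hosts, ping=ping_once):
    """Return the hosts that did not answer; these are candidates to fail over Teredo."""
    return [host for host in hosts if not ping(host)]

if __name__ == "__main__":
    # Usage: python check_ping.py host1 host2 ...
    for host in find_unpingable(sys.argv[1:]):
        print(f"No echo reply from {host}: expect Teredo clients to fail here")
```

The `ping` parameter is there so the check can be swapped for something else (or faked in a test) without touching the rest of the logic.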

Teredo working with pings allowed

Pings blocked so Teredo fails

So after all the problems and troubleshooting, looking into certificates and routing, it all comes down to ping!