RPC Client Access Cross-Site Connectivity Changes

Nearly two years ago, I presented a session called High Availability Design Considerations at TechEd North America 2010. During my session I described the changes we were making around how MAPI connectivity should occur after mailbox moves and cross-site database *over events. Unfortunately, after I gave that presentation we cut the feature due to the complexity of the changes we needed to introduce and the lack of time for testing prior to Service Pack 1’s release. And though I tried hard to get the work prioritized, we were unable to ship these changes in Service Pack 2, either.

Sadly, not every feature cut can be performed in a surgical manner, and a portion of the code changes was actually left in SP1. For example, you may have noticed that the Set-DatabaseAvailabilityGroup cmdlet has a property called AllowCrossSiteRPCClientAccess. You can toggle this Boolean property to your heart’s content; sadly, it will not effect any behavioral change in the product (I know…that’s why I think this image is oddly relevant):

With Exchange 2010, a major change was instituted in the way clients connect to and access mailbox-related data. Unlike previous versions, clients no longer connect directly to the Information Store on the Mailbox server role to access mailbox data. Instead, clients connect to a set of services on the Client Access server (CAS) role, and services within the CAS role access mailbox data on the Mailbox server using MAPI/RPC on behalf of the connecting user. Think of it as an abstraction layer.

Basically, one simple change was made such that all mailbox-related MAPI connectivity goes through the RPC Client Access service on the Client Access server role. To facilitate this abstraction layer, changes were made such that the database objects are no longer child objects of Mailbox servers. A new property was added to the Mailbox database, RPCClientAccessServer, which defines the legacyExchangeDN that should be used to access the particular database. This property takes an FQDN as its value. To ensure that you have high availability and fault tolerance in the event a CAS fails, this value should be the FQDN of a load-balanced CAS array. This load-balanced FQDN is what we refer to as an RPC Client Access Server array.

For more information, see Brian Day’s posts on Demystifying the CAS Array Object, Part 1 and Part 2.

Typical Outlook Experience

All Outlook versions behave in a consistent fashion in a single datacenter (or single Active Directory site) scenario. The Outlook profile’s RPC endpoint is the RPC Client Access Server array (or a specific CAS if for some reason you did not create an array, which you should have done; if you don’t know why, I will say it again: read Brian’s posts). Whenever a failure occurs within the DAG (database failure, server failure, etc.), the RPC endpoint does not change, so Outlook continues to connect to the same RPC Client Access Server array. Whenever a failure occurs within the CAS array (CAS failure, load balancer failure, etc.), the RPC endpoint does not change either, so Outlook will continue to attempt to connect to the RPC Client Access Server array.

All Outlook versions behave consistently in a datacenter switchover scenario as well, assuming you follow our guidance. Why is that? Well, in a datacenter switchover, you change the primary datacenter RPC Client Access Server array DNS entry’s IP address to now point to the standby datacenter’s RPC Client Access Server array. Autodiscover continues to hand out the primary datacenter RPC Client Access Server array as the RPC endpoint. Existing Outlook clients do not need any reconfiguration; once the DNS cache on the client is updated, the client simply connects to the endpoint that now resides in the standby datacenter. In order to fully understand why this works, you need to grasp that neither the client nor the CAS really cares about the site in which the CAS exists; they simply accept that they are able to connect and that the CAS the client is connected to is able to provide access to the mailbox.
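To make the switchover mechanics concrete, here is a minimal Python sketch of the idea that the profile stores a name, not an address. Everything in it is illustrative: cas-pri.contoso.com is a hypothetical name for the primary datacenter's array, the IP addresses come from documentation-only ranges, and a dictionary merely stands in for DNS.

```python
# Toy model of a datacenter switchover: the Outlook profile keeps the same
# RPC endpoint name, and only the DNS record behind that name changes.
# All names and addresses are hypothetical placeholders.

PRIMARY_ARRAY_FQDN = "cas-pri.contoso.com"   # assumed name of the primary array
PRIMARY_VIP = "192.0.2.10"                   # documentation-range address
STANDBY_VIP = "198.51.100.10"                # documentation-range address


def resolve(endpoint_fqdn, dns_records):
    """Return the IP address the client would connect to for its endpoint."""
    return dns_records[endpoint_fqdn]


def switchover(dns_records):
    """Return a copy of the records with the primary name repointed at the standby VIP."""
    updated = dict(dns_records)
    updated[PRIMARY_ARRAY_FQDN] = STANDBY_VIP
    return updated


profile_endpoint = PRIMARY_ARRAY_FQDN        # what Autodiscover handed out
dns_before = {PRIMARY_ARRAY_FQDN: PRIMARY_VIP}
dns_after = switchover(dns_before)

print(resolve(profile_endpoint, dns_before))  # address in the primary datacenter
print(resolve(profile_endpoint, dns_after))   # address in the standby datacenter
```

The profile's endpoint string is identical before and after; only the record it resolves to moves, which is why no client reconfiguration is needed.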

Cross-Site Database *over Behavior

To understand this scenario, it is important to understand that when you configure a multi-site DAG, the RPCClientAccessServer property for a given database is typically associated with the RPC Client Access Server array that is in the same AD site as the copy of the mailbox database with the lowest activation preference number. For example:

In the event that a copy of the database is activated in the standby datacenter, the users will continue to leverage the RPC Client Access Server array from the AD site where the mailbox database with the lowest activation preference value resides as their connectivity endpoint.

Figure 2: MDB1 database has been activated in the Standby Active Directory Site

If you review the RPC Client Access logs on the source RPC Client Access Server array you will see:

In the event the RPC Client Access Server array in the AD site where the mailbox database with the lowest activation preference value resides becomes inaccessible to the users, Outlook clients would be unable to connect to their mailbox that resides in the opposing datacenter. In other words, there is a datacenter outage event and the datacenter switchover process needs to be performed.

A simpler way to look at this behavior is that Outlook always connects to the configured RPC endpoint (assuming it is reachable) regardless of the database’s RPCClientAccessServer property value.

Can the system ever change the RPCClientAccessServer value automatically?

The only time the system changes the RPCClientAccessServer value on the database is when the administrator changes the ActivationPreference number on the activated database copy such that it now has the lowest value (meaning it becomes the preferred copy), as seen in Figure 3.

Figure 3: MDB1 database has been activated in the Standby Active Directory Site and the RPCClientAccessServer property has changed

However, the Outlook clients with an existing Outlook profile would continue to use the old RPC endpoint rather than the new RPC endpoint (even though Autodiscover detected the change). This is because the old RPC endpoint does not return an ecWrongServer response to the client. The RPC endpoint accepts the connection; therefore, Outlook ignores the Autodiscover response because it has a working connection. In the event that the old RPC endpoint becomes inaccessible, Outlook 2007/2010 would update its settings (Outlook 2003, on the other hand, would not as it does not leverage Autodiscover). At any time you could force Outlook to use the new RPC endpoint by forcing a profile repair.
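The client-side rule described here can be summarized in a small Python sketch. This is only my simplified model of the behavior as described in this post, not Outlook's actual logic; the class, the function, and the cas-pri.contoso.com name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OutlookProfile:
    rpc_endpoint: str        # endpoint stored in the existing profile
    uses_autodiscover: bool  # True for Outlook 2007/2010, False for Outlook 2003


def endpoint_in_use(profile, autodiscover_endpoint, old_endpoint_reachable):
    """Model which endpoint a pre-SP2 RU3 client ends up using.

    The old endpoint never returns ecWrongServer in this model, so a working
    connection always wins over whatever Autodiscover reports.
    """
    if old_endpoint_reachable:
        # Working connection: the Autodiscover response is ignored.
        return profile.rpc_endpoint
    if profile.uses_autodiscover:
        # Old endpoint is gone: Outlook 2007/2010 picks up the new setting.
        return autodiscover_endpoint
    # Outlook 2003 has no Autodiscover, so the profile stays as it was.
    return profile.rpc_endpoint


ol2010 = OutlookProfile("cas-pri.contoso.com", uses_autodiscover=True)
ol2003 = OutlookProfile("cas-pri.contoso.com", uses_autodiscover=False)

print(endpoint_in_use(ol2010, "cas-sec.contoso.com", old_endpoint_reachable=True))
print(endpoint_in_use(ol2010, "cas-sec.contoso.com", old_endpoint_reachable=False))
print(endpoint_in_use(ol2003, "cas-sec.contoso.com", old_endpoint_reachable=False))
```

A profile repair is deliberately left out of the model; it is the manual override that rewrites rpc_endpoint regardless of reachability.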

What happens if the administrator manually updates the RPCClientAccessServer value after a cross-site database *over event?

Going back to Figure 2, if the administrator manually updates the RPCClientAccessServer value to point to cas-sec.contoso.com for MDB1 after the mailbox database copy on MBX-C is activated (whose ActivationPreference value is greater than 1), then Outlook clients with an existing profile will continue to use the old RPC endpoint rather than the new RPC endpoint as long as the old RPC endpoint remains available (profile repairs will correct the issue). Outlook profiles created after the RPCClientAccessServer value change would use the new RPC endpoint.

Moving Mailboxes between Active Directory Sites

Prior to Exchange 2010, when you moved mailboxes across servers, the Outlook RPC endpoint would update to point to the Mailbox server (or clustered Mailbox server instance) hosting the database where the mailbox resides. After the mailbox move was completed, the Outlook client user would be prompted with the “The Exchange administrator has made a change that requires you quit and restart Outlook” dialog. After restarting Outlook, the client would be connected to the new RPC endpoint.

With Exchange 2010, you may have noticed that when you moved mailboxes between AD sites, users did not receive this dialog. Furthermore, you may have noticed that users also did not have their RPC endpoint updated to reflect the RPC Client Access Server array associated with the mailbox database in the AD site where the mailbox now resides. This is because the old RPC endpoint does not return an ecWrongServer response to the client. The RPC endpoint accepts the connection; therefore, Outlook ignores the Autodiscover response because it has a working connection. In the event that the old RPC endpoint becomes inaccessible, Outlook 2007/2010 would update its settings (Outlook 2003, on the other hand, would not as it does not leverage Autodiscover). At any time you could force Outlook to use the new RPC endpoint by forcing a profile repair.

Now I am sure you understand the above lolcat picture.

The Future…SP2 RU3 and beyond

I never gave up hope of addressing these issues. A few of us got bloodied, but the RPC Client Access development team, the Exchange Servicing Team, and I worked tirelessly to get the two required changes into the product. The first fixes up Outlook’s profile when a mailbox is moved between databases in different AD sites; the second addresses the case where a cross-site database *over results in Outlook using a CAS that isn’t the optimal choice for the location of the currently activated database.

Mailbox Moves

By default, once you have installed SP2 RU3, when you move mailboxes between AD sites, all versions of Outlook will get prompted to restart and the Outlook profile’s RPC endpoint will be updated.

If you review the RPC Client Access logs on the source RPC Client Access array you will see:

2012-05-06T14:43:03.875Z,37,1,/o=E2010/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=userded,,OUTLOOK.EXE,14.0.6025.1000,Classic,,,ncacn_ip_tcp,,OwnerLogon,1144 (rop::WrongServer),00:00:00.0156267,"Logon: Owner, /o=E2010/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=userded in database mdb2 last mounted on MBX-C.contoso.com at 5/5/2012 9:44:05 PM, currently Mounted; Redirected: this server is not in a preferred site for the database, suggested new server: /o=E2010/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Configuration/cn=Servers/cn=cas-sec.contoso.com",RopHandler: Logon:

Notice that the RPC operation (ROP) is WrongServer (also known as ecWrongServer). This forces the Outlook client to do a profile discovery and update the profile based on new information found within the directory. The profile gets updated and once the client is restarted, the client establishes its connections to the new RPC endpoint.
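If you want to pull these redirects out of the logs yourself, a few lines of Python are enough. The sketch below is an assumption-laden example rather than an official parser: it does not rely on any documented column layout and simply pattern-matches on the text of the entry shown above. The function name is my own.

```python
import re

# The log entry shown above, split across literals for readability.
SAMPLE = (
    '2012-05-06T14:43:03.875Z,37,1,'
    '/o=E2010/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=userded,,'
    'OUTLOOK.EXE,14.0.6025.1000,Classic,,,ncacn_ip_tcp,,OwnerLogon,1144 (rop::WrongServer),'
    '00:00:00.0156267,"Logon: Owner, /o=E2010/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)'
    '/cn=Recipients/cn=userded in database mdb2 last mounted on MBX-C.contoso.com at 5/5/2012 '
    '9:44:05 PM, currently Mounted; Redirected: this server is not in a preferred site for the '
    'database, suggested new server: /o=E2010/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)'
    '/cn=Configuration/cn=Servers/cn=cas-sec.contoso.com",RopHandler: Logon:'
)

# Capture the legacy DN that follows "suggested new server:" up to the closing quote.
SUGGESTED = re.compile(r'suggested new server: (?P<dn>[^"]+)')


def suggested_server(line):
    """Return the server name a WrongServer entry redirects to, or None."""
    if "(rop::WrongServer)" not in line:
        return None
    match = SUGGESTED.search(line)
    if match is None:
        return None
    # The server name is the last cn= component of the legacy DN.
    return match.group("dn").rsplit("cn=", 1)[-1]


print(suggested_server(SAMPLE))
```

Running it over every line of a log file and counting the results per suggested server would give a rough picture of how many clients were redirected and where to.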

And since this question will be asked, what other conditions will warrant the “The Exchange administrator has made a change that requires you quit and restart Outlook” dialog?

When you specify the DoNotPreserveMailboxSignature parameter on New-MoveRequest.

When the source and target mailbox databases have a different public folder hierarchy store.

When you move your mailbox from a legacy version of Exchange to Exchange 2010.

Cross-Site Database *over Events

Cross-site database *over event behavior depends on the value of the DAG property AllowCrossSiteRPCClientAccess. If you set the AllowCrossSiteRPCClientAccess property value to $true, then you get the behavior I described earlier under Cross-Site Database *over Behavior: in the event that the database is activated in the standby datacenter, the users will continue to leverage the RPC Client Access Server array in the AD site where the mailbox database copy with the lowest activation preference value resides as their connectivity endpoint.

If you set the AllowCrossSiteRPCClientAccess property value to $false (the default value for the property is $false), then the Outlook profile’s RPC endpoint will be updated to be the RPC Client Access Server array that is in the same AD site where the database is active and mounted. Note that the RPCClientAccessServer property is not updated as that defines the preferred site.

Figure 4: MDB1 database has been activated in the Standby Active Directory Site and Outlook profile has been updated automatically
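The effect of the AllowCrossSiteRPCClientAccess setting can be expressed as a short decision function. This is a sketch of the behavior as I have described it, not the RPC Client Access service's code; the function, the site names, and cas-pri.contoso.com are hypothetical.

```python
def redirect_target(allow_cross_site_rpc, connected_cas_site, active_db_site, cas_array_by_site):
    """Return the CAS array a client is redirected to, or None if it is served as-is.

    allow_cross_site_rpc models the DAG's AllowCrossSiteRPCClientAccess value.
    """
    if allow_cross_site_rpc:
        # $true: cross-site connectivity is allowed, so nothing changes.
        return None
    if connected_cas_site == active_db_site:
        # The client already uses the array in the site where the database is mounted.
        return None
    # $false and the database is active elsewhere: point the client at the
    # array in the site where the database is active and mounted.
    return cas_array_by_site[active_db_site]


# Hypothetical topology: one CAS array per AD site.
arrays = {"Primary": "cas-pri.contoso.com", "Standby": "cas-sec.contoso.com"}

# MDB1 has been activated in the Standby site; the client is still on the Primary array.
print(redirect_target(False, "Primary", "Standby", arrays))
print(redirect_target(True, "Primary", "Standby", arrays))
```

Note that nothing in the function touches RPCClientAccessServer: the redirect is driven by where the database is mounted, while that property keeps defining the preferred site.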

If you review the RPC Client Access logs on the source RPC Client Access Server array you will see:

2012-05-06T15:12:42.958Z,47,7,/o=E2010/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=userded,,OUTLOOK.EXE,14.0.6025.1000,Classic,,,ncacn_ip_tcp,,OwnerLogon,1144 (rop::WrongServer),00:00:00.0156262,"Logon: Owner, /o=E2010/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=userded in database mdb1 last mounted on MBX-C.contoso.com at 3/6/2012 2:59:30 PM, currently InTransitSameSite; Redirected: this server is not in a preferred site for the database, suggested new server: /o=E2010/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Configuration/cn=Servers/cn=cas-sec.contoso.com",RopHandler: Logon:

As in the mailbox move scenario, the WrongServer ROP forces the Outlook client to do a profile discovery and update the profile based on new information found within the directory. The profile gets updated and once the client is restarted, the client establishes its connections to the new RPC endpoint.

Conclusion

So there you have it – with SP2 RU3 (or later) you can ensure that mailboxes moved between AD sites will have their profile updated correctly. In addition, in the cross-site database *over scenario, you can control whether to allow the cross-site RPC connectivity or to force the Outlook client to use the RPC Client Access Server array that is in the same AD site as the activated and mounted database (the default behavior). I think it is appropriate to end with this:

Fantastic news, Ross. I always felt that Outlook was missing out from the Exchange 2010 HA benefits of a cross-site DAG because the manual step of updating DNS was required. Looking forward to putting this functionality through its paces.

@Nelson, I'm not sure what you are referring to specifically, but if you are referring to our datacenter activation steps which involve changing the primary datacenter's DNS records to point to the IP addresses in the secondary datacenter, that is still our recommendation. This ensures that users do not have to learn a new namespace during this scenario for each client type.

@David, Unfortunately, I will not be at TechEd 2012 this year.

@Simon, we try very hard to minimize the marketing hype on this blog. :)

Very very excited about this! I've had a few networking issues where databases would *over to the secondary data center and would knock my clients offline..this change will now allow me to correct the issue and restore service to the primary without taking clients offline.

I do have one question.. with cross forest Mailbox moves will ecWrongServer be issued after the move is completed to avoid resetting up outlook profiles and OST files? If so, would it be issued by the source server or the new destination server if autodiscover records get changed to the new server? I've always worked around the issue by deleting the registry value 001f662a and then doing a profile repair after autodiscover is changed.

Ideally the client would continue connecting to the old rpcclientaccessserver (since it obviously works) until the next "natural" restart of the client, at which point it would find the new server in the background and start using it. Too bad we're forcing a restart.

@Prescott – The issue is due to the behavior of Outlook not applying updated Autodiscover settings when it has an existing connection. That's why prior to SP2 RU3, clients would not update in a cross-site db *over event. What we did in SP2 RU3 essentially is to force the RPC Client Access service to reject the connection, which kicks Outlook into doing a profile discovery (in the case of OL2007 and later, via Autodiscover) and applying all the settings to try and get a connection re-established. And due to the way Outlook caches information about the mailbox session (see the article I pointed Jeff to) there is no way to not have the restart prompt.

Great read, thanks Ross. If a database *overs to the secondary datacenter, is there a major benefit in having AllowCrossSiteRPCClientAccess set to $False? For example, if the active database fails and the best copy selection process determines that the passive copy in the secondary datacenter is the best choice to activate, all my end-users would be interrupted with a prompt informing them to close and re-open Outlook. Once the failed database is back online, if I want to change the active database back to the primary datacenter the users would again be interrupted by a prompt. I could also see this happening during server patching. So I have 2 questions please:

1. Could I change the AllowCrossSiteRPCClientAccess value to $False after a datacenter failure occurs, and will the client then automatically redirect to the standby CAS array without manipulating DNS records?

2. From a WAN perspective, if a database *overs to the secondary datacenter, is there any savings in having the CAS servers traverse the WAN and communicate with the mailbox server, or is it negligible compared to having the client traverse the WAN and communicate with the CAS servers in the secondary datacenter?

@Chris – Great questions. So you are correct that whenever a cross-site database *over occurs (whether a controlled or uncontrolled scenario), the Outlook user will be interrupted and required to restart Outlook. If you perceive this as impactful, you can set AllowCrossSiteRPCClientAccess to $true and ignore the situation. I don't have any specific bandwidth data to give you (you could use Neil's Bandwidth Calculator to get an idea – blogs.technet.com/…/exchange-client-network-bandwidth-calculator-beta2.aspx); I will point out that Outlook RPC is more resilient to network conditions than CAS to MBX RPC.

As for datacenter activations, no I do not advise using AllowCrossSiteRPCClientAccess to control the Outlook clients. Continue to use our prescribed guidance which is to move the DNS records. This ensures *all* clients are not impacted by the datacenter activation (as they will all continue to use the same namespace and not require reconfiguration).

Nevertheless, you write "In the event that the old RPC endpoint becomes inaccessible, Outlook would update its settings…", and according to my tests, Outlook clients (2007 and 2010) never automatically updated their profiles in that case unless I forced a profile repair.

My test was about "dialtone portability scenario" in same AD site described in Technet.

@vrai_bunny – you mentioned that your tests were in the same AD site, so I'm assuming you were following the prescribed guidance and using a CAS array. In that scenario, there would be no profile update as the CAS array FQDN would still be the valid endpoint. If not, then more information is needed.

First off, thanks for pushing to get this feature in. Second, would this feature (automatic outlook profile update) work in the scenario where you had an existing multi-role server in place (with rpcclientaccessserver already stamped on the database and in users' profiles) and you then created a new CAS array and manually updated the rpcclientaccessserver value to point to the new CAS array name?

I'm in that scenario now and am getting ready to create a new CAS array (this was the first site with 2010 so things rolled out a bit haphazardly; the rest of my sites got the CAS array created immediately) but am dreading having to use a prf file to update each client in the site (about 400 users in this site) to make the switchover seamless.

We had an issue after deploying SP2 RU3 with Outlook Anywhere users. We are using load balancers to geographically redirect you to the closest CAS servers. We used owa.company.com as the proxy server name for OA users. After the rollup OA stopped working; some users couldn't send and receive and others were getting prompted to restart Outlook multiple times.

The solution was to go to the CAS servers per site and change the url for OA from owa.company.com to localcasarray.company.com and that fixed it…. Just in case someone else bumps into this problem.

@Todd – The profiles won't automatically update to the new RPC endpoint unless the original RPC endpoint becomes unavailable for some reason (server is offline, etc.). The reason is that the original RPC endpoint is still a valid endpoint in this scenario.

@Pedro – I'm not sure what issue you are experiencing there. You should open a support case. I will advise you to check out Brian's blogs I referenced in the article. We specifically do not recommend using the RPC CA namespace as the OA namespace, because it will force Outlook to attempt to connect over TCP first (which for Internet users will time out, and then kick in the OA).