This is so pointless, and I cannot believe MS is telling customers to do this. I discovered that you can get this working properly without this nonsense TTL tweak and while registering just a single IP. Let me explain:
1. First of all, I obviously don’t like these custom setups. In the long run my folks always need to be aware that they can’t create a regular AG and that they need parameters. Long term, this will cause issues with new folks, etc.
2. Everything works properly if your application client is on the same subnet as one of the SQL multi-subnet nodes; it doesn’t matter which one. So I am wondering how come MS has not figured out why, as soon as I move my application client to another subnet outside of the nodes’ subnets, it won’t work anymore.
3. The challenge with a lower TTL of, let’s say, 30 seconds. Good luck, buddy. Say your application client in Subnet A talks to DNS server A. Now your SQL Node B in Subnet B fails over and updates DNS server B in site B. No way in hell can your Client A pick up the new IP of that record, even with a TTL of 30 seconds. WHYYY? Because DNS server B first needs to replicate through AD replication, which most likely takes 15 minutes or 5 minutes or whatever. So my point is: a lower TTL for failover is just a story…
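
To put rough numbers on point 3, here is a minimal Python sketch. It assumes a simplified model in which the worst case is additive: the client's DNS server only learns the new IP after one full AD replication interval, and the client may have cached the old record just before that, so it keeps it for one more TTL. The function name is made up for illustration, and real environments (change notification, resolver behavior) will differ.

```python
def worst_case_stale_seconds(ttl_seconds, replication_seconds):
    """Upper bound on how long a client may keep resolving the old IP.

    Simplified assumption: replication delay and TTL simply add up.
    """
    return replication_seconds + ttl_seconds


# TTL of 30 seconds with the two replication intervals mentioned above.
for replication_minutes in (5, 15):
    stale = worst_case_stale_seconds(30, replication_minutes * 60)
    print(f"replication {replication_minutes} min, TTL 30 s -> up to {stale} s stale")
# replication 5 min, TTL 30 s -> up to 330 s stale
# replication 15 min, TTL 30 s -> up to 930 s stale
```

Under this model the replication interval dominates, which is the point being made: shaving the TTL changes little while the record still has to replicate between sites.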

Hi George, I don’t agree about it being pointless. I think it’s a good feature to have, and I believe MS made improvements on it with the Windows Server 2016 site-aware failover clusters.
Regarding your #1: don’t see it as a custom setup. For you this may be custom, but for other environments these parameters are requirements for their applications to be able to leverage the functionality across sites, and they let them adapt the cluster config to their specific business needs or failover policies. New folks should be trained on the specific env configs when onboarding. Assume nothing is set up as default.
Regarding your #2: the application should be pointing to the listener (LG). Try MultiSubnetFailover=True in your connection string. Moreover, your network should allow traffic to traverse the required subnets if you want to fail over different layers of your stack independently. However, I think you’ll experience latency or other performance issues, depending on your env, due to network reach. I recommend failing over the stack together, and that doesn’t have to be done automatically or at the same time. A geo-failover typically means a worst-case scenario; the typical case is a local failover from a primary to an on-site secondary on the same subnet, for which a partial failover of the stack would suffice. In any case, MS did put out a newer version of the .NET Framework and SqlClient to enable MultiSubnetFailover=True, but I think the issue you’re describing is related to the design or network of your specific env.
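As I understand it, MultiSubnetFailover=True makes the client try every IP registered for the listener in parallel instead of one after another, so it doesn't sit waiting on the offline IP. Here is a rough Python sketch of that idea only; it is not the actual SqlClient code, and the function names and the example IPs are made up for illustration.

```python
import concurrent.futures
import socket


def try_connect(ip, port, timeout=1.0):
    """Return ip if a TCP connection succeeds within the timeout, else None."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return ip
    except OSError:
        return None


def first_reachable(ips, port=1433, connect=try_connect):
    """Attempt all listener IPs in parallel; return the first that answers.

    `connect` is injectable so the logic can be exercised without a network.
    Leaving the `with` block still waits for the remaining attempts to end.
    """
    if not ips:
        return None
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(ips)) as pool:
        futures = [pool.submit(connect, ip, port) for ip in ips]
        for future in concurrent.futures.as_completed(futures):
            result = future.result()
            if result is not None:
                return result
    return None


if __name__ == "__main__":
    # Hypothetical listener IPs, one per subnet; only one is online at a time.
    print(first_reachable(["10.0.1.50", "10.0.2.50"]))
```

This only pays off when the listener has an IP registered for each subnet, since otherwise there is just one address to try.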
Regarding your #3: why would you ever set the TTL so low? lol. If that’s the case, don’t design it for multi-subnet failover. If your RPO/RTO window is that small, then use sync replicas on the same subnet so the listener (LG) doesn’t need to change IPs during a failover and nothing gets updated in DNS. Anyway, Client A can use the setting MultiSubnetFailover=True in its connection string, and/or you can lower the AD replication interval down to 15 minutes like you mentioned (try the cmds repadmin /showrepl or repadmin /sync). But think about it… what if you simply need to do a svr reboot that takes less than 5 minutes? Do you want to fail over the whole stack across sites? I don’t think so.
The point of all this is that it’s proposed more for a DR scenario or a 2nd- or 3rd-tier failover, in which case the organization might want some manual intervention or a slight delay to avoid automated attempts that send resources failing back and forth across sites.
Thx,
Hiram