Re: Yarn MR overloads Active Directory domain controller CPU

Have you or your AD admins also tried profiling which specific AD operations are pouring in? Are they group lookups, or actual authentication requests? The latter would normally be unexpected, since the use of tokens means no re-authentication should be needed.

Group lookups are indeed done for every HDFS operation when permissions are in use. However, the groups are also cached internally by HDFS for 5 minutes by default (configurable), and often also by a NameNode-local NSCD or equivalent service. These caches help reduce the backend load, but the lookups are still needed and the cache timeouts are finite, so it wouldn't be too odd to see a lot of group-related requests being fired at whatever user directory backend is in use.
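
For reference, the HDFS-side cache timeout mentioned above is controlled by hadoop.security.groups.cache.secs in core-site.xml (default 300 seconds). A minimal sketch of raising it follows; the value 1800 is only an illustrative assumption, not a recommendation, and the trade-off is that group membership changes made in AD take correspondingly longer to be picked up:

```xml
<!-- core-site.xml on the NameNode: how long resolved user-to-group
     mappings are cached before the backend is queried again.
     The default is 300 seconds; 1800 is an illustrative value only. -->
<property>
  <name>hadoop.security.groups.cache.secs</name>
  <value>1800</value>
</property>
```

Whether this helps depends on what the profiling shows: it only reduces group lookups, not authentication requests.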