I'm looking for help with proactive monitoring in BES.
Let me know if you can help us out.

We have had MULTIPLE instances where we get reports of users not receiving mail while all the others are fine.

The only way we know things are actually 'broken' is by looking in the MAGT logs for these individual users. We see the occasional "MAPIMailbox::OpenMessage - OpenEntry (0x8004010f) failed" and similar entries indicating that the MAPI connection to those users' mailboxes is failing, while other users in the same log file are fine.
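Since the MAGT log is currently the only place these failures show up, a small script that scans it and tallies failures per user can turn "look in the logs" into something schedulable. The sketch below is only an illustration under assumptions: it expects each failure line to contain the "Class::Method - Call (0x........) failed" pattern quoted above, and it assumes the user appears as a braced address such as `{jsmith@example.com}`. That user pattern (`USER_RE`) and the sample lines are hypothetical, so adjust them to match your actual MAGT log format.

```python
import re
from collections import Counter

# Hypothetical: assumes the MAGT line contains the user as "{user@domain}".
# Change this pattern to match how your logs actually identify the mailbox.
USER_RE = re.compile(r"\{([^{}]+@[^{}]+)\}")

# Matches the pattern quoted from the log, e.g.
# "MAPIMailbox::OpenMessage - OpenEntry (0x8004010f) failed"
MAPI_FAIL_RE = re.compile(r"(\w+::\w+) - (\w+) \((0x[0-9a-fA-F]{8})\) failed")


def tally_mapi_failures(lines):
    """Count MAPI failures per (user, HRESULT) across MAGT log lines."""
    counts = Counter()
    for line in lines:
        fail = MAPI_FAIL_RE.search(line)
        if not fail:
            continue  # not a MAPI failure line
        user = USER_RE.search(line)
        who = user.group(1) if user else "<unknown>"
        counts[(who, fail.group(3).lower())] += 1
    return counts


# Made-up sample lines in the assumed format.
sample = [
    "(01/15 10:00:01):{0x1a2c} {jsmith@example.com} MAPIMailbox::OpenMessage - OpenEntry (0x8004010f) failed",
    "(01/15 10:00:05):{0x1a2c} {jsmith@example.com} MAPIMailbox::OpenMessage - OpenEntry (0x8004010f) failed",
    "(01/15 10:00:07):{0x1a2c} {adoe@example.com} message delivered",
]
counts = tally_mapi_failures(sample)
for (user, hresult), n in counts.items():
    print(user, hresult, n)
```

Grouping by user is deliberate: the symptom here is a handful of users failing while the rest of the same messaging agent is healthy, which an agent-level status check cannot see.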

During troubleshooting, we found that the affected users are not on the same Exchange server, database, etc. They do seem to share the same messaging agent, but not all users in that MA are down.

Also, during this partial outage, the BES monitoring still shows 'green' for the messaging agent containing the disconnected users. So BES thinks nothing is wrong and no alerts are sent yet users are not getting messages.

We have done quite a bit of troubleshooting with RIM and MS and found a few things that we've had to tweak in our environment to allow for a large number of users. The latest was this one (which seems like something RIM should put in their install docs):
Microsoft KB - 949469
NSPI connections to a Windows 2008-based domain controller may cause MAPI client applications to fail with an error code: "MAPI_E_LOGON_FAILED"

We have done a LOT of troubleshooting and tweaking to track down and fix these issues, and things are running much better now (I will spare you the details unless you want to know).

BUT it still raises the question:

Why is there no monitoring for these types of issues? - OR is there??

So I'm asking anyone out there who has dealt with this same problem, or someone who knows more about BES than I do: is there something out there (BES or 3rd party) that will proactively monitor and alert on these dropped or missing MAPI connections?

It would certainly be nice if these types of errors could be monitored to provide alerts that point you directly to a root cause. The most likely reason there isn't BES monitoring for these types of messages is that MAPI errors are typically pretty generic. With respect to BES, these errors often expose an issue with a different component in your network (Exchange, GC, DC, etc.), so RIM will make a best effort to identify these problems but won't necessarily provide monitoring sets for them. In your case, the MAPI_E_LOGON_FAILED messages simply indicated that the BESAdmin account was refused a connection for one of several different reasons. (It sounds like the cause ended up being the NSPI bind limit in your case.) I've been in your shoes several times, spending hours researching an issue and wishing that the MAPI errors were more to the point. Often, they will send you off chasing a lead that ends up being a dead end.
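To see how generic these codes are, it can help to decode the HRESULT from the log into its MAPI name. The sketch below uses a few well-known values from the MAPI error definitions (mapicode.h); the table is deliberately small and not exhaustive, and anything it doesn't know is reported as "unrecognized". Splitting out the facility and code also shows that most MAPI errors share facility 4 (FACILITY_ITF), which is part of why the code alone rarely points at the failing component.

```python
# A few well-known MAPI HRESULTs (see mapicode.h); not an exhaustive list.
MAPI_ERRORS = {
    0x80004005: "MAPI_E_CALL_FAILED",
    0x8004010F: "MAPI_E_NOT_FOUND",
    0x80040111: "MAPI_E_LOGON_FAILED",
    0x80040115: "MAPI_E_NETWORK_ERROR",
    0x80040200: "MAPI_E_END_OF_SESSION",
    0x80070005: "MAPI_E_NO_ACCESS",
}


def describe(hresult: str) -> str:
    """Turn a hex HRESULT string such as '0x8004010f' into a readable label."""
    value = int(hresult, 16)
    name = MAPI_ERRORS.get(value, "unrecognized")
    facility = (value >> 16) & 0x1FFF  # facility bits of the HRESULT
    code = value & 0xFFFF              # facility-specific error code
    return f"{hresult.lower()} {name} (facility {facility}, code 0x{code:x})"


print(describe("0x8004010f"))
print(describe("0x80040111"))
```

Note that MAPI_E_NOT_FOUND from OpenEntry and MAPI_E_LOGON_FAILED can both stem from a refused or dropped connection to Exchange or a GC, so the decoded name is a starting point for correlation rather than a diagnosis.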

You know your environment best, so your best bet is to get a good grasp on when these messages appear and then determine what the correlating root cause is. Then you can use the monitoring software in your environment and set up some SNMP traps so you can get on these problems proactively. Sounds like you should be in the clear for now, though, especially with that NSPI limit increased on your GCs.
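Once you know which messages correlate with a real outage, the alerting rule itself can be simple: raise an alert when one user logs several MAPI failures within a short window, since that is the "one user silently broken" pattern described above. This is a minimal sketch, assuming failures are fed in chronological order; the threshold and window are arbitrary example values, and the point where `record()` returns True is where you would hand off to your monitoring tool (for example to send an SNMP trap). That hand-off is not shown because it depends entirely on your monitoring software.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class FailureWindow:
    """Flag a user once they log `threshold` MAPI failures within `window`."""

    def __init__(self, threshold=3, window=timedelta(minutes=15)):
        self.threshold = threshold
        self.window = window
        self.events = defaultdict(deque)  # user -> timestamps of recent failures

    def record(self, user, when):
        """Record one failure; return True if the user is now over threshold.

        Assumes calls arrive in chronological order for each user.
        """
        recent = self.events[user]
        recent.append(when)
        # Drop failures that have aged out of the window.
        while recent and when - recent[0] > self.window:
            recent.popleft()
        return len(recent) >= self.threshold


t0 = datetime(2009, 1, 15, 10, 0)
watch = FailureWindow(threshold=3, window=timedelta(minutes=15))
alerts = [watch.record("jsmith@example.com", t0 + timedelta(minutes=m)) for m in (0, 5, 10, 30)]
print(alerts)
```

A per-user window avoids alerting on the occasional one-off MAPI error while still catching a mailbox that keeps failing, which a messaging-agent-level 'green' status would hide.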