This will probably be vague since I don't have much raw data to work from, but I'll do my best to explain.

We just purchased an EqualLogic SAN and set up a storage network. We're getting I/O speeds upwards of 500 MB/s (mostly around 400 MB/s sustained). Our primary goal was to build a highly available SQL Server failover cluster around this shared storage.

We created the failover cluster, installed the instance, and added our two servers to the cluster. Then we detached our databases from the local instance, copied the DB files (including logs) over to the cluster storage, and attached them to the failover instance. After reconfiguring SQL Agent and similar jobs for log clearing and optimizations, everything was working perfectly. However, some parts of our site, mainly the ones with intense queries, are taking forever to load. We're seeing load times of 20-30 seconds, up from about 1-2 seconds before. The SQL instance is on a brand-new server with 32 logical cores (4 CPUs, 8 cores) and 96 GB of RAM. The IIS site is running on our old server, which is still decent with 96 GB of RAM but lower processing power.

All configuration values are exactly the same; we went through them one by one, comparing the local instance to the failover instance. The storage network is doing fantastic, handling the read/write load and balancing across the multiple uplinks we have set up. We simply can't put our finger on it. Less intense queries run with ease and even show a performance increase, but the more complex ones do terribly on the cluster instance, on either server.

If anyone has suggestions or ways I can diagnose this further, it would be extremely appreciated. We've been staring at Activity Monitor and seeing a few queries consistently pegging the wait-time charts.

MAXDOP is set to 0, as it is on the other server. We're using Enterprise Edition, so it'll just use both processors, right?

Also, do I need to run the update-stats command manually, or is that done automatically? I read in the documentation that statistics are updated automatically every so often. This configuration has been running for almost two weeks now.

Yes, 0 means it'll use all available processors. We set ours to 1 or 2, though. While we have 2 CPU sockets, they're quad-core and hyper-threaded (on most of our servers, at least), so the OS sees 16 CPUs. We don't want our OLTP queries spanning 16 CPUs! Do you see just 2 CPUs from the OS? Or were you referring to sockets only?

Auto-update stats is enabled by default; however, it sometimes doesn't run often enough. We update stats nightly, and on some systems more than once per day. There isn't a one-size-fits-all answer for what's needed to maintain the databases: you start with a standard practice and adjust as needed.
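A nightly stats job like the one described can be as simple as the sketch below. This is a hedged example, not our exact job: `MyDatabase` and `dbo.Orders` are placeholder names, and the sampling choices should be tuned to your own system.

```sql
-- Sketch of a nightly statistics-maintenance job.
-- "MyDatabase" and "dbo.Orders" are placeholders, not names from this thread.
USE MyDatabase;
GO

-- Update out-of-date statistics on all tables, using default sampling.
EXEC sp_updatestats;
GO

-- For a known-hot table, a full scan gives the optimizer the best numbers
-- (at the cost of a longer-running update):
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;
```

`sp_updatestats` only touches statistics it considers stale, so it's cheap enough to schedule nightly; targeted `WITH FULLSCAN` updates are better reserved for the specific tables behind the slow queries.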

2 physical CPUs, 8 cores apiece, and hyper-threaded. So 32 logical processors when all is said and done.

We might just try manually updating stats. Unfortunately, I can't run it right now since we're in a heavy-traffic period. We were hoping to fix the issue before this period but couldn't get to the bottom of it.

Should I limit MAXDOP, since you said it refers to logical processors rather than physical sockets? Or do you think updating stats would be my best bet?

Honestly, I would do both, but it really depends on your system. 0 is the default for MAXDOP, and it's generally not recommended these days given how high core counts have gotten. If your workload is OLTP, then 4 or under is typically what's recommended. We use 1 or 2, but that's based on testing.
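Capping MAXDOP is a two-line server configuration change. A minimal sketch, assuming a value of 4 as the starting point discussed above (test and adjust for your own workload; the change takes effect immediately, no restart needed):

```sql
-- Expose advanced options so 'max degree of parallelism' is settable.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Cap parallelism at 4 processors per query. 4 is an example value
-- for an OLTP workload, not a universal recommendation.
EXEC sp_configure 'max degree of parallelism', 4;
RECONFIGURE;
```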

The next step is to analyze the intense queries: what are they waiting on, what does the execution plan show, and so on.
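As a starting point for that wait analysis, the instance-wide wait stats DMV shows what the server as a whole has been waiting on since the last restart. A minimal sketch (the excluded wait types are a small sample of common benign waits, not an exhaustive list):

```sql
-- Top waits since the last restart, skipping a few well-known idle/benign waits.
SELECT TOP (10)
    wait_type,
    wait_time_ms / 1000.0 AS wait_time_s,
    waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (
    'SLEEP_TASK', 'LAZYWRITER_SLEEP', 'BROKER_TO_FLUSH',
    'REQUEST_FOR_DEADLOCK_SEARCH', 'XE_TIMER_EVENT',
    'SQLTRACE_INCREMENTAL_FLUSH_SLEEP')
ORDER BY wait_time_ms DESC;
```

If the top waits are I/O-related (e.g. `PAGEIOLATCH_*`) that points back at the new storage path; if they're `CXPACKET`-heavy, that supports capping MAXDOP; per-query execution plans then narrow it to specific statements.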