SQL Wait ASYNC_NETWORK_IO on NAV Production

While we are looking into NAV SQL Performance issue, we found that one of the top wait is "ASYNC_NETWORK_IO".As per my understading the NAV App server is not consuming the data generated by DB quickly.

Our Prod Setup is:

-2 Terminal services for NAV client, and they are load balanced.
-2 App Servers are load balanced
-1 DB server in AG group with Sync

Can someone help me how to trouble shoot this issue with NAV client? or some one come across this type of issue before.When i asked my Infrastructure team, they told me that every thing is fine, and nothing they suspect from NAV App server or Terminal server.

Is there any setting with in NAV defines Client memory size etc.? or is it down to NAV coding etc.?

Any help would be appreciate.

Regards
Sarma

0

Answers

Since you have an infrastructure team I guess you have quite decent and expensive hardware around, including all network gear. Yet it still may be worth to check a few things. If you google around for "TCP/IP offloading SQL Server problem" you will find a good few articles about how to not to setup SQL server's network cards to avoid SQL Server performance problems.

Massive ASYNC_NETWORK_IO does not always mean problems. The real meaning of it is that "the NAV App server is not consuming the data generated by DB quickly enough.". That's it. It does not have to consume the data slowly, it just means that the SQL Server is quicker delivering it. Which is good in fact, as long as users are not complaining about the slow NAV.

If they are, then the sad truth is that it is almost always down to NAV coding - investing time in NAV code and table structure/indexing optimisations usually give the best pay-offs.

Memory-wise as far as I know you can only play with the cache size settings on the NST server, I am not aware of any RTC client side settings. They would not make much sense or impact anyway as the data flows between the NST and SQL server, the RTC clients are involved in data processing.

While we are looking into NAV SQL Performance issue, we found that one of the top wait is "ASYNC_NETWORK_IO".As per my understading the NAV App server is not consuming the data generated by DB quickly.
...

Had the same problem in NAV 2013.

How many CPU cores do you have on your SQL server? And what is you MAXDOP? if you have MAXDOP set to 4 or 8 then setting it to 1 may give you a quick fix whilst you sort out the root cause. But this will most likely slow down the system, but at least it wont lock out new users logins.

Ultimately take a look at your worker threads, as this is the most likely cause. But don't set them too high if you don't have enough CPU cores.

...
Massive ASYNC_NETWORK_IO does not always mean problems. The real meaning of it is that "the NAV App server is not consuming the data generated by DB quickly enough.". That's it. It does not have to consume the data slowly, it just means that the SQL Server is quicker delivering it. Which is good in fact, as long as users are not complaining about the slow NAV. ...

Noit always a problem but if it reaches the limit, then the server will refuse new connections and they will have to restart the service tier, which is never a good thing.

Ideally you want your Middle tier matched to the work load and the SQL server so that everything runs smoothy, and it is best to address these issues before the whole system grinds to a halt.

The DB server has 1 CPU and 16 cores.
The SQL Memory 100GB
The MAX_WORKING_THREAD is default one i.e. 0(Internally SQL calculating 700+ threads)
The MAXDOP value 1
The Always on group with FULL Sync mode.

We are getting this issue intermittent and one of the NAV session consumes 400+ threads(even after MAXDOP to 1).We are unable to identify root cause.Can some one helps me that would be great.

The DB server has 1 CPU and 16 cores.
The SQL Memory 100GB
The MAX_WORKING_THREAD is default one i.e. 0(Internally SQL calculating 700+ threads)
The MAXDOP value 1
The Always on group with FULL Sync mode.

We are getting this issue intermittent and one of the NAV session consumes 400+ threads(even after MAXDOP to 1).We are unable to identify root cause.Can some one helps me that would be great.

Regards

I would slowly increase MAXDOP, try 2, 4 then 8 stop at 8. At the same time you would increase the worker threads, Take regular recordings under similar conditions and get it in balance.

FYI 400+ threads sounds pretty normal for using one core.

This of course all depends on the number of users and the complexity of the queries. For example a page with a lot of cues and flow fields (Especially if the flow fields don't have VSIFTS) can use a huge number of worker threads. I have seen a NAV server shut down with MAXDOP = 4 and 3,500 worker threads. Obviously the proper way to fix this is to look at the queries and fix the code, but as if it gets critical, then increase the worker threads.

Just to be clear, increasing MAXDOP and Worker Threads is not the solution to the problem, it's the band aid fix to get your system going again to give you time to analysis the code and fix the code problems.

Issue is:- some times NAV server consuming all threads in SQL server, and DB is becoming unresponsive.
I have to log into the DB as DAC and kill couple of sessions to make DB responsive again.The
observation was one NAV session consumes 400+ worker threads.

Just want to double check with you all, if i set MAXDOP to 1, how come nav is using multiple worker threads with in single DB session. Is NAV overwriting MAXDOP or something which i missed from SQL server concept?

OK so it's not huge, and with 100gig of RAM you should be fine, so the thing to focus on is what is using up all the workers. Look at the long running queries, but really 3 worker threads per connection it not a lot. So either way you need to increase them.

Another biggest problem in current environment is deadlocks in NAV DB. On average, we have around 120+ deadlocks per day.We know the queries that are causing these dead locks,.

That is a lot of deadlocks. Which really should not happen on a regular basis, they fixed a lot of code in 2013 and 2015 but you will still get dead locks. But these are code issues. On older versions VSIFT used to cause deadlocks, but I don't see a lot of that from 2013 on. I would more likely say it is due to custom code.

Since you have such complex queries using so many workers and causing dead locks, I would suggest that you need to look at the code and find where it is dead locking. Are you also getting blocks and time outs?