We have a 100Mbit company network with approx. 50 workstations (mainly XP-W8, mainly DHCP), several servers (miscellaneous), some other devices (VOIP adapters etc.) and two Zyxel ES-1552 switches. There is a domain (AD/Windows Server 2012) running.
There are really annoying freezes in network communication, lasting from several seconds to more than a minute, affecting many but not all(!) workstations (regardless of DHCP or static IP). These freezes usually occur in the morning, when network traffic is heavier.

For example, there are three XP workstations (A, B, C) in the same location. When the freeze occurs, A and B completely stop networking (no internet/intranet, Windows Explorer does not respond because of mapped network drives, A cannot ping B or anything else (timeout)), and it can last for more than a minute. At the same time, workstation C does not have any problems. Afterwards, communication is restored and all three workstations communicate without any problem.

I'm more of a programmer than a professional network administrator. I tried resetting the switches and viewing the logs, and I even installed Microsoft Network Monitor, but nothing helped. There were several major changes in our network infrastructure in the last three months (a new server, new workstations, the domain), but I cannot connect the problems with a particular event.

What are the suggested steps to diagnose this problem (without taking the network down)? Many thanks.

Swap the switches for something less 'consumer' oriented, I've got better switches than that at home.
– Chopper3, Mar 27 '13 at 8:12


100Mbit and XP? That's some old gear you have there.
– Ladadadada, Mar 27 '13 at 8:20

It's been years since I've seen broadcast storms being a problem..
– NickW, Mar 27 '13 at 9:59

Thanks for your comments. I've just discovered an option called "Storm control" which is currently disabled on both switches. Before purchasing new switches (our company tries to cut expenses) I'd like to fiddle with this setting.
– user681768917, Mar 28 '13 at 7:59

3 Answers

Changing the switches for better ones (as many suggested) might help. But the problem disappeared without changing that piece of hardware. Three different changes have been made. One of them, or a combination of them, has helped.

The default Windows Server Backup, which was still running when people were already at work, has been replaced by 7-Zip in multi-threaded mode, which is significantly faster and finishes before people come to work. This was the most probable cause of the freezes, as the server was sending an average of 300Mbit/s through a switch's gigabit link to a NAS. Combined with people reading/saving large files, this could be a problem.

A damaged network cable (shielded twisted pair, STP; not in use, but connected to one of the switches) has been removed. (It could have caused a sort of short circuit.)

The Hyper-V Virtual Switch on the server has been switched off. Even if the data didn't go through it (I hope so - the other of the two physical Broadcoms was used), the data transfer rate has improved. The 7-Zip backup has even been 15% faster since.

Wireshark showed that a workstation with an onboard Attansic network adapter (somehow damaged) was responsible for 95% of all the network traffic(!!!). Since replacing it with another network adapter, no broadcast storms have been observed.
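For anyone wanting to repeat that kind of check without clicking through a capture by hand, here is a minimal sketch that totals bytes per source MAC and counts broadcast frames. It assumes the capture was saved in the classic pcap format (Wireshark saves pcapng by default, so "Save As" pcap is needed) and that the link type is Ethernet. The function name `top_talkers` is made up for this illustration.

```python
import struct
from collections import Counter

BROADCAST = b"\xff" * 6  # Ethernet broadcast destination address


def top_talkers(pcap_bytes):
    """Return (bytes_per_source_mac, broadcast_frame_count) for a classic pcap capture."""
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written on a little-endian machine
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written on a big-endian machine
    else:
        raise ValueError("not a classic (microsecond) pcap file")

    # The link type sits in the last 4 bytes of the 24-byte global header.
    linktype = struct.unpack_from(endian + "I", pcap_bytes, 20)[0]
    if linktype != 1:
        raise ValueError("only Ethernet captures (link type 1) are handled")

    per_mac = Counter()
    broadcasts = 0
    offset = 24  # skip the global header
    while offset + 16 <= len(pcap_bytes):
        # Per-record header: ts_sec, ts_usec, captured length, original length.
        _sec, _usec, incl_len, orig_len = struct.unpack_from(
            endian + "IIII", pcap_bytes, offset
        )
        offset += 16
        frame = pcap_bytes[offset:offset + incl_len]
        offset += incl_len
        if len(frame) < 14:
            continue  # too short to hold an Ethernet header
        dst, src = frame[0:6], frame[6:12]
        per_mac[src.hex(":")] += orig_len  # bytes.hex(sep) needs Python 3.8+
        if dst == BROADCAST:
            broadcasts += 1
    return per_mac, broadcasts
```

Reading a saved capture into bytes and calling `per_mac.most_common(5)` on the result lists the heaviest senders. A single MAC dominating the total, as in the case above, is the thing to look for. Note that a capture taken on an ordinary switch port only sees broadcast, multicast and that port's own traffic unless port mirroring is available.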

This sounds suspiciously like Spanning Tree converging to me. I'm not sure what those switches are capable of, but the standards-based, pre-Rapid Spanning Tree implementation would take around 45 seconds to move a port from newly-up to forwarding. It could be that your switches are overwhelmed and are having to do something with STP on a subset of ports? Not real familiar with ZyXEL products...
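One way to test this hypothesis is to log reachability from an affected workstation and compare the outage lengths with classic STP timing. With the default 802.1D timers, a port spends 15 s listening and 15 s learning (30 s), and a topology change that first has to wait out the 20 s max age can take up to about 50 s. The sketch below is only the analysis half: it assumes you already have time-ordered `(timestamp, reachable)` samples, for example from a ping loop, and both function names are made up for this illustration.

```python
def outage_intervals(samples, min_duration=5.0):
    """Find outages in time-ordered (timestamp_seconds, reachable) samples.

    Returns a list of (start, end) pairs, where start is the first failed
    probe and end is the first successful probe after it. Outages shorter
    than min_duration are ignored, and an outage still open when the
    samples run out is not reported.
    """
    intervals = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts  # first failed probe of a new outage
        elif ok and start is not None:
            if ts - start >= min_duration:
                intervals.append((start, ts))
            start = None
    return intervals


def looks_like_classic_stp(start, end, low=30.0, high=50.0):
    """True if the outage length falls in the 30-50 s range of default 802.1D timers."""
    return low <= end - start <= high
```

Outages that cluster in the 30 to 50 second range would be consistent with STP reconvergence. Freezes of a few seconds, or of widely varying length, point elsewhere. A match is only a hint, not proof.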

Do you have any logs from the management GUI? Is the switch set up to do Spanning Tree, and is there an option for Rapid STP or MST?

As someone else mentioned, have you considered upgrading these devices to something more enterprise-y? Something like a 48-port Cisco 2960 would be a good fit for a small business office, and they are quite reasonable on eBay. Config would range from dead simple to complex based on what features you're wanting.

Thanks. The [manual][1] doesn't mention STP or MST at all. However, I've found an interesting item called Storm control, which is currently disabled on both switches. Do you think it's a good idea to enable it? And which option (Broadcast only, Broadcast and multicast...)? [1]: retrevo.com/search/v2/jsp/…
– user681768917, Mar 28 '13 at 7:13

10Mb Full Duplex? That doesn't look right. Your switch and the workstation should both be doing autonegotiation. Is your switchport side hard-set to 10Mbps/Full Duplex? How long is the cable run from PC to switch?
– Keller G, Mar 28 '13 at 23:09

Yes. I randomly selected two log lines and they showed 10Mbit. There is usually 100Mbit in those log lines, rarely 10Mbit. I don't know what the problem is. Today, port 45 is at 100Mbit.
– user681768917, Mar 29 '13 at 8:55

I've enabled the "Storm Control". Today, no freeze registered yet but there are some workstations turned off. We'll see the next week if it has helped. Thanks anyway!
– user681768917, Mar 29 '13 at 9:02

Check the event logs. There may be clues as to what is going on. These could be on your domain controller or on a workstation.

Test the environment in a maintenance window.

The last thing you want to do is make unintended changes and have your whole network go down. Also check whether these things happen when there are only a few PCs running; that may be a clue for further investigation.

Isolate the problem on the network; use a traffic monitoring tool to help find where it is happening.

Possible bufferbloat?

Usually bufferbloat occurs at the egress router because of a bottleneck; however, poorly configured switches or routers may also show signs of it.

It is possible that your network is being ARP-poisoned, either by a poorly configured device or (less likely) by a person. A device may poison your network until the correct updates have been sent out again.
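A rough way to look for this is to collect IP-to-MAC pairs over time (from ARP tables on several machines, or from a capture) and flag any IP address that shows up with more than one MAC. The sketch below assumes the pairs have already been gathered as `(ip, mac)` tuples; how you collect them depends on the OS and tools at hand. The function name `find_arp_conflicts` is made up for this illustration.

```python
from collections import defaultdict


def find_arp_conflicts(observations):
    """Return {ip: sorted list of MACs} for every IP seen with more than one MAC.

    observations is an iterable of (ip, mac) pairs. MACs are normalised to
    lower case with ':' separators so that Windows-style '00-11-22-...'
    and Unix-style '00:11:22:...' entries compare equal.
    """
    seen = defaultdict(set)
    for ip, mac in observations:
        seen[ip].add(mac.lower().replace("-", ":"))
    return {ip: sorted(macs) for ip, macs in seen.items() if len(macs) > 1}
```

A hit is not automatically poisoning: a replaced network adapter, a DHCP address handed to a different machine, or a failover pair sharing an IP will produce the same pattern. It is the gateway or server IP flipping between MACs during a freeze that deserves a closer look.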

Thanks. Checking event viewers was actually where I started. It revealed some group policy problems etc. which I fixed but nothing on these freezes. As I mentioned, I've installed "Microsoft Network Monitor" which seems to me very capable, but I didn't discover anything but something like "unreachable" just beyond my network adapter. (Well, I don't understand this utility in depth, I don't know where and what to find...) What do you mean by "Test the environment in a maintenance window"? Thanks.
– user681768917, Mar 28 '13 at 7:52