We've set up a coworking space for 200-500 users (2 locations so far, expanding to 1-2 more in the next few months). We are using pfSense routers (multi-WAN setup) coupled with Ubiquiti ToughSwitches and UniFi access points. We occasionally run into a problem where the internet slows down and we're unsure why. We suspect our setup may be underpowered, our equipment choices may be wrong, and/or we need better monitoring in place. We're prepared to do it all as long as we're smart and frugal in our choices and not trying to kill an ant with a tank :)

Any data points from your own setups that we could leverage for such a situation? Or if you would like to engage with us professionally, we would be more than happy (judging by our equipment choices, you'll recognize we're not the most loaded!). We're not super technical, but when things are properly explained we can implement what we're instructed to do. I say this because we're in India and have struggled to find talent that knows anything other than Cisco!

If remote engagement doesn't scare you and the situation we have described is up your alley (pfSense + Ubiquiti + 200-500 users), we would love to hear your free/paid advice.

Yep, gonna agree with Nick42 here. Have you seen an increase in bandwidth use during these times, or have you set up any web filtering/protocol blocking? It could be something as simple as someone downloading torrents on your network.

Have you run Wireshark or any similar program to identify whether the issue is (high) traffic related or not? And if so, where that traffic is coming from? I would verify this first before I started messing around with different hardware or configurations. For all we know, it could be a malware infection with a bot or something.

1) Not that we can see. It seems to be business as usual. One thing that does happen occasionally, but not always, is that my pings to the access point go out of whack. The typical sub-10ms jumps into the triple digits. Again, it doesn't always happen, which is what confuses us.

2) pfSense is running on a Core 2 Duo machine with 4GB of RAM (it barely gets to 2-3% CPU usage and less than 10% memory). We also reboot each evening just to make sure this stuff stays low.

3) Services - nothing heavy, really just a captive portal. We have considered Squid, Snort, ntop, etc., but have removed them all for now to ensure processor and memory load are NOT the cause of the slowdowns.
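
For the ping spikes mentioned in point 1, one option is to leave a ping running against the access point, save the output, and then pull out the samples that jump into triple digits so they can be lined up against the times users complain. The following is only a rough Python sketch: it assumes Linux/BSD-style ping output with a `time=<n> ms` field, and the sample address, the 100 ms threshold and the function names are made up for illustration.

```python
import re

# Matches the "time=12.3 ms" field of Linux/BSD-style ping output (assumed format).
RTT_PATTERN = re.compile(r"time[=<]([\d.]+)\s*ms")

def parse_rtts(ping_output):
    """Return the round-trip times (in ms) found in saved ping output."""
    return [float(m.group(1)) for m in RTT_PATTERN.finditer(ping_output)]

def find_spikes(rtts, threshold_ms=100.0):
    """Return (sample_index, rtt) pairs for samples at or above the threshold."""
    return [(i, rtt) for i, rtt in enumerate(rtts) if rtt >= threshold_ms]

# Hypothetical captured output; 192.168.1.20 stands in for an access point.
sample = """\
64 bytes from 192.168.1.20: icmp_seq=1 ttl=64 time=3.21 ms
64 bytes from 192.168.1.20: icmp_seq=2 ttl=64 time=4.87 ms
64 bytes from 192.168.1.20: icmp_seq=3 ttl=64 time=212 ms
64 bytes from 192.168.1.20: icmp_seq=4 ttl=64 time=5.02 ms
"""

rtts = parse_rtts(sample)
spikes = find_spikes(rtts)
print(spikes)
```

Running the same check against the pfSense box and against a wired host at the same time would help show whether the spikes are on the wireless side or further upstream.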

Yes, we would like to set up monitoring. Any suggestions on which parameters to monitor on which devices? We're still in the really confused place of not knowing whether it is backhaul trouble, routing problems, wiring, or access point configuration. We'll start to play with these two systems for monitoring. Thanks for the suggestion.

zwrightTM - No, it's not bandwidth. We actively monitor bandwidth usage, and we walk over to any user who is at multiple Mbps for many minutes and ask them to turn off/check their system. An odd method, I'm sure, but I can say with conviction it's not a choking of the backhaul. We have 20 Mbps and we are routinely seeing usage of 10-15 Mbps at peak.

TechWorx - We have run Wireshark. I didn't know how to read most of the data it gave me, but I was looking for the red-colored entries and they seemed to be few and far between. I'm happy to reinstall and use it. Any quick tips on what I need to be looking for? We have considered the malware possibility, but didn't know exactly how to detect it. It would be super if you have ideas on malware prevention and cure in a network where we don't have device-level control.

We're using the basic UniFi APs, 2.4GHz (with a couple of LRs at one of our sites). There are no Pros and no ACs in our deployment. We just upgraded to the new 3 series of firmware, as that seems to be the first stable release in the series (I believe the exact version is 3.2.1). We do have the access points specifically set up on low power, fixed at certain channels (1, 6, 11) and HT20, which gives us the least amount of interference. The users range from 10-30 on a given access point. Internally, we have set 30 as our max and would deploy an additional access point if we hit that. The user count doesn't seem to be correlated with slowdowns: we went up to 40-45 per access point recently at our second location before we added 3 new access points, and now we're seeing patterns closer to 15-20 users per access point in both locations.
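
As a rough planning aid for the per-AP limit described above, the minimum number of access points for a given headcount is just a ceiling division. This Python sketch assumes one connected device per user, everyone online at once, and users spread evenly across APs, none of which will hold exactly in practice; the function name is made up for illustration.

```python
import math

def min_access_points(users, max_clients_per_ap=30):
    """Smallest number of APs that keeps every AP at or below the client cap."""
    if users < 0 or max_clients_per_ap <= 0:
        raise ValueError("users must be >= 0 and max_clients_per_ap must be > 0")
    return math.ceil(users / max_clients_per_ap)

# The 200-500 user range from the thread, at the 30-client cap
# and at a more relaxed 20 clients per AP.
for users in (200, 500):
    for cap in (30, 20):
        print(f"{users} users at {cap} per AP: {min_access_points(users, cap)} APs")
```

Coverage and walls usually push the real number higher than this, so treat it as a floor rather than a target.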

Given what you have laid out, I would first look at the wireless environment and APs.

If memory serves, 30 concurrent connections is the max on the basic UniFi APs. So that may be a good chunk of the problem.

So you are on the basic channels, 1-6-11. What does the RF space look like? Are you being stepped on by, or competing with, other networks? A good scan at various points in the area would help ID issues. Wi-Spy devices are not expensive, but if you have no budget for one, something like a wifi scanner on a mobile/tablet works in a pinch.

If 1-6-11 are all saturated with wifi in the area, try using (or adding more APs with) channels 4 and 8 as well. Sure, there is some overlap, but the center frequency will be far enough off to help out. Using said extra channels has really helped in areas where 1 and 6 were really packed.
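
To put numbers on the overlap mentioned above: 2.4 GHz channel centers sit 5 MHz apart, at 2407 + 5 x channel MHz for channels 1-13. The Python sketch below treats each channel as an idealised 20 MHz-wide block (matching HT20), which is a simplification since real transmit masks bleed further than that. The function names are made up for illustration.

```python
def center_mhz(channel):
    """Center frequency in MHz of a 2.4 GHz Wi-Fi channel (1-13)."""
    if not 1 <= channel <= 13:
        raise ValueError("channel must be between 1 and 13")
    return 2407 + 5 * channel

def overlap_mhz(ch_a, ch_b, width_mhz=20):
    """Spectrum shared by two channels, assuming idealised width_mhz-wide blocks."""
    separation = abs(center_mhz(ch_a) - center_mhz(ch_b))
    return max(0, width_mhz - separation)

for a, b in [(1, 6), (1, 4), (4, 6), (4, 8), (6, 8), (8, 11)]:
    print(f"ch {a} vs ch {b}: centers {center_mhz(a)}/{center_mhz(b)} MHz, "
          f"overlap {overlap_mhz(a, b)} MHz")
```

Under that simplification, channels 4 and 8 each share 5 to 10 MHz with their neighbours among 1, 6 and 11, while 4 and 8 do not overlap each other. So the extra channels trade some adjacent-channel interference for less contention on a crowded center frequency.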