Watchdogs

by Brian, January 12, 2017

Wireless networks have a bit of a reputation for instability. Modern routers have fixed most of the hardware problems, but the firmware still needs some work to be reliable. You can get there with “watchdog” scripts, which check that a service is working and restart it when it isn’t. I haven’t had to reboot a router that is running our watchdog script.

Our firmware image (based on qMp) comes with a “bmx6health” script that checks whether the mesh software is running correctly and restarts it if necessary. By default this script runs once per day. I’ve found it better to run it every 5 minutes. You can change this by editing the crontab.

ssh into the router and run this in the terminal:

crontab -e

This opens the crontab in the vi editor, where you can change when the existing scripts run or add new ones. (The vi commands you need are “i” to insert, “esc” to stop editing, and “:x” to save and eXit.)
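If you would rather not edit the schedule by hand, the change can be sketched as a small filter over the crontab text. This is only a sketch: the function name every_five_minutes is made up for illustration, and it assumes the existing entry is a normal five-field cron line whose command contains the string “bmx6health”.

```shell
#!/bin/sh
# Sketch only. Assumes the bmx6health entry is a standard cron line:
#   minute hour day month weekday command
# Reads a crontab on stdin and writes it to stdout, with the schedule of
# every uncommented bmx6health line changed to "every 5 minutes".
every_five_minutes() {
    awk '
        /bmx6health/ && $1 !~ /^#/ {
            $1 = "*/5"; $2 = "*"; $3 = "*"; $4 = "*"; $5 = "*"
        }
        { print }
    '
}
```

On the router this could be used as crontab -l | every_five_minutes | crontab - (this relies on crontab accepting “-” for standard input, so check the output of crontab -l afterwards).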

Some nodes exist mainly to be an internet gateway. To make sure they always try to be online, you can add a watchdog script that pings a known address and calls “network restart” if the ping fails. These kinds of scripts often ping 8.8.8.8, which is Google’s public DNS server.
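A minimal sketch of such a gateway watchdog follows. It is not the script from our firmware. The function names, the ping count and timeout, and the /etc/init.d/network path used for “network restart” are all assumptions, so adjust them for your image.

```shell
#!/bin/sh
# Gateway watchdog sketch. Assumptions (not from the post): 3 pings with a
# 5 second timeout, and /etc/init.d/network as the "network restart" command.

TARGET="${TARGET:-8.8.8.8}"   # Google's public DNS server

# Succeeds if the target answers at least one ping.
gateway_reachable() {
    ping -c 3 -W 5 "$TARGET" >/dev/null 2>&1
}

restart_network() {
    /etc/init.d/network restart
}

# Restart the network only when the ping check fails.
check_gateway() {
    if gateway_reachable; then
        echo "online"
    else
        echo "offline, restarting network"
        restart_network
    fi
}
```

To use it, add a final line that calls check_gateway, save the script on the router, and run it from the crontab.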

I’ve discovered 3 ways to recover a qMp mesh router that has functioning wifi but has lost internet: network restart, bmx6 restart, and restarting dnsmasq with killall dnsmasq; dnsmasq start. Sometimes the DNS forwarder, dnsmasq, stops working correctly, letting you ping some things and not others. dnsmasq then forwards bad DNS info to the other routers too, so it needs to be fixed quickly! killall dnsmasq; dnsmasq start will fix it.
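One way to catch the dnsmasq case is to compare a ping by IP address with a name lookup: if the address answers but the name does not resolve, DNS is the likely culprit. The sketch below works that way. The function names, the example.com test name, the use of nslookup against 127.0.0.1, and the /etc/init.d/dnsmasq path for the start step are assumptions, not part of our script.

```shell
#!/bin/sh
# dnsmasq watchdog sketch. Assumptions (not from the post): the probe name,
# nslookup against the local forwarder, and the /etc/init.d/dnsmasq path.

PROBE_IP="${PROBE_IP:-8.8.8.8}"
PROBE_NAME="${PROBE_NAME:-example.com}"

ip_reachable()  { ping -c 3 -W 5 "$PROBE_IP" >/dev/null 2>&1; }
name_resolves() { nslookup "$PROBE_NAME" 127.0.0.1 >/dev/null 2>&1; }

restart_dnsmasq() {
    killall dnsmasq
    /etc/init.d/dnsmasq start
}

# Restart dnsmasq only when the IP answers but the name lookup fails.
check_dns() {
    if ! ip_reachable; then
        echo "no route to $PROBE_IP, not a DNS problem"
        return 1
    fi
    if name_resolves; then
        echo "dns ok"
    else
        echo "dns broken, restarting dnsmasq"
        restart_dnsmasq
    fi
}
```

When the IP ping itself fails, check_dns returns a non-zero status, so a calling script could fall back to network restart or bmx6 restart instead.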

gwck is a qMp utility that is restarted after network restart.

Another problem I’ve had occasionally is that the wifi stops accepting connections. Even though the radio is on and the router lights are normal, you can’t connect. I’ve written a simple script to restart wifi if both the ad-hoc and access point interfaces have no connections. It is a bit of a hack, since the interface may be ok, but since nothing is connected via wifi it doesn’t hurt too much to restart it. I’ve also found that a network restart is necessary to make the wifi stable.

By default wlan0 is the ad-hoc interface that is used to mesh the routers and wlan0ap is the access point. The script counts the wireless interfaces, so it works with dual-band routers and with routers that are only ad-hoc or only AP.

I’m using “Signal: unknown” to show there is no connection. It seems to work reliably. You could also try iwinfo wlan0 assoclist.

A “sleep 5” is usual between “wifi down” and “wifi up”. I’ve found it isn’t necessary when there are no connections, but I’ve left it in just in case.
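The check described above can be sketched roughly like this. It is an illustration, not the actual script: the function names are made up, and it assumes that iwinfo prints one “ESSID:” line and one “Signal:” line for each wireless interface.

```shell
#!/bin/sh
# Wifi watchdog sketch. Assumption: plain `iwinfo` prints one "ESSID:" line
# and one "Signal:" line per wireless interface.

# Reads iwinfo output on stdin. Succeeds only if at least one interface is
# listed and every listed interface reports "Signal: unknown".
all_interfaces_idle() {
    awk '
        /ESSID:/          { total++ }
        /Signal: unknown/ { idle++ }
        END { exit !(total > 0 && idle == total) }
    '
}

restart_wifi() {
    wifi down
    sleep 5
    wifi up
}

check_wifi() {
    if iwinfo | all_interfaces_idle; then
        echo "no wifi connections, restarting wifi"
        restart_wifi
    fi
}
```

Because the interfaces are counted rather than named, the same check covers a single ad-hoc radio, a single access point, or a dual-band router.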

The script can run once a minute, because it detects whether a network restart has just occurred and waits 20 minutes before restarting again. I added the 20 minute delay so the router is still functional without an internet gateway. Without it, a router that can’t reach the internet would restart its network every minute.
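One simple way to get a delay like this is a timestamp file, sketched below. This is an assumed approach and not necessarily how our script detects a recent restart. The file path and the function names are made up for illustration.

```shell
#!/bin/sh
# Cooldown sketch. Assumptions (not from the post): a timestamp file in /tmp
# records the last restart, and `date +%s` is available on the router.

STAMP="${STAMP:-/tmp/watchdog.last_restart}"
COOLDOWN="${COOLDOWN:-1200}"   # 20 minutes, in seconds

# Succeeds if no restart is recorded, or the last one is old enough.
cooldown_over() {
    [ -f "$STAMP" ] || return 0
    last=$(cat "$STAMP")
    now=$(date +%s)
    [ $((now - last)) -ge "$COOLDOWN" ]
}

record_restart() {
    date +%s > "$STAMP"
}

# Runs the given command only if the cooldown has passed.
restart_if_allowed() {
    if cooldown_over; then
        record_restart
        "$@"
    else
        echo "restarted recently, waiting"
    fi
}
```

A watchdog would then call something like restart_if_allowed /etc/init.d/network restart in place of the bare restart command. Keeping the stamp file in /tmp means it is cleared by a reboot, so a freshly booted router is never held back by the cooldown.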

Thanks to Nitin for help with the wifi problem and Zach for help with dnsmasq.