Recently I decided to look under the hood to see how exactly srtt is calculated in Linux. The actual srtt calculation (an Exponentially Weighted Moving Average) is the straightforward part, but what goes in as input to that calculation under various scenarios is interesting and very important for getting a correct rtt estimate.

It is also useful to note the difference between Linux and FreeBSD in this regard. Whenever possible, Linux avoids relying on the value carried in the TCP Timestamps option, as middleboxes can meddle with it.

Basic algorithm is:
For non-retransmitted packets, use saved packet send timestamp and ack arrival time.
For retransmitted packets, use timestamp option and if that’s not enabled, rtt is not calculated for such packets.
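The two rules above, plus the EWMA step, can be sketched in userspace C. This is only an illustration and not the kernel's code: struct pkt_info, rtt_sample_us() and srtt_update() are hypothetical names, the echoed timestamp is assumed to be already converted to microseconds, and the 1/8 gain is the one from RFC 6298.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-packet record; not a kernel structure. */
struct pkt_info {
    uint64_t sent_us;       /* saved send timestamp, in microseconds */
    bool     retransmitted; /* was this packet ever retransmitted? */
    bool     has_tsecr;     /* did the ack echo a Timestamps value? */
    uint64_t tsecr_us;      /* echoed timestamp, assumed already in us */
};

/* Returns an RTT sample in microseconds, or -1 when no sample can be taken. */
static int64_t rtt_sample_us(const struct pkt_info *p, uint64_t ack_us)
{
    if (!p->retransmitted) {
        /* Non-retransmitted: saved send time vs. ack arrival time. */
        assert(ack_us >= p->sent_us);
        return (int64_t)(ack_us - p->sent_us);
    }
    if (p->has_tsecr) {
        /* Retransmitted: fall back to the Timestamps option. */
        return (int64_t)(ack_us - p->tsecr_us);
    }
    /* Retransmitted and no Timestamps option: ambiguous, so no sample. */
    return -1;
}

/* EWMA with gain 1/8: srtt = 7/8 * srtt + 1/8 * sample. */
static uint64_t srtt_update(uint64_t srtt_us, int64_t sample_us)
{
    if (sample_us < 0)
        return srtt_us;              /* nothing to feed in */
    if (srtt_us == 0)
        return (uint64_t)sample_us;  /* first measurement seeds srtt */
    return srtt_us - srtt_us / 8 + (uint64_t)sample_us / 8;
}
```

For example, with srtt at 100000 us, a new 120000 us sample moves it to 102500 us, while a retransmitted packet without timestamps leaves it unchanged.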

Let’s look at the code. I am using net-next.
When a TCP sender sends packets, it has to wait for acks for those packets before throwing them away. It stores them in a queue called ‘retransmission queue’.
When sent packets get acked, tcp_clean_rtx_queue() gets called to clear those packets from the retransmission queue.

A few useful variables in that function are:
seq_rtt_us – uses the first packet from the acked range
ca_rtt_us – uses the last packet from the acked range (mainly used for congestion control)
sack_rtt_us – uses sacked ack
tcp_mstamp is a tcp_sock member which represents timestamp of most recent packet received/sent. It gets updated by tcp_mstamp_refresh().

For a clean ack (not sack), seq_rtt_us = ca_rtt_us (as there is no range)

If such a clean ack is also for a non-retransmitted packet,
[sourcecode language="c"]seq_rtt_us = tcp_stamp_us_delta(tp->tcp_mstamp, first_ackt);[/sourcecode]

and for a sack which is again for a non-retransmitted packet,
[sourcecode language="c"]sack_rtt_us = tcp_stamp_us_delta(tp->tcp_mstamp, sack->first_sackt);[/sourcecode]

The code that updates sack->first_sackt is in tcp_sacktag_one(), where it gets populated when the sack is for a non-retransmitted packet.

tcp_stamp_us_delta() returns the difference, in microseconds, between two of the timestamps that the stack maintains.
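As a rough userspace analogue of such a helper (an approximation, not the kernel source), the sketch below subtracts two microsecond timestamps and clamps a negative result to zero. The name stamp_us_delta() is hypothetical, and the clamping is my understanding of the kernel helper's behavior, so check the net-next tree before relying on it.

```c
#include <assert.h>
#include <stdint.h>

/* Difference t1 - t0 in microseconds, clamped so it is never negative. */
static uint32_t stamp_us_delta(uint64_t t1, uint64_t t0)
{
    int64_t delta = (int64_t)(t1 - t0);

    if (delta <= 0)
        return 0;                         /* t0 is not older than t1 */
    assert(delta <= (int64_t)UINT32_MAX); /* result must fit in 32 bits */
    return (uint32_t)delta;
}
```

Clamping means a stale or reordered timestamp yields a zero delta instead of a huge unsigned value.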

Here is how I test simple FreeBSD tcp changes with dummynet on bhyve. I’ve already written down how I do dummynet so I’ll focus on the bhyve part.

Caution: The Handbook entry on bhyve is the true source. Please refer to it for exact information. This post is super quick and may contain not-entirely-correct things. Also, I am lazy and all this config is what I am using, so you may need to tweak a bit here and there.

A few months back when I started looking into improving FreeBSD TCP’s response to packet loss, I looked around for traffic simulators which can do deterministic packet drop for me.

I had used dummynet(4) before so I thought of using it, but the problem is that it only provides probabilistic drops. You can specify dropping 10% of the total packets, for example. I came across the dpd work from CAIA, Swinburne University, but it was written for FreeBSD7 and I couldn’t port it forward to FreeBSD11 with reasonable time and effort, as ipfw/dummynet has changed quite a bit.

So I decided to hack dummynet to provide me deterministic drops. Here is the patch: drop.patch
(Yes, it’s a hack and it needs polishing.)

In the example above, it configures the pipe 100 to drop 3rd, 4th and 5th packet and repeat this pattern at every 7 packets going from server to client. So it’d also drop 10th, 11th and 12th packets and so on and so forth.
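The repeating pattern described here can be expressed as a small predicate. This is a sketch of the idea only, not the code in drop.patch: should_drop() and its parameters are hypothetical names, and packets are assumed to be numbered from 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether packet number pkt_no (1-based) falls in the drop window
 * [first, last] of a pattern that repeats every `period` packets.
 */
static bool should_drop(uint64_t pkt_no, uint32_t period,
                        uint32_t first, uint32_t last)
{
    assert(pkt_no >= 1 && period >= 1);
    assert(first >= 1 && first <= last && last <= period);

    /* Position of this packet inside the current period, 1..period. */
    uint64_t pos = (pkt_no - 1) % period + 1;

    return pos >= first && pos <= last;
}
```

With period 7, first 3 and last 5, this returns true for packets 3, 4, 5, 10, 11, 12 and so on, which matches the pattern above.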

Side note: delay, bw and queue depth are other very useful parameters that you can set for the link to simulate however you want the link to behave. For example: ‘delay 5ms bw 40Mbps queue 50Kbytes’ would create a link with 10ms RTT (5ms of delay in each direction), 40Mbps bandwidth and 50Kbytes worth of queue depth/capacity. Queue depth is usually decided based on the BDP (bandwidth delay product) of the link. Dummynet drops packets once the limit is reached.
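To see where a queue size like 50Kbytes comes from, here is a minimal sketch of the BDP arithmetic. The function name bdp_bytes() is hypothetical, and it assumes bandwidth in bits per second and RTT in microseconds.

```c
#include <assert.h>
#include <stdint.h>

/* Bandwidth delay product in bytes: (bits per second * seconds) / 8. */
static uint64_t bdp_bytes(uint64_t bw_bits_per_sec, uint64_t rtt_us)
{
    /* Guard against overflow in the multiplication below. */
    assert(rtt_us == 0 || bw_bits_per_sec <= UINT64_MAX / rtt_us);

    return bw_bits_per_sec * rtt_us / 8 / 1000000;
}
```

For the link above, bdp_bytes(40000000, 10000) gives 50000 bytes, which is the 50Kbytes used for the queue.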

For simulations, I run a lighttpd web-server on the server which serves different sized objects and I request them via curl or wget from the client. I have tcpdump running on any/all of four interfaces involved to observe traffic and I can see specified packets getting dropped by dummynet.
sysctl net.inet.ip.dummynet.io_pkt_drop is incremented with each packet that dummynet drops.

Future work:
* Work on getting this patch committed into FreeBSD-head.
* sysctl net.inet.ip.dummynet.io_pkt_drop increments on any type of loss (which includes queue overflow and any other random error) so I am planning to add a more specific counter to show explicitly dropped packets only.
* I’ve (unsuccessfully) tried adding deterministic delay to dummynet so that we can delay specific packet(s), which can be useful in simulating link delays and also in debugging any delay-based congestion control algorithms. Turns out it’s trickier than I thought. I’d like to resume working on it as time permits.

Traditionally, freebsd-net has been the mailing list where networking problems get discussed, but some have complained that it is too spammy and too focused on NIC driver issues. So a new mailing list has been created to specifically talk about transport level protocols: [email protected]

We’ve also started creating a list of TCP related RFCs and their support for FreeBSD to have a single point of reference.

The plan is to have a coordinated effort to improve TCP, UDP, etc., so if you are interested in any of those protocols, please join the mailing list and help FreeBSD.

I usually build my own packages with poudriere but it’s not fun to do on tiny boxes, so I just do ‘pkg install ‘ on them and use upstream packages. One downside is that the package is built with default options. I recently ran into a situation where I wanted to change some options for just a single port.

Now, what is the minimal set of things in /usr/ports/ that I need to checkout to be able to config/build just one port?
Turns out to be:

Again, mtdparts is an important piece here to see how uboot expects the image layout:
mtdparts=ar7240-nor0:256k(u-boot),64k(u-boot-env),1024k(kernel),5760k(rootfs),256k(cfg),64k(EEPROM)
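To turn such a string into offsets, each partition starts where the previous one ends. Below is a small sketch that parses this form and accumulates the offsets. struct mtd_part and parse_mtdparts() are hypothetical names, and the parser only understands sizes with a "k" suffix, which is all this particular string uses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct mtd_part {
    char          name[32];
    unsigned long offset_kb; /* start of the partition, in KiB */
    unsigned long size_kb;   /* size of the partition, in KiB */
};

/*
 * Parse "dev:256k(name),64k(name),..." into out[]. An optional
 * "mtdparts=" prefix is skipped because parsing starts after the first ':'.
 * Returns the number of partitions parsed, or -1 on a malformed string.
 */
static int parse_mtdparts(const char *spec, struct mtd_part *out, int max)
{
    assert(spec != NULL && out != NULL);

    const char *p = strchr(spec, ':');
    unsigned long offset = 0;
    int n = 0;

    if (p == NULL)
        return -1;
    p++;

    while (*p != '\0' && n < max) {
        char *end;
        unsigned long size = strtoul(p, &end, 10);

        if (end == p || end[0] != 'k' || end[1] != '(')
            return -1;

        const char *name = end + 2;
        const char *close = strchr(name, ')');

        if (close == NULL || (size_t)(close - name) >= sizeof(out[n].name))
            return -1;

        memcpy(out[n].name, name, (size_t)(close - name));
        out[n].name[close - name] = '\0';
        out[n].offset_kb = offset;
        out[n].size_kb = size;

        offset += size;  /* next partition starts where this one ends */
        n++;

        p = close + 1;
        if (*p == ',')
            p++;
    }
    return n;
}
```

For the string above this yields six partitions, with kernel starting at 320k and rootfs at 1344k.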

I picked up a working openwrt image and tried to load it.
My setup looks like this:

Black power adapter has 2 cables going to it:
o POE (yellow)- power over ethernet – which connects to the board
o LAN (green)- which connects to my working router

The laptop acts as a tftp server here and is also connected to the router (via the gray cable). This way the laptop and the board are both in the same network.
The laptop has a tftp server running and I’ve assigned 192.168.1.254 to that network interface (em0 in my case) with “ifconfig alias”. This is because, as you can see in uboot’s printenv, uboot expects the tftp server to be running at that address.
The board will obviously act as a tftp client.

Now, to transfer the image, after generating the image (which we will look into in a bit), copy that into /tftpboot on the server.

Alright, those were the basics of how to upload a valid image onto the board. The fun part is to generate a *valid* image that the board will accept.
I tried a bunch of different kernconf’s available in the freebsd-wifi-build project but the board was not accepting the generated images. After receiving the image, it used to fail like this:
Firmware check failed! (-2)

This error was coming from uboot which was proprietary and I could not find this error in any opensourced version of uboot.

General consensus about the reasons for this error was:
– bad layout of the firmware image
– wrong header (something that this uboot is not liking)

[email protected] showed me a trick to look at images with hexdump. After looking at openwrt’s working image with “hexdump -c image”, I found out that uboot was expecting something along the lines of “XS2.ar7240.FreeBSD” as the version string in its header. If it does not see “ar7240” in the header, it fails the check.

So, after all that and with [email protected] suggestion of using lzma’ed kernel, I could boot up the board which later failed at mounting the rootfs:
U-Boot 1.1.4.2-s594 (Dec 5 2012 - 15:23:07)

spi0: at mem 0x1f000000-0x1f00000f on nexus0
spibus0: on spi0
mx25l0: at cs 0 on spibus0
mx25l0: w25q64, sector 65536 bytes, 128 sectors
ar71xx_wdog0: on nexus0
ar71xx_wdog0: Previous reset was due to watchdog timeout
Timecounters tick every 1.000 msec
arswitch0port1: link state changed to DOWN
arswitch0port2: link state changed to DOWN
arswitch0port3: link state changed to DOWN
arswitch0port4: link state changed to DOWN
map/rootfs.uncompress: GEOM_ULZMA image found
map/rootfs.uncompress: 173 x 131072 blocks

I’ve learned a few very basic networking things while trying to make this work.

After having FreeBSD run on the board, next thing was to make it do networking. How Adrian’s scripts work is, it has configuration in /etc/cfg/

# pwd
/etc/cfg
# ls
hostapd.wlan0.conf manifest rc.conf
#

Here,
manifest file has list of files to be stored in flash on “cfg_save”. Basic workflow is that you make changes you want and do “cfg_save” which writes things to flash and then “reboot”. So router comes up with the saved settings.

As this is a new file and we want to preserve it across reboots, we should add an entry for this file into manifest file. This is how manifest file looks:
# cat manifest
etc/cfg/manifest
etc/master.passwd
etc/group
etc/cfg/rc.conf
etc/cfg/hostapd.wlan0.conf

rc.conf is where you specify your networking configuration:
# cat rc.conf
# Set the default system hostname
system_hostname="freebsd-wifi-build"

A few things to notice here:
* arge0 is the ethernet interface
* wlan0 is the wifi interface
* bridge0 is the bridge interface connecting all of it together. It does the bridging of ethernet and wifi interfaces to send and receive bits (and bytes).

As you can see, everything looked sane and I could see it advertising the wifi network, but when a client tried to associate, it would get stuck on “Obtaining IP address”, and even with a static IP assigned, the client did not receive any packets.

Now, as you can see the ethernet cable in WAN (blue) port shows up as port0 and it is in vlangroup1. But vlangroup1 is in vlan 2 and it is tagged. Moving ethernet cable from WAN port to one of the LAN ports fixed the problem.

After having the serial connection set up, the next thing was to load FreeBSD on it. (It comes with Linux preloaded.) I grabbed the image building tool from Adrian Chadd’s scripts and generated an image out of freebsd-head suitable for this board: TP-WN1043ND.factory.bin

Now, what you do is plug the usb end of the serial cable into the laptop, which FreeBSD presents as a cuaUN device. (Check your dmesg output to find the value of “N”.)
Connect to it via:
#cu -s 115200 -l /dev/cuaU0

Now starts the flashing of firmware part. The first step below is *VERY* important…
# This erases the flash between uboot and the firmware configuration area. Whatever you do, don't mistype this!
erase 0xbf020000 +7c0000
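Since a typo here is costly, it can help to double-check the range arithmetic first. The sketch below only does the address math; erase_end() is a hypothetical helper, the length is read as hex the way uboot does, and FLASH_BASE assumes the flash is mapped at 0xbf000000, which you should verify for your board.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: SPI flash is mapped at this address on the board. */
#define FLASH_BASE 0xbf000000u

/* First address past the range that "erase <start> +<len>" covers. */
static uint32_t erase_end(uint32_t start, uint32_t len)
{
    assert(len <= UINT32_MAX - start); /* range must not wrap around */
    return start + len;
}
```

For the command above, erase_end(0xbf020000, 0x7c0000) is 0xbf7e0000, so under that assumption the erase covers flash offsets 0x020000 up to (but not including) 0x7e0000.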

Now, we should set up the tftp part so that we can get the image we prepared from the tftp server (laptop) to the tftp client (router board). uboot is very specific about the IP addresses of both server and client. As per “printenv”, the server IP *must* be 192.168.0.5, so assign that IP to the ethernet interface (via ifconfig alias). Run a network cable from the laptop to one of the LAN ports of the router. (The original instructions suggest using the WAN port, which did not work for me and cost me 2 frustrating weeks of debugging.) Start the tftp server on the laptop and copy the prepared image to the respective location (/tftpboot by default).

5) Make zfs realize the fact that partition has been changed and make zpool
use the new partition which is actually the same one (ada0s1a).
# zpool online -e zroot ada0s1a ada0s1a
# zpool status
pool: zroot
state: ONLINE
scan: none requested
config: