Tuesday, March 19, 2013

AR9380 support on FreeBSD; why it's taken so long..

There's now public, open source support for the AR9380 and later chips for FreeBSD.

It's not yet in the -HEAD tree - I'll get to that.

Let me take you on a bit of a journey.

I started a little side project late last year - I wanted to see if I could make the AR9380 HAL from the Qualcomm Atheros mainline driver (10.x branch) work on FreeBSD. I was hoping that the HAL API hadn't drifted all that much over the years.

Why do this? Two reasons:

I wanted to see if I could open source the HAL and have it work with FreeBSD; and

I didn't want to take on a similar project to what ath9k had to do - which is to take the existing HAL, convert it into something Linux-upstream-compatible, then push THAT into open source.

There's only one of me, and I don't want to spend all of my evenings trying to figure out which changes to the internal driver HAL need merging into "my" version of the HAL. I want to leverage all of the development and debugging that we do internally for the HAL. The ath9k team (both public and internally) need to do a lot of manual inspection and coding in order to pick up fixes and features from the internal driver. Since there's one of me, I'd rather optimise my time (read: get some sleep at some point.)

Then there's the third point that I didn't mention above:

I want to see how feasible it is to do snapshots from our internal codebase and push those out, rather than having to maintain a separate driver tree (sometimes based on the internal driver tree, sometimes re-implemented) and all the associated complication there.

This bit is pretty important. There's plenty of code I didn't want to open up. The bulk of the AR9300 HAL is already open sourced via the ath9k driver in Linux. So for the most part I'm open sourcing what we already have open sourced. However, I want to try and streamline the process for taking internally developed code and push it open.

This involves a few things.

Firstly, how much of the internal driver code is written with the idea that it's going to appear in the public eye? It depends what you think of as public - are your company developers "public" ? Are your customers with source code "public" ? It may not necessarily be "the general community." When you're writing code that's eventually going to be open sourced, you may need to make some decisions about how you structure your code.

For me, it was (mostly) easy. A very large amount of the "stuff that shouldn't be released" was already wrapped up in #ifdef's - stuff like emulation code, for example. So the public HAL snapshot is actually missing a lot of code that our internal version has. All I did (heh!) was pass it through 'unifdef'.

Next is whether the code is nice to look at. Is it formatted well? Is it well designed? Does it compile without warnings? Even on clang? These should be thought about whether or not your target audience is public or not. It's just good design. Companies may be worried about exposing the code, as if it will show badly on them. Well, yes, you should. But hey - we the open community would rather you release the code and take constructive criticism instead of keeping it closed. Who knows, it may actually help you!

The Linux upstream push is actually good here - the Linux system maintainers don't take "bad code". They hold the developers to a higher standard and this is forcing companies to think a bit more about how they develop things. Now, whether companies view this as a cost-centre or a benefit is not something I wish to discuss here. The point is that by working in the Linux upstream community, companies are being forced to tidy up their game a little.

Ok, enough of the back-story. How'd it actually all happen?

The short version - there was API drift, yes. There was a bunch of driver layer stuff that needed to happen. But it wasn't terribly painful. It required me to clean up the driver a bit and implement some nicer tools.

The long version:

There was an internal attempt to partly convert the HAL code internally over to a format that is Linux-upstream compatible. This involved a variety of formatting changes - function names and indentation changed. It also involved a variety of variable / method changes - eg halMciSupport became hal_mci_support. The boolean type changed - HAL_BOOL and AH_TRUE/AH_FALSE became bool, true & false. These needed to be renamed back to the HAL style before I could make it compile.

FreeBSD stripped out the HAL_CHANNEL stuff from its HAL, replacing it with a direct reference to the net80211 type (struct ieee80211_channel.) This made things slightly tidier but it did put an external dependency on the HAL. I may end up going through the FreeBSD HAL and undoing this at some point; but it's a big job.

A variety of APIs changed over time. Although the bulk of the APIs stayed the same, they grew parameters (eg 11n TX and RX antenna and chain configuration); the TX descriptor APIs now take a list of TX buffers rather than a single TX buffer, and other random other things.

So, what was I going to do?

My first cut was to just take a snapshot of the HAL and rename / shuffle things around enough to make it compile.

The first thing I did was to create a set of HAL stub functions. All the stub functions did was print out their method name and return. This way I wasn't surprised by a NULL pointer dereference when the HAL or driver called an unimplemented method - I'd get told which method was being called.

I started with the bare minimum code needed to support probe and attach - which required a surprising amount of code to be converted over. But it was mostly mechanical work. And it worked - enough to get things probing and attaching. I didn't bother with frame transmission and reception just yet - getting probe/attach was enough.

Then I realised that I wanted to this in a git branch, so I could import future versions of the HAL into master and then merge it into my branch. That's what I did. The HAL from 10.x was in master, and my FreeBSD port lived in 'local/freebsd'.

Next was figuring out whether to rename/fix API functions, or to use glue functions in order to deal with API differences. I've fixed some API differences (eg the reset path), but I ended up using a lot of wrapper functions to get the APIs to line up.

The important bits to bring up (in rough order) in order to see whether things are working:

The radio configuration path (ie, programming the analog section with the right frequencies, channel width, filter setup and such);

Interrupt handling;

ANI support;

RX path.

The RX path was the important bit. Once frame RX was working, I could do things like run the NIC in monitor mode and verify that HT20 and HT40 were working. And yes, that's pretty much what I did.

But at this point, the RX path exposed the first major API change - the whole FIFO setup that the AR9380 and later required. They don't support the list-based TX and RX that previous NICs supported. (Well, they _kind_ of do on the TX side, see below.)

The major change here required in the driver is that the RX descriptor is actually in the same memory area as the RX buffer. Ie, the first 'x' bytes of the passed in buffer is where the NIC DMAs the RX completion information to. Previous NICs have two areas for each RX frame - a RX descriptor area and an RX buffer area. Descriptors are in non-cachable memory, so I had to teach the FIFO RX path to support descriptors in cachable memory. I also had to teach the RX path to "skip" the 'x' bytes in order to hand the start of the data payload up to the net80211 stack. Finally, there's two RX FIFOs - one for high priority frames (beacons, uAPSD frames, PS-POLL frames, etc) and low-priority frames (everything else.) I had to teach the stack about this.

So, you can see the changes to the RX code - there's now a set of methods that implement RX - stop, start, flush, descriptor processing. The legacy routines stayed where they were. The new routines just overrode those methods.

And with that, RX came to life.

Next was TX. TX is a bit more special. There's only 8 TX FIFO entries per hardware queue (QCU 0..9); so I can't just push all the frames I want into the list. I also have four TX data buffer pointers per descriptor, rather than one per descriptor in the past. Finally, the TX status FIFO is completely separate from the TX FIFO itself - legacy chips would put the TX status at the end of the final descriptor in a frame.

This required some pretty significant refactoring of the TX path in order to expose the correct hooks to do this all properly. I won't go into the details here - suffice to say that I'm still working on it.

The next problem with TX was figuring out exactly what TX descriptor flags I was setting incorrectly. I eventually gave in and wrote some ALQ based logging which dumps the TX and RX descriptors into an ALQ log which I can then read from userland. This made it very, very easy to inspect what was going on - I was even finding bugs with the earlier chipset code!

Initially I used this to discover I wasn't correctly filling out all four buffer pointers in each TX descriptor. I can't leave any NULL if there's more descriptors for a given frame.

Then I used it to discover whether I was setting up the general flags right - TX chainmask, TX rate, duration, etc. I (re) discovered a hardware limitation with the AR9380 - I need to pad aggregate frames that use RTS with a little more pad delimiters or the transmission underruns. I was able to take these text dumps and give them to the Qualcomm Atheros MAC/PHY team for assistance and they were very impressed by the sophistication of my debugging tools.

Now I have the TX and RX side working. I pushed all of the driver side code into the public FreeBSD repository. I promised people that I would eventually open up the HAL side of things, but I figured that keeping the driver side of this closed was just plain silly. It also meant that if I did stop working on things (for whatever reason), the driver side was done - all that would need porting was the HAL.

Then I began the internal process to get the HAL opened up. I won't go into this in too much detail - suffice to say it took some engineering and legal review to get approval for this. The approval came in about two weeks ago and I pushed the repository into github shortly afterward.

Shortly after that, people started testing it and filing bugs. This part made me happy - there's a few small bugs that are actually in our 10.x mainline tree. I'll be pushing fixes back into the internal driver tree soon.

So, what's next?

I need to push the repository into a vendor branch in FreeBSD, then merge it into the kernel tree so it can be compiled by default.

I then need to get an updated version of the HAL approved by legal/engineering and push that update into the public git repository. Once that's done, I'll do a git merge into my branch and fix up whatever merge issues there are. This updated HAL includes some fixes for TX power and the AR95xx embedded SoC that we've just released. I hope to try and do a HAL update every month or two based on what bugs and features are introduced into the internal mainline driver.

I still have a bunch of driver work to finish up - notably I need to finish optimising the TX FIFO path in the driver and I need to implement MCI support. But the driver is now usable for me at least and I hope it'll become increasingly usable by others.