Lessons Learned: Developing a production-quality USB stack

In late 2008, MCCI started the development of a USB 3.0 host stack for Windows in order to support silicon vendors who were developing USB 3.0 xHCI host controllers. This was a large project, involving significant engineering effort over a period of four years.

With the introduction of Windows 8, and the certification of our customers’ products for xHCI 1.0 compliance, the active development phase of the product is essentially complete, and we can look at the lessons learned over the active period of the project.

Background

In order to be useful, a USB host stack for Windows must meet several major goals. First, the stack needs to support all existing USB devices (including hubs). Second, the stack must support all existing USB device drivers for Windows. These problems are subtly different.

To support all existing devices, the host stack must first be able to successfully recognize (“enumerate”) every USB device when it is plugged into the system. Recognizing USB devices is a multi-step process, which involves recognizing that a device is present, assigning it a bus address, checking its power requirements against the capabilities of the USB port, and generating the plug-and-play identifiers that Windows uses to load the appropriate driver. Although each step can be coded according to the standard, it is not sufficient to recognize only devices that comply with the USB standard. The host stack must recognize any USB device that works with a Windows system and the Microsoft USB host stack. This requires workarounds at practically every step of the process. For example, Windows sends an idiomatic sequence of commands, with characteristic timing, while enumerating a device. A surprising number of devices cannot accept any other commands, in any other sequence, because they’ve never been tested beyond verifying that they work with Windows. Even more troublesome, varying the timing to be faster (or slower) than Windows causes some devices to malfunction.
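Two of the enumeration steps described above, the power check and the generation of plug-and-play identifiers, can be sketched in a few lines. This is a minimal illustration of the standards-compliant path only, not the code of any shipping stack. The `DeviceInfo` class and the function names are hypothetical. The port budgets assume a root or self-powered hub port (500 mA for USB 2.0, 900 mA for USB 3.0), and the identifier strings follow the documented Windows `USB\VID_...&PID_...` hardware ID format.

```python
from dataclasses import dataclass

# Assumed current budget in milliamps for a root or self-powered hub port.
PORT_BUDGET_MA = {"usb2": 500, "usb3": 900}
# bMaxPower is in 2 mA units below SuperSpeed and 8 mA units at SuperSpeed.
MAX_POWER_UNIT_MA = {"usb2": 2, "usb3": 8}


@dataclass
class DeviceInfo:
    """Hypothetical container for descriptor fields read during enumeration."""
    id_vendor: int    # idVendor from the device descriptor
    id_product: int   # idProduct from the device descriptor
    bcd_device: int   # bcdDevice (revision) from the device descriptor
    b_max_power: int  # bMaxPower from the configuration descriptor
    bus: str          # "usb2" or "usb3": the speed class the device attached at


def required_current_ma(dev):
    """Convert bMaxPower into milliamps for the bus the device is on."""
    return dev.b_max_power * MAX_POWER_UNIT_MA[dev.bus]


def fits_port_budget(dev):
    """Check the device's power requirement against the port's capability."""
    return required_current_ma(dev) <= PORT_BUDGET_MA[dev.bus]


def hardware_ids(dev):
    """Build the hardware IDs, most specific first."""
    base = f"USB\\VID_{dev.id_vendor:04X}&PID_{dev.id_product:04X}"
    return [f"{base}&REV_{dev.bcd_device:04X}", base]
```

A real stack wraps logic like this in the device-specific workarounds and timing constraints discussed above, none of which appear in the sketch.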

To support all existing USB device drivers for Windows, the host stack must faithfully implement all of the kernel APIs that are exported by the Microsoft host stack. This is troublesome in several ways. First, the Microsoft APIs, like most operating system APIs, are not formally documented. Error paths and the error codes returned for specific conditions are not completely specified.

Second, drivers are written by people, who occasionally misinterpret the documentation. Often, the error is material, but sometimes the code works anyway. For example, the Microsoft stack might ignore the error.

Third, the Windows kernel environment is highly parallel and asynchronous, and Windows driver code is very sensitive to execution context. Although most drivers are robust, some drivers will fail if the host stack doesn’t interact with the driver code in exactly the same sequence and in the same execution context used by the Microsoft driver. In the original design of the Microsoft stack, these contexts were a consequence of other implementation decisions, often at a very low level in the code. In any other host stack, these contexts must be arrived at by design, because the low-level implementations will necessarily be different.

In order to qualify for mass production with PC vendors, a third party host stack must further pass a barrage of tests. The stack must pass Windows hardware quality tests. Despite the name, these tests check the operation of the driver as well as the host controller, in a variety of circumstances. Suspend/resume and hibernation testing are particularly challenging, because they frequently involve interactions between Windows, the driver, the host controller, the attached devices, and the ACPI BIOS that comes with the motherboard.

The stack and the host controller must pass USB-IF interoperability tests. This involves a complex set of scenarios with a collection of roughly 150 reference devices. Finally, the stack and host controller must pass muster with the system vendors. This normally involves testing with tens of thousands of devices.