USB Bugcheck FE: BAD_URB (Double URB Submit) - What is it and how to avoid it?

I am Pankaj Gupta, a developer in the core USB team at Microsoft. In this blog post I am going to talk about what a Double URB Submit bugcheck is. I will present a case study demonstrating how some real-world drivers end up with this error, and of course I will discuss a solution.

What is a Double URB Submit Bugcheck?

A USB client driver and the Microsoft core USB stack interact using URBs. The client driver allocates a URB, initializes it, links it to an IRP (call it IRP1), and sends the IRP down to the core USB stack. Sending the IRP down transfers ownership of the URB to the USB core. While the USB core owns the URB, the client driver must not touch the URB contents until the request has been completed back to the client driver.

However, we see instances where the client driver reinitializes that same URB and resends it to the core stack in a different IRP, IRP2, while IRP1 is still pending in the USB core. The USB core stack detects this and explicitly bugchecks. It is debatable whether the core stack could take a less aggressive action than a bugcheck, but that is not the topic of this blog post. Briefly, one of several reasons for the explicit bugcheck is that the USB core stack may have placed the URB in a private linked list. If the client driver changes the URB contents, this private list can become corrupt, causing an unexplained bugcheck later.
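To make the list-corruption argument concrete, here is a toy user-mode model in C. It assumes, purely for illustration, that the core stack chains pending URBs through a link field stored inside the URB itself; the type ModelUrb, the field coreNext, and the helpers below are hypothetical and say nothing about the real USB core stack's internals. The stand-in for BuildMyReadDataUrb zeroes the whole URB before filling it in, as a typical initializer would.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical URB layout, for illustration only: a link field that the
   core stack uses while it owns the URB, plus a client-filled field. */
typedef struct ModelUrb {
    struct ModelUrb *coreNext;   /* private to the core stack while pending */
    unsigned transferLength;     /* filled in by the client driver */
} ModelUrb;

/* Singly linked list of pending URBs, private to the (modeled) core stack. */
typedef struct {
    ModelUrb *head;
} CoreList;

/* Core stack takes ownership of a URB and links it into its private list. */
static void core_enqueue(CoreList *list, ModelUrb *urb)
{
    urb->coreNext = list->head;
    list->head = urb;
}

/* Count the URBs the core stack can still reach through its list. */
static size_t core_count(const CoreList *list)
{
    size_t n = 0;
    for (const ModelUrb *u = list->head; u != NULL; u = u->coreNext)
        n++;
    return n;
}

/* Stand-in for BuildMyReadDataUrb: zero the whole URB, then fill it in. */
static void build_read_urb(ModelUrb *urb, unsigned length)
{
    memset(urb, 0, sizeof *urb);
    urb->transferLength = length;
}
```

Enqueuing two URBs and then calling build_read_urb on the one at the head of the list while it is still pending leaves core_count reporting 1 instead of 2: the other request has silently fallen out of the core stack's list, which is the kind of damage that would only surface later as an unexplained crash.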

This is what the Double URB Submit bugcheck looks like in the debugger:

BUGCODE_USB_DRIVER (fe)
USB Driver bugcheck, first parameter is USB bugcheck code.
Arguments:
Arg1: 00000002, BAD_URB The USB client driver has submitted a URB that is still attached to another IRP still pending in the bus driver.
Arg2: [pointer value], Address of pending IRP.
Arg3: [pointer value], Address of IRP passed in.
Arg4: [pointer value], Address URB that caused the error
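The check behind BAD_URB can be pictured as a lookup of the incoming URB among the requests the core stack already owns. The following is a minimal user-mode sketch under that assumption; URB, IRP, CoreStack, core_submit, and core_complete are hypothetical stand-ins, not the real kernel types or the USB core's actual implementation. The fields of the returned BugcheckInfo mirror Arg1 through Arg4 above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real URB and IRP structures. */
typedef struct { int id; } URB;
typedef struct { int id; } IRP;

#define BAD_URB 2u        /* Arg1 value for the double-submit case */
#define MAX_PENDING 16

/* Mirrors the bugcheck parameters; code == 0 means no bugcheck. */
typedef struct {
    unsigned code;          /* Arg1: USB bugcheck code                */
    const IRP *pendingIrp;  /* Arg2: IRP the URB is still attached to */
    const IRP *newIrp;      /* Arg3: IRP that was just passed in      */
    const URB *urb;         /* Arg4: URB that caused the error        */
} BugcheckInfo;

/* Model of the core stack's bookkeeping of in-flight (URB, IRP) pairs. */
typedef struct {
    const URB *urb[MAX_PENDING];
    const IRP *irp[MAX_PENDING];
    size_t count;
} CoreStack;

/* Submit a URB attached to an IRP. Reports BAD_URB if that URB is
   already pending; otherwise the core stack takes ownership of it. */
static BugcheckInfo core_submit(CoreStack *core, const URB *urb, const IRP *irp)
{
    BugcheckInfo info = {0, NULL, NULL, NULL};
    for (size_t i = 0; i < core->count; i++) {
        if (core->urb[i] == urb) {
            info.code = BAD_URB;
            info.pendingIrp = core->irp[i];
            info.newIrp = irp;
            info.urb = urb;
            return info;
        }
    }
    assert(core->count < MAX_PENDING);
    core->urb[core->count] = urb;
    core->irp[core->count] = irp;
    core->count++;
    return info;
}

/* Complete the request: ownership of the URB returns to the client. */
static void core_complete(CoreStack *core, const URB *urb)
{
    for (size_t i = 0; i < core->count; i++) {
        if (core->urb[i] == urb) {
            core->urb[i] = core->urb[core->count - 1];
            core->irp[i] = core->irp[core->count - 1];
            core->count--;
            return;
        }
    }
}
```

In this model, submitting the same URB with IRP1 and then IRP2 yields code BAD_URB with pendingIrp pointing at IRP1 and newIrp at IRP2, while resubmitting it after core_complete is accepted, which matches the ownership rule described earlier.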

Case Study

I analyzed some drivers that submit a URB twice, thereby causing the BAD_URB bugcheck. These were WDM drivers, and a pattern emerged. (WDF drivers do not suffer from this problem because the framework synchronizes PnP, power, and I/O operations on the driver's behalf.) The issue was related to power management: work items that were not synchronized with power state changes. To understand this, let's take an example. Look at the following code in a driver.

// The StartOrRestartRead routine is called in the following situations:
// 1. PnP start
// 2. Entering D0
// 3. On completion of a read data URB previously sent to the core USB stack
StartOrRestartRead(...) {
    // IoAllocateWorkItem returns a PIO_WORKITEM (NULL on failure), not an NTSTATUS
    PIO_WORKITEM workItem = IoAllocateWorkItem(...);
    if ( workItem == NULL ) { ... }
    IoQueueWorkItemEx( workItem, WorkerRoutine_ReadData, ... );
}

// WorkerRoutine_ReadData sends a read data URB to the core USB stack
WorkerRoutine_ReadData(...) {
    // Initialize a preallocated URB stored in the device extension
    BuildMyReadDataUrb( ..., deviceExtension->Urb, ... );
    // Send it down to the USB stack
    IoCallDriver( ... );
    // Release the work item allocated in StartOrRestartRead
    IoFreeWorkItem( ... );
}

// StopRead is called in the following situations:
// 1. PnP stop / surprise remove
// 2. Exiting D0 (going to a low power state)
StopRead( ... ) {
    // Cancel the URB (if any)
    CancelMyUrb(...);
}

Looks simple enough! However, there is a problem with this code. Can you guess what it is? The StopRead routine does not synchronize with a work item that may already have been queued, so the following scenario is possible:

On threads 1, 2 and 3:

StartOrRestartRead queues a work item (let's call it A).

The driver receives a power IRP to go to D3.

The StopRead routine runs and completes. No URB has been sent to the USB core stack yet, so there is nothing to cancel.

The driver receives a power IRP to go back to D0, which calls the StartOrRestartRead routine.

StartOrRestartRead queues another work item (let's call it B). Remember that A is still queued.

In work item A:

WorkerRoutine_ReadData runs. BuildMyReadDataUrb( ..., deviceExtension->Urb, ... ) initializes the URB, and IoCallDriver sends it down. The deviceExtension->Urb request is now pending in the core stack.

In work item B:

WorkerRoutine_ReadData runs. BuildMyReadDataUrb( ..., deviceExtension->Urb, ... ) reinitializes the URB, which the client driver must not touch while it is pending in the core USB stack.

IoCallDriver then sends the same URB down the stack a second time. As explained earlier, this is not allowed, so the core USB stack explicitly raises the Double URB Submit bugcheck.
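The scenario above can be replayed deterministically. In this hypothetical user-mode model, a counter stands in for the system work queue, so queued work items run only when run_queued_work_items is called; that reproduces the ordering of the steps without real threads. DeviceModel and the function names are illustrative only, and the URB completion path is deliberately not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the ORIGINAL driver's state; not kernel code. */
typedef struct {
    int  queuedWorkItems;  /* work items queued but not yet executed       */
    bool urbPending;       /* true while the core stack owns the URB       */
    int  submits;          /* accepted submissions of deviceExtension->Urb */
    bool badUrb;           /* set when the core stack would bugcheck       */
} DeviceModel;

/* StartOrRestartRead: queue a work item and return immediately. */
static void start_or_restart_read(DeviceModel *dx)
{
    dx->queuedWorkItems++;
}

/* StopRead: models CancelMyUrb by marking any pending URB as no longer
   owned by the core stack. Nothing here accounts for work items that are
   still sitting in the queue, which is the bug. */
static void stop_read(DeviceModel *dx)
{
    dx->urbPending = false;
}

/* WorkerRoutine_ReadData: rebuild deviceExtension->Urb and send it down. */
static void worker_read_data(DeviceModel *dx)
{
    if (dx->urbPending) {
        dx->badUrb = true;  /* core stack detects the double submit */
        return;
    }
    dx->urbPending = true;
    dx->submits++;
}

/* Let the system worker thread drain the queue in FIFO order. */
static void run_queued_work_items(DeviceModel *dx)
{
    while (dx->queuedWorkItems > 0) {
        dx->queuedWorkItems--;
        worker_read_data(dx);
    }
}
```

Calling start_or_restart_read, stop_read, start_or_restart_read, and then run_queued_work_items mirrors the steps in order: work item A submits the URB, and work item B finds it still pending and sets badUrb.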

The Solution

Now the obvious question is how to prevent such a mistake. I will give you two solutions: a simple one and a not-so-simple one.

Simple solution:

Switch to WDF! My friends, all you need to do is set up a WDF continuous reader for your endpoint, and WDF takes care of the rest. There is no need to schedule work items or worry about synchronizing with them. For details, see Working with USB Pipes [http://msdn.microsoft.com/en-us/library/aa490269.aspx] or, more specifically, WdfUsbTargetPipeConfigContinuousReader [http://msdn.microsoft.com/en-us/library/aa492612.aspx].

Not-so-simple solution:

However, if you must use WDM, here is one solution. First, consider whether the driver really needs work items at all: read URBs can be sent to the USB stack at DISPATCH_LEVEL, so scheduling a work item is not necessary. If the driver does need work items, the solution below uses a state variable (Reading) and a notification event (ReaderStoppedEvent):

deviceExtension->Reading (a BOOLEAN): set to TRUE when the driver intends to read continuously from its device, and to FALSE when the driver intends to stop reading from its device.

deviceExtension->ReaderStoppedEvent (a notification event): cleared while the driver is continuously sending read URBs to its device, and set once the driver has stopped this process and any active read URB has been cancelled or completed. The StopRead function therefore waits for this event to be set.

NOTE: This solution doesn't show how to synchronize between PnP and power IRPs (e.g. a PnP stop IRP vs. a set-D3 power IRP). It assumes that only one instance of StopRead or StartRead can be executing at a time.

// The StartRead routine is called in the following situations:
// 1. PnP start
// 2. Entering D0
StartRead(...) {
    deviceExtension->Reading = TRUE;
    KeClearEvent( &deviceExtension->ReaderStoppedEvent );
    StartOrRestartRead( ... );
}

// The StartOrRestartRead routine is called in the following situations:
// 1. By StartRead
// 2. On completion of a read data URB previously sent to the core USB stack
StartOrRestartRead(...) {
    if ( !deviceExtension->Reading ) {
        // Set the event indicating that the continuous read process has stopped
        KeSetEvent( &deviceExtension->ReaderStoppedEvent, ... );
        return;
    }
    // IoAllocateWorkItem returns a PIO_WORKITEM (NULL on failure), not an NTSTATUS
    PIO_WORKITEM workItem = IoAllocateWorkItem(...);
    if ( workItem == NULL ) { ... }
    IoQueueWorkItemEx( workItem, WorkerRoutine_ReadData, ... );
}

// WorkerRoutine_ReadData sends a read data URB to the core USB stack
WorkerRoutine_ReadData(...) {
    // Initialize a preallocated URB stored in the device extension
    BuildMyReadDataUrb( ..., deviceExtension->Urb, ... );
    // Send the read data URB only if we are not stopped
    if ( deviceExtension->Reading ) {
        // Send it down to the USB stack
        IoCallDriver( ... );
    } else {
        // Set the event indicating that the continuous read process has stopped
        KeSetEvent( &deviceExtension->ReaderStoppedEvent, ... );
    }
    // Release the work item allocated in StartOrRestartRead
    IoFreeWorkItem( ... );
}

// StopRead is called in the following situations:
// 1. PnP stop / surprise remove
// 2. Exiting D0 (going to a low power state)
StopRead( ... ) {
    // Indicate the state change to stopped, so that other threads (if any)
    // can bail out
    deviceExtension->Reading = FALSE;
    // Cancel the URB (if any)
    CancelMyUrb(...);
    // Wait for the read process to stop
    KeWaitForSingleObject( &deviceExtension->ReaderStoppedEvent, ... );
}

In the above solution, we ensure that the driver doesn't leave its working state until it has completely stopped the process of continuously sending read URBs to the USB core stack.
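A deterministic user-mode model can replay the case-study scenario against the fixed driver. This is a hypothetical sketch: booleans stand in for Reading and the ReaderStoppedEvent notification event, and because the model is single-threaded, the KeWaitForSingleObject call in StopRead is simulated by running queued work items until the event is signaled. Cancellation is modeled as an immediate completion that re-enters StartOrRestartRead, as the completion path in the code above does. All names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the FIXED driver's state; not kernel code. */
typedef struct {
    bool reading;          /* deviceExtension->Reading                     */
    bool readerStopped;    /* ReaderStoppedEvent: true means signaled      */
    int  queuedWorkItems;  /* work items queued but not yet executed       */
    bool urbPending;       /* true while the core stack owns the URB       */
    int  submits;          /* accepted submissions of deviceExtension->Urb */
    bool badUrb;           /* set when the core stack would bugcheck       */
} DeviceModel;

static void start_or_restart_read(DeviceModel *dx)
{
    if (!dx->reading) {
        dx->readerStopped = true;   /* KeSetEvent */
        return;
    }
    dx->queuedWorkItems++;          /* IoQueueWorkItemEx */
}

static void start_read(DeviceModel *dx)
{
    dx->reading = true;
    dx->readerStopped = false;      /* KeClearEvent */
    start_or_restart_read(dx);
}

static void worker_read_data(DeviceModel *dx)
{
    if (dx->reading) {
        if (dx->urbPending) {       /* would be the BAD_URB bugcheck */
            dx->badUrb = true;
            return;
        }
        dx->urbPending = true;      /* IoCallDriver */
        dx->submits++;
    } else {
        dx->readerStopped = true;   /* KeSetEvent */
    }
}

static void run_one_work_item(DeviceModel *dx)
{
    if (dx->queuedWorkItems > 0) {
        dx->queuedWorkItems--;
        worker_read_data(dx);
    }
}

/* Completion (normal or after cancellation) re-enters StartOrRestartRead. */
static void complete_urb(DeviceModel *dx)
{
    dx->urbPending = false;
    start_or_restart_read(dx);
}

static void stop_read(DeviceModel *dx)
{
    dx->reading = false;
    if (dx->urbPending)             /* CancelMyUrb */
        complete_urb(dx);
    /* KeWaitForSingleObject: single-threaded stand-in that lets queued
       work items run until the event is signaled. */
    while (!dx->readerStopped && dx->queuedWorkItems > 0)
        run_one_work_item(dx);
}
```

Replaying the scenario (start, stop while work item A is still queued, start again) now ends with exactly one submission and badUrb still false, because stop_read does not return until A has run and bailed out.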

As you can see, the WDM solution is not exactly trivial, and it doesn't even show how to synchronize between PnP and power operations.

If you use WDF, you don't have to deal with these kinds of issues. WDF guarantees that only one of the EvtDeviceD0Entry and EvtDeviceD0Exit callbacks is called at a time, and it internally synchronizes PnP and power operations. In addition, as I mentioned earlier, you can leverage the WDF continuous reader.

Key Takeaways

A client driver must not submit a URB to the USB stack if that URB has already been submitted and is still pending in the USB core stack.

If your driver uses work items, it is important to synchronize them with power and PnP state changes.

By using WDF, you can avoid the complexity of synchronizing PnP, power, and I/O events and produce a simple, robust driver that will save you a ton of money in development and maintenance.