Re: [Discuss-gnuradio] USB Issues

From: Michael Dickens
Subject: Re: [Discuss-gnuradio] USB Issues
Date: Sun, 15 Jan 2006 17:38:53 -0500

On Jan 14, 2006, at 6:10 PM, Eric Blossom wrote:

> Generally speaking, reliable throughput on the USRP is dominated by
> the OS's ability to deliver USB packets with small interpacket gaps.
> [snip] The hardware (if properly implemented), should be
> able to drive the USB at full speed. [snip]
>
> Under Linux, [snip] We keep the endpoint queue non-empty by
> submitting multiple asynchronous requests.

Agreed on all counts (including the snipped stuff). My goal in my
FUSB code was to deliver / retrieve as much data as possible with as
little delay as possible, so as to keep OSX's internal software and
hardware pipes full. Moving from sync (in LIBUSB) to async (in my
FUSB) offers a substantial improvement; no surprise there. While I'm
happy with a 4x increase in throughput, another 2-3x will certainly
be useful to someone eventually. Bottom line from the discussion
below: I really can't think of anything else that would speed up
FUSB transfers under MacOS X while using the current code base.
Thoughts? - MLD

-------

The ::write() code has two parts: (1) the actual ::write()
command; and (2) a callback to deal with buffering. In (1), the code
finds an available buffer (blocking if necessary until one is
available), copies the incoming data into that buffer, then writes
the copied data to the async USB pipe. When that write completes,
callback (2) checks that the correct amount of data was written,
then makes the buffer available for reuse.
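A minimal Python sketch of this write path, for illustration only. `FakeAsyncPipe` is a hypothetical stand-in for the OS X async USB pipe (it completes requests in order on a worker thread), and `AsyncWriter`, `write_async` and `flush` are made-up names, not the FUSB API. It also assumes that a write() larger than one buffer is split into MAX_BLOCK_SIZE chunks; NUM_QUEUE_ITEMS and MAX_BLOCK_SIZE are the parameters discussed below, with values from (a) and (b).

```python
import queue
import threading

NUM_QUEUE_ITEMS = 10        # buffers in the pool; value from (a)
MAX_BLOCK_SIZE = 16 * 1024  # bytes per buffer; value from (b)


class FakeAsyncPipe:
    """Hypothetical stand-in for the OS X async USB pipe. write_async()
    returns at once; a worker thread completes requests in FIFO order
    and then runs the completion callback."""

    def __init__(self):
        self.received = bytearray()   # what reached the "device"
        self._requests = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def write_async(self, buf, length, callback):
        self._requests.put((buf, length, callback))

    def _run(self):
        while True:
            buf, length, callback = self._requests.get()
            self.received += buf[:length]       # the transfer itself
            callback(buf, length, length)       # (buffer, requested, actual)


class AsyncWriter:
    def __init__(self, pipe):
        self.pipe = pipe
        self.errors = 0
        self.free = queue.Queue()               # pool of idle buffers
        for _ in range(NUM_QUEUE_ITEMS):
            self.free.put(bytearray(MAX_BLOCK_SIZE))

    def write(self, data):
        """Part (1): copy data into pooled buffers and queue async writes."""
        for off in range(0, len(data), MAX_BLOCK_SIZE):
            chunk = data[off:off + MAX_BLOCK_SIZE]
            buf = self.free.get()               # blocks until a buffer is free
            buf[:len(chunk)] = chunk            # copy; caller may reuse `data`
            self.pipe.write_async(buf, len(chunk), self._on_written)
        return len(data)

    def _on_written(self, buf, requested, actual):
        """Part (2): check the transfer length, then recycle the buffer."""
        if actual != requested:
            self.errors += 1
        self.free.put(buf)

    def flush(self):
        """Block until every buffer is back in the pool, i.e. all writes done."""
        held = [self.free.get() for _ in range(NUM_QUEUE_ITEMS)]
        for buf in held:
            self.free.put(buf)
```

Because the pool holds only NUM_QUEUE_ITEMS buffers, a write() of more than NUM_QUEUE_ITEMS × MAX_BLOCK_SIZE bytes blocks in `self.free.get()` until completions hand buffers back; that is the blocking that factor (1) below is about.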

The ::read() code has three parts: (1) the actual ::read() command;
(2) a thread running the async USB read code; and (3) a callback to
deal with buffering. In (2), the code gets an available buffer
(blocking if necessary until one is available), then calls the async
USB pipe to read data into it; this is all done within a "while()"
loop, so it happens as quickly as the thread can execute. When that
read completes, callback (3) copies the actual amount of data read
into an intermediate buffer, overwriting the oldest data if necessary
(and printing a warning). The actual ::read() command (1) simply
copies data out of the intermediate buffer, blocking until at least
some data is available.
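A sketch of the intermediate buffer between callback (3) and ::read() (1), in Python, assuming it is a fixed-size byte ring: `put()` never blocks and overwrites the oldest bytes when full (printing a warning), while `get()` blocks until at least one byte is available. `OverwritingRingBuffer`, `put` and `get` are hypothetical names; the reader thread (2) would simply call `put()` from each completed async read.

```python
import sys
import threading


class OverwritingRingBuffer:
    """Intermediate buffer between the read callback (3) and ::read() (1)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buf = bytearray(capacity)
        self.head = 0        # index of the oldest valid byte
        self.count = 0       # number of valid bytes
        self.overflows = 0   # number of put() calls that dropped data
        self.cond = threading.Condition()

    def put(self, data):
        """Callback (3): append data, overwriting the oldest bytes if full."""
        with self.cond:
            dropped = 0
            for b in data:   # byte at a time for clarity, not speed
                tail = (self.head + self.count) % self.capacity
                self.buf[tail] = b
                if self.count == self.capacity:
                    # Buffer was full, so tail == head and we just overwrote
                    # the oldest byte; advance head past it.
                    self.head = (self.head + 1) % self.capacity
                    dropped += 1
                else:
                    self.count += 1
            if dropped:
                self.overflows += 1
                print(f"warning: read overflow, dropped {dropped} bytes",
                      file=sys.stderr)
            self.cond.notify_all()

    def get(self, max_bytes):
        """::read() (1): block until any data exists, return up to max_bytes."""
        with self.cond:
            while self.count == 0:
                self.cond.wait()
            n = min(max_bytes, self.count)
            out = bytes(self.buf[(self.head + i) % self.capacity]
                        for i in range(n))
            self.head = (self.head + n) % self.capacity
            self.count -= n
            return out
```

For example, with an 8-byte ring, `put(b"abcdef")`, `get(4)`, `put(b"123456")` and then `put(b"XY")` drops `ef` with one warning, and the next `get(100)` returns `b"123456XY"`.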

The "speed" factors are making sure that:

(1) there are enough buffers so that there is no blocking
(NUM_QUEUE_ITEMS);
(2) buffers are big enough to prevent blocking and overflow
(MAX_BLOCK_SIZE);
(3) each async call transfers enough data to fill or clear whatever
buffers MacOS X uses internally (MAX_BLOCK_SIZE);
(4) each async USB data transfer call happens often enough; and
(5) whatever code is generating the data and calling ::write()
or ::read() gets enough CPU time to sustain the required data rate.
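To see why (1) and (4) matter, here is a toy timing model in Python. It is an assumption for illustration, not a model of OS X internals: the bus serves one queued request at a time for `service` time units, and a buffer can only be resubmitted `turnaround` units after its transfer completes. With one buffer the bus idles during every turnaround, much as a synchronous call does; with more buffers queued, the turnaround is hidden behind other transfers. `simulate` is a hypothetical helper.

```python
def simulate(depth, n_transfers, service, turnaround):
    """Toy model: total time to finish n_transfers with `depth` buffers.

    Requests are served in submission order, one at a time, each for
    `service` time units. A buffer may be resubmitted only `turnaround`
    units after its previous transfer completes.
    """
    ready = [0.0] * depth   # earliest time each buffer can be resubmitted
    bus_free = 0.0          # time the bus finishes its current transfer
    for i in range(n_transfers):
        slot = i % depth    # buffers reused round-robin (oldest first)
        start = max(bus_free, ready[slot])
        bus_free = start + service
        ready[slot] = bus_free + turnaround
    return bus_free


if __name__ == "__main__":
    for depth in (1, 2, 4, 10):
        total = simulate(depth, 1000, service=1.0, turnaround=1.0)
        print(f"depth={depth:2d}  transfers per unit time={1000 / total:.3f}")
```

With `turnaround` equal to `service`, depth 1 gets about half the transfer rate of depth 2 or more. In this model, once `turnaround <= (depth - 1) * service` extra buffers buy nothing, which has the same shape as the plateau in (a) below, though the model by itself says nothing about where that plateau sits.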

(a) Increasing (1) from 2 to 10 increases the throughput from about
24 MBps to 29 MBps; going beyond 10 gives no further increase. There
are still 41 GR O/U's. Leave this at 10 for now.

(b) For (2): at 4*1024, there are numerous read overflows (from my
code) but no underflows (from my code); the data rate is around 26
MBps, and there are 41 GR O/U's. At 16*1024, over/underflows (from my
code), but still 41 GR O/U's; throughput is around 29 MBps.
Increasing to 64*1024 or 640*1024 has no real effect on throughput,
over/underflows, or GR O/U's.

(c) Increasing (1) to 1024 and (2) to 1024*1024 results in 32 GR
O/U's, and throughput drops to about 28 MBps. Interestingly, all of
the write underruns happen immediately (within the first second),
and the rest of the run (about 3.5 seconds) goes without errors. The
read overruns are always spread out over the run, regardless of (1)
or (2). This is an absurd example, since we never want to allocate
that much DRAM for USB buffering.

(d) Increasing thread priority for (4) and (5) doesn't make any
difference.
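For scale on (c), a small Python calculation. It assumes each buffer pool holds NUM_QUEUE_ITEMS × MAX_BLOCK_SIZE bytes (the post does not say whether reads and writes get separate pools) and that (1) stays at 10 during (b), as (a) suggests; `pool_bytes` is a hypothetical helper.

```python
KIB = 1024


def pool_bytes(num_queue_items, max_block_size):
    """Bytes held by one pool of num_queue_items buffers of max_block_size."""
    return num_queue_items * max_block_size


# (NUM_QUEUE_ITEMS, MAX_BLOCK_SIZE) pairs; (1) assumed to stay at 10 in (b).
settings = {
    "(b) 10 x 16*1024": (10, 16 * KIB),
    "(b) 10 x 640*1024": (10, 640 * KIB),
    "(c) 1024 x 1024*1024": (1024, 1024 * KIB),
}

for name, (n, size) in settings.items():
    print(f"{name:<22} {pool_bytes(n, size) // KIB:>9} KiB")
```

That is 160 KiB and 6,400 KiB for the settings in (b), versus 1,048,576 KiB (1 GiB) per pool in (c), roughly 6,500 times the 16*1024 setting.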

Because ::write() does not go through an intermediate buffer while
::read() does, yet the results are identical for a given set of
parameters, I believe the delays are caused by OSX, and not by (4)
or (5). The primary way to decrease delays inside OSX is to remove
the extra CoreFoundation-layer calls by going directly to the
kernel. This would decrease the number of required threads and
eliminate the "RunLoop" requirement, which could only speed up the
throughput. (LIBUSB has that requirement, which makes its async calls
effectively sync; my current code has it too, but runs it in a
separate thread so that async calls really are async.)
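The RunLoop point can be illustrated with a toy model in Python. `ToyAsyncPipe` is hypothetical and makes no CoreFoundation or IOKit calls: `submit()` returns at once, but a completion callback only runs when some thread services the loop via `run_once()`. If each caller services the loop right after its own submit (a caricature of the effectively-sync behavior described for LIBUSB), at most one request is ever outstanding. If a separate thread services the loop and each callback resubmits, `depth` requests stay outstanding, which is the "keep the endpoint queue non-empty" pattern from the quoted text.

```python
import queue
import threading


class ToyAsyncPipe:
    """Hypothetical async pipe: submit() returns at once, but a completion
    callback runs only when some thread services the loop via run_once()."""

    def __init__(self):
        self._completions = queue.Queue()
        self.in_flight = 0
        self.max_in_flight = 0

    def submit(self, callback):
        # The toy "transfer" finishes immediately; its callback waits in the
        # queue until the loop is serviced.
        self.in_flight += 1
        self.max_in_flight = max(self.max_in_flight, self.in_flight)
        self._completions.put(callback)

    def run_once(self):
        # Block until a completion is pending, then dispatch its callback.
        callback = self._completions.get()
        self.in_flight -= 1
        callback()


def inline_loop(n):
    """Submit, then service the loop until that completion arrives:
    the async API ends up behaving synchronously."""
    pipe = ToyAsyncPipe()
    done = []
    for i in range(n):
        pipe.submit(lambda i=i: done.append(i))
        pipe.run_once()
    return pipe.max_in_flight, done


def dedicated_loop_thread(n, depth):
    """A separate thread services the loop and each callback resubmits,
    so up to `depth` requests stay outstanding."""
    pipe = ToyAsyncPipe()
    done = []
    issued = 0

    def issue():
        nonlocal issued
        i = issued
        issued += 1
        pipe.submit(lambda: on_complete(i))

    def on_complete(i):
        done.append(i)
        if issued < n:          # keep the queue full until n are issued
            issue()

    for _ in range(min(depth, n)):
        issue()                 # prime the queue before the loop starts

    def loop():
        while len(done) < n:
            pipe.run_once()

    t = threading.Thread(target=loop)
    t.start()
    t.join()
    return pipe.max_in_flight, done
```

`inline_loop(5)` never has more than one request outstanding, while `dedicated_loop_thread(5, 3)` keeps three in flight; both complete the same five requests in order.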