= Problem overview =
{{{
Feb 06 02:47:39.469 [err] do_main_loop(): select failed: No buffer space available [WSAENOBUFS ] [10055]
}}}
If your Tor server logs "[WSAENOBUFS] [10055]" errors like the one above,
you are experiencing Bug #98. This is a well-known, and apparently commonly
experienced, bug when running Tor servers on the non-server editions of
Microsoft Windows: 98, ME, 2000, and XP.
The official Microsoft description for WSAENOBUFS is:
{{{
WSAENOBUFS
10055
No buffer space available.
An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full.
}}}
The WSAENOBUFS error relates to a buffer that holds data before and after
it traverses the TCP/IP stack. As far as we can tell, the people who see
this problem share no common hardware or software platform.
Running a Tor server on a vanilla XP install does not (easily) trigger
the problem, but it can be reproduced consistently if you also run
TCP/IP-intensive applications such as P2P clients (Bit``Torrent, eDonkey,
eMule, etc.).
The combined activity overloads the TCP/IP stack. Since network drivers
share the same buffers, the whole network on the computer often stops
working, and only a reboot fixes it.
= Things that are not the problem =
This error is entirely unrelated to the WSAENOCONN error that WinXP Home
and Pro users commonly experience. The error messages are different:
WSAENOCONN causes Event Log entries such as "EventID 4226:
TCP/IP has reached the security limit imposed on the number of
concurrent TCP connect attempts". Since Service Pack 2, TCPIP.SYS in XP
is hardcoded to allow at most 10 concurrent half-open (not yet
established) outbound connection attempts; further attempts are queued.
A sufficiently high-bandwidth Tor exit server WILL hit this limit, but
it does not cause Tor to crash (though it does cause some outbound
connections to fail, and eventually we should build some workarounds
for this). Speed``Guide.net provides a
[[http://www.speedguide.net/read_articles.php?id=1497|more detailed explanation]].
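One possible workaround, sketched here under stated assumptions and not something Tor does today, is for the application to cap how many of its own outbound connection attempts are still half-open and queue the rest, so it decides which connections wait instead of TCPIP.SYS. The class name `HalfOpenLimiter` and the default limit of 10 are hypothetical, chosen to match the "concurrent TCP connect attempts" limit in the event-log message above. Tor is written in C; Python is used here only for brevity.

```python
from collections import deque
from typing import Deque, Hashable, Optional


class HalfOpenLimiter:
    """Caps in-flight (half-open) outbound connect() attempts.

    Hypothetical sketch: callers ask permission before starting a
    connect() and report back when it succeeds or fails. Requests beyond
    the cap are queued in FIFO order instead of being started.
    """

    def __init__(self, limit: int = 10) -> None:
        self.limit = limit
        self.pending = 0  # connect() calls started but not yet finished
        self.waiting: Deque[Hashable] = deque()

    def request(self, addr: Hashable) -> bool:
        """Return True if a connect() to addr may start now; else queue it."""
        if self.pending < self.limit:
            self.pending += 1
            return True
        self.waiting.append(addr)
        return False

    def finished(self) -> Optional[Hashable]:
        """Call when a pending connect() completes or fails.

        Returns the next queued address, whose connect() should start now
        (it inherits the freed slot), or None if nothing is waiting.
        """
        if self.waiting:
            return self.waiting.popleft()
        self.pending -= 1
        return None
```

An event loop would call `request()` before each non-blocking connect() and `finished()` once the socket becomes writable or reports an error.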
= So what IS the problem? =
We're not totally sure. But we have a theory.
First, some background. One of the ways Windows does networking with
lots of connections at once is with an approach called "overlapped IO".
Basically you hand it a socket, a length, and a buffer, and tell it
to either read or write, and Windows will take it from there and let
you know when it's done.
Quoting from
[[http://www.codeproject.com/internet/IOCP_Server_client.asp?msg=1187159]]:
{{{
With every overlapped send or receive operation, it is possible that the data buffer submitted will be locked. When memory is locked, it cannot be paged out of physical memory. The operating system imposes a limit on the amount of memory that can be locked. When this limit is reached, the overlapped operations will fail with the WSAENOBUFS error.
}}}
But Tor doesn't use overlapped IO: it uses the select() system call to
learn when sockets are available for reading or writing, and then uses
non-blocking writes and reads to send and receive data.
So our theory is that when we call send() and recv(), Windows copies
the contents of our buffer into a kernel buffer, and if we send or receive
too much at once, Windows runs out of kernel buffer space.
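The select()-plus-non-blocking pattern described above looks roughly like the following Python sketch (Tor itself is C; Python is used only for brevity). It caps each send() at a hypothetical `MAX_CHUNK`, which is one way to test the theory that handing the kernel too much data at once exhausts its buffer space, and it treats a no-buffer-space error as a reason to back off rather than a fatal error. `MAX_CHUNK`, the back-off delay, and the function name are illustrative assumptions, not values or code from Tor.

```python
import errno
import select
import socket
import time

# Hypothetical cap on the number of bytes handed to the kernel per send().
MAX_CHUNK = 16 * 1024

# POSIX reports ENOBUFS; on Windows, socket errors carry WinSock's code
# (10055, errno.WSAENOBUFS), a constant that exists only on Windows.
NOBUFS_CODES = {errno.ENOBUFS, getattr(errno, "WSAENOBUFS", errno.ENOBUFS)}


def send_all_nonblocking(sock: socket.socket, data: bytes,
                         max_chunk: int = MAX_CHUNK,
                         timeout: float = 5.0,
                         backoff: float = 0.05) -> int:
    """Write all of data, using select() for readiness and non-blocking send()."""
    sock.setblocking(False)
    view = memoryview(data)
    sent = 0
    while sent < len(view):
        # Wait until the kernel says the socket can accept more data.
        _, writable, _ = select.select([], [sock], [], timeout)
        if not writable:
            raise TimeoutError("socket never became writable")
        try:
            sent += sock.send(view[sent:sent + max_chunk])
        except (BlockingIOError, InterruptedError):
            continue  # spurious readiness; go back to select()
        except OSError as exc:
            if exc.errno in NOBUFS_CODES:
                time.sleep(backoff)  # kernel out of buffer space; retry later
                continue
            raise
    return sent
```

Smaller chunks mean more system calls, so a real fix would have to weigh that overhead against the memory pressure it avoids.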
Our current plan is to abandon select() on Windows in favor of
overlapped IO. This involves three steps. Step one is to
add overlapped IO support to libevent. (Libevent already has a notion of
a buffer API, so we could extend that.) Step two is to change the way
Tor calls OpenSSL, so that OpenSSL operates on local memory buffers rather
than talking to the network itself (presumably via recv() and send()). The
third step is to change Tor's networking loop to use libevent's buffer API
rather than the socket API. If you'd like to help with any of these steps,
let us know!
Another guess is that the loop around select() is buggy in the
Windows libevent implementation. Tor is the only high-performance user of
libevent on Windows as far as we know, so this is quite possible. Check
out the code here:
[[http://cvs.sourceforge.net/viewcvs.py/levent/libevent/WIN32-Code/win32.c?view=auto]]
= How to make it break less quickly =
You can try raising the priority of Tor, Privoxy, and Vidalia in Task Manager: press CTRL-ALT-DEL, go to the Processes tab, right-click each process, and set its priority to "Above normal". You can use Prio to apply this automatically every time you start Tor.
You can also try tweaking the registry.
The following registry entries have been reported to mitigate the buffer
issues with varying degrees of success. As always, if you do not understand
the Windows Registry and Reg``Edit, do not attempt these modifications.
Your mileage may vary.
At least one user has reported success by following the instructions
from [[http://web.ircsystems.net/codemastr/bufspace.html]]:
{{{
To do this go to Start, Run and type regedit. In the left pane navigate to
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters once
there, you must create the entry TcpNumConnections. To do this, right click in
the right pane and choose new from the menu and select DWORD Value. Give it the
name TcpNumConnections. Then right click it and select modify and enter a value
of 800. Then restart your computer.
}}}
Several TCP-related registry entries may affect the internal buffer space
available to data passing through the TCP stack. Setting
HKEY_LOCAL_MACHINE\SYSTEM\Current``Control``Set\Services\Tcpip\Parameters\Global``Max``Tcp``Window``Size
and Tcp``Window``Size to 0xfaf00 (1027840) seemed to increase the time
to failure when running Tor and Bit``Torrent.
Setting HKLM\SYSTEM\Current``Control``Set\Services\Tcpip\Parameters\Tcp1323``Opts to 3 (window scaling plus timestamps)
also seemed to help the exit server last longer.
Setting it to 1 (window scaling only) is another option, since it doesn't spend 12 bytes of every TCP header on the timestamp option.
However, Tor seems to have lots of odd packet problems on an exit server (Ethereal showed lots of retransmissions,
lost ACKs, etc.), and setting it to 3 seemed to quiet these down. (Only packet headers were captured during the tests, not actual data.)
Setting HKLM\SYSTEM\Current``Control``Set\Services\Tcpip\Parameters\Sack``Opts to 1 (enabling selective acknowledgements) is another helpful tweak.
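If you want to apply the values above in one step, a `.reg` file that you can inspect before importing with Reg``Edit is less error-prone than hand-editing. The Python sketch below generates one. It assumes all five entries are REG_DWORD values under the `Tcpip\Parameters` key and simply encodes the numbers quoted on this page without judging them; the helper name is hypothetical, and the usual registry warnings still apply. Note that a window of 0xfaf00 bytes exceeds 65535, so it can only be used when TCP window scaling (RFC 1323) is in effect, which is bit 0 of `Tcp1323Opts` (1 = window scaling, 2 = timestamps, 3 = both).

```python
# Hypothetical helper: renders the registry tweaks from this page as a
# "Windows Registry Editor Version 5.00" (.reg) file. All values are
# assumed to be REG_DWORD entries.
KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"

SETTINGS = {
    "TcpNumConnections": 800,
    "GlobalMaxTcpWindowSize": 0xFAF00,  # 1027840 bytes
    "TcpWindowSize": 0xFAF00,
    "Tcp1323Opts": 3,  # bit 0 = window scaling, bit 1 = timestamps
    "SackOpts": 1,     # enable selective acknowledgements
}


def render_reg(settings: dict, key: str = KEY) -> str:
    """Return .reg file text that sets each name to a REG_DWORD value."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, value in settings.items():
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError(f"{name}: {value} does not fit in a DWORD")
        # Write DWORDs as eight hex digits, as RegEdit exports them.
        lines.append(f'"{name}"=dword:{value:08x}')
    # Use CRLF line endings, as RegEdit-exported files do.
    return "\r\n".join(lines) + "\r\n"
```

Write the result with `open(path, "w", newline="")` so Python does not translate the line endings, then review the file before double-clicking it.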
An experimental feature recently added to Tor that constrains the send and receive socket buffer sizes may also alleviate this problem. If your Tor version supports it, try the following option in your torrc:
{{{
ConstrainedSockets 1
}}}
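Constraining socket buffers means asking the kernel for small per-socket send and receive buffers through the standard `SO_SNDBUF` and `SO_RCVBUF` socket options, so that each connection ties up less kernel memory. The Python sketch below shows that mechanism on a plain socket; it illustrates the idea and is not Tor's code. The 8192-byte size is an assumption for illustration (the tor manual documents a companion `ConstrainedSockSize` option for choosing the real size). The kernel may adjust the request: Linux, for example, reports back double the requested value.

```python
import socket

# Assumed buffer size for illustration only.
CONSTRAINED_SIZE = 8192


def constrained_socket(size: int = CONSTRAINED_SIZE) -> socket.socket:
    """Create a TCP socket whose kernel send/receive buffers are capped."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Set these before connect()/listen() so they apply from the start
    # (the TCP window scale factor, for instance, is fixed at handshake time).
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    return sock


def buffer_sizes(sock: socket.socket) -> tuple:
    """Return the (send, receive) buffer sizes the kernel actually granted."""
    return (sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
            sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
```

Smaller buffers trade per-connection throughput for lower total kernel memory, which is the trade-off a busy relay on a memory-starved Windows box would want.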
= Some more data points =
It appears that a system with 384 MB of RAM or more, a fresh install of Win XP Home fully patched
via Windows Update, and running nothing but a Tor exit server does not experience
these problems. This holds for both the 0.1.0.16-stable and 0.1.1.12-alpha
versions of Tor. Tor was configured as a simple exit server with no bandwidth limits, burst restrictions, or hibernation.
We continue to debug this issue. Recent tests show that the total RAM available at boot time correlates with whether the [WSAENOBUFS] error occurs. The amount of memory available to the system was limited with the ''/MAXMEM=###'' switch (in megabytes) in C:\boot.ini. The results are as follows:
*At ''/MAXMEM=128'', simply starting the Tor server was enough to trigger a [WSAENOBUFS] error.
*At ''/MAXMEM=256'', the Tor server triggered a [WSAENOBUFS] error after 2 to 5 hours, depending on the run.
*At ''/MAXMEM=384'', the Tor server had not triggered a [WSAENOBUFS] error after 6 hours.
*At ''/MAXMEM=512'', the Tor server had not triggered a [WSAENOBUFS] error after 6 hours. Further investigation is needed at this memory level.
*At ''/MAXMEM=1024'', the Tor server had not triggered a [WSAENOBUFS] error after 48 hours.
We've learned that Windows allocates large chunks of memory per socket on connect. See this graphic of [[http://img.photobucket.com/albums/v16/yoitsmeremember/charts/macroshaft.png|Non-Paged Pool Behavior]] in Win XP. We appear to be running into a hard limit that cannot be changed through registry settings. [[http://msdn.microsoft.com|MSDN]] articles describe a hardcoded algorithm in non-server editions of Windows that determines the non-paged pool size at boot. At this time, limited memory combined with heavy network usage appears to be the cause of the [WSAENOBUFS] error.
= Alternative solutions =
Virtualization doesn't solve the underlying problem, but it may help build the installed base. For lateral thinkers, VMware Player (available at no cost) lets Windows users run Tor on Linux. In particular, the Browser Appliance [[http://www.vmware.com/vmtn/appliances/|available here]] might be a good starting point for a web client. There are many other [[http://www.vmware.com/vmtn/|VMware Appliances]] that could also be modified easily to use Tor.
The [[http://janusvm.com/|JanusVM]] appliance provides a transparent proxy using Tor on Linux inside VMware. It could also run a Tor server, with the usual configuration applied from the console as follows:
*Configure the JanusVM to use a static IP address instead of DHCP using the menu.
*Edit the /etc/tor/torrc file with desired Tor Server settings.
*Modify firewall rules in /etc/init.d/janus.routing-with-tor to accept incoming requests on the ports required.
*Reboot the virtual machine.
Please consider the [[../OperationalSecurity| Operational Security]] requirements of running a Tor server before deploying on a VM just as you would for any other type of host.