work, particularly:
- at high rates (>120K req/sec) UDP replies get mismatched,
resulting in a steady stream of error messages and a trashed run.
- split off components (rqwheel, conn, dgram_ap, thread) into separate files
- documentation
- more testing, optimization
Even in its present state I was able to get over 150K UDP gets/sec with 2 threads, at which point my 4-core memcached server starts showing signs of saturation. A single thread reaches ~120K UDP gets/sec.

Summary: stats are fetched via TLS (thread-local storage), so there's no reason to assert on the thread id when locking. This also fixes a bug: stats aggregation can run from any thread, so the assert would be wrong there.
Reviewed By: marc
Test Plan: Ran in production before this change and hit the assertion. Ran in production after this change and the assertion did not fire.
Revert: OK

Summary: the cache_lock in memcached is now our "giant" lock and is a source of much contention. by making it adaptive, performance on an 8 core system with 8 server threads goes from 240,000 gets/s to over 300,000.
Reviewed By: sgrimm
Test Plan: ran stress test
blasted with both tcp and udp requests
Revert: OK

Summary: Remove the global stats lock and collect stats per thread and aggregate only when the stats command is issued.
The prefix, bucket and cost-benefit stats still use a global lock, but they are not enabled in normal operation, so they were not addressed here.
Prior to this change, on an 8 core system, peak req/s is around 160,000 with 8 server threads. With this change, peak req/s is 240,000.
Reviewed By: marc
Test Plan: Ran tester with asserts on and off. Ran libmcc against server and verified stats matched what libmcc tester reports.
Have not run in production yet.
Have not blasted with UDP other than using libmcc
Revert: OK
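The per-thread stats change above can be illustrated roughly as follows. This is a minimal sketch, not the actual diff: the struct layout, `NUM_THREADS`, `record_get` and `aggregate_stats` are hypothetical names, and the aggregation here reads the slots without any per-thread lock, so a real server would need locking or atomics to get a consistent snapshot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_THREADS 8   /* hypothetical worker count */

/* Hypothetical per-thread counters; each worker only writes its own slot. */
struct thread_stats {
    uint64_t get_cmds;
    uint64_t get_hits;
    char pad[64 - 2 * sizeof(uint64_t)];  /* pad to 64 bytes to limit false sharing */
};

static struct thread_stats per_thread[NUM_THREADS];

/* Hot path: worker `tid` bumps its own counters, no global lock taken. */
static void record_get(int tid, int hit) {
    assert(tid >= 0 && tid < NUM_THREADS);
    per_thread[tid].get_cmds++;
    if (hit)
        per_thread[tid].get_hits++;
}

/* Cold path: sum every slot, run only when the stats command is issued. */
static struct thread_stats aggregate_stats(void) {
    struct thread_stats total = {0};
    for (size_t i = 0; i < NUM_THREADS; i++) {
        total.get_cmds += per_thread[i].get_cmds;
        total.get_hits += per_thread[i].get_hits;
    }
    return total;
}
```

The cost moves from every request (a contended global lock) to the rare stats command (a walk over all threads).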

Summary: - A connection buffer group is created per thread. It gets the thread id so if asserts are compiled in, it can determine if it is being used by a different thread.
- conn_new and dispatch_conn_new no longer need an initial read buffer size; it is meaningless in the context of connection buffers.
Minor change:
- udp support for stress test
Reviewed By: ps
Test Plan: ran stress test with asserts on for 1 hr with 4 clients that reconnect every 15 minutes.
ran in production with asserts on for 10 minutes (and still going...)
Revert: OK

Summary: maximize_sndbuf was broken in that it was expecting setsockopt to fail when being asked to set the send buffer to a size larger than permitted. it actually might not, so use getsockopt to observe the effect of the setsockopt.
Reviewed By: ps
Test Plan: ran with -vv
Revert: OK
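The setsockopt/getsockopt pattern described above looks roughly like this. It is a sketch of the idea only; `grow_sndbuf` is a hypothetical helper, not the maximize_sndbuf code itself, and it makes a single request rather than searching for the largest accepted size.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Ask for a send buffer of `want` bytes, then report what the kernel
 * actually configured. Returns the effective size, or -1 on error. */
static int grow_sndbuf(int sfd, int want) {
    int actual = 0;
    socklen_t len = sizeof(actual);

    assert(want > 0);

    /* An oversized request may be clamped silently instead of failing,
     * so the return value alone does not say what size we ended up with. */
    (void)setsockopt(sfd, SOL_SOCKET, SO_SNDBUF, &want, sizeof(want));

    /* Read the value back to observe the real effect. */
    if (getsockopt(sfd, SOL_SOCKET, SO_SNDBUF, &actual, &len) != 0)
        return -1;
    return actual;
}
```

The value read back need not equal the value requested, which is why the caller should act on the result of getsockopt rather than on the success of setsockopt.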

Summary: when we reuse a connection, we don't save the ip address. therefore, we write the ip address of a previous connection instead of the current ip address.
some tweaking to the stress test loader to find the shared library.
Reviewed By: ps
Test Plan: ran with asserts on, spawned multiple connections and the assert did not fire.
Revert: OK

Summary: once we drop root privileges, we cannot open the maps file.
Reviewed By: ps
Test Plan: started memcached with sudo and -u nobody. did a stats maps. flooded memcached with sets to force it to do a brk. did a stats maps. diffed the outputs to verify that rewinding and rereading the file works.
Revert: OK
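One way to handle the maps file problem above is to open the file once while still privileged, keep the descriptor, and rewind it for each read. The sketch below assumes that approach; `open_stats_file` and `reread_from_start` are hypothetical names, and the sketch does not retry on EINTR.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Call this before dropping root; keep the descriptor for later use. */
static int open_stats_file(const char *path) {
    return open(path, O_RDONLY);
}

/* Rewind the saved descriptor and read up to `cap` bytes from the start.
 * Returns the number of bytes read, or -1 on error. */
static ssize_t reread_from_start(int fd, char *buf, size_t cap) {
    size_t total = 0;

    if (lseek(fd, 0, SEEK_SET) == (off_t)-1)
        return -1;
    while (total < cap) {
        ssize_t n = read(fd, buf + total, cap - total);
        if (n < 0)
            return -1;
        if (n == 0)          /* end of file */
            break;
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```

Because the descriptor was obtained before the privilege drop, later reads do not need permission to open the path again.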

…don't fit in a small chunk
Summary: If we size small chunks too big, we're wasting space on small items. If we size small chunks too small, then keys that don't fit in a small chunk get autopromoted to a large chunk. That's pretty wasteful too. So this change allows us to split keys across multiple chunks.
minor tweaks:
- use memcmp instead of strncmp (so I can stop having to look up the definition of strncmp with respect to null termination).
- #defined the format string for stats_prefix_dump so we can get compiler warnings when the format makes no sense.
Reviewed By: ps
Test Plan: libmcc test/test.py passes.
flat storage unit tests pass.
stress test passes.
ran in production for a few days without any issues.
Revert: OK
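Once a key can span several chunks and is no longer null-terminated, comparisons have to be length-driven, which is where memcmp fits. A rough sketch of such a comparison follows; `struct chunk` and `key_matches_chunks` are hypothetical and do not reflect the flat allocator's real layout.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical view of one piece of a stored key. */
struct chunk {
    const char *data;
    size_t len;
};

/* Returns 1 if the contiguous probe key equals the key stored across
 * `chunks`, 0 otherwise. Neither side needs a terminating NUL. */
static int key_matches_chunks(const char *key, size_t nkey,
                              const struct chunk *chunks, size_t nchunks) {
    size_t off = 0;

    for (size_t i = 0; i < nchunks; i++) {
        size_t n = chunks[i].len;
        if (n > nkey - off)          /* stored key is longer than the probe */
            return 0;
        if (memcmp(key + off, chunks[i].data, n) != 0)
            return 0;
        off += n;
    }
    return off == nkey;              /* probe must not be longer either */
}
```

Unlike strncmp, memcmp does not stop at an embedded zero byte, so the result depends only on the lengths passed in.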

Summary: What's life without a memcached diff in diffcamp? :)
There are two key changes:
1) there is now only one LRU queue. There's no benefit to maintaining two queues, and a single queue makes for simpler code.
2) do_add_delta takes a key/nkey pair instead of an item pointer. this has two key advantages:
a) it avoids us having to look up the item again when we don't do an update-in-place, since there is no race condition between the item_get and the add_delta locks.
b) it is enormously helpful when we allow the splitting of keys across data chunks, as now we have a contiguous copy of the key.
Other changes include:
- warnings are now errors, except "deprecated" warnings (generated by OSX, ugh) which are ignored.
- assoc_delete no longer requires that the item we delete be the one expected, since this assumption can be wrong when there are race conditions.
- prefix stats now take a key/nkey pair instead of just key, since keys are no longer null-terminated.
- append_to_buffer didn't properly reserve space for the terminator.
Reviewed By: marc
Test Plan: stress test passes.
libmcc tests pass.
flat allocator unit tests pass.
ran in production.
Revert: OK

Summary: - create a pool of connection buffers. these are giant buffers that are just mmaped. if we don't use it all, then we never fault in the page.
- track the maximum usage of a buffer. the client code must report how much of the buffer it used, otherwise the module assumes the entire buffer was touched.
- if a buffer goes beyond a certain limit, we throw it away back to the OS. if a global limit is hit, we also start reclaiming free buffers.
- when getting a buffer, the module will always return the buffer that was used the most. this allows us to minimize the amount of memory touched.
Reviewed By: ps
Test Plan: - ran the stress test against memcached with alloc_conn_buffer randomly returning NULL. libmcc reported a lot of errors (not surprising) but memcached did not crash.
- ran with freelist_check on, which ensures that the connection buffer free list is sane.
- ran with conn_buffer_corruption_detection, which takes every buffer returned from memcached, and marks it unreadable/unwritable. if memcached subsequently accesses this memory, it will segfault. this ran fine until the OS refused to give us the same page back, but it was at least a few minutes.
- the test/conn_buffer_test/* code is a stub that i never finished since the two checks embedded in the code are pretty thorough. maybe one day. :)
- this has run in production.
Revert: OK
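The "hand out the most-used buffer" policy above can be sketched as below. This is an illustration under assumptions: `BUF_SIZE`, `POOL_MAX` and the `pool_*` functions are hypothetical, the pool is a fixed array with no locking, and the per-buffer and global reclaim limits are left out.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict compiler modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#define BUF_SIZE (1u << 20)   /* hypothetical buffer size */
#define POOL_MAX 16           /* hypothetical pool capacity */

struct conn_buf {
    void *mem;
    size_t max_used;   /* high-water mark reported by callers */
    int in_use;
};

static struct conn_buf pool[POOL_MAX];
static size_t pool_count;

/* Map a new buffer; pages are only faulted in when first touched. */
static struct conn_buf *pool_add(void) {
    if (pool_count == POOL_MAX)
        return NULL;
    void *p = mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    struct conn_buf *b = &pool[pool_count++];
    b->mem = p;
    b->max_used = 0;
    b->in_use = 0;
    return b;
}

/* Prefer the free buffer with the largest high-water mark, so that
 * already-touched pages get reused before fresh ones are faulted in. */
static struct conn_buf *pool_get(void) {
    struct conn_buf *best = NULL;
    for (size_t i = 0; i < pool_count; i++) {
        if (pool[i].in_use)
            continue;
        if (best == NULL || pool[i].max_used > best->max_used)
            best = &pool[i];
    }
    if (best != NULL)
        best->in_use = 1;
    return best;
}

/* Caller reports how many bytes it touched when returning the buffer. */
static void pool_put(struct conn_buf *b, size_t used) {
    assert(b->in_use && used <= BUF_SIZE);
    if (used > b->max_used)
        b->max_used = used;
    b->in_use = 0;
}
```

A caller that cannot say how much it used would pass `BUF_SIZE`, matching the rule that unreported usage is treated as the whole buffer.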

Summary: item_walk had some bugs for extracting the middle part of an item. we never do that (we get the entire item for arithmetic/send/receive, and we get the end for memset/stamping), so this probably shouldn't actually make a difference. still, it is good to fix.
also added some unit tests for item_walk (which is how the bugs were found).
This includes http://www.dev.facebook.com/intern/diffcamp/?tab=review&revisionID=13628
Reviewed By: marc
Test Plan: all flat allocator tests pass
Revert: OK

Summary: Abuse the last 2 bytes of the UDP header to encode the number of udp reply ports the client supports. If the client supports fewer ports than the server has configured, use the receive socket to transmit the reply.
While I'm here, make sure no dns requests are done in allocate_udp_reply_port.
Reviewed By: marc
Test Plan: Compiled libmcc without udp reply ports and responses were sent out on port 11300. Compiled libmcc with udp reply ports and saw responses come from configured reply ports. Set the # of reply ports to < # of threads in memcached, and the responses came from port 11300.
Revert: OK
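A sketch of the header trick above, assuming the header in question is the 8-byte memcached UDP frame header whose last two bytes are otherwise reserved, and assuming network byte order for the count (the commit does not say). All function names are hypothetical.

```c
#include <stdint.h>

#define UDP_HEADER_SIZE 8   /* request id, sequence, total datagrams, reserved */

/* Client side: store the supported reply-port count in bytes 6..7. */
static void set_reply_ports(uint8_t hdr[UDP_HEADER_SIZE], uint16_t nports) {
    hdr[6] = (uint8_t)(nports >> 8);
    hdr[7] = (uint8_t)(nports & 0xff);
}

/* Server side: read the count back; an old client leaves it at zero. */
static uint16_t get_reply_ports(const uint8_t hdr[UDP_HEADER_SIZE]) {
    return (uint16_t)((hdr[6] << 8) | hdr[7]);
}

/* Fall back to the receive socket when the client supports fewer
 * reply ports than the server has configured. */
static int use_receive_socket(uint16_t client_ports, uint16_t server_ports) {
    return client_ports < server_ports;
}
```

A client built without reply-port support sends zero in those bytes, so its replies keep coming from the receive socket, which matches the first case in the test plan.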

Summary: For UDP on Linux, the transmit path is heavily contended when multiple threads are transmitting on the same socket. Allocate a per-thread udp socket to send response packets on.
Feature is only used when configured with --enable-udp-reply-ports and the -x option given on the command line.
Reviewed By: ttung
Test Plan: modified libmcc to not ignore packets from different source ports. ran load-mcc against memcache server and pushed server to 200,000 pps where before the limit was ~65,000 pps
Revert: OK

…thmetic ops
Summary: other changes:
1) added overlapping keyspace region, where both set ops and arithmetic ops will take place. this will exercise the code path that requires arithmetic ops to zero out values that can't be converted to a numerical value.
2) on an arithmetic op miss, execute a set (mimicking the /tfb/www/trunk behavior).
3) fixed memory leak with arith ops.
Reviewed By: marc
Test Plan: stress test for 5 minutes
Revert: OK