Monday, July 7, 2014

Let's have a look at how our traffic is XKey-scored and whether it's done efficiently. The XKS source seems to be some kind of mangled C++, much like the many C/C++-based languages that exist for big/parallel data processing (CUDA or other parallelizing extensions).

Given that, DB is obviously some kind of nested std::map, or rather a type derived from one, as can be seen from the apply() member, which is not part of an STL map. It's probably not a multimap either, as indicated by the clear() and by the fact that [][] assignments are not possible with multimaps, since std::multimap has no operator[] [1]. These types (as well as a multimap) are sorted associative containers (dictionaries) whose lookup complexity is guaranteed to be O(log(N)) at worst [2], where N denotes the number of keys in the map. DB has at least 3 keys, as seen from the snippet, but chances are that the number is much larger. The larger it is, the greater the need to optimize map access.

I doubt that XKS has its own implementation of dictionaries with a better O() that is optimized in such a way that DB["tor_onion_survey"]["onion_count"] access could be O(1). After all (look at the boost include), it looks pretty much like STL C++ code. Given that, the following XKS code is rather inefficient inside a loop:

coder out there.

Edit: Meanwhile I found another reason to avoid operator[] for assignments in a row in one of Scott Meyers' excellent books on C++ effectiveness [3], which I strongly recommend to any XKS developer (there are also classes for it).

[1] The clear() is important for our later optimization, as insert() has the same semantics as operator[] assignment only if the key doesn't already exist; otherwise insert() finds the key but skips the assignment step that operator[] would perform.