12:33 pm - There's a common theme running through these, but I can't tell what it is

1. I'm not going to shut up about this for a while: if you enjoy logic puzzles, whether or not you think you're any good at them, register (register! register! register!) for the online qualifying test for the World Puzzle Championships, taking place at 1pm EDT on Saturday 18th June. If you're positively predisposed towards puzzles - if you're enjoying the Su Doku craze - then don't worry about the competition and you'll still enjoy the test.

Practice is going well. Yesterday I resat the 2000 qualifying test which I hadn't looked at for five years. I scored 120 and it would've been 145 but for a copying error. In 2000, I scored 55 on it, with the top British score being only 90 and the top US score being 220. Perhaps having seen the puzzles before helps a lot, but this is pretty good progress. British folk, 55 was enough to get me onto the British team in 2000; if you can score anything like 55 on the 2000 test - that's just four or five puzzles in 2½ hours - then you really are UK team calibre.

2. At least four people on my Friends list make, or have made, at least part of their living by teaching people how to do well on the SAT, GRE or similar college-entry tests. Now this shouldn't be surprising because y'all are damn smart, but would you like to get to know each other? Is there a community where you can hang out, share tips, find employment in the field and so on? (And why don't you use the WPC online qualifier as a test for logical thinking skills?)

3. Google are providing patronage to students who write software for the good of the world through their Summer of Code promotion - and, if you're so inclined, you can get paid to work on LiveJournal. I can understand people's concerns about Google's privacy policy, they've buggered up the interface for Google Groups and Froogle was a bit naff, but in my book Google do so much good for the world that I have lots of love and time for them. Plus they're sponsoring the online qualifying test and the US team for the World Puzzle Championships.

4. 284,376 LJ accounts, according to the stats, list that the poster is based in the state of Massachusetts (a state with lots of smart people and puzzle fans, who might enjoy the WPC qualifying test). The number of people in Massachusetts with LJ accounts will be lower than that, because people may well have more than one account, but it will be higher than you would expect based on that, because there will be people in MA with accounts who have not listed them as being in MA. Accordingly, let's guess at 200,000 and regard that guess as conservative.

The population of MA is something like 6.2 million. Accordingly at least 3% of people in MA have a LiveJournal, and it seems likely that at least, ooh, 6%-10% of people in MA know what LiveJournal is. These are tremendously high proportions - LJ is approaching being mainstream! (Based on this post to lj_research.)
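The back-of-envelope estimate above can be checked in a few lines. This is only a sketch: the numbers are the ones quoted in the post (including the admittedly rough 200,000 guess), and the variable names are purely illustrative.

```python
# Figures quoted in the post; the variable names are illustrative only.
listed_accounts = 284_376   # LJ accounts that list Massachusetts
guessed_people = 200_000    # the post's conservative guess at distinct people
ma_population = 6_200_000   # "something like 6.2 million"

# Share of the MA population on each assumption.
share_from_guess = guessed_people / ma_population
share_from_listing = listed_accounts / ma_population

print(f"{share_from_guess:.1%} of MA on the conservative guess")
print(f"{share_from_listing:.1%} if every listed account were a distinct person")
```

The conservative guess is what supports the "at least 3%" claim; the second line is an upper bound that ignores people holding several accounts.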

5. Talking of the stats, you might observe that there are 2.2 million LJ accounts registered male, 4.5 million registered female and 2.1 million registered unspecified. Total: 8.8 million. However, there are a total of 7.35 million LJ accounts! What's the discrepancy due to? A vexing puzzle of the sort that you won't find on any online qualifying tests for the World Puzzle Championships.

I asked support and got an answer back quite quickly. It's probably impolite to quote, so you'll have to trust that I'm not misrepresenting the position when I say I was told that the million-and-a-half account discrepancy can be attributed to accounts that have been deleted and possibly purged in the past. Perhaps we can use this figure to estimate some sort of LiveJournal churn percentage. (It is not clear whether those deleted accounts are included in the 284,376 figure quoted above or not.)
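For anyone who wants to see the arithmetic, here is a minimal sketch of the discrepancy and of one naive "churn" figure. The churn ratio is an assumption on my part: it only means something if the explanation from support is right and the 8.8 million really counts every account ever created, neither of which is established.

```python
# Rounded figures from the stats page, in millions of accounts.
male, female, unspecified = 2.2, 4.5, 2.1
total_accounts = 7.35

by_gender = male + female + unspecified    # sum of the gender breakdown
discrepancy = by_gender - total_accounts   # the "million-and-a-half"

# Hypothetical churn estimate: the share of all counted accounts that have
# since gone, ASSUMING the whole discrepancy is deleted or purged accounts.
naive_churn = discrepancy / by_gender

print(f"gender total: {by_gender:.2f}M, discrepancy: {discrepancy:.2f}M")
print(f"naive churn estimate: {naive_churn:.1%}")
```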

6. The BBC report research suggesting that the difficulty some women have in reaching orgasm may be genetic and hint that it's possible that there might be drug therapy some day which could help those who have found that even the most desired partner (if any) and the best technique are not sufficient. This is entirely cheering news and I hope that some appropriate drugs without hazardous side-effects can be discovered. One would expect that such a drug would be bigger news than Viagra.

However, it does illustrate a double standard in me and I'm worried about this. I am not embarrassed by adverts for Viagra, but should such a drug treatment eventually exist, I can't imagine that adverts for it wouldn't be horribly embarrassing. I don't think this is a double standard of mine along gender lines, it's more that the concept of "do something you used to be able to do" is less embarrassing than the concept of "do something you've never been able to do and you feel you're less of a person because of it" - a similar product for men who've never experienced orgasm would be just as embarrassing. I don't know why I feel this way; perhaps it's because it's closer to a purely hedonistic drug than we have legally yet reached. (ETA: I think I've worked this out. See comment.) Cough cough World Puzzle Championship qualifying test.

Current Mood: puzzling

Comments:

Actually, it's alleged that Viagra does have this effect on women, and there have been efforts to get this recognised and have the drug prescribed for same. (I'd google for articles if I wasn't at work.)

I was getting worried about how you'd tie #6 into the theme! I was reading an article on the same research in New Scientist, which reports that lesbian monkeys have lots of orgasms and that it's possible that the female orgasm is merely an accidental echo of the male one, the equivalent of male nipples.

it's possible that the female orgasm is merely an accidental echo of the male one, the equivalent of male nipples

I don't follow that. Men have nipples because the basic body plan includes them -- hormonal differences between the sexes make them develop into something useful in women but be suppressed in men. Is she saying that orgasms are part of the body plan but are entirely secondarily suppressed in the females of virtually every other species? It sounds kind of implausible.

Sounds very implausible to me on the basis that a decent (by which I mean "fun" not "acceptable to polite society") female orgasm presumably promotes reproduction and therefore increases the fitness of the species in an evolutionary context.

She makes the (I think valid) point that if that were so, then the present situation (as described in this report, ie hereditary factors cause 48% of women to never or rarely orgasm) could not have arisen -- because this trait would have been strongly selected against.

eh what? I don't see how your argument responds to mine, so maybe I've not stated it properly.

I thought you were saying above ("female orgasm presumably promotes reproduction") that you thought a woman's ability to have children was significantly influenced by whether or not she would achieve orgasm during conception.

This author is saying that if that were so, then a putative non-orgasming trait would have died out long ago -- because the women with it would have always been less likely to have children (and so pass it on) than those without it. She concludes that as the gene reportedly hasn't died out, therefore women's ability to orgasm can't make any difference to reproductive success.

I'm happy to believe that you see a flaw in this reasoning, but you'll have to spell it out for me in more steps...

1) All other things being equal, more sex means more sprogs which means the species is collectively 'fitter' in an evolutionary sense.

2) I conjecture that more sex is likely to occur when both partners enjoy it than if only one does.

3) Once conscious family planning becomes a factor, it's not clear to me that points 1 and 2 continue to apply. Therefore the trait ceases to have impact on selection and it's not clear that it would necessarily be selected out.

Hmm, I don't think that works, because if (1) and (2) are true, the trait must be eliminated long before (3) becomes a factor.

My rough calculation suggests that if (1) and (2) are true, then even if the trait is initially 99.99% prevalent and confers only a 1% reproductive disadvantage, it will be effectively eliminated in fewer than the first 1000 generations. Yet any sort of (3)-type effective family planning has only existed for a very recent fraction of human evolutionary history.
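The calculation itself isn't shown in the comment, so what follows is only one possible way to sketch it, not a reconstruction of the commenter's working. It assumes the simplest model available: a haploid trait whose carriers leave (1 - disadvantage) as many offspring as non-carriers, with no mutation, drift or dominance effects. The function name and parameters are hypothetical, and a different model (diploid, recessive and so on) would give different generation counts, so the output need not match the figure quoted above.

```python
def generations_until_below(initial_freq, disadvantage, threshold):
    """Count generations until a disadvantaged trait's frequency drops below threshold.

    Assumed model (not from the comment): haploid selection, where carriers
    reproduce at a relative rate of (1 - disadvantage) each generation.
    """
    if not (0 < initial_freq < 1 and 0 < disadvantage < 1 and 0 < threshold < 1):
        raise ValueError("all arguments must lie strictly between 0 and 1")
    freq = initial_freq
    generations = 0
    while freq >= threshold:
        carriers = freq * (1 - disadvantage)       # carriers after selection
        freq = carriers / (carriers + (1 - freq))  # renormalise the population
        generations += 1
    return generations


# The scenario in the comment: 99.99% prevalent, 1% reproductive disadvantage.
to_half = generations_until_below(0.9999, 0.01, 0.5)
to_one_percent = generations_until_below(0.9999, 0.01, 0.01)
print(to_half, to_one_percent)
```

Whatever the exact counts under a given model, the timescale is a matter of hundreds or thousands of generations, which is the scale the argument about family planning depends on.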

So my conclusion is that either (2) is incorrect, or some combination of (2) and the 'other things being equal' part of (1) is incorrect, eg. "reproductive" sex is distinct from "non-reproductive" sex in not being governed by (2).

This latter leads into my own favoured hypothesis, which is that female orgasm is a tool in mate selection -- its existence encourages women to select men who are likely to be generally attentive.

Of course, it's also possible that it serves no evolutionary purpose at all and is just a curiosity.

Alternatively, the measured heritable variation in ability to orgasm may be caused not by physical or direct psychological factors, but by heritability of selecting poor (inattentive) lovers.

My gosh, we were hopeless in 2000 weren't we? And yet we still won the Puzzle Ashes at a canter. Go us!

Google fight has this to say on the relative popularity of Sudoku and Viagra. We clearly have some work to do, but I reckon it's not going to be too hard for us to be up with Viagra in, umm, no time at all... *trails off*

We could probably do with a UK puzzle team web site - just who we are, how we fit in with everything else, who the teams were, how they did, how the team are selected, what prospective entrants can do to practice and what the latest news is.

Do you want to or should I, somewhere along the line? (Or maybe someone else? Alan, perhaps?)

Mmm - there are easy Su Doku and hard Su Doku. The easiest I've seen are the quick ones on the back of the Independent - they're easier than even the ones in the Sun. I lose patience before completing most newspapers' hard/difficult/fiendish/Friday ones.

The Maths Challenge is an excellent starting-point. The top 1000-ish people from that are drawn on to do the British Mathematical Olympiad, and there are second and third rounds from there leading to international competition. One four-time World Puzzle Champion took part in the international competition twice or three times, doing very well but not outstandingly well in it. We really ought to try to get the people who take part in the international competition for the UK involved in the World Puzzle Championships. I made some inroads into this, then... lost momentum.

I am not embarrassed by adverts for Viagra, but should such a drug treatment eventually exist, I can't imagine that adverts for it wouldn't be horribly embarrassing.

Right, I think I've worked out why "do what you used to be able to" is not embarrassing and "do this for the first time" would be likely to be embarrassing in, at least, a TV advert.

It's because there would be a need (or, at least, a perceived need) to explain why you should want to have an orgasm at all in the second case, even if it's only "find out what you're missing". Embarrassing when talking about orgasms, not embarrassing when talking about the ability to read, not only embarrassing but unthinkably annoying when talking about - say - even legal recreational herbs.

You might like to consider the adverts for Green & Black's chocolate - originally marketed as organic, with a fairtrade option, now repositioned as a luxury. A large poster on the side of a Tube tunnel shows a well-dressed woman looking out into the night from her house above the caption 'So that's what chocolate's supposed to taste like.'

Stats

Yes, it's probably impolite to quote, but I don't think it's impolite to link (especially when LiveJournal provides a special syntax for doing so).

Although the general ethos of LiveJournal support is "don't guess - only reply if you know the answer", I think you should be aware that the answer you got is a best guess (on the basis that this is usually the right answer for such numeric discrepancies) rather than an unequivocal answer. The stats system is something of a black art, and I suspect no one fully understands how it works except the person who wrote it and a couple of people who have read the source code. I notice, for example, that the raw data contains a "postsbyday" figure for certain selected dates between 1997-11-12 and 2003-03-11 and then stops. There's also a big section of "supportrank" results which I believe is equally out of date.

Anyway, I doubt that the figure of 8.8 million represents the number of accounts ever created. I noticed a new user being created the other day (4th June) and his user number is 7326971. This agrees very well with the 7361446 being quoted in stats.bml for the total number of accounts today, especially given that the raw data tells us that nine to ten thousand accounts are being created daily. So the 7.36 million figure almost certainly tells us the total number of accounts ever created, not the total number of accounts still in existence. Unfortunately this still leaves us with the mystery of where the 8.8 million comes from, which I think we will only solve by reading the source code.
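As a quick sanity check on "agrees very well", here is the arithmetic as a small sketch. The user numbers and the creation rate are the ones quoted above; the variable names are mine.

```python
# Numbers quoted in the comment; variable names are illustrative only.
userid_seen_4_june = 7_326_971
total_accounts_today = 7_361_446
new_per_day_low, new_per_day_high = 9_000, 10_000

gap = total_accounts_today - userid_seen_4_june

# Days needed to create that many accounts at the quoted daily rates.
days_at_fast_rate = gap / new_per_day_high
days_at_slow_rate = gap / new_per_day_low

print(f"{gap} accounts, i.e. {days_at_fast_rate:.1f} to {days_at_slow_rate:.1f} days of sign-ups")
```

A gap of three to four days' worth of new accounts is about what you'd expect between "the other day (4th June)" and "today", which is the sense in which the two numbers agree.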

Regarding the figures listed for each location, again I believe the numbers include deleted journals (or some other random error term). To pick a country at random, the raw data says that 953 users are from Barbados but the directory search results contain only 824 matches.

Computing the statistics (1/2)

Unfortunately the stats-related source code isn't that helpful, as it doesn't tell us what's in the database. Probably reading the source code for all the database-related stuff would help (though I know practically nothing about databases in general), but when it comes down to it only the on-site staff can tell us what's actually in there.

Anyway, stats.bml is the code which fishes the stats out of the database and formats them into a web page. Although the stats page claims "certain parts are live", those happen to be the sections (latest updates and newest users) which are currently disabled, so in fact all the data on that page is pulled from a database called "stats" which is generated nightly.

stats.pl is a program (using utility functions from statslib.pl) containing the statistics-related maintenance jobs to be run nightly (or, in at least one case, weekly — though that one hasn't been working since late 2003 and I don't know if it's that it doesn't work any more or just that someone forgot to install the weekly cron job). These fill the stats database with computations from the real database, and then print out the current contents of the stats database into stats.txt. I note that a configuration variable was added on 25 April 2005 to allow some statistics to be considered private and not dumped to the text file, which might explain how it is that stats on account types are no longer available (though it seems they were never on the main stats page). The precise value of this configuration variable doesn't seem to be visible from outside.

(where the results are also used to compute the "new accounts by day" and "users updated in last n days" statistics — note that a record is returned for each user even if they never updated their journal). Unfortunately this doesn't tell me whether deleted and purged users are held in the "userusage" database. However, they must be in at least one database because if you try to view their userinfo then LiveJournal tells you they have been deleted and purged. Clearly, users who have been suspended or deleted but not purged must still be in the database, though they have to be filtered out of the results of any directory search. Incidentally, it must certainly include communities, and I suppose it includes syndicated feeds too.

Renames are a slightly different matter. LiveJournal has to keep the old username around because it either pretends the old name has been deleted or forwards you to the new username (depending on what the user chose when they renamed). I can't currently find any "befores and afters", but it looks like you keep your old userid number when you rename, so I guess that your old name has to be assigned a new number. Unless, that is, you've renamed to a name that was deleted and purged. I'm not sure what happens in that case, but it would make sense for the purged entry to be removed entirely from the database (to be replaced by the user who renamed) when that happens. So, the total accounts statistic probably doesn't count the accounts which were deleted, purged, and then replaced by someone else — but this is pure speculation on my part.

The maximum value of userid is stored in the stats database and can be read from the text dump as "size accounts". The above total accounts number is stored as "userinfo total". In the current text dump, we have:

size accounts 7433711
userinfo total 7421711

which means that there are 12,000 completely vanished accounts (it surely must be a coincidence that this works out to such a round figure). It is left as an exercise for the reader to speculate whether this could be accounted for by purged-and-renamed journals.
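A minimal sketch of pulling those two numbers out of the text dump and taking the difference. The three-column "category key value" layout is inferred from the two lines quoted above, so treat the parser as an assumption about the format of stats.txt rather than a description of it; `parse_stats_dump` is a hypothetical helper.

```python
def parse_stats_dump(text):
    """Parse 'category key value' lines into a {(category, key): int} dict.

    The layout is inferred from the two quoted lines; lines that do not
    fit it are skipped rather than guessed at.
    """
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[2].isdigit():
            stats[(parts[0], parts[1])] = int(parts[2])
    return stats


dump = """size accounts 7433711
userinfo total 7421711"""

stats = parse_stats_dump(dump)
vanished = stats[("size", "accounts")] - stats[("userinfo", "total")]
print(vanished)
```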

Now the gender information is retrieved on a cluster-by-cluster basis from the "userproplite2" database. I've no idea how the clustering actually works or what this database is (or indeed the exact meaning of the SQL query). However, what happens is that the data for each cluster is saved in the partialstats database, and when this is complete the records from partialstats are summed and placed in stats. The code claims to count every possible value of gender except for '', and according to the text dump it comes up with four possible values: blank (with only one matching account), 'F', 'M' and 'U'.

I don't know whether it's relevant, but the clustered code asks for

c.clusterid IS NULL OR c.clusterid=?

(where "?" is the cluster under consideration). If there are any records with "c.clusterid IS NULL" then it looks like they'll be counted several times — once for each cluster.
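To illustrate the worry, here is a small self-contained sketch using an in-memory SQLite database. Only the WHERE clause comes from the LiveJournal code; the table name `demo_users`, its columns and the sample rows are invented for the demonstration, and the real query (which I haven't reproduced) may join other tables.

```python
import sqlite3
from collections import Counter


def gender_totals_by_cluster(rows, cluster_ids):
    """Run the per-cluster count once per cluster and sum the results."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema, made up for this demonstration.
    conn.execute("CREATE TABLE demo_users (userid INTEGER, gender TEXT, clusterid INTEGER)")
    conn.executemany("INSERT INTO demo_users VALUES (?, ?, ?)", rows)
    totals = Counter()
    for cluster_id in cluster_ids:
        cursor = conn.execute(
            "SELECT c.gender, COUNT(*) FROM demo_users AS c "
            "WHERE c.clusterid IS NULL OR c.clusterid=? "
            "GROUP BY c.gender",
            (cluster_id,),
        )
        for gender, count in cursor:
            totals[gender] += count
    conn.close()
    return totals


# Four users: three assigned to clusters 1 and 2, one with no cluster (NULL).
rows = [(1, "F", 1), (2, "M", 1), (3, "F", 2), (4, "U", None)]
totals = gender_totals_by_cluster(rows, cluster_ids=[1, 2])
print(dict(totals), "sum:", sum(totals.values()), "actual users:", len(rows))
```

The user with a NULL cluster matches the condition for every cluster, so with two clusters it is counted twice and the summed total exceeds the number of users.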

However, let me speculate on where the extra 1.5 million users (not counting those with blank genders) are coming from. The stats database is never cleared out (I'm assuming this because there are several statistics in the text dump which haven't changed for years and aren't mentioned in the program). Suppose, when they occasionally rename some or all of the clusters, they accidentally leave the stats for the old cluster names in the partialstats database. When the code comes to compute the sum of any clustered statistic, it will include all the out-of-date info for the clusters which no longer exist and thus produce an inflated figure. Of course, I have no idea whether the clusters are referred to by name in the database or by some other identifier which would render my theory invalid, and since I don't have access to the database there is no way to check whether I'm on the right track.
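The theory can be sketched as a toy simulation. Everything here is hypothetical: the dictionary stands in for the partialstats database, the cluster names and counts are invented, and the real code may well key its records differently, in which case none of this applies.

```python
def nightly_run(partialstats, live_clusters):
    """Store each live cluster's counts; nothing is ever deleted (the assumption)."""
    for cluster_name, counts in live_clusters.items():
        for stat_key, value in counts.items():
            partialstats[(cluster_name, stat_key)] = value


def summed_stat(partialstats, stat_key):
    """Sum one statistic over every cluster record present, stale or not."""
    return sum(v for (_, key), v in partialstats.items() if key == stat_key)


partialstats = {}

# Night 1: two clusters under their original (invented) names.
nightly_run(partialstats, {"cluster_a": {"gender F": 100}, "cluster_b": {"gender F": 50}})
before_rename = summed_stat(partialstats, "gender F")

# Night 2: same users, but the clusters have been renamed in the meantime.
nightly_run(partialstats, {"cluster_x": {"gender F": 100}, "cluster_y": {"gender F": 50}})
after_rename = summed_stat(partialstats, "gender F")

print(before_rename, after_rename)
```

In this toy version the stale records for the old names are still present on the second night, so the sum doubles even though no users were added.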

Adverts

I don't think I find the Viagra ads embarrassing, but I still wish they would go away. (At least they've ditched the Pelé ones for the moment — now those were irritating).

I can't imagine a female orgasm pill ad that wouldn't be cringeworthy in one way or another. However, it wouldn't be the first sex aid to be advertised on TV. It's not that long since the KY heat-sensation-gloop ad that invited viewers to send off for a free sample. I am also reminded of the [some brand — I forget now] condom advert which basically showed some woman having sex with the cameraman. I'm glad I don't watch TV with my parents that often, I can tell you.

However, one of the worst "medical" ads on at the moment has to be the one which starts "When you had diarrhoea this morning, you had a choice." Um, no, I think you've mistaken me for someone else there.