How might one do this? A simple method is to count the (English) gendered personal pronouns on the funny person's page, specifically instances of “he” and “she” (as whole words, not as parts of other words). Since an article first mentions the person by name and then refers back to him or her with the words she, her, he, and his, these counts can be used to determine the gender of the person in the article. Does this work? We picked a comedian at random, Bud Abbott, counted the words, and the results are:

6 matches for “ he “

0 matches for “ she “

22 matches for “ his “

1 match for “ her “

1 match for “ him “

0 matches for “ hers “

This result is very clearly in favor of the person in the article being male. There are various ways to deal with short articles that have only a few gendered personal pronouns. One could simply go by the majority of gendered pronouns and, if there is no majority, ignore the result. This might introduce some measurement error, but the effect is probably not large. Another idea is to simply ignore pages with fewer than 5 gendered personal pronouns. A third idea is to take just the first gendered pronoun; presumably, it will indicate the correct gender.
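The counting step and the three decision rules above can be sketched in Python roughly as follows. This is only a minimal sketch under stated assumptions: the function names, the `min_count` parameter, and the exact pronoun sets are ours, and the tokenizer matches whole words case-insensitively, so it will also catch “He” at the start of a sentence or “he,” before punctuation, which a plain search for “ he “ with surrounding spaces would miss.

```python
import re

# Assumed pronoun sets, taken from the six words counted in the example.
MALE = {"he", "his", "him"}
FEMALE = {"she", "her", "hers"}


def gendered_tokens(text):
    """Return the gendered pronouns in the text, in order of appearance.

    Splitting into alphabetic runs guarantees whole-word matches, so
    "the" or "here" are never counted as "he" or "her".
    """
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w in MALE or w in FEMALE]


def count_pronouns(text):
    """Count how often each of the six pronouns occurs."""
    tokens = gendered_tokens(text)
    return {p: tokens.count(p) for p in sorted(MALE | FEMALE)}


def by_majority(text, min_count=0):
    """Majority rule. Returns None on a tie, or when the page has fewer
    than min_count gendered pronouns (min_count=5 gives the threshold rule)."""
    tokens = gendered_tokens(text)
    if len(tokens) < min_count:
        return None
    male = sum(t in MALE for t in tokens)
    female = len(tokens) - male
    if male == female:
        return None
    return "male" if male > female else "female"


def by_first_pronoun(text):
    """Use only the first gendered pronoun; None if the page has none."""
    tokens = gendered_tokens(text)
    if not tokens:
        return None
    return "male" if tokens[0] in MALE else "female"
```

With counts like those reported above (29 male forms against 1 female form), the majority rule would label the page as male, with or without the threshold of 5.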

All of the above methods should be tried, so as to test which of them is best (by checking their inter-correlations), also taking into account their computational requirements.
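One simple way to compare the methods is to look at how often each pair of them gives the same answer on the same pages. The sketch below computes a plain agreement rate rather than a correlation coefficient, which is a substitution on our part; the function name and the input layout (one list of labels per method, with `None` for undecided pages) are assumptions.

```python
from itertools import combinations


def pairwise_agreement(labels_by_method):
    """Share of pages on which two methods agree, for every pair of methods.

    labels_by_method maps a method name to a list of labels, one per page,
    in the same page order for every method. Pages where either method
    returned None (undecided) are skipped for that pair.
    """
    result = {}
    for a, b in combinations(sorted(labels_by_method), 2):
        pairs = [
            (x, y)
            for x, y in zip(labels_by_method[a], labels_by_method[b])
            if x is not None and y is not None
        ]
        if pairs:
            result[(a, b)] = sum(x == y for x, y in pairs) / len(pairs)
        else:
            result[(a, b)] = None  # no page was decided by both methods
    return result
```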

We note that this requires the comedian to have a page.

Stats

The statistics for this are rather easy. We expect that a simple confidence interval for the gender proportion will be sufficient.
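As an illustration of what such an interval could look like, here is a sketch of the Wilson score interval for a proportion. The choice of the Wilson interval is our assumption (the plan only calls for "a simple confidence interval"), and the function name and the default `z=1.96` for roughly 95% coverage are likewise ours.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    successes: number of people classified as, say, female
    n:         total number of classified people
    z:         normal quantile (1.96 for roughly 95% coverage)
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half), min(1.0, centre + half)
```

For example, `wilson_interval(50, 100)` (hypothetical counts, not results) gives an interval of roughly 0.40 to 0.60.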

Controls

It might be possible to control for age in the lists of funny people. This would be done by also gathering the birth date or age of each funny person. Perhaps the gender ratio has changed over the years. Especially interesting will be the numbers for people born after the 1970s, because of the effect, if any, of second-wave feminism.
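Once birth years are available, a change over time could be inspected by grouping people by birth decade. The following is a minimal sketch; the function name, the grouping by decade, and the `(birth_year, gender)` input format are assumptions made for illustration.

```python
from collections import defaultdict


def female_share_by_decade(people):
    """Share of people labelled "female" per birth decade.

    people: iterable of (birth_year, gender) pairs, where gender is
    "male" or "female". Returns a dict ordered by decade.
    """
    totals = defaultdict(lambda: [0, 0])  # decade -> [female count, total count]
    for birth_year, gender in people:
        decade = (birth_year // 10) * 10
        totals[decade][1] += 1
        if gender == "female":
            totals[decade][0] += 1
    return {decade: f / n for decade, (f, n) in sorted(totals.items())}
```

The per-decade shares could then each be given their own confidence interval, since the counts for individual decades may be small.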