While statistics on the editor gender gap rely on self-reporting and are incomplete, the biography gender gap has rich statistics but is less studied. Furthermore, longitudinal (long-term, time-oriented) studies have rarely been undertaken to show the rate and direction of change in the biography gender gap.

Instead of observing the trend of editorship, we will observe the trend of gender in biography articles. We admit that editor gender and article gender may not be related, and that assuming they are is essentialist. Still, we claim biographies are worth investigating for their own sake. We have already prototyped research and found preliminary results that analyse the biography gender gap by date of birth, citizenship, and language. However, these data represent only a single point in time; it would be more useful to sample them many times and view the trends. Therefore we would automate the production and graphing of these statistics on a publicly viewable website with open-data downloads, and at the end of a year provide a final report on the observed trends.

A view of the female ratio of biographies by date of birth, and by citizenship aggregated by world culture. How will these trendlines, especially recent years, evolve over the next years as editing demographics change?

The ultimate goal of the project is to raise awareness of the gender gap using statistical and quantitative means. The purpose of doing so is to frame the gender gap in a way that makes its thesis accessible to a demographic that tends to be more convinced by quantitative methods. We hope to have a larger group of people talking about the gender gap as a serious issue than currently do so. We also hope that researchers will download and re-use this dataset, bringing even more insight into Wikipedia's gender gap. As a result, we hope more conversations will shift, as they have been doing, from whether the gap exists or is important to what should be done in light of it.

Another, more specific goal of our project is to see whether the different solutions to the gender gap that are currently being enacted are having an effect. Two large caveats come with this goal. The first is that we cannot run our data collection as a controlled experiment on any specific remedy to the gender gap problem. Rather, we can only see the aggregate effect of all gender-gap projects at once, and without any control data as to what would happen if no efforts were being made. The second caveat is that it is essentialist to argue that having more women-identified editors active on Wikipedia would necessarily mean that the representation of biographies about women should increase. Still, despite these two caveats, we think that more data gives a better view of what the current trends are on different Wikipedias.

I think this could be useful but I suggest changing the name to "Wikipedia Biography Gender Index Tools". Jane023 (talk) 21:01, 31 March 2015 (UTC) My reason is because only biographies are in this proposal, not editor gender (gendergap), or gender-specification (close to impossible) of Wikipedia topics (fashion objects: "Handbag" vs "Briefcase", professions: "Nursing" vs "Road construction", home furniture: "Vanity table" vs "Workbench").

Gender by date of birth and date of death in two different time frames. How will these proportions change?

An example of data re-use: A heat map of the "celebrity" ratio of biographies by gender, language, and decade.

Our project is a longitudinal study on measures of the biography gender gap. To accomplish this we will create a weekly updated dataset and webpage, akin to stats.wikimedia.org and datavis.wmflabs.org, with views and highlights on how the composition of articles that have a gender recorded in Wikidata is changing. We will sort the collected data by several other variables:

Date of Birth/Death

Citizenship/Place of Birth

Ethnicity

Occupation/Profession

Inclusion in Wikipedia Language

Creation date in each Wikipedia (experimental).
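As a minimal sketch of what one of these breakdowns could look like, the following computes the share of biographies recorded as female within each value of a single variable. The record layout, the field names (`gender`, `birth_decade`, `citizenship`), and the sample rows are hypothetical and only for illustration; the real dataset may be structured differently.

```python
from collections import defaultdict

def female_ratio_by(records, key):
    """Return {group: share of records whose gender is 'female'} for one variable."""
    totals = defaultdict(int)
    females = defaultdict(int)
    for rec in records:
        group = rec.get(key)
        if group is None:
            continue  # skip records that lack this variable
        totals[group] += 1
        if rec["gender"] == "female":
            females[group] += 1
    return {group: females[group] / totals[group] for group in totals}

# Hypothetical sample records, not real data.
sample = [
    {"gender": "female", "birth_decade": 1950, "citizenship": "FR"},
    {"gender": "male", "birth_decade": 1950, "citizenship": "FR"},
    {"gender": "male", "birth_decade": 1950, "citizenship": "US"},
    {"gender": "female", "birth_decade": 1980, "citizenship": "US"},
]
ratios = female_ratio_by(sample, "birth_decade")
```

The same function could be called with `"citizenship"` or any other field to produce the other breakdowns.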

We will consult the community on the types of graph they think are most valuable, and suggest the following two as a starting point. One view will highlight the current state of the distribution of gender, along with the direction and magnitude of change in each of the above variables. A second view will show "hot" variables, those which have had the most movement in the past week.
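One possible way to pick out "hot" groups is to compare two weekly snapshots and rank groups by the absolute change in their female ratio. This is only a sketch under that assumption; the function name, the snapshot format, and the numbers below are hypothetical.

```python
def hottest_groups(previous, current, top=3):
    """Rank groups by absolute change in female ratio between two snapshots."""
    changes = {
        group: current[group] - previous[group]
        for group in current
        if group in previous  # only groups present in both snapshots
    }
    ranked = sorted(changes.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:top]

# Hypothetical ratios for two consecutive weeks, not real data.
last_week = {"1950s": 0.150, "1980s": 0.250, "1990s": 0.300}
this_week = {"1950s": 0.151, "1980s": 0.248, "1990s": 0.310}
hot = hottest_groups(last_week, this_week, top=2)
```

Ranking by absolute change surfaces both increases and decreases; the sign of each change is kept so a view can still show the direction.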

After one year we will produce a report based on statistical tests of whether these measures have evolved significantly.
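The proposal does not name a specific test. As one illustrative possibility, a two-proportion z-test could compare the female share at the start and end of the year. This sketch treats the two snapshots as independent samples, which is a simplification because most biographies appear in both; the function name and the counts are hypothetical.

```python
import math

def two_proportion_z_test(female_a, total_a, female_b, total_b):
    """Return (z, two-sided p-value) for the difference between two proportions."""
    p_a = female_a / total_a
    p_b = female_b / total_b
    # Pooled proportion under the null hypothesis of no change.
    pooled = (female_a + female_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 150 of 1000 at the start, 200 of 1000 at the end.
z, p_value = two_proportion_z_test(150, 1000, 200, 1000)
```

A test designed for repeated measurements of the same population may be more appropriate for the final report.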

The budget is split into three buckets to be distributed among the teams. Each team will take responsibility for splitting the work hours among its members, with a 'bottomliner' for each position who assumes responsibility for the team's completion of its work.

Our target audience is a mix of editors, academics, and journalists: anyone who is interested in the various attempts to quantify the gender gap.

In terms of editors, we seek to inform all Wikiprojects which focus on biographies, regardless of whether they are gender-focused Wikiprojects or not.

In terms of academics, there has been previous research on the gender gap in Wikipedia; see for example (Lam 2010, Reagle 2010, Eom 2014, Wagner 2015). Notifying that research community that new data are available may stimulate more research on the problem.

In terms of journalists, much has been made of the Gamergate controversy and, perhaps unfairly, the editorship of Wikipedia. With this project, a message can be sent to the media about Wikipedia's self-awareness of the gender gap, in the easy-to-understand statistical terms that are usually persuasive for journalists.

As the saying goes, "what gets measured gets fixed." As we will be providing metrics on the biography gender gap, we believe they will be an impetus to expand biographical coverage and improve quality. This will be accomplished by nudging current editors, or by involving new editors.

Because the project and its statistics are proposed to be automated, the website and dataset will continue to be generated after the grant ends. Of course, some small amount of maintenance is required to keep an "automated" process running, yet the website and machinery will be hosted on Tools-Labs, and will thus be open-source and more easily supported by the community.

Another way in which we expect the project could grow after the grant is by including more measurements. For instance, Grants:IdeaLab/Examination_of_gender_in_biographies is already a similarly derived measure that could easily fall under the umbrella of WIGI, and be generated and displayed automatically alongside it.