They want help designing a crowdsourcing data analysis project

My collaborators and I are doing research on the variability of data analysis ("the garden of forking paths"). Our goal is to understand why scientists make different decisions in their analyses and, in doing so, reach different results.

In a project called “Crowdsourcing data analysis: Gender, status, and science”, we have recruited a large group of independent analysts to test the same hypotheses on the same dataset using a platform we developed.

The platform is essentially RStudio running online, with a few additions:

· We record all executed commands, even those that do not appear in the final code

· We ask analysts to explain these commands by creating semantic blocks that describe the rationale behind them and the alternatives considered

· We allow analysts to build a graphical workflow of their analysis by arranging and restructuring these blocks
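To make the kind of record described above concrete, here is a minimal sketch of what one annotated unit of an analyst's workflow might look like. All field names, the class name `SemanticBlock`, and the example content are hypothetical illustrations, not the platform's actual data model.

```python
from dataclasses import dataclass, field

@dataclass
class SemanticBlock:
    """Hypothetical record: one annotated unit of an analyst's workflow."""
    block_id: int
    commands: list                  # executed R commands grouped into this block
    rationale: str                  # analyst's explanation of why
    alternatives: list = field(default_factory=list)  # options considered but rejected
    in_final_code: bool = True      # executed commands are kept even if dropped later

# Illustrative block documenting a sample-exclusion decision
block = SemanticBlock(
    block_id=1,
    commands=["df <- subset(df, years_experience >= 1)"],
    rationale="Restrict to respondents with at least one year of experience",
    alternatives=["no experience filter", "years_experience >= 5"],
)
print(block.rationale)
```

A structure like this would let the graphical workflow be rebuilt from the blocks themselves, since each block carries both its commands and the reasoning attached to them.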

Of course this experiment does not cover all sources of variability (e.g., R users might differ from Python users), but we believe it is a step toward better understanding how defensible yet subjective analytic choices may shape research results. The experiment is still running, but we expect to receive about 40-60 submissions of code, logs, comments, and explanations of the decisions made. We are also collecting information about the analysts, such as their background, the methods they usually use, and the way they operationalized the hypotheses.

Our current plan is to analyze the data from this crowdsourced project using inductive coding, splitting participants into groups that reached similar results (effect size and direction). We then plan to identify factors that explain the various decisions, as well as the similarities between participants.
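A rough sketch of the grouping step described above: bucket submissions by the direction of the reported effect, treating very small effects as near-null. The field names, the threshold, and the sample data are all made up for illustration; the project's actual grouping criteria may differ.

```python
from collections import defaultdict

# Toy submissions; effect sizes are invented for illustration only
submissions = [
    {"analyst": "A01", "effect_size": 0.32},
    {"analyst": "A02", "effect_size": -0.05},
    {"analyst": "A03", "effect_size": 0.28},
    {"analyst": "A04", "effect_size": 0.02},
]

def result_group(effect, null_band=0.1):
    """Bucket an effect size by direction; |effect| < null_band counts as near-null."""
    if abs(effect) < null_band:
        return "near-null"
    return "positive" if effect > 0 else "negative"

groups = defaultdict(list)
for s in submissions:
    groups[result_group(s["effect_size"])].append(s["analyst"])

print(dict(groups))
# → {'positive': ['A01', 'A03'], 'near-null': ['A02', 'A04']}
```

In practice the groups would then be compared qualitatively (via the inductive coding) to see which recorded decisions and analyst characteristics distinguish them.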

We would love to receive feedback and suggestions from readers of your blog regarding our planned approach to accounting for the variability in results across analysts.