We All Count (https://weallcount.com)
A project for equity in data science.

Framing Research Questions that Reflect Who is Expected to Change
Fri, 22 May 2020
https://weallcount.com/2020/05/22/framing-research-questions-that-reflect-who-is-expected-to-change/

We were involved in a project trying to understand the impact of a new school board program. The program was intended to reduce LGBTQ+ bullying and improve the inclusive climate within the schools. The boards wanted to know how much improvement we saw in LGBTQ+ students feeling comfortable at their schools. This is a great outcome to study, but they initially crafted their research question like this: “Do LGBTQ+ students now feel better about school?”

At first glance, this is a fine research question. It aims to be equitable and inclusive, but it’s actually problematic. It frames the research around changing the students’ feelings rather than around changing the school environment. Feeling negatively about an environment that is hostile towards you is a healthy, appropriate response. The boards were trying to directly change the school environment, with the outcome of increasing student comfort. That was always what they meant; it just wasn’t well reflected in the research framing. We needed to craft a question that placed the responsibility for the ‘success’ or ‘failure’ of the program in the most equitable place. In this project we went with: “Has the school environment become less hostile to LGBTQ+ students?”

At We All Count, we like to think of the different variables involved in a project as puzzle pieces that can fit together well, poorly, or not at all. When we design research questions we try to decide which piece to rotate, shift or swap out to see improvement or ‘positive change’. Choosing which piece to study is easy when you ask ‘What’s the most equitable piece to adjust?’.

Consider the research question from another project: “What factors and trends are causing the vulnerable Indigenous children in Australia to have poor health outcomes relating to burns?” Is this an equitable research question or worldview? The expectation that Indigenous populations change to adjust to ‘the system’, rather than the other way around, reflects a common colonial worldview. It reinforces an underlying assumption that non-Indigenous people ‘figured out’ how to use a system, rather than the reality in which that system was constructed specifically to work for them. Acknowledging this allowed the question to be reframed to examine how the healthcare system better supported white children with burns, and how that effectiveness could be extended and altered to serve a more inclusive group. The researchers in this case changed the research question to: “How can an understanding of the ways in which healthcare systems produce advantage and positive health outcomes for white Australians help improve Indigenous healthcare?”

Lastly, examine the research question: “How can we keep Hispanic boys from being expelled from our schools at a higher rate than non-Hispanic boys?” This question arose out of another school district project aimed at reducing the rate of expulsion of a specific group of young men. At the outset of the project, Hispanic boys and their parents’ community were understandably resistant to the study. The question made it seem like there was something wrong with the Hispanic boys.

When the questions became “What processes in our school are most strongly associated with pushing out Hispanic boys?” and “What school characteristics are most strongly associated with creating environments that encourage Hispanic boys to fulfill pre-existing desires to remain in school?”, the project suddenly saw broad support, was able to change methodologies, and got more participatory engagement. Changing research questions isn’t just about the analysis stage; it fundamentally affects your entire project. In this example, it was always the school’s intention to change, rather than to dismiss the attitudes, habits, and goals of any of its students; it just wasn’t coming through in the research question. Make sure that your research questions are embedded with the equity that you are trying to foster!

In Conversation with Catherine Harnois
Mon, 02 Mar 2020
https://weallcount.com/2020/03/02/in-conversation-with-catherine-harnois/

I had the chance to talk with Catherine E. Harnois about her ground-breaking book, "Feminist Measures in Survey Research". She shared what she's been thinking since writing the book, along with some tips and tools to help embed equity in survey-based data products.

When I was a graduate student, I was studying mathematical statistics and was interested in ways to use my work as a feminist. I took several women’s studies research methods classes and asked a variety of professors how to use quantitative statistics from a feminist perspective. Without exception, I was told it wasn’t possible: that “regression was deductive,” or that “quantitative statistics turns everyone into one patriarchal average,” etc. I knew this wasn’t all there was, and I kept looking. Then I found Dr. Harnois’ book “Feminist Measures in Survey Research” and it felt like an oasis in the desert. I was so happy to find someone interested in some of the same questions. I was thrilled to find someone who knew some answers. Over the years, I’ve purchased many copies of this book and referred many of my students and colleagues to it. Now we’ve made it the first choice of our We All Count Book Club, and Catherine was kind enough to do a Q&A with us.

Heather: We really like your examples of multiplicative models as a way to do intersectional analysis. This works well for the large national surveys in your example. What about with smaller surveys? We run out of statistical power fast. Suggestions?

Catherine: I really like this question, and all the others too! With this one, I think it is useful to think about the relationship between any particular study (small or large scale), and all of the other existing research and research possibilities in the future.

The way I think about this is that a smaller survey project is often able to involve more nuanced measures – measures meant to really get at a particular concept or experience or behavior among a particular group or groups. An example I really like is Evelyn Simien’s National Black Feminist Study (N=500, which is still on the large side, but bear with me). She creates measures of black feminism and womanism and includes lots of attitudinal questions that attend to race and gender issues at the same time. So, extremely helpful for understanding black feminism and the intersection of race and gender more broadly. But the study includes only African American respondents, so we’re left wondering how these items, and the relationships among them, might differ, or not, from those of other social groups.

So, applying this to small samples, I would say that one especially valuable aspect of projects with small samples is that the researchers can be really careful with their measures, develop contextually specific / relevant questions to assess the dimensions of life that they see as relevant, and then test their specific hypotheses or conduct their exploratory analyses. But then the question is, what makes this isolated, contextually specific analysis intersectional?

And there I would say that such studies can be intersectional in many ways. First, when developing the research question, hypotheses, measures, sample etc., the intersectional researcher will have considered the ways in which inequalities come together in this particular context. That intersectional theoretical framework informs the research design. Then, once analyzed, that same intersectional framework informs the interpretation of the specific findings and the way they are communicated. And, last but not least, if we consider this particular small-scale study in relation to other existing work in other contexts, we can see which patterns are similar, which are different, and again use an intersectional framework to contextualize the researcher’s present findings. The results from this single study can also provide a starting point for revisiting other existing studies that may have inadvertently put forth general claims that in reality are specific only to a particular social group or context.

It is my view that a small-scale survey, administered to a small and even relatively homogenous group can be intersectional, as long as it is designed and interpreted through an intersectional framework. Curious to know if others thinking about these things agree!

Heather: How should we deal with dynamic sociodemographics such as gender and sexual orientation?

Catherine: From a sociological point of view, almost all sociodemographics are potentially dynamic, and also contextually specific. Think age, education, certainly income and work force participation, parental status etc.

That said, I think there are several different ways of acknowledging and/or modeling the dynamism of gender and sexuality. A few different literatures are relevant.

First is research on race and ethnicity. As you likely know, there’s an ongoing, complex global argument about whether surveys should even ask individuals about their race or ethnic statuses. One position is that doing so essentializes and reproduces socially constructed categories (this is the position taken by France and Australia, among other states, organizations and individuals, that tend not to ask questions re. racial/ethnic identity in their surveys). The contrasting position (articulated by the American Sociological Association, among other organizations, states, and people) is that if we don’t ask people about their racial/ethnic identities, then it’s very difficult to track inequalities.

With respect to the dynamism of gender identities, Aliya Saperstein (Stanford) and Laurel Westbrook (Grand Valley State) have been doing very interesting work on this issue. They’ve been investigating various approaches for measuring not only gender identity, but also gender performances. See Hart et al (2019 in Journal of Health and Social Behavior) for a good example. There is also quite a bit of older quantitative work that documents variation in gendered behaviors, attitudes and ideology too. Sandra Bem’s work is one example. Elizabeth Cole’s (Michigan) work also is excellent for intersectional (gendered*racial) identities and attitudes.

With respect to sexual orientation in surveys, some foundational work here has been done by public health scholars (and of course earlier on by Kinsey), who saw early on that it was important to draw a distinction between sexual identities and sexual practices. This is because identities might be more closely associated with some aspects of social life than behaviors are (identities might predict activist behavior and involvement with social movements or community organizations), while sexual behaviors might sometimes be better predictors of “high risk behaviors” or particular health outcomes. This is especially true when we’re talking about stigmatized or marginalized identities.

A cool theme throughout all of this research (race/ethnicity, gender, sexuality) is thinking about the circumstances in which these identities are likely to change, and the implications of that change for behaviors, health, or whatever dependent variable one might be interested in. Longitudinal research here can be especially revelatory! There’s a well-documented phenomenon in racial/ethnic identities in Brazil showing that as people move up the class ladder, they’re more likely to identify as white. There’s a tiny bit of research (Penner & Saperstein) that shows this may happen in the US too.

Getting back to the bones of the question, I would just say that the way to deal with dynamic identities and statuses is to (1) acknowledge that they are dynamic (2) measure them in a way that makes sense for the research question (3) be clear about why the measurement is appropriate and (4) be clear about what you are implying by using this measure and what you are not implying.

I would like to add that sexual identities seem to be changing and expanding particularly rapidly, so it’s extra hard for large-scale surveys to keep pace and to assess the meanings and outcomes associated with very particular identities. And in large-scale surveys, often only a few people claim a particular identity. For example, the American National Election Study now includes a third gender option, but only a handful of respondents choose this option. I think this is another place where smaller survey studies can be especially valuable.

Heather: How do we include sociodemographic identity in a model as a social institution, not a personal characteristic?

Catherine: So that I don’t totally confuse everyone, let me start by making a conceptual distinction between social statuses and identities. I’ll use the term “social status” to refer to a socially recognized category and its associated meanings / norms in a particular social context. Here I’m drawing from Judith Lorber’s Paradoxes of Gender. She defines gender statuses as “the socially recognized genders in a society and the norms and expectations for their enactment behaviorally, gesturally, linguistically, emotionally, and physically.” (1994, 30). She goes on to define “gender identity” as an “individual’s sense of gendered self.” She sees gender statuses as being part of the social institution of gender, and identities as part of gender at the level of the individual.

I think one good thing to remember is that all levels of social life are interconnected and mutually influential. (See Risman’s 2017 article for a very nice discussion of this!) So, in my mind, it’s not really an either/or situation. When we use a categorical variable like gender status or sex in a statistical model, we are never just modeling a personal characteristic, but the combination of social institutions, cultural practices, and institutional arrangements that have produced particular gender categories, along with the meanings, expectations, rewards and opportunities attached to them. It’s never just the individual.

Now let me loop in research from sociological social psychology. This literature emphasizes that when the focus is individual identities, as opposed to social statuses, these identities have to be understood in relation to macro-level aspects of society. Identities are socially created, historically and contextually specific. When we think of ourselves in terms of this category or that category, this term or that term, it is at least partly because of the socially created meanings associated with those categories and terms. Even if we disavow a particular identity, the identity that we are distancing ourselves from, and the meanings and hierarchical position of that identity, are culturally rooted.

An example of using a status-based variable but linking it to social institutions and interpersonal interactions: João Luiz Bastos (Federal University of Santa Catarina, Brazil) and I wrote a paper about workplace gender discrimination and sexual harassment, and its relation to gendered health disparities. We used data from the General Social Survey (GSS; 2002-2012) and a really simplistic measure of gender statuses (the variable SEX), along with variables assessing perceived gender discrimination and sexual harassment at work. A few interesting points: First, I believe the SEX variable isn’t really even a survey “question” – the interviewers were told to “code respondents’ sex”. Second, it seems likely to me that interviewers were recording the respondents’ gender status, not their sex. So, some serious limitations! But obviously the GSS is a really valuable survey for analyzing all sorts of things. So, we used this variable to look at gender gaps in health, and argued that:

“the present study illustrates one way in which quantitative analyses can clarify the processes through which gender inequality is embodied at the level of the individual. Rather than taking “a dichotomous classification of bodies as a complete definition of gender” (Connell 2012:1675), here we have conceptualized gender categories as socially constructed statuses, and we have interpreted them within the context of gendered processes (e.g., discrimination and harassment) and within gendered social institutions (e.g., the workplace). We contend, then, that the “gender effect,” to which we referred above, materializes in the form of a statistically significant regression coefficient due to capturing these and other complex social processes occurring at multiple societal levels. Were it not for the social construction of gender, based on social, cultural, and biological meanings of what makes gender, the aforementioned “gender effect” on health-related outcomes would likely not be consistent or even detectable through our analyses.”

Sandya Hewamanne (Essex, UK) and I have a forthcoming article in the journal Gender Issues called “Categorical Variables Without Categorical Thinking? A Relational Reading of the Sri Lankan Demographic and Health Survey” (DOI: 10.1007/s12147-020-09252-5), where we tackle this issue directly. One of our big points is that you can interpret survey data in ways that recognize that social statuses like gender are socially created and relational. Another theme is seeing survey research, especially global survey research, in the context of historic global inequalities and colonial legacies.

All that said, if we are really interested in complex identities – and here I don’t mean only the gender statuses that we identify with, but the more psychologically complex ways in which we might identify as more or less masculine and/or feminine, or varying combinations of these things, or not at all, or how these identifications might change in various situations – then we could go back to Bem, and also to Saperstein and Westbrook, to think about how to measure what we might call a “gendered sense of self” (e.g., masculine / feminine / androgynous / non-binary) as opposed to the social identities of gender (e.g., man, trans, genderqueer, woman).

Heather: Measures of discrimination: single item or multi-item? How do we deal with the self-rating issue? Abused kids don’t usually know that they are abused kids.

Catherine: It depends on what we’re trying to measure, and how much space we have in the survey. If we are trying to assess whether or not someone feels that they have experienced “discrimination” AND thinks about it in those terms, then maybe a single-item question is appropriate. But if we are trying to assess whether or not people have ACTUALLY experienced discrimination, regardless of whether they think about it in those terms, then we are probably in the land of multi-item. Not only because we need to ask about several types/forms of “mistreatment” (Unfairly fired? Unfairly denied a promotion? Unfairly not hired? etc.), but also because we probably need to ask about why it occurred. But even with these latter questions, which we could describe as asking about “perceived mistreatment,” we are still in the realm of perception.

What you say about abused kids is right, but it’s true for adults too. Women are often paid less because of their gender, but in many cases do not know it because our culture discourages talking about salaries and wages (and I believe some workplace policies explicitly forbid it). When people aren’t hired for a job – which obviously happens all the time – they almost never know why they weren’t hired. Self-reports are limited. Research about how question wording affects reports of sexual assault is really illustrative here.

Big picture, I would just reiterate that no one study, or even a single methodological approach can do it all. Methodological pluralism is really needed for understanding / analyzing these complex social issues.

Also, I would like to make a plug for research on the cognitive aspects of survey methodology (CASM), for example this book, which is all about how to ask questions that solicit the information that we are intending to ask about, how to identify when things have gone wrong, and how things like response options, question order and interview dynamics influence survey results. One important take-away is that things often go wrong, even when asking questions about relatively straightforward issues!

Heather: Do you have a successful way of communicating these complex results of your intersectional models? Our audiences – who range from policymakers to school district executives, to foundation directors – don’t understand the results from the models a lot of the time.

Catherine: I think this may be the hardest question for me to answer. In terms of writing, what I do is write for someone I care about, but who has no background in statistics or even the social sciences. For me it’s usually my mom, who is very smart and insightful and cares about social justice issues, but who is not a sociologist. So, I try to make my writing clear (reducing jargon and passive voice, grounding abstract claims with concrete examples) so that she and others will have a better reading experience.

I think images help a lot. Graphs, as well as images of arguments / concepts / processes. I believe that interactive visualizations may be increasingly central here, particularly for intersectional models, because users can see for themselves how things change when one or more characteristics change. Alternatively, or additionally, make visual representations of two or three cases based on, for example, predicted probabilities or something similar. If we’re thinking about sharing findings with policy makers and foundation directors, we’re probably talking about smart people without a lot of time. So, my recommendation would be to visualize the key findings, which could easily highlight differences across groups, and which could make a strong impact in a short amount of time. Then have the details on hand if people want more information.

I think when writing about issues of gender and other social inequalities it is also very important to spell out the processes and institutional arrangements that lead to differences, rather than just reporting differences. This idea is emphasized in Maxine Baca Zinn and Bonnie Thornton Dill’s (1996) piece, “Theorizing difference from multiracial feminism” (Feminist Studies), which is among my favorite intersectional articles of all times. When we report differences, or really any findings, it’s important to provide readers with the context, conceptual tools, and theoretical arguments to make sense of them. Without these, it’s easy to walk away with static, essentialistic, and/or individualistic ideas of difference and inequality.

Another thing which is probably obvious, but I’ll say anyway, is that it is important to know up front who it is that you are writing for. And if you are seeking to share your results with multiple audiences, you should probably be prepared to communicate in multiple ways. This can be time consuming.

Heather: Models and Measures. Can you say a bit more about models? Does that mean to include measures of gender? On page 88 you talk about gender as a potential mediator – however, I don’t think you do a mediation analysis?

Catherine: You are correct! In the book there is no mediation analysis, although there are some formal mediation analyses in some of my more recent articles. Until fairly recently, I was using the word “mediate” in a rather loose way, in the context of interaction terms (as were many of my sociological colleagues). I hope this didn’t cause too much confusion!

Let’s say you are surveying 100 people out of 10,000. You want to analyse the data from your sample of 100 to get answers about the likely behaviours and preferences of the overall 10,000 person population.

Part of your project focuses on equity among sexual orientations. You don’t want to leave anyone out, and you know that having a question about sexual orientation where people select ‘heterosexual or homosexual’ isn’t inclusive enough. You consult experts and the local community and decide to include ‘Heterosexual, Gay, Lesbian, Bisexual, Pansexual, or Asexual’ as options in that question.

Once your responses have come in, you have data from respondents across each of those categories; however, only a few respondents identified as bisexual, and only one person each identified as pansexual and asexual. When trying to analyse the data to represent the responses of all these orientations, you realize that you have so little data from some categories that you can’t say anything statistically relevant about them. You can’t extrapolate the preferences and likely opinions of all asexually identifying people in your population of 10,000 from one person’s data.
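To make “can’t say anything statistically relevant” concrete, here is a minimal sketch (with hypothetical counts, not from any real survey) showing how a confidence interval for a category with a single respondent spans most of the possible range, while a well-populated category gives a usable estimate:

```python
import math

def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a proportion at roughly 95% confidence."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    halfwidth = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (max(0.0, center - halfwidth), min(1.0, center + halfwidth))

# Hypothetical counts: 88 of 160 heterosexual respondents answered "yes",
# and the single asexual respondent also answered "yes".
lo, hi = wilson_ci(88, 160)
print(f"Heterosexual (n=160): {lo:.2f}-{hi:.2f}")  # a fairly tight interval
lo, hi = wilson_ci(1, 1)
print(f"Asexual (n=1):        {lo:.2f}-{hi:.2f}")  # spans most of the 0-1 range
```

The single-respondent interval is so wide that it tells you almost nothing about the 10,000-person population, which is exactly the problem the next paragraphs grapple with.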

Rather than completely discount the categories in which you have very few responses, you decide it’s better to combine them into an amalgamated category, so that they can be better represented. When you publish your findings, you frame your results as Heterosexual, Homosexual, and Other – the very thing you were trying to avoid. People are mad and hurt that they aren’t well represented and feel lumped into an ‘other’ category. Respondents who took your survey feel cheated by being asked detailed questions whose answers you just combined anyway.

This kind of ‘collapsing’ or ‘amalgamating’ of data categories happens all the time, and not just with sexual orientation. Almost all demographic questions are susceptible to being limited in the survey or condensed in the analysis: race, ethnicity, gender, language, etc. Imagine how difficult and how statistically useless it would be to list all possible spoken languages as options on a survey. How can we be inclusive without making minority categories so small that only the majority data has statistical relevance?

Competing Priorities:

It’s important that the diversity among your respondents is given respect.

It’s important that the results you show be statistically meaningful.

Option 1: Collapsing

The first ethical issue when collapsing data categories after the initial analysis – for example into ‘Heterosexual/Non-Heterosexual’ – is that it frames the categories so that heterosexual is normal and everyone else is “other”. Second, it categorizes your respondents in a way that they did not categorize themselves, removing the agency of choice that you offered them. The least ethical occasions of collapsing occur when people use a lot of inclusive categories on the public-facing survey just to appear inclusive, covering their own butts with the public while planning to collapse the data anyway.

From a mathematical point of view, collapsing sexual orientation into two groups is a problem because your results become much less accurate. Attitudes and behaviors might vary a lot between gay, lesbian, and bisexual respondents, which is important to measure and acknowledge. Your results will bury this variation if you report only on Hetero/Non-Hetero.
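A toy illustration of how collapsing buries variation, using made-up response counts:

```python
# Hypothetical "yes" responses to some survey question, by orientation.
responses = {
    "Gay":      {"yes": 30, "n": 40},   # 75% yes
    "Lesbian":  {"yes": 10, "n": 40},   # 25% yes
    "Bisexual": {"yes": 18, "n": 20},   # 90% yes
}

for group, r in responses.items():
    print(f"{group}: {r['yes'] / r['n']:.0%} yes")

# Collapsing into a single "Non-Heterosexual" category averages away
# the 25%-90% spread between groups.
total_yes = sum(r["yes"] for r in responses.values())
total_n = sum(r["n"] for r in responses.values())
print(f"Non-Heterosexual (collapsed): {total_yes / total_n:.0%} yes")  # 58% yes
```

The collapsed 58% describes no actual group well: it overstates lesbian agreement and understates bisexual agreement.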

Option 2: Not Collapsing

If you have a bunch of data categories with a small number of responses, it’s going to reduce the statistical certainty of what you can say about your overall population. It’s not acceptable to say something like “73% of heterosexual and 88% of asexual identifying people are in favor of the new law” when you have hundreds of responses from one identity and only one from the other. You have to report your findings with their statistical confidence, which almost always corresponds to the number of respondents in that category.

Not collapsing therefore leads to an issue where only the majority categories – the ones with lots of responses – have strong statistical meaning. Your efforts to be inclusive actually weaken the voice of the least represented groups. Of course, in many cases you will just have a difference in statistical confidence between groups. If your respondents fall into three categories at 60%, 30% and 10%, you can still report on each of them, just including the difference in their statistical weight.

So, what to do? Of course, as always, the answer is “It Depends.” It depends on the research question you’re trying to answer. It depends on what the people you’re working with need to know. It depends on how the people you’re collecting data from feel about their representation. Those are just a few of the factors.

What You Can Do:

Decide on how to deal with this before crafting your survey and before analysis.

Report your results in more than one way, including collapsed, uncollapsed, and hybrid perspectives.

Be transparent about the dilemmas, compromises and choices you are addressing with your data team, your survey respondents, and your audience.

Deciding how to approach this issue in the Project Design phase, before creating your survey or conducting your analysis, is the first way to dramatically increase the equity of your project. If you decide that it’s most important to end up with three categories that have strong statistical confidence, you can design a system for that. Let’s say you’ve decided in advance to report the top three categories: the one with the most respondents, the one with the second most, and a combination of all the remaining respondents. This gives you many advantages. It allows you to still ask about more than two or three categories on the survey, increasing inclusiveness. It allows you to avoid assuming what those three categories are; you don’t know that heterosexual respondents will always outnumber bisexual respondents in every survey. It allows you to tell survey respondents how you intend to analyse the data, so they don’t feel mistreated when you do combine some categories. There are all kinds of ways to address this issue in Project Design, including what questions to include, how to weight categories, how to report categories, and more. Deciding in advance on your project’s systems and best practices will help you sidestep many equity issues.
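One way to sketch such a pre-decided rule: keep the largest categories as they turn out, whichever they are, and combine the rest under a clearly labeled heading. The counts and category names here are hypothetical, and the `collapse_to_top_k` helper is ours, not a standard library function:

```python
from collections import Counter

def collapse_to_top_k(responses, k=2, remainder_label="Remaining (combined)"):
    """Keep the k largest categories as-is and combine everything else,
    following a reporting rule decided before the survey went out."""
    counts = Counter(responses)
    top = counts.most_common(k)
    kept = dict(top)
    remainder = sum(counts.values()) - sum(n for _, n in top)
    if remainder:
        kept[remainder_label] = remainder
    return kept

# Hypothetical responses; the rule adapts to whichever categories dominate.
data = (["Heterosexual"] * 70 + ["Gay"] * 15 + ["Lesbian"] * 8
        + ["Bisexual"] * 5 + ["Pansexual"] * 1 + ["Asexual"] * 1)
print(collapse_to_top_k(data))
# {'Heterosexual': 70, 'Gay': 15, 'Remaining (combined)': 15}
```

Because the rule names the top categories by count rather than by assumption, the same code works no matter which orientations turn out to be most common in a given survey.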

You don’t have to report your findings in only one way. You can break out your results in the way that shows the strongest statistical meaning; maybe bisexual, asexual, and lesbian respondents do feel the same way about an issue, and combining their data gives their responses a stronger voice. You can then also show all the categories individually, including information about the statistical confidence, to show your audience that your survey did include the pansexual orientation, even if you didn’t get any respondents in that category. You can create a hybrid where some of the categories are collapsed in the ways you think are most meaningful. Maybe you report the orientations in combined categories of their responses: most likely to say yes – lesbian, bisexual and pansexual orientations; most likely to say no – gay, straight and asexual orientations. Offering your audience all the information increases their confidence in your reporting and methodology while simultaneously strengthening your results.

Lastly, you have to be transparent about this issue with all stakeholders of your project: the people working on it, the people involved in it, and the audience of your findings. Data science with humans is inherently full of difficult decisions and compromises. Don’t craft surveys that remove agency from respondents, don’t hide differences in statistical confidence between categories, and don’t conceal assumptions and choices you’ve made. This can be difficult, because often you are inclined to do these things to protect equity. Letting people see how you’ve grappled with issues like this will only increase trust and true equity in your data projects.

We now have unparalleled access to enormous amounts of data, automatically generated and gathered, which represent sample sizes that are nearly impossible to replicate with traditional survey methods. People excitedly tell us that many major equity issues, particularly representation issues, will be a thing of the past now that we can leverage these massive data sets.

At We All Count, we agree that Big Data is a valuable resource but we think there are some very important concerns that Big Data alone won’t fix. We think that what’s really exciting about Big Data is the ability to combine the efficiency and power of large datasets with the intentionality of small, curated data samples.

What is Big Data?

The term ‘Big Data’ gets thrown around a lot with varying definitions. Is the U.S. Census big data? It is a large and comprehensive data set. Are large, international data sets amalgamated from a variety of sources, like U.N. or World Bank datasets, Big Data? Is live data from a mid-sized phone app Big Data, because it has a lot of data points?

These are the kind of datasets that have massive disruptive implications for our world. They are a shift in what’s possible on the scale of the Industrial Revolution. They also have some major equity issues.

Big Data Power

In data science, the statistical strength of a given analysis is often limited or supported by the sample size. If you want to answer a question about an entire population, you need data from a statistically relevant percentage of that population. Large samples are inherently expensive, making answers about huge groups of people very difficult to achieve. Much of the focus of modern statistics has been on discovering and refining methods to achieve high statistical reliability. This research has also confirmed that the quality of your sample matters as much as its quantity.

Imagine that you want to find out if people in your town prefer shopping at Walmart or online. For the same cost, you can either survey a well randomized sample of 100 people across your town, or you can camp out in the Walmart parking lot and ask 1000 respondents. You can clearly see that simply increasing sample size without any regard to equitable representation would dramatically skew your results.
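A toy simulation makes the skew visible. All numbers here are invented; we assume a town where exactly half of shoppers prefer Walmart, and that Walmart shoppers are far more likely to be found in a Walmart parking lot:

```python
import random

random.seed(0)

# Invented ground truth: 50% of the town prefers Walmart.
town = ["walmart"] * 5000 + ["online"] * 5000
random.shuffle(town)

# Option 1: a well-randomized sample of 100 people across the town.
random_sample = random.sample(town, 100)

# Option 2: 1000 respondents recruited in the parking lot, where
# Walmart shoppers are heavily over-represented (assumed rates).
parking_lot = [p for p in town if p == "walmart" or random.random() < 0.2][:1000]

def share(sample):
    return sum(p == "walmart" for p in sample) / len(sample)

print(f"random sample of 100:      {share(random_sample):.0%}")
print(f"parking-lot sample of 1000: {share(parking_lot):.0%}")
```

The smaller random sample lands near the true 50%, while the ten-times-larger parking-lot sample badly overstates Walmart preference: more data, worse answer.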

Along comes Big Data, and we suddenly have enormous sample sizes available to us. Instead of using a tiny fraction of the entire population, Big Data offers us huge slices of our population to use as samples. For example, a national Gallup poll about a presidential election might have 1,500 respondents (keep in mind that these respondents are very carefully selected and the statistical methodology used to interpret the results is very robust), while Facebook has live data available on around 244 million Americans. That's a sample over 100,000 times larger.

The statistical strength you can achieve with such an enormous sample size, paired with the up-to-the-minute nature of a lot of this data, can make it feel like we can answer statistical questions with almost prophetic certainty. Smaller companies, local governments, and NGOs are incredibly eager to harness the power of Big Data, and rightly so, as it can offer incredible insight into policy decisions, impact studies, and effectiveness. The tricky part is that Big Data is always collected with a specific intention and by a collector with a specific mandate. Nine times out of ten, that mandate is to make money.

Big Data fans see it as a silver bullet for equity issues because of its scale and its inhuman indifference. Amazon's data collection algorithms are adjusted to maximize profits, not to maximize sales to a certain race or gender. If Walmart discovered its data collection process was ignoring all potential female customers, it would be changed immediately. And with a data set that might, say, include 30% of all U.S. citizens, it's easy to feel like the sample is so large that it must include at least some representation of every type of person in the population.

Two Issues with Big Data and Equity

Amazon generates data about Amazon customers. Phone apps generate data about people who own smartphones. Uber has data about people who ride in Ubers. Big Data has an inherent representation problem compared to a well-crafted traditional sample: it automatically excludes the people it isn't concerned with.

Additionally, because the sample sizes are so large, Big Data concentrates the impact of the most prolific data providers. That means if you shop a lot on Amazon and take a lot of Uber rides, your data is being counted way more than someone who only has the resources to do those things occasionally, or never.

So representation and weight are two challenges to overcome with Big Data. Businesses have a mandate to make money, so that's alright for them; but how can someone who also cares about finding equitable solutions use this data?

The Best of Both Worlds

Let's pretend you are the local government of the city of Toronto. You want to know where to expand your subway system. What new location makes the most sense for the greatest number of Torontonians? You have a very large dataset from your 'swipe card' entry system, so you can see all kinds of data from people on the subway, streetcars, and buses. You'd also like to supplement your information with Uber's massive dataset: it will show you a huge sample of citizens and where they use a form of transportation other than transit.

You know that neither of your Big Datasets represents everyone: you don't have information on people who take neither transit nor Ubers, like drivers, pedestrians, or people who can't afford either option. You also know that these datasets concentrate the impact of the most frequent users, and you will have to account for that statistically. You have a limited budget. You could spend it on an expensive but rigorous survey that gets good representation but a smaller sample size, or you could use the money to access Uber's massive, statistically powerful dataset; even with some equity issues, you might get a firmer answer to your question.

Or you can harness the power of both. What Big Data offers is amazing efficiency. Yes, it is expensive to operate the massive systems that collect, store, and analyse such a huge volume of data, but the cost per data point is many orders of magnitude cheaper than traditional survey methods. These savings can be used to supplement Big Datasets and fill in representational gaps using additional statistical methods. We can use the money saved to conduct a smaller, more targeted survey that gets answers specifically from the people who aren't represented in our Big Data.

We can use the Transit Data and the Uber Data, and conduct our own research in a more targeted way, to make sure we're making the fairest decision for all stakeholders without ignoring the predictive or authoritative power of the large datasets. Ignoring Big Data today is like ignoring steam engines in favor of a horse and cart: the difference in power and efficiency is inarguable. On the other hand, assuming that Big Data will automatically solve equity issues when it wasn't designed to do so is wishful thinking. By focusing on equity and using rigorous statistical methodology to flesh out and re-weight Big Data, we can have the best of both worlds.
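One common statistical tool for the re-weighting step is post-stratification: scale each group's contribution so that the big dataset matches known population shares. This is only a minimal sketch with invented numbers, not Toronto's actual data:

```python
# Invented example: our big dataset over-represents frequent riders.
# Population shares might come from a census; dataset shares are observed.
population_share = {"frequent rider": 0.20, "occasional rider": 0.50, "non-rider": 0.30}
dataset_share    = {"frequent rider": 0.60, "occasional rider": 0.35, "non-rider": 0.05}

# Post-stratification weight for each group = population share / dataset share.
weights = {g: population_share[g] / dataset_share[g] for g in population_share}

# Invented outcome: support for a new subway stop, by group.
support = {"frequent rider": 0.90, "occasional rider": 0.50, "non-rider": 0.10}

naive    = sum(dataset_share[g] * support[g] for g in support)
weighted = sum(dataset_share[g] * weights[g] * support[g] for g in support)

print(f"naive estimate:    {naive:.2f}")    # dominated by frequent riders: 0.72
print(f"weighted estimate: {weighted:.2f}") # matches the population mix: 0.46
```

The naive average says the stop is wildly popular only because frequent riders flood the dataset; after re-weighting, the estimate reflects the whole town, including the people the dataset barely sees.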

]]>https://weallcount.com/2020/01/10/why-big-data-needs-small-data/feed/0Representation and Visibility in Quantitative Surveyshttps://weallcount.com/2020/01/10/representation-and-visibility-in-quantitative-surveys/
https://weallcount.com/2020/01/10/representation-and-visibility-in-quantitative-surveys/#respondFri, 10 Jan 2020 13:27:06 +0000https://weallcount.com/?p=2012The post Representation and Visibility in Quantitative Surveys appeared first on We All Count.
]]>

What is the role of quantitative surveys in shedding light on different lived realities?

Which choices made during the design of an impact evaluation affect who is represented and listened to?

Data collection can make structural inequalities invisible, and these are all questions we regularly ask ourselves whilst working as advisors on impact evaluations for Oxfam GB. We do impact evaluations as a learning and accountability mechanism to enhance programme quality. In doing so, we acknowledge that power matters in evaluation – and that the choices we make as evaluators are not neutral. And yes, for us too, equity in impact evaluations matters!

Gender is one dimension of power, and it intersects with other dimensions such as class, race, sexuality, and ethnicity. Gender shapes relationships, access to resources, and decision-making at a personal level, within the household, and more broadly. Feminists – and feminist economists in particular – have said it for a while, and still, we observe a tendency in data collection processes to use the household as the unit of analysis without considering intra-household dynamics. Sometimes, survey protocols rely on hearing from respondents already in positions of power in the household, hence reinforcing patriarchal norms. From a statistical perspective, we have to be intentional in our sampling strategies to enable representation and visibility of different social groups, and to make statistical analyses by social group possible.

As part of our Effectiveness Reviews, we have embarked on a journey of integrating a gender lens at the core of our impact evaluations. Which means – among other things – hearing from women and men! By doing so, we make sure to systematically look at gender differences and test for differential impacts of Oxfam’s programmes.

And in practice?

Well, we have tried two different sampling strategies. The first strategy is inspired by the Women's Empowerment in Agriculture Index and consists of giving individual surveys to several household members, women and men, within the same household. As much as possible, the individual surveys take place at the same time to ensure privacy. Respondents are then brought together to complete the household survey. You can see an example of how this sampling approach was used in practice during an impact evaluation here, and we also shared a post on Oxfam's REAL Geek series, which you can read here, discussing the pros and cons of this sampling strategy compared to the following one.

The second strategy consists of randomly varying whether to survey a woman or a man in each household. You can see an example of how this approach was applied in practice here. Before going into the details of the protocols we developed, it is critical to highlight here that we regularly work in contexts where a comprehensive roster of individual household members is not available, nor is it feasible to conduct the full listing prior to data collection. If you are carrying out evaluations where such rosters are available or can be collected, you will draw your sampling frame before carrying out the survey and you may want to skip to the last paragraph of this blog! If you want to hear more, please carry on reading.

As a first step, we define the main respondent irrespective of gender (for example, adult household members involved in certain activities). As a second step, because we acknowledge that interviews are a social interaction and recognize the role of power dynamics – gender in particular – in such an interaction, two different protocols may be followed depending on the context in which the survey takes place and the content of the questionnaire.

The first protocol is enabled by technology and works irrespective of the gender of the enumerators. Using digital data collection, the survey software can randomly allocate whether the respondent should be a woman or a man each time a survey form is opened. In the second protocol, we want to match the gender of the interviewer and interviewee. One way of doing this is to randomly allocate the gender of the interviewer in charge of a given household, which then determines the gender of the person to interview.
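The random allocation in the first protocol is simple to picture in code. This is a hypothetical stand-in for what digital data-collection software would do internally, not any particular tool's API:

```python
import random

def assign_respondent(p_woman=0.5):
    """Called each time a survey form is opened: randomly decide
    whether the main respondent should be a woman or a man."""
    return "woman" if random.random() < p_woman else "man"

random.seed(42)  # seeded here only to make the sketch reproducible
assignments = [assign_respondent() for _ in range(1000)]
print(assignments.count("woman"))  # close to 500 by design
```

Because each household's draw is independent, the allocation is unpredictable at the door but balanced in aggregate, which is what makes gender-disaggregated analysis possible afterwards.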

In both protocols, there are implications for the composition of the team of enumerators. We usually aim for a team that is balanced equally by gender (half women and half men). There are also implications for the team's flexibility during the survey. In many cases, the availability of respondents is not gender-neutral. This means that enumerators may have to come back when the selected respondent is available to complete an interview. And in order to match the gender of the interviewer and interviewee, the team may need to redeploy enumerators during data collection if a household contains no one of the randomly assigned gender who could be the main respondent.

Is that it?

While sampling is critical to enable visibility and representation, the integration of a gender lens also means changing our measurement and analytical tools – we are currently working on guidelines so do stay tuned!

Of course, there are limitations, some of which have already been mentioned here, but let me add two more.

The focus on women and men in these analyses carries a risk of essentialization – fixing and naturalizing the meaning of social categories. Feminist quantitative social scientists have written about this (Sigle-Rushton, 2014), and two co-workers and I touch on it here, while reflecting on feminist values in Monitoring, Evaluation and Learning practices at Oxfam GB. One way to overcome this would be to adopt intersectional analyses, acknowledging that social groups are not homogeneous and that the intersection of power structures shapes specific positions and experiences in a given context.

]]>https://weallcount.com/2020/01/10/representation-and-visibility-in-quantitative-surveys/feed/0Result or Interpretation?https://weallcount.com/2019/12/03/result-or-interpretation/
https://weallcount.com/2019/12/03/result-or-interpretation/#respondTue, 03 Dec 2019 18:16:19 +0000https://weallcount.com/?p=1959The post Result or Interpretation? appeared first on We All Count.
]]>

]]>https://weallcount.com/2019/12/03/result-or-interpretation/feed/0Who is the Head of Your Household?https://weallcount.com/2019/11/07/who-is-the-head-of-your-household/
https://weallcount.com/2019/11/07/who-is-the-head-of-your-household/#commentsThu, 07 Nov 2019 14:51:38 +0000https://weallcount.com/?p=1692The post Who is the Head of Your Household? appeared first on We All Count.
]]>

Who is the head of your household? It probably depends on who you ask. In my household, you will either find out that I am the head of the household, my husband is the head of the household, or that my bulldog is the head of my household. It depends on how you measure. Is it who makes most of the decisions? Is it who makes the most money? Is it who spends most of the money? Is it who the waiter brings the bill to?

This is an important question: the head of household is a standard indicator used all over the world in data collection, especially by projects in developing countries. Whether or not I qualify for various programs, whether or not you think your project is working, and the answers to many more research questions depend entirely on who counts as the head of the household and how that is defined.

And yet we almost never know.

A lot of projects and donors and large organizations measure and report progress broken out or disaggregated by the gender of the head of the household. And there are many good reasons for this. When there are limited budgets, prioritizing households that might benefit the most from resources or projects makes sense. When calculating impact and progress, it’s very helpful to disaggregate findings along key social power dynamics in order to understand what’s going on in a more nuanced way.

From the World Bank:

“The household is regarded as the fundamental social and economic unit of society. Transformation at the household form, therefore, has impact at the aggregate level of a country. An increasing number of female-headed households (FHHs) in developing countries are emerging as a result of economic changes, economic downturns and social pressures, rather than as a product of cultural patterns. In many developing countries of Asia and Latin American, there has been a significant increase in the percentage of FHHs. The majority of women in FHHs in developing countries are widowed, and to a lesser extent divorced or separated. In the developed countries most female-headed households consist of women who are never married or who are divorced. The feminization of poverty – the process whereby poverty becomes more concentrated among Individuals living in female-headed households – is a key concept for describing FHH social and economic levels. The composition of a household plays a role in the determining other characteristics of a household, such as how many children are sent to school and the distribution of family income”

The World Bank provides microdata as part of their open data initiatives. They break out many of their indicators by Female Headed Households and Male Headed Households. When you look into the metadata this is what you find:

“The definition of female-headed household differs greatly across countries, making cross-country comparison difficult. In some cases it is assumed that a woman cannot be the head of any household with an adult male, because of sex-biased stereotype. Caution should be used in interpreting the data.”

We did a survey of sixteen different projects and did not find agreement between the projects or even within the projects on which of these households would be recorded as a Female Headed Household:

It’s extremely hard to find reliable information on how Female-Headed Household is defined in any given data product. And the results make a very big difference.

We're working on a financial inclusion project in Uganda, Ghana, and Tanzania, where we're conducting an impact analysis. The donor and stakeholders want the progress and the impact broken out between Female-Headed Households and Male-Headed Households.

Using the data from our project, we ran the analysis in three different ways. First, we split the households into Female-Headed Households and Male-Headed Households based on the answer the female respondents gave to an open-ended question about who was the head of the household. Second, we defined as a Male-Headed Household any household with an adult male who contributes over 50% of the household income, whether living in the household or sending remittances from elsewhere. Third, we called a Female-Headed Household any household with no men over the age of 16 living in it.
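To illustrate how the three definitions can classify the very same households differently, here is a sketch with invented household records (the field names and values are ours for this illustration, not the project's actual data):

```python
# Three hypothetical households, each described three ways.
households = [
    {"self_reported_head": "woman", "male_income_share": 0.7, "men_over_16": 2},
    {"self_reported_head": "man",   "male_income_share": 0.3, "men_over_16": 1},
    {"self_reported_head": "woman", "male_income_share": 0.0, "men_over_16": 0},
]

def fhh_definition_1(h):  # open-ended answer from the female respondent
    return h["self_reported_head"] == "woman"

def fhh_definition_2(h):  # no adult male contributes over 50% of income
    return h["male_income_share"] <= 0.5

def fhh_definition_3(h):  # no men over the age of 16 in the household
    return h["men_over_16"] == 0

for defn in (fhh_definition_1, fhh_definition_2, fhh_definition_3):
    flagged = [i for i, h in enumerate(households) if defn(h)]
    print(f"{defn.__name__}: households {flagged} are Female-Headed")
```

Note that definitions 1 and 2 flag the same *number* of households here but not the same households, so even identical headline rates can describe entirely different families.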

In our current analysis, there are significant differences in results depending on which definition is used:

The way that we define the head of a household matters not only to the people being defined but also to the actual results of your research. Make sure you know what your definition is, and how it impacts the equity in your results.

]]>https://weallcount.com/2019/11/07/who-is-the-head-of-your-household/feed/3Supercharge your Averages with an Equity Gap Scorehttps://weallcount.com/2019/10/22/supercharge-your-averages-with-an-equity-gap-score/
https://weallcount.com/2019/10/22/supercharge-your-averages-with-an-equity-gap-score/#commentsTue, 22 Oct 2019 14:50:32 +0000https://weallcount.com/?p=1675The post Supercharge your Averages with an Equity Gap Score appeared first on We All Count.
]]>

When you need to know more than just a general average, you need an Equity Gap Score. If you have a mandate that includes equity between any category of people, whether race, sex, income, education, geography, or whether or not they like sugar in their tea, you need an Equity Gap Score.

An Equity Gap Score is simply a number that helps to contextualize a statistic. It will help add meaning to any data result and can help you track equity issues in any data project. At We All Count, we think they should be included as standard practice any time you see an average involving people.

They look like this:

“The city has an average yearly income of $54,000, with an equity gap score of 3.7 between the women and men”

Or

“The rate of death by drowning has fallen by 17% in the last three years, while the equity gap score has worsened from 1.4 to 1.8 between poor neighbourhoods and rich neighbourhoods”.

The first number tells you the overall information, and the second tells you how equity between categories of people is doing. An Equity Gap Score of 1 is perfect equity; the higher the score, the worse the equity.

Let’s take a step back and make an Equity Gap Score from scratch:

Imagine we’re looking at literacy rates at a state level. Let’s say the adult literacy rate is 82% in this state. Only 18% of adults in this state have real trouble reading and writing. While it’s useful to know the average, that’s not enough information if we care about equity. Let’s say we’re funding a literacy pilot project and we want to understand if there is a literacy gap between different districts in the state.

We can easily calculate an Equity Gap Score from the data tables. In the data tables, we can see what district each respondent lives in. All we have to do is average each district and we’ve got some very useful information. To make an Equity Gap Score we just have to divide the rate from the worst district by the rate in the best district.

Worst in category / Best in category = Equity Gap Score for the category.

Let’s say the 5 districts looked like this:

District    Illiteracy Rate
1           23%
2           12%
3           3%
4           5%
5           46%

District 5 (46%) / District 3 (3%) = 15.3

15.3 means that the rate of people struggling to read and write is over 15 times higher in district 5 than in district 3. It shows the gap in equity between the highest and lowest categories. You can make an equity gap score for anything you care about as long as you can get your hands on the data.
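That calculation is simple enough to wrap in a small reusable function. This sketch assumes the metric is a "bad" rate like illiteracy, where a higher number is worse:

```python
def equity_gap_score(rates):
    """Worst (highest) rate divided by best (lowest) rate.
    A score of 1 is perfect equity; higher is worse.
    Assumes a 'bad' metric, where higher numbers are worse."""
    return max(rates.values()) / min(rates.values())

# The five districts from the example above.
illiteracy_by_district = {1: 0.23, 2: 0.12, 3: 0.03, 4: 0.05, 5: 0.46}
print(round(equity_gap_score(illiteracy_by_district), 1))  # 15.3
```

For a "good" metric such as literacy rate, you would divide best by worst instead, or compute the score on the complementary "bad" rate, as this example does.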

Ok, those are the basics, and maybe that’s enough info for you right now, that’s cool. We encourage you to ask about the Equity Gap Score the next time you encounter an average. If you want more concrete examples of how to use this tool or how this can help you, say, supercharge the UN’s Sustainable Development Goals, read on!

Equity Gap Scores For Trends

I’m currently working on a project with BRAC in Bangladesh supporting Rohingya refugees. We’re collecting data on food programs and looking at the number of children who only receive one meal a day, versus two or three meals. At the start of the project, 11% of all children in the study were receiving only one meal per day, while at the end that was reduced to 4%. A great improvement which meant fewer hungry kids. However, we also wanted to see how equitable the improvement was between families with different levels of income. We needed an Equity Gap Score to accompany the change in averages!

We divided the participants’ data into five quintiles of income and then calculated the percent of children receiving one meal per day in each. Turns out there was a big difference between the richest and poorest families:

        All Families Average   Highest Wealth   Lowest Wealth   Equity Gap Score
Start   11%                    7%               22%             3.14
End     4%                     1%               9%              9.00

Yes, there was a reduction from 11% to 4% overall in the number of kids getting only one meal per day, but there was a significant increase in the Equity Gap Score between the rich and the poor. This additional information helped us see that the program's interventions were more effective for one group than another; to even things out and bring the Equity Gap Score closer to 1, we'd need to do more work with the lower-income families.

It's important to note that we didn't assume that the greatest difference in meal amounts would be between the richest and poorest families; we let the data speak for itself in determining how to calculate our EGS. In many cases, you won't know what the equity gaps will look like or where they will appear, and always adding an Equity Gap Score to your averages will help you keep track of them!

The Big Picture – Looking for Equity in the SDGs

I was recently listening to a conversation between Winnie Byanyima, the Executive Director of Oxfam International and author Anand Giridharadas. Ms. Byanyima brought up the issue of equity – or the lack thereof – in the data being used to measure progress in the Sustainable Development Goals (the SDGs). She mentioned that while the infant mortality rate in the United States is lower than the infant mortality rate in Libya, a black infant in the United States has less of a chance of reaching their first birthday than a child in Libya.

Reducing infant mortality among children is a goal that UN countries have agreed to work towards and measure with data. The specific goal they have agreed to is:

“By 2030, end preventable deaths of newborns and children under 5 years of age, with all countries aiming to reduce neonatal mortality to at least as low as 12 per 1,000 live births and under-5 mortality to at least as low as 25 per 1,000 live births”

However, there is nothing in the SDG data indicators that track equity in this category – or most other issues that are being measured. Ms. Byanyima, who was part of the group of people at the SDG negotiations, attributes at least part of this to the fact that many people simply are not willing to include measures of equity in their goals. (This is specifically discussed around income inequality at around the 15-minute mark in the video.)

If the SDGs released their national averages with Equity Gap Scores across the more common areas of inequity – race, income, sex, urbanness, education level, etc. – we could easily see that Libya has a higher infant mortality rate (11 per 1,000 in 2016) than the US (5.6 per 1,000 in 2016), but that the Equity Gap Score between ethnicities/races in the US is very high!

Equity Gap Scores are critical information to understanding standalone averages and averages over time. They can be crafted to reflect any area of equity you care about. They can show progress when overall averages are moving slowly; “yes the average is the same, but look how much better the equity is!”. They can also be red flags when results look good, but certain groups are getting left behind. We want to see them everywhere that data is published. If we were donors, we’d demand them in every report and if we were CEOs, we’d expect them with every average.

If you want to talk about how to start using them in your work feel free to contact us!

Data is a commodity. Do your survey respondents feel data-rich?

One of the things that made me uncomfortable when I first started working as a data scientist was the power imbalance in the data collection methods I was being trained in. One of my very first jobs was to help design and administer a survey to rural women in Southeast Asia. I felt uncomfortable asking survey questions to these women while they put their lives on hold for almost two hours per survey. These women were mostly living on less than $2 per day.

I knew that the data we were collecting through these surveys was important and useful, including to the women who were answering the questions. The analysis of the data was helping us understand which parts of our project were working, for whom, and sometimes even why.

I still struggled with the fact that many of the North American women I knew would not consider spending 2 hours with me answering detailed questions about their lives and jobs and incomes. Particularly not during their actual working day.

This discomfort helped found the We All Count project for equity in data. The Data Collection step in the Data Lifecycle has always been a core focus for me.

One of the things my team experimented with was paying the women for the time it took them to complete the survey. We also offered to provide child care during that time. I'm proud of this step; however, donors and funders were concerned that it would skew the results of the data, and some forbade us outright from using any of their money to do it, saying that it simply wasn't allowed in their funding guidelines.

As the saying goes, “data is the new oil” and wealthy people are starting to get worried about the value of data and who owns their data; leading to a new focus on equity and data collection. This is a great opportunity to move the ball forward in how the social sector collects data and to rebalance the power dynamics in Data Collection.

Forward Thinking

Jaron Lanier has just published a series of videos in the New York Times addressing the issue of data ownership and the economic imbalance between “data creators” and the people making money off of the data. He’s talking specifically about e-commerce and social media and similar stuff – but the very same argument applies to data in the social sector.

He’s come up with a plan called “Data Dignity.”

The fundamental idea is that “You should have the moral rights to every bit of data that exists because you exist, now and forever.”

What if every person who contributed data to a social sector project retained ownership of that data? What if they received a small amount of money from the research reports, the impact reports, the journal articles, and the social media posts that included any result from their data? These products generate income for many people including foundation staff, non-profit staff, researchers, academics, and social media companies. That income could be shared with the people who are actually generating the data. It would be a step in the direction of rebalancing the equity in the Data Collection Life Cycle.

Jaron has a specific plan about how to implement this in the real world. And the steps he suggests would work in the social sector. They would work on the ground in most places I’ve worked.

What about our donors' concern that it would change the answers to the questions? There isn't a lot of good research out there on this topic, which is an important signal. The research that does exist seems to indicate that it would not be a problem. If anything, it found that many answers were more honest when the survey participants were compensated. This piece of research did find some changes in the reported level of income, but it's inconclusive whether these changes were towards accuracy or not. One difference between this research and what we're thinking about is that those projects frame the payment to participants as 'a gift'. This creates a further imbalance in terms of equity. In the new model, the compensation is not framed as a 'gift' but rather as legitimate compensation for the creation of a valuable commodity: data.

The Social Sector is Different

It’s true that the social sector and nonprofit organizations have a different mandate than say, Facebook and Google. And paying participants for data may, in fact, be outside of your reach for the time being. We can still change the lens through which we see the participants giving their data.

In most of the existing literature about compensating for time spent generating survey data, it is framed as ‘a gift’, but if we adjust our perspective we see that the data being produced by our social surveys is an extremely valuable commodity; a source of fuel that runs an entire section of our industry. The people providing data are actually a large and significant donor base. These people should be thought of as donating their data to the nonprofit sector. What if we gave them tax receipts in appropriate countries for their donation? What if we treated these people as a donor class and were forced to invest in our relationships, pitch our methodologies, and defend our spending? What if we ensured that they were getting appropriate reports and updates in the same way our “top-tier” donor base is? This would potentially add a lot of equity to the data collection and ownership process.

What I Ask Myself:

Data collection equity can be improved through direct compensation, changes in respondent treatment, and a paradigm shift in who is considered a donor. There’s a touchstone question I want all of us in the social sector to start asking ourselves: “Would I take this survey? For free?”

Symbols in Data Communication
https://weallcount.com/2019/09/12/symbols-in-data-communication/
Thu, 12 Sep 2019

The way that data is communicated is inextricable from the data itself. We should pay attention to how we symbolize, contextualize, and convey our data. Right now, data visualization is a hot topic, with new research and understandings coming out all the time. New mediums like interactive and dynamic graphics, easy-to-create videos, and live, up-to-the-minute dashboards offer all kinds of exciting ways for us to get information to our audiences. In all the fascination with new forms, it can be easy to miss foundational equity problems in data communication.

At We All Count, we work on the cutting edge of data viz design and we have a lot to say about the equity ramifications of various styles, mediums, and methods of distribution. Today though, we want to start our conversation about communicating data at a simpler level: symbols.

All data communication is narrative. Even the most ‘objective’ or ‘academic’ chart has placed an interpretation of results into a very human framework in order to communicate it. That’s how it should be; humans tend to understand everything as a story. One of the most important parts of telling data stories is symbols. Applying meaning to shapes, colours, icons, scales, direction, and motion allows us to get a ton of complex information through to the human brain. Of course, symbols aren’t universal and have to be understood by the audience – an often-overlooked area of equity issues for another article.

What’s in a Symbol?

The following series of images is a widespread visual metaphor about equity and equality. In the first image, the problem is represented: some people can’t see over the fence. In the second image an ‘equality’ based solution is proposed: a box for everyone. However, this doesn’t solve everyone’s problems and gives a boost to people who didn’t need it in the first place. In the third image people are given boxes according to their needs, and an ‘equity’ solution is created where everyone can see over the fence.

INEQUALITY

EQUALITY

EQUITY

It’s a clear, resonant way to communicate an idea in pictures, and we can see why it is so often reproduced. However, we’d like to use it as an example of the equity dangers of not examining your symbols deeply enough.

There’s something off about the images that doesn’t seem to reflect the way we perceive the issues addressed in the metaphor. This is a good example of how even a very simple series of images is a specific story that, intentionally or not, indicates a specific world view. Like an x or y axis, the elements of these images are representative. The barrier to what the people want is represented by the height of the fence and the advantages that the people have in getting what they want are represented by how tall they are. Simple enough.

However, there’s an equity problem with this choice of symbols. When you use human height as an indicator of ‘advantage’, you are suggesting that some people are inherently disadvantaged in the way that some people are inherently taller or shorter. It suggests that different types of people need different types of help. This choice of symbol reinforces a worldview that A) people are different and some are inherently disadvantaged, and B) people are the cause of their own experienced inequality.

If we reject the idea that people are inherently disadvantaged, let’s not use the people’s heights to represent that. Instead, let’s make them all the same height:

Sure, it feels more equitable, but it doesn’t really communicate anything useful. There’s a second issue with this visual metaphor. This time it’s not with symbol choice, but with symbol use. Having one shared fence suggests that everyone is facing the same barriers. If you believe that people are generally the same and that their barriers (institutional, cultural, legal, economic, historical, prejudicial, etc.) are different, then let’s show that with different fences. Oh wait, but we also lost our way to represent advantage once we made everyone the same height! Let’s show advantage with different ‘ground level’ starting points:

Now, by changing what symbolizes what, we have an image that can be used to communicate the same important idea while better reflecting our world view: that people are the same, but their barriers and starting points vary. When we want to see the differences that equality or equity-based solutions can have, we can visualize them. We have visually put the problems onto the barriers and uneven ground, rather than putting the onus on the people to figure out how to get taller. This isn’t a more ‘right’ or ‘correct’ way to visualize this; it just better represents our world view and what we are trying to communicate.

That’s what we want you to keep in mind at the outset of our exploration of step 7 of the Data Life Cycle: Data Communication and Distribution. All communication is storytelling that couches your data in a specific world view, and even the least complex, least technical mediums have equity pitfalls that need attention. We’re in the midst of a revolution in how we communicate and use data, and we’re excited to discover the best ways to keep the world of data science equitable, fair, and free.