which consisted of quarter-long, on-site projects where researchers came and worked in our space two days a week.

They write a two-page proposal, and we select 4–6 teams per quarter.

In each of these fields, my research interests are driven by one question. We like to ask researchers how much time they spend “handling data” as opposed to “doing science.” They say things like 90%, and they don’t even blink.

So my overarching research question is: “How can we reduce this ‘data science overhead’?”

On which projects should we engage? How can we ensure fairness, accountability, and transparency for algorithmic decision-making? How do we ensure privacy? How do we avoid junk science?

3.
Predictors of Permanent Housing for Homeless Families
Project Leads: Neil Roche & Anjana Sundaram, Gates Foundation
DSSG Fellows: Joan Wang, Jason Portenoy, Fabliha Ibnat, Chris Suberlak
ALVA High School Students: Cameron Holt, Xilalit Sanchez
eScience Data Scientist Mentors: Ariel Rokem, Bryna Hazelton
When homeless families engage in services and programs, what factors are most likely to lead to a successful exit?
The DSSG team:
• developed algorithms to identify ‘families’ and to identify ‘episodes’ of homelessness, including back-to-back or overlapping enrollments in individual programs
• devised innovative ways to visualize and analyze how families transition between programs
The Gates Foundation, together with Building Changes, has partnered with King, Pierce, and Snohomish counties to make homelessness in these counties rare, brief, and one-time.
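The episode-identification step described above amounts to merging overlapping or back-to-back enrollment intervals. Here is a minimal sketch of that idea, assuming enrollments are (start, end) date pairs; the function name and the `gap_days` parameter are illustrative, not the team's actual code:

```python
from datetime import date, timedelta

def merge_into_episodes(enrollments, gap_days=0):
    """Merge per-program enrollments (start, end) into episodes.

    Back-to-back or overlapping enrollments (within `gap_days`
    of each other) are treated as one continuous episode.
    """
    if not enrollments:
        return []
    spans = sorted(enrollments)           # order by start date
    episodes = [list(spans[0])]
    for start, end in spans[1:]:
        last = episodes[-1]
        if start <= last[1] + timedelta(days=gap_days):
            last[1] = max(last[1], end)   # extend the current episode
        else:
            episodes.append([start, end])  # start a new episode
    return [tuple(e) for e in episodes]
```

With `gap_days=0`, an enrollment starting the day another ends (back-to-back) joins the same episode; a larger gap tolerance would merge near-adjacent enrollments as well.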

7.
Common trajectories lead to different outcomes:
• a successful exit from an episode means the family found a permanent housing solution
• a proportion of these families still receive government subsidies
• other exits lead back into homelessness, or to other, unknown destinations
Novel Analyses of Family Trajectories through Programs: an example using Pierce County data
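One way to analyze how families transition between programs is to count program-to-program transitions across all trajectories. A minimal sketch, assuming each trajectory is an ordered list of program or exit-destination labels (the labels and structure here are illustrative):

```python
from collections import Counter

def transition_counts(trajectories):
    """Count program-to-program transitions across family trajectories.

    `trajectories` maps each family to its ordered sequence of program
    (or exit-destination) labels; the resulting counts are the edge
    weights one would feed to a Sankey-style transition diagram.
    """
    counts = Counter()
    for programs in trajectories.values():
        for src, dst in zip(programs, programs[1:]):
            counts[(src, dst)] += 1
    return counts
```

Aggregating trajectories this way also makes the outcome categories above countable: transitions ending in a "permanent housing" label are successful exits, those ending back in a shelter label are returns to homelessness.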

8.
How much time do you spend “handling data” as opposed to “doing science”?
Mode answer: “90%”
10/6/2017 Bill Howe, UW 8

9.
My research for 10 years:
Making it easier to work with large, noisy, heterogeneous datasets
• SQLShare: easier-to-use databases
• Myria: easier-to-use scalable systems
• These worked great in the physical sciences
• But social, health, and civic colleagues have stricter requirements…

12.
“Should I be afraid of risk assessment tools?”
“No, but you gotta tell me a lot more about yourself. At what age were you first arrested? What is the date of your most recent crime?”
“…and what was the culture of policing in the neighborhood in which I grew up?”
Technical.ly, September 2016: “Philadelphia is grappling with the prospect of a racist computer algorithm”

13.
The way I think about this… (1)
First decade of Data Science research and practice:
What can we do with massive, noisy, heterogeneous datasets?
Next decade of Data Science research and practice:
What should we do with massive, noisy, heterogeneous datasets?

14.
The way I think about this… (2)
Decisions are based on two sources of information:
1. Past examples
e.g., “prior arrests tend to increase likelihood of future arrests”
2. Societal constraints
e.g., “we must avoid racial discrimination”
10/6/2017 Data, Responsibly / SciTech NW 14
We’ve become very good at automating the use of past examples
We’ve only just started to think about incorporating societal constraints

15.
The way I think about this… (3)
How do we apply societal constraints to algorithmic decision-making?
Option 1: Rely on human oversight
Ex: The EU General Data Protection Regulation requires that a human be involved in legally binding algorithmic decision-making
Ex: The Wisconsin Supreme Court says a human must review algorithmic decisions made by recidivism models
Issues: scalability and human prejudice
Option 2: Build systems to help enforce these constraints
This is the approach we are exploring
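For a system to enforce a societal constraint, the constraint must first be made measurable. A minimal sketch of one common operationalization, the demographic parity gap (the metric choice and function name are illustrative, not a specific system from this talk):

```python
def demographic_parity_gap(decisions, groups):
    """Largest difference in positive-decision rate between any two groups.

    A constraint like "avoid discrimination" becomes checkable once it
    is operationalized, e.g. "every group's rate of positive decisions
    must be within some tolerance of every other group's."
    """
    totals, positives = {}, {}
    for decision, group in zip(decisions, groups):
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + (1 if decision else 0)
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)
```

A system could then flag (or refuse to deploy) a model whose gap exceeds a chosen tolerance; demographic parity is only one of several competing fairness definitions, which is part of why this is a research question rather than a solved problem.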

17.
Closing thoughts…
• WA State has an opportunity to play a leadership role in legislation around algorithmic bias, fairness, accountability, and transparency
• We have the private and public tech expertise, the community engagement, and the political will to address this issue directly.
• If we let the technology guide the policy, we’re in trouble.