<h1>Speech transcript, 19 July Cranlana Programme: ‘How do we build ethical machines?’</h1>

<p><em>On 19 July 2017 I gave a speech about data ethics to the Cranlana Programme, which was broadcast as part of ABC Radio National’s ‘Big Ideas’. You can listen to the full conversation on ABC Radio National <a href="http://www.abc.net.au/radionational/programs/bigideas/ethical-machines/8738670">here</a>. A transcript of the prepared speech (before conversation with the room) is below.</em></p>
<p>Google thinks that I like American football.</p>

<p>Google also thinks that I like combat sports and the blues, but it’s American football that confuses me most.</p>
<p>Who here has a gmail account? Have you ever opened your ad settings? You can see how ads for you are personalised, based on the kinds of things you search for via Google and watch on YouTube. And you can change them, or turn off personalisation entirely.</p>
<p>Google has correctly guessed that I like pop music, sci-fi and - although I’m ashamed to admit it - celebrity gossip.</p>

<p>But have I been sleep-searching American football?</p>
<p>See, Google’s algorithm has assembled a figure representing me - a kind of data shadow - based on the data it has access to about me. Of course Google doesn’t know everything about me but it’s had a go at figuring me out.</p>
<p>Maybe Google thinks I like American football because I genuinely do like Friday Night Lights, an American TV series set in a small town in Texas that revolves around American football. And maybe Google’s algorithm can’t tell the difference between liking a show about fictional American football and liking the real thing.</p>
<p>In this context - showing me ads that I never click on anyway - it doesn’t really bother me. It does capture rather beautifully, though, the difference between my data shadow as Google interprets it to be and what the data actually reveals about me.</p>
<p>We use the term ‘artificial intelligence’ to describe machines that can perform tasks and make decisions that we used to think only people could make. While AI has been around in various forms for decades, the kinds of tasks and decisions artificial intelligence can make are quickly becoming more sophisticated and widespread - and it’s because of data.</p>
<p>Enormous, endlessly expanding, diverse, messy data.</p>
<p>Many of our interactions take place online now. We sign up to loyalty programmes and browse the Web. We pay our bills electronically and research symptoms online. We buy smart fridges and smart TVs. Sensors, mobile phones with GPS and satellite imagery capture how we move through the world. And our online lives leave thick data trails.</p>
<p>Data is powering automated cars, trains and planes. Automated systems learn from data to make lots of different kinds of decisions: about what we might like to buy online, when we could be at risk of getting sick. They decide who our potential romantic partners might be. The insurance premiums we get. The news we’re exposed to.</p>
<p>The rapid advances in AI have been exhilarating for some and disturbing for others. For me, it’s a bit of both.</p>
<p>Tonight I want to talk about three themes: <strong>access, control and accountability</strong>. Because within the question ‘how can we build ethical machines?’ lie profound structural and historical choices about data - how it is collected, who has access to it, and how it is used (or misused) - that need to be unpacked.</p>

<p>Because data, if you like, is what gives AI life. It makes AI smarter. You can’t build smart machines without it.</p>
<p>And so we need to ask questions like:
Who has access to data? Who collects enormous data sources? What kind of organisations? And what responsibilities should they have? Do we as people have the ability to control and question automated decisions made about us? And, who gets held accountable when a machine gets it wrong?</p>
<p><strong>Because things can go wrong.</strong><br />
There’s lots of stories about AI getting into trouble.</p>
<p>The social media chatbot that quickly becomes horrifically sexist and racist. The Google Photos update that accidentally labelled photos of black people as gorillas. The camera that recognises images of Asian faces as people blinking.</p>

<p>These kinds of glaring problems are typically picked up quickly. But sometimes the issues involved in training AI out of bias and prejudice can be more insidious, and more troubling.</p>
<p>Joy Buolamwini, a computer science researcher at the MIT Media Lab in the US, has spoken about issues she’s had as a researcher getting robots to interact with her: to recognise her face, to play peekaboo.</p>
<p>But when Joy, who is black, puts a white mask over her face, the robots can see her.</p>
<p>The problem here is poor data being used to train a robot about what faces look like.</p>
<p>Facial recognition software learns faces from big datasets of images of faces. If the images in what is called your ‘training data’ aren’t diverse, then the software doesn’t learn to recognise diverse faces.</p>
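<p>A minimal sketch of the kind of audit that can surface this problem before training - counting how demographic labels are distributed across a face dataset. The file name, column name and labels below are invented for illustration, not taken from any real system:</p>

<pre><code class="language-python"># Sketch: audit the demographic balance of a face-recognition training set.
# "faces.csv" and its "skin_tone" column are hypothetical placeholders.
from collections import Counter
import csv

with open("faces.csv", newline="") as f:
    labels = [row["skin_tone"] for row in csv.DictReader(f)]

counts = Counter(labels)
total = sum(counts.values())
for group, n in counts.most_common():
    print(f"{group}: {n} images ({n / total:.1%} of the training data)")
</code></pre>

<p>If one group accounts for only a sliver of the images, the software will predictably perform worse on that group - which is the problem Joy ran into.</p>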
<p>A bit like humans really. AI is shaped by its environment just as we are. It’s impressionable. And so we need to take care not to encode biases within machines that we’re still wrestling with as humans.</p>
<p>In 2016, the first international beauty contest judged by AI - and which promoted itself as analysing ‘objective’ features like facial symmetry and wrinkles - identified nearly all white winners.</p>
<p>In the US, sentencing algorithms are being developed to predict the likelihood that people convicted of crimes will reoffend, and to adjust sentences accordingly. One of these algorithms was found to falsely flag black defendants as future criminals at twice the rate of non-black defendants.</p>
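<p>At its core, that finding is a difference in false positive rates between groups - something that can be measured directly. This is not the study’s actual code; it’s a rough sketch on hypothetical data, with file and column names I’ve made up:</p>

<pre><code class="language-python"># Sketch: compare a risk model's false positive rate across groups.
# "predictions.csv" is hypothetical: one row per defendant, with the
# model's flag (1 = predicted to reoffend) and the actual outcome.
import pandas as pd

df = pd.read_csv("predictions.csv")  # columns: group, flagged, reoffended

# False positive rate: the share of people who did NOT reoffend
# but were flagged as future criminals anyway.
did_not_reoffend = df[df["reoffended"].eq(0)]
false_positive_rate = did_not_reoffend.groupby("group")["flagged"].mean()
print(false_positive_rate)
</code></pre>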
<p>It’s not just race either: researchers from Carnegie Mellon University have discovered that women are significantly less likely than men to be shown ads online for high paying jobs.</p>
<p>In one machine learning experiment helping AI make sense of language, words like “female” and “woman” were closely associated by the AI with arts and humanities and with the home, while “man” and “male” were associated with science and engineering.</p>

<p>In that experiment, the machine learning tool was trained on what’s called a “common crawl” corpus: 840 billion words of material published on the Web.</p>
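<p>This kind of association is straightforward to reproduce. The sketch below is not the researchers’ code: it uses a smaller pretrained word-vector set that gensim can download (my substitution, for convenience), so exact numbers will differ from the Common Crawl study:</p>

<pre><code class="language-python"># Sketch: measure gendered associations in pretrained word vectors.
# Uses gensim's downloadable GloVe vectors (trained on Wikipedia) as a
# smaller stand-in for the 840-billion-word Common Crawl corpus.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")  # downloads on first run

career = ["science", "engineering", "career", "salary"]
home = ["home", "family", "children", "kitchen"]

def mean_similarity(word, attributes):
    # Average cosine similarity between one word and a set of attribute words.
    sims = [vectors.similarity(word, attr) for attr in attributes]
    return sum(sims) / len(sims)

for word in ["man", "woman"]:
    print(word,
          "career:", round(float(mean_similarity(word, career)), 3),
          "home:", round(float(mean_similarity(word, home)), 3))
</code></pre>

<p>If the published finding holds, “man” should sit measurably closer to the career terms and “woman” closer to the home terms.</p>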
<p>Training AI on historical data can freeze our society in its current setting, or even turn it back.</p>
<p>If women aren’t shown advertisements for high paying jobs, then it will be harder for women to actually apply for high paying jobs. There’ll be fewer women in high paying jobs.</p>
<p>Robots that struggle to read emotions on non-white faces will only reinforce the experiences of otherness, of invisibility, that can already be felt by racial minorities in western societies.</p>
<p>The extent to which a person or an organisation can be held responsible for a machine that is racist or sexist is a question coming up a lot in AI debates.</p>
<p>On the one hand, there’s a fairly straightforward answer: people designing AI need to be accountable for how AI could hurt people. The hard part with AI can sometimes be figuring out when harm could reasonably have been prevented.</p>
<p>The creeping, quiet bias in data and AI can be hard to pin down. I have no idea if I’m not being shown ads for high paying jobs <em>because I’m a woman</em>. I don’t know what I’m not being shown.</p>
<p>As AI becomes more sophisticated, and depending on the technique being used, it can be hard for the people who have designed an AI to figure out why it makes certain decisions. It evolves and learns on its own.</p>
<p>Take my american-football loving data shadow from Google.</p>
<p>I don’t know how Google’s algorithm actually works, even though I can see all of the data being used to guess (because Google’s actually pretty transparent about it). And what’s weird is, of all of the topics Google thinks I like, there are none related to technology or data or AI. And yet every day - I can see in the data - it’s technology and data related stories that I’m looking at online.</p>
<p>Maybe the algorithm has deduced, from the frequency of my data-related searches, that data is my job - and so not something I “like”.</p>

<p>Or maybe it has based its assumptions about what I might like more on my gender and age than on what I actually search for. I don’t know what’s being weighted. I don’t really have a way of asking Google whether they can explain it either.</p>
<p><strong>What does ‘control’ mean - who can ask questions - in an age of machines?</strong></p>
<p>In the United States a class action lawsuit has been underway for two years about cuts that have been made to Medicaid assistance for people with developmental and intellectual disabilities.</p>
<p>The decisions about where cuts would fall were based on a closed data model. When lawyers representing people affected by the cuts asked to see how the data model worked, the Medicaid program came back and said, “we can’t tell you that. It’s a trade secret.”</p>
<p>In California a defendant was jailed for life without parole in a case in which the prosecution relied on the results of a piece of software that analysed DNA traces at crime scenes.</p>
<p>When expert witnesses for the defendant asked to see the source code for the software, the developer refused, saying the code was a trade secret.
The language and expectations of business are increasingly intertwined with government when it comes to AI. A “trade secret” is something we understand from the commercial world.</p>
<p>But when should it be ok to refuse someone the information they need to exercise their democratic right to an appeal, because the algorithm being used is a “trade secret”?</p>
<p>Partnering with private sector organisations to deliver automated, predictive public services is becoming a necessity for government. We don’t have clear expectations of the nature of those relationships: who owns the AI being developed using public funding; who should have control over and access to data used by the AI; and what our democratic rights are to understand and control how automation, algorithms and artificial intelligence shape our interactions with government.</p>
<p>We need to have this discussion in Australia. Just this year, as well as the much-covered Centrelink debt recovery programme, the government has also announced investments in predictive systems to identify welfare recipients for drug testing and - just last week - to identify ‘at risk’ gamblers online.</p>
<p>When Centrelink began sending automated debt notices over Christmas in 2016, it became front page news and the subject of a Parliamentary inquiry. The data model had flaws. The systems surrounding its implementation had flaws.
The data matching process at the heart of Centrelink’s debt recovery programme wasn’t new. Automating the process simply exposed existing flaws and scaled them up with devastating effect.</p>
<p>Access to data is power. If you’re a startup, a business, a researcher, or a government department building AI, you need access to high quality data sources.</p>
<p>And if you’re someone on the receiving end of an automated decision, not having ready access to data to challenge it with immediately puts you in a less powerful position.</p>
<p>In the Centrelink case, the only way to challenge a decision was to check the model’s working yourself – to submit data about your employment and payslips that might expose an error. How accessible to you are your employment histories as data? Not the snippets, the payslips and documents. Your employment details as data that can be interpreted by a machine.</p>
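<p>It’s worth spelling out what “employment details as data” might look like. A sketch - every field name below is invented for illustration, not a schema any agency actually publishes:</p>

<pre><code class="language-python"># Sketch: the same payslip as machine-readable data rather than a scan.
# All field names here are hypothetical.
import json

payslip = {
    "employer": "Example Pty Ltd",
    "period_start": "2016-11-01",
    "period_end": "2016-11-14",
    "gross_pay_aud": 2450.00,
    "tax_withheld_aud": 610.00,
}

# Structured records like this could be aggregated and checked against a
# debt model's assumptions automatically; a scanned PDF cannot.
print(json.dumps(payslip, indent=2))
</code></pre>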
<p>As more and more services are automated - applying for a home loan, getting health insurance - having access to our own data, or the ability to entrust it to someone else, will become increasingly important.
The world we live in now is shaped by information flows and information hierarchies. And there’s a trend emerging in the machines being built for tomorrow.</p>
<p>Automation is disproportionately affecting already vulnerable and marginalised people. We’re at risk of entrenching - making permanent - existing structural inequalities.</p>
<p>In this new age of machines our power structures might look a little different at the top - tech and online giants replacing mainstream media giants - but it’s the same people left excluded and even more marginalised at the bottom.</p>
<p><strong>The good news is that while there are challenges, there are also great possibilities.</strong></p>
<p>At the same time we’re wrestling with these challenges, systems are being developed to try to address some of the issues of bias and under representation we struggle with in society.</p>
<p>Take recruitment. Challenges addressing gender and racial bias in recruiting processes have been well documented.</p>
<p>Today a range of tools are being developed which try to reduce that particular aspect of recruitment bias. One UK based startup, Applied, offers gendered language detection in job descriptions and blind application scoring.</p>
<p>Historically in medical research, treatments that have been developed tend to be most suitable for middle-aged men. That’s because men are overrepresented in Australian clinical trials. Women make more difficult clinical trial participants because we menstruate. The impacts of drugs and other treatments are rarely tested on pregnant women at all.</p>
<p>Now, we have access to data about how people respond to treatments beyond expensive clinical trials. We have digitised scans, x-rays, blood tests, DNA histories. We have smart devices and mobile applications tracking symptoms and reactions in real time that we can use to devise fairer treatments for everyone, with the right data security mechanisms in place.</p>
<p>Artificial intelligence is being used to support and protect marginalised communities. In the UK, volunteers are teaching AI to spot potential slavery sites using satellite imagery - South Asian brick kilns, which are often the site of forced labour.</p>
<p>But when we see and hear stories about how data is being misused and abused, and driving bad automated systems, it makes it harder to have meaningful conversations about these kinds of possibilities. It makes it harder to trust.</p>
<p>A lack of trust is bad for business and bad for government. The economics are rubbish. When trust is low, investment is low and innovation is harder.
But the issues we’re dealing with in AI aren’t new issues.</p>
<p>Statisticians, scientists and social researchers have always worked within guidelines for managing data responsibly and reducing bias. Issues around bias and prejudice in decision making aren’t new either - society’s reckoning with them is reflected in our anti-discrimination laws, our employment laws, our consumer rights laws.</p>
<p><strong>What we need for this next machine age is a systems update.</strong></p>
<p>People and organisations around the world are designing ways to handle data ethically, to build ethical machines and drive a fairer future for everyone.</p>
<p>Sage Bionetworks, a non-profit research organisation in the US is developing design solutions for data sharing and consent - meeting people where they live with the ethics, not just the technology. And they’re building massive, intentionally diverse health datasets for future use as training data.</p>
<p>The Open Data Institute is developing a data ethics canvas to help teams work through the risks and potential impacts of data projects. The ODI has also been leading conversations in the UK and Europe about how openness can help organisations build trust.</p>
<p>Elon Musk is one of the sponsors of a non-profit called OpenAI, committed to researching and promoting AI safety. Just last week Google launched PAIR: the People + AI Research Initiative to study how humans interact with AI.</p>
<p>In New York, AI Now, an initiative co-founded by Australian researcher Kate Crawford, was recently launched to study the social impacts of AI.</p>
<p>There is a gap though. It’s a knowledge gap that exists between people working on AI-related issues and our senior leaders who make decisions about where AI should be deployed.</p>
<p><strong>We don’t all need to become machine learning experts.</strong>
We don’t need to know how to build a car engine from scratch to know when it’s at risk of breaking down. We have lights that flash on our dashboards, we have smells and sounds that trigger warnings. We understand some of the basic things that keep our cars healthy, and we learn how to respect others on our roads.</p>
<p>We do all need to develop a basic awareness of AI warning signs (dodgy data, unreasonable secrecy about how systems work, over-reliance on automated results over common sense) - the bad smells.</p>

<p>And organisations designing AI systems, or debating their role within different sectors, <strong>need to develop the dashboard warnings, the indicators, to help people investing in AI check for errors before pressing the accelerator.</strong></p>
<p>We need to give senior decision makers, our politicians and leaders, the skills and information they need to ask the right questions. To follow their noses. To know when AI stinks.</p>
<p>There are also broader policy questions to be debated about what a healthy AI ecosystem looks like, and how it should be regulated. This is where I return to those three themes that will shape the evolution of our AI systems and who gets to benefit from them: access, control and accountability.</p>

<p>Data privacy is no longer the biggest challenge we’re facing - we have other challenges, like data monopolies. Technology giants like Google, Facebook and Amazon are sitting on enormous stores of data about billions of people and acquiring artificial intelligence startups quickly.</p>
<p>We talk about accessing data held outside government for national security purposes, but what about for public interest purposes? Healthcare, transport planning? How do we generate competitive AI economies when who holds data holds the power? And what controls do we put around this?</p>
<p>When we talk about a dystopian future in which man is slave to machines, we tend to have these images of beings with super intelligence and super strength.</p>
<p>I’m more worried about stubborn, short-sighted AI who can’t distinguish me from my data shadows. Who will not listen, can’t be argued with and can’t be changed. Who respond to every request with “computer says no”.</p>
<p>The control we retain as humans - to appeal, to challenge, to choose - will determine the power structures in this new age of machines.</p>
<p>It’s the organisations designing and implementing AI now who will determine the controls we have. What are their responsibilities? How should they be held accountable for systems that make unethical or simply inaccurate decisions?</p>
<p>Access, control, accountability. How we apply these concepts to AI now will shape our future. We can’t simply ignore the bad smells. But we also can’t throw the keys away, halt development. There are risks and questions to be worked through, but there’s also opportunities for AI to be used in genuinely powerful ways to improve our lives.</p>
<p>So. Take a moment, clear your nose. And let’s work on that sense of smell of yours.</p>
<p>Thank you.</p>

<h1>Encouraging women studying in STEM: why I want universities to publish open data</h1>

<p><strong>I almost failed statistics last semester.</strong></p>
<p>The experience reignited old insecurities about being bad at maths. But it also made me realise just how little I actually knew about the unit I was doing: how people had performed in the past; how women were going with it; whether others were struggling with the content or the teaching approach - to help put my insecurities in perspective.</p>
<p>Without any of this information, I filled the vacuum with a nagging sense of self-doubt.</p>
<h2 id="universities-cough-up-your-data">Universities, cough up your data</h2>
<p>This year I’ve been balancing work alongside a masters in applied data analytics - a mix of computer science, statistics and social research units. My experiences so far have me thinking about whether universities should be compelled to make public - or at least share with students - certain kinds of data (a sketch of how the first item below could be computed follows the list). Data like:</p>
<ul>
<li>
<p><strong>The performance of past unit cohorts - median and mean marks of students broken down by gender and by tutor.</strong> This could help people, especially women, understand their individual performance in context. It could also help to identify in STEM where women might be, on average, under performing or dropping out of units, and figure out why.</p>
</li>
<li>
<p><strong>Aggregated results from student feedback regarding unit content and learning experience.</strong> We fill out these surveys for every unit, but the information they gather about teaching quality is never made available. Understanding how past students have rated a unit - even if it’s at a high level - can help students make informed decisions about units to enroll in.</p>
</li>
<li>
<p><strong>Basic contextual data about cohort composition.</strong> Non personal data about things like total students enrolled in a unit, gender breakdown, whether domestic or international student. This kind of information helps to make sense of cohort averages and trends.</p>
</li>
</ul>
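<p>Producing the first item on that list is a few lines of code once the raw results exist. A rough sketch, assuming a hypothetical table with one row per student (the file and column names are mine, not any university’s):</p>

<pre><code class="language-python"># Sketch: cohort performance summary from a hypothetical results table.
# Assumed columns: unit, gender, tutor, mark (0-100).
import pandas as pd

results = pd.read_csv("unit_results.csv")

summary = (results
           .groupby(["unit", "gender", "tutor"])["mark"]
           .agg(students="count", median_mark="median", mean_mark="mean")
           .reset_index())

# Suppress small groups so no individual student is identifiable
# (the threshold of 20 is a judgment call, not a standard).
summary = summary[summary["students"].ge(20)]

print(summary.to_string(index=False))
</code></pre>

<p>Publishing even this aggregated view, with small cohorts suppressed, would have answered most of the questions I was asking while staring at my own result.</p>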
<p>I’ve already had arguments with academic friends about why this couldn’t work - because academic performance is measured in terms of publications and research money, not teaching quality. Because all of the major university rankings focus primarily on research, and so universities aren’t ultimately very interested in teaching. Because publishing data about poor quality units would make lecturers feel bad. Because one university couldn’t do it alone; all universities would need to.</p>
<p>I’ve thought a lot about the reasons I’ve heard for not publishing data, and I think my arguments in favour of publishing data are better.</p>
<p>To start with, students are paying more for university degrees every year. Maybe teaching quality should be more of a concern. Having greater visibility of unit quality and performance could drive more focus on teaching outcomes, and even increase competition between and within universities attracting students. It provides a mechanism for universities to distinguish themselves other than through research rankings.</p>
<p>Without open data, it’s also hard to get a handle on issues like whether women are underperforming in STEM subjects in Australia; whether they’re more likely to drop out of STEM units; and why. The data can be collated, but it’s not reflected in summary statistics published by entities like the Department of Education or Universities Australia, and universities don’t individually report on it either.</p>
<p>There’s not much incentive for universities to share data about the performance of women as part of student cohorts, or to share data about student performance and teaching quality generally. It could expose weaknesses. It could expose systemic issues with certain teachers or units. But as US artist Mimi Onuoha has <a href="http://mimionuoha.com/thoughts/">said</a>, <strong>for every dataset where there’s an impetus for someone not to collect it, there’s a group of people who would benefit from its presence.</strong></p>
<h2 id="weeding-out-bad-stem-experiences">Weeding out bad STEM experiences</h2>
<p>My results so far in my masters have been good. I’m sitting on 90% for one applied research unit, and received 78% for my relational database design unit. But when I got to statistics, I flopped.</p>
<p>My statistics experience ticked a lot of the stereotypes about STEM. Of seven tutors, one was female. The unit coordinator was male. Throughout the unit, it wasn’t uncommon for my tutor and the lecturer to say things like, “if you’re not getting this yet, this subject may not be for you” and “if you’re finding this difficult, you will find every unit that follows it difficult”. The implication being: if you’re struggling it’s because you’re ‘not good’ - not because of the teaching approach.</p>
<p><a href="http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0157447">Research has shown</a> that women are less likely to be confident about their maths and science ability, and that they’re more likely to drop out of STEM subjects because of it.</p>
<p>I saw women on multiple occasions crying during lectures. Once when I went to see the unit coordinator about a particularly thorny practice exam problem, a woman waiting outside asked me anxiously how I was going, and confessed that she was dropping out. When I failed the mid semester exam, I cried too. My poor performance was evidence that I was bad at maths.</p>
<p>What I didn’t know then was that I was one of about 350 people who failed the mid-semester exam. It felt like it was just me. We were told the median mark (54%) and the average mark (53%), but I had no sense of scale - no sense that I was one of over 700 people enrolled in the unit, and that roughly half the cohort had failed with me.</p>
<p>After results were distributed, rumours also started flying around about how much ‘harder’ the unit had become since this particular coordinator had taken it over. How marks were always poor, and were ultimately heavily scaled. These were all rumours, but they reduced my sense that I, uniquely, was terrible at statistics.</p>
<p>In parallel on Twitter a range of brilliant people working in web development, data science and senior leadership positions shared with me their own experiences of maths failures, of repeating units at university (sometimes more than once!).</p>
<p>The more information I had to put my own mid-semester exam result in perspective, the less I felt like I was innately bad at maths. My confidence improved. I worked doubly hard in preparation for the final exam and passed the unit comfortably.</p>
<p>Open data about past and present cohort experiences - their results, their feedback on unit content and tutors - isn’t just good for improving teaching quality. It can help people put their own performance in context, and see that often their struggle is a shared one. It can help to expose structural and cultural barriers to participation.</p>
<p>I had a bad experience with one particular unit, but I also know lots of lecturers and unit teams who are passionate about improving learning outcomes (my relational database unit through the computer science faculty in contrast was great!). They use data like student feedback to improve unit content and methods of delivery. They’re developing tools to help identify students who might be struggling and provide better support.</p>
<p>The problem is, these kinds of practices aren’t evenly distributed. And in the subject areas where we say we care a lot about increasing diversity (like maths and science), our expectations of teaching standards should be high. Greater transparency can only help to improve teaching across the board and provide students with better informed choices overall.</p>

<h1>In the Guardian: Medicare breach (it just got a whole lot harder to trust government with our data)</h1>

<p>I wrote for The Guardian about the kind of access to my medical records I wish I had, and why the Medicare data breach makes it harder to believe my wish could be fulfilled. You can read it <a href="https://www.theguardian.com/commentisfree/2017/jul/05/the-medicare-data-breach-proves-the-government-cant-be-trusted-with-our-data">here</a>.</p>

<h1>The Consequences of Dodgy Data Decisions</h1>

<p>The Australian government just announced its intention to begin drug testing trials of certain welfare recipients, as part of its 2017 federal budget.</p>
<p>Whether you agree with the policy or not, how welfare recipients will be identified for drug testing has implications for us all. More reflections <a href="https://medium.com/@ellenbroad/the-consequences-of-dodgy-data-decisions-ce85c5432159">here</a>.</p>

<h1>The Great Untangle: reading the final report of the Australian Productivity Commission on Data Availability and Use</h1>

<p>Today the Australian government released the Productivity Commission’s final report on Data Availability and Use, an ambitious plan for data reform in Australia. I look at some of the headline recommendations in a Medium post <a href="https://medium.com/@ellenbroad/untangling-the-final-report-of-the-australian-productivity-commission-on-data-availability-and-use-74927b2e0885">here</a>.</p>

<h1>Wrestling with the ‘e’ word: data ethics and Unroll.me’s data selling woes</h1>

<p>Recent news that Unroll.me — a popular tool for organising subscription emails — sold their users’ anonymised email data to Uber sparked a mass exodus of Unroll.me users.</p>
<p>I’m an Unroll.me user, and I’ve been following the public reaction to their data commercialisation model with interest. I wrote about how our perceptions of ‘ethical’ data use can change on Medium <a href="https://medium.com/@ellenbroad/wrestling-with-the-e-word-data-ethics-and-unroll-me-s-data-selling-woes-1a4307d212c6">here</a>.</p>

<h1>We need to talk about data ethics, Australia</h1>

<p>Over Christmas I went offline and caught up on some reading.</p>
<p>One of the books on my reading pile was <a href="https://www.amazon.com/Weapons-Math-Destruction-Increases-Inequality/dp/0553418815" title="Buy Weapons of Math Destruction by Cathy O Neil">Weapons of Math Destruction, by Cathy O’Neil</a>. O’Neil, a mathematician turned quant turned data scientist, writes about the bad data models increasingly being used to make decisions that affect our lives.</p>
<p>It was while I was reading Weapons of Math Destruction that news emerged of issues with Centrelink’s automated data matching program in Australia.</p>
<p><img src="/assets/images/centrelink.jpeg" alt="Centrelink Australia" /></p>
<figcaption class="caption">Photo by Tracey Neamy AAP</figcaption>
<p>In 2016 the Department of Human Services <a href="http://www.abc.net.au/news/2017-01-04/centrelink-debt-recovery-system-designed-by-dunderhead-wilkie/8160990" title="debt recovery system not working">automated its processes for matching welfare recipients’ reported income</a>, as part of government efforts to reduce overpayments to welfare recipients and recover debt.</p>

<p>Since July last year the automated system has produced around 169,000 debt notices, of which one in five - roughly 34,000 notices - have been confirmed to have been sent in error. News broke over Christmas and into early January this year of people receiving erroneous debt notices, sometimes for amounts in the thousands of dollars.</p>
<p>Reading Weapons of Math Destruction as the Centrelink stories emerged has me thinking about how we identify ‘bad’ data models, what ‘bad’ means, and how we mitigate unfairness - stress, damage, errors and inaccuracies - to the people they affect.</p>
<p>In this blog, I ask what role data ethics might play in the design of data models which stand to impact on people’s lives.</p>
<p>I haven’t seen ethics come up yet in the context of Centrelink data developments, which has surprised me. How could taking an ethics-based approach to data projects help to mitigate harm? What ethical frameworks exist for government departments in Australia undertaking data projects like this?</p>
<h2 id="weapons-of-math-destruction">Weapons of Math Destruction</h2>
<p>There are different ways in which a data model can be ‘bad’. It might be overly simplistic. It might be based on limited, inaccurate or old data. Its design might incorporate human bias, reinforcing existing stereotypes and skewing outcomes.</p>
<p>A bad data model spirals into a weapon of math destruction when it’s used en masse, is difficult to question, and damages people’s lives. O’Neil’s book is dedicated to examples of weapons of math destruction in banking, education, employment, the justice system and more. Peter Martin summarises O’Neil’s book in his article <a href="http://www.smh.com.au/comment/how-centrelink-unleashed-a-weapon-of-math-destruction-20170105-gtmsnz.html" title="Centrelink unleashes weapon of math destruction">here</a>.</p>
<p>To some extent, we’re all exposed to data models making assumptions about us - targeting ads to us online, assessing our insurance premiums and our polling preferences.</p>
<p>But weapons of math destruction tend to hurt vulnerable people most. They might build on existing biases (e.g. <a href="https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing">because you are black</a> you are more likely to reoffend, or <a href="http://www.consumerreports.org/cro/car-insurance/credit-scores-affect-auto-insurance-rates/index.htm" title="credit scores affect car insurance">because you have a bad credit rating you will be more likely to have car accidents</a>). Errors in the model might have starker consequences for people without a social safety net. And some people are likely to find it harder to question and challenge the data model making decisions about them.</p>
<p>O’Neil’s book shows how bad data models can become weapons of math destruction. It doesn’t spend much time, though, on ways in which these weapons of math destruction can be prevented and mitigated.</p>
<p>Data matching and data-driven decision making are only going to increase over time. Centrelink isn’t alone in moving to automate processes, and recovering debt is a perfectly legitimate government objective. (In the UK, the May government is close to introducing a <a href="http://www.publications.parliament.uk/pa/bills/lbill/2016-2017/0080/lbill_2016-20170080_en_1.htm" title="Digital economy bill">new data sharing power</a> for recovering debt owed to the public sector as part of the Digital Economy Bill.)</p>
<p>The premise of a data model may not be ‘bad’; but issues can arise in how it is designed, its capacity for error and bias, and how badly people could be impacted by that error or bias.</p>
<h2 id="helping-organisations-make-good-data-driven-decisions">Helping organisations make good data-driven decisions</h2>
<p>We need more ways to help data scientists and policymakers navigate the complexities of data projects that involve personal data, and which can impact on people’s lives. Regulation has a role to play. Data protection laws are being reviewed and updated around the world (in Australia, the <a href="https://medium.com/@ellenbroad/early-thoughts-on-the-australian-productivity-commissions-draft-data-sharing-report-c77637ca0fa5" title="early thoughts on productivity commission data sharing report">draft Productivity Commission report on data sharing and use</a> recommends the introduction of new ‘consumer rights’ over personal data). And bodies like the Office of the Information Commissioner help organisations understand if they’re treating personal data in compliance with personal data principles, and promote best practice.</p>
<p>Guidelines are also being produced to help organisations be more transparent and accountable in how they use data to make decisions. The Open Data Institute in the UK has developed <a href="https://theodi.org/guides/openness-principles-for-organisations-handling-personal-data" title="open data institute openness principles">openness principles for organisations managing personal data</a>, designed to build trust in how that data is stored and used. Algorithmic transparency is being contemplated as part of the EU Free Flow of Data Initiative, and has become a <a href="https://theconversation.com/we-need-to-know-the-algorithms-the-government-uses-to-make-important-decisions-about-us-57869" title="we need to know about algorithms the government uses for decisions">focus of academic study</a> in the US.</p>
<p><strong>But a data project can be legal, its processes transparent and accountable, and still be a ‘bad’ data model.</strong></p>
<p>There could be known errors in a model that, if left unaddressed, cause real harm to people. An organisation’s normal appeal processes might not be accessible or suitable for certain people who could be harmed by a model, e.g. the elderly, infirm, and those with limited literacy. It could be a data model within a sensitive policy area, where a higher duty of care exists to ensure data models do not reflect bias.</p>
<p>Ethics can help to bridge the gap between compliance and evolving societal expectations of what ‘fair’ and reasonable data usage is. In Weapons of Math Destruction, Cathy O’Neil describes data models as “opinions put down in maths”. Taking an ethics-based approach to data-driven decision making helps us confront those ‘opinions’ head on.</p>
<h2 id="data-ethics-frameworks-are-gaining-traction-australia-needs-to-get-on-board">Data ethics frameworks are gaining traction. Australia needs to get on board.</h2>
<p>Last year the UK Government accepted a <a href="http://www.computerweekly.com/news/4500272963/House-of-Commons-Science-and-Technology-Committee-calls-for-Data-Ethics-Council" title="calls for a data ethics council">recommendation</a> of the House of Commons Science &amp; Technology Committee that a Council of Data Ethics be established. The UK’s Alan Turing Institute, founded by government to lead on data science, is expected to house the Council of Data Ethics and has begun <a href="https://www.turing.ac.uk/events/the-ethics-of-data-science-the-landscape-for-the-alan-turing-institute/" title="event on the ethics of data science">exploring data ethics in earnest</a>.</p>
<p>The Cabinet Office released its first iteration of a <a href="https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/524298/Data_science_ethics_framework_v1.0_for_publication__1_.pdf">Data Science Ethical Framework for Government</a> in May 2016. And around the same time in the US, the White House published its <a href="https://www.whitehouse.gov/blog/2016/05/04/big-risks-big-opportunities-intersection-big-data-and-civil-rights">report on big data, civil rights and ethics</a>.</p>
<p>In the research and statistics professions, varying codes of ethics relating to data and information already exist. The Australian National Data Service has published <a href="http://www.ands.org.au/guides/ethics-consent-and-data-sharing" title="guide for researchers on data sharing, consent and ethics">ethics, consent and data sharing guidance for researchers</a>.</p>
<p>But data ethics doesn’t seem to have yet found its way into Australian government policy debates. There’s no ethical framework per se for government data projects, or an ethics council or committee to review potential high impact data projects.</p>
<p>Not all data projects require ethical review. And it’s important to note that ethics can encompass more than compliance with data protection principles and licence terms of use.</p>
<p>A data project might be tagged as ‘high impact’, and its model design subject to ethical review, if it stands to directly impact on more than x percent of the population. And by ‘directly’, I mean alter their day-to-day life.</p>
<p>Ethics frameworks can help us to put a data model in context and assess its relative strengths and weaknesses. Ethics can bring to the forefront how <strong>people</strong> might be affected by design choices made in the course of building a data model.</p>
<p>An ethics-based approach to data-driven decision making might have data science teams not only ask questions like,</p>
<ul>
<li>are we <strong>compliant</strong> with the relevant laws and regulation?</li>
<li>do <strong>people understand</strong> how a decision is being made?</li>
<li>do they have <strong>some control</strong> over how their data is used?</li>
<li>do they have <strong>mechanisms to appeal</strong> a decision made?</li>
</ul>
<p>But also</p>
<ul>
<li>in this context, <strong>who</strong> will be affected by the data model?</li>
<li>can the <strong>people affected make use of those appeal mechanisms</strong>?</li>
<li>have we taken steps to ensure as much as possible that <strong>errors, inaccuracies, bias in our data model have been removed</strong>?</li>
<li>what <strong>scale of impact</strong> could potential errors or inaccuracies have on people <strong>if included in the model</strong>? What is an acceptable margin of error?</li>
<li>have we clearly defined how this model will be used, and what <strong>decisions it can and can’t support</strong>?</li>
</ul>
<p>What is the ‘duty of care’ of a data scientist? Is ‘duty of care’ a concept that is readily applicable in data science? What would a data scientist’s ‘duty of care’ consist of?</p>
<p>There’s no baseline of debate right now to help us understand the parameters of reasonable and acceptable data model design. What’s considered ‘ethical’ changes over time, as we change, as technologies evolve and new opportunities (and consequences) emerge.</p>
<p>Ethics is messy and hard to pin down, which is why it can seem counterproductive to people who work with data every day. But I think a data ethics debate is worth having.</p>
<p>Bringing data ethics into data science reminds us that we’re human, and that our data models reflect design choices we make - choices with the potential to affect people’s lives, in good and bad ways.</p>

<h1>You are not as bad at STEM as you think you are</h1>

<p>This is a bit of a different post for me.</p>
<p>Next year I’m going to be starting a masters that is entirely computer science, statistics and applied analytics focused. But I don’t have a STEM background. I wrote about my experiences in STEM and realising I wasn’t as bad at maths as I thought I was <a href="https://medium.com/@ellenbroad/youre-not-as-bad-at-stem-as-you-think-you-are-a8727450d0f6#.hapcwuco4" title="you're not as bad at STEM as you think you are">here</a>.</p>

<h1>How to make ‘smart’ cities work for everyone</h1>

<p><em>Last week I gave a lunchtime lecture at the Open Data Institute Queensland (ODIQ). Below is a rough sketch of what I said. You can find slides <a href="http://www.slideshare.net/secret/gf0Gh8ZoXTuPK7" title="slides from presentation">here</a>.</em></p>
<p><em>This post originally appeared on the <a href="http://queensland.theodi.org/2016/11/29/1033/" title="making smart cities work for everyone blog">ODI Queensland website</a>.</em></p>
<p>When Maree (CEO of ODIQ) asked me to send through a summary for this lunchtime lecture, it was the day after Donald Trump won the US election.</p>
<p>We’d agreed on smart cities as a theme weeks in advance. That day, surfing waves of commentary and trying to understand the trends that shaped that election, I wondered whether smart cities were being designed for the benefit of all citizens or only a few.</p>
<p>Today I want to recast the smart cities narrative to place people - not technology - at its centre. I want to talk about the benefits of data and openness: open data, open source, open standards. And I want to reflect on the human experience of smart city solutions.</p>
<p>In this talk I ask myself three questions:</p>
<ul>
<li><strong>What makes a city ‘smart’?</strong></li>
<li><strong>What role does data play in a smart city?</strong></li>
<li><strong>Are we empowering or dividing people?</strong></li>
</ul>
<h2 id="what-is-a-smart-city">What is a ‘smart’ city?</h2>
<p>I’m going to start with some definitions because they shape how a policy maker or an organisation approaches smart city planning and investment. There’s lots of definitions out there, but we’re only looking at a few today.</p>
<p>The global industry coalition <a href="http://smartcitiescouncil.com/article/about-us-global" title="Smart City Council website">Smart City Council</a> says that a smart city:</p>
<blockquote>
<p>Uses information and communications technology (ICT) to enhance its livability, workability and sustainability.</p>
</blockquote>
<p>I don’t like this one because it leads with technology. To me, it implies that ICT will always help, regardless of its application or the problem at hand.</p>
<p>I’ve been looking around for an Australian approach to ‘smart cities’. The Federal Government doesn’t have an explicit definition of a ‘smart city’, but there’s an indication of how it approaches the term in its 2016 <a href="https://cities.dpmc.gov.au/htmlfile" title="aus gov smart city strategy">Smart City Strategy</a>. Its goals are:</p>
<ul>
<li>Becoming smarter investors in our cities’ infrastructure</li>
<li>Coordinating and driving smarter city policy</li>
<li>Driving the uptake of smart technology, to improve the sustainability of our cities and drive innovation</li>
</ul>
<p>You don’t really get a feel for what a good ‘smart’ city looks like in the Strategy. There’s still a lot of focus on investment in smart technology being by itself a good thing for cities.</p>
<p>There are many great examples of data and digital being used effectively in cities. But for a policy maker about to embark on their own smart cities strategy, starting with technology encourages a disconnect from the broader challenges and priorities facing a council or government.</p>
<p>Making big investments in new technologies and data doesn’t necessarily mean cities are better off - it’s how they are used that creates impact.</p>
<p>The Open Data Institute <a href="http://theodi.org/smart-cities" title="ODI definition">describes</a> a smart city as an ‘open city’,</p>
<blockquote>
<p>Putting people and openness at the heart of its design and operation.</p>
</blockquote>
<p>I like that it emphasises people at the core of smart city planning. We’ll talk about open data and openness as part of city planning a little later on.</p>
<p>This definition from the UK Department of Business, Innovation &amp; Skills is good too:</p>
<blockquote>
<p>A Smart City should enable every citizen to engage with all the services on offer, public as well as private, in a way best suited to his or her needs.</p>
</blockquote>
<p>It’s citizen-oriented and recognises that people have different needs. Not everyone is online and not all solutions are digital ones.</p>
<p>Ultimately, a smart city is one that aims to improve the lives of its citizens. A good smart city strategy considers the challenges and opportunities facing its community, and explores ways new technologies and data might help to respond to them. Technology might help you get to a solution; it’s not a solution in itself.</p>
<h2 id="how-new-technologies-and-data-are-contributing-to-smarter-cities">How new technologies and data are contributing to ‘smarter’ cities</h2>
<p>Having access to more data, and better tools to collect and analyse data, is changing the way we make decisions and deliver services. ‘Big data’, the Internet of Things and machine learning are some of the developments in the sights of policy makers.</p>
<p>From <a href="http://www.cmd.act.gov.au/smartparking/home" title="smart parking">smart parking</a> to remote <a href="http://www.itnews.com.au/news/why-the-csiro-is-building-smart-homes-for-elderly-australians-419124" title="smart homes for elderly">monitoring of the infirm and elderly</a>, sensors are changing the way we plan city infrastructure.</p>

<p>Better use of data is helping to design new services, like <a href="http://www.industry.nsw.gov.au/invest-in-nsw/invest-case-studies/andrew-scores-goals-with-the-penrith-hub" title="Penrith Hub">hubs for commuters</a> to work closer to home.</p>

<p>Some smart city solutions don’t involve new technologies and data at all. They can be about retrofitting old technologies for a new purpose, like turning <a href="http://lavamae.org/" title="portable showers">retired diesel buses into portable showers</a> for a city’s homeless population.</p>
<p><img src="/assets/images/lavamae.jpg" alt="an image of a lavamae bus" /></p>
<figcaption class="caption">Photo by Brunosuras CC-BY-NC-ND</figcaption>
<p>Not all smart city initiatives need to involve an app, a sensor or a chatbot. Different people have different needs. When evaluating smart city ideas, how people engage with information and where they go to find it needs to be taken into account. You can invest a lot of money in ‘smart’ infrastructure while <a href="https://www.greentechmedia.com/articles/read/Boulders-Smart-Grid-Leaves-Citizens-in-the-Dark">forgetting about the citizen</a>, resulting in little to no impact.</p>
<h2 id="building-open-city-data-infrastructure">Building open city data infrastructure</h2>
<p>The Open Data Institute describes <a href="https://theodi.org/what-is-data-infrastructure" title="what is data infrastructure">data as infrastructure</a>. Just as roads help you navigate to a location, data helps you make a decision.</p>
<p>Sometimes cities make investments in new data technologies and sources (e.g. from sensors) without taking into account the health of their underlying data infrastructure. <a href="https://theodi.org/what-is-data-infrastructure">Having an understanding of the data assets</a> you need to deliver your services (private sector or public sector); clear responsibilities and expectations of organisations managing that data; and guidelines and policies to shape its use, is important.</p>
<p>Openness - open data, open standards, open source, open collaboration - is helping to build new services for citizens at scale. Cities and councils are starting to open up data for organisations and innovators to explore, and generate new business models. In the UK, opening up government data is <a href="http://theodi.org/research-economic-value-open-paid-data" title="open data research">estimated to contribute an additional 0.5% to GDP annually</a>.</p>
<p>CityMapper, the journey planner app that originated in London (now in cities around the world, including Melbourne and Sydney), has <a href="https://theodi.org/news/citymapper-government-open-data-improve-cities" title="CityMapper and open data">described</a> open data as the essential backbone for its service. Open data is also being used to provide <a href="https://www.theyworkforyou.com/" title="they work for you">new ways of engaging with government</a>, and to help people find their <a href="https://greatbritishpublictoiletmap.rca.ac.uk/" title="Toilet Map">nearest public toilet</a>.</p>
<h2 id="the-human-experience-of-smart-cities">The human experience of smart cities</h2>
<p>Not everyone accesses information the same way you do. In Australia 86% of households have the internet at home. With a population of over 24 million, that’s around 3.3 million people who aren’t online at home.</p>
<p>And while new technologies and data are helping to make services more efficient and automated, they’re also impacting on people’s jobs. Using drones to conduct site inspections, for example, will be safer and faster, but in the short term it will impact machinery and equipment providers and site inspectors. We’re going to see things like taxi and Uber drivers replaced by automated cars.</p>
<p>The impacts of automation are being studied and policies being developed to create new jobs. Being mindful of ways in which your citizens might be adversely affected by ‘smart’ solutions is part of city planning.</p>
<p>But data and digital can also empower and connect people together.</p>
<p>Open data is helping people engage with their community in unexpected ways. In Leeds, two developers and composers <a href="https://soundcloud.com/theodi/friday-lunchtime-lecture-making-music-with-open-data" title="Making music with open data">used open footfall data</a> to create a musical score - and it’s really beautiful. Open data is also being incorporated into jewellery, art and even video games.</p>
<p>I’m always excited to see ‘smart city’ ideas that are trying to bring people together. I was just introduced to <a href="http://www.parkrun.com.au/" title="parkrun">ParkRun</a> in Australia, which organises free weekly runs for anyone and everyone - a great way to meet new people. A startup idea in the UK called <a href="http://www.newcitizenship.org.uk/rabble">Rabble</a> aims to connect families with communal volunteering projects.</p>

<p>Sometimes technology can bring you into contact with experiences you might not otherwise be aware of. A 2015 London project called <a href="http://www.londonischanging.org/" title="London is Changing">London is Changing</a> projected comments from people about their experiences of housing in London onto billboards in the inner city.</p>
<p><img src="/assets/images/londonischanging.png" alt="londonischanging billboards" /></p>
<figcaption class="caption">Photo by Duarte Carrilho de Graca, The Guardian</figcaption>
<h2 id="planning-your-smart-city-initiative">Planning your smart city initiative</h2>
<p>I’ve ranged across a few topics today. I’m still working out what a more people centric approach to smart cities might look like, and how to articulate it.</p>
<p>To summarise:</p>
<ol>
<li><strong>Lead with citizen needs - not new technologies.</strong> Your smart city initiative should be driven by the challenges and opportunities already facing your citizens. Connect it to other policies, link things together.</li>
<li><strong>Evaluate your data infrastructure - assets, technology, skills, governance.</strong> To make best use of new data sources and tools, you’ll need a robust foundation to build on.</li>
<li><strong>Invest in open - open data, open source, open standards, open collaboration.</strong> It helps make things better, faster, and cheaper.</li>
<li><strong>Be mindful - of impact, of how you deliver, of privacy and ethics.</strong> Design solutions that can be delivered, and be prepared for impacts both positive and negative. I barely touched on privacy in this talk but it’s an important area for policy makers to get into.</li>
</ol>
<p>I hope we’re finishing up feeling like there’s more to smart cities than a beautiful website or a shiny app. The technology or data by itself is of limited value; it’s what you do with it that ultimately changes people’s lives for the better.</p>
<p>Hopefully you’re leaving here with lots to think about: a little wiser, a little ‘smarter’ and excited by the possibilities.</p>

<h1>How I built this website</h1>

<p>I’m a novice coder. Even saying I’m a “novice” coder is probably giving myself too much credit. And yet I still managed to put this website together, which means that if I can do it, anyone can do it.</p>
<p>I’m writing down my experiences and the links and guides that helped me so I don’t forget, and so I can keep improving. There are parts of the process I mimicked without really understanding why I was doing what I was doing. Friends reading this who actually know how it all works and can fix my mistakes can comment via <a href="https://github.com/ellenbroad/indigo">GitHub</a>.</p>
<h2 id="where-i-started">Where I started</h2>
<p>I wanted to avoid anything too hands-on at first. I started off playing with <a href="https://www.squarespace.com/">Squarespace</a> templates during a 14-day free trial.</p>
<p>Squarespace has some lovely templates, and the upside is you don’t have to do any coding if you don’t want to. The downside is that then you’re stuck with most of the style elements in a template - the changes you can make are pretty much limited to things like font size and colour.</p>
<p>I wanted to create something very simple and minimal that I could develop later on. Little things irritated me about Squarespace, like being unable to change a blog’s layout, remove fixed headers, or reduce padding. I didn’t want to mess around too much in CSS (did I mention I’m a novice?) but I also didn’t want to go beyond a 14-day trial without something I was happy with.</p>
<p>So I started looking at <a href="http://jekyllthemes.org/">Jekyll themes</a> instead.</p>
<h2 id="using-jekyll">Using jekyll</h2>
<p><a href="https://twitter.com/pikesley">Sam</a> was the first person to tell me about <a href="https://jekyllrb.com/">jekyll</a>. Then, my partner and I used jekyll to build our wedding website (but he did most of the heavy lifting). For my own website, I wanted to do everything myself.</p>
<p>Jekyll is a “simple, blog-aware, static site generator”. I don’t really know how to rephrase that. It gives you the essentials you need to build a simple website, and no more. In contrast, Squarespace felt like it had too many bits I didn’t need, and style elements I couldn’t change. I also explored <a href="https://wordpress.org/">WordPress</a> and had the same issues. WordPress felt more like a content management framework than a simple blogging platform. I also couldn’t find a WordPress theme I liked.</p>
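<p>Perhaps the easiest way to see what “blog-aware” means is that posts are just files. As a rough sketch (the file name and layout value here are illustrative, not taken from any particular theme), a post is a Markdown file with a block of YAML “front matter” at the top, and Jekyll turns the whole folder into static HTML:</p>

<pre><code># A minimal sketch: a Jekyll post is a Markdown file with YAML
# "front matter" at the top. Jekyll reads everything in _posts/
# and generates static HTML pages from it.
mkdir -p _posts
printf '%s\n' \
  '---' \
  'layout: post' \
  'title: "Hello world"' \
  '---' \
  'My first post, written in Markdown.' \
  > _posts/2016-11-12-hello-world.md
</code></pre>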
<p>Luckily, there are lots of nice-looking open source <a href="http://jekyllthemes.org/">Jekyll themes</a>. I chose <a href="https://github.com/sergiokopplin/indigo">this one</a>.</p>
<h2 id="using-github">Using GitHub</h2>
<p>GitHub takes some getting used to. I’ve used GitHub at the ODI for a couple of years, but I still found these guides and presentations helpful while setting myself up:</p>
<ul>
<li><a href="https://speakerdeck.com/alicebartlett/git-for-humans">“Git for humans” by Alice Bartlett</a></li>
<li><a href="http://readwrite.com/2013/09/30/understanding-github-a-journey-for-beginners-part-1/">“GitHub for beginners: don’t get scared, get started”</a></li>
<li><a href="https://guides.github.com/">GitHub’s own introductory guides</a></li>
</ul>
<p>I also found this <a href="http://jmcglone.com/guides/github-pages/">introduction to GitHub, GitHub Pages and Jekyll</a> from Jonathan McGlone useful for walking through the mechanics of how Jekyll actually works on GitHub. The example website you build in that tutorial is created entirely through GitHub’s web interface, though. For this project, I wanted to be able to use the <a href="https://www.codecademy.com/learn/learn-the-command-line">command line</a> (mainly because I’d need it to follow the instructions provided with my chosen Jekyll theme).</p>
<h2 id="getting-started-with-a-jekyll-theme">Getting started with a Jekyll theme</h2>
<p>I forked the <a href="https://github.com/sergiokopplin/indigo">repository</a> for the Jekyll theme I wanted to use so that I could make changes to it without editing the original code (explore ‘forking’ in the GitHub guides above). To be able to do anything with it, I needed:</p>
<ul>
<li>some <a href="https://www.davidbaumgold.com/tutorials/command-line/">basic guidance on using the command line</a></li>
<li>a text editor (I use <a href="https://atom.io/">Atom</a>)</li>
<li><a href="https://desktop.github.com/">GitHub Desktop</a></li>
</ul>
<p>Most GitHub repositories have READMEs with setup instructions. The instructions I was using looked like this:</p>
<p><img src="/assets/images/setup.jpg" alt="screenshot of setup instructions" /></p>
<p>I <a href="https://help.github.com/articles/cloning-a-repository/">cloned the repository</a> to my computer.</p>
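<p>From the command line, the cloning step looks like this (using my fork as the example):</p>

<pre><code># Copy the forked repository from GitHub to your computer,
# then move into the folder it creates:
git clone https://github.com/ellenbroad/indigo.git
cd indigo
</code></pre>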
<p>Then I tried to <a href="https://jekyllrb.com/">install Jekyll</a> from the command line. I got an error message immediately, saying I <a href="http://stackoverflow.com/questions/14607193/installing-gem-or-updating-rubygems-fails-with-permissions-error">didn’t have write permissions for the Ruby directory on my computer</a>. This is because I was trying to install gems into the version of Ruby that Apple ships with macOS for its own use.</p>
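<p>For anyone hitting the same wall, the failure looked something like this (the exact Ruby version and path will vary by machine):</p>

<pre><code>$ gem install jekyll
ERROR:  While executing gem ... (Gem::FilePermissionError)
    You don't have write permissions for the /Library/Ruby/Gems/2.0.0 directory.
</code></pre>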
<p>To get past this, I installed:</p>
<ul>
<li><a href="http://brew.sh/">Homebrew</a>,</li>
<li><a href="https://rvm.io/">RVM</a> (RVM then instructed me to install Ruby from the command line)</li>
</ul>
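<p>The install commands, roughly as I remember them - check each tool’s homepage for the current one-liners rather than copying these verbatim:</p>

<pre><code># Install Homebrew (a macOS package manager):
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

# Install RVM (Ruby Version Manager):
\curl -sSL https://get.rvm.io | bash -s stable

# Then install a Ruby of your own, separate from Apple's system Ruby,
# and make it the default:
rvm install ruby
rvm use ruby --default
</code></pre>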
<p>A few <a href="https://www.moncefbelyamani.com/how-to-install-xcode-homebrew-git-rvm-ruby-on-mac/">online tutorials</a> said to install Xcode as well, but it looks like I already had it installed.</p>
<p>Then I could install Jekyll, NodeJS and Bundler and follow the rest of the setup instructions to open a version of the website locally in my browser.</p>
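<p>Roughly, those remaining steps were as follows. Treat this as a sketch - the exact gems and commands come from the theme’s README rather than from memory:</p>

<pre><code># Install Jekyll and Bundler into the new RVM-managed Ruby:
gem install jekyll bundler

# Install NodeJS via Homebrew (some themes use it for asset tooling):
brew install node

# From inside the cloned repository, install the theme's dependencies:
bundle install

# Build the site and serve it locally:
bundle exec jekyll serve
# ...then open http://localhost:4000 in a browser.
</code></pre>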
<p>After that, I made lots of changes to various folders and files in the repository in Atom, and tested them locally. I figured out how to make changes - like adding extra pages, changing photographs and aligning the navigation bar - pretty much by trial and error. I also added <a href="https://support.google.com/analytics/answer/1008080?hl=en">Google Analytics</a>.</p>
<h2 id="using-a-custom-domain">Using a custom domain</h2>
<p>The hardest part of the process came when I tried to point the domain I’d purchased (ellenbroad.com) at my GitHub Pages site. Or point my GitHub Pages site at the domain, whichever way round that is. I started off with a <a href="https://help.github.com/articles/using-a-custom-domain-with-github-pages/">GitHub guide to using custom domains</a> and navigated to <a href="https://help.github.com/articles/setting-up-an-apex-domain/">setting up an apex domain</a> and <a href="https://help.github.com/articles/adding-or-removing-a-custom-domain-for-your-github-pages-site/">adding a custom domain to your GitHub Pages site</a>. These guides helped, but DNS still felt like an enigma wrapped in a mystery to me. I needed outside help navigating custom DNS settings and figuring out where I needed to add a CNAME record and an A record. Squarespace is my domain registrar and DNS provider (a throwback to when I thought Squarespace would be the easiest website option) - and will be for another 35 days, until ICANN’s 60-day lock on newly registered domains expires and I can transfer to another provider.</p>
<p>DNS changes take a while to propagate, so it was hard to spot errors quickly. I’m still completely confused by DNS, but at least the website seems to be working - it must be, for you all to be reading this!</p>
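<p>For the record (pun intended), here’s a sketch of what the setup amounted to, plus a command-line way to check propagation. The IP addresses below are the ones GitHub’s guide listed at the time - check the current documentation before using them, because they do change:</p>

<pre><code># At the DNS provider: two A records for the apex domain pointing at
# GitHub Pages, and a CNAME pointing the www subdomain at my GitHub
# Pages hostname.
#   ellenbroad.com.       A      192.30.252.153
#   ellenbroad.com.       A      192.30.252.154
#   www.ellenbroad.com.   CNAME  ellenbroad.github.io.

# Check whether the records have propagated yet:
dig +noall +answer ellenbroad.com
dig +noall +answer www.ellenbroad.com
</code></pre>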
<p>That’s as far as I’ve got with the website so far. I’d like to continue improving it. Pull requests and suggestions <a href="https://github.com/ellenbroad/ellenbroadcom">welcome</a>.</p>