Data Science Ethics in the Digital Age, part 1

A course on data ethics was part of my graduate curriculum at Indiana University, but the topic can get lost among all the blogs about coding, algorithms and technology tools. As I prepare to become a data science practitioner in a few months, I’ve had more time for leisure reading and recently finished Matthew Salganik’s “Bit by Bit – Social Research in the Digital Age”. Salganik devotes the entire last chapter to ethics in the digital age. Fortunately, as the field of data science continues to evolve, the discussion of ethics is growing along with it. As it’s been a while since I’ve written on ethics in data science, I want to discuss this important topic in the next two blog posts.

Former Chief US Data Scientist DJ Patil says that ‘Data is such an incredible lever arm for change, we need to make sure that the change that is coming, is the one we all want to see.’ In a recent Medium article, Patil emphasizes that a code of ethics should be an integral part of the entire data science process. A University of Michigan Data Science Ethics Coursera course states that ethics are the basis for the rules we all voluntarily choose to follow. I would argue that in the case of data science, at least some of the rules and best practices should probably be laws so that they are mandatory rather than voluntary.

Salganik says that ‘social research in the digital age has different characteristics and therefore raises different ethical questions.’ He points out that researchers can observe people’s online behavior without those people even being aware they are being observed. With the enormous amounts of data now being collected, researchers have more power than ever. Given this increased power and the lack of consent from those whose data is being examined, ethical questions will remain with us well into the future.

The consequences of unethical data use can range from somewhat to very harmful if researchers don’t use their power responsibly. Salganik briefly discusses several extreme examples of collected data being used for human rights abuses in the 20th century. As early as 1919 in the USSR, population census data was used to force minorities to migrate. More recent examples of less drastic but questionable data use include insurance companies charging higher rates when they learn from your credit card purchases that you often buy fast food. Continuing to discuss and then implement ethical standards and norms for data scientists could limit or altogether prevent such abuses of power.

KDnuggets, Forbes and Salganik have all pointed out that data scientists don’t need to start from scratch to practice ethical principles. Next week, I will delve into four principles that Salganik suggests researchers can use to ensure ethical data practices: Respect for Persons, Beneficence, Justice and Respect for Law and Public Interest. These principles provide a solid framework for helping ensure that all people benefit from data science advances.