5 Steps to Take as an Antiracist Data Scientist

Data scientists are data stewards. They collect data, store data, transform data, visualize data, and ultimately shape how data are used. In our data-driven world, stewards hold a real responsibility to use data to tell stories and effect change in a positive way. To be a good steward, data scientists need to be more than simply “not racist”; they need to be antiracist. In an article published on Towards Data Science, Emily Hadley laid out five steps data scientists should take toward good stewardship in the face of the racist and hateful ideologies present in contemporary society:

Step 1: Educate ourselves about becoming antiracist

To be antiracist data scientists, we must take the steps to be antiracist individuals. Being antiracist is different for white people than it is for people of color. As written in this toolkit by the National Museum of African American History and Culture: “For white people, being antiracist evolves with their racial identity development. They must acknowledge and understand their privilege, work to change their internalized racism, and interrupt racism when they see it. For people of color, it means recognizing how race and racism have been internalized, and whether it has been applied to other people of color.” This excerpt from The Racial Healing Handbook by Dr. Anneliese Singh is a great place to start as it walks through the six responsibilities that individuals can take in the ongoing process to be antiracist: Read, Reflect, Remember, Risk, Rejection, and Relationship Building.

For white readers specifically who have begun to acknowledge privilege and are looking to Read and Reflect: before burdening Black, Indigenous, or People of Color (BIPOC) friends with requests for reading resources or conversation, start with the many resource lists currently available online, such as here and here, and reach out to white friends who are also on this journey for conversation.

Step 2: Learn about how data and algorithms have been used to perpetuate racism

As data scientists, we use data to answer questions, solve problems, and (hopefully) have a positive impact. But history has repeatedly shown that good intentions are not enough. Data and algorithms have been used to perpetuate racism and racist societal structures. It is imperative that we educate ourselves about these realities and the uneven effects they have had on Black lives. This list is meant as a starting point and is by no means exhaustive; we must continue to learn from, contribute to, and amplify research and reporting on this work in our efforts to confront these challenges.

News Articles:

  • Racial Bias in a Medical Algorithm Favors White Patients Over Sicker Black Patients
  • Many Facial-Recognition Systems Are Biased, Says US Study
  • Machine Bias: There’s software used across the country to predict future criminals. And it’s biased against blacks
  • As Cameras Track Detroit’s Residents, a Debate Ensues Over Racial Bias
  • Facebook’s ad-serving algorithm discriminates by gender and race
  • How community members in Ramsey County stopped a big-data plan from flagging students as at-risk

Lectures:

  • Big Data, Technology, and the Law
  • Algorithmic Justice: Race, Bias, and Big Data
  • Legitimizing True Safety (which includes discussion of facial recognition and how police surveillance is currently being used against Detroit residents accused of violating social distancing orders)

Books (consider purchasing from a Black bookstore):

  • Algorithms of Oppression: How Search Engines Reinforce Racism (Safiya Noble)
  • Artificial Unintelligence: How Computers Misunderstand the World (Meredith Broussard)
  • Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor (Virginia Eubanks)
  • Technically Wrong: Sexist Apps, Biased Algorithms, and Other Threats of Toxic Tech (Sara Wachter-Boettcher)
  • Weapons of Math Destruction (Cathy O’Neil)

Experts to Follow:

  • Nasma Ahmed (Digital Justice Lab)
  • Alvaro Bedoya (Visiting Professor of Law at Georgetown University and Founding Director of the Center on Privacy and Technology)
  • Meredith Broussard (Associate Professor at NYU)
  • Joy Buolamwini (MIT Media Lab, Founder of Algorithmic Justice League)
  • Max Clermont (Senior Political Advisor to Holyoke Mayor Alex Morse)
  • Teresa Hodge (Co-founder and CEO of R3 Technologies)
  • Tamika Lewis (Fellow at Data Justice Lab)
  • Yeshimabeit Milner (Co-founder and Executive Director, Data for Black Lives)
  • Tawana Petty (Non-Resident Fellow at the Digital Society Lab and Director of Detroit Community Technology Project)
  • Rashida Richardson (Director of Policy Research at AI Now)
  • Samuel Sinyangwe (Co-founder of Campaign Zero)
  • Latanya Sweeney (Professor of Government and Technology in Residence at Harvard University, Director of the Data Privacy Lab)

Organizations to Follow:

  • Data & Society
  • AI Now
  • Digital Civil Society Lab
  • Center on Privacy and Technology
  • Data for Black Lives
  • Campaign Zero
  • Digital Equity Laboratory
  • Data Justice Lab
  • Algorithmic Justice League

Step 3: Eliminate racist decisions and algorithms in our own work

As antiracist data scientists, we must commit to taking action every day in our own work to eliminate racist decisions and algorithms. There is no one checklist that will accomplish this, but Hadley found herself regularly applying a series of questions to the data science projects she contributes to. Portions of these questions come from a 2018 lecture she attended titled “The Data You Have and the Questions You Ask It” by Logan Koepke, a Senior Policy Analyst at Upturn.

If the answers to these questions reveal underlying racism, we must speak out and challenge the status quo.

Start with the data you have. Review the data and always reach out to subject-matter experts to better understand:

  • How was the data obtained?
  • For whom was the data obtained?
  • By whom was the data obtained?
  • Was permission granted to obtain the data?
  • Would individuals be comfortable if they knew this data was being obtained?
  • Would individuals be comfortable if they knew how this data was being stored or shared?
  • To what end was the data obtained?
  • How might this data be biased? (One way to start probing this is sketched just after this list.)
  • Explore the zine Digital Defense Playbook to consider how you might better inform and include broader communities, including Black communities, in the conversation on obtaining and using data.
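
As a minimal sketch of one way to begin answering “How might this data be biased?”, you can compare group representation in the data against an external benchmark for the population the work will affect. This is illustrative only; the file name, column name, group labels, and benchmark shares below are all hypothetical placeholders.

```python
import pandas as pd

# Hypothetical training data with a self-reported race column
df = pd.read_csv("training_data.csv")

# Share of each group actually present in the data
observed = df["race"].value_counts(normalize=True)

# Placeholder benchmark shares for the affected population
# (in practice, use census or service-population figures)
benchmark = pd.Series({"Black": 0.24, "White": 0.55, "Hispanic": 0.14, "Other": 0.07})

comparison = pd.DataFrame({"observed": observed, "benchmark": benchmark})
comparison["gap"] = comparison["observed"] - comparison["benchmark"]
print(comparison.sort_values("gap"))
# Large negative gaps flag groups the data underrepresents -- a prompt to
# revisit the provenance questions above, not a substitute for them.
```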

Consider the questions you’re hoping to answer or the problems you’re hoping to solve with your data. Ask:

  • Are the communities that will be impacted by this analysis involved in the process of shaping the questions you’re hoping to answer? If not, why not?
  • Do current goals repurpose historical datasets, using them in ways different from how they were originally intended?
  • To what extent are predicted outcomes dissimilar from the observations in the data? Is the question you’re asking trying to force a reality that isn’t grounded in truth?
  • Does the very act of prediction also change the future observation space? How might behaviors change because of the predictions?
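
The last two questions deserve emphasis: predictions can create feedback loops that reshape the very data used to retrain them. Below is a toy simulation of that dynamic, loosely inspired by critiques of predictive policing; every number in it is made up for illustration.

```python
import random

random.seed(0)

true_rate = {"A": 0.10, "B": 0.10}  # identical underlying incident rates
recorded  = {"A": 60,   "B": 40}    # historical records start out skewed
PATROLS = 100

for year in range(10):
    total = sum(recorded.values())
    for hood in recorded:
        # Patrols are allocated in proportion to past records...
        patrols = int(PATROLS * recorded[hood] / total)
        # ...and new records depend on patrol presence, not just the true rate
        new = sum(random.random() < true_rate[hood] for _ in range(patrols * 10))
        recorded[hood] += new
    print(year, recorded)

# The absolute gap between A and B keeps growing even though the true
# incident rates are identical: the prediction changed the observations.
```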

When you’re building a model, think like an adversary:

  • How could this system be gamed?
  • How could it be used to harm people, especially those in BIPOC communities?
  • What could be the unintended consequences of this model?
  • As the model “learns” from new data, how might this new data introduce new biases?

When you’re communicating the results of the model:

  • Is the model communicated such that the community who contributed the data is able to view and understand the results?
  • Have you clearly communicated the ways in which the model was tested to uncover racial bias?

Learn the Technical Details:

There is a growing body of research on technical approaches to addressing race in algorithms in a way that considers fairness. Simply excluding race as a variable and claiming “fairness through unawareness” is unacceptable: just because an algorithm does not include race as a predictor does not mean that it is unbiased. Instead, data scientists should explicitly test the sensitivity of algorithms to race. This article provides an introduction to algorithmic fairness, including the concepts of Demographic Parity, Equalized Odds, and Predictive Rate Parity, along with tools that can be used to reduce disparity during pre-processing, training, and post-processing. This article illustrates how to explore Demographic Parity using SHAP, an explainable AI tool. The report Exploring Fairness in Machine Learning for International Development by the MIT D-Lab explores in considerable detail how to integrate fairness into a machine learning project. For additional learning, see this free online textbook and these videos:

  • Google Machine Learning Crash Course: Fairness in ML
  • 2017 Tutorial on Fairness in Machine Learning
  • 21 Fairness Definitions and Their Politics
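
As a concrete companion to those references, here is a minimal sketch of the three fairness checks named above, computed directly from a model’s binary predictions. The arrays are hypothetical toy data; in practice, libraries such as Fairlearn or AIF360 implement these metrics with far more care.

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])                  # observed outcomes
y_pred = np.array([1, 0, 1, 0, 1, 0, 1, 0])                  # model predictions
group  = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])  # sensitive attribute

def rates(mask):
    yt, yp = y_true[mask], y_pred[mask]
    return {
        "selection": yp.mean(),     # P(prediction = 1)
        "tpr": yp[yt == 1].mean(),  # true positive rate
        "fpr": yp[yt == 0].mean(),  # false positive rate
        "ppv": yt[yp == 1].mean(),  # precision
    }

a, b = rates(group == "a"), rates(group == "b")
print("Demographic parity gap:", abs(a["selection"] - b["selection"]))
print("Equalized odds gaps:   ", abs(a["tpr"] - b["tpr"]), abs(a["fpr"] - b["fpr"]))
print("Predictive parity gap: ", abs(a["ppv"] - b["ppv"]))
```

Demographic Parity compares selection rates, Equalized Odds compares error rates (TPR and FPR), and Predictive Rate Parity compares precision across groups. In general no model can satisfy all three at once, which is why the definitions, and their politics, matter.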

Step 4: Commit to increasing diversity in the data science field

The 2020 Harnham US Data and Analytics Report found that only 3% of Data and Analytics professionals identified as Black, with even fewer in leadership positions. This is unacceptable, particularly as we (non-Black data scientists) continue to use data collected from, and write algorithms that impact, Black communities.

To push the organizations we work for and the data science community at-large to change, we must commit to:

  • Confronting our own unconscious biases and how they manifest themselves in the workplace so as to make our field a more inclusive space
  • Inventorying our internal company practices and making changes to advance equity, diversity, and inclusion at all levels of our organizations
  • Reviewing and updating our hiring processes so they don’t reflect unconscious biases of the individuals/teams responsible for hiring
  • Demanding representation on executive leadership teams, boards, and expert panels
  • Developing leadership pathways to support emerging leaders from historically underrepresented backgrounds

Step 5: Contribute financially to Black-led and community-driven organizations committed to data awareness and increasing diversity in data science

It is no secret that data science is a lucrative field, with a mean annual salary of approximately $100,000. Since we were not born knowing data science, many of us have likely entered this field thanks to robust educational experiences. As antiracist data scientists, we must recognize that we live in a racist society where educational opportunities are distributed unequally. Since data science impacts everyone, we must commit to using the financial resources we earn from our work to support educational experiences that increase diversity in the data science workforce (and make this lucrative field more accessible), as well as data awareness for everyone.

Support Black-led and community-driven organizations contributing to data awareness

Set up recurring monthly donations to Black-led and community-driven organizations contributing to data awareness, data collection, and data visualization of timely issues such as police violence. Organizations to consider include:

Support data science and tech programs that serve Black students

Set up recurring monthly donations to support data science and tech programs that serve Black students. While it may be tempting to volunteer for teaching opportunities, it can be extremely powerful for BIPOC students to learn from BIPOC data scientists. Consider financially supporting programs such as:

Start a scholarship at your local community college

In 2016, Google completed research highlighting the role that community colleges can play and the challenges they face in creating a pathway to increased diversity in computer science. Community colleges generally have substantially smaller financial requirements than universities for starting a scholarship, and these scholarships can go a long way. Reach out to the financial aid office at your local community college to get started today.

Start or contribute to a scholarship or data science program at a historically Black college or university (HBCU)

Many HBCUs have existing or new data science programs including:

Reach out to these programs directly to learn more.

Click here to read the full, original article.
