Big Data: Gaps, Inequality, Biases

In December 2017, Rediet Abebe was three years into her PhD and sitting atop a continent of data. A graduate student in Cornell Computing and Information Science, Abebe had spent the previous summer as an intern at Microsoft Research. She had used the opportunity to address what one of her collaborators there called a data gap. If you’re trying to design a public health campaign for an African nation, you have far less data about people’s everyday concerns, popular misconceptions, and unanswered questions than someone working on a similar project for the United States or another developed nation.

Abebe, however, had begun to think of the problem in other terms — not a data gap but data inequality, not unlike economic or social inequality. “What individuals are represented in our data sets?” Abebe asks. “Who gets left out or misrepresented?” Institutions today rely on big data to deliver targeted information, calculate need, and allocate resources. Misrepresentation or underrepresentation in large data sets, Abebe argues, can amount to invisibility, perpetuating or even amplifying social, economic, and political disparities.

What Search Queries for Health Information on African Countries Revealed

Abebe asked for search queries submitted to Microsoft’s search engine, Bing, that originated on the African continent and included the terms AIDS, HIV, malaria, tuberculosis, or TB. Microsoft anonymized the data before sharing it with Abebe, so the searches could not be linked to individual users. Crucially, however, Abebe could link the searches to specific nations, and sometimes she knew the self-reported gender and age of the person who submitted the query. She spent the rest of the summer looking for geographic and demographic patterns in queries such as “AIDS symptoms,” “can breastfeeding pass HIV to babies,” and “can garlic cure HIV?” To the best of her knowledge, Abebe and her coauthors are the first to use large-scale web data to generate health information pertaining to all 54 African nations.
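The kind of filtering and grouping described above can be sketched in a few lines of Python. This is only an illustration, not the study's actual pipeline: the record shape (a country paired with a query string), the function names, and the toy records are all hypothetical, and only the five health terms come from the article.

```python
import re
from collections import Counter

# Health terms named in the article. Matching is case-insensitive on whole
# words, so an unrelated query such as "hearing aids" would also match;
# a real study would need more careful filtering.
HEALTH_TERMS = ("AIDS", "HIV", "malaria", "tuberculosis", "TB")
TERM_PATTERN = re.compile(r"\b(" + "|".join(HEALTH_TERMS) + r")\b", re.IGNORECASE)


def matched_terms(query):
    """Return the set of health terms (lowercased) that appear in a query."""
    return {term.lower() for term in TERM_PATTERN.findall(query)}


def count_by_country(records):
    """Count health-related queries per country.

    `records` is an iterable of (country, query) pairs. This record shape
    is an assumption made for the sketch.
    """
    counts = Counter()
    for country, query in records:
        if matched_terms(query):
            counts[country] += 1
    return counts


# Invented toy records for illustration only, not data from the study.
toy_records = [
    ("Ghana", "AIDS symptoms"),
    ("Ghana", "weather tomorrow"),
    ("Namibia", "can garlic cure HIV"),
    ("Namibia", "tb treatment near me"),
    ("Ethiopia", "football scores"),
]

country_counts = count_by_country(toy_records)
```

Extending the records with self-reported age or gender fields, where available, would allow the same counting to be done along demographic lines.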

From the beginning, Abebe had been interested in how computer science could help address inequality and social disparities. It’s what brought her to Cornell. “A lot of faculty at Cornell have done really serious computer science and economics and mathematical work, but they also care about the social impact and implications of their work. They use those concerns to inform more technical questions,” she says. Abebe’s first attempts had tended toward general challenges, such as modeling receptivity to persuasion in social networks. The African search query study convinced her that serious computer science research could intervene directly in issues of social inequality.

The resulting paper outlines Abebe’s methods and gives evidence for significant disparities in access to reliable health information. In 16 pages, however, it could not provide the granular, nation-by-nation analysis that might be useful, for example, to the Ministry of Health in Ghana or aid workers in rural Namibia. Abebe wanted to put her findings in the hands of government officials, aid organizations, and health experts: people who could act on them and potentially benefit people on the ground. But Africa is the second-largest continent by both area and population, and Abebe, who comes from a low-income family in Addis Ababa, Ethiopia, did not grow up with those kinds of connections.

Advocating Diversity and Collaboration in Computer Science

The previous spring, however, Abebe had cofounded a professional group, Black in AI, with friend and colleague Timnit Gebru. At the time, Abebe was the only Black person in her PhD cohort at Cornell’s Ithaca campus. Abebe and Gebru saw a need for greater diversity in computer science. Now a global network of 1,500 scholars and practitioners, Black in AI fosters research collaborations, promotes diversity in computer science, and runs a mentorship program for prospective graduate students.

“I sent an email to the Black in AI list,” Abebe says, “and I said, ‘Hey, I have this paper that I think could really be useful for people in the African continent.’” Abebe was amazed by the response. “People wrote back things like, ‘My cousin works at the ministry of health in Ghana. Do you want to talk to my cousin?’ And I’m like, ‘Yes, I want to talk to your cousin!’” Sharing her findings with experts in Africa became a sort of feedback loop, helping Abebe conceive and refine future projects. It has also become a model for innovative, socially responsible computer science research that Abebe promotes at talks and conferences.

Another group that Abebe cofounded, Mechanism Design for Social Good (MD4SG), promotes similar collaborations among computer scientists, scholars in other disciplines, and community stakeholders. As Abebe reminded one audience, “A lot of the solutions that we are putting out there are not properly informed by the daily experiences of marginalized communities. So these marginalized communities are further being underserved by technological advances if a set of researchers in AI are not working in conjunction with a diverse set of stakeholders.”

How Can Computer Science Help Resolve Inequality and Social Disparities?

Abebe, who appears on Bloomberg’s 2018 list of “Ones to Watch” and the MIT Technology Review’s 2019 list of “35 Innovators Under 35,” has emerged as a thought leader in envisioning the role that computer science can play in creating a more equitable world. A forthcoming paper, coauthored with her adviser, Jon Kleinberg (Computer Science), and Matthew Weinberg (Princeton University), uses the concept of income shocks (sudden, unanticipated expenses) to revisit a longstanding policy question: how best to define and remediate poverty. For individuals living near the poverty line, income shocks can lead to poor health, eviction, and job loss, triggering a downward spiral into poverty. By factoring income shocks into the equation, Abebe and her coauthors explore ways of calculating need that could optimize the long-term efficacy of assistance programs.

In December 2019, Abebe became the first Black woman to receive a PhD from Cornell Computing and Information Science. “To me that was heavy,” says Abebe. “Because you’re recognizing that there have been real barriers. We can mark this moment; we can try to celebrate it. It is an achievement, not just for me, but for the department, for my community, for everyone. But also, let’s recognize it’s not a good thing. There are people who could have passed through this door who did not, and we should really pay attention to that.”

While at Cornell, Abebe has supported efforts to attract more underrepresented students to the PhD program. “I know the department really cared about trying to improve in this dimension, and I tried to help. Among the current second years there are seven people who are Black or Latinx or indigenous. Seeing all this difference in one year is very meaningful for me.”

All Rights Reserved for Cornell Research
