
Food and movie critics play important roles in their respective ecosystems. Restaurant critics help to evaluate everything from the quality of the food to the experience and ambiance of ordering and consuming it, while movie critics offer consistent perspective on the latest releases. While both professions have come under pressure in the digital era, their utility raises the question of whether we need an equivalent professionalized role for data, especially within the corporate world. Could a new role of “data critic” emerge in the enterprise to help evaluate and advise data scientists on the latest available datasets, as well as the nuances and problems with each?
One of the greatest challenges with how today’s data scientists make sense of the world through data is that so few of them actually take the time to understand the data they are using. Data science groups are typically understaffed and overworked, leaving little time to step back and perform the kinds of in-depth data studies required to understand whether a dataset is actually applicable to the questions posed to it.
Once-sacrosanct statistical practices like normalization have all but disappeared. Even within the academic literature, it is the rare study indeed that actually takes the time to normalize its findings, especially when working with social media data.
The challenge is that few data scientists realize how much the failure to normalize distorts their results. Without a resident expert who deeply understands a particular dataset, analysts may be entirely unaware that their lack of normalization has invalidated their findings.
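To make the stakes concrete, here is a minimal Python sketch, with entirely hypothetical numbers, of how raw counts can suggest a trend that normalization erases. On a growing platform, mentions of a topic can rise sharply even when the topic's actual share of conversation is flat:

```python
# Hypothetical illustration: raw counts vs. normalized shares.
# Monthly tweets mentioning some topic, and total tweets sent each month.
topic_mentions = [1_000, 1_500, 2_250]                 # +50% per month
total_tweets = [10_000_000, 15_000_000, 22_500_000]    # platform grew too

# Raw counts suggest surging interest in the topic...
raw_growth = [b / a for a, b in zip(topic_mentions, topic_mentions[1:])]
print(raw_growth)  # [1.5, 1.5]

# ...but normalizing by total volume shows its share never moved.
shares = [m / t for m, t in zip(topic_mentions, total_tweets)]
print(shares)  # [0.0001, 0.0001, 0.0001]
```

An analyst looking only at the first series would report a topic "going viral"; the normalized series shows the apparent growth is just the platform itself growing.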
Data scientists are rarely fully aware of how the datasets they use have changed over time. Analyses will typically proceed based on the last public information about a dataset or the analyst’s own previous experiences, leading to woefully outdated assumptions.
In Twitter’s case, the platform has changed so fundamentally that a great deal of the academic research based on it is likely invalid.
Of course, as Twitter demonstrates, many researchers may be fully aware that their dataset is entirely unsuitable for their analysis, yet proceed anyway because it is “the most available” to them. Many know that Twitter has changed in ways that break their analyses, but it remains the data they can most readily get their hands on.
This raises the question of what’s needed to help data scientists better understand the datasets they use.
One challenge is that data scientists have few incentives to perform data descriptive studies. Commercial researchers typically have little time for tasks not directly related to business objectives, while academic researchers suffer from a dearth of journals which will publish such studies.
Could the role of “data critic” help fill this gap?
Imagine a dedicated position embedded in a company’s data science division that spends their time doing nothing but data descriptive studies. They constantly search for new datasets and perform detailed analyses of their characteristics to understand their strengths and limitations.
Most importantly, much like a food critic reevaluates a restaurant periodically to see if it has changed, data critics would also reevaluate datasets at regular intervals to understand the ways in which they are changing and if those changes may invalidate existing analytic pipelines or call into question the assumptions that underpin those analyses.
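One task a data critic might automate is exactly this kind of periodic re-review: re-profiling a dataset on a schedule and flagging fields whose characteristics have drifted from a baseline snapshot. The sketch below, in which the field names, snapshots, and threshold are all hypothetical, shows the idea at its simplest, using the share of records where a field is populated:

```python
# Sketch of a periodic dataset re-review: compare a current snapshot's
# field coverage against a baseline profile and flag meaningful drift.

def coverage(records, field):
    """Share of records in which `field` is present and non-null."""
    if not records:
        return 0.0
    present = sum(1 for r in records if r.get(field) is not None)
    return present / len(records)

def flag_drift(baseline, current, field, tolerance=0.10):
    """True if the field's coverage shifted more than `tolerance`."""
    return abs(coverage(baseline, field) - coverage(current, field)) > tolerance

# Hypothetical snapshots: geotags were common once, then largely vanished.
snapshot_old = [{"text": "a", "geo": (1, 2)}, {"text": "b", "geo": (3, 4)}]
snapshot_new = [{"text": "c", "geo": None}, {"text": "d", "geo": None}]

print(flag_drift(snapshot_old, snapshot_new, "geo"))  # True
```

A drop like this would tell the critic that any downstream pipeline relying on geotags now rests on a broken assumption, before its consumers discover it the hard way.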
Putting this all together, a centralized role in each data science division, focused on understanding the datasets colleagues use and given the time and resources to conduct in-depth descriptive evaluations and regular reviews of those datasets, would go a long way toward helping companies avoid common data pitfalls and better understand the robustness of the data-driven findings that increasingly guide their businesses.
All Rights Reserved for Kalev Leetaru
