The researchers claim their data analysis can decrease workplace bias and increase performance, but the truth is much more complicated
How would you feel if your boss told you that, if you wanted that raise, you’d need to wear a tracking device 24/7?
It’s not an implausible future. Workplace wellness programs, which sometimes use fitness trackers and other devices to assess employee health — data that in many cases impact insurance rates — blossomed under the Obama administration, and now cover upwards of 50 million American workers. A new study, funded in part by the Office of the Director of National Intelligence with lead researchers from Dartmouth College, suggests a potential next step into this brave new world: day and night data surveillance that connects seemingly irrelevant data points — like how often you check your phone or leave your home on the weekend — to your work performance.
The study aims to “classify high and low performers” through the use of location-tracking beacons, wearables, and phone apps. It’s similar in design and purpose to two other research programs, called mPerf and MOSAIC, which both investigate how artificial intelligence can aid workers (and, of course, their employers). But experts warn there are many concerns around this kind of tracking.
“The features that are being used in this study are things like how much sleep people are getting, their heart rate, how much physical activity they’re getting,” says Natasha Duarte, a policy analyst at the Center for Democracy and Technology, a privacy and security nonprofit. “For a relatively young, privileged, healthy employee, those might be a salient factor… What about people who have disabilities? Basing their workplace performance on how physically active they are could be really discriminatory.”
A total of 554 subjects — 320 men and 234 women — were tracked in the study. They worked in various industries, but mostly in tech and consulting. The subjects regularly filled out classic workplace assessment surveys, in which they replied yes or no to statements such as “today I displayed loyalty to my organization” or rated themselves on whether they “adequately completed [their] assigned duties” or “ensured [their] tasks were completed properly.” These responses were then used to classify the workers as higher or lower performers.
Meanwhile, these same subjects were equipped with a number of different tracking devices. Each participant wore a waterproof Garmin bracelet, installed a tracking app called PhoneAgent on their smartphones, and had to use four beacons that tracked their location: one was placed in their wallet, one on their keychain, one at home, and one in the office. These devices recorded subjects’ sleeping habits, how often they got up from their desks, how often they left their homes at night and on weekends, how frequently they unlocked their phones, how much exercise they got, how well they slept, and other metrics.
The study authors then compared the data collected from the tracking devices to the survey results. Results were broken down by the industry a participant worked in, as well as by whether they were a supervisor or a nonsupervisor. For example, according to the study, nonsupervising high performers spent more time at work on the weekends (big surprise there) and visited fewer places on weekday evenings; higher performers working in consulting were less physically active on the weekends, while higher performers in tech were less physically active during the week.
Pino Audia, a professor of management at Dartmouth College and one of the study’s authors, says he infers from some of this data that high performers who visit fewer places during the course of the day have stable routines. This allows high performers to be proactive and resourceful, even in difficult work situations. “If you’re constantly interrupted, perhaps that impairs your performance,” he says.
The study’s authors hope such employee data could be used to make employee reviews less discriminatory and unfair. The existing approaches to worker evaluations are “fairly antiquated and potentially biased,” says Andrew Campbell, a professor of computer science at Dartmouth and another study author. He says the team wanted to investigate how mobile sensing data could be used to predict patterns reflective of higher performance. They hope that in a decade or so, employees will be able to use such data to reflect on and improve their own work.
Standard, written workplace evaluations are indeed imperfect ways to judge how well an employee is doing at their job. Open-ended survey questions, such as “how does the employee meet expectations?” or “what are the employee’s greatest skills?” compel managers to rely on stereotypes and biases, rather than data, in their responses, according to researchers at the Stanford VMware Women’s Leadership Innovation Lab. Men are more likely to receive specific feedback that focuses on their technical skills while women receive more general feedback, like “You’re a great communicator!” Specific feedback provides a blueprint for an employee to improve as well as capitalize on their strongest attributes, meaning that people who don’t receive such feedback can get left behind.
Audia says that the study points to “a near future in which we can be less reliant on surveys. We can rely more on objective indicators of behaviors. How can corporations ensure greater equality in the treatment and compensation and promotion rates of people by gender and by race and by nationality? […] You can think of some of the technology used in this study as moving in that direction.”
“If you build [an algorithm] off of men in their midtwenties, and apply it to women, or anybody over 30, or somebody with a disability… then yeah, it doesn’t hold water”
Naturally, there are some concerns with this work. There’s the issue of bias that Duarte mentioned; most of the individuals studied were men. And if data from this white-collar, primarily male cohort was used to predict or evaluate the performance of someone who didn’t fit into that population, they could be discriminated against for factors that have nothing to do with their performance.
“If you build [an algorithm] off of men in their midtwenties, and apply it to women, or anybody over 30, or somebody with a disability, or any kind of variance in physical activity… then yeah, it doesn’t hold water,” says Jen King, the director of consumer privacy at Stanford University’s Center for Internet and Society. “The training data is inherently biased.”
Someone with a mental illness, for example, might have anxiety that raises their heart rate or worsens their sleep quality; a mother of a teen might leave the house frequently in the evening to take them to extracurriculars, which could be used as a mark against them since, according to the study, high performers tend not to leave the home as often after work.
And, of course, there is the troubling privacy violation of having trackers on employees that follow their movements and record their biometrics, including when they’re not at work. There are few privacy protections for employees in the United States, so “requesting” that employees subject themselves to constant surveillance may be legally viable; that’s already happening in the case of employee wellness programs, in which employees have to lose weight, stop smoking, and track other biometrics to prevent skyrocketing health insurance rates. Such programs aren’t only legal — they’re actively encouraged by the Affordable Care Act.
“It shouldn’t be happening, but the legal landscape around it is very unclear, unfortunately,” says Duarte. Such tracking programs are unethical even when employees provide consent, she says, noting that people can feel coerced and pressured to participate to seem like “good workers.” “CDT’s position is that it shouldn’t be something that people are allowed to opt into. Like, location tracking that is not necessary to provide someone with a service that they’ve requested should be illegal,” she says.
King brings up the prospect that bosses could even use the location-tracking data to stalk employees. Imagine “the creepy supervisor that develops a crush on the female co-worker and follows her everywhere she goes or always knows where she is,” King says.
The study authors point out that the research is still in its early stages, and that it would be several years before such tracking is likely to be implemented in any workplace. Campbell says he understands how datasets like this can introduce bias in general.
“I take your point as a general criticism of any machine learning, or building algorithms from data, that if, for example, a demographic isn’t represented in that data, then the algorithm completely ignores them,” he says. “I don’t have a really good idea about how to solve that problem.”
But this is also not the only large study looking at this very topic. The mPerf study is funded by the same governmental organization, and like this research, it looks at the correlations between workplace performance and mobile sensing data.
“It is not inconceivable that not too many years in the future there will come a day when high school students applying to colleges won’t have to take the SAT or the ACT,” Deniz Ones, the study’s lead investigator, said in a press release. “They’ll be asked to download some apps on their mobile devices and link their wearable sensors and let colleges collect data for a couple of months.”
There’s also Multimodal Objective Sensing to Assess Individuals with Context, or MOSAIC, a program that tracks members of the intelligence community to assess employee performance — although some might argue that employees in the intelligence community can reasonably expect some amount of surveillance as part of their jobs.
Are there ways to improve performance evaluations without invasive tracking? According to the researchers at the Stanford VMware Women’s Leadership Innovation Lab, managers should agree with employees on specific performance requirements months in advance. An employee’s performance can then be checked against those requirements, which helps eliminate bias in a review. Managers should refrain from general, ambiguous praise like “she’s a fantastic communicator” and instead rely on the agreed-upon performance rubric to fairly evaluate how well the worker communicates.
Further research on how employers can effectively guide their employees without surveillance is absolutely necessary. But even beyond the invasive nature of the tracking suggested in research like the Dartmouth and mPerf studies, the problem of new and often invisible biases being introduced by technology persists. Humans may argue with each other’s judgments, but too often people believe these systems are objective. We should know better.
All Rights Reserved for Angela Lashbrook