A guide to the reality-melting technology in your phone’s camera.
When a prominent YouTuber named Lewis Hilsenteger (aka “Unbox Therapy”) was testing out this fall’s new iPhone model, the XS, he noticed something: His skin was extra smooth in the device’s front-facing selfie cam, especially compared with older iPhone models. Hilsenteger compared it to a kind of digital makeup. “I do not look like that,” he said in a video demonstrating the phenomenon. “That’s weird … I look like I’m wearing foundation.”
He’s not the only one who has noticed the effect, either, though Apple has not acknowledged that it’s doing anything different than it has before. Speaking as a longtime iPhone user and amateur photographer, I find it undeniable that Portrait mode—a marquee technology in the latest edition of the most popular phones in the world—has gotten glowed up. Over weeks of taking photos with the device, I realized that the camera had crossed a threshold between photograph and fauxtograph. I wasn’t so much “taking pictures” as the phone was synthesizing them.
This isn’t a totally new phenomenon: Every digital camera uses algorithms to transform the different wavelengths of light that hit its sensor into an actual image. People have always sought out good light. In the smartphone era, apps from Snapchat to FaceApp to Beauty Plus have offered to upgrade your face. Other phones have a flaw-eliminating “beauty mode” you can turn on or off, too. What makes the iPhone XS’s skin-smoothing remarkable is that it is simply the default for the camera. Snap a selfie, and that’s what you get.
These images are not fake, exactly. But they are also not pictures as they were understood in the days before you took photographs with a computer.
What’s changed is this: The cameras know too much. All cameras capture information about the world—in the past, it was recorded by chemicals interacting with photons, and by definition, a photograph was one exposure, short or long, of a sensor to light. Now, under the hood, phone cameras pull information from multiple image inputs into one picture output, along with drawing on neural networks trained to understand the scenes they’re being pointed at. Using this other information as well as an individual exposure, the computer synthesizes the final image, ever more automatically and invisibly.
The stakes can be high: Artificial intelligence makes it easy to synthesize videos into new, fictitious ones often called “deepfakes.” “We’ll shortly live in a world where our eyes routinely deceive us,” wrote my colleague Franklin Foer. “Put differently, we’re not so far from the collapse of reality.” Deepfakes are one way of melting reality; another is changing the simple phone photograph from a decent approximation of the reality we see with our eyes to something much different. It is ubiquitous and low temperature, but no less effective. And probably a lot more important to the future of technology companies.
In How to See the World, the media scholar Nicholas Mirzoeff calls photography “a way to see the world enabled by machines.” We’re talking about not only the use of machines, but the “network society” in which they produce images. And to Mirzoeff, there is no better example of the “new networked, urban global youth culture” than the selfie.
The phone manufacturers and app makers seem to agree that selfies drive their business ecosystems. They’ve dedicated enormous resources to taking pictures of faces. Apple has literally created new silicon chips to be able to, as the company promises, consider your face “even before you shoot.” First, there’s facial detection. Then, the phone fixes on the face’s “landmarks” to know where the eyes and mouth and other features are. Finally, the face and rest of the foreground are depth mapped, so that a face can pop out from the background. All these data are available to app developers, which is one reason for the proliferation of apps to manipulate the face, such as Mug Life, which takes single photos and turns them into quasi-realistic fake videos on command.
All this work, which was incredibly difficult a decade ago and possible only on cloud servers until very recently, now runs right on the phone, as Apple has described. The company trained one machine-learning model to find faces in an enormous number of image fragments. The model was too big to run on a phone, though, so Apple trained a smaller version on the outputs of the first. That trick made running it on a phone possible. Every photo every iPhone takes owes something, in some small part, to these millions of images, filtered twice through an enormous machine-learning system.
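Training a small model on the outputs of a big one is often called knowledge distillation. Here is a minimal toy sketch of the idea in Python. Everything in it is an assumption made for illustration: the "teacher" is a hypothetical average of 50 tiny scorers, the "student" is a single three-parameter scorer, and the inputs are random points rather than images. It is not Apple's network or training code.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

rng = random.Random(0)

# Hypothetical "big" teacher: 50 slightly different scorers, averaged together.
TEACHER_UNITS = [
    (2.0 + rng.gauss(0, 0.3), -1.0 + rng.gauss(0, 0.3), 0.5 + rng.gauss(0, 0.3))
    for _ in range(50)
]

def teacher(x, y):
    """Soft score from the large model: a probability between 0 and 1."""
    scores = [sigmoid(a * x + b * y + c) for a, b, c in TEACHER_UNITS]
    return sum(scores) / len(scores)

def student_score(params, x, y):
    """Score from the small model: one scorer with three parameters."""
    a, b, c = params
    return sigmoid(a * x + b * y + c)

def train_student(points, epochs=300, lr=0.1):
    """Fit the student to the teacher's soft outputs instead of hand-made labels."""
    targets = [teacher(x, y) for x, y in points]  # run the big model once
    a = b = c = 0.0
    for _ in range(epochs):
        for (x, y), target in zip(points, targets):
            # Gradient of the cross-entropy loss with respect to the raw score.
            error = sigmoid(a * x + b * y + c) - target
            a -= lr * error * x
            b -= lr * error * y
            c -= lr * error
    return a, b, c

train_points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
held_out = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]

student = train_student(train_points)

# How often do the small and big models reach the same yes/no decision?
agreement = sum(
    (student_score(student, x, y) > 0.5) == (teacher(x, y) > 0.5)
    for x, y in held_out
) / len(held_out)
print(f"student matches teacher on {agreement:.0%} of held-out points")
```

The student never sees a hand-labeled example. It learns only from what the teacher says, which is the sense in which the images end up "filtered twice."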
But it’s not just that the camera knows there’s a face and where the eyes are. Cameras also now capture multiple images in the moment to synthesize new ones. Night Sight, a new feature for the Google Pixel, is the best-explained example of how this works. Google developed new techniques for combining multiple inferior (noisy, dark) images into one superior (cleaner, brighter) image. Any photo is really a blend of a bunch of photos captured around the central exposure. But then, as with Apple, Google deploys machine-learning algorithms over the top of these images. The one the company has described publicly helps with white balancing—which helps deliver realistic color in a picture—in low light. It also told the Verge that “its machine learning detects what objects are in the frame, and the camera is smart enough to know what color they are supposed to have.” Consider how different that is from a normal photograph. Google’s camera is not capturing what is, but what, statistically, is likely.
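To see why a blend of many poor frames can beat any single one, consider the simplest possible merge: averaging. The Python sketch below is a toy under stated assumptions, not Google's pipeline. It treats an image as a flat list of brightness values, assumes the frames are already perfectly aligned, and models noise as independent Gaussian jitter. The burst size of 15 echoes the figure Google has cited; the function names and the noise level are made up for the example.

```python
import random

def noisy_frame(scene, sigma, rng):
    """Simulate one dark, noisy capture of the scene."""
    return [pixel + rng.gauss(0.0, sigma) for pixel in scene]

def merge_frames(frames):
    """Average a burst of same-sized, pre-aligned frames pixel by pixel."""
    count = len(frames)
    return [sum(pixels) / count for pixels in zip(*frames)]

def rms_error(image, scene):
    """Root-mean-square difference between an image and the true scene."""
    squared = sum((a - b) ** 2 for a, b in zip(image, scene))
    return (squared / len(scene)) ** 0.5

rng = random.Random(0)
scene = [0.05 * (i % 10) for i in range(1000)]  # a dim, made-up "true" scene
burst = [noisy_frame(scene, 0.2, rng) for _ in range(15)]

single_error = rms_error(burst[0], scene)
merged_error = rms_error(merge_frames(burst), scene)
print(f"single frame error: {single_error:.3f}")
print(f"merged burst error: {merged_error:.3f}")
```

With independent noise, the error of the average shrinks roughly with the square root of the number of frames. Real cameras have a harder job, because hands shake and subjects move between frames, so the frames must be aligned before they are merged.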
Picture-taking has become ever more automatic. It’s like commercial pilots flying planes: They are in manual control for only a tiny percentage of a given trip. Our phone-computer-cameras seamlessly, invisibly blur the distinctions between things a camera can do and things a computer can do. There are continuities with pre-existing techniques, of course, but only if you plot the progress of digital photography on some kind of logarithmic scale.
High-dynamic range, or HDR, photography became popular in the 2000s, dominating the early photo-sharing site Flickr. Photographers captured multiple (usually three) images of the same scene at different exposures. Then, they stacked the images on top of one another and took the information about the shadows from the brightest photo and the information about the highlights from the darkest photo. Put them all together, and they could generate beautiful surreality. In the right hands, an HDR photo could create a scene that is much more like what our eyes see than what most cameras normally produce.
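The stacking step can be sketched in a few lines of Python. This is a simplified illustration, not the method of any particular HDR program: the scene values, the three exposure gains, and the tent-shaped weighting that trusts mid-tones over clipped pixels are all assumptions chosen for the example, and the final tone-mapping step that makes the result displayable is left out.

```python
def expose(radiance, gain):
    """Simulate a camera: scale scene brightness, then clip to the sensor range 0..1."""
    return [min(1.0, max(0.0, value * gain)) for value in radiance]

def weight(value):
    """Trust mid-tones most; pixels clipped to 0 or 1 get almost no say."""
    return max(1e-6, 1.0 - abs(2.0 * value - 1.0))

def merge_hdr(exposures, gains):
    """Estimate the true brightness of each pixel from bracketed exposures."""
    merged = []
    for pixels in zip(*exposures):
        weights = [weight(value) for value in pixels]
        # Dividing by the gain converts each reading back to scene brightness.
        estimate = sum(w * v / g for w, v, g in zip(weights, pixels, gains))
        merged.append(estimate / sum(weights))
    return merged

# A made-up scene running from deep shadow (0.02) to bright highlight (6.0).
radiance = [0.02, 0.1, 0.5, 2.0, 6.0]
gains = [0.125, 1.0, 8.0]  # dark, normal, and bright exposures

exposures = [expose(radiance, gain) for gain in gains]
merged = merge_hdr(exposures, gains)

print("normal exposure:", exposures[1])
print("merged estimate:", [round(value, 3) for value in merged])
```

In the normal exposure, the two brightest pixels both clip to the same white. The merge recovers them from the dark frame, while the deepest shadow is weighted most heavily toward the bright frame, matching the shadows-from-the-brightest, highlights-from-the-darkest recipe.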
Our eyes, especially under conditions of variable brightness, can compensate dynamically. Try taking a picture of the moon, for example. The moon itself is very bright, so you have to expose for it as if it were high noon. But the night is dark, obviously, so in a picture that captures the moon in detail, the rest of the scene is essentially black. Our eyes can see both the moon and the earthly landscape with no problem.
Google and Apple both want to make the HDR process as automatic as our eyes’ adjustments. They’ve incorporated HDR into their default cameras, drawing from a burst of images (Google uses up to 15). HDR has become simply how pictures are taken for most people. As with the skin-smoothing, it no longer really matters if that’s what our eyes would see. Some new products’ goal is to surpass our own bodies’ impressive visual abilities. “The goal of Night Sight is to make photographs of scenes so dark that you can’t see them clearly with your own eyes — almost like a super-power!” Google writes.
Since the 19th century, cameras have been able to capture images at different speeds, wavelengths, and magnifications, which reveal previously hidden worlds. What’s fascinating about the current changes in phone photography is that they are as much about revealing what we want to look like as they are investigations of the world. It’s as if we’ve discovered a probe for finding and sharing versions of our faces—or even ourselves—and it’s this process that now drives the behavior of the most innovative, most profitable companies in the world.
Meanwhile, companies and governments can do something else with your face: create facial-recognition technologies that turn any camera into a surveillance machine. Google has pledged not to sell a “general-purpose facial recognition” product until the ethical issues with the technology have been resolved, but Amazon Rekognition is available now, as is Microsoft’s Face API, to say nothing of Chinese internet companies’ even more extensive efforts.
The global economy is wired up to your face. And it is willing to move heaven and Earth to let you see what you want to see.
All Rights Reserved for Alexis C. Madrigal