The End of Infinite Data Storage Can Set You Free

The belief that we could save endlessly online turned us all into information hoarders. What society needs instead is better systems for preserving public knowledge.

In early January, Google sent an email notifying me that I’d used up 76 percent of my account’s free storage space—the 15 gigabytes shared across Gmail, Google Photos, and Google Drive. I had been vaguely aware that a storage limit did in fact exist and that I might someday reach it, but the notification still caught me off guard. Having lived with the illusion of effectively infinite Google capacity for a decade and a half, I could hardly imagine a world in which I would need to ration my cloud usage and had subconsciously assumed the day would never arrive.

If I failed to get my act together and did exceed my limit, the email informed me, a variety of life-disrupting inconveniences would begin: I wouldn’t be able to send or receive emails, upload files to Drive, create Google Docs, or back up any new photos. I started noticing the ever-present progress bar in the bottom corner of my Gmail window, ticking incrementally toward 100 percent of my limit (and adding a new layer of anxiety to an interface that already generates plenty).

In the same email, of course, Google offered me an easy way out, encouraging me to simply pay for a Google One storage plan—a mere $1.99 per month for 100 gigabytes, or $2.99 for 200. But reaching a personal free storage ceiling and having to pay for more, however inexpensive it is, marks a perceptual transition, an acknowledgment that the “cloud” is yet another finite resource distributed across physical servers, not an immaterial ether that can absorb exponentially growing amounts of information at no cost. And if Google eventually charges more for storage, we will almost certainly keep paying without thinking twice. Most likely, we won’t have much of a choice.

When Google launched Gmail in 2004, each account provided an unprecedented gigabyte of free storage space, more than 100 times what Yahoo and Hotmail offered at the time. The following year, that capacity doubled to 2 gigabytes in response to users who were already approaching their storage limit, prompting Georges Harik, then Gmail’s product management director, to suggest that Google should “keep giving people more space forever.” Google would expand individual capacity to 10 gigabytes in 2012 (with the launch of Google Drive) and then 15 gigabytes a year later, when Google unified its various repositories of personal data under a single umbrella with a single storage limit. In 2015, Google Photos spun off from the flatlining Google+ social network, launching with unlimited cloud storage for “high-quality” photos.

Then the trend of ever-increasing Google storage finally reversed. Near the end of 2020, the company announced that it would begin counting “high-quality” photos toward the 15-gigabyte limit. The announcement estimated that 80 percent of users would be able to store about three years’ worth of data before exceeding their free capacity (Google’s personalized tracker currently estimates that I have 10 months left).

By fostering the sense that our wells of personal information were bottomless, Google turned us all into information hoarders. At the time of the Google Photos announcement in late 2020, the service contained more than 4 trillion photos, with 28 billion new photos and videos being uploaded every week. Having transcended the physical scarcity of film, we now capture anything that seems remotely likely to hold future interest, from vacation photos to screenshots, deferring a stricter assessment of value that we’ll likely never get around to.

Many of the photos and videos we amass are never even viewed again after they are taken—we just toss them all into Google’s big bucket, knowing we’ll be able to find what we need later. We approach email similarly, archiving everything because the marginal cost of doing so is effectively zero, and there has been little reason to delete anything thus far. Anxious that we might delete something we will end up needing later, we err on the side of caution by saving it all. The prospect of having to downsize or even just organize one’s own archive of photos, emails, or files—the morass of data that slowly, haphazardly grows into a digital imprint of one’s life—is daunting. Many of us wouldn’t know how to decide which photos are worth keeping and which to delete, having always assumed that we could just keep them all.

These are not mere habits. They are fundamental expressions of our evolving relationship with information. Google’s first and most revolutionary product, search, enables us to be casual, even messy, with our data. We are only able to thoughtlessly accumulate such massive volumes of information in our personal accounts because we have search capabilities powerful enough to help us navigate that data, the same way we navigate the public internet. Largely because of Google, searching has replaced sorting in personal information management; instead of organizing our data using a legible system or knowing where things are, it can all go into one seemingly jumbled pile. It is not surprising, then, that to a younger generation raised on search, “the concept of file folders and directories, essential to previous generations’ understanding of computers, is gibberish.”

Our indecision about this digital inventory radiates outward from our private spheres; our failure to consider what we should keep and what we should discard, or to organize any of it, inscribes itself on the internet at large. This trade-off between personal storage capacity and the imperative to carefully manage and organize the information we produce appears even more consequential when we consider the internet’s present shortcomings as a public archive of knowledge. That condition has likely been exacerbated by individuals’ ability to store huge amounts of data in private cloud repositories rather than in publicly accessible places.

As the internet has become dominated by a handful of major platforms, collective information stewardship has become more atomized, saddling individual users with the monumental task of finding ways to preserve the digital information they want to keep. Despite Google’s stated mission “to organize the world’s information and make it universally accessible and useful,” an effort that has been successful in many ways, the company has also contributed to the internet’s privatization. Personal archives have thus expanded tremendously during a time when large swaths of the internet have vanished altogether.

The rise of email newsletters on platforms like Substack, for example, has moved blogging to private inboxes, meaning that thousands of individuals frequently store their own duplicative copy of a post that would have previously just been hosted on a single server. (Meanwhile, many blogs that were active 10 years ago are no longer available on the internet at all.) As social media has grown to account for a larger share of internet content, that content has become more ephemeral, vulnerable to the disappearance of the platform on which it was shared or deletion by the user who created it. Today, the internet depends as heavily upon the unreliable stewardship of individuals and corporations as it did 20 years ago.

While the age of inexpensive or free personal data storage is far from over, its slowing expansion presents an opportunity to reimagine our relationship with the information that we possess as individuals and as a society. At the individual level, we might develop better systems for organizing, prioritizing, and even discarding the information that we accumulate—not because we’re concerned about running out of space, but because our hoarding behavior diminishes the utility of the information that is truly valuable. A more decisive attitude toward what belongs in our personal archives might improve our understanding of what information we actually value, while also enabling us to undertake similar efforts at the collective scale.

Such efforts are needed to combat the deterioration of the infrastructure for publicly available knowledge. As with any public good, the solution to this problem should not be a multitude of private data silos, only searchable by their individual owners, but an archive that is organized coherently so that anyone can reliably find what they need. In a 2021 Atlantic piece, Jonathan Zittrain invokes the library as a bulwark against the internet’s knowledge preservation problems: “Libraries exist, and they still have books in them, but they aren’t stewarding a huge percentage of the information that people are linking to … No one is. The flexibility of the web … diffuses responsibility for this core societal function.”

Although it may be logistically impossible for a library-like institution to preserve an archive of the internet that is even close to complete, libraries do offer a valuable model for public knowledge preservation, recentralizing the responsibility that the web has diffused. Such institutions and services would certainly improve upon the current ad hoc approach.

While the Internet Archive’s Wayback Machine (which describes itself as a digital library) scrapes the web continuously, saving as much of it as possible as frequently as possible, there are other complementary efforts to help preserve the internet in publicly accessible ways. One service, Perma, minimizes link rot by converting hyperlinks in scholarly documents into “reliable, unbreakable link(s) to an unalterable record of any page you’ve cited” (Perma’s website notes that more than half of all cited links in Supreme Court opinions no longer point to the intended page). Amber provides a similar service for websites, automatically preserving snapshots of linked pages in case the original versions become unavailable. And libraries themselves still exist: As Zittrain explains, tools like Perma and Amber enable these institutions to fulfill their potential as archives of digital information and organizing systems for knowledge, by reducing the ephemerality of material that is worth preserving.

The most ambitious and holistic efforts to make digital information more durable and publicly accessible, arguably, are Web3 and the blockchain technology that underpins it. Blockchains are inherently immutable and distributed publicly among peer-to-peer networks, seeming to directly address the shortcomings of the privatized, individualized internet. Web3 has also introduced new forms of speculative ownership such as NFTs, which seem antithetical to that public spirit; however, even in these applications, the transaction data itself remains broadly accessible.

All of these solutions assume the ongoing growth of digital storage capacity. But what if the cloud, or even one major company like Google, actually did run out of space? Experts believe this is unlikely, despite the world’s rapidly growing production of data. Thanks to a reliable, ongoing cadence of innovations in storage efficiency, it is a scenario that is “far away beyond a horizon that society will never reach,” as Harvard economist Shane Greenstein says. Engineer J. Metz does anticipate, however, that as the sheer volume of data continues to increase, finding our information could become more difficult than finding a place to store it.

Regardless of information’s ability to keep growing, it will be necessary to restore collective approaches to organizing that information and to continue rebuilding the infrastructure for public knowledge that has atrophied alongside the rise of private archiving. Instead of personally owning most of the information we need—on our own devices and in the cloud—we can inhabit a world where more of that information exists in the public sphere and we simply know where to find it.

All Rights Reserved for Drew Austin
