People used to think the crowdsourced encyclopedia represented all that was wrong with the web. Now it’s a beacon of so much that’s right.
Remember when Wikipedia was a joke?
In its first decade of life, the website appeared in as many punch lines as headlines. The Office’s Michael Scott called it “the best thing ever,” because “anyone in the world can write anything they want about any subject—so you know you are getting the best possible information.” Praising Wikipedia, by restating its mission, meant self-identifying as an idiot.
That was in 2007. Today, Wikipedia is the eighth-most-visited site in the world. The English-language version recently surpassed 6 million articles and 3.5 billion words; edits materialize at a rate of 1.8 per second. But perhaps more remarkable than Wikipedia’s success is how little its reputation has changed. It was criticized as it rose, and now makes its final ascent to … muted criticism. To confess that you’ve just repeated a fact you learned on Wikipedia is still to admit something mildly shameful. It’s as though all those questions that used to pepper think pieces in the mid-2000s—Will it work? Can it be trusted? Is it better than Encyclopedia Britannica?—are still rhetorical, when they have already been answered, time and again, in the affirmative.
Of course, muted criticism is far better than what the other giants at the top of the internet are getting these days. Pick any inflection point you like from the past several years—the Trump election, Brexit, any one of a number of data breaches, alt-right feeding frenzies, or standoffish statements to Congress—and you’ll see the malign hand of platform monopolies. Not too long ago, techno-utopianism was the ambient vibe of the elite ideas industry; now it has become the ethos that dare not speak its name. Hardly anyone can talk abstractly about freedom and connection and collaboration, the blithe watchwords of the mid-2000s, without making a mental list of the internet’s more concrete negative externalities.
Yet in an era when Silicon Valley’s promises look less gilded than before, Wikipedia shines by comparison. It is the only not-for-profit site in the top 10, and one of only a handful in the top 100. It does not plaster itself with advertising, intrude on privacy, or provide a breeding ground for neo-Nazi trolling. Like Instagram, Twitter, and Facebook, it broadcasts user-generated content. Unlike them, it makes its product de-personified, collaborative, and for the general good. More than an encyclopedia, Wikipedia has become a community, a library, a constitution, an experiment, a political manifesto—the closest thing there is to an online public square. It is one of the few remaining places that retains the faintly utopian glow of the early World Wide Web. A free encyclopedia encompassing the whole of human knowledge, written almost entirely by unpaid volunteers: Can you believe that was the one that worked?
Wikipedia is not perfect. The problems that it does have—and there are plenty of them—are discussed in great detail on Wikipedia itself, often in dedicated forums for self-critique with titles like “Why Wikipedia is not so great.” One contributor observes that “many of the articles are of poor quality.” Another worries that “consensus on Wikipedia may be a problematic form of knowledge production.” A third notes that “someone can just come and edit this very page and put in ‘pens are for cats only.’” Like the rest of the tech world, the site suffers from a gender imbalance; by recent estimates, 90 percent of its volunteer editors are men. Women and nonbinary contributors report frequent harassment from their fellow Wikipedians—trolling, doxing, hacking, death threats. The site’s parent organization has repeatedly owned up to the situation and taken halting steps to redress it; several years ago, it allocated hundreds of thousands of dollars to a “community health initiative.” But in a way, the means to fix Wikipedia’s shortcomings, in terms of both culture and coverage, are already in place: Witness the rise of feminist edit-athons.
The site’s innovations have always been cultural rather than computational. It was created using existing technology. This remains the single most underestimated and misunderstood aspect of the project: its emotional architecture. Wikipedia is built on the personal interests and idiosyncrasies of its contributors; in fact, without getting gooey, you could even say it is built on love. Editors’ passions can drive the site deep into inconsequential territory—exhaustive detailing of dozens of different kinds of embroidery software, lists dedicated to bespectacled baseball players, a brief but moving biographical sketch of Khanzir, the only pig in Afghanistan. No knowledge is truly useless, but at its best, Wikipedia weds this ranging interest to the kind of pertinence where Larry David’s “Pretty, pretty good!” is given as an example of rhetorical epizeuxis. At these moments, it can feel like one of the few parts of the internet that is improving.
One challenge in seeing Wikipedia clearly is that the favored point of comparison for the site is still, in 2020, Encyclopedia Britannica. Not even the online Britannica, which is still kicking, but the print version, which ceased publication in 2012. If you encountered the words Encyclopedia Britannica recently, they were likely in a discussion about Wikipedia. But when did you last see a physical copy of these books? After months of reading about Wikipedia, which meant reading about Britannica, I finally saw the paper encyclopedia in person. It was on the sidewalk, being thrown away. The 24 burgundy-bound volumes had been stacked with care, looking regal before their garbage-truck funeral. If bought new in 1965, each of them would have cost $10.50—the equivalent of $85, adjusted for inflation. Today, they are so unsalable that thrift stores refuse them as donations.
Wikipedia and Britannica do, at least, share a certain lineage. The idea of building a complete compendium of human knowledge has existed for centuries, and there was always talk of finding some better substrate than paper: H. G. Wells thought microfilm might be the key to building what he called the “World Brain”; Thomas Edison bet on wafer-thin slices of nickel. But for most people who were alive in the earliest days of the internet, an encyclopedia was a book, plain and simple. Back then, it made sense to pit Wikipedia and Britannica against each other. It made sense to highlight Britannica’s strengths—its rigorous editing and fact-checking procedures; its roster of illustrious contributors, including three US presidents and a host of Nobel laureates, Academy Award winners, novelists, and inventors—and to question whether amateurs on the internet could create a product even half as good. Wikipedia was an unknown quantity; the name for what it did, crowdsourcing, didn’t even exist until 2005, when two WIRED editors coined the word.
That same year, the journal Nature released the first major head-to-head comparison study. It revealed that, for articles on science, at least, the two resources were nearly comparable: Britannica averaged three minor mistakes per entry, while Wikipedia averaged four. (Britannica claimed “almost everything about the journal’s investigation … was wrong and misleading,” but Nature stuck by its findings.) Nine years later, a working paper from Harvard Business School found that Wikipedia was more left-leaning than Britannica—mostly because the articles tended to be longer and so were likelier to contain partisan “code words.” But the bias came out in the wash. The more revisions a Wikipedia article had, the more neutral it became. On a “per-word basis,” the researchers wrote, the political bent “hardly differs.”
But some important differences don’t readily show up in quantitative, side-by-side comparisons. For instance, there’s the fact that people tend to read Wikipedia daily, whereas Britannica had the quality of fine china, as much a display object as a reference work. The edition I encountered by the roadside was in suspiciously good shape. Although the covers were a little wilted, the spines were uncracked and the pages immaculate—telltale signs of 50 years of infrequent use. And as I learned when I retrieved as many volumes as I could carry home, the contents are an antidote for anyone waxing nostalgic.
I found the articles in my ’65 Britannica mostly high quality and high minded, but the tone of breezy acumen could become imprecise. The section on Brazil’s education system, for instance, says it is “good or bad depending on which statistics one takes and how they are interpreted.” Almost all the articles are authored by white men, and some were already 30 years out of date when they were published. Noting this half-life in 1974, the critic Peter Prescott wrote that “encyclopedias are like loaves of bread: the sooner used, the better, for they are growing stale before they even reach the shelf.” The Britannica editors took half a century to get on board with cinema; in the 1965 edition, there is no entry on Luis Buñuel, one of the fathers of modern film. You can pretty much forget about television. Lord Byron, meanwhile, commands four whole pages. (This conservative tendency wasn’t limited to Britannica. Growing up, I remember reading the entry on dating in a hand-me-down World Book and being baffled by its emphasis on sharing milkshakes.)
The worthies who wrote these entries, moreover, didn’t come cheap. According to an article in The Atlantic from 1974, Britannica contributors earned 10 cents per word, on average—about 50 cents in today’s money. Sometimes they got a full encyclopedia set as a bonus. They apparently didn’t show much gratitude for this compensation; the editors complained of missed deadlines, petulant behavior, lazy mistakes, and outright bias. “People in the arts all fancy themselves good writers, and they gave us the most difficult time,” one editor told The Atlantic. At Britannica’s inflation-adjusted rate of about 50 cents per word, the 3.5 billion words of the English-language Wikipedia would cost $1.75 billion to produce.
There was another seldom remembered limitation to these gospel tomes: They were, in a way, shrinking. The total length of paper encyclopedias remained relatively finite, but the number of facts in the universe kept growing, leading to attrition and abbreviation. It was a zero-sum game in which adding new articles meant deleting or curtailing incumbent information. Even the most noteworthy were not immune; between 1965 and 1989, Bach’s Britannica entry shrank by two pages.
By the time the internet came into being, a limitless encyclopedia was not just a natural idea but an obvious one. Yet there was still a sense—even among the pioneers of the web—that, although the substrate was new, the top-down, expert-driven Britannica model should remain in place.
In 2000, 10 months before Jimmy Wales and Larry Sanger cofounded Wikipedia, the pair started a site called Nupedia, planning to source articles from noted scholars and put them through seven rounds of editorial oversight. But the site never got off the ground; after a year, there were fewer than two dozen entries. (Wales, who wrote one of them himself, told The New Yorker “it felt like homework.”) When Sanger got wind of a collaborative software tool called a wiki—from the Hawaiian wikiwiki, or “quickly”—he and Wales decided to set one up as a means of generating raw material for Nupedia. They assumed nothing good would come of it, but within a year Wikipedia had 20,000 articles. By the time Nupedia’s servers went down a year later, the original site had become a husk, and the seed it carried had grown beyond any expectation.
Sanger left Wikipedia in early 2003, telling the Financial Times he was fed up with the “trolls” and “anarchist types” who were “opposed to the idea that anyone should have any kind of authority that others do not.” Three years after that, he founded a rival called Citizendium, conceived as an expert-amateur partnership. The same year, another influential Wikipedia editor, Eugene Izhikevich, launched Scholarpedia, an invitation-only, peer-reviewed online encyclopedia with a focus on the sciences. Citizendium struggled to attract both funding and contributors and is now moribund; Scholarpedia, which started out with less lofty ambitions, has fewer than 2,000 articles. But more notable was why these sites languished. They came up against a simple and apparently insoluble problem, the same one that Nupedia encountered and Wikipedia surmounted: Most experts do not want to contribute to a free online encyclopedia.
This barrier to entry exists even in places where there are many experts and large volumes of material to draw from. Napoleon Bonaparte, for instance, is the subject of tens of thousands of books. There are probably more dedicated historians of the Corsican general than of almost any other historical figure, but so far these scholars, even the retired or especially enthusiastic ones, have been disinclined to share their bounty. Citizendium’s entry on Napoleon, around 5,000 words long and unedited for the past six years, is missing events as major as the decisive Battle of Borodino, which claimed 70,000 casualties, and the succession of Napoleon II. By contrast, Wikipedia’s article on Napoleon sits at around 18,000 words long and runs to more than 350 sources.
The Wikipedia replacement products revealed another problem with the top-down model: With so few contributors, coverage was spotty and gaps were hard to fill. Scholarpedia’s entry on neuroscience makes no mention of serotonin or the frontal lobes. At Citizendium, Sanger refused to recognize women’s studies as a top-level category, describing the discipline as too “politically correct.” (Today, he says “it wasn’t about women’s studies in particular” but about “too much overlap with existing groups.”) A wiki with a more horizontal hierarchy, on the other hand, can self-correct. No matter how politically touchy or intellectually abstruse the topic, the crowd develops consensus. On the English-language Wikipedia, particularly controversial entries, like those on George W. Bush or Jesus Christ, have edit counts in the thousands.
Wikipedia, in other words, isn’t raised up wholesale, like a barn; it’s assembled grain by grain, like a termite mound. The smallness of the grains, and of the workers carrying them, makes the project’s scale seem impossible. But it is exactly this incrementalism that puts immensity within reach.
The heroes of Wikipedia are not giants in their fields but so-called WikiGnomes—editors who sweep up typos, arrange articles in neatly categorized piles, and scrub away vandalism. This work is often thankless, but it does not seem to be joyless. It is a common starting point for Wikipedians, and many are content to stay there. According to a 2016 paper in the journal Management Science, the median edit length on Wikipedia is just 37 characters, an effort that might take a few seconds.
From there, though, many volunteers are drawn deeper into the site’s culture. They discuss their edits on Talk pages; they display their interests and abilities on User pages; some vie to reach the top of the edit-count leaderboard. An elect few become administrators; while around a quarter of a million people edit Wikipedia daily, only around 1,100 accounts have admin privileges. The site is deep and complex enough—by one count, its policy directives and suggestions run to more than 150,000 words—that its most committed adherents must become almost like lawyers, appealing to precedent and arguing their case. As with the law, there are different schools of interpretation; the two largest of these are deletionists and inclusionists. Deletionists favor quality over quantity, and notability over utility. Inclusionists are the opposite.
Most dedicated editors, whether deletionist or inclusionist, are that category of person who sits somewhere between expert and amateur: the enthusiast. Think of a railfan or a trainspotter. (Wikipedians disagree on which is the better term.) Their knowledge of trains is quite different from an engineer’s or a railway historian’s; you can’t major in trainspotting or become credentialed as a railfan. But these people are a legitimate kind of expert nonetheless. Previously, their folk knowledge was reposited in online forums, radio shows, and specialist magazines. Wikipedia harnessed it for the first time. The entry on the famous locomotive the Flying Scotsman is 4,000 words long and includes eye-wateringly detailed information on its renumbering, series of owners, smoke deflectors, and restoration, from contributors who seem to have the most intimate, hard-won knowledge of the train’s working. (“It was deemed that the A4 boiler had deteriorated into a worse state than the spare due to the higher operating pressures the locomotive had experienced following the up-rating of the locomotive to 250 psi.”)
Pedantry this powerful is itself a kind of engine, and it is fueled by an enthusiasm that verges on love. Many early critiques of computer-assisted reference works feared a vital human quality would be stripped out in favor of bland fact-speak. That 1974 article in The Atlantic presaged this concern well: “Accuracy, of course, can better be won by a committee armed with computers than by a single intelligence. But while accuracy binds the trust between reader and contributor, eccentricity and elegance and surprise are the singular qualities that make learning an inviting transaction. And they are not qualities we associate with committees.” Yet Wikipedia has eccentricity, elegance, and surprise in abundance, especially in those moments when enthusiasm becomes excess and detail is rendered so finely (and pointlessly) that it becomes beautiful.
In the article on the sexual revolution, there was a line, since deleted, that read, “For those who were not there to experience it, it may be difficult to imagine how risk-free sex was during the 1960s and 1970s.” This anonymous autobiography in miniature is an intriguing piece of editorializing, but it’s also a little legacy of the sexual revolution all by itself, a rueful reflection on a moment of freedom that didn’t last. (The editor who added “Citation needed” is part of that story as well.) In the article on the anticommunist intellectual Frank Knopfelmacher, we learn that “his protracted, usually freewheeling, invariably slanderous late-night telephone monologues (visited alike upon associates and, more often, antagonists) retained a mythic status for decades among Australian intellectuals.” The Hong Kong novelist Lillian Lee, we are told, seeks “freedom and happiness, not fame.”
Pedants have a reputation for humorlessness, but for Wikipedians a sense of humor is at the core of the good-faith collaboration that defines the project. There is probably no need for an exhaustive history of a giant straw goat erected in a Swedish town each Christmas, but the article on the Gävle Goat chronicles its annual fate fastidiously. It is prone to vandalism by fire, and the article centers around an exacting timeline that lists the date of destruction, the method of destruction, and the new security measures put in place every year since 1966. (In 2005, it was “burnt by unknown vandals reportedly dressed as Santa and the gingerbread man, by shooting a flaming arrow at the goat.”)
Why do Wikipedians perform these millions of hours of labor, some expended on a giant straw goat, without pay? Because they don’t experience them as labor. “It’s a misconception people work for free,” Wales told the site Hacker Noon in 2018. “They have fun for free.” A 2011 survey of more than 5,000 Wikipedia contributors listed “It’s fun” as one of the primary reasons they edited the site.
This is why the meta side of Wikipedia—the Talk pages, the essay commentaries, the policies—is suffused with nerdy jokes. We’re so used to equating seriousness with importance that this jars at first: It’s hard to square the encapsulation of all human knowledge with a policy called “Don’t be a dick” (since revised to “Don’t be a jerk”). But expressing the directive that way carries a purpose. It’s the same purpose that drives Wikipedians to collect and celebrate the site’s “Lamest edit wars,” which include long-running skirmishes on Freddie Mercury’s ancestry, the provenance of Caesar salad, the proper pronunciation of J. K. Rowling’s surname (“Perhaps it rhymes with ‘Trolling’?”), the wording of certain captions (“Is the cat depicted really smiling?”), and the threshold of notoriety required to appear on a list of fictional badgers.
Few architects of a world encyclopedia would think to include a forum for jokes, and in the unlikely event that they did, no one could anticipate that it would be important. But on Wikipedia the jokes are very important. They defuse tensions. They foster joyful cooperation. They encourage humility. They promote further reading and further editing. They also represent a surprise return to the earliest days of Enlightenment reference works. Samuel Johnson’s dictionary, compiled in 1755, gives one definition of “dull” as “not exhilarating; not delightful: as, to make dictionaries is dull work.” Perhaps the most important encyclopedia of the late modern period, the Encyclopédie, is barbed with satirical and anticlerical quips: The entry on “Cannibals” cross-references with “Communion.”
If it is a mistake to keep comparing Wikipedia to Britannica, it is another kind of category error to judge Wikipedia against its peers in the internet’s top 10. Wikipedia ought to serve as a model for many forms of social endeavor online, but its lessons do not translate readily into the commercial sphere. It is a noncommercial enterprise, with no investors or shareholders to appease, no financial imperative to grow or die, and no standing to maintain in the arms race to amass data and attain AI supremacy at all costs. At Jimmy Wales’ wedding, one of the maids of honor toasted him as the sole internet mogul who wasn’t a billionaire.
The site has helped its fellow tech behemoths, though, especially with the march of AI. Wikipedia’s liberal content licenses and vast information hoard have allowed developers to train neural networks much more quickly, cheaply, and widely than proprietary data sets ever could have. When you ask Apple’s Siri or Amazon’s Alexa a question, Wikipedia helps provide the answer. When you Google a famous person or place, Wikipedia often informs the “knowledge panel” that appears alongside your search results.
These tools were made possible by a project called Wikidata, the next ambitious step toward realizing the age-old dream of creating a “World Brain.” It began with a Croatian computer scientist and Wikipedia editor named Denny Vrandečić. He was enthralled with the online encyclopedia’s content but felt frustrated that users could not ask it questions that required drawing on knowledge from multiple entries across the site. Vrandečić wanted Wikipedia to be able to answer a query like “What are the 20 largest cities in the world that have a female mayor?” “The knowledge is obviously in Wikipedia, but it’s hidden,” Vrandečić told me. To get it out “would be huge work.”
Drawing on an idea from the early internet called “the semantic web,” Vrandečić set out to structure and enrich Wikipedia’s data set so that it could, in effect, begin to synthesize its own knowledge. If there were some way to tag women and mayors and cities by population size, then a correctly coded query could return the 20 largest cities with a female mayor automatically. Vrandečić had edited Wikipedia in Croatian, English, and German, so he recognized the limitations of using plain English semantic tagging. Instead, he chose numerical codes. Any reference to the book Treasure Island might be tagged with the code Q185118, for example, or the color brown with Q47071.
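The tagging idea can be made concrete with a small sketch. What follows is an illustration under stated assumptions, not Wikidata's actual data model or query interface: the `Item` class, the statement keys (`instance_of`, `mayor`, `gender`, `population`), every numerical code, and all of the city and mayor data are hypothetical placeholders invented for the example. It only shows why language-neutral codes plus a few structured statements are enough to answer the female-mayor question mechanically.

```python
from dataclasses import dataclass, field

# Hypothetical placeholder codes, NOT real Wikidata identifiers.
FEMALE = "Q9001"
CITY = "Q9002"
MALE = "Q9003"


@dataclass
class Item:
    """One tagged thing: a numerical code, a human label, and statements."""
    code: str
    label: str
    statements: dict = field(default_factory=dict)


def largest_cities_with_female_mayor(items, limit=20):
    """Return labels of the most populous cities whose mayor is tagged female."""
    by_code = {item.code: item for item in items}
    matches = []
    for item in items:
        # Keep only items tagged as cities.
        if item.statements.get("instance_of") != CITY:
            continue
        # Follow the link from the city to its mayor's own item.
        mayor = by_code.get(item.statements.get("mayor"))
        if mayor is None or mayor.statements.get("gender") != FEMALE:
            continue
        matches.append(item)
    # Largest population first, then cut to the requested number.
    matches.sort(key=lambda c: c.statements.get("population", 0), reverse=True)
    return [c.label for c in matches[:limit]]


# Invented sample data; labels could be in any language, the codes stay the same.
ITEMS = [
    Item(FEMALE, "female"),
    Item(CITY, "city"),
    Item(MALE, "male"),
    Item("Q9101", "Mayor One", {"gender": FEMALE}),
    Item("Q9102", "Mayor Two", {"gender": MALE}),
    Item("Q9103", "Mayor Three", {"gender": FEMALE}),
    Item("Q9201", "Example City A",
         {"instance_of": CITY, "mayor": "Q9101", "population": 500_000}),
    Item("Q9202", "Example City B",
         {"instance_of": CITY, "mayor": "Q9102", "population": 2_000_000}),
    Item("Q9203", "Example City C",
         {"instance_of": CITY, "mayor": "Q9103", "population": 1_200_000}),
]

print(largest_cities_with_female_mayor(ITEMS))
```

The query never reads a sentence of prose. It follows codes from city to mayor to gender, which is the kind of cross-entry lookup that plain encyclopedia articles cannot answer on their own.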
Vrandečić assumed this coding and tagging would have to be carried out by bots. But of the 80 million items that have been added to Wikidata so far, around half have been entered by human volunteers, a level of crowdsourcing that has surprised even Wikidata’s creators. Editing Wikidata and editing Wikipedia, it turns out, are different enough that they don’t cannibalize the same contributors. Wikipedia attracts people interested in writing prose, and Wikidata compels dot-connectors, puzzle-solvers, and completionists. (Its product manager, Lydia Pintscher, still comes home from a movie and manually copies the cast list from IMDb into Wikidata with the appropriate tags.)
As platforms like Google and Alexa work to provide instant answers to random questions, Wikidata will be one of the key architectures that link the world’s information together. The system still results in errors sometimes—that’s why Siri briefly thought Bulgaria’s national anthem was “Despacito”—but its prospective scale is already more ambitious than Wikipedia’s. There are subprojects aiming to itemize every sitting politician on earth, every painting in every public collection worldwide, and every gene in the human genome into searchable, adaptable, and machine-readable form.
The jokes will still be there. Consider Wikidata’s numerical tag for the author Douglas Adams, Q42. In Adams’ book The Hitchhiker’s Guide to the Galaxy, a group of hyperintelligent beings build a vast, powerful computer called Deep Thought, which they ask for the “Answer to the Ultimate Question of Life, the Universe, and Everything.” What comes out is the number 42. That wink of self-awareness—at the folly and joy of building something as preposterous and powerful as a world brain—is why, with Wikipedia, you know you are getting the best possible information.
All Rights Reserved for Richard Cooke