Give These Apps Some Notes and They’ll Write Emails for You

Entrepreneurs are building tools that create emails or marketing copy using GPT-3, text-generation technology released earlier this year.

Michael Shuffet didn’t waste any keystrokes when responding to a message about the automated email writer he’s building. He tapped out “Yes 45m” and clicked a button marked “Generate email.” His app, Compose.ai, drafted a courteous three-sentence reply with a link to schedule a 45-minute call. Shuffet checked it over and clicked Send.

Compose is one of several automated writing tools built on striking new text-generation technology known as GPT-3, revealed in June by OpenAI, an artificial intelligence research institute. GPT-3 went viral this summer after people marveled at how it could fluently crank out memes, code, self-help blog posts, and Hemingway-style Harry Potter fanfic. WIRED and others showed that GPT-3 can also spout nonsense and hate, because its algorithms learned to generate text by digesting wide swaths of the internet.

Now, some entrepreneurs are harnessing GPT-3 to perform real work, like drafting emails or marketing copy. “Billions of people write email,” says Shuffet, a cocreator of Compose. “It’s a space that has not had much innovation for years.” Google’s Gmail will suggest ways to complete sentences and supply short, peppy replies to some emails, such as “Thanks so much!” But it doesn’t draft fuller messages.

Snazzy.ai, which launched to early testers last week, generates verbiage for web pages and Google ads, based on basic information about a campaign or brand. When supplied with keywords about WIRED and a phrase from its founding manifesto, Snazzy suggested marketing gloss with bits of robotic inspiration. One proffered Google ad included the coinage “geekspace,” a word that is rare online and has appeared on WIRED.com only twice, most recently eight years ago.

Chris Frantz, a Snazzy cofounder and marketer by trade, says the service reduces the drudgery of creating an initial splurge of ideas to be honed into a fresh campaign. “The goal is to offload the somewhat monotonous job of writing the copy, and move to the editing part,” he says.

VWO, which helps companies measure the performance of marketing content, has tested GPT-3 against human-written material for clients including travel site Booking.com. Of six tests with statistically significant results, AI-generated copy gained more clicks or interactions twice, and human-authored copy performed better once. The remaining three matchups were tied. More tests are ongoing, but VWO founder Paras Chopra believes marketers will gravitate to auto-generated material because it speeds experimentation. “The more you can test, the higher the likelihood you end up impacting your business metrics,” he says.

For email, Compose and others trying to inject GPT-3 have converged on a similar design: Write terse bullet points, click a button, and the AI will transform your laconic input into flowing paragraphs.

In a demo, Matt Shumer, a cofounder of OthersideAI, typed six short lines in response to a mock email asking what features coders should build next. When he clicked a button marked “Generate,” 21 words of snippets like “ofc,” “voice integration is easiest,” and “free every monday at 1pm” became 43 flowing words in reply.

Behind the scenes, apps built on GPT-3 send snippets of text dubbed “prompts” to OpenAI’s cloud servers. GPT-3 sends back new text it calculates will follow seamlessly from the input, based on statistical patterns it saw in online text.
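As a rough illustration of that round trip, here is a minimal Python sketch of how an email app might fold a user's terse notes into a prompt and hand it to a text-completion service. Everything in it is an assumption made for illustration: the prompt wording, the function names, and the stand-in `fake_complete` function are hypothetical, and none of it reflects the actual code of Compose, OthersideAI, or OpenAI's API.

```python
from typing import Callable, List


def build_prompt(incoming_email: str, notes: List[str]) -> str:
    """Assemble a plain-text prompt from an incoming email and terse reply notes.

    The layout is a hypothetical example; real apps tune this wording heavily.
    """
    # Turn each non-empty note into a bullet line.
    bullet_lines = "\n".join(f"- {note.strip()}" for note in notes if note.strip())
    return (
        "Incoming email:\n"
        f"{incoming_email.strip()}\n\n"
        "Notes for the reply:\n"
        f"{bullet_lines}\n\n"
        "Polite reply written out in full sentences:\n"
    )


def draft_reply(
    incoming_email: str,
    notes: List[str],
    complete: Callable[[str], str],
) -> str:
    """Send the prompt to a completion function and return the cleaned-up draft.

    `complete` stands in for whatever call reaches the text-generation service;
    it takes a prompt and returns the text the model predicts should follow.
    """
    prompt = build_prompt(incoming_email, notes)
    return complete(prompt).strip()


def fake_complete(prompt: str) -> str:
    """Hypothetical stand-in for the remote model, so the sketch runs offline."""
    return "  Thanks for getting in touch. I can call you Wednesday around 6 pm your time.\n"


if __name__ == "__main__":
    draft = draft_reply(
        "Do you have time to talk this week?",
        ["thanks", "can call u Weds around 6pm yr time"],
        fake_complete,
    )
    print(draft)
```

Passing the completion function in as an argument keeps the prompt-building logic separate from the network call, so a developer could swap in a real service, a safety filter, or a test stub without changing the rest of the app.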

That unusual way of interacting with a computer makes GPT-3 fun to play with but tricky to work with. Its broad experience of the web and lack of grounding in physical reality mean it often veers into non sequiturs and nonsense.

Shumer says OthersideAI has been focusing on making GPT-3’s output “reliable and safe.” An early version of the app was too creative. It correctly interpreted prompts like “1pm meeting” but added fabrications like nonexistent doctor’s appointments. Shumer says that’s been fixed, and he’s now testing ways to make his service mimic a person’s writing style. He hasn’t yet decided if that should stretch to including curse words.

WIRED’s own experiments with the GPT-3-powered service Magic Email showed both the promise and perils of automated writing.

Drafting formulaic emails such as scheduling calls generally went well. Given three blunt bullet points (thanks; can call u Weds around 6pm yr time; yr new project snds interesting), Magic Email drafted a six-line email conveying the same information more warmly.

That could be useful for conversations where telegraphic, ungrammatical messages would come off as impolite. Magic Email provides buttons to give suggestions a thumbs up or down, and to ask the algorithms to try again.

The writing algorithms were less reliable on more complex messages. When drafting an email including a link about Covid-19 research, the app incorrectly wrote that it was related to skin cancer, perhaps because another part of the prompt mentioned dermatology.

Fixing such glitches was easy but raised the question of whether correcting auto drafts is more efficient than writing the email from scratch. Samanyou Garg, the UK developer behind Magic Email, says his service is improving and can be useful if as much as 30 percent of generated text needs tweaking. “I’ve had good feedback from people who say they’re saving a lot of time,” he says.

Magic Email refused to write at all when prompted with bullet points about online hate speech that included the phrase “Misogynist content big problem on Facebook.” A message appeared warning, “Your provided input seems to contain unsafe content so we have blocked your request.”

The safety filter may have been overcautious on that occasion, but it’s there with good reason. GPT-3 has learned the patterns of unsavory text online. Researchers at the Middlebury Institute of International Studies reported last month that it can fluently mimic anti-Semitic and terrorist content. More than four years since Microsoft’s chatbot Tay went rogue and tweeted favorably about Hitler, researchers still don’t know how to prevent algorithms from repeating or amplifying bias or bad judgment in text or other data.

OpenAI says that it vets customers before giving them access to GPT-3 and reviews every application of the service before it goes live. It has implemented a toxicity filter and suggests customers apply constraints such as always having a human review auto-generated text and implementing filters of their own.

The economics of AI-generated text are still unknown. With no competition from rival text-generation services, GPT-3 is relatively expensive—reflecting how OpenAI has pushed the boundaries of machine learning with help from immense computing power. Frantz of Snazzy gave up on offering a free tier when OpenAI revealed its pricing last month. Yet it shouldn’t be difficult for other AI players such as Google and Amazon to create similar text generators, and the technology could get cheaper and more accessible quickly.