If you're ever reading a book or watching a movie and get the distinct feeling you've come across the story before – or even better, can predict exactly what's going to happen next – there could be a good reason for that.

Computer scientists have sifted through the language of more than 1,700 works of fiction and found that just six basic emotional arcs underpin nearly all of the best-known stories in English literature.

While literary theorists have spent centuries characterising and counting the basic plots and structures that writers use, it's unlikely English fiction has ever been subjected to a rigorous scientific analysis like this before.

Researchers from the Computational Story Laboratory at the University of Vermont mined the complete text of 1,737 works of fiction available on Project Gutenberg, an online collection of more than 50,000 digital books in the public domain.

By analysing the sentiment of the language in successive 10,000-word chunks of each text, the researchers were able to chart the emotional ups and downs of each story as a whole. Negative words like "poverty", "dead", and "punishment" dragged the sentiment down, while positive terms like "love", "peace", and "friend" brought it up.
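To make the chunk-by-chunk idea concrete, here is a minimal Python sketch. It assumes a tiny hypothetical lexicon (`LEXICON`) scoring words at +1 or -1, and a hypothetical `step` parameter for how far each 10,000-word window slides; the researchers' actual word list and scoring scale are much richer, so treat this as an illustration of the technique rather than their code.

```python
from typing import Dict, List

# Hypothetical toy lexicon for illustration only; a real sentiment
# analysis would use a large list of words with graded scores.
LEXICON: Dict[str, float] = {
    "love": 1.0, "peace": 1.0, "friend": 1.0,
    "poverty": -1.0, "dead": -1.0, "punishment": -1.0,
}


def emotional_arc(words: List[str], window: int = 10_000, step: int = 1_000) -> List[float]:
    """Average sentiment of each `window`-word chunk, sliding forward `step` words at a time.

    `step` is an assumption of this sketch: step == window gives
    non-overlapping chunks, a smaller step gives overlapping ones.
    Any leftover words after the last full step are not scored.
    """
    arc: List[float] = []
    last_start = max(len(words) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = words[start:start + window]
        # Only words that appear in the lexicon contribute to the score
        scores = [LEXICON[w] for w in chunk if w in LEXICON]
        # A chunk with no scored words is treated as neutral (0.0)
        arc.append(sum(scores) / len(scores) if scores else 0.0)
    return arc


if __name__ == "__main__":
    text = "poverty dead punishment friend love peace"
    # Tiny window so the example fits on one line: a fall, then a rise
    print(emotional_arc(text.lower().split(), window=3, step=3))  # [-1.0, 1.0]
```

Plotting the returned list against position in the book gives the kind of emotional trajectory the study describes.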

After charting the emotional dynamics of all 1,700-plus books, the team found that the stories basically boil down to a small set of core emotional patterns. "[W]e find a set of six core trajectories which form the building blocks of complex narratives," the authors write in their study.

According to the researchers, those six core emotional arcs are:

  • "Rags to riches" (An ongoing emotional rise, e.g. Alice's Adventures Under Ground)
  • "Tragedy, or riches to rags" (An ongoing emotional fall, e.g. Romeo and Juliet)
  • "Man in a hole" (A fall followed by a rise)
  • "Icarus" (A rise followed by a fall)
  • "Cinderella" (Rise–fall–rise)
  • "Oedipus" (Fall–rise–fall)
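The six shapes differ only in their sequence of rises and falls, which the toy classifier below exploits. It is a rule-based sketch, not the researchers' method (the study derived its six arcs statistically from the whole corpus): it takes a list of per-chunk sentiment scores, collapses it into "rise"/"fall" moves while ignoring changes smaller than a hypothetical `threshold`, and looks the pattern up in a table of the six arcs.

```python
from typing import Dict, List, Optional, Tuple

# The six core arcs, keyed by their sequence of emotional moves
ARC_NAMES: Dict[Tuple[str, ...], str] = {
    ("rise",): "Rags to riches",
    ("fall",): "Tragedy, or riches to rags",
    ("fall", "rise"): "Man in a hole",
    ("rise", "fall"): "Icarus",
    ("rise", "fall", "rise"): "Cinderella",
    ("fall", "rise", "fall"): "Oedipus",
}


def movements(arc: List[float], threshold: float = 0.1) -> List[str]:
    """Collapse a sentiment arc into its sequence of rises and falls.

    Changes smaller than `threshold` (an assumed tuning value) relative
    to the last significant point are ignored as noise.
    """
    if not arc:
        return []
    moves: List[str] = []
    anchor = arc[0]  # value at the last significant change
    for value in arc[1:]:
        delta = value - anchor
        if abs(delta) < threshold:
            continue
        direction = "rise" if delta > 0 else "fall"
        # Consecutive moves in the same direction merge into one
        if not moves or moves[-1] != direction:
            moves.append(direction)
        anchor = value
    return moves


def classify(arc: List[float], threshold: float = 0.1) -> Optional[str]:
    """Name the core arc, or return None for shapes outside the six (e.g. combined arcs)."""
    return ARC_NAMES.get(tuple(movements(arc, threshold)))


if __name__ == "__main__":
    print(classify([0.0, -0.5, -0.8, -0.2, 0.4]))  # Man in a hole
    print(classify([0.0, 0.6, -0.4, 0.7]))         # Cinderella
    print(classify([0.0, -1.0, 0.0, -1.0, 0.0]))   # None (two "Man in a hole" arcs)
```

Returning `None` for longer patterns is a deliberate choice: stories that chain core arcs together fall outside the six-entry table.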

Interestingly, based on download statistics from Project Gutenberg, the researchers say the most popular stories are ones that use more complex emotional arcs, with the Cinderella and Oedipus arcs registering the most downloads.

Also popular are works that combine these core arcs in new ways within a single story, such as two "Man in a hole" arcs in sequence, or a "Cinderella" arc followed by a tragic ending.

The researchers explain that the emotional arcs they've detected aren't quite the same thing as plots, which are the focus of many other literary analyses. Their study is based purely on the sentiment of the language used, not on the particular story details that drive the progression of the narrative.

"While the plot captures the mechanics of a narrative and the structure encodes their delivery, in the present work we examine the emotional arc that is invoked through the words used," they write. "The emotional arc of a story does not give us direct information about the plot or the intended meaning of the story, but rather exists as part of the whole narrative."

It's worth bearing in mind that the research has yet to be published in a peer-reviewed journal; while it undergoes review, it is available on the pre-print site arXiv.org.

In the meantime, you can take a close-up look at the researchers' data-mining for yourself, with interactive visualisations of all the Project Gutenberg books analysed, along with a selection of classic and popular works (including the complete Harry Potter series), on the project's website.

Drag the slider across the graph on the left to update the language analysis shown on the right side of the screen, which shows the particular words affecting the score for the portion you've selected.

Fascinating stuff!