Computers scientifically prove how horrible The Matrix's dialogue is

The Matrix is a groundbreaking, visually arresting Singularity opera about our species' precarious balancing act between the myths of old and the technologies of tomorrow. But for all the film's many achievements (we'll leave Reloaded and Revolutions out of the mix for now), The Matrix is far from perfect. Leading man Keanu Reeves' unique brand of anti-acting comes to mind, as does the issue of the film's dialogue. The silly, horrible dialogue.

The Matrix brims with unfathomable fauxlosophical babble like "The answer is out there, Neo, and it's looking for you, and it will find you if you want it to." Just ridiculous. But how ridiculous is it exactly? Back when the film was released in 1999, we didn't have an easy way to answer this important question. Thankfully, technology has advanced to the point where great cinematic queries such as these can finally be answered!

The Internet (the real Matrix, if you will) is home to a bevy of tools that will coldly analyze any source material and rate it according to pre-programmed algorithms. They allow anyone with a Web connection to analyze any chunk of text, be it a term paper, a letter to your congressman, or yes, even movie dialogue.

Here we present a little revenge by "the machines" on the film which so ham-handedly warned of their impending rise and rebellion. (Or has this been part of their plan since the very beginning? Eh? Think about it.*)

* Don't spend too much time thinking about it.





There Is No Spoon

While there are technologies that will automatically transcribe spoken dialogue into a digital format, for our layman's strategy, we utilized an actual film script. As it turns out, the Web is a treasure trove of complete screenplays, especially for fanboy favorites such as The Matrix. The first step was to locate a version of the script that is close to the final edit of the film, such as the one found here.

Of course, film scripts aren't just dialogue; they also include scene descriptors, character designations, and other conventions, like so:

INT. CAR

A large black man named APOC is driving. Beside him is a beautiful androgyne called SWITCH, aiming a large gun at Neo.

NEO
What the hell is this?!
TRINITY
It's necessary, Neo. For our protection.
NEO
From what?
TRINITY
From you.

She lifts a strange steel and glass device that looks like a cross between a rib separator, speculum and air compressor.

Since we're just interested in the dialogue, the next step was to erase all the screenplay flotsam. To do that, the entire script needed to be transferred to a malleable medium such as a Microsoft Word document. We converted the on-screen version of the script into a 143-page Word doc. Then, using a complicated series of find/replace functions, we removed all the scene descriptors and character headers, distilling the script down to pure cyberpunk blabber, like so:

Get in.

What the hell is this?!

It's necessary, Neo. For our protection.

From what?

From you.

Take off your shirt.

What? Why?

Stop the car.

Now we have a document ready for the digital wringer.
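For the Word-averse, the same cleanup can be roughed out in a few lines of Python. This is only a sketch, not the find/replace routine we actually used: the function name `extract_dialogue` is made up for illustration, and it assumes the script follows the layout shown above, where an all-caps line is a character cue and the lines right after it are speech.

```python
import re

def extract_dialogue(script: str) -> list[str]:
    """Keep only spoken lines from screenplay-style text.

    Assumptions (not guaranteed for every script): a character cue is a
    line written entirely in capitals, speech is the run of non-blank
    lines that follows a cue, and scene headings start with INT or EXT.
    """
    heading = re.compile(r"^(INT|EXT)\b")
    cue = re.compile(r"^[A-Z][A-Z0-9 .'\-]*$")
    dialogue = []
    in_speech = False
    for raw in script.splitlines():
        line = raw.strip()
        if not line or heading.match(line):
            # Blank lines and scene headings end any speech in progress.
            in_speech = False
        elif cue.match(line):
            # A character name: whatever follows is dialogue.
            in_speech = True
        elif in_speech:
            dialogue.append(line)
        # Anything else is a scene description and is dropped.
    return dialogue
```

A one-word line of dialogue in all caps would be mistaken for a character cue, so a real script would still need a once-over by human eyes.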





Revenge of The Machines

There are several free online text analysis tools that will pick apart just about any uploaded text. For example, voyeurtools.org offers a free tool that will create a word cloud of the most commonly found words. Here is the word cloud of the most common single words spoken in The Matrix.

For giggles, other tools will allow you to separate out the most common phrases of various lengths within the text. For example, when you filter down to the top phrases with just three words, by far the top utterance in The Matrix is "I don't know" with 11 occurrences. That's followed by three-word classics such as "I'm going to," "you want to," and "what do you," which come in at nine appearances.
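Counting phrases like this is simple enough to try at home. Here is a minimal sketch, assuming the dialogue is already in a plain string; the function name `top_phrases` is ours, and the online tools may well tokenize differently (this version happily counts phrases that straddle two sentences).

```python
import re
from collections import Counter

def top_phrases(text: str, n: int = 3, k: int = 5) -> list[tuple[str, int]]:
    """Return the k most common n-word phrases in text."""
    # Lowercase and keep letters and straight apostrophes only.
    words = re.findall(r"[a-z']+", text.lower())
    # Slide an n-word window across the word list.
    windows = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(w) for w in windows).most_common(k)
```

Calling `top_phrases(dialogue, n=3)` on the full stripped-down script is how you would hunt for the film's "I don't know" habit.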

You can also filter out the most utilized "rare English" words from the text. The computers comb out "the" and "it" and little nonsense words like that to find the popular words that are unique to that text. As you might expect, the character names come out on top: "Neo" and "Morpheus" take the top spots with 70 and 51 mentions respectively, and the word "Matrix" comes in third with 29 appearances. If you take out character names, the film's next five favorite words of wisdom are "shit" (21 mentions), "yeah" (12 mentions), "fuck" (7 mentions), "Zion" (6 mentions) and "exit" (6 mentions).
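The "comb out the little words" trick can be sketched the same way. Everything named here is hypothetical: `STOP_WORDS` is a tiny stand-in for the much longer lists real tools use, and `distinctive_words` is our own helper, not anything the online tools expose.

```python
import re
from collections import Counter

# A deliberately tiny stop list for illustration; real tools use hundreds of words.
STOP_WORDS = {
    "the", "it", "a", "an", "and", "to", "of", "is", "you", "i",
    "that", "in", "for", "what", "this",
}

def distinctive_words(text: str, k: int = 5, exclude=()) -> list[tuple[str, int]]:
    """Return the k most common words after dropping stop words.

    `exclude` lets you also drop extra words, such as character names.
    """
    skip = STOP_WORDS | {w.lower() for w in exclude}
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in skip).most_common(k)
```

Passing `exclude=["Neo", "Morpheus", "Trinity"]` is one way to take the character names out of the running.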

This is all somewhat interesting, but we want to start looking at the quality of the uploaded text.

The word nerds over at online-utility.org offer a whole suite of free text analysis tools, including ones that gauge the number of years of education a person would need "to understand the text easily on a first reading." And the result isn't all that complimentary to the film's writers (or it is just very complimentary to our nation's elementary schools).

Here's a variety of indexes with the mathematically-calculated amount of formal education (i.e. grade level) needed to comprehend the text on first try.

Gunning fog index: 4.13
Coleman-Liau index: 2.08
Flesch-Kincaid grade level: 2.63
SMOG: 6.20
ARI (Automated Readability Index): -0.03

As you can see, one index, the ARI, seems to indicate that the audience should have less education than they had when they were born.
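A negative grade level sounds impossible until you look at the arithmetic. The standard ARI formula is 4.71 × (characters per word) + 0.5 × (words per sentence) − 21.43, so short words in short sentences can drag the score below zero. The sketch below uses that published formula, but the counting rules (what counts as a character, where a sentence ends) are our own assumptions, so it won't necessarily reproduce the exact figures above.

```python
import re

def ari(text: str) -> float:
    """Automated Readability Index using simple, assumed counting rules."""
    # Assumption: a sentence ends at any run of ., ! or ?
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Assumption: a word is a run of letters, digits or apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", text)
    # Count only letters and digits as characters.
    chars = sum(len(w.replace("'", "")) for w in words)
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43
```

Feed it the exchange "From what? From you." and you get 15 characters, 4 words and 2 sentences, for a score of about -2.77. Trinity and Neo are apparently addressing the unborn.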

Of course, these analyses may be unfair, as spoken dialogue was never designed to undergo the same scrutiny as the written word. However, if we compare The Matrix to other genre film dialogue distillations, it doesn't hold up well. For example, if you break down the script for the original Jurassic Park, it scores higher in every category:

Gunning fog index: 5.11
Coleman-Liau index: 3.99
Flesch-Kincaid grade level: 3.35
SMOG: 6.85
ARI (Automated Readability Index): 1.39

Even silly old Will Smith-led Independence Day (which was enjoyable but tailor-made to appeal to as wide an audience as possible) proved to be more sophisticated source material (than even Jurassic Park):

Gunning fog index: 5.14
Coleman-Liau index: 4.19
Flesch-Kincaid grade level: 3.46
SMOG: 6.97
ARI (Automated Readability Index): 1.58

But these analytical tools aren't only used to passively judge; the analysis can also highlight passages of particular concern. Or as the site puts it, it can point out sentences "we suggest you should consider to rewrite to improve readability of the text."

Here are the silly phrases that the machines felt needed the most help:

  • "I believe that if you are serious about saving him then you are going to need my help and since I am the ranking officer on this ship, if you don't like it then I believe that you can go to hell, because you aren't going anywhere else."
  • "Inside the power plant, I watched them liquefy the dead so they could be fed intravenously to the living and standing there, facing the efficiency, the pure, horrifying precision, I came to realize the obviousness of the truth."
  • "You move to an area and you multiply and multiply until every natural resource is consumed and the only way you can survive is to spread to another area."
  • "Whatever you think you know about this man is irrelevant to the fact that he is wanted for acts of terrorism in more countries than any other man in the world."
  • "I say 'your civilization' because as soon as we started thinking for you, it really became our civilization, which is, of course, what this is all about."
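We don't know exactly how the site picks its victims, but every sentence on that list is a long one, so a crude stand-in is to rank sentences by word count. This is a guess at the spirit of the thing, not the tool's actual method, and `longest_sentences` is a name we invented.

```python
import re

def longest_sentences(text: str, k: int = 5) -> list[str]:
    """Return the k sentences with the most words, longest first."""
    # Assumption: a sentence ends at ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return sorted(sentences, key=lambda s: len(s.split()), reverse=True)[:k]
```

Run against the full dialogue file, a ranking like this should surface the monologues in a hurry.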

These analytical tools were originally designed to facilitate cleaner writing and to aid academics in sorting out plagiarized content. But, as it turns out, they have also found a use in paving the way for the oncoming mechanized dystopia of the future.

Who'da thunkit?

Evan Dashevsky is a DVICE contributor and a professional word nerd for hire. Follow his cough-sized thoughts on Twitter @haldash.