Understanding text as “data” and accepting its fluidity

The survival of the humanities in the digital age relies upon the understanding that text is data: specific information that is carefully packaged for our analysis. In my experience as an English major, analysis is typically undertaken as a quest for interpretive meaning, bringing to the fore questions of symbolism and literary devices, like: “What does the green light symbolize in The Great Gatsby?” and “How does Fitzgerald’s diction convey this particular meaning, and not another?”

After focusing primarily on qualitative approaches to literary analysis, ENGL-340 introduced me to the quantitative analysis of text as data. With quantitative analysis, each bit of information is examined without projecting more meaning than what’s provided. An example of quantitative analysis applied to The Great Gatsby might be: “How many times are the eyes of T. J. Eckleburg mentioned? Where are they mentioned most often?”

ENGL-340 also introduced me to text as a corpus of data, or corpora if there’s more than one distinct corpus. We apply quantitative analysis to one body of text, or a collection of related texts, in search of patterns. Patterns are the keystone of meaning according to quantitative analysis, especially when examining large corpora. For instance, you might wonder, “What patterns can we discern across classic novels written by the modernist ‘Lost Generation’?” These patterns might materialize as the repeated use of a word, theme, or structural organization. From there, you might imagine why the great writers of the Lost Generation made similar, or different, stylistic choices.

Incorporating this mode of analysis into our literary approach is crucial because it bridges the perceived gap between the humanities and the sciences. Human thoughts crafted into words crafted into sentences crafted into coherent bodies of text are historical objects of information worthy of scientific analysis. They are markers of humanity’s achievements: some of the greatest, most timeless self-knowledge we’ve touched upon as a species is found in literature. Why not examine its concordance, then? Some tips for quantitative analysis that I picked up in ENGL-340, and which I hope to bring to future classes, are:

  • Searching a corpus for patterns at the command line: using a digital copy of the text you’re analyzing and the command line, you can search the text for the occurrence rate of specific words, for a total word count, for the longest and shortest words in the text, for the word that occurs most often, and so on.
  • Generating graphs and other visuals for your data on Voyant: rather than using the command line, you can also upload your textual data to the website VoyantTools.org and view the quantitative analysis there. The cool thing about Voyant is that it generates different charts, graphs, and visuals of the data you’re focusing on.
  • Comparing data across versions using a Fluid Text Edition: here, we can analyze how data in a corpus has changed or stayed the same over time.
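To make the first tip a little more concrete, here is a minimal sketch of the same kinds of counts (total words, a specific word’s frequency, the most common word, the longest and shortest words). It is written in Python rather than with the command-line tools from class, which is my own substitution. The function names and the sample sentence are made up for illustration, and the tokenizer is deliberately simple: it keeps only unaccented letters and straight apostrophes.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its words (letters and straight apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def summarize(text):
    """Collect a few basic counts about a text."""
    words = tokenize(text)
    counts = Counter(words)
    return {
        "total_words": len(words),
        # (word, count) pair for the most frequent word, or None for empty text
        "most_common": counts.most_common(1)[0] if counts else None,
        "longest": max(words, key=len) if words else None,
        "shortest": min(words, key=len) if words else None,
        "counts": counts,
    }


# Invented sample sentence, not a quotation from any novel.
sample = "The eyes of the doctor watch the valley. The eyes see everything."
summary = summarize(sample)

print(summary["total_words"])     # 12
print(summary["counts"]["eyes"])  # 2
print(summary["most_common"])     # ('the', 4)
print(summary["longest"])         # everything
print(summary["shortest"])        # of
```

To try this on a real text, you would read a plain-text file into a string and pass it to summarize in place of the sample sentence.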

The second necessary development in my perception of literature as a result of ENGL-340 is tied to this final bullet point. First, there’s understanding text as data. Then, there’s understanding that text is fluid: not a fixed and stable object, but rather, an ongoing project.

Our favorite books, poems, and plays were formed gradually and with great care, not suddenly and miraculously crystallized. If we have access to earlier manuscripts or revised copies, we can compare versions of a text to better understand its development. Referring to a fluid text edition also brings awareness to the humanity of the author: reading Thoreau’s first draft of Walden, we are reminded of the flaws inherent in everyone’s first draft. It also serves to remind us, quite fittingly for Thoreau, that everything is done with some degree of deliberation in the literary world. Individual words are chosen with extreme care: Thoreau tried out the words “book,” “work,” and “lecture” when describing Walden across various versions before finally settling on “book.” The attention put toward such minute matters reveals Thoreau’s dedication to “getting it right” as a writer, a quality not all writers, or authors, possess.
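For anyone curious how a word-level comparison between two versions might look in code, here is a rough sketch using Python’s built-in difflib module. This is not how a fluid text edition is actually built; it is only my assumption of one simple way to surface changed words. The two “drafts” below are invented sentences, not Thoreau’s actual manuscript text, and the function name is hypothetical.

```python
import difflib


def word_changes(old, new):
    """Return (operation, old words, new words) for each span that differs."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only insertions, deletions, and replacements
            changes.append(
                (tag, " ".join(old_words[i1:i2]), " ".join(new_words[j1:j2]))
            )
    return changes


# Invented example drafts, for illustration only.
draft_a = "I offer this lecture to my neighbors"
draft_b = "I offer this book to my neighbors"

print(word_changes(draft_a, draft_b))  # [('replace', 'lecture', 'book')]
```

The list of changes is the quantitative part; imagining why the writer made each change is where the interpretive work begins.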

Thanks to ENGL-340, I am able to see the oft-overlooked juncture between qualitative and quantitative analysis in literary studies. With attention to the nature of text as “fluid data,” we can study the information before us both objectively and subjectively. Subjectivity particularly applies to imagining plausible revision narratives that explain changes made to a text. Of course, quantitative data is also fodder for further interpretation. Information, writing, art, life: I think James Gleick once called it a “moving target,” and I find that more than adequate. These things are constantly in flux as a result of being alive; final stability is like death, or whatever name you give to completion. Just as we learn from one another, we learn from art as we observe and analyze its change over time, its variations and constancies. In this way, ENGL-340 enhanced my understanding of literature and life.
