Data-Mining Walden: Tools for Literary Analysis

Henry David Thoreau had a fraught relationship with technology. As we discussed in our presentation, it is difficult to tell whether he would be on board with our digital projects regarding his work. What we can say for sure is that the technology we have engaged with this semester has allowed us to read his book, Walden, as deliberately and as reservedly as it was written. By apprehending his text in the digital dimension, we achieved new insights into the way Thoreau thought about place and how he crafted his thoughts into writing.

Melissa, Sean, Cal, and Emma each took a chapter to mine in order to track the language of place and its development throughout the text. This required downloading and installing some software with the help of Kirk Anne and Dr. Schacht. Brianne worked on answering the “so what?” question by analyzing the data collected by the other group members. We worked with the Natural Language Toolkit (NLTK) and spaCy, both of which allowed us to mine for certain words and types of words. However, each proved to have its own limitations within each chapter. We found that spaCy was better suited to Cal’s mining of “The Ponds,” whereas NLTK was more helpful for Melissa, Sean, and Emma.
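To give a sense of what mining for certain words can look like, here is a minimal sketch in plain Python. It is not the code we ran: the word list, the function name count_place_terms, and the sample sentence are all made up for illustration, and the sentence is not a quotation from Walden. In practice, a tokenizer from NLTK or the named entity recognizer in spaCy (which labels places with tags such as GPE and LOC) would stand in for the hand-picked list used here.

```python
import re
from collections import Counter

# Hypothetical, hand-picked place names; an assumption for this sketch,
# not the word list actually used in the project.
PLACE_TERMS = {"walden", "concord", "siberia"}

def count_place_terms(text, terms=PLACE_TERMS):
    """Count how often each place term appears in text, ignoring case."""
    # Lowercase the text and keep only runs of letters as tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tok for tok in tokens if tok in terms)

# Made-up sample sentence, not a quotation from Walden.
sample = "I went from Walden to Concord, and from Concord back to Walden Pond."
counts = count_place_terms(sample)
print(counts.most_common())
```

A fixed list like this only finds the places you already thought to look for, which is one reason a tool that recognizes place names on its own can be better suited to a chapter full of them.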

Zooming out, data mining a text such as Walden did not come without challenges. Whether we worked on the virtual machine or the local server, Python proved to be a very demanding language, one with a steep learning curve that kept us guessing much of the time. Similarly, NLTK and spaCy had to be downloaded directly to our devices in order to accomplish the task at hand. It became clear that while digital tools can often make reading easier, learning the tools necessary to do so is anything but simple. Still, in grappling with the limitations of all of our tools, we seemed to be simultaneously addressing larger questions about the utility of technology, just as Thoreau does in Walden.

Nevertheless, the technology proved indispensable for our project because it helped us to expedite the mining/reading process. Python, the language we used to learn more about Walden, allowed us to operate on the text, while spaCy and NLTK provided a bank of resources that we could apply to the chapters we each chose. Each tool gave us a general sense of place, which we followed up with closer readings. We were able to distinguish clearly between the broadly spatial chapters (“The Village” and “House-Warming”) and the specifically geographic ones (“The Ponds” and “Conclusion”). Whether he was talking about physical places or metaphorical spaces, as in headspace, Thoreau constantly framed his thinking through place-specific language. This sort of “mapping” truly makes Thoreau into the “Surveyor of the Soul” that Huey Coleman claims him to be. His attention to the local and the distant, from Concord to Siberia, demonstrates both the interconnectedness that technology in the 19th century was making possible and the expansive reach of an inner geography, a soul whose territory outran the map.
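One way to picture the difference between broadly spatial and specifically geographic chapters is to compare how often each kind of word turns up per thousand words. The sketch below is only an illustration under stated assumptions: both word lists, the function names rate_per_thousand and profile, and the two placeholder passages are hypothetical, and none of the numbers it produces are results from our project.

```python
import re

# Hypothetical word lists for illustration only.
SPATIAL_TERMS = {"village", "house", "woods", "road"}    # generic spaces
GEOGRAPHIC_TERMS = {"walden", "concord", "siberia"}      # named places

def rate_per_thousand(text, terms):
    """Return occurrences of the given terms per 1,000 tokens of text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in terms)
    return 1000 * hits / len(tokens)

def profile(chapters):
    """Map each chapter title to its spatial and geographic rates."""
    return {
        title: {
            "spatial": rate_per_thousand(text, SPATIAL_TERMS),
            "geographic": rate_per_thousand(text, GEOGRAPHIC_TERMS),
        }
        for title, text in chapters.items()
    }

# Placeholder passages, made up for this sketch (not text from Walden).
chapters = {
    "Chapter A": "The house stood near the village road.",
    "Chapter B": "Walden lies south of Concord.",
}
result = profile(chapters)
for title, rates in result.items():
    print(title, rates)
```

Dividing by chapter length matters because the chapters differ in size; a raw count would make a long chapter look more place-minded than a short one. Counts like these only point toward passages worth a closer reading, they do not replace it.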

Just as some of Thoreau’s themes exceed the scope of a geographically specific reading, so too did our task exceed the capabilities of some of our tools. One thing our group really wanted to stress in our presentation is the importance of validating failure in digital projects. All of the setbacks, miscues, and limitations we faced in engaging with Jupyter Notebook, Atom, Python, Anaconda, spaCy, NLTK, and beyond were as useful to thinking about the digital humanities as our successes with each of these tools. When we encountered errors in our work, we were forced to ask why. This moment of self-reflection was critical for doing digital work because of the knowledge that stood to be gained by asking questions about the tools. Because we came to this class with a variety of digital backgrounds, it was very important that we move as a unit. Fortunately, the tools we used lent themselves well to collaboration and, ultimately, this project became about creating our own community space around Walden.

From his comparative measures of White and Walden Ponds, to his rambles through Concord, to his building of a house in the woods, to his reflections on place inward and outward, Thoreau was constantly attuned to the language of place. We too were attuned to language, constantly seeking the instances of geography in his text by moving through it digitally. Just as Thoreau spatializes his world in Walden, so too do we attend to space by tracking its relative importance throughout the book. By using digital tools, we were able to read Walden collectively, collaboratively, effectively, and deliberately.
