Group 6: Text Analysis

In our analysis of Walden, we used Voyant Tools to deepen our understanding of the text. Voyant Tools offers word-tracking options and can strip filter words (very common words such as "the" and "of") from the text. By making key words more accessible, it lets readers focus on the words that give insight into Thoreau's thoughts and how they evolve throughout Walden. First, we re-read portions of Walden that fascinated us. Then we used Voyant Tools to analyze these chapters individually, and finally we pieced together word trends related to the topics each of us chose to explore.

Our chosen topics were the frequency of religious words, the frequency of positive words, the frequency of experiential words, and positive and negative correlations across Walden as a cohesive text. The graph tool showed us where words were more or less frequent throughout the text and let us infer how those patterns conveyed Thoreau's message of living deliberately. We could also see where words that had not appeared before were suddenly used. These red flags prompted us to search for Thoreau's moments of enlightenment and discovery. Voyant Tools helped us analyze the text beyond our own intuitive reading of its language. We tend to look at the message of a text, not the particles that make it up, yet those small pieces brought new information about Walden and Thoreau to light. Using Voyant allowed us to bring statistics into our research, so our conclusions rested not on opinion alone but on quantitative data that backed our interpretations.
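Voyant's graph tool did this work for us, but the underlying idea is simple enough to sketch. The Python below is a minimal illustration, not Voyant's implementation. It assumes a plain-text copy of the book and splits the words into a fixed number of roughly equal segments (ten by default, an arbitrary choice). It then reports a term's relative frequency in each segment, so a value that jumps up from zero marks the kind of sudden appearance described above. The function name `term_trend` is hypothetical.

```python
import re

def term_trend(text, term, segments=10):
    """Relative frequency of `term` in each of `segments` roughly equal slices of the text.

    A rough, hypothetical stand-in for a trends graph: a value that jumps
    up from 0.0 marks the point where a word starts to appear.
    """
    words = re.findall(r"[a-z']+", text.lower())  # lowercase word tokens
    target = term.lower()
    trend = []
    for i in range(segments):
        # Integer slice boundaries so every word lands in exactly one segment.
        start = i * len(words) // segments
        end = (i + 1) * len(words) // segments
        chunk = words[start:end]
        trend.append(chunk.count(target) / len(chunk) if chunk else 0.0)
    return trend

if __name__ == "__main__":
    sample = "a b c d god god"
    print(term_trend(sample, "god", segments=3))  # [0.0, 0.0, 1.0]
```

Relative rather than raw counts are used so that segments of slightly different lengths stay comparable.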

On our first day experimenting with Voyant, we removed all the filter words. With those words stripped out, “life” and “man” are the most frequent words in Walden as a whole. Reading Walden, one can see that Thoreau is a man on a mission: to live life to the fullest. Now, however, we can give substantial mathematical support to our understanding of Thoreau's search for greater meaning. Each group member followed a separate path, but ultimately we were able to see how each individual aspect of our text analysis conveys Thoreau's philosophy of living deliberately.
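The first step, dropping filter words and counting what remains, can be sketched in a few lines of Python. This sketch assumes a plain-text input and uses a short, hypothetical stopword list. Voyant's own default list is far longer, so the rankings it produces will differ from this toy version.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stopword ("filter word") list for illustration;
# Voyant's default list is much longer, so real rankings will differ.
STOPWORDS = {
    "the", "and", "of", "a", "an", "to", "in", "is", "it", "that", "i",
    "as", "for", "be", "not", "he", "his", "but", "by", "with", "we", "so",
}

def top_terms(text, n=10, stopwords=STOPWORDS):
    """Return the n most frequent words once stopwords are removed."""
    words = re.findall(r"[a-z']+", text.lower())  # lowercase word tokens
    kept = [w for w in words if w not in stopwords]
    return Counter(kept).most_common(n)

if __name__ == "__main__":
    sample = "Life, life, and more life: the man lives his life."
    print(top_terms(sample, n=1))  # [('life', 4)]
```

The exact ranking depends heavily on which stopword list is used, which is worth keeping in mind when reading "most frequent word" claims.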


Digitizing Donald Ross

This project centered on finding ways to apply previously completed scholarly work on Walden to the digital frame of our class. Donald Ross played a major role in analyzing the different versions of Walden and in tracing the inspiration for many of Thoreau's messages. Ross drew connections between Walden's narrative and major ideas such as Biblical allusions and references to Eastern philosophy. Our mission for the semester was to take Ross' information and translate it into a form we could use for the Fluid Text edition of Walden.

Our first obstacle was simply figuring out how to start: Ross' materials stretched over 72 pages of dense information, much of which took us a good deal of time to decipher. We settled on using R to analyze Walden, applying digital methods to parse Walden the same way Ross had by hand. This let us test the possibilities and limits of digitally analyzing a work of literature by comparing the results to a human scholar's analysis of the same work.

What we found was that R has a huge amount of potential in this field, though it will take some work to fully reach that potential. The analysis worked by searching for “n-grams,” sequences of n consecutive words (in our case, 5), and looking for matches between Walden and other specified books. The matching n-grams were compiled into a spreadsheet showing each match along with the full sentence from Walden and the full sentence from the other work. The program found over 2,000 n-grams, most of which were coincidences driven by similar word usage. We focused on the Biblical references Thoreau is famous for, scrolling through the enormous spreadsheet in search of n-grams that held deeper meaning than simple coincidence.
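Our R code is not reproduced here. The Python below is only a sketch of the same 5-gram idea, and it rests on simplifying assumptions of our own:

- words are lowercased runs of letters and apostrophes;
- sentences are split naively on end punctuation;
- n-grams may not cross sentence boundaries.

Each output row pairs a shared n-gram with the sentence it came from in each text, mirroring the columns of our spreadsheet. All function names are hypothetical.

```python
import csv
import io
import re

def sentences(text):
    """Naive sentence split on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def ngram_index(text, n=5):
    """Map each n-gram (tuple of lowercase words) to the first sentence containing it."""
    index = {}
    for sent in sentences(text):
        words = re.findall(r"[a-z']+", sent.lower())
        for i in range(len(words) - n + 1):
            index.setdefault(tuple(words[i:i + n]), sent)
    return index

def match_rows(text_a, text_b, n=5):
    """Rows of (shared n-gram, sentence in text_a, sentence in text_b), sorted by n-gram."""
    a, b = ngram_index(text_a, n), ngram_index(text_b, n)
    return [(" ".join(g), a[g], b[g]) for g in sorted(a.keys() & b.keys())]

def to_csv(rows):
    """Render the rows as spreadsheet-ready CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["ngram", "walden_sentence", "other_sentence"])
    writer.writerows(rows)
    return buf.getvalue()

if __name__ == "__main__":
    walden = ("The light which puts out our eyes is darkness to us. "
              "Only that day dawns to which we are awake.")
    other = "Some say that day dawns to which we are awake. Others disagree."
    print(to_csv(match_rows(walden, other)))
```

Every shared five-word run is reported, so common phrases produce many coincidental matches. That is why most of our 2,000-plus results still had to be read by hand.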


Although the R program was still in its early stages, it completed the analysis in a matter of hours, surely record time in the field of literary analysis.

There is clearly a lot of future work that can be put into this project. Sifting through the results example by example was a lengthy process, even with three people working simultaneously, so our group made only a modest dent in the material. We focused only on the Biblical allusions Ross found in Walden, passing over his lists of allusions to Eastern philosophers, verbal wit, and other categories. We also sifted through only a fraction of the additional matches the R program found. Although most of these were coincidental, one appeared to be a genuine Biblical allusion that Ross had overlooked. There could be many more that Ross did not find and that we simply could not get to in the allotted time.

Overall, we consider the program a success. After comparing the Biblical references Ross had already found with what the computer found, we saw that the computer picked up one reference Ross had missed. This may seem like a small victory next to the thousands of invalid matches the computer produced. However, the program is still in its early stages, and while it may have taken Ross months to find these references, the computer ran its search in just a few hours. We believe this project has real potential to affect the Digital Humanities, and there is a lot more work to be done.

Building a Thoreau Timeline

The Thoreau Timeline

Henry David Thoreau was a busy man, and our group was tasked with listing and detailing his many lectures, publications, journal entries, and general biographical facts along with his work on A Week on the Concord and Merrimack Rivers and Walden. We were provided with a rough outline of what such a timeline should encompass from Stephen Adams’ and Donald Ross’ Revising Mythologies: The Composition of Thoreau’s Major Works, as shown below.

A valuable resource but not the most intuitive.

Our main goal was to expand upon this information and present it in a more visual, interactive format. To do so, we used TimelineJS, a service that lets users with little technical knowledge create an attractive, intuitive timeline complete with media and layered information. TimelineJS provides a Google spreadsheet template and then generates embed code for the published timeline, so that it can be placed on a website.

An example of what entries in the spreadsheet look like.

Our group split the categories into six parts, and we initially created six separate spreadsheets of data. Alexa took the biography, Cassie the journal entries, Gabe the lectures and articles, and Holly A Week and the Walden versions. Our text was drawn from a variety of sources (the most helpful being Walter Harding's The Days of Henry Thoreau), and the images were generally licensed under Creative Commons and gathered from Flickr or Wikimedia Commons. Where possible, Gabe linked Thoreau's articles and essays into the timeline using PagePeeker, which provides an image of, and a link to, the website where he found each essay. Our main obstacles involved working with TimelineJS itself: figuring out how to enter dates without knowing the specific day of the month, where to source our information, how to deal with invalid image links, and how to tag information that fell into more than one category.
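For readers curious about the spreadsheet side, here is a minimal Python sketch of how rows with incomplete dates and category tags might be generated. It assumes the template includes "Year", "Month", "Day", "Display Date", "Headline", "Text", "Media", and "Group" columns. Check these names against the current TimelineJS template before relying on them. The sketch leaves Month and Day blank when only the year is known and uses Group to carry the category tag. The helper names are hypothetical.

```python
import csv
import io

# Assumed subset of the TimelineJS Google Sheets template columns; verify against the live template.
COLUMNS = ["Year", "Month", "Day", "Display Date", "Headline", "Text", "Media", "Group"]

def timeline_row(year, headline, text, group, month=None, day=None,
                 display_date="", media=""):
    """Build one row, leaving Month/Day blank when the exact date is unknown."""
    return {
        "Year": year,
        "Month": month if month is not None else "",
        "Day": day if day is not None else "",
        "Display Date": display_date,
        "Headline": headline,
        "Text": text,
        "Media": media,
        "Group": group,  # category tag, used here to split the timeline into layers
    }

def to_csv(rows):
    """Render rows as CSV text that could be pasted into the spreadsheet."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

if __name__ == "__main__":
    rows = [
        timeline_row(1854, "Walden published", "", "A Week and Walden", month=8, day=9),
        timeline_row(1849, "Lecture (hypothetical entry, day unknown)", "", "Lectures"),
    ]
    print(to_csv(rows))
```

Keeping unknown fields empty, rather than guessing a day, preserves the uncertainty the sources actually contain.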

We tagged our entries in order to create a six-layered timeline.

In the end, we combined our six timelines into one monstrous spreadsheet dubbed “The Master Timeline.” Once it was published and fine-tuned, we had a visually intriguing, well-organized timeline that lets readers easily connect different areas of Thoreau's life to a particular time period. Building this timeline not only pulled together some of the technological skills we had learned over the semester but also gave each of us insight into Thoreau's life and work.

(Holly) A Week and Walden: The timeline allowed me to pull together various aspects of the class (the readings, the Fluid Text edition, the significance of the manuscript changes) and really illustrate the development of Thoreau's literary works. Tracing the progress of Thoreau's two books showed me how much his writing was shaped by his life experiences and the ups and downs of his literary career. What struck me most was how prepared Thoreau seemed to be to publish Walden after finishing his first book, A Week on the Concord and Merrimack Rivers, and working to get it published. As we know, it would be many years and many manuscript changes later before Walden was actually published, and this was (arguably) due in part to the failure of A Week. What would Walden be had Thoreau's first book sold well? The timeline is a valuable tool that offers a big-picture perspective on how Thoreau's life experiences interacted with his writing. Studying how some entries intertwine, for example, can show how certain events may have shaped others, as is the case with A Week and Walden.

From the Walden timeline.

(Alexa) Biography: While creating my timeline of Thoreau's life, I found that Thoreau was a very exploratory person. He made multiple trips to Cape Cod and Maine, as well as to Hill & Plymouth in Massachusetts, to explore and to gather material for his journals. He also took the opportunity to travel to Canada, mainly because the train ticket was very cheap. Over the course of Thoreau's life, I found dates in the biography relating to what we learned from reading Walden: when the cabin at Walden Pond was built, when Thoreau lived at Walden Pond, the night Thoreau spent in jail, and when he returned to his home in Concord. I also found it interesting that the timeline revealed a side of Thoreau not really seen when you read Walden. One event on the timeline, for example, was the Burns Affair. After researching it, I learned that it involved the trial of an escaped slave, Anthony Burns, and that Thoreau protested Burns's return to his slave owner. I found a great deal of detail about Thoreau's opinion on slavery.

(Cassie) Journals: This task was challenging because it required me to organize the two journal sections from the original timeline and then locate each recorded entry in the journal. The journal entries ended up in two different sections, the 1906 and the Princeton/Mss versions. Although there was a bit of overlap, for the most part one journal begins where the other ends. In both versions, Thoreau consistently narrates the little things that happen in his day-to-day life. By organizing these entries, I was able to learn more about Thoreau's thought process. When Thoreau was deciding to leave “high society,” for example, I was surprised to learn that he had much more going on in his life than just the desire to “live deliberately.” I also found that he left to “cure” his writer's block. He wrote about his fear that if he did not leave, he would never be able to finish his writings. In some ways, Thoreau may have surprised himself with how much his departure from society changed him. This realization made it easier for me to connect the writing with the human. This is partly why the timeline itself is important: it connects all the parts of Thoreau's life to give an impression of the real man, not just a series of disjointed writings. While reading these entries and working out a timeline of the feelings, thoughts, and emotions reflected in them, I was better able to understand not only the premise of Walden but also the way Thoreau's mind worked.

From the journals timeline.

(Gabe) Lectures and Publications: My portion of the timeline was dedicated to recording the dates of the lectures Thoreau gave on his various works across the Northeast, as well as the publication of those works, beginning with his first lecture in May of 1835. What interested me most about this reconstruction was how it demonstrated the difficulty of reestablishing the past. So much of what I entered is known only from an offhand mention in one of Thoreau's journal entries. Because of that, we often know that a lecture on a subject occurred but cannot give an exact date, or know that Thoreau lectured on a certain date but not what the lecture was about. I think that demonstrates the necessity of linking the digital world with the humanities. That way, in the future we will not have to piece together fragments to gain only the shadow of an understanding of our context, but can preserve that context whole.



Digitalizing Deliberately

Throughout our Digital Thoreau final project, we had to use various new technologies that many of us were not familiar with. A majority of our in-class planning time was spent writing scripts for our videos and deciding what information was vital for using the site and therefore warranted a video. We used the preexisting help page to guide us in deciding how to break up and designate the videos. Drawing on our experience with help pages we had used in the past, we concluded that one problem with video help pages is that it is difficult to pinpoint the instructions that address an individual problem. We considered a table of contents for our video, but even that seemed too inefficient. We wanted to create a more productive tutorial that could offer users instant, specific guidance, so we decided to make a multitude of shorter, more direct videos to make the help page as user-friendly as possible. It was up to us to condense as much aid and instruction as possible into a series of short tutorials. This was not a simple feat, since the previous year's group had been able to compile its detailed instructions into a long text. We also looked for a way to put our own spin on it.

Once we decided how to break up our videos, we scheduled a time to meet with the Digital Media Lab (DML) assistants. Our preliminary meeting was simple: we asked what tools we could access in the Digital Media Lab and how easy they would be to operate. We were introduced to Camtasia, a program that captures and records screen activity as you navigate while a microphone records your voice for simultaneous narration. In essence, Camtasia lets viewers feel “present,” since they can both see and hear what the instructor is doing. We decided to use Camtasia to make our videos. We planned to create an example account and record ourselves walking through each step.

We booked a time in the DML and recorded a video on our first visit. Aside from a few last-minute script rewrites to ensure fluidity, everything went according to plan. One funny problem came up when we realized that the DML shares a wall with a bathroom in Milne. At times we had to wait before recording to make sure no sounds of running water or hand dryers made their way into our video!

We faced a few administrative troubles, including a time when the Milne staff could not find the key to the DML, but after some discussion and searching, our plans went off relatively without a hitch. There was also an instance when Dr. Schacht denied our example account entry into the group forum, mistaking it for spam. Without that access, one of our videos could not be produced. This mild setback delayed that video by a day or two, but we hold no grudges: the account's email address did look pretty faceless, and we would have done the same.

We had to do a few retakes, but only because of verbal stumbles. We made those minor edits in iMovie, which we also used for decorative polishing such as the title slides.

In retrospect, it is interesting to consider what Thoreau might have thought of our videos, and of the Digital Thoreau website in general. Would he think that those who need the videos should not have access to the information, much as he thought that the Classics should not be read by those who did not understand the original language? Or would he be pleased to know that his work was reaching a larger audience? Thoreau's revulsion toward innovation and technology is apparent throughout Walden, but so is his encouragement of advancement through learning. In his conclusion, Thoreau writes, “Things do not change; we change,” and we can justify our digital work with this maxim. As a later generation, we have access to the technology of the 21st century, which we have used to spread Thoreau's teachings. Thus, we believe a man who promoted learning and acknowledged change would deem our efforts worthy.


Group 3: Allison Fox, Aran Fox, Maya Merberg, Kaitlin Pfundstein and Grace Rowan

A Walter Harding Chronology

Walter Harding was a professor of English at SUNY Geneseo for several decades and was appointed chair of the department in 1959. One of the most influential and decorated scholars to have spent time at Geneseo, Harding specialized in all things Thoreau.

Walter Harding

Milne Library has created an extensive archive of both Harding's writings on Thoreau and documentation of the various honors and recognitions he received during and after his time at Geneseo. Our group sought to continue the project, begun by groups in previous semesters, of digitizing this archive and making it accessible to a larger audience.

We began the project by familiarizing ourselves with the physical archive and getting a sense of who Dr. Harding was as both a scholar and a person. Each of us was immediately struck by Harding's contagious enthusiasm for Henry David Thoreau and impressed by the sheer amount of knowledge he had about the man. Then we were all floored by just how extensive his work on Thoreau was. We quickly discovered that Harding was a founding member of the Thoreau Society and served as its secretary for 50(!) years. Harding even carried his enthusiasm for Thoreau across the ocean, lecturing students about Thoreau in Japan. It was abundantly clear to us from the Walter Harding Archive that Harding's life's work needed to be preserved.

That was when we began familiarizing ourselves with Omeka, the digital archive tool we would use to help preserve this information. Essentially, we gathered the physical documents we deemed fitting for the exhibit we wanted to create, which we titled “A Walter Harding Chronology,” and scanned them. Each scanned item was then uploaded to the website along with its metadata, including a title, source, description, contributor, publisher, date, language, etc.

Metadata editing tool in Omeka

After finding, scanning, and uploading items to Omeka, we began the process of tooling around with the appearance of the site, and made a few organizational and aesthetic changes to the theme.

Here is an image of the landing page of the website:

landing page


And here is a link to visit the website yourself:

The students working on this project for the Spring 2015 semester were Julia Kinel, Kelly Langer, Casey Vincelette, Melanie Weissman, and Emily Peterson.

A Thoreau Approach to TEI

Why Encode the Journals?

Scholars have access to many of Thoreau's journal entries, but these entries are not encoded; they exist only in manuscript and transcript form. This makes it very difficult for scholars to analyze, search, and work with the texts as they strive to learn more about Thoreau and his work. The journal entries provide information about Thoreau that is absent from his books and other published works, so they are of great value to the community. Encoding these entries will make it easier for scholars to find patterns in whom he interacted with, where he went, what he observed in nature, and what he did throughout his life.


The Process

Receiving the Journals

Our group got off to a rather late start, partly because of our lack of resources and our lack of knowledge about XML and TEI. In time, we received the files of Thoreau's journal entries from Beth Witherell over Google Drive, including manuscripts, transcripts, and notes. After receiving the files, we looked through them in search of patterns and themes we could focus on. Beth Witherell also shared sets of journal categories and information so that we would have a starting point for our journey.


The Google Doc

Once we determined the themes we wanted to focus on, we created a Google Doc containing the journals we were going to encode. From there we had to split up the elements we wanted to identify. We used a color-coding method, highlighting the relevant words in different colors to make them easier to locate. These elements included dates, activities, times of day, possessions, animals, plants, weather, people, and places. We each picked two elements and went through the document, highlighting their occurrences in several entries.


Going into Depth

After a discussion with Dr. Schacht, we decided to elaborate on the elements we had identified by providing more information about them, rather than tagging more elements. This process required us to assign an xml:id to the proper names of people and places, and to include a “ref” attribute within the tags that would link to a website with more information. To do this, we created a spreadsheet listing the tag, xml:id, name in the text, place in the text, and reference link.


Starting to Encode

We divided up the documents among ourselves and started encoding with text editors on our own computers, using both TextWrangler and Notepad++. We each had approximately sixteen lines to encode and referred back to the spreadsheet and Google Doc to keep track of our tags.



We ran into several roadblocks because TEI does not have specific tags for elements such as animals, weather, plants, activities, and possessions. With Dr. Schacht's help, we found alternative ways to tag plants and animals. However, we were still unable to find tags for weather, possessions, and activities, and had to leave those elements out. Tagging them might be possible for more advanced TEI editors with more time to focus on it. We also struggled with tagging names, because in the entry we focused on, Thoreau refers to vague and ambiguous characters, including a person referred to only as “C.” We had to make several decisions about what each element could apply to within the text. For example, did a “salt marsh” count as a place, or should only proper place names be included?


After tagging the elements individually, we combined our sections in a more advanced text editor, Oxygen, adding a header that Dr. Schacht had created. We downloaded Oxygen as late in the process as possible in order to make the most of its thirty-day free trial. Oxygen was especially helpful because it validated our document as we worked and gave hints about the source of any issues.



At one point, we found that Oxygen did not accept several of our xml:id values, and we did not know why. Dr. Schacht updated the template he had given us. He explained how to declare the xml:id values in the header rather than in the body of the text, and how to refer to them from within the text itself.
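Oxygen's validation is what caught our problems, but two basic checks involved can also be sketched outside the editor. The Python below is a hypothetical helper, not part of Oxygen or of Dr. Schacht's template. It collects every `xml:id` and flags values that are not valid XML names (an `xml:id` cannot, for example, begin with a digit or contain a space). It also lists any `ref="#..."` pointer that has no matching ID. The sample document is trimmed and not schema-valid TEI. Its IDs and its use of `particDesc`/`listPerson` to hold the people in the header are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# ElementTree reports xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
# ASCII-only approximation of an XML NCName (real NCNames also allow many Unicode letters).
NCNAME = re.compile(r"^[A-Za-z_][A-Za-z0-9._-]*$")

def check_ids(tei_xml):
    """Return (invalid xml:id values, ref targets with no matching xml:id), both sorted."""
    root = ET.fromstring(tei_xml)
    ids = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID) is not None}
    bad_ids = sorted(i for i in ids if not NCNAME.match(i))
    dangling = sorted({
        ref[1:]
        for el in root.iter()
        for ref in (el.get("ref") or "").split()  # @ref may hold several pointers
        if ref.startswith("#") and ref[1:] not in ids
    })
    return bad_ids, dangling

# Trimmed, hypothetical sample: IDs declared in the header, pointed to from the body.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><profileDesc><particDesc><listPerson>
    <person xml:id="emerson"><persName>Ralph Waldo Emerson</persName></person>
    <person xml:id="1C"><persName>C</persName></person>
  </listPerson></particDesc></profileDesc></teiHeader>
  <text><body><p>Walked with <persName ref="#emerson">Emerson</persName>
    and <persName ref="#personC">C</persName>.</p></body></text>
</TEI>"""

if __name__ == "__main__":
    print(check_ids(SAMPLE))  # (['1C'], ['personC'])
```

A check like this only catches naming and linking slips. Schema validation of the full TEI document is still the editor's job.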


Final Meeting

We met with Dr. Schacht with some final questions and adjustments to our document. We discussed referring to IDs from the header within the text, tagging geographic features, and tagging drawings and uncertain text within the entry. We also adjusted the spacing and formatting of the document. By the end of the meeting, the XML document was complete and validated in Oxygen.



Throughout the process, we had several questions that Beth Witherell might be able to answer or offer guidance on. These include:

  1. How important is preserving the format of the original journal entry manuscript? For example, we didn’t encode any of the line breaks, page breaks, or paragraphs that were in the manuscript. Additionally, the paragraphs were not included in the transcript we worked from, and the page breaks were not clearly defined.

  2. Is the quantity of elements tagged more important than the depth of information provided? For example, we tagged a very general list of elements, but provided further information through the use of reference links. Because of this, there were many tags suggested by Beth Witherell that we did not include.

  3. Are some elements more important than others, in that they would provide more insight into Thoreau and his work?

  4. Are there specific journal entries that would be more beneficial to Thoreau scholars to encode than others?

What We Learned

We learned a lot about encoding throughout this process, especially about its limitations. Several element tags that seemed simple to us and relevant to the text are not provided by the TEI guidelines. Although we could have created custom elements, we had neither the skill nor the time to do so. We also learned that interpreting Thoreau's manuscripts is a delicate and difficult process that involves a great deal of decision making and judgment on the part of the interpreter. Finally, we learned that the encoder holds a kind of authority, because she decides how elements should be organized, what is important, and what is not.

Group 4 Members: Daisy Anderson, Emily Buckley-Crist, Darby Daly, Melissa Rao