How You Can Help/Have Helped Digitize Literature

The ways in which people are digitizing books are amazing, especially because you’ve likely already helped digitize a book by using Captcha.

It’s a tool that makes sure an internet user is a real person who can read funny-looking words and not a malicious computer program looking to cause trouble.  When these Captcha tests first appeared, they had one crazy-looking word or code to copy.  Now, they often have two words: one that is there to test you, and one that is a scanned word from a book for you to help digitize.  The inventor of Captcha realized that he could use this tool to crowdsource a form of book digitization that is more reliable than a computer.  Since people are better at reading printed words than computers are, it made perfect sense to use an application that already depended on people reading better than computers.
You gotta hear the story from the man himself:

Here’s a mistake that the computer made when scanning a book. The word “they” is upside down, but a computer user will hopefully recognize this and type “they” instead of “yeht”.

Another great way to help digitize literature is through Librivox.  It’s a place for anyone to help digitize a book by recording themselves reading literature that is in the public domain.  Anyone can volunteer to create free audiobooks for Librivox.  You can read a chapter of a novel, a poem, a short story, or a role in a play.  Librivox builds a catalog of free audiobooks in the public domain through volunteers who want to digitize books in an audio format.

You can learn how to volunteer to read for Librivox or browse their growing selection of free audiobooks.  It’s great for English majors or anyone taking the mandatory Humanities courses at Geneseo!

Albin Grau: The Director Who Didn’t Let Copyright Block His Creativity

We were asked in class whether or not copyright laws limit creativity.  I have found an early example of an acclaimed creative work that would not have been possible if its creator didn’t blatantly ignore copyright laws.

The 1922 silent German horror film Nosferatu is probably best known to college-age folks for the cameo appearance in Spongebob Squarepants.
Nosferatu is one of the very first horror movies ever made.  It’s also a total rip-off of Dracula.

Director Albin Grau wanted to make a movie about a vampire.  After doing some research, he decided to direct an adaptation of Bram Stoker’s novel Dracula.  Bram Stoker was at this point deceased, and his copyrighted work belonged to his wife, Florence Stoker.  All Grau had to do was get Florence Stoker to sell him the rights to make a Dracula movie.  Unfortunately, Florence Stoker refused.


Grau was dismayed that he didn’t get permission to adapt Stoker’s copyrighted work, but he knew just what to do.  He made the movie anyway.

Grau’s clever plan was to adapt the story of Dracula, but change the names of all the characters.  For example, Count Dracula became Count Orlok.  And instead of turning into a bat, Count Orlok would have powers more reminiscent of a rat.  He wouldn’t be a vampire as much as a “nosferatu”… whatever that is.

Grau’s movie turned out to be a hit!  It was a revolutionary movie that showed off state-of-the-art effects and told a great, if oddly familiar, story.  All would be wonderful as long as Florence Stoker didn’t notice.

“I noticed.”

The court case that followed resulted in the judge ordering all copies of Nosferatu to be destroyed, but years later, a copy surfaced.  Now Nosferatu can be seen by anyone and is studied by film enthusiasts and scholars decades later as one of the most influential creative works of cinema.  Many horror movie clichés can actually be traced back to Nosferatu.  And these clichés never would have been if it weren’t for Grau’s complete disregard for copyright rules.

Does this cliché vampire shot seem familiar? This is the original. Grau came up with this in his iconic copyright-infringing Nosferatu.

What goes around comes around, however, and now people can do anything with Nosferatu without Grau’s permission because the film is not copyrighted.  Dracula is in the public domain (aka not copyrighted) in the United States because Stoker failed to follow proper copyright procedures, so that eliminates any copyright conflicts with Dracula, as well.  You could upload it to YouTube and no one would complain.
In fact, it’s been done several times.
You could even upload a dubstep remix of Nosferatu.  Kind of pointless since it’s a silent film, but you could.  People even remaster the film in different ways and sell their version of it, often claiming it to be most authentic with the clearest picture or most accurate color tinting or music that would have been used in theaters in the 1920s.  People are making money off of Grau’s work just as Grau did with Stoker’s, but this time it’s legal.

We can see that copyright laws, to some extent, limit creativity.  While coming up with a totally original piece is a great show of creativity, it has been demonstrated that derivative works can be quite creative, too.

Net Neutrality

Net Neutrality is a hot issue right now, especially since this Thursday, the Federal Communications Commission (in charge of TV, radio, internet, etc. in the USA) will be unveiling its new Open Internet policy.  No one is sure what exactly to expect, but there is a great fear that the internet is about to get a lot worse for everyone in the United States.  This is because Internet Service Providers (like AOL, Comcast, and Time Warner) have not been shy about their wishes to make more money by charging websites to use a proposed “fast lane” in the internet.  Internet users like us could possibly be charged to access certain websites as well.

Netflix is already paying Comcast more money for regular speed internet. The struggle is real.

Right now, Internet Service Providers (ISPs, for short) get to make money by giving us access to the internet.  The whole internet.  Nothing is censored and no website gets to be delivered faster than another.  ISPs are not allowed to discriminate or become gatekeepers.

Gatekeeper: someone who controls what information reaches the consumer

But the ISPs want to remove the laws currently preventing them from making some websites faster than others and charging people more for the privilege of being faster.  The ISPs are just hatching a scheme to make more money, but they are trying to trick the public into thinking this is a good thing.  They’re proposing the creation of a fast lane and hoping people assume they mean adding a new feature to the internet that makes some websites faster than the normal speed.  What they actually want to do is slow down the internet and charge more to use the normal speed, which is technically faster than the slow internet they’d have created.  This explainer puts it best:

Basically, the ISPs can get richer while most everyone else loses time waiting for less-popular websites to load and money paying for access to websites.

If ISPs are successful in eliminating Net Neutrality, websites can become similar to cable television channels.  In the United States, some channels are free to watch, like ABC, CBS, and NBC.   Some channels, like AMC or Disney, are part of a basic cable package that costs money.  Other channels, like HBO or Showtime, are part of a premium cable package that costs even more money.  There are various packages of television channels you can get from various providers.  And without Net Neutrality, the internet could look a lot like this:


You could be charged for access to packages of websites, which would be just as annoying for the internet as it is for cable TV, but even more so because we use it for more than just news and entertainment.

Another danger with the proposed end of Net Neutrality is that ISPs would have the freedom to block or censor content if they feel it’s reasonable.  Potentially, Comcast could block AOL’s website and claim that it was for technical reasons or something.  This video explains this possibility in more detail:

Or an ISP could block a politician’s website and claim it interfered with internet traffic (when it could really be because the ISP’s CEO hates that politician).  The end of Net Neutrality in America could be the end of free speech in America’s internet.

Recently, the popular website Reddit created a blog post that explains how an average person can make a difference regarding the threat to end Net Neutrality.  Included are walkthroughs of how to let Congress and the FCC know that Net Neutrality is important to you, which helps because it’s supposed to be Congress and the FCC’s job to protect the interests of the people.

Comments on Reddit regarding the blog post affirm the power of making phone calls to Congress, claiming that even a hundred calls can stand out because few calls are made in the first place.  Click here to read these comments.

ENGL 340 Group Project: A Quantitative Analysis of Thoreau’s Walden

Fall 2014 Update: While the course project’s preliminary results appear below, the project has since evolved into a more sophisticated effort to co-author an update to Harding’s 1962 article using digital tools: a team of authors is using Harding’s claims to guide a distant reading of Thoreau’s novel using Harvard’s General Inquirer categories and is, in turn, employing distant reading to assess those claims.  Led by Gregory Palermo, this team includes ENGL 340 veterans Michael Gole and Jonathan Pepperman alongside English department students Rebecca Miller, Jenna Cecchini, and Thomas McCarthy, supported by Drs. Paul Schacht and Kirk Anne.

Our group (Christine O’Neill, Angie Carson, Dan Flad, Michael Gole, and Greg Palermo) used data analysis to explore and verify Walter Harding’s claims in his essay “Five Ways of Looking at Walden.” This was a deductive process in which we isolated particular arguments from Harding’s text and then asked Dr. Kirk Anne (from Geneseo CIT) to extract raw data from the text of Walden using the Python and R programming languages. We interpreted this data, which we complemented with the usual close reading of Walden, to see if it supported or disproved Harding’s arguments. While data analysis can be useful for a number of reasons, establishing credibility (ethos) is among the most relevant to the future of the humanities.

Our process looked a lot like what is described in Stephen Ramsay’s article “The Hermeneutics of Screwing Around” and the Culturomics approach. First, we determined what specific type of data would be most useful to answer the questions we had generated around Harding’s claims.  Kirk then coded and ran the scripts to produce representations of the relevant data, which included spreadsheets, histograms, line graphs, and dispersion plots of the linguistic features of Walden and its versions. We were fortunate that he was able to produce such an abundance of data that we had the luxury of browsing through, selecting whatever seemed useful to us (and often asking for more detail about a certain aspect).

Each of the members of the group brought different questions to the data.  Here is how we, individually, dealt with turning the numbers into conclusions:


I found Harding’s claim that “the unifying device of the book is the year” to be interesting, so I tried to determine how I could quantify this observation. I asked a few questions, like “do the deletions show Thoreau getting rid of stuff that doesn’t have to do with the year, and the additions show him developing the theme of the year?” Eventually, what I really focused on was Harding’s specific delineation of the book’s arrangement: he said that Thoreau talked about his cabin and pine trees in the spring, bean fields in the summer, etc. Data analysis could give us a bird’s eye view of whether or not Harding was right about this “unifying theme” through examining the key terms he identified.

A lexical dispersion plot of Walden

Next, I had Kirk Anne run some numbers. I was able to access a chart of additions and deletions spanning the versions. Then I asked Kirk to make a lexical dispersion plot for the key terms from Harding’s claim (and a few others) – in a nutshell, the plot was a graphic showing the concentration of these words across the chapters. So, if the word “spring” was heavy at the beginning and end of the book, or if a discussion of “ice” was heavy near the middle, that would indicate a special focus on the seasons. To take the opposite approach, I decided to look at the chapters first and see if additions/deletions/word-concentrations made sense according to the chapters. Sure enough, chapters with names like “The Bean Field” or “Winter Animals” not only had high concentrations of season-related words, but also showed the most additions and deletions overall.
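For anyone curious what sits underneath a lexical dispersion plot, here is a minimal Python sketch. It is not Kirk's actual script, which we never saw in full; the function name and the tiny made-up "chapters" are assumptions for illustration only. It simply counts how often each key term appears in each chapter, which is the table such a plot would draw from.

```python
import re
from collections import Counter

def term_counts_by_chapter(chapters, terms):
    """Return {term: [count in chapter 1, count in chapter 2, ...]}."""
    terms = [t.lower() for t in terms]
    table = {t: [] for t in terms}
    for text in chapters:
        # Lowercase the chapter and count every word in it.
        words = Counter(re.findall(r"[a-z']+", text.lower()))
        for t in terms:
            table[t].append(words[t])
    return table

# Made-up stand-ins for chapters; the real input would be the text of Walden.
chapters = [
    "In spring I built my cabin among the pines.",
    "All summer I hoed beans in the bean field.",
    "The ice on the pond boomed all winter; ice, ice everywhere.",
    "Then spring came again, and the ice melted.",
]

table = term_counts_by_chapter(chapters, ["spring", "ice", "beans"])
for term, counts in table.items():
    print(f"{term:>6}: {counts}")
```

A term that clusters at the start and end of the list (like "spring" here) is the pattern the plot makes visible at a glance. Note that this sketch matches exact word forms only, so "bean" and "beans" are counted separately.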

What did that all mean? I interpret this data analysis to be confirmation of Harding’s claim. Thoreau’s editorial focus seems to have been on season-related chapters, and clusters of season-related words appear in the appropriate spots of the text.

In a word, my strategy was: isolate a claim, ask some questions, use numbers to respond to the questions, and interpret the results.


The claim of Harding’s that most struck me was related to the readability of Walden: despite the size of Thoreau’s vocabulary, Harding says, his writing “cannot be termed ostentatious.” What I wanted to do, in order to substantiate Harding’s claim, was to quantify the lexical sophistication of different passages throughout the novel and compare that to the passages’ readabilities, which could be represented by readability indices like the Gunning Fog and Coleman-Liau.

Readability of different chapters of Walden, by version, represented in a box-whisker plot of Coleman-Liau indices.

This, however, was far too large a project for the scope of our course. If we wanted to quantify the extent of Thoreau’s vocabulary, we’d have to compare his writing to works by contemporary authors. In addition, Harding isn’t specific about to what audience Thoreau’s text is readable: is he claiming that Walden was readable in Thoreau’s time? In Harding’s own time? Text is perceived as readable, in part, because of the norms of its age of reception; likewise, the indices–which were made in the latter part of the twentieth century–make assumptions about a text’s audience, an audience that may differ from a contemporary readership of a certain demographic.

So, I instead set out to see how the quantitative readability of different versions of a passage in Walden would compare with Geneseo students’ opinions of readability. I sent out a survey that asked them to read two versions of the same passage from Chapter 3, each of which scored quite differently when subjected to the algorithmic reading tests. Not knowing which one was supposed to be more readable, the students were to indicate which passage they found easier to read and to briefly explain why.

The results were exactly the opposite of what I’d initially hoped they’d be.  I won’t discuss their full implications, but a small majority (59%) of students indicated that the easier to read passage was the one that the indices indicated required a higher level of formal education. Admittedly, there were some problems with my survey. First, I used two versions of the same passage, so the second passage was more likely to be perceived as readable because it was already somewhat familiar (this is something I anticipated and also that quite a few respondents noted). In addition, most of the people who took my survey were English majors, who are quite used to finding their way through and comprehending intricate texts.

But we can learn something from this iteration of the study. Those who chose the passage that was supposed to be easier to read pointed out attributes like punctuation and clause length, on which the computational tools made their measurements. Those who chose the other passage cited aspects of it that could not easily be quantitatively measured–for example, the rhetorical structure of Thoreau’s argument. Does this suggest that there are certain features of a text that cannot be quantified? That we need to be more attentive to what we apply certain algorithms to? Or, do we just need more sophisticated ones?
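To make concrete how little a readability index actually looks at, here is a rough Python sketch of the Coleman-Liau index using its standard published formula, CLI = 0.0588·L − 0.296·S − 15.8, where L is the average number of letters per 100 words and S is the average number of sentences per 100 words. The word and sentence splitting below is my own crude assumption, not necessarily what the tools in our project used, so scores from this sketch should not be compared with the ones in the plot.

```python
import re

def coleman_liau(text):
    """Approximate Coleman-Liau index for a passage of English text."""
    # Crude tokenizing: runs of letters are words; ., !, ? end sentences.
    words = re.findall(r"[A-Za-z]+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    letters = sum(len(w) for w in words)
    letters_per_100 = letters / len(words) * 100
    sentences_per_100 = len(sentences) / len(words) * 100
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8

plain = "I went to the pond. I saw ice. It was cold."
ornate = "Philosophical contemplation necessitates extraordinary deliberation."

print(round(coleman_liau(plain), 2))
print(round(coleman_liau(ornate), 2))
```

Everything the formula knows about a passage is its letter, word, and sentence counts. Rhetorical structure, the thing many respondents cited, never enters the calculation.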

I think that these questions lead well into Angie’s portion of the project.


My assigned data to analyze was in relation to Walden’s polarity and subjectivity. These two aspects, primarily subjectivity, worked in conjunction with Harding’s fifth style of reading Walden as a spiritual guidebook; a guidebook is inherently subjective in its having an opinion on how one is supposed to live their life. He had mentioned that there were four key chapters to reading Walden spiritually. Of course, it would make this project too easy if Harding’s key chapters matched up with the data received. Instead, I had the following to work with:

Harding’s Key Chapters:
Where I Lived, and What I Lived For
Higher Laws

Data Received from Kirk Anne:
Baker Farm (most subjective)
The Ponds (least subjective)
Reading (most positive)
The Village (most negative)***
Former Inhabitants; and Winter Visitors (most negative)

I’ll start with the polarity data, as it’s the easiest to explain. Thoreau came off as relatively negative throughout Walden. Though he was ranting during “Reading,” he did sound relatively positive compared to the rest of the book when talking about the benefits of reading. After reviewing the data and re-reading the chapter, it was easy to see why this was picked to be the most positive. The negative end of the spectrum was a bit more complicated. “The Village” was, according to the data, the most negative chapter of the novel. But when looking at the numbers, you can see that while this is said to be the most negative, it is also the shortest chapter of Walden. Kirk and I discussed that longer chapters such as “Former Inhabitants; and Winter Visitors” can have their negativity diluted by the number of extra words in the chapter that don’t carry negative connotations. I took another look at both these chapters, as “Former Inhabitants; and Winter Visitors” was the second most negative chapter, to see if this was actually the case. I found that Kirk was correct and that “Former Inhabitants; and Winter Visitors” was, in my opinion, more negative than “The Village.” My reasoning was that in “Former Inhabitants; and Winter Visitors” Thoreau discusses the house fire that burns the man’s entire life away. In comparison, “The Village” discussed Thoreau’s arrest for not paying the Poll Tax, but his emotions towards the matter were far more indifferent than those displayed in “Former Inhabitants; and Winter Visitors.” Of course, this being just my opinion leaves room for error and is definitely something to continue studying.
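The dilution effect is easy to see with a toy example. The Python sketch below is not the method Kirk used; it assumes a simple score of "negative words divided by total words," with a made-up word list and made-up sample passages, purely to show how a short passage can outscore a longer one that contains more negative words.

```python
import re

# Hypothetical list of negative words, for illustration only.
NEGATIVE = {"fire", "burned", "lost", "jail", "ruin"}

def negativity(text):
    """Return (negative word count, total word count, ratio of the two)."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = sum(1 for w in words if w in NEGATIVE)
    return hits, len(words), hits / len(words)

# Made-up passages, not quotations from Walden.
short_passage = "I was put in jail."
long_passage = (
    "The fire burned the house and he lost everything, "
    "though the woods were quiet and the pond was still and the "
    "snow lay deep along the road that winter."
)

hits_short, total_short, rate_short = negativity(short_passage)
hits_long, total_long, rate_long = negativity(long_passage)
print(hits_short, total_short, round(rate_short, 3))
print(hits_long, total_long, round(rate_long, 3))
```

The longer passage has more negative words in absolute terms, but its ratio is lower because of all the neutral words around them. Any score that averages over chapter length can behave this way, which is why the ranking is worth checking against a close reading.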

The subjectivity data became my main focus for this project due to its relevance to the Spiritual Reading given by Harding. I took a look at four chapters: “Baker Farm” and “The Ponds” as they are the two our data said were respectively the most subjective and least subjective, and “Higher Laws” and “Economy” since they were the two that I picked to be the most subjective out of Harding’s four Key Chapters. After reviewing all four, I found that I agreed with Harding in that “Higher Laws” and “Economy” were more subjective than “Baker Farm.”

There are a few explanations I came up with for this occurrence. The first is similar to the polarity, where chapters like “Economy” are extensive in length, meaning the subjectivity is diluted by a higher word count. Another possibility I came up with is that the computer only picks up on direct opinions or keywords given by a person. I suggested this idea to Kirk and he believes that it could be a possibility due to his use of a code based on movie reviews as a training set for determining subjectivity and polarity. This indicates to him that there is probably a mismatch between the training set and Walden. This raises a new concern for collecting data from literature. So much of what is written is stated as a fact. When Thoreau is ranting, is he going to say “well, in my humble opinion, I think that the world is corrupted?” No! He states everything he believes as a fact and tells people that his way is the set right way (like any good spiritual guidebook would). I feel that this could make it difficult to create a training set for Walden, along with many other literary works, as they lack “opinion words/phrases” such as “I think,” “you should,” and so on. To me it only makes sense that there is at least one aspect of literature that requires a human mind to analyze it. After all, literature is created for humans, by humans, and isn’t made to be analyzed by a computer. I’m certainly not denouncing this project; I feel that using the technology available to us can only enhance the understanding we already have of literature. However, I don’t see the possibility of us ever fully replacing old-fashioned reading with computer analysis. Books will always need a human mind and eye to understand the human mind and hand that wrote them.


I addressed Walter Harding’s claim that Walden can be approached as a purely belletristic or aesthetic book, one of his “Five Ways of Looking at Walden.” Harding believes that Walden is an example of “good writing,” and that Thoreau’s generally straightforward writing style separates him from his contemporaries, who often used abstractions, euphemisms, circumlocutory logic and figurative language (156). I figured that it would be a good idea to compare Walden to some of Thoreau’s contemporaries that Harding referenced. Aside from Walden, I looked at essays and other writings by Ralph Waldo Emerson, Nathaniel Hawthorne, Oliver Wendell Holmes, Walt Whitman, Edgar Allan Poe, Washington Irving, Francis Hopkinson, and Richard Henry Dana Jr. Unfortunately, most of the data for these sources could not be produced in time. However, some very basic data is still worth noting. One of the things I was able to do was compare the length of the average sentence across these texts. If Harding was to be believed in his assertion that some of Thoreau’s contemporaries were overly abstract and circumlocutory, it stands to reason that their sentences would be generally longer and wordier. This is not entirely the case, however. In Walden, the average sentence length is 27.5 words per sentence. In looking at his contemporaries, there is a relatively even spread in terms of sentence length. This would pose an issue to Harding’s assertion if it were not for the fact that Harding also recognized that Thoreau’s sentences were unusually long (158). Unfortunately, without more in-depth data, it is difficult to further address the above claim regarding other texts.
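For reference, an average-sentence-length figure like the one above can be approximated in a few lines of Python. This is only a sketch under a simplifying assumption (sentences end at a period, exclamation point, or question mark), not the script that produced the 27.5 figure, and the sample text is made up for the example.

```python
import re

def average_sentence_length(text):
    """Mean number of words per sentence, using naive sentence splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    word_counts = [len(s.split()) for s in sentences]
    return sum(word_counts) / len(word_counts)

sample = (
    "I went to the woods. "
    "I wished to live deliberately and front only the essential facts of life."
)
print(average_sentence_length(sample))
```

Naive splitting miscounts abbreviations such as "Mr." and treats semicolon-heavy sentences as single units, so numbers from a sketch like this are only comparable across texts processed the same way.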

To further address Harding’s claim, I used the data regarding Walden that Kirk Anne produced for our group. I first looked at Thoreau’s use of symbols. I reasoned that, if Thoreau was truly a less abstract writer than his contemporaries, the number of symbols in Walden would be relatively low. Thoreau used 11 symbols in Walden, which seems to be a relatively small amount. This potentially supports Harding’s claim that Walden was a much more straightforward work than those of his contemporaries, although I unfortunately do not have sufficient data regarding Thoreau’s contemporaries to draw any concrete conclusions from this number.

I next wanted to address the allusiveness of Thoreau in Walden. If Harding’s claim that Walden is simply good, relatively straightforward prose is accurate, it stands to reason that Thoreau would have a limited number of references to historical events, classic literature, and other similar things in Walden. While it is difficult to determine this from the data that I was provided with, I think a very general picture can be gleaned from the frequency of proper nouns in the book. In looking at allusiveness, we ideally want to remove locations from the data, as well as non-historic and non-fictional individuals. There are 386 proper nouns within Walden. Although the data could not account for this, it is reasonable to assume that this number would be considerably smaller if places and certain individuals were removed from the list. In general, I would make the argument that Walden is actually not particularly allusive, although there is unfortunately no data from contemporary authors to compare to.


I was working on verifying Harding’s claim that within Walden there is a “careful alternation of the spiritual and the mundane, the practical and the philosophical, the human and the animal.” I was also tasked with using the program Voyant Tools to evaluate Harding’s claim. I began by uploading each chapter into Voyant Tools, taking down notes on such things as the number of words and unique words in each chapter, and then combing through the chapters for the most common words. Voyant Tools made this much easier than it would otherwise have been. I was able to pick out common words in chapters that I would qualify as spiritual, mundane, practical, philosophical, animal, or human. I spent more time on chapters Harding specifically mentions within his article as examples, so that I could draw upon his observations. However, I did look at every chapter to make sure the alternations continued throughout the novel rather than simply in those chapters. Having read the chapters, I sometimes found it frustrating that I knew these themes existed within a chapter but the words being used didn’t always match up. For example, it was difficult to find words that fit into the category of philosophical because many times Thoreau uses metaphor to convey these philosophical ideas. In other words, the words may be seemingly mundane but actually have a deeper meaning in context. Even with these struggles, it was clear that there is truth to Harding’s claim, although I would say that all of these themes exist throughout the novel, even as the main theme being discussed alternates. Even in a very mundane chapter like “Brute Neighbors,” there are still spiritual elements mixed in. In the end it was up to me to analyze the data given by Voyant Tools to see if I was able to come to the same conclusions as Harding.
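The per-chapter numbers described above (total words, unique words, and most common words) can also be reproduced outside Voyant Tools. The Python sketch below is an assumption-laden stand-in, not how Voyant Tools works internally: the stopword list is a tiny hypothetical one and the sample sentence is invented.

```python
import re
from collections import Counter

# Hypothetical, very short stopword list; real tools use much longer ones.
STOPWORDS = {"the", "a", "and", "of", "to", "in", "i", "my", "was", "is"}

def chapter_summary(text, top_n=3):
    """Count total words, unique words, and the most frequent content words."""
    words = re.findall(r"[a-z']+", text.lower())
    content = [w for w in words if w not in STOPWORDS]
    return {
        "total_words": len(words),
        "unique_words": len(set(words)),
        "most_common": Counter(content).most_common(top_n),
    }

sample = "The pond was deep and the pond was clear. I loved the pond."
summary = chapter_summary(sample)
print(summary)
```

A frequency list like this shows which words dominate a chapter, but as noted above it cannot tell when an ordinary word such as "pond" is carrying a philosophical meaning; sorting the words into categories is still a reader's job.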

The Days of an Omekan

Shortly after being assigned to the Archivists group and looking over what the previous website group had accomplished, we decided that we needed to create a more personal connection to Walter Harding’s story. While there was a lot of information about his work and the scholarly contributions he made to Henry David Thoreau’s legacy, we wanted to know more about Walter Harding himself.

After consulting with Liz Argentieri, an archivist of Milne Library, and asking about Harding’s relatives, she pointed us in Allen Harding’s direction. With the help of Professor Schacht, we contacted Mr. Harding by email and set up a conference call. We tried to do the call over Skype at first, but there were a few technical difficulties, so we switched over to a conference call on a cell phone.

The interview itself went wonderfully! Mr. Harding was very generous with his time and the information he gave us, filling us with cool stories about his father. We learned how Walter Harding came to be at Geneseo (would you have guessed that it was his wife’s attachment to the town that made them stay there?), that he enjoyed camping and birdwatching, and that every summer their family would make trips to different universities where Walter Harding did research and work.

After the interview, we started to transcribe an audio file we made of the discussion. We also received a few emails from Allen Harding containing some really wonderful photographs of Allen Harding throughout the years. After transcribing and editing the interview, we uploaded it to the site, putting some pictures alongside each question to add a little flavor to the page. All of the photographs, however, are also uploaded under their own area of the website with captions.

The audio transcription, along with other files, was uploaded onto our website, using Omeka as our host network. Omeka allows users to pick from several different themes, varying in color and layout. We felt that the initial layout the previous students chose was unsophisticated and did not fit the vibe and image we wanted to create. After deciding on a theme we felt represented Walter Harding and his life, we began to upload the documents (letters, photographs, newspaper clippings, etc.) we found in the Harding Archives located downstairs in Milne.

A great deal of our effort as a group went into familiarizing ourselves with the contributions to literature and scholarship that Professor Walter Harding made. Together, we spent our time poring over documents in the archives of Milne Library in order to get a sense of how Professor Harding’s work has influenced us as students of SUNY Geneseo, as scholars of English literature, and as appreciators of the works of Henry David Thoreau. Despite the numerous documents we accessed, however, we saw but a fraction of Professor Harding’s collection and bibliography. It was our job to select a few items from that prolific collection that we felt would best represent the legacy of Professor Harding’s 26-year tenure with SUNY Geneseo and to put together a website that told the story of this man’s career.

We used scanners from Milne to upload the documents onto our laptops. After naming each file and assigning it to its specific folder, we were ready to start uploading everything onto the website!

We all were excited to begin work on the website. Dr. Schacht encouraged us to think of something new we could add to the site. He wanted us to create an exhibit as if the website was a museum and create a new area where interested parties could go to explore and learn.

After countless trips to the library going through archives, we all were struck with inspiration. Walter Harding had done so much in his life, impacting so many around him. He influenced his students at Geneseo, people across the country, and even people in other countries. His contributions to the study of Thoreau and his works have made a lasting impact on our society.

That is why for the website, we created three different collections about his contributions to SUNY Geneseo, the nation, and then the world. It was so easy to go through the archives and find amazing documents and information for each section.







One of the most exciting moments was when we found a letter from Albert Einstein to Walter Harding. We didn’t realize that Harding even knew him! We later discovered that he would write to all sorts of different people to see what they thought about Thoreau’s writings.

While we at Geneseo have all heard of Walter Harding one way or another, we didn’t realize until after we found so many various letters, newspaper clippings, and other documents that he was such a well-known figure. Without him, Thoreau truly would not be as well read or respected as he is today.

Working on the website was such a fun experience, since we got to learn more about this amazing man, and also design and create a whole world for people.

Students: Kathryn Bockino, Kevin P. Feeley, Corinne Green, Angeliki Ellie Laloudakis, Danielle Pesin, and Emma Wang

Faculty: Dr. Paul Schacht, Liz Argentieri

The Perks of NeTWERKing

This semester, the Networkers have studied digital collaboration tools. Our goal was to help other English professors implement WordPress blogging in their classrooms, and then to serve as resources to help improve their experience using the tool. Throughout that process, we created our own WordPress help page and used other tools such as Google Docs, Forms on Google Docs, and Doodle. We also researched other means for collaboration such as Prezi, Commons in a Box and CommentPress. We learned a lot this semester about the typical problems and fears English professors face in implementing blogging in their classrooms, and how to improve upon and enhance this experience.

Google Docs and blogging are two online tools that we used consistently in this class throughout the semester. Google Docs was incredibly useful in collaborating with group members on our projects. Not only were we able to collaborate with our own group members, but we were also able to collaborate with the rest of the class as well as Dr. Schacht. Dr. Schacht was able to post readings and assignments that he wanted us to do, and we could go on in class or outside of class and have access to them. Dr. Schacht was also able to go onto the Google Doc on his own time and check on the progress each group was making in their projects. Google Docs is a very useful tool for online communication in a class, or if trying to put together a project or presentation in a group. However, there are elements that other sites like WordPress are more suited for, for example blogging.

Blogging is now being used in many classroom settings, as it is easily accessible to students. One positive aspect of blogging is that it gives students a chance to express their ideas in a place for others to read them. Another is that you can share things, like articles and media, that connect with class discussions and enhance the material being taught in class. Blogging is an especially good tool for people who might be less outgoing and not likely to share their ideas in class. Blogs give them a place to feel comfortable expressing their ideas and opinions. One negative aspect of blogging is that it can be used as a substitute for face-to-face conversations and discussions. Blogging is a great form of communication as long as it is used in conjunction with class discussion, and used as a way to spark new ideas or reinforce what was taught and discussed in the classroom.

In order to come up with our manual for the class blog, our group explored many different avenues. At first we thought a PowerPoint or a Prezi would be an easy way of showing the general setup of the site, but we soon figured that that wasn’t beneficial for everyone. We also toyed with the idea of having a running document that anyone would have access to view, but again this seemed too large and messy. Users would have to go through so many different posts to get to their question, and that could get too confusing and convoluted. To keep with the theme of the class, a website seemed like the best idea, as it would allow us to tag posts so that users can easily find a specific post. Once we realized that WordPress was the best avenue, as it looks and feels much like the SUNYGeneseoEnglish WordPress, we had to decide between a .com or a .org platform. As the school uses a .org, we really wanted to use this, but in order to set up a blog we would have to pay for a subscription, and we didn’t feel like that was necessary when the .com was free for us to use for the small number of posts we would have.
After setting up the page, we had to make it work for our purposes. We hit some roadblocks here. We originally each created posts about different topics (adding media, turning off email setting, etc) and then put them into different categories. Some of the posts also included information about who we were and how to contact us. It quickly became clear that those important posts were getting lost in a set-up in which most recent posts were first. So, we created 4 static pages: Home, Posts, Who We Are/How to Contact Us, and About English @ SUNY Geneseo. We also had to rearrange our widgets in the Posts Page to make “Categories” appear above “Recent Posts.” This made the site easier to navigate. We switched the theme to Hemingway Rewritten and put an image at the top of the trees in front of Welles in the Spring. The page now looks very aesthetically pleasing. Now we’ve decided to switch the site over to a page connected to To do this, we’ll probably have to change the page to xml and upload it to the .org page.

Throughout the journey of creating a help center, we have faced various challenges that encouraged us to re-route our plan of attack. We originally thought that sending a mass email to our classmates, asking what they wanted to know more about, would be a good idea. We were not expecting to receive zero feedback, and this set us back slightly. We had created a WordPress site filled with general information on blogging, but the idea had been structured around feedback. Without it, we could not provide the specific information people were looking for. Being unable to reach others forced our small group to put ourselves in other students’ shoes and try to anticipate the questions they would ask. This was the most troubling challenge we had to face. Although slight challenges have set our project back at times, the strength of our group has helped us overcome the difficulties and restructure our plan to better meet the needs of our website and the students using it.

This semester, I (Michael) have been immersed in the world of blogging. My SUNYGeneseoEnglish WordPress account is not only connected with the Digital Humanities, but also with The Practice of Writing with Dr. Paku, and Film Talk with Professor Ed Gillin. The three courses are all fundamentally different, and because of this, each class’ use of the blog is also different. Being in the Networker group has forced me to consider how the blog is used in other classes, and actually being in those classes makes me an expert. Dr. Paku’s class, The Practice of Writing, is composed of nine English Adolescent Education majors. This intimate group focuses on the basics of writing and how to teach writing to students. The class contains a service learning component, and the students spend time actually practicing what they learn in the classroom. At first, Dr. Paku would post a prompt as a blog post, and ask students to post their answers in the comments. While this may keep the posts somewhat organized, people visiting the blog (which is now public), may not think to click on the comments. In order to keep the organization of this blog and make the posts more accessible, we gave the blog a bit of a facelift. We made use of the ability to categorize each post, and then we added widgets on the sidebar that show those different categories. By the end of the course, Dr. Paku would ask us to blog about a topic, and she or I would create a category that everyone was able to choose. For example, after a week of service learning, we all posted about our experiences, and those posts can be easily accessed by clicking on the “Service Learning” category.

On the other hand, Professor Gillin’s class group, Film Talk, has been very straightforward and has not really changed at all, mostly because the original way it was used fits the class perfectly. Every week the class watches a movie and two students are required to write a film review of that movie. These students do not post on the blog like Dr. Paku’s students do, but instead create a topic in the forum and post their review there. With this format, students are easily able to comment on the review and see the posts in a collected thread. A majority of these forum topics have at least three different “voices” in each. These two classes went about using the SUNYGeneseoEnglish WordPress very differently, but were able to tailor the site to their needs. With the help of the Networkers group, I think that even more English courses will be able to figure out the best ways to take advantage of using WordPress.

We also worked throughout the semester with Dr. Doggett’s class, assisting them with using the blog for their Irish Studies class. Through our work with them, we learned about a project they’re working on in anticipation of the Alumni Summer Trip to Ireland. They were looking to create some sort of site or app that people on the trip could use to communicate and get facts about different locations in Ireland. It’s going to be an ongoing project, and we were able to help them get started.

We met with Dr. Doggett and he told us that the idea was that while visiting certain places, they should be able to open this app, find the page that corresponds to their location, and access pictures and facts. He also wanted an outlet for people going on the tour to connect and discuss flights, transportation, and other questions they might have before the trip. In addition, he said that there should be a place for the latest news announcements on the main page, and a separate page for FAQ’s. Basically it is going to be a go-to resource for travelers to connect and get information both before and during their trip.

A challenge of this project is that while many people have smartphones, not all of the places they visit will have WiFi. Because of high international data costs, we needed to create something they could download and access offline. We started designing a WordPress site, creating sections and pages for the topics that needed to be covered. The group discussion will be better suited to a forum, so we decided there should be a group on the SUNY Geneseo English page for them too. While on the trip, Dr. Doggett will be able to post about the latest information they need to know, and we got an RSS feed on the main page so it can be conveniently seen. Our hope is that there will ultimately be an app that will allow users to download the whole site to their phones, making it accessible without WiFi.

Working on this project was a nice way to take an in-depth look behind the scenes of WordPress, which we’ve been using all semester. Seeing all the different options was overwhelming at times, but this makes it possible to customize the site in a way that meets our needs. Getting the Ireland site set up was exciting because it sets a precedent for future Ireland trips and other study abroad groups. We think it’s going to be a really useful tool. If we can make it accessible in Ireland, then other courses should be encouraged to create their own sites for their trips as well. Having started off at the beginning of the semester as students who were new to WordPress, we’ve come a long way and have learned what goes into all stages of blogging.


Group Members: Michael Augello (jellyfish), Lindsey Gales (dolphin), Becca Miller (seal), Katelyn Baroody (jellyfish), Chrissy Stellrecht (dolphin), Lyndsay Moore (jellyfish)

Explainers Explained

What is an explainer? This is the first question my group and I had to tackle as we began working on our project at the beginning of the semester. The answer: An explainer provides a succinct, lively overview of a concept, issue, situation, event, principle, process, rule, procedure, or the like. It’s typically aimed at a broad audience. It can take the form of a written explanation or audio and visual media. As we began work, we looked at a variety of explainers to get a general understanding of how explainers were typically presented and decide how we wanted to present our own explainers.

One of the challenges we faced when creating our explainers was completely understanding the material in order to condense it into a concise explainer. This ended up working to our advantage, as we learned more about Thoreau, his life, and his writing of Walden. Our final explainer focused on the Gettysburg Address and the five copies that Lincoln wrote. The explainer includes a brief story about each of the copies while also numbering them in the order Lincoln presumably wrote them in.

Another challenge we faced when creating explainers was the programs used to create them. Moviemaker was an integral tool in making the video explainer. I used both PowerPoint and Inkscape to make two different explainers. Inkscape is essentially a free version of Adobe Illustrator. It’s much easier to use this program than it is to use PowerPoint because it allows for free manipulation of text and graphics. Inkscape is built for designing graphics like explainers whereas PowerPoint is built for designing presentations.

The other half of our project was creating an explainer contest. Before the contest was opened, we created a rubric to judge the entries with and we created an email address to accept submissions. We also reconfigured the website to explain the contest and what an explainer is, wrote an announcement and sent it out. Unfortunately, the contest was unsuccessful, as we received two entries that were not what we were looking for. Despite the lack of success of the contest, the groundwork has been laid for the contest to run next year, hopefully with more success.


Group Members: Julie Eckert, Brodie Guinan, Kimberly Owen, and Jonathan Pepperman.

The Thorough Thoreau: the Annotated Fluid-Text Edition

The ENGL340 Coders’ team presents the Improved Fluid-Text edition of Walden…

Working with Beth Witherell’s “The Writings of Henry D. Thoreau,” and the Princeton edition of Walden, our team incorporated annotations from eight volumes of journals into the Fluid-Text edition of Walden, making the project more massive, authentic, and penetrating.

“We commonly do not remember that it is, after all, always the first person that is speaking. I should not talk so much about myself if there were any body else whom I knew as well. Unfortunately, I am confined to this theme by the narrowness of my experience.” Walden page one.

Project: Success! With some foundering… Our ultimate goal this semester was to add a feature to the Fluid-Text Walden that grants users a behind-the-scenes study of Thoreau’s writing process, from the [seemingly] random ramblings of his journals to the finished product, the transcendentalist masterpiece of life in the woods. Given that our group as a whole had little prior experience with encoding, our journey was not directly the exploration of growing authorship, but instead the exploration of this idea in a digital and very meta way, going behind the behind-the-scenes to revamp the digital edition of the work.

Our task involved expanding our collective knowledge of coding, primarily utilizing the TEI standard and the XML format, which we finally plugged into the versioning machine of the Fluid-Text.

However, what can’t clearly be seen in the digital manifestation of our effort are the organizational bumps we plowed into along the way– the rerouting and clarifying and focusing and… Here we will historicize the endeavor (much like our project scaffolds Thoreau to a greater degree)– of the careful balance digital humanists must establish between planning  and actual implementation.

For all the struggle, our project was completed. The Fluid-Text edition is ever-enhancing itself, and Thoreau isn’t done with us yet, either.

Table of Contents:

•The Process
• The Product
• Challenges
• The Future
• Digital Humanities
• Project Members
• Resources


 “No man ever stood the lower in my estimation for having a patch in his clothes…”

For some time, the project was undefined. We rolled around in the dirt of ideas and potential projects, and had a few false starts and dead ends along the way. ENGL340’s Data Analysis group discussed feasible ideas with our team at the onset, as we were both working with types of coding and analysis as the basic  purpose to our project, but eventually the path was forked. Finally, it was decided, given the availability of materials, that we would create a coding project that amalgamated passages in Thoreau’s journals that were later referenced in the published edition of Walden in the digital edition, the Fluid-Text. These cross-references  were neatly cited in a compendium in the back of the volumes (#1-8) of “The Writings of Henry D. Thoreau.”

Thoreau kept extensive journals, in which he recorded ideas, notes, and thoughts he had during his days. Some were lengthy; others were jotted-down half-ideas, which seem only to make sense to Thoreau himself (or when Thoreau later finished the thought by including it in Walden).

Simply obtaining the journals proved to cost a pretty chunk of time. Due to their rarity, cost, etc., it was some time before a complete set could be shipped to Geneseo. It would also have been useful for everyone in the group to possess a physical copy of the Princeton Walden to use in tandem with the journals, and this was another unforeseen snag that briefly stalled us.

Screen Shot 2014-05-13 at 5.37.49 PM

But– again!– more unanticipated hindrances. Though Thoreau’s writings are in the public domain, the works we used are edited versions, and thus are under copyright. We couldn’t photocopy just anything we wanted due to these legal limitations, and this involved yet more obtaining of journal volumes– we had to use all volumes in multiple steps of the project, often with multiple people needing to use the same one. We could only photocopy the index for the sake of reference, shown here.

We were the “coding” group, yet, oddly, the bulk of our work (after the planning) was the tedious hand-typing of every single journal entry into our Google spreadsheet. We needed the entire passage itself, its citation (e.g. 6.10-15; that is, page 10, lines 10-15), and the date; also needed were its counterparts, the page number in Walden itself and the relevant passage that most resembled the keywords from the journal entry:

Screen Shot 2014-05-12 at 9.19.11 PM Screen Shot 2014-05-12 at 9.19.27 PM

Some 500+ entries were filled out with this data.

All right– the data was assembled all in one place. We had the journal text itself, and all the data to link it to Walden itself, as well as things like dates that we could both include and perform analyses on [more on that later]. What next?

The point of using Google Spreadsheet was twofold. One, keep the group in sync and connected; two, utilize the function and mass-apply capabilities of the program. Our next medium, TEI (using the program Oxygen), needed transformed data. By writing a function and applying it to all 600 items in our spreadsheet, we could quickly and easily slide forward in our agenda. The beauty of coding and things like Google Spreadsheets is that it eliminates, most of the time, sheer labor. We need not write out every TEI data string that would be plugged into the Fluid-Text versioning machine for Walden– instead, we could write a program and apply it to that which matched the pattern (in our case, all items).
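
The mass-apply idea described above can be sketched outside the spreadsheet as well. The short Python example below is only an illustration, not the formula our group actually wrote: the column names (citation, date, passage), the sample row, and the choice of a TEI-style note element with a nested date element are all assumptions made for the sake of the example.

```python
from xml.sax.saxutils import escape, quoteattr


def row_to_note(row):
    """Turn one spreadsheet row (a dict) into a TEI-style <note> string.

    The element and attribute layout here is a hypothetical pattern,
    not the markup actually used in the Fluid-Text edition.
    """
    return '<note type="journal" n={n}><date when={when}/>{text}</note>'.format(
        n=quoteattr(row["citation"]),    # quoteattr adds the surrounding quotes
        when=quoteattr(row["date"]),
        text=escape(row["passage"]),     # escape makes the text safe for XML
    )


# A made-up row standing in for one line of the spreadsheet.
rows = [
    {
        "citation": "6.10-15",
        "date": "1850-01-01",
        "passage": "Placeholder passage text.",
    },
]

# Apply the same pattern to every row, as a spreadsheet formula would.
notes = [row_to_note(row) for row in rows]

for note in notes:
    print(note)
```

Writing the pattern once and applying it to every row is what removes the hand labor; whether that pattern lives in a spreadsheet formula or in a script is mostly a matter of convenience.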

TEI (the Text Encoding Initiative) is a standard for encoding texts, used most often in digital humanities. With TEI providing the guidelines, the markup language XML (Extensible Markup Language) was used to encode our data in the editor Oxygen, which allows for advanced features. The string of tags in the image above (appearing from the Google Spreadsheet) was transplanted to Oxygen for finalization:

Screen Shot TEI

Examples include the “resp” tag, which identifies the responsibility (i.e., editor). All of these tags are read by the machine and can be manipulated by the coding engine to perform different tasks, if need be.

The majority of the work done, the rest was detail-work and error-checking. The TEI code itself had to be checked in Oxygen for technical errors. For instance, Thoreau often used the ampersand (“&”) in his writing, but in XML the ampersand is not read as plain text. It marks the start of an entity reference, so a bare ampersand is reported as an error. This demanded a work-around, as did other small errors.

Screen Shot 2014-05-12 at 10.09.00 PM

Additionally, we had to proofread the annotations in the final form as well, as Oxygen can only pick up on technical errors (things not strictly allowed), while we wanted to check for errors beyond that, such as formatting, etc.
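
The ampersand work-around can be illustrated with a small sketch. This is an assumption about one reasonable approach (escaping the character as an entity), not a record of what we did in Oxygen; the sample phrase and the helper name is_well_formed are hypothetical. It uses Python's standard library to show that a bare ampersand makes the XML fail to parse, while the escaped form parses and still reads back as an ampersand.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


raw = "bread & butter"                       # made-up sample phrase

broken = "<note>" + raw + "</note>"          # bare ampersand left in place
fixed = "<note>" + escape(raw) + "</note>"   # ampersand written as an entity

print(is_well_formed(broken))
print(is_well_formed(fixed))
print(ET.fromstring(fixed).text)             # parser turns the entity back into "&"
```

A check like this only confirms that the markup is well-formed. It says nothing about formatting or about whether a passage was typed correctly, which is why the proofreading pass was still needed.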

Screen Shot 2014-05-12 at 6.57.45 PM


Our part done, everything we worked on over the semester has been uploaded to a site directory, where you can see all our files and data, and download it to see for yourself. The product is now available in the Fluid-Text edition of Walden.



Now anyone can access the digital edition of Walden, which contains not just the various editions in Thoreau’s writing process of the Work itself, but also his journaled annotations.

Screen Shot 2014-05-12 at 11.26.14 PM

Here is an example of what the journal annotations look like. In the text itself, a c marks a note at the beginning of a paragraph. A simple mouse-over reveals the forerunner thought Thoreau had in his journal writings.

Sometimes Thoreau copied himself exactly. At other times, he radically changed the sentence structure, retaining only the very kernel of the statement. It is possible he actually copied lines from an older version of Walden into the journal, and then back again, to a version closer to the final product…


Copyright regulations, an ethical adherence to avoid plagiarism, etc., all slowed our work down. Unable to scan the journals or Princeton Walden, we were stuck with hard labor when a technical solution was right at our fingertips.

At the beginning of the course (and project), we only had a plain, basic understanding of markup systems. As a result, a not insignificant portion of our group project time was spent learning various coding languages, with their attributes, values, tags, and so on. Ultimately, though, Joe Easterly’s expertise and work in encoding and script-writing helped the project glide along when it came to the markup stages.

Due to the digital/coding aspect of the course, we focused more on doing something with the text rather than saying something about it. However, despite not focusing on critical analysis of Thoreau’s authorship through time, our own journey of learning tools, how to build systems, etc., resembled not only what you can do with our project’s outcome, but Walden itself. While some may question whether “digital humanities” is an oxymoron, or two incompatible things juxtaposed forcefully, let this be a lesson that that is not the case. Our digital humanities project plan was not literary analysis using digital tools, but it ended up resembling such a thing in the end after all, upon reflection.

Planning, planning, planning! While it may be said, especially of humanities projects, that too much planning can land one in developmental hell, it is equally true that too little planning can lead to dead ends faster than Henry David Thoreau would eat a woodchuck if he could catch it (answer: instantly devoured raw, of course). But one must be patient and precise, and, again, know that humanities projects are some 75% planning, and just 25% action. If you are stuck like we were, or have even outright failed, learn from others and yourself.


Still much work is to be done for Digital Thoreau and the Fluid-Text edition, specifically.

For instance, in assigning tags to the dating of journal passages, we have discovered that some of the dating of the various editions (a-g) may in fact be wrong. Of course, the assignment of writings to “editions” (rough drafts) of Walden was an “imagined” discipline, as Thoreau did not leave separate notebooks with complete versions in them, but instead wrote over older editions in different inks, etc. However, this project will lead into that one, leading to a more accurate chronology of Walden‘s writing process.

As for the Fluid-Text edition itself, the project is still ongoing. Not quite all of our cross-references were finished in time to be added, and formatting choices are still to be updated for the additional section of the journal notes.

Finally, as for you and us, after a long journey, it’s time we can actually relax and study the outcome of our project, comparing the annotations and cross-referencing Walden passages.


So… what’s the point of all this? Why do all this work when we students have access to the journals in college libraries? The goal, of course, is to harness the awesomely democratic power of the Internet, powered by our digital tools and digital thinking. The TEI standard, XML, Oxygen– all these and more were utilized in this single project to make available the pre-Walden musings of our favorite transcendentalist, Henry David Thoreau. And of course, “transcendentalism” is just another pre-digital handle for “digital humanities,” the global project of freeing knowledge from the dusty recesses of collegiate libraries for everyone, everywhere.

“To my astonishment I was informed on leaving college that I had studied navigation!–why, if I had taken one turn down the harbor I should have known more about it…”

Thoreau was very critical of the times he lived in: their politics, society, and education. He advocated for individualism, but, contrary to popular belief, he didn’t spend two years in solipsistic seclusion. He not only remarked on culture and life, but actively observed it, often passing into the town of Concord. “Life in the Woods,” then, is not a literal demand to be hermitic, but a way of thinking.

“What sort of space is that which separates a man from his fellows and makes him solitary? I have found that no exertion of the legs can bring two minds much nearer to one another…”

The progress in the spheres of digital humanities is the new Wood to gather in. Thoreau’s writings are packed with thoughts on contemporary matters. Transcendentalism is a good place to start for explaining the purpose of digital humanities, but it is really only the very beginning of its enormous potential. And the only way to explore this new world is to dive right in. You can’t just walk in the woods a few times, you have to live there, to get your hands dirty and understand how, and not just what.

“But lo! men have become the tools of their tools…”

So join us! Code! Not only have we given readers the digital version of Thoreau’s pre-Walden thoughts, but we have attempted to show how we built this. As digital humanists, one must not be intimidated or aloof or ignorant, or even deprived of, the tools, how to use them, and how to build them ourselves. Go a bit past merely using digital tools and the Internet (to view cat videos, no doubt), to see how such things are constructed. Do not become a one-shot, funny-cat-photo-poster on the web– or, as Thoreau thought about the Collin’s cat, if you don’t adapt to changing society (improved by you, of course), you will turn to be completely wild, without a fine balance between thinking and doing, or solitude and society, literary and digital tools– and “so become a dead cat at last.” Don’t be a dead cat. Be a digital humanist!

“If you have built castles in the air, your work need not be lost; that is where they should be. Now put the foundation under them.”

*All Walden quotations from the Fluid-Text edition

Project Members:
Andrew Nauffts, Matthew Spitzer, Victoria Salazar,
Kyle Parnell, CJ Ferraro, Cob O’Brien, Joe Easterly

SUNY Geneseo ENGL340: “Literature and Literary Studies in a Digital Age”
Spring 2014

Special thanks to:
• Dr. Paul Schacht, our Professor
• Beth Witherell, Ron Clapper, and others for their work on Thoreau

Courtesy of digital humanist and Thoreauvian fanboy, Dr. Schacht


• Introducing the TEI Guidelines

•  A Gentle Introduction to XML

• Free coding lessons online

• Free 30-day trial of Oxygen XML Editor

• The Social Reader’s Text edition of Walden

• This blog post by Dr. Paul Schacht with a video explaining the Fluid-Text in general

• Various blogs on coding, digital humanities, etc.: including this one, Coding Horror


What would you do if the power went out for good?

Throughout our class discussions, the ultimate conclusion I have drawn is that we, as a whole, are too comfortable with technology. I have constantly been at odds with the power of technology throughout our class debates.

Obviously I have joined into the world of technology with adopting a phone and I lug a laptop wherever I go.  However it is getting so popular that people feel naked without it.

I enjoy an occasional camping trip, free from distraction and the pressure life brings every so often. Many, however, lose the point of camping or do not embrace it since technology has interrupted the world.

Yes, technology has done wonders for communication, medicine, organization and much, much more.  It is miraculous how smoothly and well connected the world is after technology rushed through.

But, having machines take the roles of humans and leave hundreds without jobs? Having a calculator do the math problems for you? Looking up the answers on all those practice tests? Having a meal interrupted because a family member is texting at the dinner table? Even having a glow appear in a movie theater? …Where is the line drawn? What are we actually learning?

I stumbled across this video which really shows a divide between the pros and cons of technology.

Technology’s Impact on our World

This class has enabled me to see the power and strength that technology allows us in instances I never realized because we are so used to it.  You tend to forget that technology helps save more lives than ever before.  I am so used to seeing it in my everyday life that I take it for granted.  It’s nice to take a step back and see the improvements technology has delivered to this world.  I just hope that people would be content without it at times.

I have argued with my 35-year-old cousin countless times because she has to send out electronic invites to gatherings, due to the up-to-date lifestyle she has chosen to embrace.  She doesn’t realize, however, that our 90-year-old grandmother is not as familiar with, nor does she have as much access to, the technology that teenagers and adults use on a daily basis.

All that I wish for, is that technology will not replace the actions we are blessed with.  Technology should be used as an aid but it should never be used to replace. It is OK to be without Facebook or your cellphone for more than one day.  Most people feel like they would die without it.  I feel like the comfort we have over its presence is doing more harm than good.

I don’t think the world would be able to function if the power went out for good.  We are getting too comfortable with this lifestyle.  It is important to find a balance so that people will be able to function and know what to do if power just left the world.

It is good to embrace technology but it is also good to take on the tips from past inhabitants in your history textbooks.  The stars can guide you if you get lost, not just your GPS, and food can be cooked without an oven.

How do you think the world would react to a powerless life style?  Don’t you think balance is needed?
Life should not run on the comfort of technology.

Children’s Walden?

In one of my finals last week, a friend of mine reported on reading Walden in a children’s picture book format. I was intrigued because I think it’s great to start children on the basic idea of classic literature at a young age. I researched a little and found out that there are multiple versions of Walden in the children’s section. A few include:


Henry Builds a Cabin actually has the bear that is supposed to be Henry Thoreau and whatever he does is in relation to his cabin. It shows how he is very much a part of something other than just nature. I think these are worth a read and actually give some more insight into the world of Walden!

Happy Finals!