A Thoreau Approach to TEI

Why Encode the Journals?

Many of Thoreau's journal entries are available to the scholarly community, but they have not been encoded; they exist only in manuscript and transcript form. This makes it very difficult for scholars to analyze, search, and work with the texts as they try to learn more about Thoreau and his work. The journals contain information about Thoreau that is absent from his books and other published works, so they are of great value to the community. Encoding these entries will make it easier for scholars to find patterns in who he interacted with, where he went, what he observed in nature, and what he did throughout his life.


The Process

Receiving the Journals

Our group got off to a rather late start, in part because of our limited resources and our lack of knowledge about XML and TEI. In time, we received the files of Thoreau’s journal entries from Beth Witherell over Google Drive, including manuscripts, transcripts, and notes. We then looked through them in search of patterns and themes to focus on. Beth Witherell also shared sets of journal categories and information so that we would have a starting point for our journey.


The Google Doc

Once we had determined the themes we wanted to focus on, we created a Google Doc containing the journal entries we were going to encode. Next we divided up the elements we wanted to identify, using a color-coding method: we highlighted the relevant words in different colors so that we could locate them easily. These elements included dates, activities, times of day, possessions, animals, plants, weather, people, and places. Each of us picked two elements and went through the document, highlighting their occurrences in several entries.


Going into Depth

After a discussion with Dr. Schacht, we decided to elaborate on the elements we had already identified by providing more information about them, rather than tagging more elements. This process required us to assign an xml:id to the proper names of people and places, and to include a “ref” attribute within the tags that would link to a website with more information. To do this, we created a spreadsheet listing the tag, xml:id, name in the text, place in the text, and reference link.
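The spreadsheet step could, in principle, be automated. The following is a minimal Python sketch, not the workflow we actually used: it reads a hypothetical CSV export of the spreadsheet (the column names, rows, and example.org URLs are placeholders, not our real data) and builds TEI-style `<listPerson>` and `<listPlace>` entries, each carrying an xml:id and a `ref` link. For brevity it leaves out the TEI namespace and the surrounding header, and it ignores the "place in the text" column.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Fully qualified name of the xml:id attribute; ElementTree writes it back out as xml:id.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical CSV export of the spreadsheet; rows and URLs are placeholders.
SHEET = """tag,xml_id,name_in_text,place_in_text,ref_link
persName,C,C,entry 1,https://example.org/people/c
placeName,walden,Walden,entry 1,https://example.org/places/walden
"""


def build_lists(sheet_text):
    """Turn spreadsheet rows into TEI-style <listPerson> and <listPlace> elements."""
    list_person = ET.Element("listPerson")
    list_place = ET.Element("listPlace")
    for row in csv.DictReader(io.StringIO(sheet_text)):
        if row["tag"] == "persName":
            entry = ET.SubElement(list_person, "person", {XML_ID: row["xml_id"]})
        elif row["tag"] == "placeName":
            entry = ET.SubElement(list_place, "place", {XML_ID: row["xml_id"]})
        else:
            continue  # other tags are not declared in the header in this sketch
        # The name element carries the external reference link in its ref attribute.
        name = ET.SubElement(entry, row["tag"], {"ref": row["ref_link"]})
        name.text = row["name_in_text"]
    return list_person, list_place


if __name__ == "__main__":
    for element in build_lists(SHEET):
        print(ET.tostring(element, encoding="unicode"))
```

Generating the entries from one spreadsheet keeps each xml:id and link in a single place, so a typo only has to be fixed once.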


Starting to Encode

We divided up the documents amongst ourselves and started encoding in text editors on our own computers, using both TextWrangler and Notepad++. We each had approximately sixteen lines to encode and referred back to the spreadsheet and the Google Doc to keep track of our tags.



We ran into several roadblocks because TEI does not have specific tags for elements such as animals, weather, plants, activities, and possessions. With Dr. Schacht’s help we found alternative ways to tag plants and animals; however, we were still unable to find tags for weather, possessions, and activities, and had to leave those elements out. Tagging them would be more feasible for more advanced TEI encoders with more time to devote to the problem. We also struggled with tagging names, because in the entry we focused on, Thoreau refers to vague and ambiguous characters, including a person identified only as “C”. We had to make several decisions about what each element could apply to within the text. For example, did a “salt marsh” count as a place, or should only proper place names be included?


After we had tagged the elements individually, we combined our sections using a more advanced text editor, Oxygen, and added a header that Dr. Schacht had created. We downloaded Oxygen as late in the process as possible in order to make the most of its thirty-day free trial. Oxygen was especially helpful because it validated our document as we worked and gave hints about the source of any errors.



At one point, we found that Oxygen would not accept several of our xml:ids, and we didn’t know why. Dr. Schacht updated the template he had given us and explained how to declare the xml:ids in the header rather than in the body of the text, and how to refer to them from within the text itself.
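The kinds of checks involved here can be sketched in Python. This is an illustration only, not a diagnosis of the errors we hit: every xml:id should be unique and be a valid XML name (so it cannot, for instance, start with a digit or contain a space), and every local pointer such as `ref="#C"` in the body should match an xml:id declared in the header. The sample document is invented and heavily simplified (no TEI namespace, no real header structure), the `#id` pointer convention is an assumption about how the updated template linked the text to the header, and the name check is an ASCII-only approximation of the XML NCName rule.

```python
import re
import xml.etree.ElementTree as ET

# Fully qualified name of the xml:id attribute as ElementTree parses it.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
# Simplified ASCII approximation of the NCName rule that xml:id values must follow.
NCNAME = re.compile(r"^[A-Za-z_][A-Za-z0-9._-]*$")

# Invented, simplified document: IDs declared in the header, pointed to from the body.
SAMPLE = """<TEI>
  <teiHeader>
    <listPerson><person xml:id="C"/></listPerson>
    <listPlace><place xml:id="walden"/></listPlace>
  </teiHeader>
  <text>
    <body>
      <p>An invented sentence naming <persName ref="#C">C</persName>,
        <placeName ref="#walden">Walden</placeName>, and
        <placeName ref="#baker_farm">Baker Farm</placeName>.</p>
    </body>
  </text>
</TEI>"""


def check_ids(xml_text):
    """Return a list of problems with xml:id declarations and local #pointers."""
    root = ET.fromstring(xml_text)
    problems = []
    declared = set()
    # First pass: collect every xml:id, flagging malformed and duplicate values.
    for el in root.iter():
        value = el.get(XML_ID)
        if value is None:
            continue
        if not NCNAME.match(value):
            problems.append(f"invalid xml:id {value!r}")
        if value in declared:
            problems.append(f"duplicate xml:id {value!r}")
        declared.add(value)
    # Second pass: only local pointers (#id) are checked; external URLs are left alone.
    for el in root.iter():
        ref = el.get("ref", "")
        if ref.startswith("#") and ref[1:] not in declared:
            problems.append(f"ref {ref!r} has no matching xml:id")
    return problems


if __name__ == "__main__":
    for problem in check_ids(SAMPLE):
        print(problem)
```

Keeping every declaration in the header gives a single list to check pointers against, which is one practical advantage of the arrangement described above.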


Final Meeting

We met with Dr. Schacht one last time with some final questions and adjustments to our document. We discussed referring to IDs from the header within the text and tagging geographic features, as well as tagging drawings and uncertain text within the entry. We also adjusted the spacing and formatting of the document. By the end of the meeting, the XML document was complete and validated in Oxygen.



Throughout the process, we had several questions that Beth Witherell may have been able to answer, or possibly provide guidance on. These include:

  1. How important is preserving the format of the original journal entry manuscript? For example, we didn’t encode any of the line breaks, page breaks, or paragraphs that were in the manuscript. Additionally, the paragraphs were not included in the transcript we worked from, and the page breaks were not clearly defined.

  2. Is the quantity of elements tagged more important than the depth of information provided? For example, we tagged a very general list of elements, but provided further information through the use of reference links. Because of this, there were many tags suggested by Beth Witherell that we did not include.

  3. Are some elements more important than others, in that they would provide more insight into Thoreau and his work?

  4. Are there specific journal entries that would be more beneficial to Thoreau scholars to encode than others?

What We Learned

We learned a lot about encoding throughout this process, especially about its limitations. Several elements seemed simple to us and relevant to the text, but the TEI guidelines did not provide tags for them. Although we could have created customized elements, we had neither the skill nor the time to do so. We also learned that interpreting Thoreau’s manuscripts is a delicate and difficult process that involves a lot of decision-making and judgement on the part of the interpreter. Finally, we learned that a certain authority is given to the encoder, because she decides how elements should be organized, what is important, and what is not.

Group 4 Members: Daisy Anderson, Emily Buckley-Crist, Darby Daly, Melissa Rao

Does Our Generation Have Social Skills?

“You kids don’t know how to have real conversations anymore,” “You spend so much time playing on that cell phone of yours,” “You aren’t going to know how to express yourself in the real world after all that texting.” These are things that I’m sure almost all of us have heard at some point, whether from our parents, grandparents, teachers, or maybe even that annoying aunt; but regardless of who said it, there is a good chance they were a great deal older than us, having grown up in a generation that was not as heavily reliant on technology as ours is today. Now, it is very easy to brush off these comments, as most kids our age, and teens especially, clearly know everything. And of course, what would these older folks know? They didn’t have technology like this. They’re obviously just saying we’re wrong because our lifestyle differs so much from what theirs once was.

Here’s a crazy idea… maybe their criticism isn’t completely wrong this time; maybe we are in fact lacking the social skills they had when they were our age. Yeah, maybe you would rather sit on the couch at a family gathering and text your friends about how bored you are, or maybe it feels less nerve-wracking to text your crush than to ask them on a date. But if you really think about these things, they’re kind of sad, because you are potentially letting yourself miss out on so many little things, like the joy of laughing with your friends instead of typing “LOL” while keeping a straight face. Even though this may make me sound like a grandma, I would like to point out that there are plenty of studies that support this idea.

One study in particular, conducted at UCLA, included sixth graders who were separated from their devices during a week of summer camp. With no screens to turn to, the children were occupied with other social activities and had no choice but to interact with one another. Meanwhile, a control group whose devices were not taken away maintained their usual interactions with social media. In the end, the children who had no contact with their devices showed an improved ability to recognize emotions, while the control group essentially stayed the same. Yes, it’s true that these are only children, but these children are at a point in their lives where they are capable of understanding complex human reactions. Similar studies on adults have found similar results as well.

Maybe parents should monitor their children’s access to social media more closely to avoid these situations? Unfortunately, there are a lot of difficulties with this possible solution. Today’s society is so heavily reliant on technology that, even if a child’s technology use is limited at home, their education will rely heavily on it as well. Today’s children are learning to do essentially everything with technology. Yet this is not necessarily a bad thing either, because by the time these kids are our age, they’re going to seem like miniature Einsteins in our eyes, when we are no longer up to date with all the modern technology. This matters because, at the rate the world is going, we are probably going to have hover cars in the near future.

Technology is moving too fast to even attempt to avoid it. It’s literally everywhere. But knowing the possible strain it may put on our social skills is very important. As long as we are aware of such issues, I think we could easily make time to go without technology for a little while, maybe even do something crazy like have a small gathering with our friends instead of just texting in the group chat. Now that you’re done reading this, I think you should maybe turn off your computer, go outside, and take up a new hobby to try and make some new friends! (And try not to tweet about it.)

Can you own data?

The lightning talks we’ve been having in class have expanded what I know about technology (which honestly isn’t much), as well as the way I think about it. The talk on the Digital Millennium Copyright Act made me question whether or not one can own data.

There are many ways someone can obtain data. It can be purchased online (e.g., iTunes, Google Play, the App Store), or the same products can be obtained through torrenting. Purchasing data such as music from somewhere like iTunes is legal, but it also costs money. Naturally, everyone is always trying to find ways to save money and avoid spending it whenever they can. This is where torrenting becomes a popular habit. Why buy an album on iTunes when you can download it for free? Well, torrenting copyrighted material without permission is illegal, so even though the album may have been free, downloading it that way is similar to picking up the album in a record store and walking out without paying for it.

One of the most commonly used peer-to-peer file-sharing protocols is BitTorrent. To send or receive files, one must have a BitTorrent client, a computer program that implements the BitTorrent protocol. uTorrent, Vuze, and Xunlei are examples of commonly used clients. More than a quarter of a billion people are estimated to use BitTorrent every month.


A torrent file is a computer file that contains metadata about the files and folders to be distributed, and it usually also contains a list of the network locations of trackers. Trackers are computers that help participants in the system find each other and form efficient distribution groups called “swarms”. A torrent file does not contain the content to be distributed; it contains only information about those files, such as their names, sizes, folder structure, and cryptographic hash values for verifying file integrity.
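To make the difference between metadata and content concrete, here is a minimal Python sketch of how a single-file torrent’s metadata could be assembled, loosely following the original BitTorrent metainfo format (BEP 3): the file is split into fixed-size pieces, each piece is hashed with SHA-1, and the name, size, piece hashes, and tracker address are packed into a “bencoded” dictionary. The tracker URL and file contents are placeholders, and the sketch leaves out multi-file torrents, newer protocol versions, and everything a real client does afterward.

```python
import hashlib


def bencode(value):
    """Encode ints, bytes/str, lists, and dicts in BitTorrent's bencoding."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)  # length-prefixed byte string
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings and must appear in sorted order.
        items = []
        for key, item in value.items():
            raw_key = key.encode("utf-8") if isinstance(key, str) else key
            items.append((raw_key, item))
        items.sort(key=lambda pair: pair[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def make_metainfo(name, data, piece_length, announce):
    """Build a single-file torrent: metadata only, never the file's content."""
    # One 20-byte SHA-1 digest per piece, concatenated, lets clients verify downloads.
    pieces = b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )
    return bencode({
        "announce": announce,  # tracker that helps peers find each other
        "info": {
            "name": name,  # suggested file name
            "length": len(data),  # file size in bytes
            "piece length": piece_length,
            "pieces": pieces,
        },
    })


if __name__ == "__main__":
    content = b"hello torrent world\n" * 100  # stand-in file content
    torrent = make_metainfo("example.txt", content, 1024,
                            "http://tracker.example.org/announce")
    print(len(content), "bytes of content ->", len(torrent), "bytes of metadata")
```

The metadata grows by only 20 bytes per piece, which is why a torrent file for a large movie can still be small: the hashes let a client check each downloaded piece without the torrent file carrying the movie itself.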

You can torrent pretty much anything you want, such as music, programs, apps, and movies. Doing so can put your computer at risk, because there are many ways your IP address can be obtained, which can allow hackers to get into your computer or get you caught for illegal activity.

The lightning talk topic “What is the Digital Millennium Copyright Act?” is very relevant to torrenting. The DMCA is a U.S. copyright law that implements two 1996 treaties of the World Intellectual Property Organization (WIPO). It criminalizes the production and dissemination of technology, devices, or services intended to circumvent measures that control access to copyrighted works (digital rights management). It also criminalizes the act of circumventing an access control, whether or not there is actual infringement of copyright itself. The DMCA’s main innovation in the field of copyright is that it exempts Internet service providers and other intermediaries from direct and indirect liability. So even though you may see downloading one song or one movie as innocent, you may in fact be breaking the law.

There are many different opinions on torrenting. One popular topic of discussion is torrenting music. Some people are completely against it, arguing that it takes money away from artists and record labels, while others argue that it is essentially the same as listening to music on the radio or giving a friend a mixtape. Nirvana drummer and Foo Fighters frontman Dave Grohl feels very strongly about this subject: “I think it’s a good idea because it’s people trading music. It has nothing to do with industry or finance, it’s just people that want music and there’s nothing wrong with that. It’s the same as someone turning on the fucking radio, it’s the same as someone putting a cassette in a cassette deck when the BBC plays a special radio session. I don’t think it’s a crime, it’s been going on for years. It’s the same as people making tapes for each other. The industry is more threatened by it because it’s the worldwide web and it’s a broader scope of trading, but I don’t think it’s such a f***ing horrible thing. The first thing we should do is get all the f***ing millionaires to shut their mouths, stop bitching about the 25 cents a time they’re losing.” Now, I personally agree with this opinion that downloading music online may not be the worst thing in the world; however, some people, like Lars Ulrich of Metallica, have a very different view of piracy: “It is sickening to know that our art is being traded like a commodity rather than the art that it is.” I think a lot of these opinions vary based on the person and the type of piracy being committed. Torrenting a $500 program is arguably much worse than torrenting a 99-cent song; however, both are still technically against the law.

So overall, I think it is possible to own data, because, much like buying or stealing a product from a store, you can buy or steal a form of data. It’s true that you may not own a physical copy, but you still own it on your device. Even if it was obtained through theft, I still see it as a form of ownership.