Category Archives: Tools

An Attempt at Annotating an Oral History Recording

For my Digital Public History class I was assigned to annotate an oral history interview with OHMS. I didn't have one of my own, so I found one on YouTube. At first I wanted to annotate a baptism story, but those typically weren't long enough and functioned as witnesses to Christ's saving power rather than witnesses to historic traditions of baptism. Since I knew I'd have to listen to the audio over and over again, I wanted to find something I'd at least enjoy. So I thought I'd find an oral history about nuns. In my experience nuns have gotten a bad rap as mean schoolteachers, when the ones I've known have been some of the nicest, loveliest people I've met. I stumbled upon "Sister Stories", an oral history project across Catholic campuses to collect the vocation stories of nuns.

Having known a few nuns and being familiar with Catholic terminology, I found this interview fairly easy to annotate. Although Sister Nolan never says the phrase "Vatican II", many of the reforms she talks about are the result of that council. I'll let you be the judge of how well I did: https://ohms.uky.edu/preview/?id=35032

Contextualizing Baptism Personas

To help me create a scholarly and relevant product that will be useful to people who might want to use the material, I’ve created two personas. One is a retiree using her free time to explore her interests. The second is a faith formation teacher looking for engaging class material.

Name: Maggie Jones

Demographic: 65, Female, White, Catholic (28% of Tolland County- citydata.com), Retired, Parishioner, Married
Descriptive Title: The Life-Long Learner
End Goals: Maggie wants to learn more about Christian history and stay connected with her community.
Quote: “Now that I’m retired, I can learn all the things I’ve always wanted to!”
A Day in a Life Narrative:

Maggie's an early riser, waking up with the sun. After she's eaten breakfast she logs onto her computer to check her e-mail, skimming through her inbox and the subject headings of several listservs she's subscribed to (27% of Religious Web Users subscribe to a listserv, and people actively seeking out religious information are more likely to subscribe - Rainie, CyberFaith, Pew, 2001). She then checks Facebook for updates from her friends and family (40-49% of Boomers use social networks - Zickuhr, "Generations 2010", Pew). On Facebook she follows several organizations, including her home parish and a few other religious pages, and likes to read through their daily posts on her wall.

She occasionally volunteers with her church (retirees volunteer an average of 30 minutes a day - Brandon, "How Retirees Spend Their Time," U.S. News and World Report.com, 8 July 2013), like decorating for Christmas. Most of her involvement is going to Sunday services. She considers religion very important in her life (43% of Boomers consider themselves strong members of their faith communities - Cohn and Taylor, "Baby Boomers Approach 65-Glumly," Pew, 2010).

Now that she's retired, she's taken up learning about the things she's always wanted to but never had the time for. She's always wanted to visit the Holy Land and has been reading up on Church History. One of her Facebook groups posted a link to "Contextualizing Baptism" and she started exploring the website. She enjoys exploring the ancient house church and imagines herself visiting one. She feels connected to other Christians in her community by reading their Baptism experiences (Religious Web Users use the internet to connect with their community - Rainie, CyberFaith, Pew, 2001).

Name: Lauren Kellogg
Demographic: 23, White, Protestant, Young Professional, Single, Volunteer Faith Formation Teacher
Descriptive Title: The Millennial Bible Study Leader
Quote: “I want to engage my Bible Study.”
A Day in a Life Narrative:

Lauren just graduated college. She's moved back home (32.1% of 18-34 year-olds live with their parents - Fry, "For the First Time in the Modern Era, Living With Parents Edges Out Other Living Arrangements for 18-34 Year-Olds", Pew, 24 May, 2014) and just started her first professional job. Having just left her vibrant campus Christian community, she's looking for community in her small town. She wakes up just in time to dress and eat before leaving for work. In her spare time she uses Facebook (where she often sees her friends share their faith and occasionally posts about it herself [46% of social media users see others share their faith and 20% share their own - Cooperman, "Religion and Electronic Media," Pew, 6 Nov, 2014]) and Instagram, and follows a few blogs (80-89% of Millennials use social media - Zickuhr, "Generations 2010", Pew, 2010). When she comes home she eats with her family and preps for the small Bible study she started running at her local church.

Lauren often surfs the web looking for engaging material for her Bible study group (62% of spiritual leaders use the internet to find educational material - Larson, "Wired Churches, Wired Temples," Pew, 20 Dec, 2000). The parish provides some materials (44% of churches post youth material online - Larson, "Wired Churches, Wired Temples," Pew, 20 Dec, 2000), but Lauren wants to bring more interactive, informative elements to her meetings.

Lauren mostly uses her congregation's page on "Contextualizing Baptism" to discuss Baptism with her study group. They also love the interactive house church, and the ruins capture their imagination.

End Goals: Lauren needs a website that young people with a variety of educational backgrounds can engage with. The writing needs to be accessible and the navigation easy.

A Social Media Strategy for My Final Project

Thanksgiving means family time, big meals, and final class projects. I've been asked to provide a social media strategy for the final project of my Introduction to Digital Humanities class (part of George Mason's Graduate Certificate in Digital Public Humanities). Before I get into how I'll utilize social media, I'll first describe my project.

The Life of Farmland: A Digital Mapping Project*

In 2002 my parents purchased a house in (semi-)rural Connecticut located next to a historic farmhouse. Since we moved into that house I've wanted to excavate my backyard. Even as a middle schooler I knew that for such a project to be fruitful, I'd have to do some research to see where in my backyard would be a good place to dig. This final class project will ultimately determine whether there is a space on my parents' property that is a good candidate for Phase One (shovel testing) excavation.

To accomplish this task, I've combed through the town land records and constructed a timeline of ownership back to the 1800s. I want to demonstrate how the boundaries of the property changed with ownership. To this end, I've first georeferenced historic maps against modern ones and labeled my parents' property on each. I've also constructed polygons in ArcGIS using both property descriptions and maps of the land during different occupancies.

The end product will be a webpage with the georeferenced maps, polygons and the timeline.
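
Since the end product is a webpage, the polygons and their attributes can be exported as GeoJSON, which most web mapping libraries read directly. Below is a minimal sketch of what one exported parcel might look like, written in plain Python; the coordinates, owner name, and deed year are placeholders rather than values from the actual land records.

```python
# A minimal sketch of exporting one parcel polygon for the web map.
# The coordinates below are placeholders, not the actual property corners,
# and the deed/occupancy attributes are illustrative only.
import json

parcel = {
    "type": "Feature",
    "properties": {
        "owner": "Example Family",      # hypothetical occupant
        "deed_year": 1887,              # hypothetical deed date
        "source": "town land records",  # where the boundary description came from
    },
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [-72.30, 41.80],  # placeholder corner points (lon, lat)
            [-72.29, 41.80],
            [-72.29, 41.81],
            [-72.30, 41.81],
            [-72.30, 41.80],  # polygon rings close on the starting point
        ]],
    },
}

# Write a GeoJSON FeatureCollection that a web map library can load directly.
with open("parcel_1887.geojson", "w") as f:
    json.dump({"type": "FeatureCollection", "features": [parcel]}, f, indent=2)
```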

Social Media Strategy

There are three audiences I’m interested in engaging: Local historians, the Jewish community and archaeologists.

Local Historians: I would love for local historians to see my work, both because I think they'd find it interesting and because they will likely have knowledge to contribute. My town does have a Historical Society, which mostly consists of retirees. The Society does not have any presence on social media, just a website. I think the best way to reach this audience, given their age, is to publish my findings as a blog, since this demographic (Baby Boomers) is likely to read blogs. I will advertise my blog on Facebook, which in the last ten years has seen an increase in older users. I can "friend" the members of the board that I can find on Facebook as well as post my blog to the town Facebook page.

I anticipate that this audience will be the most difficult to engage. Facebook's greatest strength is the personability possible through the platform. My hope is that microblogging on Facebook will feel more like a personal invitation. I will make these posts personable by asking the audience for any assistance they might offer the project. My primary message is that I've done exciting new work using some of their resources and that I not only want but need their help to complete the project successfully.

I can start adding members of the historical society, people who like their page, as well as the library and its followers, as soon as possible. As I work through the project, I can microblog status updates to build interest. I think it’s also important to respond to any comment, even if it’s just a “like.” That will encourage people to visit the blog itself when it goes live. My project will continue after the element for this class is complete, so I’ll be able to continue the blog and microblogging after the semester ends.

The Local Jewish Community: During my research, I learned that the farmland was owned by a prominent Jewish family in my town, the Goldsteins. Ike Goldstein was a patron and a founding member of the United Brethren Synagogue when it was built. The Goldsteins were able to purchase the land by obtaining a loan from the Jewish Agricultural and Industrial Society, based in New York City. The Society's goal was to empower landless Jews in cities by teaching them to farm, and eventually for them to own and operate their own farms. The Goldsteins are a success story. It also appears that the Society owned property very near the Goldsteins' where Jews could stay for a time to learn how to farm.

My social media strategy for this audience will be very similar to that for the local historians. I'll post microblog entries discussing the Goldsteins and the JAIS to Hebron's page as well as to the unofficial United Brethren Synagogue page. My message will be much the same as that to the local historians: I have information that may interest them, and I welcome any new information.

Archaeologists: I would really like other archaeologists to see my blog so I can receive professional feedback. I'm already friends with many archaeologists on Facebook, so I can microblog about the archaeological aspects of my project on my Facebook rather easily. I can also post to archaeology interest pages. A Twitter account would also be good for this audience since it is heavily made up of Millennials. My general message would be an invitation to see my work and that I would value the opinions of my professional peers.

Measuring Success

The purpose of this social media strategy is to receive insight from my fellow Hebronites and professional feedback. Thus, I will measure success, first, by comments left on my blog. I may add a comments section to the blog to encourage this kind of communication. I would like to see comments by the end of the first week, since it will likely take people a while to fit looking at my project into their lives. I would be happy to see one or two individuals from each category respond.

The second measurement will be comments left through Facebook and Twitter. I don't put as much stock in this kind of validation, because a "like" in no way indicates that the person visited the blog. I'll put greater weight on comments left on Facebook, especially those that either discuss the project directly or express interest in actually visiting the blog.

* Working Title

Voyant, CartoDB and Palladio: A Comparison

Using the same data for all three projects was an enlightening experience. Intellectually I had accepted that different methodologies would yield varying results and elicit different questions. But actually doing the work has given me a deeper understanding of the magnitude of this concept.

As an archaeologist, I’m always very aware of place. I often ask “where” questions and think of  human activity taking place on the crust of the Earth at specific points. Therefore, CartoDB was fairly intuitive to me. Before beginning to play with the software I already had some idea of what producing a map could tell me. What did surprise me was the depth of information I was able to get about place with the other two tools.

Although I often work with texts both to point me in the direction of sites and to add nuance to the archaeological record, I've always struggled to conceptualize text within space. Voyant in conjunction with Carto helped me visualize this relationship. While Voyant gave me visualizations about words that occurred within the seventeen states, Carto helped me make spatial connections and situate these data in a place rather than just the word "Alabama."

Likewise, Palladio helped me make further connections about the observations I had made in Voyant and Carto. Voyant acted more as a comparative tool: I could see how word frequencies changed across the corpus. Palladio was a comparative tool as well, but its graphs visualized magnitude and categories, whereas Voyant was useful in discovering that these categories existed but less effective at presenting observations in relation to other data.

The observation I made after looking at the data in all three tools was that there was a significant movement of people after emancipation. Voyant provided the words used to describe this movement like place names and occupations following freedom. CartoDB conveyed how far afield people traveled after emancipation. Finally, Palladio showed me the movement of individuals. That dynamic action across time and space is not something that one application was able to fully convey.

That being said, Voyant, CartoDB and Palladio each have their specific strengths. Palladio might have a mapping feature, but if your project has a heavy map component, use Carto. Voyant can be used to topic model, but use Palladio to visualize how the topics relate to people. Carto can insinuate relationships, but rely upon Palladio to actually connect the dots.

After looking at the three tools side by side I can see real potential for projects that integrate more than one. That being said, I find that academics can get lost in the sea of knowledge. Sometimes we spend so much time trying to know everything we can about a topic that we lose sight of our project. A successful project needs to recognize when a tool will be useful and when it will detract from the goal. These three programs are very powerful discovery and publication tools. I find it very challenging to balance discovery with putting knowledge out there. At some point I have to at least pause discovery, draw conclusions, and share what I've learned. And sometimes I find it incredibly fruitful to return to the discovery process. Palladio and CartoDB allow for that fluidity, whereas Voyant is much harder to return to.

Using Palladio: A Reflection and User Guide

Palladio is a free web-based network tool. It allows users to upload their own data to create both maps and network graphs. Users don't need an account and can download projects to their computers or save the URL to return to them later.

When embarking upon a networking project, I think it's important to be conscientious about what kinds of relationships networking can best represent. The idea is to see relationships between data that are otherwise hard to conceptualize. My trials with Palladio include some good examples where networks effectively convey information, and some good negative examples as well. As I describe how to use the tools, I'll include some commentary about what I learned in the process.

Getting Started

[Screenshot: Palladio home page]
After entering http://palladio.designhumanities.org/#/ into your address bar all you have to do is press “start.”
[Screenshot: Palladio "Load Data" screen]
From there you'll have to load some data. As far as I can tell, you can't create data in Palladio. My data were stored in .csv documents and prepared for me by my instructor. Hopefully you've organized your data before you've come to this point. I loaded the primary dataset I would be drawing from - in other words, the spreadsheet all other sheets would relate to. From there, press "Load."
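
Because Palladio links tables on shared values, it's worth checking that the sheets actually match up before uploading. Here is a rough sketch of that check with pandas; the file names and column names are assumptions standing in for the actual class spreadsheets.

```python
# A rough sketch (not the actual class data) of checking that the key columns
# in the primary spreadsheet and the locations spreadsheet line up before
# uploading them to Palladio. Column names here are illustrative.
import pandas as pd

interviews = pd.read_csv("interviews.csv")   # hypothetical primary dataset
locations = pd.read_csv("locations.csv")     # hypothetical related table

# Palladio links tables on shared values, so any place name that appears in
# the primary sheet but not in the locations sheet will silently fail to join.
missing = set(interviews["where_interviewed"]) - set(locations["place"])
if missing:
    print("Places with no match in locations.csv:", sorted(missing))
else:
    print("All interview locations have a matching row; safe to upload.")
```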

Your Data

[Screenshot: Palladio data tab]
Your primary dataset will appear in the "Data" tab. Here you'll connect other data tables to make relationships. In this step you already start making decisions about what kinds of relationships you want to show by choosing which data to add extensions to. In my project we were concerned with showing relationships between people and places, so those are the data we linked.
[Screenshot: Palladio "Add a new table" editing window]
By clicking on a data field (in my case "where interviewed"), the editing window opens. Here I selected "Add a new table" to start creating relationships and uploaded my "locations" spreadsheet. Then I selected the "subject interviewed" data field from my primary dataset and added the locations of their enslavement. Finally, in the enslavement table I had just uploaded, I clicked on the drop-down "Extension" menu and selected "Locations" to link the two datasets. At the end I had this set of data:
[Screenshot: the completed set of linked data]
With all my data uploaded and connected, I was ready to start exploring Palladio’s visualizations.

Mapping with Palladio

[Screenshot: Palladio "Add new layer" window]
It is possible to map with Palladio, but that doesn't mean you should. Creating the map is fairly easy. Select "Add new layer" and the editing window shown above appears. Just select what kind of points you want and the two datasets you'd like displayed (the source for me was the interview location, the target where_enslaved). You might have to "ctrl -" to zoom out enough to see the "add layer" button.
[Screenshot: Palladio map view]
For some projects the mapping feature is probably sufficient, but for mine it was far from it. Although this map does show the movement of individuals from their places of enslavement to the locations of their interviews, the directionality is not clear. Palladio does allow for layers, but the information available for display is not nearly as rich or customizable as in CartoDB and other GIS-specific applications. This map does convey that there was a significant movement of people after slavery ended, but other questions can be better asked and explored in GIS applications.

Networking Tools

[Screenshot: Palladio graph settings]
The networking tool is pretty intuitive. All you have to do is select the data you want to relate (a source and a target). The "facet" tab allows users to focus in on certain aspects of their data (for example, in the above visualization I could limit the interviewees to those over the age of 80). In the above example I selected the interviewer as the source data and the interviewee as the target. The resulting visualization shows which interviewers interviewed which ex-slaves. I would say this is a good visualization because the viewer can see the intended information with ease. So, instead of looking at a spreadsheet organized by interviewer, I see all of their interviewees on one page. This visualization doesn't elicit many novel questions, but it does provide solid information.
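
If you want to double-check what a graph like this is showing, the same bimodal source/target network can be rebuilt outside Palladio. Below is a hedged sketch using pandas and networkx; the CSV and column names are assumptions, not the actual class files.

```python
# A sketch of reproducing the same source/target network outside Palladio
# with networkx, useful for double-checking counts. The CSV and column
# names are assumptions, not the actual class spreadsheets.
import pandas as pd
import networkx as nx

interviews = pd.read_csv("interviews.csv")  # hypothetical dataset

# Build a bimodal (bipartite) graph: one node set of interviewers,
# one of interviewees, with an edge per interview.
G = nx.from_pandas_edgelist(
    interviews, source="interviewer", target="interviewee"
)

# Who conducted the most interviews?
interviewers = set(interviews["interviewer"])
counts = {name: G.degree(name) for name in interviewers}
for name, n in sorted(counts.items(), key=lambda kv: -kv[1])[:5]:
    print(f"{name}: {n} interviews")
```
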
[Screenshot: interviewers networked to male/female informants]
Within the source data, you can select other facets to relate. In the above example I visualized which interviewers met with male and/or female informants. This helps me ask questions about gender bias in the interviews, or how gender norms in 1936 influenced the interviews.
[Screenshot: work type networked to topics discussed]
Networking the relationships regarding topics had mixed results. The graph above, showing the relationship between the type of work the ex-slaves were engaged in during their enslavement and the topics they discussed, is a rather successful graph. I would have expected there to be more variation in the topics, but upon seeing this graph I was reminded that the interviewers used a script, asking about particular topics rather than really engaging in natural conversations. The few outliers probably represent occasional spontaneous departures from the script.
[Screenshot: interviewees networked to topics discussed]
The above graph is a fairly clear example of an instance where networking doesn't work well. It shows the relationship between the interviewees and the topics they discussed. The result is too highly clustered, and the nodes are too large to make anything out. This unfortunate grouping is due to the aforementioned script: almost every person discussed the same things since they answered the questions they were being asked. This consistency also reflects what Mark Twain refers to as "corn-pone," enslaved or previously enslaved people telling white people what they wanted to hear. It could be said that there was already a script, even before the WPA produced one.

A Brief Reflection

Network visualization can be a very powerful tool when the investigator is conscientious of what information the graphs can provide. Palladio only computes bimodal networks, which is usually the best option. Looking at these networks allowed me to ask questions about why these relationships looked the way they did. So, why did female interviewers tend to meet with female informants? Why were ex-house slaves the only people to discuss "mammy"?

I was also able to draw some preliminary conclusions. The WPA script was, arguably, effective. Conversations stayed relatively on script and recorded consistent types of information. These observations would lead me to look at the scripts themselves to see if I'm correct. Which leads to another take-away: no tool replaces close reading.

Using CartoDB: A Reflection and Guide

CartoDB is a free online application that allows users to make GIS maps. The interface is user friendly and fairly straightforward for even the novice to navigate. I wouldn't consider it a replacement for more powerful programs like ArcGIS, but it is certainly a better tool for projects looking to make clean, professional spatial visualizations. Carto does include tools that make deeper analysis possible, but not to the extent that something like Arc does.

I tinkered with CartoDB using data derived from the WPA Slave Narratives, which I explained more fully in my last post about Voyant. Many interviews had GPS coordinates: where the interview occurred and where the interviewee had been enslaved. Those that did not have exact points were set in the middle of their city/region.  This exercise was intended to visualize the spatial elements of these interviews. As I discuss how to use some of the available features,  I’ll also reflect on the utility of the tools in this endeavor.

Getting Started

[Screenshot: CartoDB home page]
The first thing you need to do is create an account: https://carto.com/signup. After that you need some data, which needs to be organized into a spreadsheet (I used data that was prepared in Excel, but other options are available). Hopefully, you've prepared your data before even getting as far as signing up or logging in (there is functionality to draw your own polygons, lines, and points, but I did not explore those features). To add new data, select "datasets" next to your username (where it says "map" in the above screenshot).
[Screenshot: CartoDB "add new dataset" options]
Then, all you do is upload, or create, your data from the options shown above. I really like that users can create their data in the software they're comfortable with, which makes sharing data easier. I did not create the data I used for my project; my professor had certain goals in mind about what he wanted our maps to show. The way tables are organized influences the kinds of information map visualizations will display, so this functionality, though seemingly simple, is actually pretty powerful.

Your Data

[Screenshot: CartoDB data table view]
Now that you've got your data uploaded, you're ready to start mapping. But first, you might want to go through and make sure Carto understands your data. The above screenshot is what I saw upon uploading my data. I had to make sure all the numbers were recognized as such, and especially that the date was an actual date. I find this to be a helpful exercise in ensuring you understand your data, especially if you didn't create it. Before I even saw the map, I began to wonder how these categories would relate to each other on the map. Chances are that if you've made your own data, you already have an idea.
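
The same sanity check can be done locally before uploading, which is handy if the spreadsheet is large. Here's a small sketch with pandas; the file and column names are placeholders, not the actual WPA dataset.

```python
# A small sketch of checking the data locally before uploading to Carto:
# make sure numeric columns are numbers and the date column parses.
# File and column names are placeholders.
import pandas as pd

df = pd.read_csv("wpa_interviews.csv")

# Coerce the interview date to a real datetime; rows that fail become NaT.
df["interview_date"] = pd.to_datetime(df["interview_date"], errors="coerce")

# Coerce coordinates to numbers so Carto can georeference the rows.
for col in ("latitude", "longitude"):
    df[col] = pd.to_numeric(df[col], errors="coerce")

# Report anything that didn't convert, so it can be fixed before upload.
print(df[df["interview_date"].isna() | df["latitude"].isna()].head())
df.to_csv("wpa_interviews_clean.csv", index=False)
```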

Making Your Map(s)

[Screenshot: CartoDB map view]
Now, all you have to do is click the “Map View” tab.  Carto automatically plots your points and zooms to the extent of those points. Like other GIS software you can work in layers if you so choose.
[Screenshot: CartoDB infowindow tool]
When you click on the sidebar where you see the "+" and "1", the tools open. I started by clicking the speech bubble (hover over it and you'll see it called "infowindow") to select which columns of my data I wanted displayed when I clicked on a point, to help me identify one point from another, like the name of the interviewee. That was helpful, not just because my instructor told us to do it first, but because I was able to start visualizing what I might want my map to look like and the kind of information I wanted to show.

The wizard tab is the tool you'll spend the most time with. This is where you select what kind of map you want. The default is "simple," which just plots your points. Here you've really got to think about what kind of information you want to convey. I was working with two sets of data: where the interviews occurred and where the interviewees were enslaved. Although I could have mapped both of these using two layers, it wouldn't be very helpful as just points on a map, so I chose (and was instructed) to view the data in two separate maps.
[Screenshot: CartoDB category wizard]
As I played with the wizard I found that some map types elicited interesting questions while others just muddied the data. The map of where interviews occurred tells me more about the interviewers than the interviewees. The "category" wizard gave a unique color to each interviewer, and patterns emerged regarding how much or how little they traveled. The "torque" map can be equated with the timeline feature in ArcGIS: I had dates for when the interviews occurred and was able to play an animation of them over time. This allows temporal questions to be asked: When was the zenith of the project? How were the interviews carried out - in a logical progression across the state or seemingly randomly? The cluster map was also useful for analyzing the data for this map. Where were the most interviews? Kernel density maps were not particularly helpful in illuminating these data, however; the interviews were already pretty tightly clustered in Alabama.

The other set of data, where the interviewees had been enslaved, did lend itself to kernel density maps since the points were spread out over a larger area. The simple map was still helpful for getting an idea of where these people had been enslaved, but given the limitation that the exact X,Y coordinates were more often than not uncertain, the information presented as particular points might be misleading. Something like a heat or density map gives a more honest visualization of the available data.
[Screenshot: two-layer map with heat map and points]
In the end, I did combine the two maps to see how the locations of the interviews related on the face of the globe to the places of enslavement. I used a heat map for the places of enslavement and simple points for the interviews. The resulting two-layer map revealed that most of the interviewees were enslaved in the metropolitan areas where they were interviewed. The major pitfall is that the map is rather hard to read, since Carto considers heat maps to be animations, which therefore must be the top layer. I would have preferred for the points to be the top layer so they could be more visible. I played with the transparency of the heat map until I felt I had struck a balance between the visibility of the simple map and the color saturation of the heat map. You can add text to your map, but I had great difficulty producing a legend, the key component of conveying information.

A Brief Reflection

Mapping allowed me to ask “where” questions, which comes as no surprise. This exercise also elicited questions that require returning to the text for answers. Why did the people who were enslaved elsewhere move to Alabama? Are those ex-slaves who were enslaved in Alabama the same who were interviewed? Why did certain interviewers conduct their work where they did? Plotting points on a map is helpful because it reminds the researcher that these interviews were conducted in a place and place influences thought. Just how place and thought interact is the job of the researcher to investigate, but these questions are best broached with a map visualization.

Using Voyant: A Reflection and Guide

Voyant (voyant-tools.org) is a web-based tool set for text mining and analysis. I utilized the service to glean information about the nature of the WPA Slave Narratives. These narratives are the result of interviewers from the Works Progress Administration seeking out ex-slaves from 1936 to 1938.

Voyant Tools: Getting Started

[Screenshot: Voyant start page]
Upon arriving at the home page, I was given the option to either upload text files, enter URLs, or enter text directly. I entered 17 URLs, one for each state that participated in the WPA project, and clicked "Reveal."

Here I ran into my first hiccup. Sometimes the quantity of data I was loading into the web tool seemed to be too much, and either Voyant would go on "fetching corpus" forever, or it would give up with an "error" and no explanation. Luckily, there's an easy fix. Just visit http://docs.voyant-tools.org/resources/run-your-own/voyant-server/ and download the Voyant Server. Nearly all my problems were solved after I downloaded the server, so I do suggest it.
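
Before pasting a long list of URLs into either the web tool or the local server, it can help to confirm that each volume actually loads and to get a sense of its size. Here is a quick, optional sketch; the URLs are placeholders for the seventeen state volumes, not the real links.

```python
# A quick, optional check that each volume URL loads (and how big it is)
# before pasting the list into Voyant. The URLs here are placeholders for
# the seventeen state volumes, not the real links.
import urllib.request

urls = [
    "https://example.org/wpa/alabama.txt",   # hypothetical volume URLs
    "https://example.org/wpa/arkansas.txt",
    # ... one per participating state
]

for url in urls:
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            text = resp.read()
        print(f"{url}: {len(text):,} bytes")
    except Exception as exc:
        print(f"{url}: failed ({exc})")
```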

Voyant Tools: the Tour

[Screenshot: Voyant home view with the five tools]
Once my corpus was "fetched" I instantly saw visualizations for my text. I already saw that I'd have to do some adjusting. Just looking at the word cloud, I saw that Voyant was including information that I knew wasn't useful. In my corpus, there was no standard transliteration for dialect, so the most common words, "dey" and "dat," were not significant since dialectical variations were not reliably recorded. There was an easy solution for this, but first I'll go through each of the five tools: "Cirrus," "Reader," "Trends," "Summary," and "Contexts."

1 Cirrus: This tool provides the old standard word cloud, with the largest words representing the most common words. The word count appears when you hover over a word. By sliding the "terms" bar you can adjust how many words appear in the cloud. The "scale" drop-down menu allows users to look at clouds representative of the entire corpus or just a particular document. When you click on a word, the "Trends" section displays the graph for that word. I found this tool helpful for getting a big-picture idea of the interviews.

2 Reader: The reader allows for contextualization and some degree of close reading. The text from your documents is displayed. When I first came to the tools page, the first lines of the first text in my corpus were displayed. The colorful boxes along the bottom of the window represent the different documents in your corpus, and the width of each box represents how much of the total corpus it makes up. The line going through the boxes is a representation of the trend of the word you are looking at. When you click in the boxes, the reader displays the text at that spot. If you select a word from the "Contexts" tool, it will show that instance of the word (more on that in the "Contexts" discussion).

3 Trends: This window displays a line graph of the frequency of the term you're exploring. Much like "Cirrus," users may adjust the scale of the graph from the whole corpus to a specific document. I found this tool useful in gauging how word use changed across states, and it allowed me to ask those rich "why?" questions.

4 Summary: This box provides the metadata of the corpus. The first line provides document count, word count, and number of unique word forms. It also conveys how long ago the session was started. The tool then further breaks down information about each document, first with document length (longest, then shortest), vocabulary density (highest, then lowest), most frequent words, and distinctive words (by document). The "document" tab displays much of the same information as the main tab about each document. If you're exploring one word, the "phrases" tab will display phrases in which the word under investigation is found. I found the summary useful in, first, getting a sense of the magnitude of the text I was working with. Having never seen the volumes or even read the text, I was able to understand just how much text was being processed. Secondly, the summary conveyed the wide variety of language used across the text.

5 Context: This tool essentially does what it claims. Once you've selected a word in either the Reader or Trends, Context displays the documents the word occurs in as well as the text to its right and left. If you click on the term in one of the lines, that line will appear in the reader with the surrounding text. I found this helpful for, well, putting floating terms in context by doing a little close reading.

Voyant Tools: Stoplist

[Screenshot: Voyant settings options]

My corpus had a long list of words that weren't helpful, and almost any text analysis project likely will too. Luckily, it's very easy to adjust the stoplist in Voyant. In any of the tools, when you hover over the question mark (but do not click), more options appear (pictured above). Click on the slider icon to call up this text box:
[Screenshot: Voyant options text box]
There are several adjustments that can be made, but for the stoplist, just click the "edit list" button next to the "stopwords" drop-down menu. Another text box will appear in which you can add your own terms and edit the auto-detected list if you choose.
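
Under the hood, a stoplist just means dropping the unhelpful tokens before counting. The toy sketch below reproduces the idea outside Voyant; the stopwords, file name, and sample corpus are illustrative, not Voyant's actual default list or the real WPA text.

```python
# A toy sketch of what the stoplist is doing under the hood: drop the
# unhelpful tokens before counting frequencies. The stopwords and the file
# are illustrative, not the actual WPA corpus or Voyant's default list.
from collections import Counter
import re

custom_stoplist = {"dey", "dat", "de", "wuz"}   # dialect spellings I excluded
standard_stoplist = {"the", "and", "a", "to", "of", "i", "was"}
stoplist = custom_stoplist | standard_stoplist

with open("alabama.txt", encoding="utf-8") as f:   # hypothetical local copy
    tokens = re.findall(r"[a-z']+", f.read().lower())

counts = Counter(t for t in tokens if t not in stoplist)
print(counts.most_common(20))   # roughly what Cirrus visualizes as a cloud
```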

The little arrow coming out of the box icon allows you to export your visualization in several formats. Below, I chose one that could be embedded in a web page:

[Embedded interactive word cloud]

As you can see, this is a fully interactive word cloud. Each of the tools allows for this utility. This word cloud is also the result of adding words to the stoplist, and it is much more representative of the corpus than the previous one, which you can see in the screenshot of the home page.

A Brief Reflection

Having used Voyant Tools, I have a much better appreciation for the analytic power of text mining. I was able to see patterns and outliers much more readily than with a close reading. I was also able to ask novel questions that I doubt I would have been able to ask had I read each interview one at a time. As for using Voyant as that text mining tool, I have mixed feelings. The fact that the service is completely free is a huge boon, but there's the old saying: you get what you pay for. With projects looking at several million words, Voyant might be too slow. Although the export tool allows users to share their visualizations, you can't save your work, so every time you close the program, you have to re-enter the text. Which, again, for larger projects would be a major hindrance.