Voyant (voyant-tools.org) is a web-based toolset for text mining and analysis. I used the service to learn about the nature of the WPA Slave Narratives. These narratives are the result of interviewers from the Works Progress Administration seeking out ex-slaves from 1936 to 1938.
Voyant Tools: Getting Started
Upon arriving at the home page, I was given the option to upload text files, enter URLs, or enter text directly. I entered 17 URLs, one for each state that participated in the WPA project, and clicked “Reveal.”
Here I ran into my first hiccup. Sometimes the amount of data I was loading seemed to be too much for the web tool: Voyant would either keep “fetching corpus” forever or give up with an “error” and no explanation. Luckily, there’s an easy fix. Just visit http://docs.voyant-tools.org/resources/run-your-own/voyant-server/ and download the Voyant Server, which lets you run Voyant on your own computer. Downloading the server solved nearly all my problems, so I recommend it.
Voyant Tools: the Tour
Once my corpus was “fetched,” visualizations of my text appeared instantly, and I could already tell I would have to make some adjustments. The word cloud alone showed that Voyant was including terms I knew weren’t useful. My corpus had no standard transliteration for dialect, so the most common words, “dey” and “dat,” were not significant, since dialectal variations were not recorded consistently. There was an easy solution for this, but first I’ll go through each of the five tools: “Cirrus,” “Reader,” “Trends,” “Summary,” and “Contexts.”
1 Cirrus: This tool provides the standard word cloud, in which the largest words are the most common ones. A word’s count appears when you hover over it. Sliding the “terms” bar adjusts how many words appear in the cloud, and the “scale” drop-down menu lets you view a cloud for the entire corpus or for a single document. When you click on a word, the “Trends” tool displays the graph for that word. I found this tool helpful for getting a big-picture view of the interviews.
2 Reader: The Reader allows for contextualization and some degree of close reading by displaying the text of your documents. When I first came to the tools page, it showed the first lines of the first text in my corpus. The colorful boxes along the bottom of the window represent the documents in your corpus, and the width of each box represents how much of the total corpus that document makes up. The line running through the boxes shows the trend of the word you are looking at. Clicking in a box displays the text at that spot. If you select a word in the “Contexts” tool, the Reader will show that instance of the word (more on that in the “Contexts” discussion).
3 Trends: This window displays a line graph of the frequency of the term you’re exploring. As with Cirrus, you can adjust the scale of the graph from the whole corpus to a specific document. I found this tool useful for gauging how word use changed across states, and it prompted me to ask those rich “why?” questions.
4 Summary: This box provides metadata about the corpus. The first line gives the document count, word count, and number of unique word forms, and it also shows how long ago the session was started. The tool then breaks down information about the documents: document length (longest, then shortest), vocabulary density (highest, then lowest), most frequent words, and distinctive words (by document). The “document” tab displays much of the same information as the main tab, but for each document. If you’re exploring one word, the “phrases” tab displays the phrases in which that word is found. I found the summary useful, first, for getting a sense of the magnitude of the text I was working with: having never seen the volumes or even read the text, I could still understand just how much text was being processed. Second, the summary conveyed the wide variety of language used across the text.
5 Contexts: This tool does essentially what its name says. Once you’ve selected a word in either the Reader or Trends, Contexts displays the documents the word occurs in, along with the text to its left and right. If you click on the term in one of the lines, that passage appears in the Reader with the surrounding text. I found this helpful for, well, putting floating terms in context by doing a little close reading.
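To make the numbers behind these tools a little less mysterious, here is a minimal Python sketch of the kind of calculation each one performs: raw term counts (Cirrus), relative frequency per document (Trends), vocabulary density as unique words divided by total words (Summary), and keyword-in-context lines (Contexts). It assumes a simple lowercase tokenizer that keeps only letters and apostrophes; Voyant’s own tokenization and counting may differ. The function names, the per-10,000-words scale, and the two sample sentences are hypothetical and invented for illustration, not part of Voyant.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into tokens of letters and apostrophes (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def top_terms(tokens, n=25):
    """Cirrus-like: the n most frequent terms with their raw counts."""
    return Counter(tokens).most_common(n)


def relative_frequency(tokens, term, per=10_000):
    """Trends-like: occurrences of `term` per `per` words, so documents of different lengths compare fairly."""
    if not tokens:
        return 0.0
    return tokens.count(term) / len(tokens) * per


def vocabulary_density(tokens):
    """Summary-like: number of unique word forms divided by total number of words."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def kwic(tokens, term, window=5):
    """Contexts-like: each occurrence of `term` with up to `window` words on its left and right."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == term:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


if __name__ == "__main__":
    # Invented sample texts standing in for two state collections.
    corpus = {
        "state_a": "The cotton was picked in the fall and the cotton was sold in town.",
        "state_b": "We went to town on Sunday to hear the preacher.",
    }
    for name, text in corpus.items():
        toks = tokenize(text)
        print(name, top_terms(toks, 3), round(vocabulary_density(toks), 2))
        print("  'town' per 10k words:", round(relative_frequency(toks, "town"), 1))
        print("  contexts:", kwic(toks, "town", window=2))
```

Dividing by document length in `relative_frequency` is one common way to compare states whose collections differ greatly in size; raw counts would favor the largest collections.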
Voyant Tools: Stoplist
My corpus had a long list of unhelpful words, as almost any text analysis project likely will. Luckily, it’s very easy to adjust the stoplist in Voyant. In any of the tools, hover over the question mark (but do not click) and more options appear (pictured above). Click on the slider icon to call up the options box.
There are several adjustments you can make, but for the stoplist, just click the “edit list” button next to the “stopwords” drop-down menu. Another text box will appear in which you can enter new terms and, if you choose, edit the auto-detected list.
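Conceptually, a stoplist is just a set of words that are thrown out before anything is counted. The short Python sketch below illustrates that idea, assuming the same simple lowercase tokenizer as before and a small hypothetical stoplist that includes the dialect spellings “dey” and “dat.” This is not Voyant’s built-in list, which you edit through the “edit list” box instead, and the sample sentence is invented.

```python
import re
from collections import Counter

# Hypothetical stoplist for illustration only; Voyant supplies its own
# auto-detected lists, which you extend through the "edit list" box.
STOPWORDS = {"the", "and", "a", "to", "was", "in", "of", "it", "dey", "dat"}


def tokenize(text):
    """Lowercase the text and split it into tokens of letters and apostrophes (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def filtered_counts(text, stopwords=STOPWORDS):
    """Count word frequencies after dropping every token that appears in the stoplist."""
    return Counter(tok for tok in tokenize(text) if tok not in stopwords)


if __name__ == "__main__":
    # Invented sample sentence using dialect spellings.
    sample = "Dey planted corn and dey planted cotton, and dat corn grew tall."
    print(filtered_counts(sample).most_common(3))
```

Keeping the stoplist as a `set` makes each membership check fast, which matters once the corpus runs to millions of words.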
The little arrow coming out of the box icon allows you to export your visualization in several formats. Below, I chose one that could be embedded in a web page:
As you can see, this is a fully interactive word cloud, and every tool offers this export option. This word cloud also reflects the words I added to the stoplist, so it is much more representative of the corpus than the earlier one you can see in the screenshot of the home page.
A Brief Reflection:
Having used Voyant Tools, I have a much better appreciation for the analytic power of text mining. I was able to see patterns and outliers much more readily than through close reading, and I could ask novel questions that I doubt would have occurred to me had I read each interview one at a time. As for using Voyant as that text mining tool, I have mixed feelings. The fact that the service is completely free is a huge boon, but, as the old saying goes, you get what you pay for. For projects looking at several million words, Voyant might be too slow. And although the export tool lets users share their visualizations, you can’t save your work, so every time you close the program you have to re-enter the text, which, again, would be a major hindrance for larger projects.