
Reading a Wikipedia Article: A Guide

I attended middle and high school in the early 2000s, when high-speed internet, public wifi, mobile devices, and user-generated content were all fairly new. I think anyone going to school at that time can remember lengthy lectures from teachers and librarians warning us against the pitfalls of Wikipedia. I was taught never to use it as a source. First of all, it is a general knowledge source, and like all encyclopedias it shouldn’t be cited. But more importantly, the content was misleading, misinformed, badly written, or just wrong. The text was written by ordinary people, not trusted PhDs.

To the first point, most high school students probably shouldn’t cite Wikipedia as one of their three internet sources. But as to the validity of the information, investigations have shown that most of the content is reliable.1 As with any other source, Wikipedia articles should be read with scrutiny. The following is intended to be a guide to that end.


How to Read a Wikipedia Article: A Step-by-Step Guide

As an illustrative example, I’ll be using the article “Digital Humanities.”

1. Read the article. Sounds simple enough, but with all our history teachers’ warnings of the pitfalls of Wikipedia, sometimes just reading the article seems like a slander against our intellectual upbringing. We were taught better than this! But before we go throwing accusations around, perhaps we’d better give the accused their say. 

As a result of this read-through you should be able to answer a few questions: “What is this thing?”; “Why is it important?”; and “Why is it interesting?” If, by the end of the article, you feel like you have a general grasp of what the thing is, at first blush the article is successful. In the example of the Digital Humanities article, I feel pretty confident that I could explain to a friend what DH is using only the information provided.

The following two questions lead into the skeptical aspects of reading a Wikipedia article. They are questions about motive. There is no editorial board deciding what should (or should not) be included in this encyclopedia. Individuals simply think, “This is a thing people would want or need to know about, so I’m going to write this article.” In the case of our DH article, the author(s) seem to think it’s important because it’s a relatively new way of quantifying humanities research (at least that’s what I got out of the article). Now, many would probably argue that the article about the “Heavy Metal Umlaut”2 is not particularly important (please refrain from long comments about intellectual and cultural relativism). It is, however, certainly interesting. After reading the article, I came away with the impression that it’s interesting because although there is no linguistic reason for the symbol in these metal bands’ names, there is a culturally significant one.

2. Look at the author(s). At this point you might think to go directly to the revision history to see a stop-action film of the page’s development, complete with bloopers. Instead, I think a better approach is to look at the authorship, particularly those most engaged in editing the article. This information will give you an idea of just how “crowdsourced” this article is. I would argue that the more editors an article has, the better. A good number of editors means that at the very least you’re not reading an opinion piece.

First, click on the “View history” tab at the top of the article, then the “Revision history statistics” link found near the top of that page. [Screenshot: Wikipedia revision history] Upon clicking this link you’ll encounter a mess of data and lovely visualizations. The first item of interest is who created the article; in this case, user Elijahmeeks. If you click the user name, you’re taken to a wikipage about the author, where we learn that this is someone actively engaged in Digital Humanities at a university. At least in the case of this article, our fear that Wikipedia entries are created on the whim of bored teenagers is completely unfounded. I happen to know that Elijah Meeks is a major figure in Digital Humanities, but even if you didn’t, you would understand from his page that he is an expert in the field. Not all contributors have anything on their user page, and some have very little information. In any case, we can begin to understand why Elijahmeeks was compelled to start this article in the first place: he was a Digital Humanist (and one that studied Wikipedia, according to his page), so he had a vested interest in making the field visible on Wikipedia.

The next bit of information the Revision history statistics page provides that is of particular interest is that of the top editors. [Screenshot: Wikipedia top editors] After exploring the top ten users, we come to learn that six of the ten are Digital Humanities experts. Three are unidentifiable. We’ll get to the last one in a moment. But first, the six experts. Knowing that actual Digital Humanists were prominent among the most active editors of the page is a credit toward the article’s legitimacy, but what of its accuracy? From the article it is clear that many facets of Digital Humanities are hotly contested. So, might these various experts (knowingly or not) be trying to push an agenda? Perhaps, but the fact that there are quite a few suggests that they’re policing each other and making sure that Wikipedia’s NPOV (Neutral Point of View) standard is enforced.
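If you prefer to pull this information programmatically, the public MediaWiki API exposes the same revision history. Below is a minimal Python sketch (assuming the requests library is available) that fetches recent revisions of the article and tallies the most active editors; treat it as a rough starting point rather than a polished tool.

import requests
from collections import Counter

API = "https://en.wikipedia.org/w/api.php"

def fetch_revisions(title, limit=500):
    # Ask for the user, timestamp, edit summary, and resulting page size of each revision.
    params = {
        "action": "query",
        "format": "json",
        "prop": "revisions",
        "titles": title,
        "rvprop": "user|timestamp|comment|size",
        "rvlimit": limit,
    }
    data = requests.get(API, params=params).json()
    page = next(iter(data["query"]["pages"].values()))
    return page.get("revisions", [])

revisions = fetch_revisions("Digital humanities")
print(Counter(rev["user"] for rev in revisions).most_common(10))  # rough "top editors" list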

Now, to that one known non-Digital Humanist contributor, ElKevbo. Before we even learn who this person is, we are met with an interesting pair of notes on their user page: one from the editor and one from (presumably) Wikipedia administration. [Screenshot: ElKevbo’s user page]

“I’m taking a break from Wikipedia for an indeterminate length of time. I’m a bit burnt out and experiencing a general lack of support so I need to reevaluate whether this is a worthwhile project in which to invest time and energy.” To which the administrator replies, “Great idea to take an indeterminate length of time. If you do return, hope you understand the difference of censorship and source materials. Wikipedia needs to stay free from what you were doing and/or attempting to do to censor information submitted to Wikipedia.” That’s a pretty serious claim. It casts a shadow of doubt on this person’s edits, and on the article in general, since this editor is ranked third. Did ElKevbo censor anything on this page? But, before we jump to conclusions, let’s visit ElKevbo’s personal website.

Here we learn that ElKevbo is Kevin R. Guidry, a “scholar of higher education” currently working at the University of Delaware. Kevin’s interest in the Digital Humanities probably springs from an administrative angle. From his biography, I would consider him someone engaged in Digital Humanities, but not an expert. He certainly uses DH tools and methods, but seemingly to the end of exploring higher education. In any case, for the purposes of assessing the usefulness of the Digital Humanities article, a closer look at his edits would be enlightening.

3. View the editors’ revision history. To get a better idea of the kinds of edits a contributor made (adding information, formatting, grammar corrections, etc.), looking at the specific edits of major editors, especially those in question, is helpful. To do this, navigate back to the “View history” page and click the link “Edits by user”, which is on the same line as the “Revision history statistics” link.3 [Screenshot: Edits by user link]

After clicking this link and entering ElKevbo (or the pertinent username) in the username field, we encounter a list of all ElKevbo’s edits. [Screenshot: ElKevbo’s edits] The date, the number of characters added or subtracted, and a short description (provided by the author) are listed for each edit. I find the description the most helpful. A quick glance demonstrates that ElKevbo mostly removed content. According to the user, these edits were mostly made in compliance with Wikipedia guidelines. We can explore the changes by clicking the “diff” link on an edit. I was curious about the large subtraction he made on 4 April 2012. [Screenshot: revision diff] Indeed, upon looking at the changes, the assertions the previous author had made were uncited. Upon a closer look at ElKevbo’s edits, it seems that, in regard to the trustworthiness of the editors, the Digital Humanities article is solid.
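If you went the API route sketched above, isolating one contributor’s edits is a one-liner; this is only a rough stand-in for the “Edits by user” page, reusing the revisions list fetched earlier.

# Keep only ElKevbo's revisions and print when they happened, the page size
# after each edit, and the edit summary the user supplied.
elkevbo_edits = [rev for rev in revisions if rev["user"] == "ElKevbo"]
for rev in elkevbo_edits:
    print(rev["timestamp"], rev["size"], rev.get("comment", ""))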

4. Look through the entire revision history. This sounds daunting, and it would be if I meant to meticulously go through each edit to scrutinize the development of the page in minutiae. But this is not what I’m suggesting. Rather, look through for edits that either add or subtract a great deal of material. The purpose is to get a general idea of how editors interacted with each other to write a trustworthy article that meets Wikipedia’s guidelines. If you see a lot of back and forth between authors, and citations of particular guidelines in the edit descriptions, that’s good. It means the editors are policing each other. I save this for the end because I think you can better gauge individual edits if you know who the major players are. If you start with this tool, as I did, you’ll get pretty bogged down in details, like the exact moment “Controversies” changed to “Problems.”

*Words to the wise: One thing that my high school teachers never harped on, which I think is important to keep in mind, is that most Wikipedia articles are a work in progress. You are not looking at a finished product, but something that grows and evolves with time. And that is probably the greatest strength of Wikipedia. You’re seeing what the general consensus is about a topic at the particular second in time you click the link to the article. That being said, it’s always a good idea to quickly check the latest edits to be sure you didn’t happen to view the page in the few minutes between the edits of a bored teenager and a good-faith contributor.


1 Rosenzweig, Roy. “Can History Be Open Source? Wikipedia and the Future of the Past.” RRCHNM. Originally published in The Journal of American History 93, no. 1 (June 2006): 117–46.

2 I discovered this page through the mandatory readings my instructor assigned leading up to this post. This isn’t so much a reading as a narrated movie of Wikipedia in action. It’s worth a watch both for the amusement and the scholarly discussion of the typical editing practices of Wikipedia: Udell, Jon. “Heavy Metal Umlaut: the movie.” Strategies for Internet Citizens (blog), January 22, 2005.  

3 You could also view the user’s history on their wikipage’s “Talk” tab. For assessing the validity of an article, however, I don’t think this is necessary or a good use of time.  

Using Palladio: A Reflection and User Guide

Palladio is a free, web-based network and mapping tool. It allows users to upload their own data to create both maps and network graphs. Users don’t need an account and can download projects to their computers or save the URL to return to their projects.

When embarking upon a networking project, I think it’s important to be conscientious about what kinds of relationships a network can best represent. The idea is to see relationships in the data that are otherwise hard to conceptualize. My trials with Palladio produced some good examples where networks effectively convey information, and some good negative examples as well. As I describe how to use the tools, I’ll include some commentary about what I learned in the process.

Getting Started

[Screenshot: Palladio home page]
After entering http://palladio.designhumanities.org/#/ into your address bar, all you have to do is press “start.”
[Screenshot: Palladio load data screen]
From there you’ll have to load some data. As far as I can tell, you can’t create data in Palladio. My data were stored in .csv documents and prepared for me by my instructor. Hopefully you’ve organized your data before you’ve come to this point. I loaded the primary dataset I would be drawing from, in other words the spreadsheet all other sheets would relate to. From there, press “Load.”
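Before loading anything, it can help to open the spreadsheet in something like pandas to confirm the columns and values are what you expect. A minimal sketch, with a hypothetical file name standing in for your own primary dataset:

import pandas as pd

interviews = pd.read_csv("wpa_interviews.csv")  # hypothetical file name

print(interviews.columns.tolist())  # the fields Palladio will offer for linking
print(interviews.head())            # spot-check a few rows
print(interviews.isna().sum())      # missing values that could break links between tables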

Your Data

[Screenshot: Palladio data tab]
Your primary dataset will appear in the “data” tab. Here you’ll connect other data tables to create relationships. In this step you’ll already start making decisions about what kinds of relationships you want to show by choosing which data fields to extend. In my project we were concerned with showing relationships between people and places, so this is the data we linked.
[Screenshot: Palladio add-a-new-table window]
By clicking on a data field (in my case “where interviewed”) the editing window opens. Here I selected “Add a new table” to start creating relationships. I uploaded my “locations” spreadsheet. Then I selected the “subject interviewed” data field from my primary dataset and added the locations of their enslavement. Finally, in the enslavement table that I just uploaded, I clicked on the drop-down “Extension” menu and selected “Locations” to link the two datasets. At the end I had this set of data:
[Screenshot: Palladio data, fully linked]
With all my data uploaded and connected, I was ready to start exploring Palladio’s visualizations.
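Conceptually, the “Extension” step is just a join between two tables on a shared key. The same idea, sketched in pandas with hypothetical file and column names, looks roughly like this:

import pandas as pd

interviews = pd.read_csv("wpa_interviews.csv")  # hypothetical primary table
locations = pd.read_csv("locations.csv")        # hypothetical table of places and coordinates

linked = interviews.merge(
    locations,
    left_on="where_enslaved",  # key in the primary table
    right_on="location",       # matching key in the locations table
    how="left",                # keep every interview, even if a place lacks coordinates
)
print(linked.head())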

Mapping with Palladio

[Screenshot: Palladio add-new-layer window]
It is possible to map with Palladio, but that doesn’t mean you should. Creating the map is fairly easy. Select “Add new layer” and the above editing window appears. Just select what kinds of points you want and the two datasets you’d like displayed (the source for me was the interview location, the target where_enslaved). You might have to “Ctrl -” to zoom out enough to see the “Add new layer” button.
[Screenshot: Palladio map]
For some projects the mapping feature is probably sufficient, but for mine it was far from it. Although this map does show the movement of individuals from their places of enslavement to the locations of their interviews, the directionality is not clear. Palladio does allow for layers, but the information available for display is not nearly as rich or customizable as in CartoDB and other GIS-specific applications. This map does convey that there was a significant movement of people after slavery ended, but other questions are better asked and explored in GIS applications.

Networking Tools

[Screenshot: Palladio network settings]
The networking tool is pretty intuitive. All you have to do is select the data you want to relate (a source and a target). The “facet” tab allows users to focus on certain aspects of their data (for example, in the above visualization I could limit the interviewees to those over the age of 80). In the above example I selected the interviewer as the source data and the interviewee as the target. The resulting visualization shows which interviewers interviewed which ex-slaves. I would say this is a good visualization because the viewer can see the intended information with ease. So, instead of looking at a spreadsheet organized by interviewer, I see all of their interviewees on one page. This visualization doesn’t elicit many novel questions, but it does provide solid information.
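Under the hood this is a bipartite (bimodal) graph: interviewers on one side, interviewees on the other, with an edge for each interview. If you wanted to build the same structure outside Palladio, a hedged networkx sketch (again with hypothetical file and column names) might look like this:

import pandas as pd
import networkx as nx

interviews = pd.read_csv("wpa_interviews.csv")  # hypothetical file name

G = nx.Graph()
for _, row in interviews.iterrows():
    G.add_node(row["interviewer"], kind="interviewer")  # source
    G.add_node(row["interviewee"], kind="interviewee")  # target
    G.add_edge(row["interviewer"], row["interviewee"])

# Degree tells you how many informants each interviewer met with.
interviewer_degrees = {n: d for n, d in G.degree() if G.nodes[n]["kind"] == "interviewer"}
print(sorted(interviewer_degrees.items(), key=lambda item: -item[1]))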
[Screenshot: WPA M/F network]
Within the source data, you can select other facets to relate. In the above example I visualized which interviewers met with male and/or female informants. This helps me ask questions about gender bias in the interviews, or how gender norms in 1936 influenced the interviews.
[Screenshot: WPA Type/Topics network]
Networking the relationships regarding topics had mixed results. The graph above, showing the relationship between the type of work the ex-slaves were engaged in during their enslavement and the topics they discussed, is rather successful. I would have expected more variation in the topics, but upon seeing this graph I was reminded that the interviewers used a script, asking about particular topics rather than engaging in natural conversations. The few outliers probably represent occasional spontaneous departures from the script.
[Screenshot: WPA Topics network]
The above graph is a fairly clear example of an instance where networking doesn’t work well. It shows the relationship between interviewees and the topics they discussed. The result is too highly clustered and the nodes too large to make anything out. This unfortunate grouping is due to the aforementioned script: almost every person discussed the same things since they answered the questions they were being asked. This consistency also reflects what Mark Twain refers to as “corn-pone,” when enslaved or formerly enslaved people told white people what they wanted to hear. It could be said that there was already a script, even before the WPA produced one.

A Brief Reflection

Network visualization can be a very powerful tool when the investigator is conscious of what information the graphs can provide. Palladio only computes bimodal networks, which is usually what you want. Looking at these networks allowed me to ask questions about why these relationships looked the way they did. So, why did female interviewers tend to meet with female informants? Why were ex-house slaves the only people to discuss “mammy”?

I was also able to draw some preliminary conclusions. The WPA script was, arguably, effective. Conversations stayed relatively on script and recorded consistent types of information. These observations would lead me to look at the scripts themselves to see if I’m correct. Which leads to another take-away: no tool replaces close reading.

Using CartoDB: A Reflection and Guide

CartoDB is a free online application that allows users to make GIS maps. The interface is user friendly and fairly straightforward for even the novice to navigate. I wouldn’t consider it a replacement for more powerful programs like ArcGIS, but it is certainly a better tool for projects looking to make clean, professional spatial visualizations. CartoDB does have tools that make deeper analysis possible, but not to the extent that something like Arc does.

I tinkered with CartoDB using data derived from the WPA Slave Narratives, which I explained more fully in my last post about Voyant. Many interviews had GPS coordinates: where the interview occurred and where the interviewee had been enslaved. Those that did not have exact points were set in the middle of their city/region.  This exercise was intended to visualize the spatial elements of these interviews. As I discuss how to use some of the available features,  I’ll also reflect on the utility of the tools in this endeavor.

Getting Started

[Screenshot: CartoDB home page]
The first thing you need to do is create an account: https://carto.com/signup. After that you need some data, which needs to be organized into a spreadsheet (I used data that was prepared in Excel, but other options are available). Hopefully, you’ve prepared the data before even getting as far as signing up or logging in (there is functionality to draw your own polygons, lines, and points, but I did not explore those features). To add new data, select “datasets” next to your username (where it says “map” in the above screenshot).
[Screenshot: CartoDB add-new-dataset options]
Then, all you do is upload, or create, your data from the options shown above. I really like that users can create their data in the software they’re comfortable with, which makes sharing data easier. I did not create the data I used for my project; my professor had certain goals in mind about what he wanted our maps to show. The way tables are organized influences the kinds of information map visualizations will display, so this functionality, though seemingly simple, is actually pretty powerful.

Your Data

[Screenshot: CartoDB data table view]
Now that you’ve got your data uploaded, you’re ready to start mapping. But first, you might want to go through and make sure Carto understands your data. The above screenshot is what I saw upon uploading my data. I had to make sure all the numbers were recognized as such, and especially that the date was an actual date. I find this to be a helpful exercise in ensuring you understand your data, especially if you didn’t create it. Before I even saw the map, I began to wonder how these categories would relate to each other on the map. Chances are that if you’ve made your own data, you already have an idea.
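The same sanity check can be done before you ever upload, which is handy when someone else prepared the data. A short pandas sketch, with hypothetical column names, that forces coordinates to numbers and the interview date to a real date:

import pandas as pd

interviews = pd.read_csv("wpa_interviews.csv")  # hypothetical file name
print(interviews.dtypes)  # what pandas (and likely Carto) will infer

# errors="coerce" turns unparseable values into NaN/NaT so you can find and fix them.
interviews["latitude"] = pd.to_numeric(interviews["latitude"], errors="coerce")
interviews["longitude"] = pd.to_numeric(interviews["longitude"], errors="coerce")
interviews["interview_date"] = pd.to_datetime(interviews["interview_date"], errors="coerce")

print(interviews[interviews["interview_date"].isna()])  # rows a time-based map would choke on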

Making Your Map(s)

[Screenshot: CartoDB map view]
Now, all you have to do is click the “Map View” tab.  Carto automatically plots your points and zooms to the extent of those points. Like other GIS software you can work in layers if you so choose.
[Screenshot: CartoDB infowindow settings]
When you click on the sidebar where you see the “+” and “1”, the tools open. I started by clicking the speech bubble (hover over it and you’ll see it called “infowindow”) to select which columns of my data I wanted displayed when I clicked on a point, columns that would help me tell one point from another, like the name of the interviewee. That was helpful for me, not just because my instructor told us to do that first, but because I was able to start visualizing what I might want my map to look like and the kind of information I wanted to show.

The wizard tab is the tool you’ll spend the most time with. This is where you select what kind of map you want. The default is “simple,” which just plots your points. Here you’ve really got to think about what kind of information you want to convey. I was working with two sets of data: where the interviews occurred and where the interviewees were enslaved. Although I could have mapped both of these using two layers, it wouldn’t have been very helpful as just points on a map, so I chose (and was instructed) to view the data in two separate maps.
[Screenshot: CartoDB category map]
As I played with the wizard I found that some map types elicited interesting questions while others just muddied the data. The map of where interviews occurred tells me more about the interviewers than the interviewees. The “category” wizard gave a unique color to each interviewer, and patterns emerged regarding how much or how little they traveled. The “torque” map can be equated with the timeline feature in ArcGIS. I had dates for when the interviews occurred and was able to play an animation of them over time. This allows temporal questions to be asked: When was the zenith of the project? How were the interviews carried out: in a logical progression across the state, or seemingly randomly? The cluster map was also useful for analyzing the data for this map. Where were the most interviews? Kernel density maps were not particularly helpful in illuminating this data, however, since the interviews were already pretty tightly clustered in Alabama.

The other set of data, where the interviewees had been enslaved, did lend itself to kernel density maps, since those points were spread out over a larger area. The simple map was still helpful for getting an idea of where these people had been enslaved, but given the limitation that the exact X,Y coordinates were more often than not uncertain, the information presented as particular points might be misleading. Something like a heat or density map gives a more honest visualization of the available data.
[Screenshot: CartoDB two-layer map with heat map and points]
In the end, I did combine the two maps to see how the locations of the interviews related, on the face of the globe, to the places of enslavement. I used a heat map for the places of enslavement and simple points for the interviews. The resulting two-layer map revealed that most of the interviewees had been enslaved in the metropolitan areas where they were interviewed. The major pitfall is that the map is rather hard to read, since Carto considers heat maps to be animations, which must therefore be the top layer. I would have preferred for the points to be the top layer so they could be more visible. I played with the transparency of the heat map until I felt I had struck a balance between the visibility of the simple map and the color saturation of the heat map. You can add text to your map, but I had great difficulty producing a legend, the key component for conveying information.
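For what it’s worth, the same two-layer design is easy to reproduce outside Carto. The sketch below uses the folium library (not Carto’s own tooling) with hypothetical file and column names, and because layers added later render on top, the interview points sit above the heat map, the ordering I wanted:

import pandas as pd
import folium
from folium.plugins import HeatMap

interviews = pd.read_csv("wpa_interviews.csv")  # hypothetical file and column names

m = folium.Map(location=[32.8, -86.8], zoom_start=6)  # roughly centered on Alabama

# Layer 1: heat map of places of enslavement.
HeatMap(interviews[["enslaved_lat", "enslaved_lon"]].dropna().values.tolist()).add_to(m)

# Layer 2: one small marker per interview location, drawn on top of the heat map.
for _, row in interviews.dropna(subset=["interview_lat", "interview_lon"]).iterrows():
    folium.CircleMarker(
        location=[row["interview_lat"], row["interview_lon"]],
        radius=3,
        popup=str(row["interviewee"]),
    ).add_to(m)

m.save("two_layer_map.html")  # open in a browser to pan and zoom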

A Brief Reflection

Mapping allowed me to ask “where” questions, which comes as no surprise. This exercise also elicited questions that require returning to the text for answers. Why did the people who were enslaved elsewhere move to Alabama? Are those ex-slaves who were enslaved in Alabama the same who were interviewed? Why did certain interviewers conduct their work where they did? Plotting points on a map is helpful because it reminds the researcher that these interviews were conducted in a place and place influences thought. Just how place and thought interact is the job of the researcher to investigate, but these questions are best broached with a map visualization.

Using Voyant: A Reflection and Guide

Voyant (voyant-tools.org) is a web-based tool set for text mining and analysis. I utilized the service to glean information about the nature of the WPA Slave Narratives. These narratives are the result of interviewers from the Works Progress Administration seeking out ex-slaves from 1936 to 1938.

Voyant Tools: Getting Started

[Screenshot: Voyant start page]
Upon arriving at the home page, I was given the option to either upload text files, enter URLs, or enter text directly. I entered 17 URLs, one for each state that participated in the WPA project, and clicked “Reveal.”

Here I ran into my first hiccup. Sometimes the quantity of data I was loading into the web tool seemed too much, and either Voyant would go on “fetching corpus” forever, or it would give up with an “error” and no explanation. Luckily, there’s an easy fix. Just visit http://docs.voyant-tools.org/resources/run-your-own/voyant-server/ and download the Voyant Server. Nearly all my problems were solved after I downloaded the server, so I do suggest it.

Voyant Tools: the Tour

[Screenshot: Voyant home view]
Once my corpus was “fetched” I instantly saw visualizations for my text. I also saw right away that I’d have to do some adjusting. Just looking at the word cloud, I saw that Voyant was including information that I knew wasn’t useful. In my corpus there was no standard transliteration for dialect, so the most common words, “dey” and “dat,” were not significant, since dialectal variations were not reliably recorded. There was an easy solution for this, but first I’ll go through each of the five tools: “Cirrus,” “Reader,” “Trends,” “Summary,” and “Contexts.”

1 Cirrus: This tool provides the old standard word cloud, with the largest words representing the most common words. The word count appears when you hover over a word. By sliding the “terms” bar you can adjust how many words appear in the cloud. The “scale” drop-down menu allows users to look at clouds representative of the entire corpus, or of just a particular document. When you click on a word, the “trends” section displays the graph for that word. I found this tool helpful for getting a big-picture idea of the interviews.

2 Reader: The reader allows for contextualization and some degree of close reading. The text from your documents is displayed. When I first came to the tools page, the first lines of the first text in my corpus were displayed. The colorful boxes along the bottom of the window represent the different documents in your corpus. The width of each box represents how much of the total corpus that document makes up. The line going through the boxes is a representation of the trend of the word you are looking at. When you click in the boxes, the reader displays the text at that spot. If you select a word from the “contexts” tool, it will show that instance of the word (more on that in the “contexts” discussion).

3 Trends: This window displays a line graph of the frequency of the term you’re exploring. Much like “cirrus,” users may adjust the scale of the graph from the whole corpus to a specific document. I found this tool useful in gauging how word use changed across states, and it allowed me to ask those rich “why?” questions.

4 Summary: This box provides the metadata of the corpus. The first line provides document count, word count, and number of unique word forms. It also conveys how long ago the session was started. Then the tool further breaks down information about each document, first with document length (longest, then shortest), vocabulary density (highest, then lowest), most frequent words, and distinctive words (by document). The “documents” tab displays much of the same information as the main tab about each document. If you’re exploring one word, the “phrases” tab will display phrases the word under investigation is found in. I found the summary useful in, first, getting a sense of the magnitude of the text I was working with. Having never seen the volumes or even read the text, I was able to understand just how much text was being processed. Secondly, the summary conveyed the wide variety of language used across the texts.

5 Contexts: This tool essentially does what it claims. Once you’ve selected a word in either the Reader or Trends, Contexts displays the documents the word occurs in, as well as the text to the right and left of it. If you click on the term in one of the lines, that line will appear in the Reader with the surrounding text. I found this helpful for, well, putting floating terms in context by doing a little close reading.
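The Contexts tool is essentially a keyword-in-context (KWIC) display, and it’s simple to approximate in a few lines of Python if you ever want the same view outside Voyant; the file path below is a hypothetical plain-text copy of one state volume.

import re
from pathlib import Path

def kwic(term, text, window=5):
    # Print each occurrence of `term` with `window` words of context on either side.
    words = re.findall(r"[A-Za-z']+", text)
    for i, word in enumerate(words):
        if word.lower() == term.lower():
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            print(f"{left:>40} | {word} | {right}")

kwic("mammy", Path("narratives/alabama.txt").read_text(encoding="utf-8"))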

Voyant Tools: Stoplist

[Screenshot: Voyant settings icons]

My corpus had a long list of words that weren’t helpful, as will almost any text analysis project. Luckily, it’s very easy to adjust the stoplist in Voyant. In any of the tools, when you hover over the question mark (but do not click), more options appear (pictured above). Click on the slider icon to call up this text box:
[Screenshot: Voyant options] There are several adjustments that can be made, but for the stoplist, just click the “edit list” button next to the “stopwords” drop-down menu. Another text box will appear in which you can enter your new terms and edit the auto-detected list if you choose.
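If you ever want to double-check what a stoplist is doing to your counts, the same idea is easy to reproduce outside Voyant. A minimal sketch, assuming the interviews are saved as plain-text files in a folder, with the unstandardized dialect spellings mentioned earlier added to a small stopword set:

import re
from collections import Counter
from pathlib import Path

# A deliberately small stopword set; extend it as your corpus demands.
stoplist = {"the", "and", "a", "to", "of", "in", "i", "was", "dey", "dat"}

counts = Counter()
for path in Path("narratives").glob("*.txt"):  # hypothetical folder of state volumes
    words = re.findall(r"[a-z']+", path.read_text(encoding="utf-8").lower())
    counts.update(w for w in words if w not in stoplist)

print(counts.most_common(25))  # rough equivalent of the Cirrus word cloud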

The little arrow coming out of the box icon allows you to export your visualization in several formats. Below, I chose one that could be embedded in a web page:

[Embedded interactive word cloud]

As you can see, this is a fully interactive word cloud. Each of the tools allows for this. This word cloud is also the result of adding words to the stoplist, which makes it much more representative of the corpus than the previous one you can see in the screenshot of the home page.

A Brief Reflection

Having used Voyant Tools, I have a much better appreciation for the analytic power of text mining. I was able to see patterns and outliers much more readily than with a close reading. I was also able to ask novel questions that I doubt I could have had I read each interview one at a time. As for using Voyant as that text mining tool, I have mixed feelings. The fact that the service is completely free is a huge boon, but there’s the old saying: you get what you pay for. For a project looking at several million words, Voyant might be too slow. And although the export tool allows users to share their visualizations, you can’t save your work, so every time you close the program you have to re-enter the text, which, again, would be a major hindrance for larger projects.

A Guide to Digitization

Purpose: To provide a brief guide to digitization that will be utilized in this blog.

Underlying Assumptions: Digitization is the process by which material is reproduced in digital formats.  This guide was written with the understanding that no digital copy replaces the original object. Digitization creates a facsimile of the original and information is lost in the process.  Therefore, the first guideline is that the original should be handled as little as possible and stored as if no digital copy exists.

Digitizing Material: Almost anything can be digitized in one form or another, but not every element of the original object can be captured. The elements that can be captured in digitization include:

  • Visual elements such as color, scale, dimensions and shape
  • Auditory elements
  • Movement

Elements that cannot be captured in digitization include:

  • Sensory elements such as smell, texture, weight and taste
  • How the viewer would experience the original- for example magnitude and environment

Digitization is inherently project specific. Not every element capable of being captured can be captured in every format. The project leader must make decisions about which elements need to be captured and which can be left out, based upon the project. The project goals will determine which digital forms make the most sense for the objects being digitized. Some generalizations, however, can be made. For flat objects such as photographs and documents, JPEG, GIF, or TIFF formats are well suited to communicating visual information. Multi-dimensional objects such as sculptures and cultural heritage artifacts can be 3D scanned, although at this time 3D scanners are often prohibitively costly. These materials can also be photographed from several points of perspective and saved as JPEG, GIF, or TIFF files. A final solution is to create a .mov or .mp4 file by rotating a video camera around the object. The limitation of the last two options is that the viewer cannot manipulate the resulting digital product as much as a 3D scan. Audio can be transcribed into a PDF, which would be ideal for a project concerned only with the content of the audio. Projects that seek to explore auditory characteristics would find .mp3 files useful. Performances or other objects for which movement is essential would benefit from .mov or .mp4 files.
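As a small illustration of the format decisions above, here is a hedged Pillow sketch that turns a TIFF master scan into a smaller JPEG access copy; the paths and size cap are placeholders, and the master file itself is never modified.

from PIL import Image

MASTER = "masters/letter_001.tif"      # hypothetical archival master
ACCESS_COPY = "access/letter_001.jpg"  # hypothetical public derivative

with Image.open(MASTER) as img:
    img = img.convert("RGB")            # JPEG cannot store an alpha channel
    img.thumbnail((2000, 2000))         # cap the longest side, preserving aspect ratio
    img.save(ACCESS_COPY, "JPEG", quality=85)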

The digitization of material can be broken into four steps, and at each step the project leader must make decisions using the project goals as a guide.

  1. The Object is Captured in Digital Format
    • Decisions:
      • Which hardware to use
      • What conditions are necessary to reach minimum standard requirements (codified before the start of the project)
      • What information is the most important
  2. Import the Digital File
    • Decisions:
      • Name of the file
      • Location of the file
  3. Digital Manipulation
    • Decisions:
      • How much manipulation is needed for the necessary information to be conveyed
      • What file formats will best deliver this information
      • What metadata can be extracted
      • How will this metadata be presented (if any)
  4. Storage of the Digital Artifact
    • Decisions:
      • Creation of a Master File and publicly accessible files
      • Location of files

The Impact of Digitization: The act of digitization changes the way in which objects are understood. Digitization is a wonderful tool for making material available to a bigger audience, and more voices change how material is understood. The digitizer should be aware that the object is not being digitized as though it magically turns into ones and zeros to live on a server. Rather, a digital copy is made, and an incomplete copy at that. A JPEG of a wallet-sized photo is a good copy because the viewer sees the object much the same way he or she would view the photo in life. A 3D model of Stonehenge, however detailed, will never completely capture the magnitude of the structure on a standard computer screen. A digital copy fundamentally changes an individual’s understanding of the object by focusing on some elements and silencing others as part of the digitization process.

Digitization does open opportunities for users to transform the object. Sculptors could never make new arms for the real “Venus de Milo,” but they might using 3D modeling software. Users can change a textual document into speech and speech into text. Stills can be extracted from digitized film. Digitization allows the copy to be changed in ways the original cannot be, and without harming the original.