Data Storage Decisions

My task this semester, as last semester, has been to geocode the museums of the world by entering data into Excel. Not only are we recording each museum’s coordinates, we are also writing a short description of each site. Now that I have figured out my workflow, I’ve started to think about how I’m entering data, rather than just the mechanics of what I’m doing. In particular, I’ve been thinking about why we’re using Excel rather than database software.

During the coursework portion of the Digital Public Humanities, we touched upon databases when we covered metadata. At my job, I’ve continued to learn about databases, and one of the first things my boss told me is that Excel is not a database: a user can’t query information or show relationships among data. I was surprised to hear this, because every data entry job I’ve ever had used Excel. I scheduled exams using this software as an undergraduate, and cataloged artifacts with it in graduate school. Shortly thereafter, he taught me to use Access, and I understood the difference. Access is a relational database, and it is much easier to find information in Access than in Excel, especially when I’m using multiple fields. It is easier to locate a specific entry I’m looking for, or to sort by multiple criteria. I think this kind of functionality would be useful for the Conflict Cultures project.

The goal of the Conflict Cultures project is not just to have a list of coordinates and museum names, but to facilitate research. I think it would be helpful to sort the data by multiple criteria, and relating the data via linked tables would be the best way to do that. The location data could be stored in one table, and the textual descriptive data in another. They would be linked by a table containing the museum names and country codes. Arranging the data like this would allow for looking at the data in multiple ways. For example, it may be interesting to know where museums are located in proximity to capital cities. First the museums could be sorted by country, then by proximity to a coordinate taken from the center of a city.
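To make the linked-table idea concrete, here is a minimal sketch using Python's built-in sqlite3 module in place of Access. Everything in it is an assumption made for illustration: the table and column names, the extra capitals table holding one city-center coordinate per country, the placeholder museum names, and the approximate coordinates. None of it comes from the actual Conflict Cultures data.

```python
import math
import sqlite3


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


conn = sqlite3.connect(":memory:")
# Expose the Python distance function to SQL so it can be used in ORDER BY.
conn.create_function("haversine_km", 4, haversine_km)

# Hypothetical schema: one linking table (museums) plus separate tables for
# locations and descriptions, as described in the paragraph above.
conn.executescript("""
CREATE TABLE museums (
    museum_id INTEGER PRIMARY KEY, name TEXT, country_code TEXT);
CREATE TABLE locations (
    museum_id INTEGER REFERENCES museums(museum_id), latitude REAL, longitude REAL);
CREATE TABLE descriptions (
    museum_id INTEGER REFERENCES museums(museum_id), description TEXT);
CREATE TABLE capitals (
    country_code TEXT PRIMARY KEY, latitude REAL, longitude REAL);
""")

# Placeholder rows with approximate coordinates, for illustration only.
conn.executemany("INSERT INTO museums VALUES (?, ?, ?)",
                 [(1, "Museum A", "FR"), (2, "Museum B", "FR"), (3, "Museum C", "DE")])
conn.executemany("INSERT INTO locations VALUES (?, ?, ?)",
                 [(1, 48.86, 2.34), (2, 43.30, 5.37), (3, 52.52, 13.40)])
conn.executemany("INSERT INTO descriptions VALUES (?, ?)",
                 [(1, "Placeholder description."), (2, "Placeholder description."),
                  (3, "Placeholder description.")])
conn.executemany("INSERT INTO capitals VALUES (?, ?, ?)",
                 [("FR", 48.8566, 2.3522), ("DE", 52.52, 13.405)])

# Sort by country first, then by distance to that country's capital.
rows = conn.execute("""
SELECT m.country_code, m.name,
       haversine_km(l.latitude, l.longitude, c.latitude, c.longitude) AS km
FROM museums AS m
JOIN locations AS l ON l.museum_id = m.museum_id
JOIN capitals AS c ON c.country_code = m.country_code
ORDER BY m.country_code, km
""").fetchall()

for country, name, km in rows:
    print(country, name, round(km, 1))
```

The museums table does the linking: each of the other tables refers back to it by museum_id, so a single query can draw on names, country codes, and coordinates at once.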

There is certainly good reason to use Excel for the Conflict Cultures project, however. First, there is great potential to use this data in GIS spatial analysis. By entering the data all in one table, it can be imported into ArcGIS and read as a point shapefile. Once in Arc, the data table can be sorted by multiple fields and further manipulated. Second, most people have access to Excel, which is an important criterion since this project is collaborative. Although we used Google Docs last semester, Excel is compatible with ArcMap and other GIS software.

In thinking about these issues, I keep coming back to a key concept I learned in my second semester of the Digital Public Humanities coursework. When creating any data collection project, the developer must be conscious of how the end user wants to use it. Although, in my opinion, a relational database is better for sorting and accessing data, Excel is probably the better option for this project. The end user is most likely going to use our data table in GIS, and Excel would be easier to import into GIS.

