Extensive Readings.


As part of my research at the Huygens Institute in The Hague, I’ve been reading A LOT lately, and on the advice of a visiting faculty member, I’ve recently started tracking all of my articles and books in my Zotero account.  You can view my library via my Zotero profile.  I haven’t done much organizing of the collection at this point, but I plan to go through, tag the contents, and break it up into thematic collections, so check back if you’re interested in digital preservation of digital humanities publications, copyright, piracy, or scholarly communication.


Semantics and Siloization: A Few Thoughts on DH BeNeLux


This morning I posted a recap of the conference elsewhere, outlining some of its major themes and takeaways.  That leaves the choicer, more granular bits to be lolled about here on my personal blog.  This post, then, is meant to provide a deeper understanding of the conference, but it is also really an avenue for me to work through a few concepts that have been sitting at the edge of my mind for the past week.

So first, to contextualize – the DH BeNeLux conference, in its first year, is a collaboration between various cultural heritage organizations and research centers in Belgium (Be), the Netherlands (Ne), and Luxembourg (Lux), and it aims to foster a sense of local community within the larger context of the digital humanities.  While there seem to be a lot of meetup groups in various parts of the world, this multi-country collaboration felt fresh to me.  Because researchers from very different cultures were still in attendance, more international issues like conference language were broached; however, unlike at a larger international conference, the use of English seemed a lot less justifiable, as none of the organizing institutions uses English officially (to my knowledge).

The conference raised a lot of interesting points overall, and even the final panel discussion felt more like the beginning of a conversation than the end.  With that in mind, here are a few discussion starters that I took away from the various DH BeNeLux sessions I attended.

1. Digital Humanities is not digital humanities is not the humanities AS digital.

Whitacre’s Virtual Choir 3

This is a really important point that was apparent in a lot of the breakout sessions, but it was truly laid bare by Fu.  A faculty member at Leiden University College in The Hague, Fu gave a presentation (specifically highlighting the Virtual Choir work of Eric Whitacre, which should be watched and rewatched in surround sound) that was a resounding hit as part of the Day One session on Crowdsourcing; however, it was actually her comments during a few other presentations that really put forth the need for a deeper discussion of semantics.

During Niels-Oliver Walkowski’s presentation (part of the generically-titled About DH session), Fu engaged him in an interesting discussion of his use of ‘digital’, in which she asserted that he was using the word predicatively, rather than attributively.  She brought up a similar point again during the final panel discussion, though apropos of what, I don’t remember.  Her assertion, though a bit difficult to engage with outright, was something I wrote in my notebook as a point to return to.  Now that I’ve had time to roll it around in my brain (and refamiliarize myself with the grammatical terms – thanks, Google), I think it’s an important thing to talk about.  How do we as Digital Humanists (capital D, capital H) differentiate ourselves from humanists working with the digital… or do we?

Further, what constitutes a digital humanities project?  Looking over some of the projects coming out of even the top digital humanities institutes around the world, I can’t help but often wonder, “isn’t that just a digital archive?”  When I raised this issue with a traditional humanist, he asserted that a lot of digital humanities projects seem like “humanities with maps”, often with questionable methodology.  While that’s just one scholar’s understanding of the field, it really points to the need to pin down a semantic understanding of ‘digital’ in this context.  Or not.  But at the very least we need to elucidate what ‘digital humanities’ means in the context of our own projects, so that we’re not all being lumped into the same, often wrong, box.

2. External factors need to be better addressed in terms of their impact on the field.

Albert Meroño-Peñuela presenting on the Short Title Catalogue, Netherlands during the Linked Open Data breakout.


One of the most interesting presentations I saw during the conference was Alastair Dunning’s, which discussed the role of copyright in digital humanities projects, asserting that copyright has, to a large extent, excluded more recent works from being studied in the field.  As a copyright nerd, this was one of those ah-ha moments, where something so obvious was laid out so succinctly that I had to wonder why I had never considered it before.  The data from Dunning’s study of some of the top DH research centers (which can be viewed in his slides) is very straightforward, and I’ll be interested to see what comes of it.

Another issue worth mentioning here is something that Max Kemman discusses in his post on the conference: the fact that many DH projects remain siloed, with little connection between datasets.  The heavily-attended Linked Data breakout on Friday provided some insight into how this is changing, but there are still issues (e.g., audience members asking how a project would map to other existing LOD projects, and presenters not having a plan in action).  One panelist, a librarian at Universiteit Gent, did address this issue in the final panel by suggesting that libraries are the place where desiloization occurs, but this raises yet another discussion point…

3. Digital humanities ≠ digital preservation, and we need to figure that out.

A common question in most of the breakouts I attended was “What are you planning to do with the data once [insert techie project name here] is complete?”, and a common answer was that other researchers will figure that out once the data is published.  While it’s one thing to leave interpretation up to the scholarly masses, it’s a whole different issue that many (though not all) digital humanists leave the long-term preservation of digital projects up to other entities, like libraries, to solve at the end of the project cycle.  As a digital preservationist myself (and now as someone studying the long-term sustainability of digital scholarly editions, har har), I can feel my blood starting to boil.  Preservation strategies need to be implemented at the beginning of a project and adapted as it progresses; otherwise the library is simply going to become a place where DH projects go to die.

This is probably a good stopping point, before the entire post becomes a heated discussion of best practices in digital preservation.  Suffice it to say that the conference was amazing: it generated great ideas and discussions, allowed researchers to share finished and mid-cycle projects and receive feedback, and even the conference swag was highly notable.


Digital Preservation 101: Undertaking a File-Level Inventory for Preservation Planning

This entry was originally posted on January 13, 2014.  It outlines the institution-wide file format inventory that I undertook at Dumbarton Oaks as part of my work as a Library of Congress National Digital Stewardship Residency fellow.

In order to develop a better understanding of the holdings at Dumbarton Oaks as part of my NDSR project, I have been working on a file-level inventory that can hopefully be embedded in a digital preservation workflow process at DO in the future.

The benefits of an inventory are manifold, but these are a few that I highlighted in a recent presentation (all adapted from an outside source):


The inventory basically tells us what we have, how much we have, where we have it, and, most importantly, what user behaviors surround the creation and management of digital assets.  Keep these goals in mind as you work, because undertaking a file-level inventory won’t be easy.  There really aren’t many tools out there, and the ones that exist require a pretty solid base of technical knowledge.
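Before reaching for a dedicated tool, it can help to see what a file-level inventory actually records.  Here’s a minimal sketch in Python – my own illustration, not part of the DO workflow – that walks a directory tree and writes each file’s path, extension, size, and last-modified date to a CSV.  Note that it identifies formats only by extension, unlike the signature-based identification that tools like JHOVE2 and DROID perform:

```python
import csv
import os
from datetime import datetime, timezone

def inventory(root, out_csv):
    """Walk `root` and record path, extension, size, and last-modified
    date for every file -- the 'what, how much, and where' of a
    file-level inventory."""
    with open(out_csv, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["path", "extension", "size_bytes", "last_modified"])
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                stat = os.stat(path)
                # Lowercase the extension so "IMG.JPG" and "img.jpg" tally together
                ext = os.path.splitext(name)[1].lower()
                modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
                writer.writerow([path, ext, stat.st_size,
                                 modified.date().isoformat()])
```

Even a crude report like this answers the first-order questions; the dedicated tools discussed below add format identification and validation on top.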

The two tools I decided to try out were JHOVE2 and DROID.

The first, JHOVE2, was my main focus.  On top of format identification, it includes validation of files, which is an added bonus compared to DROID.

The drawbacks of JHOVE2, however, proved pretty insurmountable for my project implementation: it requires the now-outdated Java 6, and it lacks a GUI.


Command line, anyone?

The main problem I ran up against with JHOVE2, however, wasn’t the actual implementation (all of the basic commands are outlined in the handbook, so even a relative novice can run it), but rather the reporting.  After going through all of the steps, the tool spat out a massive jumble of text that I was unable to make sense of.  After consulting the forums and our in-house IT specialist at Dumbarton Oaks, I had committed too much time to JHOVE2 and still couldn’t process the inventory reports, so I decided, for the sake of moving the project forward, to go with DROID instead.

The most recent version of DROID is a lot more accurate than older versions, and the install is incredibly easy (for Windows: download the ZIP file, unzip, run the BAT file, done).

The interface is also a whole lot prettier than JHOVE2:


But of course, there were still (mysterious) problems.


Beyond the occasional crash, the tool’s output is fairly readable, especially if you pre-read the DROID user guide.

Here’s a small example of what the final reports look like:


DROID is helpful for identifying preservation issues, like the one flagged above.  The report also provides information like MIME type (for a top-level idea of the general types of media), date last modified (which I found really helpful for determining whether a drive was full of archival assets or everyday files), and file format and size.
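Once DROID has exported a profile to CSV, summarizing those fields takes only a few lines.  Here’s a small sketch that tallies file counts and total bytes per MIME type; the column names (`TYPE`, `MIME_TYPE`, `SIZE`) are my assumption based on DROID’s CSV export, so verify them against your version’s actual output:

```python
import csv
from collections import defaultdict

def summarize_by_mime(droid_csv):
    """Tally file counts and total bytes per MIME type from a DROID-style
    CSV export.  Column names (TYPE, MIME_TYPE, SIZE) are assumed --
    check them against the headers your version of DROID writes."""
    counts = defaultdict(lambda: [0, 0])  # mime -> [file count, total bytes]
    with open(droid_csv, newline="") as fh:
        for row in csv.DictReader(fh):
            if row.get("TYPE") == "Folder":  # skip directory rows
                continue
            mime = row.get("MIME_TYPE") or "(unidentified)"
            counts[mime][0] += 1
            counts[mime][1] += int(row.get("SIZE") or 0)
    return dict(counts)
```

A summary like this makes it easy to spot, say, a drive dominated by unidentified files – exactly the kind of preservation red flag the inventory is meant to surface.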

While I tried out these two tools, there are other possibilities to check out.  Some all-in-one preservation tools also integrate file inventorying, sometimes referred to as preservation planning.