Digital Preservation 101: Undertaking a File-Level Inventory for Preservation Planning

This entry was originally posted on January 13, 2014.  It outlines the institution-wide file format inventory that I undertook at Dumbarton Oaks as part of my work as a Library of Congress National Digital Stewardship Residency fellow.

To develop a better understanding of the holdings at Dumbarton Oaks as part of my NDSR project, I have been working on a file-level inventory that I hope can eventually be embedded in a digital preservation workflow at DO.

The benefits of an inventory are manifold, but these are a few that I highlighted in a recent presentation (all originally adapted from ):


The inventory basically tells us what we have, how much we have, where we have it, and most importantly, what user behaviors surround the creation and management of digital assets.  Keep these goals in mind as you are working, because undertaking a file-level inventory won’t be easy.  There really aren’t a lot of tools out there, and the ones that do exist require a pretty solid base of technical knowledge.
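To make the "what, how much, and where" questions concrete, here is a minimal Python sketch that walks a folder and tallies files by extension. This is not what JHOVE2 or DROID do: those tools identify formats from the file contents, whereas this only looks at file names, so treat it as a rough first pass. The function name `tally_by_extension` is my own invention for illustration.

```python
import os
import sys
from collections import defaultdict


def tally_by_extension(root):
    """Walk `root` and return {extension: [file_count, total_bytes]}.

    Extension-only: a file's real format may differ from its name.
    """
    totals = defaultdict(lambda: [0, 0])
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # Skip files that vanish or cannot be read.
                continue
            ext = os.path.splitext(name)[1].lower() or "(none)"
            totals[ext][0] += 1
            totals[ext][1] += size
    return dict(totals)


if __name__ == "__main__":
    # Usage: python tally.py <folder>  (defaults to the current folder)
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for ext, (count, size) in sorted(tally_by_extension(root).items()):
        print(f"{ext}\t{count} files\t{size} bytes")
```

Running this against each shared drive gives a quick sense of scale before committing to a full format-identification tool.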

The two tools I decided to try out were JHOVE2 and DROID.

The first was my main focus, as .  On top of this, JHOVE2 includes validation of files, which is an added bonus when compared to DROID.

The drawbacks of JHOVE2, however, proved pretty insurmountable for my project.  They included the need to run the now-outdated Java 6 and the lack of a GUI.


Command line, anyone?

The main problem that I ran up against with JHOVE2, however, wasn’t the actual implementation (all of the basic commands needed are outlined in the handbook, so even a relative novice can run it), but rather the reporting.  After going through all of the steps, the tool was spitting out a massive jumble of text that I couldn’t make sense of.  I consulted the forums and our in-house IT specialist at Dumbarton Oaks, but I still couldn’t process the inventory reports.  By that point I had committed too much time to JHOVE2, so for the sake of moving the project forward, I decided to go with DROID instead.

The most recent version, , is a lot more accurate than older versions.  The install is incredibly easy (for Windows: download ZIP file, unzip, run BAT file, done).

The interface is also a whole lot prettier than JHOVE2:


But of course, there were still (mysterious) problems.


Beyond the occasional crash, the tool’s output is fairly readable, especially if you pre-read the user guide referenced above.

Here’s a small example of what the final reports look like:


DROID is helpful for identifying preservation issues, like the  above.  The report also provides information like MIME type (to get a top-level idea of general types of media), date last modified (I found this really helpful for determining whether a drive was full of archival assets or everyday files), and file format and size.
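If you export the DROID results to CSV, a short script can roll the fields mentioned above into a summary. The sketch below is hedged in a few ways: the column names (`TYPE`, `SIZE`, `LAST_MODIFIED`, `EXTENSION_MISMATCH`, `MIME_TYPE`, `FORMAT_NAME`) are assumed to match your export, so check them against your own file's header row; the sample rows are made up for illustration; and the two "flags" (an extension mismatch or no identified format) are my own choice of things worth a second look, not necessarily the issue shown in the report above.

```python
import csv
import io
from collections import Counter

# Invented sample rows standing in for a real DROID CSV export.
SAMPLE_EXPORT = """\
NAME,TYPE,SIZE,LAST_MODIFIED,EXTENSION_MISMATCH,MIME_TYPE,FORMAT_NAME
report.doc,File,24576,2009-03-14T10:22:05,false,application/msword,Microsoft Word Document
scan001.tif,File,1048576,2012-11-02T16:40:31,false,image/tiff,Tagged Image File Format
photo.jpg,File,204800,2013-06-21T09:05:00,true,image/png,Portable Network Graphics
notes,File,512,2013-12-01T08:00:00,false,,
"""


def summarize(rows):
    """Summarize file rows by top-level media type and last-modified year."""
    by_media_type = Counter()
    by_year = Counter()
    total_bytes = 0
    flagged = []
    for row in rows:
        if row["TYPE"] != "File":
            continue  # skip folder rows
        total_bytes += int(row["SIZE"] or 0)
        # "image/tiff" -> "image"; an empty MIME type means no identification.
        by_media_type[row["MIME_TYPE"].split("/")[0] or "unidentified"] += 1
        by_year[row["LAST_MODIFIED"][:4]] += 1
        if row["EXTENSION_MISMATCH"].lower() == "true" or not row["FORMAT_NAME"]:
            flagged.append(row["NAME"])
    return {
        "by_media_type": dict(by_media_type),
        "by_year": dict(by_year),
        "total_bytes": total_bytes,
        "flagged": flagged,
    }


if __name__ == "__main__":
    # For a real export, replace io.StringIO(...) with open("export.csv", newline="").
    summary = summarize(csv.DictReader(io.StringIO(SAMPLE_EXPORT)))
    for key, value in summary.items():
        print(key, value)
```

The per-year counts are the part I would look at first: a drive where almost nothing has been modified in years looks very different from one full of everyday working files.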

While I tried out these two tools, there are other possibilities to check out.  See this .  Some all-in-one preservation tools like also integrate file inventorying, sometimes referred to as preservation planning.