Featured post

National Digital Stewardship Residency in New York

Metropolitan New York Library Council, in partnership with Brooklyn Historical Society, is implementing the National Digital Stewardship Residency (NDSR) program in the New York metropolitan area through generous funding from the Institute of Museum and Library Services (IMLS) via a 2013 Laura Bush 21st-Century Librarian Program grant.

The National Digital Stewardship Residency is working to develop the next generation of digital stewardship professionals who will be responsible for acquiring, managing, preserving, and making accessible our nation’s digital assets. Residents serve nine-month, paid residencies at host institutions, working on digital stewardship initiatives. Host institutions receive the dedicated contribution of a recent graduate who has received advanced training in digital stewardship.

An affiliated project, NDSR Boston, was awarded by IMLS to Harvard Library, in partnership with MIT Libraries, to implement the fellowship program in Boston, MA. Hosts and residents from both programs will participate in the broader NDSR network of professionals working to advance digital preservation and develop a sustainable, extensible model for postgraduate residencies combining advanced training and experiential learning focused on the collection, preservation, and availability of digital materials. See the About NDSR-NY page for additional program and contact information.

Science: The Final Frontier (of digipres)

Science: the final frontier. These are the voyages of Vicky Steeves. Her nine-month mission: to explore how scientific data can be preserved more efficiently at the American Museum of Natural History, to boldly interview every member of the science staff involved in data creation and management, to go into the depths of the Museum where none have gone before.

Hi there. Digital preservation of scientific data is criminally under-addressed nationwide. Scientific research is increasingly digital and data intensive, with repositories and aggregators built every day to house this data. Popular aggregators in natural history include the NIH-funded GenBank for DNA sequence data and the NSF-funded MorphBank for image data of specimens. These aggregators are places where scientists submit their data for dissemination, and they act as phenomenal tools for data sharing; however, they cannot be relied upon for preservation.

AMNH scientists at work in the scorpion lab: http://scorpion.amnh.org/page19/page23/page23.html

Science is, at its core, the act of collecting, analyzing, refining, re-analyzing, and reusing data. Reuse and re-analysis are important parts of the evolution of our understanding of the world and the universe, so to carry out meaningful preservation, we as digital preservationists need to equip those future users with the tools necessary to reuse said data.


Therein lies the biggest challenge of digital preservation of scientific data: the very real need to preserve not only the dataset but also the ability to deliver that knowledge to a future user community. Technical obsolescence is a huge problem in the preservation of scientific data, due in large part to the field-specific proprietary software and formats used in research. This software is sometimes even project-specific, and often not backwards compatible, meaning that a new version of the software cannot open a file created in an older version. This works against both access and preservation.
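To make the obsolescence problem concrete: when documentation is gone, one of the few clues to a file’s identity is its leading “magic number” bytes. Below is a minimal, illustrative format sniffer in Python; the signature table is a tiny sample I’ve assembled for demonstration, not the method any particular institution uses.

```python
# Identify common file formats by their "magic number" signatures --
# a first line of defense when appraising media full of undocumented files.
# The signature table is a small illustrative sample, not exhaustive.

MAGIC_NUMBERS = {
    b"%PDF": "PDF document",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"PK\x03\x04": "ZIP container (also DOCX/XLSX/EPUB)",
    b"\xd0\xcf\x11\xe0": "OLE2 compound file (legacy MS Office)",
}

def sniff_format(first_bytes: bytes) -> str:
    """Return a best-guess format label for a file's leading bytes."""
    for signature, label in MAGIC_NUMBERS.items():
        if first_bytes.startswith(signature):
            return label
    return "unknown format"
```

In practice, tools like DROID or the Unix `file` utility consult far larger signature registries, but the principle is the same.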


An example of obsolete database software: popular back in the day, but not widely used today.

Digital data are not only research output, but also input into new hypotheses and research initiatives, enabling future scientific insights and driving innovation. In the case of the natural sciences, specimen collections and taxonomic descriptions from the 19th century (and earlier) are still used in modern scientific discourse and research. There is a unique concern in the digital preservation of scientific datasets, where the phrase “in perpetuity” has real usability and consequence: these data have value that will only increase with time. A hundred years from now, historians of science will look to these data to document the processes of science and the evolution of research. Scientists themselves will use these data for additional research or even comparative study: “look at the population density of this scorpion species in 2014 versus today, 2114; I wonder what caused the shift.” Some data, particularly older data, aren’t necessarily replicable, and in that case the value of the material for preservation increases exponentially.

The resulting question is how to develop new methods, management structures, and technologies to manage the diversity, size, and complexity of current and future datasets, ensuring they remain interoperable and accessible over the long term. With this in mind, it is imperative to develop an approach to preserving scientific data that continuously anticipates and adapts to changes in both the popular field-specific technologies and user expectations.

Open Science

Published data aren’t the end-all-be-all of digipres for science. There are a lot of data that need our help! http://www.opensciencenet.org/

There is a pressing need for digital preservationists to look after scientific data. While strides have been made by organizations such as the National Science Foundation, the Interagency Working Group on Digital Data, and NASA, no overarching methodology or policy has been accepted by the scientific fields at large. And this needs to change.

The library, computer science, and scientific communities need to come together to make decisions for preservation of research and collections data. My specific NDSR project at AMNH is but a subset of the larger collaborative effort that needs to become a priority in all three fields. It is the first step of many in the right direction that will contribute to the preservation of these important scientific data. And until a solution is found, scientific data loss is a real threat, to all three communities and our future as a species evolving in our combined knowledge of the world.

I will leave you, dear readers, with a video from the Alliance for Permanent Access conference in 2011. Dr. Tony Hey speaks on data-intensive scientific discovery and digital preservation and exemplifies perfectly the challenges and importance of preserving digital scientific research data:

Capturing a Shadow: Digital Forensics Applications with Born-Digital Legacy Material

Hi – Julia here. Like Shira, I spent the last week attending the Association of Moving Image Archivists (AMIA) conference in Savannah, Georgia. AMIA brings together approximately 500 moving image professionals and students from all over the world for a wide variety of workshops, panels, and special screenings. While it’s too much to cover in depth, those of you unfamiliar with the conference can check out the program as well as another blog post by NDSR Boston resident Rebecca Fraimow.

As part of my conference experience, I chaired and moderated a panel on digital forensics applications with personal collections, with speakers Elizabeth Roke, Digital Archivist at Emory University, and Peter Chan, Digital Archivist at Stanford University. The work of both directly corresponds to my projects at NYU Libraries, where I’ve been tasked with developing the infrastructure, policy, and workflows to preserve and make accessible born-digital collections. One of the major collections I’m working on is the Jeremy Blake papers and his “time-based paintings.” I’ll detail my progress in this area in my next post.

This panel was the first of its kind to be introduced to the AMIA community. There had been no previous discussion of digital forensics concepts, use cases, or projects, making this panel a unique experience for me as both a moderator and a community member. Digital forensics has its origins in the legal and criminal investigative worlds, but its support of archival principles such as provenance, chain of custody, and authenticity has driven its recent adoption in the archives.

While digital forensics is an emerging field, both Emory and Stanford were among the first universities to create forensics labs and acquire the equipment to process backlogs of obsolete born-digital media. Rarer still, neither Emory nor Stanford stops at ingest: both have processed collections that are now accessible to researchers.


Elizabeth Roke, Emory University

Elizabeth Roke began her presentation with an introduction to digital forensics concepts and workflows. She strongly emphasized, however, that there is a huge gap between ideal workflows and the reality of born-digital processing, a theme that recurred throughout the panel and Q&A. She often, for example, finds herself processing media with little to no documentation. File names can be baffling and nondescript (“ ,,,,,,.doc”). Chain of custody may already be broken by well-meaning individuals who have copied files over, permanently altering their time stamps. Additionally, Elizabeth stressed that disk imaging itself–the initial process of capturing a bit-for-bit copy of the media in a more actionable format–could take a lot of time, effort, and experimentation, and was not always successful.
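One safeguard that survives even a broken chain of custody is fixity: checksums computed over a disk image let you demonstrate later that the bits haven’t changed since capture. A minimal sketch in Python (my own illustration, not Emory’s workflow):

```python
import hashlib

def image_checksums(path, chunk_size=1 << 20):
    """Compute MD5 and SHA-256 of a disk image in a single pass,
    reading in 1 MiB chunks so multi-gigabyte images don't exhaust memory."""
    md5, sha256 = hashlib.md5(), hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            md5.update(chunk)
            sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}
```

Recording these values at imaging time, and re-verifying them on a schedule, is the standard way to document that an image remains authentic.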


Elizabeth also updated us on Emory’s seminal work on Salman Rushdie’s four personal computers, as well as some recently processed, less complicated collections, such as the Alice Walker papers. Preserving and providing access to the Rushdie computers involved significant dedicated staff time, both because of the demanding technical requirements of full-scale personal computing emulation and because of the number of nuanced access levels and restrictions. Few institutions can dedicate the resources to make such a project happen. While the earliest Rushdie computer was emulated and accessible in February 2010, the remainder of the Rushdie born-digital collections is not yet accessible. She contrasted that immense project with the Alice Walker papers, a collection of word processing files on floppy disks with comparatively few restrictions.


Peter Chan, Stanford University

Peter Chan then jumped in via Skype to discuss and demo ePADD (email: Process, Accession, Discovery), a Stanford project specifically tackling email processing that is scheduled for an April 2015 public release (NYU Libraries is a beta-tester on this project). One of the great things about ePADD is that it addresses a major stumbling block in born-digital access: personal and private information extraction. ePADD extracts text for keyword searches to cull sensitive content, such as health information, Social Security numbers, credit card numbers, and any other topics deemed private by a donor. While you can read much more about ePADD in a recent post, one of the fun aspects is the visualizations it makes possible. ePADD mines email records and can create cool visualizations displaying word usage and correspondence over time, as seen below with the Robert Creeley papers image (courtesy of Peter):
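To give a flavor of the kind of pattern matching involved in sensitive-content screening, here is a deliberately simplified sketch in Python. This is my own illustration, not ePADD’s actual implementation, which handles far more (named entities, donor-specified topics, and so on):

```python
import re

# Illustrative patterns only -- a real screening tool uses far more
# sophisticated extraction than two regular expressions.
SENSITIVE_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),          # e.g. 123-45-6789
    "credit_card": re.compile(r"\b(?:\d[ -]?){15}\d\b"),  # 16 digits, optional separators
}

def flag_sensitive(text):
    """Return the names of the sensitive patterns found in a block of text."""
    return [name for name, pattern in SENSITIVE_PATTERNS.items()
            if pattern.search(text)]
```

Flagged items can then be routed to an archivist for review before anything is exposed to researchers.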



“The Roke and Chan virus gap”

At the end of the panel, audience members asked some interesting questions highlighting different institutional responses. For example, how does each institution handle viruses? Are they worth preserving despite the risks? Stanford preserves the whole disk image, viruses and all. Emory, it turns out, also preserves the disk image with viruses, but excludes them from any exported files; other institutions may exclude viruses from ingest altogether. This is only one example of differences not just in methods, but in the values and evaluations determining what counts as an object of study for future researchers. One person’s context is another person’s text. At this point, we can all only speculate; the researchers aren’t there yet.

While my task is to develop policies addressing issues like this, I’m not yet sure what we’ll be doing in this area! From my preliminary surveys, I can already tell that preserving and making accessible Jeremy Blake’s work, for example, will present a whole other set of considerations due to its artistic context. Determining its essential qualities will present challenges that a text-based record, for example, wouldn’t. I’ll blog more about that in my next post!


AMIA 2014: Open Source Digital Preservation & Access Stream

Hi everyone, Shira here. Last weekend I attended the Association of Moving Image Archivists Conference in Savannah, GA. For those of you who don’t already know, AMIA is a nonprofit international association dedicated to the preservation and use of moving image media. Although the conference has traditionally focused more on issues surrounding the preservation of analog film and video, in recent years it has brought the subject of digital preservation to the fore.

This year was no exception. One of the three curated programming streams that comprised this year’s AMIA conference was devoted to the open source software in use within the digital preservation community, and by my count, 22 of 53 panels (~42%) were directly related to digital preservation. There was also significant buzz around the projects developed as part of the second annual Hack Day, the goal of which is to design and/or improve upon practical, open source solutions for digital audiovisual preservation and access.

Although enough ground was covered at AMIA to provide days’ worth of blogging fodder, my post today is going to focus on the Open Source Digital Preservation & Access stream, which served as a showcase for a variety of exciting tools for the preservation and access of digital video and born-digital moving images.

AMIA swag bags

What does “open source” mean, and why is it important for digital preservation?
Open source software is made available with a license that permits users to freely run, study, modify, and re-distribute its source code. Open source tools are usually developed in a public, collaborative forum such as GitHub or Gitlab. This means that any users can improve, fix, or add onto the source code over time.

Using the open source model to develop tools for digital preservation has a number of advantages over proprietary software. The principal benefit is cost: most open source tools are available to the public at no cost, which is a big deal for the many perennially cash-strapped organizations in the archives community. Another benefit of the open source model is versatility. Making the source code available to the public allows tools to be perpetually refined and modified according to the needs of a particular group, and as Bill LeFurgy explains in a 2011 blog post on The Signal, open source also “gives organizations the opportunity to stitch together a preservation system from existing components rather than laboriously start from scratch.” The last benefit I’ll mention here is that open and freely accessible code is far simpler to preserve than closed proprietary code, making it more likely that open source tools themselves will be around in the future.

#osdpa / #amia14
In case you missed the AMIA conference but still want to look into some of the things that were discussed there, you can always play catch-up by following the #osdpa and #amia14 hashtags on Twitter. AMIA has a contingent of active tweeters (myself included), and the good news is that the intrepid Ashley Blewer has put together a twarc archive of tweets from this year’s AMIA conference in JSON and text formats (available here).
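If you download the JSON version of that archive, each line is one tweet in the Twitter v1.1 JSON layout that twarc captured at the time. A small sketch (assuming that layout, with hashtags stored under entities.hashtags[].text) for pulling out the tweets carrying a given hashtag:

```python
import json

def tweets_with_hashtag(jsonl_path, tag):
    """Yield tweets from a line-oriented twarc archive that carry a hashtag.
    Assumes the Twitter v1.1 JSON layout, where hashtags live under
    entities.hashtags[].text (without the leading '#')."""
    tag = tag.lstrip("#").lower()
    with open(jsonl_path) as f:
        for line in f:
            tweet = json.loads(line)
            hashtags = tweet.get("entities", {}).get("hashtags", [])
            if any(h.get("text", "").lower() == tag for h in hashtags):
                yield tweet
```

For example, `tweets_with_hashtag("amia14.jsonl", "#osdpa")` would walk the archive and yield only the open source stream tweets.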

She’s also put up a list including links to additional information of all the technologies discussed during the conference, either within the Open Source Digital Preservation & Access Stream, at the Hack Day, or at-large. It’s an extremely valuable resource and I highly recommend giving it a look.

Trevor Thornton talking as part of the “Open Source Tools, Technologies and Considerations” panel. Photo courtesy of Kathryn Gronsbell

Interview with Chris Lacinak
Although following the #osdpa hashtag will give readers a good sense of the talks that comprised the open source stream, I wanted to offer some background information on the stream and why it was put together. Before leaving Savannah I spoke with AVPreserve’s Chris Lacinak, who curated the Open Source stream. Our conversation is below:

Shira Peltzman (SP): How did the open source stream come into being?

Chris Lacinak (CL): Last year the Digital Library Federation (DLF) and AMIA sponsored the first AMIA/DLF Hack Day and there was a lot of excitement within AMIA about that event. Thanks to the hard work of Kara Van Malssen, Lauren Sorenson, and Steven Villereal it was a big success and was well received, not just by the people who took part, but by the rest of the membership at large; it gave them an opportunity to engage with that type of event and see what kinds of things happen there. The energy and buzz that came out of the first Hack Day was great. I was the AMIA board liaison to the Open Source Committee, and at the committee meeting the members made it clear that they wanted open source to be a greater part of AMIA. We talked about ways to make that happen, and one of the ideas was to have an Open Source stream as part of the conference. I went back and pitched it to the board and they were very much in favor, asking me to be the stream’s curator.

SP: I noticed that many of the panels at the Open Source stream were standing room only. Why do you think this stream was so popular?

CL: Standing room only, and that was accidental! Originally the room we were in was supposed to be half the size that it actually was. Clearly we touched a nerve within the membership. Software has become an integral component of digital preservation practice. Open source software has been embraced wholeheartedly by the archival community largely based on preservation principles as well as budgetary considerations. However, there is still a lack of clarity regarding the process and component parts that make up open source software projects, and people are really interested to get their heads around this. It was also interesting that lots of the people filling the seats had minimal experience with open source or Hack Days or anything like that. So a lot of people were new but were clearly enthralled with the topics and types of tools they were seeing. The other thing is that obviously both digital preservation and access are huge things; people are hungry for content around digital media in general so this served both the digital preservation and access folks and also the open source interest.

Inaugural session of the Open Source Digital Preservation & Access stream at AMIA 2014

SP: Are there any trends that you noticed this year among the presentations in the Open Source stream?

CL: I think that both the tools and the understanding have reached a level of maturity that’s very real now. Lots of conversations on open source within the community a few years ago were painted in an experimental light, as if it were something for folks on the fringe and not for “real” archives. Now it feels very central and real. In the stream presentations I really noticed, one, the sophistication of the tools; two, all of the presenters were very articulate; and three, the audience was able to receive and process the information in a way that hasn’t happened in the past. So I think there’s definitely a maturity within this ecosystem that’s new and interesting and lets us operate on a different level than in the past.

SP: What are some of the open source projects that you’re most interested in seeing developed in the coming year that were discussed at Hack Day and during the open source stream?

CL: First and foremost I want to point everyone to the Hack Day wiki which also has links to all of the Hack Day projects on GitHub. (Find this here).

All of the projects are really amazing and deserve to be reviewed. The prize winning projects this year were Video Sprites! and Hack Day Capture, with AV Artifact Atlas getting a special jury prize. Personally I would encourage people to take a look at Video Sprites!, the Video Characterization Comparison Viewer, and the ffmpeg Documentation projects. Speaking of a documentation project, another thing that I loved this year was the addition of an Edit-A-Thon (part of AMIA/DLF Hack Day), because what you find is that a large number of open source tools are poorly documented, seriously limiting their usability. Documentation projects really answer a huge need, so I think it’s great to have gotten this done. It’s extremely important and valuable work.

In the stream, QC Tools is really exciting. I think it’s an amazing tool with a feature set that rivals commercial offerings, and I’m excited to see it continue to grow, have new features added, and be used by more organizations. I am interested in combining QC Tools with our tool, MDQC, in one package. Erik Piil gave a lightning talk on an open hardware cleaner he’s working on, which is awesome. And on a larger scale, MoMA’s development of the first digital preservation repository for museum collections is a phenomenal project. It’s hard to pick, though, because they really are all great. If I didn’t think they were awesome I wouldn’t have picked them for the stream! The entire stream will be posted online for those who are interested in watching the presentations. Keep an eye on the AMIA website.

SP: Well congrats on a really successful stream.

CL: Thank you.

What I learned at summer camp: A trip to THATCampPhilly

Karl here. Last month I made the trip back down to my hometown of Philadelphia to attend THATCampPhilly, a day of meet-ups among technologists and humanists organized under the broad supervision of local digital humanities interest group PhillyDH. In the “unconference” model, a handful of technical workshops were prearranged, but the event was open to proposals for less formal talks and discussions leading right up to and throughout the morning convocation, and provided attendees opportunities to network over lunch, cocktails, and dinner in Philadelphia’s historic Old City neighborhood. Over 100 attended the conference, including representatives from the city’s central anchor institutions like the University of Pennsylvania, Drexel University, and our host, the Chemical Heritage Foundation (CHF), as well as those from regions as peripheral as Delaware, DC, and the Lehigh Valley.

Attendees propose, combine, and arrange unconference sessions during the morning convocation

In the morning I attended an introductory training in APIs for cultural heritage organizations. Nabil Kashyap, Librarian for Digital Initiatives and Scholarship at Swarthmore College, taught attendees the very basics of JavaScript and how to use it to query the vast metadata store compiled by the Digital Public Library of America (DPLA). Using just our laptops, with their text editors and browsers, we learned how to identify ourselves to DPLA’s API, how to structure and send a request for its data, and ultimately how to interpret and render what we receive back for use on our organizations’ websites. While I had used JavaScript very briefly for web design, this was my first experience issuing proper cURL and Ajax requests from a command line interface. Getting all ~20-25 participants up to the same speed was tricky, of course; the myriad contextual conditions necessary to consistently render everyone’s results neatly reflected the problem of capturing the same kind of dynamic, script-generated content that my host institution, NYARC, needs to archive regularly.
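For a taste of what identifying yourself to the API looks like, here is a sketch of assembling a DPLA v2 items request URL using the documented q, page_size, and api_key parameters. The workshop worked in JavaScript; for consistency with the other sketches on this blog I’ve written it in Python, and the key value here is just a placeholder:

```python
from urllib.parse import urlencode

DPLA_ITEMS_ENDPOINT = "https://api.dp.la/v2/items"  # v2 endpoint

def build_dpla_query(api_key, search_term, page_size=10):
    """Assemble a DPLA items request URL. The q, page_size, and api_key
    parameters are documented by the DPLA v2 API; every request must
    carry your registered key."""
    params = urlencode({
        "q": search_term,
        "page_size": page_size,
        "api_key": api_key,
    })
    return f"{DPLA_ITEMS_ENDPOINT}?{params}"
```

The resulting URL can be fetched with cURL on the command line or an Ajax call in the browser, exactly as we did in the workshop.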

Participants in Nabil Kashyap’s API workshop learn some basic scripts

In the afternoon I attended a video production workshop led by Nicole Scalessa, IT Manager at the Library Company of Philadelphia. Having a background in film-industry-standard hardware and software, I was eager to learn for the first time about low- and no-cost alternatives feasible for non-profit ventures and others on tight budgets. Scalessa led participants in hands-on planning and budgeting for social media campaigns that rely heavily on audio and video, and walked us through the process of creating web-quality products with free tools like the audio editing suite Audacity and the video editing platform Lightworks, among others. As I get more comfortable with these tools, through practice and through a little application here on the blog, I’d love to coordinate a similar education session in New York, so let me know if you’re interested!

Unconference sessions
In the morning I attended an impromptu discussion group considering the persistent issue of leading digital humanities projects without formal institutional support–with no support at all, as session facilitator Jerry Yarnetsky, Emerging Technologies Librarian at Montgomery County Community College, framed it. This quickly became a discussion of participants’ personal experiences with low/no-cost software for content management and/or data storage. Several of us used the opportunity to remind our peers of the importance of investing in software and services that support exporting our precious digital objects to open source or at least industry-standard file formats and wrappers–of having an ‘exit strategy’ for our data.

When I asked for examples, a few participants shared their experiences seeking/achieving buy-in from bodies outside of their institutions, but for the most part there was ambivalence about organizations or projects that enable resource sharing among institutionally under-supported efforts.

Suppose we were to build a hub for digital humanities resource sharing in the Mid-Atlantic region from scratch–what would it look like? Maybe something like METRO, I thought, but I proposed an unconference session for the last timeslot of the day to brainstorm exactly what such an organization would provide. About 25 attendees participated in the discussion. In general, they expressed their appreciation for grassroots and spontaneous learning opportunities that relate directly to their active projects, connect them with resourceful professionals within the digital humanities network, and promote alternative forms of credentialing beyond the PhD/MLS. They also, however, felt constrained by geographic dispersion, lack of time to plan or attend more events, and the perceptions of digital humanities both within their institutions and in our own community when it lacks a compelling business case or ‘value story.’ A desire to better understand the full community of institutions and practitioners that fall under the regional digital humanities umbrella and their respective skills–a thorough needs and resources assessment–abounded.

My live whiteboard scribblings from the discussion of existing and aspirational resources for the local digital humanities community

Of course I couldn’t let the day pass without connecting with others striving to steward web culture to future generations with NYARC’s own chosen software service, Archive-It. Over lunch, archivists from CHF, Temple University, and the Presbyterian Historical Society shared with me the technical challenges that they regularly face and their plans to organize locally for more collaborative solutions. Again, you can expect to see something similar pop up in New York very soon!

The day ended with a lovely reception at Old City’s historic Physick House, where the combination of the day’s fun and a few adult beverages left everyone inspired and optimistic about future collaborations. The conversations started at the above workshops, discussions, and hallway encounters continue through rapidly circulating Google docs and on Twitter (#THATCampPhilly), so like any good summer camp experience, everyone made a lot of new friends and came away a good bit wiser.

Hello, World!

Welcome to the brand new NDSR-NY residents’ blog! We are Vicky Steeves, Peggy Griesinger, Shira Peltzman, Karl-Rainer Blumenthal, and Julia Kim. We’ll use this space to share updates from our project work, our personal research, and from all four corners of the ever-expanding globe of digital stewardship. Preserving digital culture becomes more challenging and more necessary by the minute, so we’re very excited to share and to hear your thoughts about some truly innovative practices, tools, and resources.

Residents' IDs

Hi! My name is Vicky (@VickySteeves), and I’m the resident at the American Museum of Natural History. I am surveying digital assets in the scientific research divisions, then writing a recommendation for digital preservation of said assets. My blog posts will deal with my time at AMNH, digital preservation for scientific data, and emerging tech that affects the field. When I’m not wandering around AMNH, I’m cuddling with my cat, Little Boss, and crafting.

This is Peggy (@peggygriesinger), the resident at the Museum of Modern Art. My project involves researching and identifying schemas, ontologies, and controlled vocabularies to describe the process history of the digitization of audiovisual materials. This will then be integrated into the museum’s new digital repository to allow for better preservation of the museum’s collections. My blog posts will detail my work at MoMA, interesting findings from my research, and my experiences at conferences and workshops. In my free time, I enjoy reading historical fiction and baking cupcakes.

Shira (@shirapeltzman) here. I’m the resident at Carnegie Hall. During my time there I’ll design workflows for the acquisition, storage, and long-term management of born-digital assets; configure and implement a new Digital Asset Management System; and create an inventory of born-digital assets. These will ultimately inform requirements for the long-term preservation and sustainability of digital objects at Carnegie Hall. In addition to project updates, I’ll blog about conferences I attend, readings germane to digital preservation, and some of the tools and strategies that I’ll be using to complete my project. When I’m not at work I’m usually hiking somewhere upstate, riding my bike, or at the movies.

Hi, everyone. I’m Karl (@landlibrarian). I’ll spend my residency designing and implementing quality assurance, preservation metadata, and archival storage practices for the web archiving program of the New York Art Resources Consortium (NYARC). The project highlights two key aspects of digital stewardship that inspire me: the problem of defining integrity and authenticity among dynamic objects, and the leadership that multi-institutional collaborations provide on issues so complex. I’ll write about both while I’m not taking in New York’s great galleries, parks, film screenings, or just Skyping home to my cats.

Hi, I’m Julia Kim (@jy_kim29). I’ll be at NYU Libraries developing infrastructure, policy, and workflows to make born-digital archival collections discoverable and accessible to researchers. This encompasses both file-based material and material on obsolete media requiring digital forensics adaptations. Two of the major collections that we are looking at in connection with Fales Library and Special Collections are the Jeremy Blake papers and the Exit Art Archives. My blog posts will cover my work here, conferences and workshops, and readings and related topics of interest. When I’m not in the lab, I also enjoy digitizing obsolete audiovisual media with the XFR Collective.

We hope that this space will become a useful sounding board for digital preservationists and archives enthusiasts. We’ll post updates a few times every week, so check back early, often, and share your thoughts with us.

Welcome the First NDSR-NY Cohort

Welcome the first NDSR-NY cohort! Each resident demonstrated a strong commitment to digital stewardship, along with exceptional talent and skill. More about each resident can be found on the Residents page.


From left to right:

Victoria (Vicky) Steeves, Master of Library and Information Science, Simmons College
Host institution: American Museum of Natural History

Peggy Griesinger, Master of Library Science, Indiana University
Host institution: Museum of Modern Art

Karl-Rainer Blumenthal, Master of Library and Information Science, Drexel University
Host institution: New York Art Resources Consortium

Shira Peltzman, Master of Arts, New York University
Host institution: Carnegie Hall

Julia Kim, Master of Arts, New York University
Host institution: New York University Libraries