
National Digital Stewardship Residency in New York


Metropolitan New York Library Council, in partnership with Brooklyn Historical Society, is implementing the National Digital Stewardship Residency (NDSR) program in the New York metropolitan area through generous funding from the Institute of Museum and Library Services (IMLS) via a 2013 Laura Bush 21st-Century Librarian Program grant.

The National Digital Stewardship Residency is working to develop the next generation of digital stewardship professionals who will be responsible for acquiring, managing, preserving, and making accessible our nation’s digital assets. Residents serve 9-month, paid residencies in host institutions working on digital stewardship initiatives. Host institutions receive the dedicated contribution of a recent graduate who has received advanced training in digital stewardship.

An affiliated project, NDSR Boston, was awarded by IMLS to Harvard Library, in partnership with MIT Libraries, to implement the fellowship program in Boston, MA. Hosts and residents from both programs will participate in the broader NDSR network of professionals working to advance digital preservation and develop a sustainable, extensible model for postgraduate residencies combining advanced training and experiential learning focused on the collection, preservation, and availability of digital materials. See the About NDSR-NY page for additional program and contact information.

Party on, AMNH!


Hello everyone! Vicky here to bring you some holiday cheer. I thought, since this is our last post before Hanukkah, Yule, Life Day, Festivus, Kwanzaa, Pancha Ganapati, Soyal, the Dongzhi Festival, Christmas, Newtonmas, Boxing Day, Omisoka, and New Year’s, I could wind down a busy few months by talking about the American Museum of Natural History party season!

Just about every day from the 10th of December to the 19th, there is a party at the AMNH. Each department has its own parties: some are small and attended mostly by people within the department; others are all-staff events with food, drinks, and music.

The Library kicked off the party season this year, with probably 50+ people eating and drinking in the reading room as the night went on (it’s only one night of the year, librarian friends who are cringing!). This was a great opportunity for me to get to know many of the scientists I’ve interviewed for my NDSR project in a more informal environment.

Friday the 12th was the day of the physical sciences party. Since it’s one of the better slots for parties, the Rose Center was absolutely packed. What usually sets the physical sciences party apart from others is the high probability of seeing some science celebrities, since it is held in the same wing as Neil deGrasse Tyson’s office.

For me, the first celeb sighting of the night was Bill Nye the Science Guy! I walked by Neil deGrasse Tyson’s office on the way to the bar/food room, and looked in hoping for a quick look at NDT himself, and to assess the number of people at the party. To my surprise, I saw Bill Nye in there dancing!! I promptly freaked out to my boss but kept moving, as the office was way too crowded for me to get in.


BILL BILL BILL BILL BILL BILL BILL

Later that night, as I was refilling my drink, in walked Bill to get some dinner. I saw him bopping around the table, getting some pasta and salad, and waited until he was done to approach and ask for a picture. He was so sweet and immediately agreed! He told me that on “the Science Guy show,” they had 12 GB of digital data that were constantly being fanned and air-conditioned. In his words, “it was state of the art technology.”

After I got through the crowds and saw a lot of the scientists I interviewed for my project here, I made it into Neil deGrasse Tyson’s office thanks to my security guard friend Jamiel, who is tight with Dr. Tyson. He introduced me to NDT, and asked if I could get a picture with him. Dr. Tyson replied “only if she asks me.” I was so struck I immediately stuttered out “if you don’t mind, Dr. Tyson!” And he turned to take a picture with me. As we opened a bottle of wine together, I told him about my project and digital preservation, which was absolutely incredible. He was obviously supportive of anything preserving science data. He even took a picture with my boyfriend later! Such a good sport.


Neil deGrasse Tyson & I in his office at the AMNH.

I have to say, the AMNH is absolutely the best place I’ve ever worked. Everyone I’ve met here has been nothing but gracious, and my work is everything I’ve wanted to do since I was a kid. And perks like getting to meet Bill Nye the Science Guy and Neil deGrasse Tyson make this job all the sweeter.

Until our next posting, happy holidays to all you fabulous readers!

Catching up with Rebecca, Boston NDS Resident at WGBH


Last week, I was able to catch up with my good friend and colleague Rebecca Fraimow. Rebecca is an NDS Resident in Boston at WGBH.  I’ve included some transcript excerpts from our conversation below:

Julia: It’s been about 3 months! You’re about a third of the way through your project! How do you feel?

Rebecca:  That’s pretty difficult to believe.

Julia: How do you feel? Do you have a good grasp of what’s going to happen come May? What do you think?

Rebecca: Mostly I feel like I have a better sense as far as my project goes. What happens when it’s May…I have no idea! As far as doing my work right now, I feel like what these last three months have really prepped me to do is figure out what kind of questions to ask my mentors.

Julia: I know a big part of what you’ve been talking about is your LTO [Linear Tape-Open] failure? How does that fit in with your project and how are you exploring that as a research project? How are you framing that question?

Rebecca: Well that’s a question…it’s sort of a part of my project that’s come up. It wasn’t originally designed as a part of my project because, obviously, WGBH didn’t know they were going to have a big failure and this is not specifically a failure of the LTO drives necessarily, but a failure in transferring the material off of an LTO automated storage system over a network to a local system.

Julia: It was because people were trying to access the material that [this error was discovered]?

Rebecca: Exactly, it was because they were trying to pull the material off the network. The storage was for all of WGBH; it wasn’t storage specifically for the archive. Which is sort of what motivated the change within the archives to have their own Hydra DAM system to maintain local LTO machines with direct connections to local computers.

Julia: OK, it was because of these failures that you decided to…make the archive your own and no longer trust the institution-wide workflow?

Rebecca: Pretty much.

Julia: That’s a good idea probably.

Rebecca: It’s pretty good to have the direct access and know you can just walk down to storage and pull out the tape and have control of that process all the way through.

Julia: Was that failure discovered after you came onto the project?

Rebecca: Not after, but shortly before. It was during the summer when all these massive amounts of material were transferred to Crawford [the vendor storage company]. So I first heard about it when I first visited WGBH, I guess it was the beginning of August… And they started talking about it at AMIA and it became clear there was a lot of interest… And more than that, other people had these kinds of problems, but maybe were not as open about them. It’s unclear if the failures are with the transfers, the transfer protocols, or the LTO tapes, but these are things that people ought to know about and can learn from.

Julia: So in terms of working with this as a research question? Are you studying and acquiring all the brands of LTO tapes you had? How are you replicating older system setups? It’s not like you can replicate that network? What steps are you taking?

Rebecca: Good question. Even now, it’s still a question whether the failures were with the storage of the files earlier on. We probably won’t be able to replicate exactly what led to the failures. What we are trying to do is narrow down exactly where the point of failure was. So we got the files, or about half the files, as they were before they were transferred, along with their checksums. We can run some file analysis tools on them. We can try to characterize them using tools like FFMPEG and compare the results we get.

Julia: But what kind of failures were these failures?

Rebecca: That’s a really good question. There were actually three different types of failures. So the files… they looked like files when they came. The files transferred all the way, but the file types were unrecognizable. Let me backtrack to before I came onto the project. When these files were initially pulled, there was just such a hurry to transfer them. WGBH was sending these drives of born-digital material created at WGBH to Crawford, the vendor partner on the American Archive. WGBH pulled a whole bunch of their own institutional files off the network and sent the hard drives to Crawford. When Crawford received some of the video files, they started to see large and unfortunate failure rates. In that first batch there was a 57% failure rate: 693 files failed an initial FFMPEG analysis. That means that when FFMPEG tried to characterize them, it couldn’t characterize them as video files. 394 files failed QC, meaning they had issues that made the file unusable — for example, a greenscreen with no audio, or audio that is only digital noise. And then 108 failed in transcoding — the software could not recognize the initial format well enough to rewrite it in a different format. Some of those problems are a little more straightforward than others.

Julia: Interesting! So, I’m trying to understand, are they all utter failures? Are some of these failures much worse than others?

Rebecca: That’s part of what I’m trying to discover: why these files were showing problems. So I’ll be doing some testing on the files that failed with FFMPEG and other file analysis tools. Maybe more importantly, the gist of my investigation is going to be discovering exactly where these failures occurred: whether they happened on the LTO tapes [themselves], in being transferred to the LTO [from WGBH], or whether they were damaged in transfer coming off the LTO tapes [to the hard drives]. That is, whether it was pulling them off the network, and the transfer protocol itself, that damaged them.
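
(A quick aside from me, Julia, for the technically curious: below is a minimal sketch, in Python, of what a characterization-and-checksum pass like the one Rebecca describes might look like. It is not WGBH’s actual tooling; it assumes ffprobe is installed and that a pre-transfer checksum manifest exists, and the directory and manifest names are made up for illustration.)

    import csv
    import hashlib
    import subprocess
    from pathlib import Path

    def sha256(path: Path) -> str:
        """Checksum a file in chunks so large video files don't exhaust memory."""
        digest = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1024 * 1024), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def characterizes_as_media(path: Path) -> bool:
        """Return True if ffprobe recognizes at least one audio or video stream."""
        result = subprocess.run(
            ["ffprobe", "-v", "error", "-show_streams", str(path)],
            capture_output=True, text=True,
        )
        return result.returncode == 0 and "codec_type" in result.stdout

    # Hypothetical manifest written before transfer: one "filename,sha256" row per file.
    pre_transfer = {}
    with open("pre_transfer_manifest.csv", newline="") as f:
        for name, checksum in csv.reader(f):
            pre_transfer[name] = checksum

    # Flag files that fail characterization or whose checksums changed in transit.
    for path in sorted(p for p in Path("received_files").iterdir() if p.is_file()):
        problems = []
        if not characterizes_as_media(path):
            problems.append("fails ffprobe characterization")
        expected = pre_transfer.get(path.name)
        if expected and expected != sha256(path):
            problems.append("checksum mismatch (altered in transfer)")
        print(f"{path.name}: {'; '.join(problems) if problems else 'OK'}")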

Julia: Do you think your project is elastic enough that this can be absorbed into your project?  You can just talk to your mentors and this is a clear priority with your mentors as well?

Rebecca: Oh, yeah. This is kind of their idea anyway – to have me incorporate this into the stage when I was going to work on the American Archive data. Casey initially proposed it and I thought that was a fantastic idea and really interesting research to do. My project mentors are amazing; my project is amazing. There’s a lot of flexibility built in, which is kind of as it should be, because you don’t really know what kind of problems you’ll see until you’re on the ground.

I think this is one of the keys to the NDSR Program. It has to have structure, it has to have clearly defined products, but at the same time it has to have a lot of flexibility. An archive has to work flexibly, and you don’t really know what you’re going to find until you start doing it, see the systems that you have, and hit some major bumps in the road.

Julia: True, but at the same time at the end of our 9 months, we have very clear deliverables to define whether or not our project is a success.

Rebecca: Well…I think for all my fellow NDS residents in Boston, we’ve all seen our projects shift since we’ve hit the ground. For me, again, since my project is structured a little differently, in chunks rather than one overarching thing, it provides for more of that flexibility. My end deliverable is supposed to be an instructional webinar on everything I’ve learned in the archive, so obviously everything I do will go into that no matter what. But for a lot of my peers who’ve had a large project to work on throughout the course of 9 months, the deliverables and end goals changed when they got there.

Julia: It could also be a part of the design of the projects. The NY NDS projects are not diverging much…this may change, however. My project is going at a fast clip, but we also mostly work in “chunks” and phases. That could be something.

So, we’ve talked a lot about your projects and different interesting problems, but I’m interested in hearing you describe more of your day-to-day schedule. What do you do in a week? How do you organize your time?

Rebecca: It kind of shifts day-to-day. We’re all in one big room, so I see everyone everyday. It’s a very, very friendly and comfortable environment. People can walk over to each other’s cubicles to ask questions, or pop up like gophers and shout across the way.

Julia: Paint the picture. Do you all get coffee and chat in the morning? Do you have regular meetings with your mentor?

Rebecca: Well, my main mentor is Casey [Davis], but during the first stage I’ve met more with Leah [Weisse]. Everyone gets in at different times, so meetings have to be in the middle of the day, like 11am-3pm, so as not to interfere with people’s lunches.

Julia: So it’s pretty friendly and informal. Do you have a dress code? That was one of the big questions when we [in New York] all started: what were the official dress codes of our respective institutions, and what were typical days like.

Rebecca: No, not any official one, but I don’t wear ripped jeans to work. Or at least no stated written dress code.

Julia: How often do you see your mentors? Do you mostly interact with the 5 or so people in your department? It sounds like Casey has been really great at promoting your work within the entire organization, but I’m curious how you work with other people in your organization and department. Well for me, for example, tomorrow I’ll go in at 10, prep a workstation for an intern coming in at 11. I’ll prep for a meeting with another department and answer emails, check in with Don…Lay it out for me.

Rebecca: So, for example right now, I’m at a point of transition between projects, so it’s more meeting-heavy. I usually have 3-4 meetings a week, but tomorrow I have 3 meetings. So, I came in the morning to set up for meetings, and I set up my LTO operations because transferring data takes a long time, which you well know.

Julia: That’s similar to my schedule. Do any of these workflows that you’re creating interact or are they discrete?

Rebecca: They do somewhat intersect; the American Archive workflow will serve as a test case for the larger WGBH workflow, but at the same time they have a different focus and energy. Which I like, because I can take the morning to figure out which project I want to work on. Do I want to work on the workflow document? Do I want to put my headphones on and work on improving shell scripts? The specifics of how I spend my day are up to me. It really varies pretty widely.

Julia: I want to loop back to a previous conversation we had about your preliminary workflow examination and the realization that there were partial remnants of old workflows… because it wasn’t really questioned because it’s such a large, complex workflow with different inputs? Maybe no one understands the full scope of it?

Rebecca: Well, I wouldn’t say it wasn’t questioned. I work with a lot of really smart people here who at one point or another would say, “well, this doesn’t need to be here.” I would say that unless you have someone dedicated to making sure change happens, it’s a lot easier to keep going with the old workflow, because you don’t have to go through the work of explaining the new workflow to a large number of people. You have to keep up and you don’t have time to stop, even if you know you have some issues. So that’s the benefit of an NDS Resident!

Julia: Yeah, we’re here for a very finite amount of time with a clear mandate. So you’re a glowing endorsement of the project? You really enjoy it, you find it fulfilling, you get along with your colleagues…

Rebecca: Only good things to say!

Julia: Any advice for individuals thinking about applying to the next NDSR rounds?

Rebecca: I say go for it! It’s a cool way to get a lot of different kinds of experience and to really have the opportunity to push yourself to learn. That’s the other really nice benefit. For all the work you are doing at your institutions, you are also really encouraged, and in fact required, to spend a large amount of time taking advantage of professional development opportunities: continuing to take classes, going to conferences, taking webinars. For example, if I want to take the afternoon off to write a blog post, I don’t have to feel guilty about that; I’m supposed to be doing that! Or if I want to apply for a copyright class, I can take some time to do that because it’s related to my projects. There are some downsides, but I would definitely recommend it.

Julia: Totally. It’s the best part. Any surprises?

Rebecca: I thought that I was going to be the only resident in my cohort who wasn’t going to be creating a workflow diagram. It’s not true. There will always be a workflow! You can never escape a workflow.

Julia: Especially with these large scale projects. I think everybody, whether creating technical writing documentation or a diagram, is essentially creating workflows.

Rebecca: You learn to love it. You learn to embrace it. How did I ever live without the workflow? I want to create a workflow for everything.

Julia: It’s strange to design workflows for phantom individuals and departments, although in my case I know that the documentation will lay the groundwork for newly created positions in a new department that is now forming.

Rebecca: I thought about that too, but I spent a huge amount of time producing an inordinately complex workflow document. Leah came in and said she wanted to show it to everyone to demonstrate the scope of the work we do.

Julia: Workflow as an advocacy tool! Anything to help advocate. That’s great! Well, thanks so much, Rebecca! I really enjoyed getting to know more about your experience so far!

Rebecca: Thanks Julia!

 

 

Shira Peltzman on The Signal


Head over to The Signal to check out “Preserving Carnegie Hall’s Born-Digital Assets: An NDSR Project Update” by resident Shira Peltzman.

From Shira’s post: “The first several months of my project feel like they’ve flown by. There are days when I reflect on what I’ve accomplished in just under three months’ time and feel proud of my progress, and then there are other days when I’m humbled by how much there is still left to do. But overall, the project has been one of the greatest learning experiences I could have hoped for–and there’s still six months left to go.” 


Shira Peltzman at Carnegie Hall. Photo by Gino Francesconi.

The Signal is the digital preservation blog of the Library of Congress. Each NDSR-NY resident will serve as a guest contributor during their residency. In case you missed it, here’s the contribution by Vicky Steeves from November.

 

The web archivists are here: AIT14 surveys the state of the practice


Stanford University Web Archiving Service Manager Nicholas Taylor took to the podium about halfway through the sixth annual Archive-It partners meeting in Montgomery, AL, to unpack key findings of the National Digital Stewardship Alliance’s 2013 survey of web archiving in the United States.


Nicholas Taylor has a posse. Photo by Scott Reed, Archive-It.

Those survey results were recently very well summarized by co-author and Library of Congress web archiving team leader Abbie Grotke over at The Signal, and even a little bit further on our little blog, but Taylor also keyed in on one important finding that had been all but lost upon me until this day-long meeting of the minds: Archive-It, he demonstrated, has quickly become the American web archivists’ true community of practice. In just a few short years, membership in the Internet Archive’s browser-based software service has expanded to include more web archivists than any professional affiliation, including the national organization of digital preservation institutions that commissioned this study.


Percentage of NDSA Web Archiving survey respondents by affiliation. Graph by Karl-Rainer Blumenthal.

This leads us to two important conclusions: 1) that the shoestring-budgeted service deserves significantly greater investment, and 2) that beyond a day to pick nits with vendor representatives, its annual meeting actually provides partners with their most inclusive opportunity to articulate and debate the state of web archiving in America.

From that context, I left AIT14 with a vision of web archiving as moving in two divergent, but not necessarily competing, directions: one extensive, nimble, and “big data”-driven; the other intensive, rigorous, and devoted to preserving precious artifacts. You can see the themes emerge throughout my bafflingly comprehensive notes on all presentations (really, you might want to get a snack), but I’ll break them down here in the meantime.

Representing the value of scaling up investment in Internet Archive’s technology suite, several partners introduced attendees to projects that bridge the data service gap with researchers in need of statewide and national coverage rather than the patchwork of individual websites accessible through the general Wayback Machine. Take just for instance the Integrated Digital Event Archive and Library (IDEAL) project presented by Virginia Tech CTRnet Research Group’s Mohamed Farag. At the center of IDEAL’s research and development is the Event Focused Crawler (EFC), a curator-modeled and then self-refining web crawler that captures websites central to events that spontaneously combust on social media for long-term preservation and ultimate linguistic or otherwise large-scale thematic analysis. Modest enough in concept, even VT’s most foundational technical efforts require large and continued investment by the National Science Foundation.

Internet Archive clearly recognizes the research value of the kind of data stores that VT’s and others’ projects bequeath. Culminating the day’s presentations, program manager Jefferson Bailey and engineer Vinay Goel unveiled Archive-It Research Services, Internet Archive’s own pilot effort to provide technologist researchers with tools designed especially for the longitudinal, graphical, and otherwise visually compelling analysis of Archive-It partners’ curated web collections. Scaling the pilot up to full-scale launch in January hinges upon the viability of file formats developed by Internet Archive engineers to circumvent the need for the kind of massive computational infrastructure that their organization enjoys, but that most individual researchers lack.


Graph of metatext frequency in the Hydraulic Fracturing in New York State Web Archive, Feb-Sept 2014, by Archive-It Researcher Services.

Of course all American web archivists–or at least those of us working in education and government–share a dirty little secret: that our big data bucket has a hole in it. I’ve spent the better part of my first phase of NDSR work, in fact, inventorying known issues and strategies to mitigate problems of capturing and rendering web content. It’s tempting to overlook these myriad gaps when you still have huge stores of web-native data to analyze, but practically impossible when your charge is to completely capture and accurately render a single, discrete website. NYU music librarian Kent Underwood demonstrated the need for this extremely careful approach. He reminded attendees that Baroque composer Johann Sebastian Bach’s long-marginalized scores and manuscripts required stewardship until future generations could appreciate them. Without rapid and significant improvement to the tools that we use to capture time-based web media like streaming audio, video, and Flash-based applications, the records of emergent young artists in NYU’s Archive of Contemporary Composers’ Websites will not live so long.

You don’t have to be an art librarian to see the dramatic downstream effects of this problem, but it certainly helps! Susan Roeper and Penny Baker of the Sterling & Francine Clark Art Institute Library, for instance, demonstrated how their institution’s long history of collecting catalogues and ephemera from the Venice Biennale required them to quickly become experts in web archiving’s most commonly problematic content types. Their web archive, in turn, reflects less the kind of documentation that could previously have been feverishly snatched up and fit into a colleague’s briefcase than the equally fragile born-digital and frequently interactive kinds of media that increasingly define the way that we design for the modern web.

To me, though, the finest point was put on this challenge by Heather Slania, Library Director at the National Museum of Women in the Arts. Like many of her colleagues in universities and government agencies, Slania was introduced to web archiving as a means to preserve her institution’s online presence ahead of a major redesign, but she soon applied it to her museum’s broader collecting mission. Archiving internet art, she explained, offered her the opportunity to preserve the work of women artists traditionally marginalized when not outright omitted from the art historical canon and from the scopes of landmark collecting institutions. Without advancing Archive-It’s capacity to fully capture and accurately replay them, these always imaginative and often subversive uses of the web as an artistic medium are destined to live on only in hints and rumors. Slania has already reduced her original collection of 36 sites down to 29 due to sites going completely offline, and as of 2014 she estimates that half of this collection experiences moderate to complete loss of content in an archival environment.


Susan Roeper (left) and Heather Slania (right) in front of two very perishable records: a salami and an internet artwork. Photos by Scott Reed, Archive-It.

My host, NYARC, is doing its part to ensure that the integrity of these sites remains a priority while web archiving expands into larger scale data services territory. In scarily few weeks I will publish my draft procedures for assuring the quality of specialist art historical resources archived from the web. In the meantime, we took home AIT14’s prize for “best documented tech support questions,” which I suppose means that I must be doing something right…


How could I not pick the Neapolitan astronaut ice cream out of the prize bin? Photo by Karl-Rainer Blumenthal.

In all seriousness though, the opportunity for a narrow, preservation-focused operation to refine technology that is increasingly applied at such wider scales and iterative phases of research makes me optimistic that the parallel courses of web archiving described above will continue to inform and improve one another. Archive-It brought us all together under one roof, after all, and everyone seems to feel at home here.

The Case for a Digital Preservation Policy Document


Hi everyone, Shira here. Today’s blog post will be something of a project update by way of some thoughts about the need for digital preservation policies, inspired by a conversation between Jefferson Bailey and Meghan Banach Bergin that went up on the Signal earlier this week. Meghan is the author of a Report on Digital Preservation Practices at 148 Institutions worldwide, and Jefferson interviewed her to discuss the results of her research and its implications for her work at the University of Massachusetts Amherst Libraries, where Meghan works as the Bibliographic Access and Metadata Coordinator.


Bergin’s report provides a fascinating snapshot of how digital preservation is being done around the world at this particular moment. Specifically, it highlights the extreme degree of variation that exists within the field; although there are some common themes among digital preservation initiatives (tight budgets, imperfect tools, and not enough staff), no two institutions have an approach to preserving digital information that is exactly alike.

For me, one of the most compelling insights to emerge from Bergin’s report is the fact that over 90% of respondents said that they had undertaken efforts to preserve digital materials, and yet only about 25% of the institutions surveyed had a written digital preservation policy. (Briefly, for those wondering what a digital preservation policy is: it’s a written statement authorized by the repository management describing the approach to be taken for the preservation of digital objects deposited into the repository.) When I first read that statistic I found it somewhat shocking, but the longer I sat with it the less surprising it began to seem. In fact, I realized that it chimed with my own experience working in the field, which led me to reflect on why this has come to be the case for so many institutions.


Photo © 2006-2014 Institute of Software Technology and Interactive Systems

The first reason I hit upon is one that Bergin herself mentions in her interview with Bailey. When he asks what she feels accounts for the discrepancy, Bergin says that writing a digital preservation policy is time consuming, and that, “that’s why a lot of institutions have decided to skip writing a policy and just proceed straight to actually doing something to preserve their digital materials.”

This struck me as a particularly revealing sentiment because it implies that many respondents understand the act of writing a high-level digital preservation policy document to be fundamentally different from (and inherently less valuable than) “actually doing something” to preserve their collections. Quite frankly, this is something that needs to change. Institutions need to begin approaching digital preservation as a holistic task rather than as a series of actions that fulfill the basic preservation requirements of bit preservation and content accessibility. There needs to be a deeply ingrained understanding that not only must any archive, museum, or library seeking to preserve digital material commit to implementing these high-level policy documents, but that doing so is digital preservation in and of itself.

Of course I understand that there is a practical difference between high-level conceptual planning and the hands-on tasks involved in the day-to-day management of an ever-growing collection of digital information; I have yet to work at a repository that has ever had the luxury of operating completely outside of the triage stage, and at any given time finding a balance between meeting users’ and administrations’ needs—not to mention the collections’—can be a challenge.

But there will always be emergencies, deadlines, and budget cuts. In fact, that is part of what makes having a digital preservation policy in place so important: articulating your repository’s approach to the preservation of the objects accessioned into it may help identify gaps or areas of weakness in your preservation strategy, which will in turn help repositories be better prepared if an emergency should occur. A good preservation policy may also aid in demonstrating that certain positions are vital to the organization’s mission, or even—if paired with a well-constructed strategic survey of user groups—help determine the repository’s worth to an organization. In short, this is why having a digital preservation policy is considered a prerequisite to becoming a Trustworthy Digital Repository (for more on that subject see my previous blog post, On The Subject of Trust, which discusses why this matters).

Digital preservation policies matter, and I feel lucky to have the opportunity to help Carnegie Hall draft one. The final deliverable for my NDSR project will be a Digital Preservation and Sustainability document for Carnegie Hall that will outline a set of policies, procedures, best practices, and workflows for the ongoing management of digital files. And yes, this will ultimately include a digital preservation policy, as well as a Preservation Implementation Plan, Strategic Plan, Repository Mission Statement, and Access Policy. (Here’s lookin’ at you, TDR)

At this point in my project I’m midway through the process of gathering the information I will need to eventually create this document. I’ve spent the past month interviewing a variety of different departments across Carnegie Hall to determine how digital content is being created, used, stored, and managed within the organization. I’ve learned a great deal; talking to people has provided me with not only a better understanding of how digital content is used within Carnegie Hall, but also the opportunity to learn what matters most to each department, which will be crucial when we begin rolling out the new DAMS next year. Currently I’m in the process of reviewing, synthesizing, and summarizing these interviews. I will return to these interviews over the next couple of months as my project progresses and I reach a point where I can begin to document some of the workflows described in them, so watch this space for updates.

***

Further Reading: There are a lot of good resources available on this subject, but I wanted to give a quick h/t to Daniel Noonan’s “Digital Preservation Policy Framework: A Case Study”, which charts the process through which The Ohio State University Libraries created an organizational policy for digital information. I particularly enjoyed this case study and highly recommend giving it a read.

Prove Yourself: Needs Assessment Edition


What I’ve come to love about the library science field (which after years of waiting tables you’d think I’d hate) is the service aspect of everything we do. Librarians are intensely user-focused in all of our work: through the use of needs assessment surveys, we mold our libraries to what users want, expect, and need. We use the results to design programs, buy technology, even create positions within a library (YA librarian is a thing because of that!). Some common ways to implement a library assessment include focus groups, interviews, scorecards, comment cards, usage statistics from circulation and reference, and surveys sent to users via email or on paper.

This past week, I attended a workshop with the fabulous Julia Kim at METRO that focused on the implementation and design aspects of surveying, called “Assessment in Focus: Designing and Implementing an Effective User Feedback Survey.” The presenter, Nisa Bakkalbasi, the assessment coordinator at Columbia University Libraries/Information Services, is a former statistician and presented on the many ways one could glean statistically valuable quantitative data from simple survey questions.

The first part of this workshop dealt with the assessment process and types of survey questions, while the second dealt mainly with checking your results for errors. I will focus here on the first part, which is about data gathering and question manufacturing.

I will touch briefly on the assessment process by saying this: all the questions asked should relate directly to the objectives laid out at the beginning of the process. Also, surveying is an iterative process: as a library continues to survey its users, the survey’s ability to yield valuable results will also increase.


Assessment Process: http://libraryassessment.org/bm~doc/Bakkalbasi_Nisa_2012_2.pdf

While my work at AMNH is conducted solely through interviews, I found the discussion Nisa led on the types of questions used in survey design particularly helpful. She focused the session on closed-ended questions, because there is no way to get quantitative data from open-ended questions. All the results can say is “the majority of respondents said XYZ,” as opposed to closed-ended questions, where the results can say “86% of respondents chose X over Y and Z.” This emphasis was extremely important, because real quantifiable data is the easiest to work with when putting together results to share in an institution.

When designing survey questions, it is important to keep a few things in mind:

  • Ask one thing at a time
  • Keep questions (and survey!) short and to the point
  • Ask very few required questions
  • Use clear, precise language (think The Giver!)
  • Avoid jargon and acronyms!

The two most common closed-ended questions are multiple choice questions:

multiple choice

and rating scale questions:

rating scale

For multiple choice questions, it is important to include all options without any overlap. The user should not have to think about whether they fit into two of the categories or none at all. For rating scales, my biggest takeaway was the use of an even number of points to take away the neutral option. While forcing people to have opinions is considered rude at the dinner table, it is crucial to the success of a survey project.


“Filthy neutrals!” — Commodore 64 Zapp Brannigan

Both of these types of questions (and all closed-ended questions) allow for easy statistical analysis. By a simple count of answers, you have percentage data that you can then group by other questions, such as demographic questions (only use when necessary! sensitive data is just that–sensitive) or other relevant identifying information.

In terms of results, this can be structured like: “78% of respondents who visit the library 1-4 times a week said that they come in for group study work.” That combines two multiple choice questions: what is your primary use of the library, and how often do you come in? These provide measurable results, which are critically important in libraryland and something librarians can rely heavily upon.
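
To make that arithmetic concrete, here is a tiny sketch of how such a cross-tabulated percentage could be computed from a survey export. The column names and response values are invented for illustration; they are not from Nisa’s materials.

    from collections import Counter
    import csv

    # Hypothetical export with two closed-ended questions per respondent:
    # "visit_frequency" (e.g. "1-4 times a week") and "primary_use" (e.g. "group study").
    with open("survey_responses.csv", newline="") as f:
        responses = list(csv.DictReader(f))

    # Keep only the respondents in the frequency group we care about.
    weekly = [r for r in responses if r["visit_frequency"] == "1-4 times a week"]

    # Count their primary-use answers and report simple percentages.
    counts = Counter(r["primary_use"] for r in weekly)
    for answer, n in counts.most_common():
        print(f"{answer}: {100 * n / len(weekly):.0f}% of {len(weekly)} weekly visitors")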

I also want to briefly discuss more innovative ways libraries can begin to use this incredible tool. Proving value–the library’s value, that is. Libraries traditionally lose resources in both space and funding due to a perceived lack of value by management, the train of thought usually being that since libraries aren’t money-makers, they inherently have less value to the institution.

We as librarians know this to be both ludicrous and false. And we need to prove it. If the result the library is looking for says something like “95% of respondents said that they could not have completed their work without the use of the library,” then that is a rating scale question waiting to happen. And an incredible way to quantitatively prove value to upper management.



Quantitative data gathered via strategic surveying of user groups can be a powerful tool that librarians can–and should!–use to demonstrate their value. In business decisions, hard numbers carry more weight than testimonials. Library directors and other leaders could have access to materials that allow them to better represent the library to upper management on an institution-wide level. This can be the difference between a library closure and a library expansion, especially in institutions where funding can be an issue.

Librarians can and should use these surveys for their own needs, both internally for library services and externally on an institution-wide scale. Whether you are a public library trying to prove why you need a larger part of the community’s budget, or a corporate library vying for that larger space in the office, the needs assessment survey can prove helpful in cementing the importance of a library as well as in developing library programs.

In the words of Socrates, “an unexamined life is not worth living.”

Jeremy Blake’s Time-Based Paintings


Julia here. In my last post, I gave an overview of the digital forensics AMIA panel I chaired. In this post, I’ll go over some of the work I’m doing as a resident at New York University Libraries, with a special focus on the Jeremy Blake Papers. My current task is to create access-driven workflows for the handling of complex, born-digital media archives. My work, then, does not stop at ingest but must account for researcher access. I’m processing 20 collections, each with its own set of factors that influence the direction workflows may take. For example, collections can range in size from 30 MB on 2 floppy disks to multiple terabytes from an institution’s RAID. Collection content may comprise simple .txt and ubiquitous .doc files or, as is the case with material collected from computer hard drives, may hold hundreds of unique and proprietary file types. Further complicating the task of workflow creation, collections of born-digital media often present thorny privacy and intellectual property issues, especially with regard to identity-specific information (e.g., Social Security numbers), which is generally considered off-limits in areas of public access.

At this point in the fellowship, I have conducted preliminary surveys of several small collections  with relatively simple image, text, moving image, and sound file formats. Through focusing on accessibility with these smaller collections first, I’ll develop a workflow that encompasses disparate collection characteristics. These initial efforts will help me to formulate a workflow as I approach two large, incredibly complex collections: the Jeremy Blake Papers and the Exit Art Collection.  I’ll spend the rest of this post discussing the Blake Papers.

Jeremy Blake (1971-2007) was an American digital artist best known for his “time-based paintings” and his innovations in new media. The Winchester trilogy exemplifies his methodology, which traversed myriad artistic practices: here, he combined 8mm film, vector graphics, and hand-painted imagery to create distinctive color-drenched, even hallucinatory, atmospheric works. Blake cemented his reputation as a gifted artist with his early artistic and commercial successes, such as his consecutive Whitney Biennial entries (2000–2004, inclusive) and his animated sequences in P.T. Anderson’s Punch-Drunk Love (2002).

The Jeremy Blake Papers include over 340 pieces of legacy physical media in formats that span optical media, short-lived Zip and Jaz disks, digital linear tape cartridges, and multiple duplicative hard drives. Much of what we recovered seemed to be a carefully kept personal working archive of drafts, digitized and digital source images, and various backups in multiple formats, both for himself and for exhibition. While the content was often bundled into stacks by artwork title (as exhibited), knowing that multiple individuals had already combed through the archive before and after acquisition of the material makes any certainty as to provenance and dating impossible for now. In addition to work files, we are also processing emails and other assorted files recovered from his laptop.

Photoshop component files from Chemical Sundown (2011) displayed on PowerMac G3.


Through the work I’ll be doing over the course of this fellowship (stay tuned), researchers will be able to explore Blake’s work process, the software tools he used, and the different digital drafts of moving image productions like Chemical Sundown (2011).

Processing the Jeremy Blake Papers will necessitate exploration of the problems inherent in the treatment of digital materials. Are emails, with their ease of transmission and seeming immateriality, actually analogous to the paper-based drafts and correspondence in the types of archives we have been processing for years? Or are we newly faced with the transition to a medium that requires seriously rethinking our understandings and retooling our policy procedures to protect privacy and prevent future vulnerability? While we haven’t explicitly addressed the issue yet, these are some of the bigger questions that our field will need to explore as we balance our obligations to donors as well as future researchers. Tangential, but not irrelevant, are the types of questions surrounding the belated conception, positioning, and exhibition of post-mortem presentations of incomplete works, such as Blake’s unfinished Glitterbest. These are some of the serious conundrums I am addressing in my work as I draft the born-digital clauses for our donor agreement templates—creating concrete policy formations which will be implemented during the course of an acquisition and donor interview next week.

The Blake collection was initially reported to include over 125,000 files. We have recently had to renumber and rethink the accuracy of some of the initial figures, thanks in no small part to the discovery of hitherto occluded media in unprocessed boxes. Initially, my mentor, New York University Digital Archivist Don Mennerich, and I were working with files copied (and therefore significantly altered) from Blake’s hard drives received in 2010, before write-blocker hardware was part of the required protocol for handling digital material at NYU. Cultural heritage institutions took their cues from the fields of legal and criminal investigation, and adoption of relevant digital forensics practice didn’t happen until after breakthroughs such as the publication of CLIR’s report on born-digital forensics (2010). Not having the file timestamps severely limited our ability to assess the collection’s historical timespan. In our predictions with regard to research interest, charting Blake’s work progress over time would have been high up on the list, so this bar chart (created from Access Data’s FTK software) was obviously not ideal. Digital files are delicate; the ways in which file access information is recorded lend themselves to distortion.

 


Visualization of 1st set of Blake born-digital material with all dates modified. The grey rectangle represents the modified access date. That is, all the files show the same date rather than a span of years.

Visualization of 2nd set of Blake born-digital material with intact date span, as represented by the many gray lines across almost a decade.


Luckily, the issues created by previous access to archival files were resolved after some digging into written reports regarding the collection, along with the important discovery of four boxes of unprocessed material. Enlisting the aid of a number of student interns, we’ve imaged (created bit-exact replicas, which can itself be a difficult hurdle) more than half of these materials. Comparing newly imaged material with the initial Blake acquisition files, we have determined that many of the acquired, compromised files are duplicative, and consequently we have been able to assign the correct time-date stamps! That is, many of the files from the 2nd set of born-digital media images were in the 1st set as well. Blake clearly understood the importance of redundancy in his own workflow. I’ve no doubt that this is (or may prove to be) a common experience for archivists processing digital materials.
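
For the curious, the matching step can be sketched roughly like this: hash both sets of files, pair them by checksum, and read the trustworthy modification dates from the newly imaged copies. This is an illustration rather than our actual FTK-based workflow, and the directory names below are hypothetical.

    import hashlib
    from datetime import datetime, timezone
    from pathlib import Path

    def sha1(path: Path) -> str:
        """Hash a file in chunks; SHA-1 is sufficient for duplicate detection here."""
        digest = hashlib.sha1()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1024 * 1024), b""):
                digest.update(chunk)
        return digest.hexdigest()

    # Hypothetical directories: the 2010 copies (dates flattened) and the new images (dates intact).
    first_set = {sha1(p): p for p in Path("blake_2010_copies").rglob("*") if p.is_file()}
    second_set = {sha1(p): p for p in Path("blake_new_images").rglob("*") if p.is_file()}

    # Where the content matches, carry over the trustworthy modification date from the new images.
    for checksum, old_copy in first_set.items():
        twin = second_set.get(checksum)
        if twin:
            modified = datetime.fromtimestamp(twin.stat().st_mtime, tz=timezone.utc)
            print(f"{old_copy.name}: duplicate found, modified {modified:%Y-%m-%d}")
        else:
            print(f"{old_copy.name}: no match in the newly imaged material")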


Examples of media from the Jeremy Blake Papers (note the optical media that looks like vinyl).

At this point, Blake’s collections have been previewed, preliminarily processed, and arranged through Access Data’s FTK software. This is a powerful but expensive software program that can make an archivist’s task—to dynamically sift through vast quantities of digital materials—even plausible as a 9-month project. While Don and I manage the imaging and processing, we’ve also started discussing what access types might look like. This necessitates discussions with representatives from all three of NYU’s archival bodies (Fales, University Archives, and Tamiment), as well as the head of its new (trans-archive) processing department, the Archival Collections Management Department. In our inaugural meeting last week, we discussed making a very small (30 MB) collection accessible to researchers in the very near future as a test case for providing access to some of our larger collections. As part of my responsibilities here, I’ll be chairing this group as we devise access strategies for collection content.

More specifically, we have also set up hardware and software configurations that may help us to understand Blake’s artistic output. In the past two weeks, for example, Don has identified the various Adobe Photoshop versions that Blake used by viewing the files in hex (the hexadecimal representation of the binary). We have sought out those obsolete versions of Adobe Photoshop, and my office area is now crowded with different computers configured to read materials from software versions common to Blake’s most active years of artistic production. Redundancy isn’t just conducive to preventing data loss, however. We still need multiple methods with which to view and assess Blake’s working files. In addition to using multiple operating systems, write-blockers, imaging techniques, and programs, I spent several days installing emulators on our contemporary Mac, PC, and Unix machines. After imaging material, we’ll start systematically accessing outdated Photoshop files via these older environments, both emulated and actual.
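
Here is a toy version of that kind of hex sleuthing: it prints a hex dump of a file’s opening bytes and scans for an embedded “Adobe Photoshop” creator string, which PSD files often carry in their metadata. Treat the string search as an assumption rather than a guaranteed marker; the real identification work was done in a hex editor.

    import re
    import sys
    from pathlib import Path

    def hex_dump(data: bytes, width: int = 16) -> str:
        """Render bytes as an xxd-style dump with offsets and an ASCII column."""
        lines = []
        for offset in range(0, len(data), width):
            chunk = data[offset:offset + width]
            hex_part = " ".join(f"{b:02x}" for b in chunk)
            text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
            lines.append(f"{offset:08x}  {hex_part:<{width * 3}} {text_part}")
        return "\n".join(lines)

    data = Path(sys.argv[1]).read_bytes()

    # PSD files open with the signature "8BPS"; the dump makes that visible either way.
    print(hex_dump(data[:64]))

    # Many PSDs also embed the creating application (e.g. "Adobe Photoshop 7.0") in their metadata.
    for match in sorted(set(re.findall(rb"Adobe Photoshop[ \w.()]{0,24}", data))):
        print("Possible creator string:", match.decode("ascii", errors="replace"))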


Hex editor view used to help identify software versions used (extra points if you recognize what Blake piece this file is from).

In the meantime, I still need to make a number of decisions, and the workflow is still very much a work in progress! This underpins a larger point: this fellowship necessitates documentation to address gaps like these. That is, while there are concrete deliverables for each phase of the project, in order to deliver I’ll need to understand and investigate intricacies in the overall digital preservation strategy here at NYU. While working with very special collections like the Jeremy Blake Papers is a great opportunity, it’s also great that the questions we address will be useful at our host sites for many other projects down the line. While I may not be able to write more on Blake in the blog, Don Mennerich and I will co-present our paper documenting our findings at the American Institute for Conservation (AIC) this May…but in the meantime, lots of work will need to get done!

On the Subject of Trust


Building a Network of Trustworthy Digital Repositories
Shira here. Earlier this week Karl gave us an excellent run-down of some of the newly released 2015 National Agenda for Digital Stewardship’s principal findings. Whereas Karl’s blog post focused on two of the banner recommendations made in the “Organizational Policies and Practices” section of the report—namely, the importance of multi-institutional collaborations and the acute need to train and empower a new generation of digital stewardship professionals—my blog post today will focus on one of the “blink and you’ll miss it” points that the report makes in its final section on “Research Priorities”.

If you made it all the way to page 41 of the report (and I know you all did… right?), you may have noticed a vaguely phrased section entitled “Policy Research on Trust Frameworks”. Although this section is buried in the report under other findings and actionable recommendations, it contains what is, in my opinion, one of the most important points that the report raises. The purpose of this section is to highlight the importance of developing robust “trust frameworks” for digital repositories. If you read this last sentence and thought, “Hooray! I’m delighted to see this in the NDSA report” then you can probably skip this blog post. For those of you still wondering what a trust framework is and why they matter to digital preservation, read on.

Kara Van Malssen and Seth Anderson giving a workshop to NDSR mentors and residents on the Audit and Certification of Trustworthy Digital Repositories (ISO 16363:2012)


WTF – What’s a Trust Framework?
A trust framework is a digital preservation planning tool that clearly defines the characteristics and responsibilities of a sustainable digital repository. Trust frameworks typically lay out the organizational and technical infrastructure required for an institution to be considered trustworthy and capable of storing digital information over the long-term. Simply put, a trust framework provides organizations with a way to measure—and thereby demonstrate to potential donors, clients, or auditors—its trustworthiness as a steward of digital information. The need for trust frameworks in digital preservation has become more apparent over the past couple decades, and a number of distinct initiatives have been developed in response.

Some trust frameworks, like the Global LOCKSS Network, Meta Archive, Data Preservation Alliance for the Social Sciences (Data-PASS), and the Digital Preservation Network, work on the principle of redundancy and broad-based collaborative institutional mechanisms as a strategy for mitigating single-points-of-failure within a given institution. In this system, multiple organizations that may not be able to individually provide all the elements necessary for a sustainable, end-to-end digital repository enter into an agreement to become collaborative stewards of digital information.


Other trust frameworks, like the NESTOR Catalogue of Criteria for Trusted Digital Repositories, DRAMBORA, and the Trustworthy Repositories Audit & Certification (TRAC) criteria, take the form of auditing tools. These trust frameworks are intended to allow organizations to determine their capability, reliability, commitment, and readiness to assume long-term preservation responsibilities. While some of these tools offer organizations the possibility of performing a self-audit, others offer repositories the possibility of being evaluated and ultimately certified as a trustworthy digital repository by an outside auditor.


This post was largely inspired by a recent workshop given by Kara Van Malssen and Seth Anderson of AVPreserve on one of these tools, the Audit and Certification of Trustworthy Digital Repositories, so I thought I’d take this opportunity to talk a little bit about what it is and why it matters to digital preservation.


TDR: A Little Bit of Background
I’ll assume anyone reading this blog has more than a passing familiarity with the Reference Model for an Open Archival Information System, which defines the requirements necessary for an archive to permanently preserve digital information. (If you’re not familiar with OAIS—also known as ISO 14721:2003—stop whatever you’re doing and go read it here). While it’s pretty hard to overstate the importance of OAIS for digital preservation, one thing it lacks is any comprehensive definition or consensus on the characteristics and responsibilities of a sustainable digital repository. That’s precisely where the Audit and Certification of Trustworthy Digital Repositories comes in.

In 2003, Research Libraries Group (RLG) and the National Archives and Records Administration (NARA) embarked on a joint initiative to specifically address the creation of a trust framework for digital repositories that would rely on a process of certification. The result of their effort was the Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC), the purpose of which is to identify digital repositories capable of reliably storing, migrating, and providing access to digital collections. The TRAC criteria formed the basis of the Audit and Certification of Trustworthy Digital Repositories, which ultimately superseded it.

The Audit and Certification of Trustworthy Digital Repositories (ISO 16363:2012, or just “TDR” for short), outlines 109 distinct criteria designed to measure a repository’s trustworthiness as a steward of digital information. Intended for use in combination with the OAIS Reference Model, TDR lays out the organizational and technical infrastructure that an institution must have in order to be considered trustworthy and capable of ensuring the stewardship of digital objects over the long-term.


Aww Come On. Just Trust Me!
Why do these tools matter? Here are a few reasons:

  • In order to provide reliable, long-term access to managed digital resources, archives must assume high-level responsibility for this material. This requires a significant amount of resources, organization, infrastructure, and planning across all levels of an organization. Attempting to steward digital material over the long term on an ad-hoc basis or without the appropriate resources and infrastructure in place is dangerous, and will ultimately put the material an archive is tasked with caring for at risk. In order to do this effectively, archives must have some metric by which they can evaluate their progress. Trust frameworks like TDR provide this.
  • TDR is designed to take into account a number of criteria beyond merely an organization’s digital preservation infrastructure. These include, for example, the degree of fiscal responsibility an institution is able to demonstrate, and whether or not the institution has an appropriate succession plan, contingency plans, and/or escrow arrangements in place in case the repository ceases to operate. In spite of the fact that both of these criteria are critical to an organization’s ability to provide long-term access to digital resources, they might be easily overlooked. Being evaluated according to a set of established standards—whether vis-à-vis a self-audit or by an external auditor—can highlight holes in a repository’s operation that may not be apparent in the course of normal, day-to-day business.
  • As TDR states in its introduction, “Claims of trustworthiness are easy to make but are thus far difficult to justify or objectively prove.” On a very basic level, trust frameworks provide institutions with a metric that allows them to compare their own systems and procedures against an established, high profile standard in order to evaluate their trustworthiness. Employing a trust framework like TDR will allow archives to provide evidence to potential grantmaking bodies, donors, or board members that they are responsible and trustworthy digital stewards.


Sounds Great. Sign Me Up!
Not so fast, cowboy. While it is undeniably clear that establishing reliable trust frameworks is of the utmost importance to the field, it doesn’t mean that TDR—or any of the other trust frameworks I’ve mentioned here so far—provide the whole answer in and of themselves. As the NDSA report points out, this is still a relatively under-explored area and there is a lot of room for additional standards, models, and frameworks to be developed. Moreover, the report takes pains to point out that many of these frameworks have yet to be empirically tested and systematically measured, and that there are still a lot of questions that remain to be answered: “How reliable are certification procedures, self-evaluations, and the like at identifying good practices? How much do the implementations of such practices actually reduce risk of loss?” the report asks. “The evidence base is not yet rich enough to answer these questions”.

The report concludes with a recommendation that funding and research bodies concentrate their research efforts on exploring ways to improve the reliability and efficiency of trust frameworks for digital preservation. As with so many things in this field, there is no final solution; the journey is the destination.