Earlier this month, I had the opportunity to participate in my very first hackathon while at Code4Lib 2015 in Portland, OR. The goal of the hackathon was to evaluate the PBCore metadata standard (used to describe audiovisual materials – more about it here) and determine the best way to create a PBCore RDF ontology. There have already been some great recaps of the event, such as this one by WGBH’s Karen Cariani. This post is just my two cents on the experience of participating in a hackathon.
The hackathon took place the Saturday and Sunday before the conference at an Airbnb house a ways outside of downtown Portland. Seven of us stayed there for the entirety of the conference, and we were joined by five or six conference attendees who weren’t staying at the house. As a marked introvert, I was worried about the prospect of spending an entire week with people who were, to me, more or less strangers. What I found was that most of the housemates were the same as me – introverted librarians/archivists who loved metadata, audiovisual collections, and organizing stuff. Needless to say, there were no problems. Everyone got along fabulously, and I think sharing a house helped us reach decisions during the hackathon. Karen Cariani’s post discusses the benefits of working on this project while being physically together (rather than on conference calls). It was apparent that we would not have made the progress we did without getting this group of people in the same place for an extended period of time.
The goal of the hackathon as it was originally envisioned was to come to a consensus about how to move PBCore forward into RDF linked data. We figured we’d spend the entire weekend considering the various options, specifically whether it would be better to design an entirely new RDF ontology for PBCore or to draw from existing ontologies (in particular EBUCore, a European “sister” standard of PBCore – both standards were developed as audiovisual-specific expansions of Dublin Core). We came to the conclusion, relatively quickly, that it would be superfluous to create an entirely new PBCore ontology when a perfectly good EBUCore ontology already existed with a majority of the information we would need to express. Granted, there are significant differences between the design, structure, and language of the two standards, as we discovered very clearly when we attempted to map from PBCore XML to EBUCore RDF/XML. However, there is enough similarity in the overall design of the standards that mapping between them is worth the effort.
It was exciting to come out of this hackathon with tangible results, rather than just ideas to present as possibilities. You can view some of these results – and contribute more! – at our GitHub repo: https://github.com/WGBH/pbucore. This repo contains our first attempts at mapping PBCore XML to EBUCore RDF/XML using XSLT – with my apologies for all those acronyms. Basically, we decided that the best way to create an ontology for PBCore was to examine each PBCore element individually, down to its attributes, and see if it would be possible to map that into the language and hierarchy prescribed by EBUCore. We did find a number of gaps that will need to be filled (for example, Instantiation Generations cannot be expressed using current EBUCore), and for those we will need to create new structures in the ontology. Whether that is as an extension of EBUCore, as its own namespace, or something else entirely remains to be seen.
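To give a feel for the element-by-element approach, here is a minimal sketch in Python (not the XSLT we actually wrote at the hackathon). It walks the top-level elements of a small PBCore record, turns the ones it knows about into subject/property/value triples, and collects the rest as gaps. The sample record, the subject URI, and the mapping table are all made up for illustration; in particular, pointing pbcoreTitle and pbcoreDescription at Dublin Core properties is an assumption for this example, not the mapping in the repo.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical mapping table, for illustration only.
# The real mappings live in the XSLT in the GitHub repo.
ELEMENT_TO_PROPERTY = {
    "pbcoreTitle": DC_NS + "title",
    "pbcoreDescription": DC_NS + "description",
}

# Made-up sample record using real PBCore element names.
SAMPLE = """<pbcoreDescriptionDocument xmlns="http://www.pbcore.org/PBCore/PBCoreNamespace.html">
  <pbcoreIdentifier source="example">item-001</pbcoreIdentifier>
  <pbcoreTitle titleType="Episode">Sample Episode</pbcoreTitle>
  <pbcoreDescription>A short description.</pbcoreDescription>
  <pbcoreGenre>News</pbcoreGenre>
</pbcoreDescriptionDocument>"""


def map_record(xml_text, subject_uri):
    """Return (triples, unmapped) for the top-level elements of a record."""
    root = ET.fromstring(xml_text)
    triples = []
    unmapped = []
    for child in root:
        # ElementTree tags look like "{namespace}localName"; keep the local name.
        local_name = child.tag.split("}", 1)[-1]
        prop = ELEMENT_TO_PROPERTY.get(local_name)
        if prop is None:
            # No known target property: record it as a gap to review by hand.
            unmapped.append(local_name)
        else:
            triples.append((subject_uri, prop, (child.text or "").strip()))
    return triples, unmapped


if __name__ == "__main__":
    triples, unmapped = map_record(SAMPLE, "http://example.org/item-001")
    for triple in triples:
        print(triple)
    print("No mapping yet for:", unmapped)
```

The list of unmapped elements is the interesting part: it plays the same role as the gaps we found by hand, where a PBCore element had no obvious home in EBUCore.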
For me, one of the biggest conceptual challenges was translating between an XML metadata standard and an RDF ontology. These are two things that are structurally and conceptually very different. Ontology Development 101 by Natalya F. Noy and Deborah L. McGuinness gives a good overview of the choices involved when designing an ontology, including things that one does not consider when making an XML schema such as classes, domains, and ranges. Because these things are not considered when designing a metadata standard, they can cause problems and inconsistencies when trying to map an existing metadata standard to an existing ontology. Things that were easily expressible in the metadata standard may become very difficult to express in the complex language of the ontology – although, luckily, the opposite is also true.
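A tiny Python sketch may help show what domains and ranges add. In RDFS, declaring a domain and range for a property means that any statement using that property lets you infer the class of its subject and its object. An XML schema has no equivalent; it only says which elements may nest inside which. The property and class names below (ex:hasInstantiation, ex:Asset, ex:Instantiation) are hypothetical and are not taken from EBUCore or PBCore.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Property:
    name: str
    domain_class: str  # class inferred for the subject of a statement
    range_class: str   # class inferred for the object of a statement


# Hypothetical property and classes, for illustration only.
HAS_INSTANTIATION = Property(
    name="ex:hasInstantiation",
    domain_class="ex:Asset",
    range_class="ex:Instantiation",
)


def infer_types(subject, prop, obj):
    """Return the rdf:type statements implied by the property's domain and range."""
    return [
        (subject, "rdf:type", prop.domain_class),
        (obj, "rdf:type", prop.range_class),
    ]


if __name__ == "__main__":
    for statement in infer_types("ex:asset1", HAS_INSTANTIATION, "ex:tape1"):
        print(statement)
```

This is why the choice matters when mapping: attaching a PBCore value to an existing property quietly asserts that the things on either end belong to that property's classes, whether or not that was intended.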
This hackathon was an enormously educational experience for me. I came into it with a very basic understanding of ontologies and the challenges involved in creating them, and came out of it with firsthand experience mapping an XML metadata standard to an RDF/XML ontology. I also became much more knowledgeable about what’s happening in Europe with linked data ontologies, as we were lucky to have Jean-Pierre Evain, creator of EBUCore, with us all the way from Switzerland (a fact which made the rest of us a little less grumpy about our East Coast/Midwest jetlag!). Even though I felt overwhelmed and out of my depth at the beginning, the immersive and collaborative nature of the event and the intelligence and helpfulness of the other participants allowed me to quickly get up to speed. If you’re looking for a way to rip off the metaphorical bandaid and immerse yourself in something with which you’ve been meaning to gain more experience, I highly recommend a small, focused hackathon such as this one.
I consider my visit to Portland time well spent, with a great deal of professional development mixed with a healthy dose of sightseeing and, of course, sampling the local cuisine. The rest of Code4Lib 2015 is a topic too big for this post, although fellow NDSR resident Vicky Steeves does a great job discussing it here. For those planning on heading to Portland for AMIA 2015 or ACRL 2015: I am very jealous! Enjoy! If you have the opportunity, take the relatively brief trip out to the coast. I did, visiting Astoria, Oregon for my very first view of the Pacific Ocean.
For more specifics about the hackathon including notes, check out the hackathon wiki: http://wiki.code4lib.org/PBCore_RDF_Hackathon.