Kickoff meeting Quantifying Historical Perspectives on WWII

Today, the kickoff meeting for the project Quantifying Historical Perspectives on WWII was held. This is one of the projects funded by the Data Science Research Center. In this VU-UvA collaboration project*, two students will investigate different perspectives on the Second World War. Specifically, they will use a data science pipeline to search a range of media (Wikipedia, Verrijkt Koninkrijk, KB newspapers,…) and to identify and visualize different perspectives.

The students will build on previous work (Verrijkt Koninkrijk, …) and on existing analysis tools (xTAS, ThemeStreams) to provide insight into the volume, selection and depth of WWII-related topics across different media, times and locations.

 

* The project proposal was submitted by Daan Odijk from UvA and Laura Hollink, Jacco van Ossenbruggen and Victor de Boer from VUA.

Linked WW II Data made at the OpenCultuurData Hackathon

Michiel and me presenting the result at the hackathon

For OpenCultuurData, I assisted NIOD (Dutch Institute for War Documentation) as an ‘Open Data coach’. For the hackathon, organised on 16 June 2012 by hackdeoverheid, NIOD published part of its image archive Beeldbank WO2 as open data (see also their datablog). The dataset contains 140,000 images about WW II as well as their metadata. It is accessible through OAI-PMH.
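
Because the records are exposed through OAI-PMH, harvesting them comes down to issuing ListRecords requests and following resumption tokens. The sketch below illustrates this in Python with only the standard library. The base URL is a placeholder (the actual NIOD endpoint is not given here), and the sketch assumes the records are served in the standard oai_dc metadata format.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Placeholder only: the real endpoint URL is not stated in this post.
BASE_URL = "http://example.org/oai"


def list_records_url(base_url, resumption_token=None, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL, optionally continuing an earlier list."""
    if resumption_token:
        # In OAI-PMH, resumptionToken may not be combined with other arguments.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_page(xml_text):
    """Return (records, next_token) for one ListRecords response page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        titles = [t.text for t in rec.iterfind(".//dc:title", NS) if t.text]
        records.append({"identifier": identifier, "titles": titles})
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    # An empty or missing token means the list is complete.
    return records, (token.strip() or None)
```

A harvester would fetch `list_records_url(BASE_URL)`, pass the response body to `parse_page`, and repeat with the returned token until it is `None`.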

Also for OpenCultuurData, the ‘Nationaal Comité 4 en 5 mei’ (VVM) presented their database of war monuments as open data (again, see their datablog). This database (available as an XML data dump) contains 3500 monuments, most of which are related to WW II, including the Dam Square Monument.

For the hackathon of 16 June, Michiel Hildebrand and I decided to take these two datasets and convert them to ‘five star linked data’.

Conversion

For the conversion, we used the XML-to-RDF tool included in Cliopatria, VU’s semantic toolset. Using a few rewriting rules, we converted the OAI XML of NIOD’s Beeldbank WO2 as well as the XML of 4en5mei to RDF.

  • The NIOD data consists of 2,097,214 RDF triples, using 15 predicates, most of which are Dublin Core metadata fields. The image records are annotated with concepts from the NIOD thesaurus, which is currently under development within the Verrijkt Koninkrijk project.
  • The VVM data set contains 122,233 RDF triples and uses 37 predicates, most of which are specific to the dataset. We mapped these predicates to Dublin Core using subProperty predicates (for example, the 4en5mei:artist predicate is mapped to dc:creator). To be able to map address locations to other data sources, we upgraded addresses from literals to SKOS concepts.
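
The predicate mapping described above can be illustrated with a small sketch. It assumes that the subProperty predicates are rdfs:subPropertyOf statements and that dc: refers to the Dublin Core elements namespace. The namespace used for the 4en5mei predicates is hypothetical. The sketch serialises the mapping as N-Triples and shows how a consumer could read a dataset-specific predicate as its Dublin Core counterpart.

```python
RDFS_SUBPROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
# Assumption: dc: is the Dublin Core elements 1.1 namespace.
DC = "http://purl.org/dc/elements/1.1/"
# Hypothetical namespace for the dataset-specific predicates.
VVM = "http://example.org/4en5mei/"

# Dataset predicate -> Dublin Core predicate. Only the artist/creator
# pair is taken from the text; further pairs would be added here.
MAPPING = {VVM + "artist": DC + "creator"}


def mapping_triples(mapping):
    """Serialise the mapping as N-Triples lines."""
    return [
        f"<{sub}> <{RDFS_SUBPROPERTY_OF}> <{sup}> ."
        for sub, sup in sorted(mapping.items())
    ]


def generalise(triples, mapping):
    """Apply one step of subPropertyOf inference.

    For every (subject, predicate, object) triple whose predicate is
    mapped, the same statement is added with the super-property.
    """
    inferred = [(s, mapping[p], o) for s, p, o in triples if p in mapping]
    return triples + inferred
```

With such a mapping in place, a client that only knows Dublin Core can still find the creator of a monument, while the original, more specific predicate is kept.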

Links

We semi-automatically produced the following links:

  • VVM city and community relations to GeoNames instances  (4,124 links)
  • VVM address relations to Amsterdam Museum thesaurus concepts (77 links)
  • NIOD thesaurus concepts to Amsterdam Museum concepts (488 links)

This Linked Data graph figure shows the two datasets, plus the vocabularies and datasets they link to.

In a previous effort, we produced links between the NIOD thesaurus and a) Cornetto and b) Dutch AAT. The result is shown in the mini-datacloud figure below.

URIs and access

For the datasets, we used PURL URIs. This is mainly a matter of convenience since we do not have direct access to either the NIOD or the VVM web servers. We used the basenames http://purl.org/collection/nl/niod/ and http://purl.org/collection/nl/viervijfmei/. HTTP requests are forwarded to a running instance of Cliopatria at http://semanticweb.cs.vu.nl/pvb. Here, a SPARQL endpoint can also be found.
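
As an illustration of how the endpoint could be queried, the sketch below builds a SPARQL protocol GET request that lists a few predicate/object pairs for a resource under one of the PURL base names. The `/sparql/` path below the server URL is an assumption and is not confirmed by this post; only the two base names and the server URL come from the text.

```python
import urllib.parse

NIOD_BASE = "http://purl.org/collection/nl/niod/"
VVM_BASE = "http://purl.org/collection/nl/viervijfmei/"
# Assumption: the endpoint lives at /sparql/ below the server URL.
SPARQL_ENDPOINT = "http://semanticweb.cs.vu.nl/pvb/sparql/"


def describe_query(resource_uri, limit=10):
    """SPARQL SELECT returning predicate/object pairs of one resource."""
    return f"SELECT ?p ?o WHERE {{ <{resource_uri}> ?p ?o }} LIMIT {int(limit)}"


def request_url(endpoint, query):
    """Encode the query as a GET request following the SPARQL protocol."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})
```

For example, `request_url(SPARQL_ENDPOINT, describe_query(VVM_BASE + "some-id"))` yields a URL that can be fetched with any HTTP client; `some-id` is a made-up local name and would need to be replaced by an identifier that exists in the dataset.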

Below is a list of example URIs:

The link between a 4en5mei monument and an Amsterdam Museum object, through a mapped address concept

Status and next steps

This represents only a first effort to make these datasets linked open data. Some issues that we will look at in the near future are:
  • Link evaluation: none of the links were validated, so there is no guarantee of their quality.
  • More links: More possibilities for connecting the datasets remain. These include the enrichment of BeeldbankWO2 dc:coverage fields (to GeoNames) and mappings to Rijksmonumenten, Stadsarchief etc.
  • The NIOD data now lives on two separate Cliopatria servers (one associated with Amsterdam culture data and one with Verrijkt Koninkrijk). These should be merged.
  • We are also looking at use cases for applications that will use this linked data. We hope to submit one to the OpenCultuurData challenge.