Paper about automatic labeling in IJDL

Our paper “Evaluating Unsupervised Thesaurus-based Labeling of Audiovisual Content in an Archive Production Environment” was accepted for publication in the International Journal on Digital Libraries (IJDL). This paper, co-authored with Roeland Ordelman and Josefien Schuurman, reports on a series of information extraction experiments carried out at the Netherlands Institute for Sound and Vision (NISV). Specifically, we report on a two-stage evaluation of unsupervised labeling of audiovisual content using subtitles. We look at how such an approach can provide acceptable results given requirements with respect to archival quality, authority and service levels to external users.

[Figure: the TESS pipeline]

For this, we developed a text extraction pipeline (TESS), pictured here, which extracts key terms and matches them to the NISV thesaurus, the GTAA. This journal paper is an extended version of the paper previously accepted at the TPDL conference. Here we add an analysis of the term extraction after it was taken into production, focusing on performance variation with respect to term types and television programs. Having implemented the procedure in our production workflow allows us to gradually develop the system further and to assess the effect of the transformation from manual to automatic annotation from an end-user perspective.
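To make the idea of extracting key terms and matching them to a thesaurus concrete, here is a minimal sketch. It is not the TESS implementation: the toy thesaurus, the concept identifiers, the frequency threshold and the function name `extract_terms` are all assumptions made for illustration, and the real pipeline and the GTAA are considerably richer.

```python
import re
from collections import Counter

# Hypothetical mini-thesaurus: preferred label -> concept identifier.
# These entries are invented for illustration and are not real GTAA data.
TOY_THESAURUS = {
    "elections": "concept/1",
    "football": "concept/2",
    "parliament": "concept/3",
}


def extract_terms(subtitle_text, thesaurus, min_count=2):
    """Return (label, concept_id, count) for labels seen at least min_count times."""
    # Lowercase the subtitles and split them into simple alphabetic tokens.
    tokens = re.findall(r"[a-z]+", subtitle_text.lower())
    counts = Counter(tokens)
    matches = [
        (label, concept_id, counts[label])
        for label, concept_id in thesaurus.items()
        if counts[label] >= min_count
    ]
    # Most frequent first; ties are broken alphabetically for stable output.
    return sorted(matches, key=lambda m: (-m[2], m[0]))


subtitles = (
    "The elections dominate parliament today. "
    "After the elections, parliament reconvenes. Football is next."
)
labels = extract_terms(subtitles, TOY_THESAURUS)
for label, concept_id, count in labels:
    print(label, concept_id, count)
```

The frequency threshold stands in for whatever term-ranking the real pipeline uses; raising `min_count` trades recall for precision, which is the kind of trade-off an archive's quality requirements would drive.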

The paper will appear on the Journal site shortly. A final draft version of the paper can be found here: deboer_ijdl2016evaluating_draft [PDF].

 

 


CultuurLINK Linking Award

Happy and surprised to find the first (and so far only) CultuurLINK Linking Award in my mailbox yesterday! I checked with the nice people over at Spinque.com and it turns out it was a token of appreciation for being a prolific CultuurLINK user 🙂

I think the vocabulary alignment tool is great and easy to work with, so I can recommend it to anyone with a SKOS vocabulary who wants to match it with any of the major cultural thesauri in the ‘Hub’. Thanks to the people at Spinque for the great tool and the nice gesture!


ICT Open 2016

Below you will find some impressions from ICT.Open 2016. At this very nice event, members of the Web and Media group and VU master students presented their ICT research.

The images show me presenting the Observe project’s achievements so far, Oana Inel presenting the DIVE demo, Anca and Oana accepting the SIKS poster award, Gossa Lo presenting Kasadaka to demo jury members, three Web and Media posters, and a nice presentation from Google on AlphaGo.

First successful Observe pilot completed – ACT MediaLab

Last Wednesday, 27 January, the first pilot for Observe took place in Enschede. Above the entrance of the De Klanderij shopping centre hangs a 6×3 metre screen, on which the content by 100% FAT (called Walk The Line) was shown. Around half past ten in the morning, 100% FAT began setting up the installation. Using a …

Source: Eerste geslaagde pilot Observe is uitgevoerd – ACT MediaLab

DISH2015 Conference: the inflatable elephant says you should share your data

On 7 and 8 September, the international DISH 2015 conference on digital strategies for heritage was held in Rotterdam. On the second day, I was asked to join a panel discussion on the value of (Linked) Open culture data and its business models.

DISH featured an inflatable elephant as a call to tweet about your “Elephant in the heritage room”. Mine was:

The conference itself consisted of many interesting keynotes, mini-keynotes, round-table discussions and workshops on digitization, data management and sharing, and digital presentation of cultural heritage content.

 

During one of the mini-keynotes, Maarten Zeinstra presented Embedr.eu, a service for finding, cropping and embedding open cultural images with proper attribution. Other presentations, such as the one from Jill Cousins of Europeana, focused on the positive effects that sharing data has for institutions.

The panel discussion in the afternoon was organized by Tine van Nierop from Archief2020 and Maarten Brinkerink of Sound and Vision on the topic “Cultural heritage is not a business: moving from the canvas to enabling re-use and collaboration”. The panel consisted of professionals from the heritage domain, who discussed how opening their data led to new usage of and interest in their collections. I shed some light on the added value of linking your collection to related collections and vocabularies, with examples from DIVE and Verrijkt Koninkrijk.

 

Dutch Ships and Sailors in 1st issue of the DHCommons journal

A while ago, we submitted a project description of our Digital History project Dutch Ships and Sailors to the DHCommons journal, and this week the first issue of the journal was published, containing our paper “The Dutch Ships and Sailors project”.

This is a nice companion piece to the more technical description of the dataset, which was published in the proceedings of ISWC 2014. The new paper puts more emphasis on the general setup of the project and on its considerations and innovations from a historical point of view.

New datacloud


Since submission of this ‘mid-term project description’, the DSS data cloud has been expanding, and the ‘development’ version of the triple store now hosts six datasets thanks to the work of Jeroen Entjes (see the datacloud figure).

MSc Project: Linking Maritime Datasets to Dutch Ships and Sailors Cloud – Case studies on Archangelvaart and Elbing

[This post was written by Jeroen Entjes and describes his MSc thesis research]

Dutch maritime supremacy during the Dutch Golden Age has had a profound influence on the modern Netherlands and possibly on other places around the globe. As such, much historical research has been done on the matter, facilitated by the thorough documentation that many ports kept of their shipping. As more and more of this documentation is digitized, new ways of exploring the data become available.


Screenshot showing an entry from the Elbing website

This master project uses one such way. Building on the Dutch Ships and Sailors project, digitized maritime datasets have been converted to RDF and published as Linked Data. Linked Data refers to structured data on the web that is published and interlinked according to a set of standards. The conversion was based on requirements for this data, set up with historians from the Huygens ING Institute, which provided the datasets. The datasets chosen were those of Archangel and Elbing, as these offer information on the Dutch Baltic trade, the cradle of the Dutch merchant navy that sailed the world during the Dutch Golden Age.
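As a rough illustration of what converting a tabular record to RDF involves, the sketch below serializes one record as N-Triples using only the Python standard library. It is not the conversion used in the project: the `http://example.org/` namespace, the field names, the record values and the function name `to_ntriples` are all invented for this example.

```python
def to_ntriples(record, base="http://example.org/dss/"):
    """Serialize one flat record as a list of N-Triples lines.

    The base namespace and the predicate naming scheme are hypothetical.
    """
    subject = f"<{base}voyage/{record['id']}>"
    lines = []
    for field, value in record.items():
        if field == "id":
            continue  # the id is used for the subject URI, not as a property
        predicate = f"<{base}schema/{field}>"
        # Escape backslashes and double quotes inside the literal value.
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{subject} {predicate} "{escaped}" .')
    return lines


# An invented example record; it does not come from the Elbing or Archangel data.
record = {"id": "42", "shipName": "De Hoop", "destination": "Elbing"}
triples = to_ntriples(record)
for line in triples:
    print(line)
```

In practice, values such as destinations would typically become URIs shared with other datasets rather than plain literals, since those shared identifiers are what interlink the data.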

Along with requirements for the data, the historians were also interviewed to gather research questions that combined datasets could help solve. The goal of this research was to see if additional datasets could be linked to the existing Dutch Ships and Sailors cloud and if such a conversion could help solve the research questions the historians were interested in.
Data visualization showing shipping volume of different datasets.

As part of this research, the datasets have been converted to RDF and published as Linked Data as an addition to the Dutch Ships and Sailors cloud, and a set of interactive data visualizations has been made to answer the historians’ research questions. Based on the conversion, a set of recommendations is given on how to convert new datasets and add them to the Dutch Ships and Sailors cloud. All data representations and conversions have been evaluated by historians to assess their effectiveness.

The data visualizations can be found at http://www.entjes.nl/jeroen/thesis/. Jeroen’s thesis can be found here: Msc. Thesis Jeroen Entjes