Our paper “Evaluating Unsupervised Thesaurus-based Labeling of Audiovisual Content in an Archive Production Environment” was accepted for publication in the International Journal on Digital Libraries (IJDL). This paper, co-authored with Roeland Ordelman and Josefien Schuurman, reports on a series of information extraction experiments carried out at the Netherlands Institute for Sound and Vision (NISV). Specifically, we report on a two-stage evaluation of unsupervised labeling of audiovisual content using subtitles. We look at how such an approach can provide acceptable results given the archive's requirements with respect to archival quality, authority and service levels to external users.
For this, we developed a text extraction pipeline (TESS), pictured here, which extracts key terms and matches them to the NISV thesaurus, the GTAA. This journal paper is an extended version of the paper previously accepted at the TPDL conference. Here we add an analysis of the term extraction after it was taken into production, focusing on how performance varies across term types and television programs. Having implemented the procedure in our production workflow allows us to gradually develop the system further and also to assess the effect of the transition from manual to automatic annotation from an end-user perspective.
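To give a rough idea of what "extract key terms and match them to a thesaurus" can look like, here is a minimal sketch in Python. It is not the TESS implementation: the frequency-based term selection, the stopword list, the tiny thesaurus and the concept identifiers are all assumptions made up for illustration, and the real GTAA is far larger and structured differently.

```python
import re
from collections import Counter

# Hypothetical miniature thesaurus: normalized label -> concept identifier.
# These labels and identifiers are invented; they are not GTAA entries.
TOY_THESAURUS = {
    "elections": "concept:001",
    "parliament": "concept:002",
    "football": "concept:003",
}

# A very small, illustrative stopword list.
STOPWORDS = {"the", "a", "in", "of", "and", "to", "is", "are", "for", "on"}


def extract_terms(subtitle_text, top_n=5):
    """Return the most frequent non-stopword tokens as candidate key terms."""
    tokens = re.findall(r"[a-z]+", subtitle_text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [term for term, _ in counts.most_common(top_n)]


def match_to_thesaurus(terms, thesaurus):
    """Keep only the candidate terms that exactly match a thesaurus label."""
    return {term: thesaurus[term] for term in terms if term in thesaurus}


subtitles = (
    "The parliament debates the elections. "
    "Elections are on Sunday, and the parliament is in session."
)
labels = match_to_thesaurus(extract_terms(subtitles), TOY_THESAURUS)
print(labels)
```

Exact string matching on lowercased tokens is the simplest possible choice here; a production system would also have to deal with multi-word terms, inflection and ambiguous labels.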
Happy and surprised to find the first (and so far only) CultuurLink Linking Award in my mailbox yesterday! I checked with the nice people over at Spinque.com and it turns out it was a token of appreciation for being a prolific CultuurLink user 🙂
I think the vocabulary alignment tool is great and easy to work with, so I can recommend it to anyone with a SKOS vocabulary who wants to match it with any of the major cultural thesauri in the ‘Hub’. Thanks to the people at Spinque for the great tool and the nice gesture!
Today, the TPDL (International Conference on Theory and Practice of Digital Libraries) results came in and both papers on which I am a co-author got accepted. Today is a good day 🙂 In the first paper, we present work done during my stay at the Netherlands Institute for Sound and Vision on automatic term extraction from subtitles. The interesting thing about this paper is that it mainly looks at how these algorithms function in a ‘real’ context, that is, within a larger media ecosystem. The paper was co-authored with Roeland Ordelman and Josefien Schuurman.
A fresh start for me! As of July 1st, I work as a researcher at the Netherlands Institute for Sound and Vision (Beeld en Geluid). They have an awesome building, with awesome people and an awesome audiovisual collection. The latter could do with some Semantic Web technology, so that is what I will be working on.
I will keep this space for updates on past, present and future projects.