Nichesourcing pluvial data digitization for the Sahel

Example pluvial records digitized through Binyam's nichesourcing effort (photos: W. Tuijp)

At EKAW 2012 I presented a position paper, co-authored with a number of VU colleagues, on nichesourcing as a next phase in crowdsourcing practice. In nichesourcing, tasks are not distributed to a faceless crowd but to small groups of amateur experts who share a set of characteristics. These characteristics mean that they can perform tasks requiring specific knowledge at a higher quality, and their connection with the context makes them more motivated. The presentation slides are archived on Slideshare; the paper itself can be found here.

The paper and presentation feature two use cases. The first concerns the Master's project of Binyam Tesfa, supervised by me and Pieter De Leenheer. Binyam investigated a nichesourcing approach for digitizing pluvial (rainfall) data from the Sahel region in Africa. He developed and published a nichesourcing web application targeting the African diaspora (African expats currently living in the North), and evaluated its success in terms of attracting dedicated participants and digitizing a considerable amount of data. Within one week of the application's release, participants produced more than 5,000 cells of structured, digitized pluvial data. We also found that the anticipated niche (people with an African affiliation) did participate in the digitization with dedication. Binyam’s thesis can be found here: Nichesourcing: a case study for pluvial data digitization for the Sahel by B. Tesfa [PDF]

The other use case is the Rijksmuseum print annotation use case, in which 700,000 prints are to be annotated by amateur experts: prints depicting flowers are distributed to flower enthusiasts, prints of castles to castle geeks, and so on. For this use case, the people in the COMMIT/SEALINCMedia project are currently developing a nichesourcing methodology and application.
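To make the distribution step concrete, here is a minimal sketch of routing tasks to niche groups by topic. The function name, the topic labels and the fallback of unmatched tasks to the open crowd are all hypothetical; this is not the SEALINCMedia design, only an illustration of the idea.

```python
from collections import defaultdict


def route_tasks(tasks, niches):
    """Distribute annotation tasks to the niche groups interested in their topic.

    tasks:  iterable of (task_id, topic) pairs
    niches: mapping from niche name to the set of topics its members know about
    Returns a dict mapping niche name -> list of task ids. Tasks that match no
    niche are collected under the key None (assumed fallback to the open crowd).
    """
    assignments = defaultdict(list)
    for task_id, topic in tasks:
        # every niche whose interests cover this topic receives the task
        matched = [name for name, topics in niches.items() if topic in topics]
        if not matched:
            assignments[None].append(task_id)
        for name in matched:
            assignments[name].append(task_id)
    return dict(assignments)
```

For example, with niches {"flower enthusiasts": {"flowers"}, "castle geeks": {"castles"}}, a print tagged "flowers" goes only to the flower enthusiasts.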

SPARQL Queries for Verrijkt Koninkrijk

[update: the links have been updated] In this post, I list a number of SPARQL queries that show how external sources can be used to provide enriched access to the Verrijkt Koninkrijk text. The queries accompany a two-page abstract entitled “Enriched Access to a Large War Historical Text using the Back of the Book Index”, which I submitted to the SWAIE 2012 – Semantic Web and Information Extraction workshop that I will be attending.

These queries use the back-of-the-book index, which has been converted to SKOS and subsequently aligned with a number of external data sources.

The queries can be entered in the interactive SPARQL interface of the Verrijkt Koninkrijk semantic server at http://semanticweb.cs.vu.nl/verrijktkoninkrijk/flint/ (login: sparqltester, password: sparqltester).
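The same queries can also be sent from a script. Below is a minimal sketch using only the Python standard library. It assumes the server exposes a standard SPARQL 1.1 Protocol endpoint behind the web interface and protects it with HTTP Basic authentication using the login above; the ENDPOINT path is a hypothetical guess, since only the interface URL is given.

```python
import base64
import urllib.parse
import urllib.request

# Hypothetical endpoint path: only the web interface URL is known.
ENDPOINT = "http://semanticweb.cs.vu.nl/verrijktkoninkrijk/sparql/"


def build_sparql_request(query, endpoint=ENDPOINT,
                         user="sparqltester", password="sparqltester"):
    """Build (but do not send) a SPARQL Protocol GET request.

    The query goes in the `query` URL parameter; the credentials are sent as
    an HTTP Basic Authorization header (an assumption about the server).
    """
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        headers={
            # ask for the standard SPARQL JSON results format
            "Accept": "application/sparql-results+json",
            "Authorization": "Basic " + token,
        },
    )
```

Sending it is then a matter of urllib.request.urlopen(req); json.load on the response yields the result rows under results/bindings.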

Query 1: GeoNames. Get all paragraphs containing references to a place in the Dutch province “Noord-Holland”:

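PREFIX niod: <http://purl.org/collections/nl/niod/>
PREFIX dc:   <http://purl.org/dc/elements/1.1/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT ?subj ?bc ?par
WHERE {
  # GeoNames places whose first-order administrative parent is Noord-Holland (GeoNames 2749879)
  ?subj <http://www.geonames.org/ontology#parentADM1> <http://sws.geonames.org/2749879/> .
  # back-of-the-book index concepts aligned with those places
  ?bc skos:closeMatch ?subj .
  ?bc skos:inScheme niod:BotBScheme .
  # from the index concept to its page references, and from there to paragraphs
  ?bc niod:pageRef ?pr .
  ?pr niod:parRef ?par .
}
LIMIT 100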

Edit 3 Oct: I continued experimenting with some other SPARQL queries and used Willem van Hage and Tomi Kauppinen’s excellent SPARQL package for R to do some quick-and-dirty statistical analysis. I used a variant of the query above, with the province as a variable, and put the results in a pie chart showing Loe de Jong’s mentions of places in each of the twelve provinces of the Netherlands.

Frequencies of page references to places in each of the twelve provinces in "Het Koninkrijk"
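The post shows neither the variant query nor the R code. The following minimal Python sketch illustrates the same idea, assuming the endpoint returns the standard SPARQL JSON results format. PROVINCE_QUERY is a hypothetical reconstruction of “the query above with the province as a variable”, and province_frequencies is a hypothetical stand-in for the R analysis that turns result rows into counts and pie-chart shares.

```python
from collections import Counter

# Hypothetical reconstruction: Query 1 with the fixed province replaced by
# the variable ?province. The exact query behind the chart is not given.
PROVINCE_QUERY = """
PREFIX niod: <http://purl.org/collections/nl/niod/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?province ?pr
WHERE {
  ?subj <http://www.geonames.org/ontology#parentADM1> ?province .
  ?bc skos:closeMatch ?subj .
  ?bc skos:inScheme niod:BotBScheme .
  ?bc niod:pageRef ?pr .
}
"""


def province_frequencies(results):
    """Count page references per province in a SPARQL JSON results document.

    Returns (counts, shares): counts maps each province URI to its number of
    rows (distinct page references), shares maps it to its fraction of the
    total, i.e. the slice sizes of a pie chart.
    """
    counts = Counter(row["province"]["value"]
                     for row in results["results"]["bindings"])
    total = sum(counts.values())
    shares = {prov: n / total for prov, n in counts.items()} if total else {}
    return counts, shares
```

Because of the DISTINCT, a page that mentions places in two provinces is counted once for each of them.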

And if you replace the predicate ‘parentADM1’ with ‘parentADM2’, you get the frequencies for the individual municipalities:

Frequencies of page references to municipalities in "Het Koninkrijk"
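The substitution can be made explicit with a small query builder. The hypothetical helper below, admin_count_query, also moves the counting into the query itself using SPARQL 1.1 COUNT and GROUP BY. This assumes the endpoint supports SPARQL 1.1 aggregates, which the post does not state.

```python
# GeoNames ontology properties: parentADM1 links a place to its first-order
# administrative division (a province in the Netherlands), parentADM2 to its
# second-order division (a municipality).
GEONAMES_PARENT = {
    1: "http://www.geonames.org/ontology#parentADM1",
    2: "http://www.geonames.org/ontology#parentADM2",
}


def admin_count_query(level):
    """Build a SPARQL 1.1 query that counts distinct page references per
    administrative unit at the given GeoNames level (1 or 2)."""
    if level not in GEONAMES_PARENT:
        raise ValueError("level must be 1 (provinces) or 2 (municipalities)")
    # doubled braces are literal { } in the f-string
    return f"""PREFIX niod: <http://purl.org/collections/nl/niod/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?unit (COUNT(DISTINCT ?pr) AS ?n)
WHERE {{
  ?subj <{GEONAMES_PARENT[level]}> ?unit .
  ?bc skos:closeMatch ?subj .
  ?bc skos:inScheme niod:BotBScheme .
  ?bc niod:pageRef ?pr .
}}
GROUP BY ?unit
ORDER BY DESC(?n)"""
```

admin_count_query(1) gives the per-province counts and admin_count_query(2) the per-municipality counts, already sorted from most to least mentioned.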

I will leave the historical interpretation of these charts to the reader. Note, however, that a major disclaimer is needed: there are numerous errors in the data, including OCR errors and concept-mapping errors. I am sure that the municipality ‘Berkelland’ is not as important as it now seems. Also, the data should be normalized by province size to give a better idea of what is going on.
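A minimal sketch of the suggested normalization follows, assuming “size” means land area (population would work the same way). The hypothetical normalize_by_area helper takes per-unit counts and a caller-supplied table of areas. No area figures are given in the post, so none are included here.

```python
def normalize_by_area(counts, areas_km2):
    """Turn raw reference counts into references per 1,000 km².

    counts:    mapping unit -> number of page references
    areas_km2: mapping unit -> area in km² (supplied by the caller)
    Units with a missing or zero area are skipped to avoid division by zero.
    """
    return {
        unit: 1000 * n / areas_km2[unit]
        for unit, n in counts.items()
        if areas_km2.get(unit)
    }
```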

The point, however, is that, given the linked data, these analyses are ridiculously easy to perform with SPARQL and R.

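Query 2: NIOD Thesaurus Beeldbank WO2. Get all combinations of Beeldbank WO2 (BBWO2) images and paragraphs, linked through thesaurus concepts that exactly match back-of-the-book index concepts: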

PREFIX niod: <http://purl.org/collections/nl/niod/>
PREFIX dc:   <http://purl.org/dc/elements/1.1/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT ?img ?par
WHERE {
  # image bank objects with their subject concepts and image links
  ?object dc:subject ?subj ;
          dc:relation ?img .
  # subjects from the NIOD thesaurus that exactly match an index concept
  ?subj skos:inScheme niod:ConceptScheme .
  ?subj skos:exactMatch ?bc .
  ?bc skos:inScheme niod:BotBScheme .
  # from the index concept to its page references and paragraphs
  ?bc niod:pageRef ?pr .
  ?pr niod:parRef ?par .
}
LIMIT 100
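Query 2 returns a flat list of image/paragraph combinations. To browse them per image, the rows can be grouped. Below is a minimal sketch, assuming SPARQL JSON results; paragraphs_per_image is a hypothetical helper.

```python
from collections import defaultdict


def paragraphs_per_image(results):
    """Group (?img, ?par) rows from a SPARQL JSON results document into a
    mapping image URL -> sorted list of distinct paragraph URIs."""
    grouped = defaultdict(set)
    for row in results["results"]["bindings"]:
        grouped[row["img"]["value"]].add(row["par"]["value"])
    return {img: sorted(pars) for img, pars in grouped.items()}
```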