MSc Project: The Implications of Using Linked Data when Connecting Heterogeneous User Information

[This post describes Karl Lundfall’s MSc thesis research and is adapted from his thesis]

In the realm of database technologies, the reign of SQL is slowly coming to an end with the advent of many NoSQL (Not Only SQL) alternatives. Linked Data in the form of RDF is one of these, and is regarded as highly effective for connecting datasets. In this thesis, we looked into how the choice of database can affect the development, maintenance, and quality of a product by revising a solution for the social enterprise Text to Change Mobile (TTC).

TTC is a non-governmental organization equipping customers in developing countries with high-quality information and important knowledge they could not acquire for themselves. TTC offers mobile-based solutions such as SMS and call services and focuses on projects that bring about social change in line with the company's values.

We took a real-world system for linking datasets, built on a much more mainstream NoSQL technology, and revised it to use Linked Data instead. The result (see the figure) was a more modular system living up to many of the promises of RDF.

Overview of the Linked Data-enabled tool to connect multiple heterogeneous databases developed in the context of this MSc Project.

On the other hand, we also found some obstacles to adopting Linked Data for this use case. We saw indications that RDF needs to gain more momentum and maturity before it can be applied easily to use cases like this one. The implementation we present demonstrates a different flavor of Linked Data than the common scenario of publishing data for public reuse, and by applying the technology in business contexts we might be able to expand the possibilities of Linked Data.

As a by-product of the research, a Node.js module for Prolog communication with Cliopatria was developed and made available at https://www.npmjs.com/package/prolog-db . This module might illustrate that new applications using RDF could contribute to a snowball effect, in which improved quality in RDF-powered applications attracts even more practitioners.

Read more in Karl’s MSc. Thesis 

Dutch Ships and Sailors in E-Data and Research magazine

This year’s third issue of E-Data and Research magazine features an article about the Dutch Ships and Sailors project. The article (in Dutch) describes how our project provides new ways of interacting with Dutch maritime data. So far, four datasets are present in the DSS data cloud, but we are currently extending it with two new datasets. More on that later…

In the same issue, there is an article about the workshop around newspaper data as provided by the National Library. This includes a picture of me presenting the DIVE project.

You can read these articles and much more in the June 2015 issue of E-Data and Research. Back issues are available at www.edata.nl.

2nd VU ICT4D symposium “Data for Development”

Today, the second international VU symposium on ICT for Development was held. As last year, the workshop was a great success, with an international host of speakers and a variety of attendees (around 80 people joined). In this year’s symposium we looked at the opportunities and challenges for “Data for Development” from many angles.

In his keynote speech, Gayo Diallo from Université de Bordeaux elaborated on how data from mobile telephony providers was used to identify issues with access to health care in Senegal. Marije Geldof discussed the successes and difficulties in using mobile data services for assisting health workers in Malawi.

After these longer presentations, a series of duo-presentations were held. In the first, the concept of upscaling and downscaling (big) data sharing solutions was discussed (Hans Akkermans and Christophe Gueret). In the second duo-presentation we heard from two Amsterdam-based organizations on the use of Open Data for aid transparency (Rolf Kleef) and how to connect data from different mobile projects (Karl Lundfall). The final duo-presentation featured Cheah Waishiang on how to connect to local communities using ICT in Malaysia and Chris van Aart, who described the approach of the app developer. Myrthe van der Wekken and Gossa Lo presented their research on Knowledge Sharing for the Rural Poor through a quick pitch and two very nice posters (see also their reports 1 and 2).

All in all, the symposium showed that in every stage of the data value chain, there is progress being made in the development context. However, there are enormous challenges to be overcome at each stage as well. Enough to work on for a next installment of this yearly symposium series. You can watch the entire symposium through the embedded video below (3 hrs).

Below the video you can see the list of speakers and the timestamps in the video when their talks start (clicking on a link opens it in a new window).

  • Gayo Diallo – Université de Bordeaux, Bordeaux, FR “Mobile Data in Senegal, a Health Decision Enabler” (6.58)
  • Marije Geldof – ICT4D professional, The Hague, NL “Mobile health and the role of data in Malawi” (45.05)
  • Hans Akkermans – The Network Institute, VU Amsterdam, NL “Community-centric Data Services for Social & Economic Development in Africa” (1.12.00)
  • Christophe Guéret – DANS-KNAW The Hague, NL “Downscaling the (Semantic) Web: Decentralized Linked Open Data for World Citizens” (1.22.40)
  • Rolf Kleef – Open for Change, NL “Open Data for Development Agencies” (2.04.30)
  • Karl Lundfall – Text2Change, NL “Integration of Data Sources for Development” (2.15.18)
  •  Cheah Waishiang – Universiti Malaysia Sarawak, Malaysia “Empowering & knowledge through digital storytelling in Borneo, Sarawak, Malaysia” (2.28.26)
  •  Chris van Aart – 2CoolMonkeys, Utrecht, NL “Mr. Meteo, Weather forecasts for African farmers” (2.41.30)

DownScale 2013 workshop

DOWNSCALE 2013, the 2nd international workshop on downscaling the Semantic Web was held on 19-9-2013 in Geneva, Switzerland and was co-located with the Open Knowledge Conference 2013. The workshop seeks to provide first steps in exploring appropriate requirements, technologies, processes and applications for the deployment of Semantic Web technologies in constrained scenarios, taking into consideration local contexts. For instance, making Semantic Web platforms usable under limited computing power and limited access to Internet, with context-specific interfaces.

Downscale group picture

The workshop accepted three full papers after peer-review and featured five invited abstracts. In his keynote speech, Stephane Boyera of SBC4D gave a very nice overview of the potential use of Semantic Web for Social & Economic Development. The accepted papers and abstracts can be found in the downscale2013 proceedings, which will also appear as part of the OKCon 2013 Open Book.

 

We broadcast the whole workshop live on the web, and you can actually watch the whole thing (or fragments) via the embedded videos below.


 

After the presentations, we had fruitful discussions about the main aspects of ‘downscaling’. The consensus seemed to be that downscaling involves the investigation and usage of Semantic Web technologies and Linked Data principles to allow for data, information and knowledge sharing in circumstances where ‘mainstream’ SW and LD are not feasible or simply do not work. These circumstances can be due to cultural, technical or physical limitations, or to natural or artificial limitations.

The figure illustrates a first attempt to arrive at a common architecture. It includes three aspects that need to be considered when thinking about data sharing in exceptional circumstances:

  1. Hardware/Infrastructure. This aspect includes issues with connectivity, low resource hardware, unavailability, etc.
  2. Interfaces. This concerns the design and development of appropriate interfaces with respect to illiteracy of users or their specific usage. Building human-usable interfaces is a more general issue for Linked Data.
  3. Pragmatic semantics. Developing LD solutions that consider which information is relevant in which (cultural) circumstances is crucial to their success. This might include filtering of information etc.

The right side of the picture illustrates the downscaling stack.

African farmers in E-Data & Research magazine

The October edition of the KNAW’s E-Data and Research magazine features an article submitted by Christophe Gueret, Stefan Schlobach and myself on the need for facilitating data sharing in developing regions. Our submission was rewritten into a nice interview-like article, which you can find on page 8 (and copied below). The article is in Dutch.

For more information, visit http://worldwidesemanticweb.org

Image

The Verrijkt Koninkrijk Hackathon Report

On Friday, March 8th, we organized a Verrijkt Koninkrijk Linked Data Hackathon at the Intertain Lab of VU Amsterdam. The event was co-sponsored by the Network Institute. The goal of the hackathon was to allow third party developers to produce (ideas for) innovative applications beyond the Verrijkt Koninkrijk core research questions. We especially encouraged the use of the Linked Data produced in the project.

As organizers, we are very happy with the produced prototypes. The benefits are as follows:

  • The produced applications show the (unexpected) reusability of the VK (Linked) Open Data. The applications produced or suggested give new browsing opportunities, links to other datasets or show how the data can be used in a completely novel context. The hackathon revealed that the data is indeed usable for external developers using the documentation provided. Some bugs were found, some of which could be fixed during the hackathon.
  • Important concepts around data quality were articulated by the users. Although it falls outside of the scope of this project, subsequent curation of the data should involve considering ways of allowing experts or amateurs to correct errors in the data.
  • The VK project data is made known to researchers and developers from related projects, for example Agora or BiographyNed. We expect that this ensures future use of the data by related projects.

Here we present short descriptions of what the six hacker teams cooked up. Two prize winners were announced by the jury, for “best use of data” and “coolest app” respectively. The jury consisted of Kees Ribbens and Edwin Klijn from NIOD, and Serge ter Braake and Victor de Boer from VU. More photos of the event can be seen at www.few.vu.nl/~vbr240/verrijktkoninkrijk/hackathon/.

TOUR APPLICATION AND TOUCH TABLE DEMO [Niels Ockeloen]  WINNER “COOLEST APP”

Niels used the data from the Named Entity index to create a history browser which allows the user to browse information about WWII on the basis of persons, locations, organisations, etc. (the NER classes). For this he reused the Agora Touch demonstrator. When a class is chosen, a list of entities is shown with images, which are resolved through the alignment with DBpedia. Niels used the LDtogo framework to map the selected data onto the API of the Agora demo.

VERRIJKT KONINKRIJK ON FACEBOOK [Albert Merono & Wouter Beek] WINNER “BEST USE OF DATA”

This group set out to recreate the network of important people of the Netherlands during WWII, together with their quotes, in fake Facebook profiles, trying to imitate the reality of their time. We automatically feed these streams with the contents of the VK datasets: small Cliopatria and Python snippets retrieve data from SPARQL endpoints, resolve the structured XML texts, extract the quotes and expose them using the Facebook Graph API. View the project on GitHub and see the live demo at http://www.facebook.com/verrijkt.koninkrijk

INTEGRATION WITH AGORA RIJKSMUSEUM DATA [Lourens van der Meij]
Lourens aligned the VK data with that of Agora Rijksmuseum using the Amalgame alignment tool. This alignment is used to link VK data to RM images using the Rijksmuseum API via http://eculture2.cs.vu.nl:43020/ (results shown here (pdf)). He furthermore started to use the Verrijkt Koninkrijk data to add links to VK from within our AGORA demo, an event-centered browser for the Rijksmuseum content. Very rough results show an AGORA demo entry for Duitsland.

CUBE-BASED BROWSING [Chris van Aart]
The application of Chris van Aart shows how the monument data from Vier en Vijf Mei can be browsed using the Cube browser on iOS. This allows for multi-faceted browsing of Dutch war monuments. By flipping the screen, one can actually look at the RDF data!

MAP LAYERS SHOWING THE LIBERATION OF NIJMEGEN [Michiel van Dijk]
Michiel built a web map application showing the liberation of Nijmegen in 1944. 1940s data and current maps can be superimposed on each other, showing for example which part of the city was damaged during the liberation. Further additions include 17th, 19th and 20th century maps. A demo can be seen at www.numagapp.nl. An attempt was made to include Vier en Vijf Mei monument data in this dataset.

INCONTEXT DATA VISUALISATION [Willem Melder]
Willem presented the idea to visualise the VK data using the InContext RDF visualizer for enriched publications. Unfortunately, due to time constraints, Willem did not succeed in getting everything up and running. [screencast]

 

 

Linked WW II Data made at the OpenCultuurData Hackathon

Michiel and me presenting the result at the hackathon

For OpenCultuurData, I assisted NIOD (Dutch Institute for War Documentation) as an ‘Open Data coach’. For the hackathon, organised on 16 June 2012 by hackdeoverheid, NIOD published part of its image archive Beeldbank WO2 as open data (see also their datablog). The dataset contains 140,000 images about WW II as well as their metadata. It is accessible through OAI-PMH.

Also for OpenCultuurData, the ‘Nationaal Comité 4 en 5 mei‘ (VVM) presented their database about war monuments as open data (again, see their datablog). This database (available as an XML datadump) contains 3500 monuments, most of which are related to WW II, including the Dam Square Monument.

For the hackathon of 16 June, Michiel Hildebrand and I decided to take these two datasets and convert them to ‘five star linked data‘.

Conversion

For the conversion, we used the XML to RDF tool enclosed within Cliopatria, VU’s semantic toolset. Using a few rewriting rules, we converted the OAI XML of NIOD’s beeldbankWo2 as well as the XML of 4en5mei to RDF.
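As a rough illustration of what such a conversion does, here is a minimal Python sketch that turns one flat, Dublin Core-style XML record into RDF triples. It is not the Cliopatria XML-to-RDF tool and does not reproduce our rewriting rules. The sample record, its values, and the choice to mint the subject URI from the dc:identifier are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
BASE = "http://purl.org/collection/nl/niod/"  # base name mentioned in this post

# Hypothetical record: the identifier and field values are invented.
SAMPLE = """
<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:identifier>example-1</dc:identifier>
  <dc:title>Example photograph</dc:title>
  <dc:subject>Example subject</dc:subject>
</record>
"""


def record_to_triples(xml_text):
    """Turn one flat XML record into (subject, predicate, literal) triples."""
    root = ET.fromstring(xml_text)
    # Assumption: the subject URI is the base name plus the record identifier.
    subject = BASE + root.findtext("{%s}identifier" % DC)
    triples = []
    for child in root:
        # ElementTree spells qualified names as "{namespace}local".
        if not child.tag.startswith("{"):
            continue
        namespace, local = child.tag[1:].split("}")
        text = (child.text or "").strip()
        if text:
            triples.append((subject, namespace + local, text))
    return triples


def to_ntriples(triples):
    """Serialise the triples as N-Triples lines with plain string literals."""
    lines = []
    for s, p, o in triples:
        literal = o.replace("\\", "\\\\").replace('"', '\\"')
        lines.append('<%s> <%s> "%s" .' % (s, p, literal))
    return "\n".join(lines)


triples = record_to_triples(SAMPLE)
print(to_ntriples(triples))
```

Each XML element becomes one predicate, which is roughly why a mostly Dublin Core source ends up with a small number of distinct predicates.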

  • The NIOD data consists of 2,097,214 RDF triples, using 15 predicates, most of which are Dublin Core metadata fields. The image records are annotated with concepts from the NIOD thesaurus, which is currently under development within the Verrijkt Koninkrijk project.
  • The VVM data set contains 122,233 RDF triples and uses 37 predicates, most of which are specific to the dataset. We mapped these predicates to Dublin Core using subProperty predicates (for example, the 4en5mei:artist predicate is mapped to dc:creator). To be able to map address locations to other data sources, we upgraded addresses from literals to SKOS concepts.
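The two VVM steps above (sub-property mapping and upgrading literals to concepts) can be sketched in plain Python over a set of triples. This is an illustration only, not the code used for the conversion. The expanded predicate URIs, the monument URI, the address value and the way concept URIs are minted from labels are hypothetical.

```python
from collections import namedtuple
from urllib.parse import quote

Literal = namedtuple("Literal", "text")  # marks an object as a literal, not a URI

RDFS_SUBPROP = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS_CONCEPT = "http://www.w3.org/2004/02/skos/core#Concept"
SKOS_PREFLABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"

VVM = "http://purl.org/collection/nl/viervijfmei/"  # base name from this post
ARTIST = VVM + "artist"    # hypothetical expansion of 4en5mei:artist
ADDRESS = VVM + "address"  # hypothetical predicate name


def infer_superproperties(triples):
    """Add (s, q, o) for every (s, p, o) where p rdfs:subPropertyOf q."""
    result = set(triples)
    parents = {}
    for s, p, o in triples:
        if p == RDFS_SUBPROP:
            parents.setdefault(s, set()).add(o)
    changed = True
    while changed:  # repeat so chains of sub-properties are followed too
        changed = False
        for s, p, o in list(result):
            for q in parents.get(p, ()):
                if (s, q, o) not in result:
                    result.add((s, q, o))
                    changed = True
    return result


def upgrade_literals(triples, predicate, concept_base):
    """Replace literal objects of `predicate` by SKOS concepts carrying the label."""
    result = set()
    for s, p, o in triples:
        if p == predicate and isinstance(o, Literal):
            slug = quote(o.text.strip().lower().replace(" ", "-"))
            concept = concept_base + slug  # assumed URI minting scheme
            result.add((s, p, concept))
            result.add((concept, RDF_TYPE, SKOS_CONCEPT))
            result.add((concept, SKOS_PREFLABEL, o))
        else:
            result.add((s, p, o))
    return result


monument = VVM + "monument/1"  # hypothetical monument URI
data = {
    (ARTIST, RDFS_SUBPROP, DC_CREATOR),
    (monument, ARTIST, Literal("Example Artist")),
    (monument, ADDRESS, Literal("Example Street 1")),
}
graph = upgrade_literals(infer_superproperties(data), ADDRESS, VVM + "address/")
```

After this, a client that only knows Dublin Core can find the artist through dc:creator, and the address is a resource that other datasets can link to.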

Links

We semi-automatically produced the following links:

  • VVM city and community relations to GeoNames instances  (4,124 links)
  • VVM address relations to Amsterdam Museum thesaurus concepts (77 links)
  • NIOD thesaurus concepts to Amsterdam Museum concepts (488 links)
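This post does not describe the matching procedure itself. One simple way to picture "semi-automatic" linking is the sketch below. It assumes exact matching on normalised labels: unambiguous matches become proposed links and ambiguous ones are set aside for manual review. All URIs and labels in the example are placeholders.

```python
import unicodedata


def normalize(label):
    """Lower-case, strip accents and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def propose_links(source_labels, target_labels):
    """Match two {uri: label} mappings on normalised labels.

    Returns (unambiguous, ambiguous): unambiguous is a list of
    (source_uri, target_uri) pairs; ambiguous is a list of
    (source_uri, [candidate target URIs]) left for a human to decide.
    """
    index = {}
    for uri, label in target_labels.items():
        index.setdefault(normalize(label), []).append(uri)
    unambiguous, ambiguous = [], []
    for uri, label in source_labels.items():
        candidates = index.get(normalize(label), [])
        if len(candidates) == 1:
            unambiguous.append((uri, candidates[0]))
        elif candidates:
            ambiguous.append((uri, candidates))
    return unambiguous, ambiguous


# Placeholder data: example.org URIs stand in for real VVM and GeoNames URIs.
source = {
    "http://example.org/vvm/city/1": "Nijmegen ",
    "http://example.org/vvm/city/2": "Bergen",
}
target = {
    "http://example.org/geo/a": "nijmegen",
    "http://example.org/geo/b": "Bergen",
    "http://example.org/geo/c": "Bergen",
}
unambiguous, ambiguous = propose_links(source, target)
```

Place names that occur more than once in the target dataset are exactly the cases where a manual check is hard to avoid.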
This Linked Data graph figure shows the two datasets, plus the vocabularies and datasets they link to.

In a previous effort, we produced links between the NIOD thesaurus and a) Cornetto and b) Dutch AAT. The result is shown in the mini-datacloud figure below.

URIs and access

For the datasets, we used PURL URIs. This is mainly a matter of convenience since we do not have direct access to either the NIOD or the VVM web servers. We used the basenames http://purl.org/collection/nl/niod/ and http://purl.org/collection/nl/viervijfmei/. HTTP requests are forwarded to a running instance of Cliopatria at http://semanticweb.cs.vu.nl/pvb. Here, a SPARQL endpoint can also be found.
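For readers who want to query the data programmatically, the sketch below builds a SPARQL request URL in Python. The `/sparql/` path under the server address and the `example-1` resource are assumptions and are not given in this post. The `query` parameter name follows the standard SPARQL protocol. The code only constructs the URL and does not contact the server.

```python
from urllib.parse import urlencode

SERVER = "http://semanticweb.cs.vu.nl/pvb"  # Cliopatria instance named above
ENDPOINT = SERVER + "/sparql/"              # assumed endpoint path
NIOD_BASE = "http://purl.org/collection/nl/niod/"


def build_query(resource_uri, limit=10):
    """SPARQL query listing the properties and values of one resource."""
    return "SELECT ?p ?o WHERE { <%s> ?p ?o } LIMIT %d" % (resource_uri, limit)


def request_url(endpoint, query):
    """Build a GET URL that carries the query in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


# Hypothetical resource: "example-1" is a placeholder, not a real record.
url = request_url(ENDPOINT, build_query(NIOD_BASE + "example-1"))
print(url)
```

The resulting URL can be fetched with any HTTP client, for example `urllib.request.urlopen`, provided the server is reachable.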

Below is a list of example URIs:

The link between a 4en5mei monument and an Amsterdam Museum object, through a mapped address concept

Status and next steps
This represents only a first effort to make these datasets linked open data. Some issues that we will look at in the near future are:
  • Link evaluation: none of the links were validated, so there is no guarantee of their quality.
  • More links: More possibilities for connecting the datasets remain. These include the enrichment of BeeldbankWO2 dc:coverage fields (to GeoNames) and mappings to Rijksmonumenten, Stadsarchief etc.
  • The NIOD data now lives on two separate Cliopatria servers (one associated with Amsterdam culture data and one with Verrijkt Koninkrijk). These should be merged.
  • We are also looking at use cases for applications that will use this linked data. We hope to submit one to the OpenCultuurData challenge.