[This post was written by Esra Atesçelik. It describes her MSc. project supervised by Antoine Isaac and myself]
Digital libraries and aggregators such as Europeana provide access to millions of Cultural Heritage Objects (CHOs). Europeana is one of the libraries that does not maintain collection-level metadata. It could, however, cluster objects that share common information and use the resulting collection-level information to organize results and help users.
In this project we show how objects from Europeana datasets can be clustered. We also aim to find the best way of clustering Europeana metadata and the best parameter settings for doing so. We apply various clustering methods to Europeana metadata, with the goal of proposing the clustering technique that is most appropriate for grouping Europeana CHOs. In the experiments we evaluated the cluster results manually, both qualitatively and quantitatively.
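To give a flavour of what clustering metadata records can look like, here is a minimal sketch in Python. It is not the method from the thesis: the token-based Jaccard similarity, the single-link grouping, the `threshold` parameter and the three sample records are all assumptions made purely for illustration.

```python
from itertools import combinations


def tokens(record):
    """Lower-cased word set drawn from all metadata values of one record."""
    words = set()
    for value in record.values():
        words.update(value.lower().split())
    return words


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(records, threshold=0.5):
    """Single-link clustering: records whose similarity reaches the
    threshold end up in the same cluster. Returns lists of record indices."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent pointers to the cluster representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    bags = [tokens(r) for r in records]
    for i, j in combinations(range(len(records)), 2):
        if jaccard(bags[i], bags[j]) >= threshold:
            parent[find(j)] = find(i)

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical records; real Europeana metadata has many more fields.
records = [
    {"title": "Portrait of a merchant", "creator": "Unknown", "subject": "portrait painting"},
    {"title": "Portrait of a lady", "creator": "Unknown", "subject": "portrait painting"},
    {"title": "Map of Amsterdam harbour", "creator": "Anonymous", "subject": "cartography"},
]

print(cluster(records, threshold=0.5))
```

Raising or lowering `threshold` changes how many clusters come out, which is the kind of parameter choice the experiments had to evaluate.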
The results of the experiments showed that it is difficult to determine the best parameter settings and the best clustering method based only on a number of experiments. However, we have shown a way to cluster Europeana objects that may be useful for Europeana.
View Esra’s presentation [pdf] and her thesis [pdf]
[This post was written by Andrea Bravo Balado and is cross-posted at her own blog. It describes her MSc. project supervised by myself]
Linking historical datasets and making them available on the Web has increasingly become a subject of research in the field of digital humanities. In the Netherlands, history is intimately tied to maritime activity, which has been essential to the development of the economic, social and cultural aspects of Dutch society. As such an important sector, it has been well documented by shipping companies, governments, newspapers and other institutions.
In this master project we assume that, given the importance of maritime activity in everyday life in the 19th and 20th centuries, announcements of the departures and arrivals of ships, as well as mentions of accidents or other events, can be found in newspapers.
We have taken a two-stage approach: first, a heuristic-based method for record linkage, and then machine-learning algorithms for article classification, to be used for filtering in combination with domain features. Evaluation of the linking method has shown that certain domain features were indicative of mentions of ships in newspapers. Moreover, the classifiers scored near-perfect precision in predicting ship-related articles.
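As a rough illustration of the first stage, a heuristic linker could look something like the Python sketch below. The actual heuristics and domain features are described in the thesis, not in this post, so everything here is assumed: the record fields, the two features (publication year within the ship's years of service, captain's surname mentioned), the scoring and the `min_score` cut-off are all hypothetical. The second stage, the article classifier, is not sketched.

```python
import re


def mentions(phrase, text):
    """True if the phrase occurs in the text as whole words, ignoring case."""
    pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
    return re.search(pattern, text.lower()) is not None


def link_score(ship, article):
    """Heuristic score for one (ship record, article) pair; 0 means no link."""
    if not mentions(ship["name"], article["text"]):
        return 0
    score = 1
    # Assumed domain feature: published during the ship's recorded service.
    if ship["first_year"] <= article["year"] <= ship["last_year"]:
        score += 1
    # Assumed domain feature: the captain's surname is mentioned as well.
    if mentions(ship["captain"], article["text"]):
        score += 1
    return score


def link(ships, articles, min_score=2):
    """Return (ship id, article id, score) for every pair scoring high enough."""
    links = []
    for ship in ships:
        for article in articles:
            score = link_score(ship, article)
            if score >= min_score:
                links.append((ship["id"], article["id"], score))
    return links


# Hypothetical sample data, made up for this example.
ships = [
    {"id": "ship-1", "name": "Maria Johanna", "captain": "Jansen",
     "first_year": 1850, "last_year": 1870},
]
articles = [
    {"id": "art-1", "year": 1862,
     "text": "Vertrokken: Maria Johanna, kapt. Jansen, naar Batavia."},
    {"id": "art-2", "year": 1905,
     "text": "Maria Johanna de Vries is overleden."},
    {"id": "art-3", "year": 1862, "text": "Graanprijzen gestegen."},
]

print(link(ships, articles))
```

The second article shows why a name match alone is not enough: ship names are often also personal names, which is where extra domain features and a classifier-based filter become useful.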
Enriching historical ship records with links to newspaper archives is significant for the digital history community, since it connects two datasets that would otherwise have required extensive annotation work and many hours of manual effort to align. Our work is part of the Dutch Ships and Sailors Linked Data Cloud project. Check out Andrea’s thesis [pdf].
[This post was written by Rianne Nieland. It describes her MSc. project supervised by myself]
Many people in developing countries cannot access information on the Web, because they have no Internet access and often have low literacy. A solution could be to provide voice-based access to data on the Web by using the GSM network.
In my master project I have investigated how to make general-purpose data sets efficiently available using voice interfaces for GSM. To achieve this, I have developed two voice interfaces, one for Wikipedia and one for DBpedia. The two interfaces draw on two different kinds of input data source, namely normal web data and Linked Data, so that the two can be compared.
To develop the two voice interfaces, I first elicited requirements from the literature and then developed a user interface and conversion algorithms for Wikipedia and DBpedia concepts. Users then evaluated the two voice interfaces in user tests, so that they could be compared on speed, error rate and usability.
[Rianne’s thesis presentation slides can be found on SlideShare and are embedded below. Her thesis is attached here: Eindversie-Paper-Rianne-Nieland-2057069]
A fresh start for me! As of July 1st, I work as a researcher at the Netherlands Institute for Sound and Vision (Beeld en Geluid). They have an awesome building, with awesome people and an awesome audiovisual collection. The latter could do with some Semantic Web technology, so that is what I will be working on.
I will keep this space for updates on past, present and future projects.