2nd TMT Workshop in Bamako


Kasadaka as presented by AOPP

From 7-9 May 2016, the second TMT-AOPP workshop was held in Bamako, Mali. This workshop took place in the context of the Tailor Made Training project, in which VU Amsterdam participates together with the Malian farmer organization Association des Organisations Professionnelles Paysannes (AOPP).

During the workshop, which was attended by around 25 AOPP members from all over Mali, we followed up on the results of a previous workshop in 2015, where we co-developed a number of use cases around improving the lives of rural farmers in Mali. Specifically, we developed two prototype services accessible using simple mobile phones:

  1. An online marketplace for seeds. Farmers can call in to the system to place offerings of seeds or browse current offers of seeds of various quality levels in a specific region.
  2. A chicken vaccination service. For this service, an extension worker can register newly hatched chickens in the system. The system keeps track of when farmers need to vaccinate their chickens against specific diseases. The system then calls the farmer and plays a reminder message in his/her language.
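
The reminder logic of the vaccination service can be sketched roughly as follows. This is a minimal illustration, not the Kasadaka implementation: the schedule table, the vaccine names, the day offsets, the record fields and the function names are all hypothetical, and the step that actually places the voice call is left out.

```python
from datetime import date, timedelta

# Hypothetical schedule: days after hatching at which each vaccine is due.
# Names and offsets are illustrative placeholders, not veterinary advice.
SCHEDULE_DAYS = {"vaccine_a": 7, "vaccine_b": 21}


def due_reminders(hatch_date, today, schedule=SCHEDULE_DAYS):
    """Return the vaccines whose due date falls exactly on `today`."""
    return sorted(
        name
        for name, offset in schedule.items()
        if hatch_date + timedelta(days=offset) == today
    )


def reminders_for_flocks(flocks, today):
    """Map each farmer's phone number to the vaccines due today.

    `flocks` is a list of dicts with the assumed keys "phone" and
    "hatch_date", as registered by an extension worker.
    """
    calls = {}
    for flock in flocks:
        due = due_reminders(flock["hatch_date"], today)
        if due:
            calls.setdefault(flock["phone"], []).extend(due)
    return calls


flocks = [{"phone": "+223000", "hatch_date": date(2016, 5, 1)}]
todays_calls = reminders_for_flocks(flocks, date(2016, 5, 8))
```

A daily job could run `reminders_for_flocks` and hand each phone number to the voice platform, which would then play the pre-recorded message in the farmer's language.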

These services were developed on Kasadaka, the cheap and low-resource rapid-prototyping platform for knowledge-rich and voice-accessible services. During the workshop we were able to further test the Kasadaka in the field. A field trip to local farmers and a milk cooperative in nearby Ouelessebougou gave us further context and information on how these services can support locals (see also the video embedded below). Chris van Aart from 2coolmonkeys demonstrated his progress on the Senepedia wiki and two Android applications that allow farmers and organizers to use geo-services to count cows, trees or other objects in the field.


Chris van Aart shows his apps

In addition to these two services, we also presented seven services on the Kasadaka, developed by students of the VUA ICT4D M.Sc. course. These included a weather information service, two veterinary services, general-purpose knowledge sharing platforms, farmer alert services and a milk market. These services were all very well received and allowed the workshop participants to see the full potential of voice-enabled information services.

The presentation below shows more information, my personal highlights from the workshop (hence the title) as well as feedback received on the seven student projects.

 

MSc Project: The Implications of Using Linked Data when Connecting Heterogeneous User Information

[This post describes Karl Lundfall’s MSc Thesis research and is adapted from his thesis]

In the realm of database technologies, the reign of SQL is slowly coming to an end with the advent of many NoSQL (Not Only SQL) alternatives. Linked Data in the form of RDF is one of these, and is regarded as highly effective for connecting datasets. In this thesis, we looked into how the choice of database can affect the development, maintenance, and quality of a product by revising a solution for the social enterprise Text to Change Mobile (TTC).

TTC is a non-governmental organization equipping customers in developing countries with high-quality information and important knowledge they could not acquire for themselves. TTC offers mobile-based solutions such as SMS and call services and focuses on projects that bring about social change in line with the values shared by the company.

We revised a real-world system for linking datasets that was based on a much more mainstream NoSQL technology, altering the approach to use Linked Data instead. The result (see the figure on the left) was a more modular system living up to many of the promises of RDF.

Overview of the Linked Data-enabled tool to connect multiple heterogeneous databases developed in the context of this MSc Project.


On the other hand, we also found that there are some obstacles to adopting Linked Data for this use case. We saw indicators that more momentum needs to build up before RDF matures enough to be easily applied to use cases like this. The implementation we present demonstrates a different flavor of Linked Data than the common scenario of publishing data for public reuse, and by applying the technology in business contexts we might be able to expand the possibilities of Linked Data.

As a by-product of the research, a Node.js module for Prolog communication with ClioPatria was developed and made available at https://www.npmjs.com/package/prolog-db . This module might illustrate how new applications using RDF could contribute to a snowball effect, in which improved quality in RDF-powered applications attracts even more practitioners.

Read more in Karl’s MSc. Thesis 

MSc. Project: The search for credibility in news articles and tweets

[This post was written by Marc Jacobs and describes his MSc Thesis research]

Nowadays the world no longer relies only on traditional news sources like newspapers, television and radio. Social media, such as Twitter, are claiming a key position here, thanks to their fast publishing speed and large number of items. As one may suspect, the credibility of this unrated news becomes questionable. My Master thesis focuses on determining measurable features (such as retweets, likes or number of Wikipedia entities) in newsworthy tweets and online news articles.


Credibility framework pyramid


The gathering of the credibility features consisted of two parts: a theoretical and a practical part. First, a theoretical credibility framework was built using recent studies about credibility on the Web. Next, Ubuntu was booted, Python was started, and news articles and tweets, including metadata, were mined. The news items were analysed and, based on the credibility framework, features were extracted. Additional information retrieval techniques (website scraping, regular expressions, NLTK, IR APIs) were used to extract further features, extending the coverage of the credibility framework.
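
As a rough illustration of the regular-expression part of such a pipeline, the sketch below counts a few surface features in a tweet's text. The choice of features (URLs, mentions, hashtags, word count), the patterns and the function name are assumptions made for this example; the thesis itself defines which features were actually used.

```python
import re

# Illustrative patterns only; real tweets would need more careful handling.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"(?<!\w)@\w+")
HASHTAG_RE = re.compile(r"(?<!\w)#\w+")


def extract_features(text):
    """Count simple surface features in a piece of text."""
    return {
        "n_urls": len(URL_RE.findall(text)),
        "n_mentions": len(MENTION_RE.findall(text)),
        "n_hashtags": len(HASHTAG_RE.findall(text)),
        "n_words": len(text.split()),
    }


features = extract_features(
    "Breaking: flood in #Bamako via @reporter http://example.org/a"
)
```

Metadata-based features such as retweets and likes would come from the mined tweet records rather than from the text.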


The data processing and experimentation pipeline

The last step in this research was to present the features to the crowd in an experimental design, using the crowdsourcing platform Crowdflower. The correlation between a specific feature and the credibility of the tweet or news article has been calculated. The results have been compared to find the differences and similarities between tweets and articles.
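
To make the correlation step concrete, here is a minimal sketch that computes a Pearson correlation coefficient between one feature and crowd-assigned credibility scores. The post does not state which correlation measure was used, so Pearson is an assumption here, and the numbers in the example are made up purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up example values: a feature count per item and a crowd rating per item.
feature_values = [1, 2, 3]
credibility_ratings = [2, 4, 6]
r = pearson(feature_values, credibility_ratings)
```

A value of `r` close to 1 or -1 indicates a strong linear relationship between the feature and the ratings, while a value near 0 indicates none.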

The highly correlated credibility features (which include the number of matches with Wikipedia entries) may be used in the future to construct credibility algorithms that automatically assess the credibility of newsworthy tweets or news articles and, hopefully, help to filter reliable news from the impenetrable pile of data on the Internet.

Read all the details in Marc’s thesis

MSc. Project Roy Hoeymans: Effective Recommendation in Knowledge Portals – the SKYbrary case study

[This post was written by Roy Hoeymans. It describes his MSc. project]

In this master project, which I have done externally at DNV-GL, I have built a recommender system for knowledge portals. Recommender systems are pieces of software that provide suggestions for related items to a user. My research focuses on the application of a recommender system in knowledge portals. A knowledge portal is an online single point of access to information or knowledge on a specific subject. Examples of knowledge portals are SKYbrary (www.skybrary.aero) or Navipedia (www.navipedia.org).

Part of this project was a case study on SKYbrary, a knowledge portal on the subject of aviation safety. In this project I looked at the types of data that are typically available to knowledge portals. I used user navigation pattern data, which I retrieved via the Google Analytics API, and the text of the articles to create a user-navigation based and a content based algorithm. The user-navigation based algorithm uses an item association formula and the content based algorithm uses a tf-idf weighting scheme to calculate content similarity between articles. Because each type of algorithm has its own disadvantages, I also developed a hybrid algorithm that combines the two.
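
The three ideas can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tf-idf variant, the co-occurrence ratio standing in for the item association formula, the weighted sum used as the hybrid, and all function names are choices made for this example and are not taken from the thesis.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each document id to a sparse {term: tf-idf weight} vector."""
    tokens = {d: text.lower().split() for d, text in docs.items()}
    doc_freq = Counter(t for toks in tokens.values() for t in set(toks))
    n_docs = len(docs)
    vectors = {}
    for d, toks in tokens.items():
        if not toks:
            vectors[d] = {}
            continue
        counts = Counter(toks)
        vectors[d] = {
            t: (c / len(toks)) * math.log(n_docs / doc_freq[t])
            for t, c in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def association(sessions, a, b):
    """Assumed stand-in: share of sessions visiting `a` that also visit `b`."""
    with_a = [s for s in sessions if a in s]
    if not with_a:
        return 0.0
    return sum(1 for s in with_a if b in s) / len(with_a)


def hybrid(content_score, nav_score, alpha=0.5):
    """Assumed hybrid: weighted sum of the two scores."""
    return alpha * content_score + (1 - alpha) * nav_score


# Toy articles and navigation sessions, invented for the example.
docs = {
    "a": "runway incursion safety",
    "b": "runway excursion safety",
    "c": "bird strike",
}
sessions = [{"a", "b"}, {"a", "c"}, {"b"}]
vecs = tfidf_vectors(docs)
score_ab = hybrid(cosine(vecs["a"], vecs["b"]), association(sessions, "a", "b"))
```

Ranking all other articles by such a score for a given article yields its recommendations. The weight `alpha` controls how much the content similarity counts relative to the navigation data.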

Screenshot of the demo application


To see which type of algorithm was the most effective, I conducted a survey among the content editors of SKYbrary, who are domain experts on the subject. Each question in the survey showed an article and then recommendations for that article. The respondent was then asked to rate each recommended article on a scale from 1 (completely irrelevant) to 5 (very relevant). The results of the survey showed that the hybrid algorithm is better than the user-navigation based algorithm, with a statistically significant difference. No difference was found between the hybrid algorithm and the content-based algorithm, however. Future work might include a more extensive or different type of evaluation.

In addition to the research I have done on the algorithms, I have also developed a demo application that the content editors of SKYbrary can use to show recommendations for a selected article and algorithm.

For more information, view Roy Hoeymans’ Thesis Presentation [pdf] or read the thesis [Academia].