MSc. Project: The search for credibility in news articles and tweets

[This post was written by Marc Jacobs and describes his MSc Thesis research]

Nowadays the world no longer relies solely on traditional news sources such as newspapers, television and radio. Social media such as Twitter are claiming a key position, thanks to their fast publishing speed and large volume of items. As one may suspect, the credibility of this unrated news is questionable. My Master's thesis focuses on determining measurable credibility features (such as retweets, likes or the number of Wikipedia entities) in newsworthy tweets and online news articles.


Credibility framework pyramid


The gathering of the credibility features consisted of two parts: a theoretical part and a practical part. First, a theoretical credibility framework was built from recent studies on credibility on the Web. Next, Ubuntu was booted, Python was started, and news articles and tweets, including their metadata, were mined. The news items were analysed and, based on the credibility framework, features were extracted. Further information retrieval techniques (website scraping, regular expressions, NLTK, IR APIs) were used to extract additional features, extending the coverage of the credibility framework.
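To give an idea of what extracting features with regular expressions can look like, here is a minimal Python sketch. It is not the code from the thesis: the feature names, the patterns and the example text are all assumptions chosen for illustration.

```python
import re

# Hypothetical surface features; the thesis's real feature set is not listed in this post.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


def extract_text_features(text):
    """Count a few simple surface features in a tweet or article snippet."""
    return {
        "n_urls": len(URL_RE.findall(text)),
        "n_mentions": len(MENTION_RE.findall(text)),
        "n_hashtags": len(HASHTAG_RE.findall(text)),
        "n_words": len(text.split()),  # whitespace-separated tokens
        "n_exclamations": text.count("!"),
    }


# Invented example text, not taken from the thesis dataset.
example = "Breaking: bridge closed, says @cityhall! Details at https://example.org/news #traffic"
features = extract_text_features(example)
```

Each news item then becomes a small dictionary of numbers that can be compared against credibility ratings later on.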


The data processing and experimentation pipeline

The last step in this research was to present the features to the crowd in an experimental design, using the crowdsourcing platform CrowdFlower. The correlation between each feature and the credibility of the tweet or news article was then calculated. The results were compared to find the differences and similarities between tweets and articles.
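The correlation step can be illustrated with a small Python sketch. This post does not say which correlation measure was used, so the Pearson coefficient below is an assumption, and the numbers are made up purely to show the calculation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Invented values: one feature (retweet count) and a crowd credibility rating per item.
retweets = [0, 3, 10, 25, 40]
credibility = [2, 2, 3, 4, 5]
r = pearson(retweets, credibility)
```

Running the same calculation for every feature, once on tweets and once on articles, gives two lists of coefficients that can be compared side by side.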

The highly correlated credibility features (which include the number of matches with Wikipedia entries) may be used in the future to construct credibility algorithms that automatically assess the credibility of newsworthy tweets or news articles and, hopefully, help to filter reliable news from the impenetrable pile of data on the Internet.

Read all the details in Marc’s thesis


MSc. project: Requirements and design for a Business Intelligence system for SMEs

[This post was written by Arnold Kraakman and describes his MSc Thesis research]

This master project is written as an advisory report for construction company and contractor K. Dekker B.V. and deals with Business Intelligence. Business Intelligence (BI) is a term that refers to information which can be used to make business decisions. The thesis answers the question of which options are available to K. Dekker for implementing BI within two years from the moment of writing. The research was done through semi-structured interviews and data mining. The interviews were used to draw up a requirements list based on feedback from the final users, and from this list a concept dashboard was made that K. Dekker could use. A BI dashboard is one way for the company to put its information to use and eventually implement Business Intelligence.


concept dashboard – project result in detail

Screenshot #1 shows an overview of the currently running projects, with the financial forecast. Most interviewees did not know which projects K. Dekker B.V. was currently running. Screenshot #2 shows the project characteristics and their financial result; this was the biggest must-have on the requirements list. A construction project has different characteristics: for example, a bridge built in Noord-Holland with a specific tender procedure and a specific contract form (for example, “design the whole project and build it as well” instead of only building it). Those characteristics could influence the final financial profit.

concept dashboard – project overview


The thesis includes specific recommendations to K. Dekker for realizing BI within two years. This list is also generalized to Small and Medium-sized Enterprises (SMEs). One recommendation is to write work instructions for the ERP software, so that everyone knows what information has to be entered into the system and how. If data is entered incorrectly, decisions based on it could be incorrect as well. It is also recommended to make a project manager responsible for all the entered information. This will lead to more accurate information and therefore to more reliable business decisions.

You can download the thesis here: arnold_kraakman_final_thesis

MSc. Project: Linking Maritime Datasets to Dutch Ships and Sailors Cloud – Case studies on Archangelvaart and Elbing

[This post was written by Jeroen Entjes and describes his MSc Thesis research]

The Dutch maritime supremacy during the Dutch Golden Age has had a profound influence on the modern Netherlands and possibly other places around the globe. As such, much historical research has been done on the matter, facilitated by the thorough documentation that many ports kept of their shipping. As more and more of this documentation is digitized, new ways of exploring the data are created.


Screenshot showing an entry from the Elbing website

This master project uses one such way. Building on the Dutch Ships and Sailors project, digitized maritime datasets have been converted to RDF and published as Linked Data. Linked Data refers to structured data on the web that is published and interlinked according to a set of standards. The conversion was based on requirements for the data, set up with historians from the Huygens ING Institute, who provided the datasets. The datasets chosen were those of Archangel and Elbing, as these offer information on the Dutch Baltic trade, the cradle of the Dutch merchant navy that sailed the world during the Dutch Golden Age.
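As a rough illustration of what converting a record to RDF involves, the Python sketch below serializes one record as N-Triples lines using only the standard library. It is not the conversion used in the project: the `http://example.org/dss/` namespace, the property names and the sample record (including the ship name) are hypothetical, and field names are assumed to be safe to use inside a URI.

```python
# Hypothetical namespace; the real Dutch Ships and Sailors URIs are not given in this post.
BASE = "http://example.org/dss/"


def escape_literal(value):
    """Escape backslashes, double quotes and newlines for an N-Triples string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def record_to_ntriples(record_id, record):
    """Turn one flat record (a dict) into a list of N-Triples lines.

    Every field becomes a predicate under the hypothetical vocabulary,
    and every value becomes a plain string literal.
    """
    subject = f"<{BASE}record/{record_id}>"
    lines = []
    for field, value in sorted(record.items()):
        predicate = f"<{BASE}vocab/{field}>"
        lines.append(f'{subject} {predicate} "{escape_literal(str(value))}" .')
    return lines


# Invented sample record, not taken from the Archangel or Elbing datasets.
voyage = {"shipName": "De Hoop", "destination": "Archangel"}
triples = record_to_ntriples("1", voyage)
```

In a real conversion the values would usually be linked to existing resources (ships, places, people) rather than stored as plain strings, since those links are what make the result Linked Data.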

Along with requirements for the data, the historians were also interviewed to gather research questions that the combined datasets could help answer. The goal of this research was to see whether additional datasets could be linked to the existing Dutch Ships and Sailors cloud and whether such a conversion could help answer the research questions the historians were interested in.

Data visualization showing shipping volume of different datasets

As part of this research, the datasets have been converted to RDF and published as Linked Data as an addition to the Dutch Ships and Sailors cloud, and a set of interactive data visualizations has been made to answer the historians' research questions. Based on the conversion, a set of recommendations is made on how to convert new datasets and add them to the Dutch Ships and Sailors cloud. All data representations and conversions have been evaluated by historians to assess their effectiveness.

The data visualizations can be found at http://www.entjes.nl/jeroen/thesis/. Jeroen’s thesis can be found here: Msc. Thesis Jeroen Entjes