Linkify: A Web-Based Collaborative Content Tagging System for Machine Learning Algorithms
Soares, Dante Mattos de Salles
Master of Science
Automated tutoring systems that use machine learning algorithms are a relatively new development that promises to revolutionize education by providing students, on a large scale, with an experience that closely resembles one-on-one tutoring. Machine learning algorithms are essential to these systems because they perform fairly well on certain data processing tasks that have traditionally been considered difficult for artificial intelligence. However, the high performance of several machine learning algorithms relies on information about the content being processed, in the form of tags that must be added manually. There is therefore a strong need for tagged educational resources. Unfortunately, tagging can be very time-consuming. Proven strategies for the mass tagging of content already exist: collaborative tagging systems such as Delicious, StumbleUpon and CiteULike have grown in popularity in recent years. These websites allow users to tag content and to browse previously tagged content that is relevant to their interests. However, applying this strategy to educational resource tagging presents several problems. Tags for educational resources used in tutoring systems need to be highly accurate, because mistakes in recommending or assigning material to students can be detrimental to their learning; ideally, subject-matter experts would perform the tagging. Experts, however, can be both scarce and expensive, which limits the number of resources that can be tagged. Even if non-experts are used, a large user base is required to tag large numbers of resources, and acquiring that many users can be a challenge in itself.
To solve these problems, we present Linkify, a system that enables more accurate tagging of large numbers of educational resources by combining the efforts of users with existing machine learning algorithms that are also capable of tagging resources. This thesis describes Linkify in detail, presenting its database structure and components and discussing the design choices made during its development. We also present a novel model for tagging errors based on a binary asymmetric channel. From this model, we derive an EM algorithm that combines the tags entered into the Linkify system by multiple users and machine learning algorithms, producing the most likely set of relevant tags for each educational resource. Our goal is to enable automated tutoring systems to use this tagging information in the future to improve their ability to assess student knowledge and predict student performance. At the same time, Linkify’s standardized structure for data input and output will facilitate the development and testing of new machine learning algorithms.
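
As a rough illustration of how an EM algorithm can combine tags under a binary asymmetric channel, the following Python sketch treats a single tag and estimates, for each resource, the probability that the tag is truly relevant. It is not the derivation given in the thesis: the function name em_combine_votes, the vote-fraction initialization, and the simplifying assumption that every tagger (user or algorithm) votes on every resource are all hypothetical choices made for this example. Each tagger is given its own false-positive and false-negative rate, which is what makes the channel asymmetric.

```python
import math


def em_combine_votes(votes, iterations=50, eps=1e-6):
    """Estimate P(tag is relevant) per resource from binary votes.

    votes[i][j] is 1 if tagger j applied the tag to resource i, else 0.
    Returns (posteriors, false_positive_rates, false_negative_rates).
    Assumes (for this sketch only) that every tagger votes on every resource.
    """
    n_items, n_taggers = len(votes), len(votes[0])

    def clamp(p):
        # Keep probabilities away from 0 and 1 so the logarithms stay finite.
        return min(max(p, eps), 1.0 - eps)

    # Initial guess: the fraction of taggers who applied the tag.
    q = [sum(row) / n_taggers for row in votes]
    fp, fn = [], []

    for _ in range(iterations):
        # M-step: re-estimate the prior and each tagger's two error rates.
        pos = sum(q)
        neg = n_items - pos
        prior = clamp(pos / n_items)
        fp, fn = [], []
        for j in range(n_taggers):
            # False positive: tagger said 1 while the tag is not relevant.
            said_one = sum((1 - q[i]) * votes[i][j] for i in range(n_items))
            # False negative: tagger said 0 while the tag is relevant.
            said_zero = sum(q[i] * (1 - votes[i][j]) for i in range(n_items))
            fp.append(clamp(said_one / max(neg, eps)))
            fn.append(clamp(said_zero / max(pos, eps)))

        # E-step: posterior relevance of the tag for each resource.
        for i in range(n_items):
            log_pos = math.log(prior)
            log_neg = math.log(1 - prior)
            for j in range(n_taggers):
                if votes[i][j]:
                    log_pos += math.log(1 - fn[j])
                    log_neg += math.log(fp[j])
                else:
                    log_pos += math.log(fn[j])
                    log_neg += math.log(1 - fp[j])
            diff = log_neg - log_pos
            # Numerically stable logistic function of -diff.
            if diff > 0:
                e = math.exp(-diff)
                q[i] = e / (1 + e)
            else:
                q[i] = 1 / (1 + math.exp(diff))

    return q, fp, fn
```

Calling em_combine_votes on a matrix of 0/1 votes returns one posterior per resource along with the estimated error rates, so a tagger that frequently disagrees with the consensus ends up with less influence than a consistent one. Thresholding the posteriors (for example at 0.5) is one simple way to pick a set of relevant tags; the thesis may use a different decision rule.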