Prof. Raja Presents Poster at Collective Intelligence 2016

POSTED ON: June 2, 2016

Abstract

Crowd-labeling is the process of having non-expert humans label a large dataset, a task that can be tedious, time-consuming, and expensive if accomplished by experts alone. Since crowd-labelers are non-experts, multiple labels are acquired for each instance for quality assurance and are later combined into one final label. Most recent work on crowd-labeling has used Bayesian and non-Bayesian approaches to estimate model parameters and infer one final label [Whitehill et al. 2009; Karger et al. 2011; Khattak and Salleb-Aouissi 2013]. Although much research using machine learning and statistical techniques has been conducted in this area, e.g., [Dekel and Shamir 2009; Hovy et al. 2013; Liu et al. 2012; Donmez and Carbonell 2008], many questions remain open, including: (1) How can the final label be obtained with high accuracy even when crowd-labelers are of heterogeneous quality? (2) What are the best ways to evaluate labelers? (3) A labeler can be biased towards a specific data class, so the labeler's error rate can differ from class to class. Is it better to consider the per-class error or the overall error? (4) Can the prevalence of the classes affect a labeler's ability? (5) How does the clarity of a labeling task or question affect accuracy [Kittur et al. 2008]?
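
Question (3) can be made concrete with a small sketch. The data below are hypothetical and the function name `error_rates` is an assumption made for illustration, not part of the poster. It shows how a labeler who always answers with the majority class can look acceptable on overall error while being wrong on every instance of the minority class.

```python
def error_rates(true_labels, given_labels):
    """Return (overall_error, {class: error}) for a single labeler."""
    total_wrong = 0
    per_class_counts = {}
    per_class_wrong = {}
    for truth, given in zip(true_labels, given_labels):
        per_class_counts[truth] = per_class_counts.get(truth, 0) + 1
        if truth != given:
            per_class_wrong[truth] = per_class_wrong.get(truth, 0) + 1
            total_wrong += 1
    overall = total_wrong / len(true_labels)
    per_class = {
        cls: per_class_wrong.get(cls, 0) / count
        for cls, count in per_class_counts.items()
    }
    return overall, per_class


# Hypothetical data: eight instances of class 0 and two of class 1.
true_labels = [0] * 8 + [1] * 2
# Hypothetical biased labeler who always answers 0.
given_labels = [0] * 10

overall, per_class = error_rates(true_labels, given_labels)
print(overall, per_class)
```

Here the overall error also depends on class prevalence, which is the interaction raised in question (4).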

To address these questions, we present a Bayesian approach to crowd-labeling. Our approach is inspired by Item Response Theory (IRT) [Lord 1952], which aims to design and analyze test-scoring strategies. An IRT model captures parameters related to students and test questions, as well as the probability that an answer is correct. This makes IRT a compelling framework for crowd-labeling.
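
As a rough illustration of the kind of quantity an IRT model describes, the sketch below uses a common logistic form of IRT (the two-parameter logistic model). This form is an assumption chosen for illustration; the abstract does not state which parameterization the poster uses, and the names `ability`, `difficulty`, and `discrimination` are likewise illustrative.

```python
import math


def irt_probability_correct(ability, difficulty, discrimination=1.0):
    """Two-parameter logistic IRT (assumed form, not taken from the poster).

    P(correct) = 1 / (1 + exp(-discrimination * (ability - difficulty)))
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


# A respondent whose ability equals the item difficulty answers correctly
# with probability one half; higher ability raises that probability.
p_even = irt_probability_correct(ability=0.0, difficulty=0.0)
p_strong = irt_probability_correct(ability=2.0, difficulty=0.0)
p_weak = irt_probability_correct(ability=-2.0, difficulty=0.0)
print(p_even, p_strong, p_weak)
```

In the crowd-labeling analogy, the respondent corresponds to a labeler and the test question to a data instance.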

We model parameters related to labelers and data instances, as well as the probability that the label provided for an instance is correct. The main difference is that in IRT the correct answers are known, whereas in the crowd-labeling scenario they must be inferred. Unlike an IRT model, our model learns the parameters and uses them to estimate the final labels. Empirical evaluations on synthetic and real datasets show that our model produces more stable results than other state-of-the-art crowd-labeling methods.
