End-user feature labeling: supervised and semi-supervised approaches based on locally-weighted logistic regression

Das, S., Moore, T., Wong, W.-K., Stumpf, S., Oberst, I., McIntosh, K. and Burnett, M. (2013) End-user feature labeling: supervised and semi-supervised approaches based on locally-weighted logistic regression. Artificial Intelligence, 204, pp. 56-74. (doi: 10.1016/j.artint.2013.08.003)

Full text not currently available from Enlighten.

Abstract

When intelligent interfaces, such as intelligent desktop assistants, email classifiers, and recommender systems, customize themselves to a particular end user, such customizations can decrease productivity and increase frustration due to inaccurate predictions, especially in early stages when training data is limited. The end user can improve the learning algorithm by tediously labeling a substantial amount of additional training data, but this takes time and is too ad hoc to target a particular area of inaccuracy. To solve this problem, we propose new supervised and semi-supervised learning algorithms based on locally-weighted logistic regression for feature labeling by end users, enabling them to point out which features are important for a class, rather than provide new training instances. We first evaluate our algorithms against other feature labeling algorithms under idealized conditions using feature labels generated by an oracle. A further contribution is an evaluation of feature labeling algorithms under real-world conditions using feature labels harvested from actual end users in our user study. Our user study is the first statistical user study for feature labeling involving a large number of end users (43 participants), none of whom have a background in machine learning. Our supervised and semi-supervised algorithms were among the best performers when compared to other feature labeling algorithms in the idealized setting, and they are also robust to poor-quality feature labels provided by ordinary end users in our study. We also perform an analysis to investigate the relative gains of incorporating the different sources of knowledge available in the labeled training set, the feature labels, and the unlabeled data. Together, our results strongly suggest that feature labeling by end users is both viable and effective for allowing end users to improve the learning algorithm behind their customized applications.

Item Type: Articles
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Stumpf, Dr Simone
Authors: Das, S., Moore, T., Wong, W.-K., Stumpf, S., Oberst, I., McIntosh, K., and Burnett, M.
College/School: College of Science and Engineering > School of Computing Science
Journal Name: Artificial Intelligence
Publisher: Elsevier
ISSN: 0004-3702
ISSN (Online): 1872-7921
Published Online: 30 August 2013
