research-article

A Human-in-the-loop Attribute Design Framework for Classification

Authors:

Md Abdus Salam,

Saravanan Thirumuruganathan,

Senjuti Basu Roy

WWW '19: The World Wide Web Conference

Pages 1612 - 1622

https://doi.org/10.1145/3308558.3313547

Published: 13 May 2019

Abstract

In this paper, we present a semi-automated, “human-in-the-loop” framework for attribute design that helps human analysts transform raw attributes into effective derived attributes for classification problems. Our proposed framework is optimization-guided and fully agnostic to the underlying classification model. We present an algebra with various operators (arithmetic, relational, and logical) to transform raw attributes into derived attributes, and we solve two technical problems: (a) the top-k buckets design problem, which presents human analysts with k buckets, each containing promising raw attributes, so that they can focus on those buckets without having to examine all raw attributes; and (b) the top-l snippets generation problem, which iteratively aids human analysts with the top-l derived attributes involving a given attribute. For the former problem, we present an effective exact bottom-up algorithm with pruning, as well as random-walk-based heuristic algorithms that are intuitive and work well in practice. For the latter, we present a greedy heuristic algorithm that is scalable and effective. We conduct rigorous evaluations on 6 real-world datasets to show that our framework generates more effective derived attributes than fully manual or fully automated methods.
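To make the idea of top-l snippets concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a small hypothetical operator table (one or two operators per family named in the abstract) and scores each derived attribute by its empirical mutual information with the class label. The scoring function, the names `OPERATORS`, `mutual_information`, and `top_l_snippets`, and the toy data are all assumptions made for illustration; the abstract does not specify them.

```python
import math
import operator
from collections import Counter

# Hypothetical operator table: the abstract names only the operator families.
OPERATORS = {
    "+": operator.add,                         # arithmetic
    "*": operator.mul,                         # arithmetic
    ">": operator.gt,                          # relational
    "and": lambda a, b: bool(a) and bool(b),   # logical
}


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def top_l_snippets(columns, labels, anchor, l):
    """Rank derived attributes that combine `anchor` with one other raw attribute.

    Assumption: a derived attribute is scored by its mutual information with
    the class label; ties are broken alphabetically by the snippet text.
    """
    scored = []
    for other in columns:
        if other == anchor:
            continue
        for symbol, fn in OPERATORS.items():
            derived = [fn(a, b) for a, b in zip(columns[anchor], columns[other])]
            score = mutual_information(derived, labels)
            scored.append((score, f"{anchor} {symbol} {other}"))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:l]


# Toy data (made up): the label is the XOR of raw attributes a and b,
# and c is a constant attribute that carries no information.
columns = {"a": [0, 0, 1, 1], "b": [0, 1, 0, 1], "c": [1, 1, 1, 1]}
labels = [0, 1, 1, 0]

top = top_l_snippets(columns, labels, anchor="a", l=2)
print(top)
```

On this toy input, neither `a` nor `b` alone is informative about the label, but the derived attribute `a + b` determines it exactly, so it ranks first. An analyst in the loop would then accept, reject, or refine the suggested snippets.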

Cited By

Yuan H, Kang L, Li Y, Fan Z. (2024). Human-in-the-loop machine learning for healthcare: Current progress and future opportunities in electronic health records. Medicine Advances 2:3 (318-322). Online publication date: 23-Aug-2024.
https://doi.org/10.1002/med4.70
Mesbah S, Arous I, Yang J, Bozzon A. (2023). HybridEval: A Human-AI Collaborative Approach for Evaluating Design Ideas at Scale. Proceedings of the ACM Web Conference 2023 (3837-3848). Online publication date: 30-Apr-2023.
https://dl.acm.org/doi/10.1145/3543507.3583496
Salam M, Basu Roy S, Das G. (2022). Efficient approximate top-k mutual information based feature selection. Journal of Intelligent Information Systems 61:1 (191-223). Online publication date: 18-Oct-2022.
https://doi.org/10.1007/s10844-022-00750-4

Information & Contributors

Information

Published In

WWW '19: The World Wide Web Conference

May 2019

3620 pages

ISBN:9781450366748

DOI:10.1145/3308558

Editors: Ling Liu (Georgia Tech, USA), Ryen White (Microsoft Research, USA)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

IW3C2: International World Wide Web Conference Committee

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

WWW '19: The Web Conference

May 13 - 17, 2019

San Francisco, CA, USA

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Cited By

Badr Y, Zhu X, Alraja M. (2021). Security and privacy in the Internet of Things: threats and challenges. Service Oriented Computing and Applications 15:4 (257-271). Online publication date: 1-Dec-2021.
https://dl.acm.org/doi/10.1007/s11761-021-00327-z
