tutorial

Faster, Simpler, More Accurate: Practical Automated Machine Learning with Tabular, Text, and Image Data

Authors:

Alexander Smola

KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

Pages 3509 - 3510

https://doi.org/10.1145/3394486.3406706

Published: 20 August 2020

Abstract

Automated machine learning (AutoML) offers the promise of translating raw data into accurate predictions with just a few lines of code. Rather than relying on human time/effort and manual experimentation, models can be improved by simply letting the AutoML system run for more time. In this hands-on tutorial, we demonstrate fundamental techniques that enable powerful AutoML. We consider standard supervised learning tasks on various types of data including tables, text, and images, as well as multi-modal data composed of multiple types. Rather than technical descriptions of how individual ML models work, we emphasize how to best use models within an overall ML pipeline that takes in raw training data and outputs predictions for test data. A major focus of our tutorial is on automating deep learning, a class of powerful techniques that are cumbersome to manage manually. Despite this, hardly any educational material describes their successful automation. Each topic covered in the tutorial is accompanied by a hands-on Jupyter notebook that implements best practices (which will be available on GitHub before and after the tutorial). Most of this code is adopted from AutoGluon (autogluon.mxnet.io), a recent AutoML toolkit for automated deep learning that is both state-of-the-art and easy to use.
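To make the idea of a pipeline that "takes in raw training data and outputs predictions" under a time budget more concrete, the following is a minimal, standard-library-only sketch. It is not AutoGluon's API or method: the helper names (`knn_predict`, `validation_accuracy`, `auto_fit`), the choice of k-nearest-neighbours as the only model family, the 80/20 hold-out split, and the random search over `k` are all assumptions made purely for illustration.

```python
import random
import time
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Predict the label of x by majority vote among its k nearest training rows."""
    by_distance = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    votes = Counter(train_y[i] for i in by_distance[:k])
    return votes.most_common(1)[0][0]


def validation_accuracy(train_X, train_y, val_X, val_y, k):
    """Fraction of held-out rows that k-NN labels correctly."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(val_X, val_y))
    return hits / len(val_y)


def auto_fit(X, y, time_limit=1.0, max_trials=20, seed=0):
    """Hypothetical helper: random search over k until a time or trial budget runs out."""
    rng = random.Random(seed)
    order = list(range(len(X)))
    rng.shuffle(order)
    cut = int(0.8 * len(order))  # assumed 80/20 train/validation split
    tr, va = order[:cut], order[cut:]
    train_X, train_y = [X[i] for i in tr], [y[i] for i in tr]
    val_X, val_y = [X[i] for i in va], [y[i] for i in va]

    deadline = time.monotonic() + time_limit
    best_k, best_score, trials = 1, -1.0, 0
    # Always run at least one trial; a larger budget simply allows more trials.
    while trials == 0 or (trials < max_trials and time.monotonic() < deadline):
        k = rng.randint(1, min(15, len(train_X)))
        score = validation_accuracy(train_X, train_y, val_X, val_y, k)
        if score > best_score:
            best_k, best_score = k, score
        trials += 1

    def predict(rows):
        # Final predictions use all labelled rows with the selected k.
        return [knn_predict(X, y, row, best_k) for row in rows]

    return predict, best_k, best_score
```

A caller would pass labelled rows to `auto_fit` and apply the returned `predict` function to unlabelled test rows. Raising `time_limit` or `max_trials` lets the search try more configurations without any change to the calling code, which is the property the abstract highlights.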

Cited By

Liu A, Chen Y, Cheng X. (2024). Evaluating ICESat-2 and GEDI with Integrated Landsat-8 and PALSAR-2 for Mapping Tropical Forest Canopy Height. Remote Sensing 16:20 (3798). Online publication date: 12-Oct-2024.
https://doi.org/10.3390/rs16203798
Daniel C. (2024). Comparative Analysis of Automated Machine Learning and Optimized Conventional Machine Learning for Concrete’s Uniaxial Compressive Strength Prediction. Advances in Civil Engineering 2024:1. Online publication date: 18-Dec-2024.
https://doi.org/10.1155/adce/3403677
Zhu H, Liang S, Hu W, Li F, Yuan Y, Wang S, Cheng G. (2024). Improve Deep Forest with Learnable Layerwise Augmentation Policy Schedules. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (6660-6664). Online publication date: 14-Apr-2024.
https://doi.org/10.1109/ICASSP48485.2024.10446501

Index Terms

Faster, Simpler, More Accurate: Practical Automated Machine Learning with Tabular, Text, and Image Data
1. Computing methodologies
  1. Machine learning
  2. Modeling and simulation
    1. Model development and analysis
2. Social and professional topics
  1. Professional topics
    1. Computing and business
      1. Automation

Information & Contributors

Information

Published In

KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

August 2020

3664 pages

ISBN:9781450379984

DOI:10.1145/3394486

General Chairs: Rajesh Gupta (UC San Diego, USA), Yan Liu (USC, USA)
Program Chairs: Mohak Shah (LG Electronics, USA), Suju Rajan (LinkedIn, USA)
Publications Chairs: Jiliang Tang (Michigan State, USA), B. Aditya Prakash (Georgia Tech, USA)

Copyright © 2020 Owner/Author.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Tutorial

Conference

KDD '20

KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

July 6 - 10, 2020

CA, Virtual Event, USA

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Bibliometrics & Citations

Article Metrics

Total Citations: 12
Total Downloads: 523
Downloads (Last 12 months): 41
Downloads (Last 6 weeks): 0

Reflects downloads up to 01 Jan 2025

Citations

Cited By

Liu W, Li J, Wang G, Wang K. (2024). Pavement Safety Characteristics Evaluation Utilizing Crowdsourced Vehicular and Cellular Sensor Data. Journal of Transportation Engineering, Part B: Pavements 150:3. Online publication date: Sep-2024.
https://doi.org/10.1061/JPEODX.PVENG-1486
Villarreal-Torres H, Ángeles-Morales J, Cano-Mejía J, Mejía-Murillo C, Flores-Reyes G, Cruz-Cruz O, Urcia-Quispe M, Palomino-Márquez M, Solar-Jara M, Escobedo-Zarzosa R. (2023). Comparative analysis of performance of AutoML algorithms: Classification model of payment arrears in students of a private university. ICST Transactions on Scalable Information Systems. Online publication date: 6-Dec-2023.
https://doi.org/10.4108/eetsis.4550
Trifan M, Ionescu B, Ionescu D. (2023). A Combined Finite State Machine and PlantUML Approach to Machine Learning Applications. 2023 IEEE 17th International Symposium on Applied Computational Intelligence and Informatics (SACI) (000631-000636). Online publication date: 23-May-2023.
https://doi.org/10.1109/SACI58269.2023.10158543
Nayak G, Friedland G. (2023). Deep Layers Beware: Unraveling the Surprising Benefits of JPEG Compression for Image Classification Pre-processing. 2023 IEEE International Symposium on Multimedia (ISM) (182-185). Online publication date: 11-Dec-2023.
https://doi.org/10.1109/ISM59092.2023.00033
Schubert D, Gupta P, Wever M. (2023). Meta-learning for Automated Selection of Anomaly Detectors for Semi-supervised Datasets. Advances in Intelligent Data Analysis XXI (392-405). Online publication date: 1-Apr-2023.
https://doi.org/10.1007/978-3-031-30047-9_31
Nasrin T, Pourkamali‐Anaraki F, Peterson A. (2023). Application of machine learning in polymer additive manufacturing: A review. Journal of Polymer Science 62:12 (2639-2669). Online publication date: 6-Dec-2023.
https://doi.org/10.1002/pol.20230649
Wang C, Wu Q, Liu X, Quintanilla L, Zhang A, Rangwala H. (2022). Automated Machine Learning & Tuning with FLAML. Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (4828-4829). Online publication date: 14-Aug-2022.
https://dl.acm.org/doi/10.1145/3534678.3542636
