[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3377929.3398092acmconferencesArticle/Chapter ViewAbstractPublication PagesgeccoConference Proceedingsconference-collections
research-article

A surrogate-assisted GA enabling high-throughput ML by optimal feature and discretization selection

Published: 08 July 2020 Publication History

Abstract

Novel lookup-based classification approaches allow machine-learning (ML) to be performed at extremely high classification rates for suitable low-dimensional classification problems. A central aspect of such approaches is the crucial importance placed on the optimal selection of features and discretized feature representations. In this work we propose and study a hybrid-genetic algorithm (hGAm) approach to solve this optimization problem. For the considered problem the fitness evaluation function is expensive, as it entails training a ML classifier with the proposed set of features and representations, and then evaluating the resulting classifier. We have here devised a surrogate problem by casting the feature selection and representation problem as a combinatorial optimization problem in the form of a multiple-choice quadratic knapsack problem (MCQKP). The orders of magnitude faster evaluation of the surrogate problem allows a comprehensive hGAm performance evaluation to be performed. The results show that a suitable trade-off exists at around 5000 fitness evaluations, and the results also provide a characterization of the parameter behaviors as input to future extensions.

References

[1]
Md Asafuddoula, Tapabrata Ray, and Hemant Kumar Singh. 2015. Characterizing Pareto front approximations in many-objective optimization. In Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation. 607--614.
[2]
Mustafa Avci and Seyda Topaloglu. 2017. A multi-start iterated local search algorithm for the generalized quadratic multiple knapsack problem. Computers & Operations Research 83 (2017), 54--65.
[3]
Atidel Ben Hadj-Alouane and James C Bean. 1997. A genetic algorithm for the multiple-choice integer program. Operations research 45, 1 (1997), 92--101.
[4]
Alain Billionnet and Frédéric Calmels. 1996. Linear programming for the 0--1 quadratic knapsack problem. European Journal of Operational Research 92, 2 (1996), 310--325.
[5]
Leo Breiman. 2001. Random forests. Machine learning 45, 1 (2001), 5--32.
[6]
Yuning Chen and Jin-Kao Hao. 2016. Memetic search for the generalized quadratic multiple knapsack problem. IEEE Transactions on Evolutionary Computation 20, 6 (2016), 908--923.
[7]
Igor Crévits, Saïd Hanafi, Raïd Mansi, and Christophe Wilbaut. 2012. Iterative semi-continuous relaxation heuristics for the multiple-choice multidimensional knapsack problem. Computers & Operations Research 39, 1 (2012), 32--41.
[8]
Martin E Dyer. 1984. An O (n) algorithm for the multiple-choice knapsack linear program. Mathematical programming 29, 1 (1984), 57--63.
[9]
Chun-Nan Lu et al. 2016. High performance traffic classification based on message size sequence and distribution. Journal of Network and Computer Applications 76 (2016), 60--74.
[10]
Giorgio Gallo, Peter L Hammer, and Bruno Simeone. 1980. Quadratic knapsack problems. In Combinatorial optimization. Springer, 132--149.
[11]
Johan Garcia and Topi Korhonen. 2018. Efficient Distribution-Derived Features for High-Speed Encrypted Flow Classification. In Proceedings of the SIGCOMM 2018 Workshop on Network Meets AI & ML (NetAI'18). ACM, 21--27.
[12]
Johan Garcia and Topi Korhonen. 2019. On Runtime and Classification Performance of the Discretize-Optimize (DISCO) Classification Approach. ACM SIGMETRICS Performance Evaluation Review 46, 3 (2019), 167--170.
[13]
Johan Garcia and Topi Korhonen. 2020. DIOPT: Extremely Fast Classification Using Lookups and Optimal Feature Discretization. In International Joint Conference on Neural Networks (IJCNN).
[14]
Johan Garcia, Topi Korhonen, Ricky Andersson, and Filip Västlund. 2018. Towards video flow classification at a million encrypted flows per second. In 2018 IEEE 32nd International Conference on Advanced Information Networking and Applications (AINA). IEEE, 358--365.
[15]
Isabelle Guyon, Jason Weston, Stephen Barnhill, and Vladimir Vapnik. 2002. Gene selection for cancer classification using support vector machines. Machine learning 46, 1-3 (2002), 389--422.
[16]
Md Monirul Kabir, Md Shahjahan, and Kazuyuki Murase. 2011. A new local search based hybrid genetic algorithm for feature selection. Neurocomputing 74, 17 (2011), 2914--2928.
[17]
Hans Kellerer, Ulrich Pferschy, and David Pisinger. 2004. Knapsack problems. Springer, Berlin.
[18]
Lucas Létocart, Marie-Christine Plateau, and Gérard Plateau. 2014. An efficient hybrid heuristic method for the 0-1 exact kitem quadratic knapsack problem. Pesquisa Operacional 34, 1 (2014), 49--72.
[19]
Lucas Létocart and Angelika Wiegele. 2016. Exact solution methods for the k-item quadratic knapsack problem. In International Symposium on Combinatorial Optimization. Springer, 166--176.
[20]
Yanfeng Liu, Aimin Zhou, and Hu Zhang. 2018. Termination detection strategies in evolutionary algorithms: a survey. In Proceedings of the Genetic and Evolutionary Computation Conference. ACM, 1063--1070.
[21]
PTH Nguyen and D Sudholt. 2018. Memetic algorithms beat evolutionary algorithms on the class of hurdle problems. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2018). ACM.
[22]
G Pavai and TV Geetha. 2016. A survey on crossover operators. ACM Computing Surveys (CSUR) 49, 4 (2016), 1--43.
[23]
David Pisinger. 2007. The quadratic knapsack problem: a survey. Discrete applied mathematics 155, 5 (2007), 623--648.
[24]
J. Ross Quinlan. 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
[25]
Pedro M Santiago del Rio, Dario Rossi, Francesco Gringoli, Lorenzo Nava, Luca Salgarelli, and Javier Aracil. 2012. Wirespeed statistical classification of network traffic on commodity hardware. In Proceedings of the 2012 Internet Measurement Conference. ACM, 65--72.
[26]
Tugba Saraç and Aydin Sipahioglu. 2014. Generalized quadratic multiple knapsack problem and two solution approaches. Computers & Operations Research 43 (2014), 78--89.
[27]
Dan Simon. 2013. Evolutionary optimization algorithms. John Wiley & Sons.
[28]
Prabhakant Sinha and Andris A Zoltners. 1979. The multiplechoice knapsack problem. Operations Research 27, 3 (1979), 503--515.
[29]
Chih-Fong Tsai, William Eberle, and Chi-Yuan Chu. 2013. Genetic algorithms in feature and instance selection. Knowledge-Based Systems 39 (2013), 240--247.
[30]
Joaquin Vanschoren, Jan N. van Rijn, Bernd Bischl, and Luis Torgo. 2013. OpenML: Networked Science in Machine Learning. SIGKDD Explorations 15, 2 (2013), 49--60.
[31]
Haibo Wang, Gary Kochenberger, and Fred Glover. 2012. A computational study on the quadratic knapsack problem with multiple constraints. Computers & Operations Research 39, 1 (2012), 3--11.
[32]
Yi-Leh Wu, Cheng-Yuan Tang, Maw-Kae Hor, and Pei-Fen Wu. 2011. Feature selection using genetic algorithm and cluster validation. Expert Systems with Applications 38, 3 (2011), 2727--2732.

Cited By

View all
  • (2021)Building Efficient Regular Expression Matchers Through GA Optimization With ML Surrogates2021 12th International Conference on Network of the Future (NoF)10.1109/NoF52522.2021.9609828(1-9)Online publication date: 6-Oct-2021

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
GECCO '20: Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion
July 2020
1982 pages
ISBN:9781450371278
DOI:10.1145/3377929
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 July 2020

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. GA
  2. discretization
  3. feature selection
  4. surrogate problem

Qualifiers

  • Research-article

Funding Sources

  • Stiftelsen för Kunskaps- och Kompetensutveckling

Conference

GECCO '20
Sponsor:

Acceptance Rates

Overall Acceptance Rate 1,669 of 4,410 submissions, 38%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)3
  • Downloads (Last 6 weeks)0
Reflects downloads up to 10 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2021)Building Efficient Regular Expression Matchers Through GA Optimization With ML Surrogates2021 12th International Conference on Network of the Future (NoF)10.1109/NoF52522.2021.9609828(1-9)Online publication date: 6-Oct-2021

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media