CVs Classification Using Neural Network Approaches Combined with BERT and Gensim: CVs of Moroccan Engineering Students
Figure 1. Combined neural network models with BERT/Gensim.
Figure 2. Cell structure and equations describing the LSTM gates [40].
Figure 3. Diagram of a one-unit Gated Recurrent Unit (GRU) [40].
Figure 4. Illustration of text representation extraction from BERT [46].
Figure 5. The overall process of the experiment.
Figure 6. (a) Training and validation loss on the resumes dataset; (b) training and validation accuracy on the resumes dataset, for CNN-GRU/BERT, the model with the highest accuracy.
Figure 7. Confusion matrix for CV classification using the CNN-GRU/BERT model. The confusion between Computer Engineering and the other classes can be explained by the proximity of these categories in their use of certain skills, in particular hard skills related to computer technologies.
Figure 8. The t-SNE visualization of the text representation of resumes using the BERT and Gensim embedding methods.
Abstract
1. Introduction
2. Related Works
3. Materials and Methods
3.1. Sampling and Data Collection Method
3.2. Experiment and Problem Definition
3.3. Architecture of the Proposed Solution
3.4. Methods
3.4.1. Long Short-Term Memory (LSTM)
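For reference, the widely used LSTM variant with a forget gate (an extension of the original cell of Hochreiter and Schmidhuber [39]) updates its cell state $c_t$ and hidden state $h_t$ at step $t$ as follows. The notation is ours and may differ from the one used in the figures: $x_t$ is the input vector, $W_\ast$, $U_\ast$, and $b_\ast$ are learned weights and biases, $\sigma$ is the logistic sigmoid, and $\odot$ is element-wise multiplication.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The forget gate $f_t$ controls how much of the previous cell state is kept, the input gate $i_t$ how much of the candidate $\tilde{c}_t$ is written, and the output gate $o_t$ how much of the cell state is exposed as $h_t$.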
3.4.2. Gated Recurrent Unit (GRU)
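With $x_t$ the input, $h_{t-1}$ the previous hidden state, $W_\ast$, $U_\ast$, $b_\ast$ learned parameters, $\sigma$ the logistic sigmoid, and $\odot$ element-wise multiplication (notation ours, possibly differing from the figures), a GRU merges the cell and hidden states and uses only two gates, an update gate $z_t$ and a reset gate $r_t$, which gives it fewer parameters than an LSTM of the same width:

```latex
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) \\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) \\
\tilde{h}_t &= \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```

Some formulations swap the roles of $z_t$ and $1 - z_t$ in the last line; the two conventions are equivalent up to relabeling the gate.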
3.4.3. The Convolutional Neural Network (CNN)
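In text classification, a one-dimensional CNN slides each filter over windows of consecutive token embeddings and then pools the resulting feature map. The NumPy sketch below shows a single such layer with "valid" padding, a ReLU, and global max pooling. It illustrates the operation that a Keras `Conv1D` layer performs with learned kernels; the function names and the toy array sizes are our own, not the study's settings.

```python
import numpy as np

def conv1d_valid(x, kernels, bias):
    """One 1D convolution layer with 'valid' padding followed by ReLU.

    x:       (seq_len, embed_dim)              token embeddings of one CV
    kernels: (kernel_size, embed_dim, n_filters)
    bias:    (n_filters,)
    returns: (seq_len - kernel_size + 1, n_filters) feature map
    """
    k = kernels.shape[0]
    steps = x.shape[0] - k + 1
    out = np.empty((steps, kernels.shape[2]))
    for t in range(steps):
        window = x[t:t + k]  # k consecutive token vectors
        # Cross-correlation (no kernel flip), as in deep-learning libraries
        out[t] = np.tensordot(window, kernels, axes=([0, 1], [0, 1])) + bias
    return np.maximum(out, 0.0)

def global_max_pool(feature_map):
    """Keep the strongest response of each filter over all positions."""
    return feature_map.max(axis=0)

# Toy example: 6 tokens with 2-dimensional embeddings, one all-ones filter of width 3
x = np.arange(12, dtype=float).reshape(6, 2)
kernels = np.ones((3, 2, 1))
feature_map = conv1d_valid(x, kernels, bias=np.zeros(1))
pooled = global_max_pool(feature_map)
```

With a kernel of width 3, each output position summarizes three consecutive tokens, so a filter can respond to short multi-word patterns such as skill phrases.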
3.4.4. The Bidirectional Encoder Representations from Transformers (BERT)
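BERT is built from stacked Transformer encoder layers whose central operation is scaled dot-product self-attention, $\mathrm{softmax}(QK^\top/\sqrt{d_k})\,V$ [44]. The NumPy sketch below implements one attention head without masking and without the learned projections that produce $Q$, $K$, and $V$; it is only meant to show how each token's output becomes a weighted average of the value vectors.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    """q: (n_q, d_k), k: (n_k, d_k), v: (n_k, d_v) -> output (n_q, d_v), weights (n_q, n_k)."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)      # similarity of every query with every key
    weights = softmax(scores, axis=-1)   # each row is a probability distribution
    return weights @ v, weights

# Toy example with random vectors (illustrative sizes only)
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(6, 8))
v = rng.normal(size=(6, 8))
out, w = scaled_dot_product_attention(q, k, v)
```

In BERT-base this operation runs in 12 heads per layer across 12 layers. A common way to obtain a fixed-size representation of a whole CV is to take the final hidden state of the [CLS] token, although mean pooling over all tokens is also used.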
3.4.5. Data Loading
- Input: the resumes of students.
- Output: a model trained on the students' CVs, and one of the five predefined classes assigned to each resume in the test dataset.
1. Import the dataset file (CVs.csv) into a pandas DataFrame.
2. Pre-process the data (cleaning and removing noisy data).
3. Generate a one-hot encoding for each class, representing the field of study.
4. Split the dataset into training and testing sets with an 80:20 ratio.
5. Tokenize the text using either the pre-trained BERT model or the Gensim embedding approach; in the Gensim approach, tokenization was based on unigrams.
6. Add new tokens related to competencies and previously unknown vocabulary to the vocab.txt file of the BERT models.
7. Create an embedding matrix containing a vector for every word in the vocabulary.
8. Build a simple model or a hybrid model based on a combination of CNN, LSTM, and GRU layers.
9. Add a dropout layer (rate 0.2).
10. Add a dense output layer with 5 units (one per class) and a softmax activation function.
11. Train the model on the training set.
12. Evaluate the model on the test set.
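The loading, cleaning, one-hot encoding, and 80:20 splitting steps above can be sketched with the Python standard library alone. This is an illustration under assumptions, not the authors' code: the study used pandas, and the column names `text` and `label`, the label codes, the cleaning rule, and the fixed shuffling seed are all hypothetical.

```python
import csv
import io
import random
import re

# Hypothetical label codes for the five departments (the five classes)
CLASSES = ["CE", "NST", "AutoMec", "Indus", "ELE"]

def clean(text):
    """Lowercase, drop punctuation except + # . (as in C++, C#, Node.js), squeeze spaces."""
    text = text.lower()
    text = re.sub(r"[^\w+#.\s]", " ", text)  # \w is Unicode-aware, so accented letters survive
    return re.sub(r"\s+", " ", text).strip()

def one_hot(label):
    vec = [0] * len(CLASSES)
    vec[CLASSES.index(label)] = 1
    return vec

def load_and_split(csv_file, test_ratio=0.2, seed=42):
    """Read hypothetical 'text'/'label' columns, clean, one-hot encode, split 80:20."""
    rows = [r for r in csv.DictReader(csv_file) if r["text"].strip()]  # drop empty CVs
    data = [(clean(r["text"]), one_hot(r["label"])) for r in rows]
    random.Random(seed).shuffle(data)
    n_test = round(len(data) * test_ratio)
    return data[n_test:], data[:n_test]  # (train, test)

# Toy input standing in for CVs.csv: 10 CVs plus one empty row that is discarded
lines = ["text,label"] + [f"skills {i},{CLASSES[i % 5]}" for i in range(10)] + [",ELE"]
train, test = load_and_split(io.StringIO("\n".join(lines)))
```

A stratified split (for example, scikit-learn's `train_test_split` with its `stratify` parameter) would keep the department proportions equal in both parts; the source does not say whether its split was stratified.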
3.4.6. Experimental Settings
- TensorFlow and Keras libraries were used;
- Number of LSTM, GRU, and CNN (Conv1D) layers: 1;
- Dropout rate: 0.2;
- Activation function (output layer): softmax;
- learning_rate = 1 × 10⁻⁵;
- decay = 1 × 10⁻⁶;
- Loss function: CategoricalCrossentropy();
- Learning rate: 0.001;
- Epochs: 15;
- Batch size: 32;
- Optimizer: Adam.
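A minimal Keras sketch of the embedding-matrix and model-building steps, instantiated for the CNN-GRU combination with the settings listed above (one Conv1D layer, one GRU layer, dropout 0.2, a five-way softmax output, Adam, categorical cross-entropy). Several details are assumptions: that "CNN-GRU" means a Conv1D layer feeding a GRU, the filter count, kernel size, pooling, GRU width, and the frozen embedding layer. The settings list gives both 1 × 10⁻⁵ and 0.001 as learning rates; the sketch defaults to 0.001 and exposes it as a parameter. The listed decay is left out because recent Keras versions no longer accept a `decay` argument in `Adam` (a learning-rate schedule is the usual substitute).

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 5  # CE, NST, AutoMec, Indus, ELE

def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the pretrained vector of the word whose id is i.

    word_index: dict word -> integer id (id 0 is reserved for padding)
    vectors:    mapping word -> 1-D array of length dim (supports `in` and `[]`)
    Words without a pretrained vector keep an all-zero row.
    """
    matrix = np.zeros((len(word_index) + 1, dim), dtype="float32")
    for word, idx in word_index.items():
        if word in vectors:
            matrix[idx] = vectors[word]
    return matrix

def build_cnn_gru(embedding_matrix, seq_len, learning_rate=1e-3):
    vocab_size, dim = embedding_matrix.shape
    inputs = tf.keras.Input(shape=(seq_len,), dtype="int32")
    embedding = layers.Embedding(vocab_size, dim, trainable=False, name="embedding")
    x = embedding(inputs)
    x = layers.Conv1D(filters=64, kernel_size=3, activation="relu")(x)  # one CNN layer (sizes assumed)
    x = layers.MaxPooling1D(pool_size=2)(x)
    x = layers.GRU(64)(x)                                               # one GRU layer (width assumed)
    x = layers.Dropout(0.2)(x)
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)
    embedding.set_weights([embedding_matrix])  # load the pretrained vectors into the frozen layer
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss=tf.keras.losses.CategoricalCrossentropy(),
        metrics=["accuracy"],
    )
    return model

# Toy usage: a 3-word vocabulary with hypothetical 8-dimensional vectors
word_index = {"python": 1, "java": 2, "plc": 3}
vectors = {"python": np.ones(8), "java": np.full(8, 0.5)}
emb = build_embedding_matrix(word_index, vectors, dim=8)
model = build_cnn_gru(emb, seq_len=20)
probs = model.predict(np.random.randint(0, 4, size=(2, 20)), verbose=0)
```

Training with the listed settings would then be `model.fit(x_train, y_train, epochs=15, batch_size=32)`, where `x_train` holds padded token-id sequences and `y_train` the one-hot labels.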
4. Results
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Nichols, J.A.; Chan, H.W.H.; Baker, M.A.B. Machine learning: Applications of artificial intelligence to imaging and diagnosis. Biophys. Rev. 2019, 11, 111–118.
2. Kaul, V.; Enslin, S.; Gross, S.A. History of artificial intelligence in medicine. Gastrointest. Endosc. 2020, 92, 807–812.
3. Li, Q.; Cai, W.; Wang, X.; Zhou, Y.; Feng, D.D.; Chen, M. Medical image classification with convolutional neural network. In Proceedings of the IEEE 2014 13th International Conference on Control Automation Robotics & Vision (ICARCV), Singapore, 10–12 December 2014; pp. 844–848.
4. Yağcı, M. Educational data mining: Prediction of students’ academic performance using machine learning algorithms. Smart Learn. Environ. 2022, 9, 11.
5. Nieto, Y.; Garcia-Diaz, V.; Montenegro, C.; Gonzalez, C.C.; Crespo, R.G. Usage of Machine Learning for Strategic Decision Making at Higher Educational Institutions. IEEE Access 2019, 7, 75007–75017.
6. Ramteke, J.; Shah, S.; Godhia, D.; Shaikh, A. Election result prediction using Twitter sentiment analysis. In Proceedings of the IEEE 2016 International Conference on Inventive Computation Technologies (ICICT), Coimbatore, India, 26–27 August 2016; pp. 1–5.
7. Alaei, A.R.; Becken, S.; Stantic, B. Sentiment Analysis in Tourism: Capitalizing on Big Data. J. Travel Res. 2019, 58, 175–191.
8. Golowko, N. Future Skills in Education: Knowledge Management, AI and Sustainability as Key Factors in Competence-Oriented Education; Sustainable Management, Wertschöpfung und Effizienz; Springer Fachmedien Wiesbaden: Wiesbaden, Germany, 2021; ISBN 978-3-658-33996-8.
9. Huang, A.Y.Q.; Lu, O.H.T.; Huang, J.C.H.; Yin, C.J.; Yang, S.J.H. Predicting students’ academic performance by using educational big data and learning analytics: Evaluation of classification methods and learning logs. Interact. Learn. Environ. 2020, 28, 206–230.
10. Pal, R.; Shaikh, S.; Satpute, S.; Bhagwat, S. Resume Classification using various Machine Learning Algorithms. ITM Web Conf. 2022, 44, 03011.
11. Urdaneta-Ponte, M.C.; Oleagordia-Ruíz, I.; Méndez-Zorrilla, A. Using LinkedIn Endorsements to Reinforce an Ontology and Machine Learning-Based Recommender System to Improve Professional Skills. Electronics 2022, 11, 1190.
12. Cole, M.S.; Feild, H.S.; Giles, W.F.; Harris, S.G. Recruiters’ Inferences of Applicant Personality Based on Resume Screening: Do Paper People have a Personality? J. Bus. Psychol. 2009, 24, 5–18.
13. Kumalasari, L.D.; Susanto, A. Recommendation System of Information Technology Jobs using Collaborative Filtering Method Based on LinkedIn Skills Endorsement. SISFORMA 2020, 6, 63–72.
14. Appadoo, K.; Soonnoo, M.B.; Mungloo-Dilmohamud, Z. Job Recommendation System, Machine Learning, Regression, Classification, Natural Language Processing. In Proceedings of the 2020 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE), Gold Coast, Australia, 16–18 December 2020; pp. 1–6.
15. Kowsari, K.; Meimandi, J.K.; Heidarysafa, M.; Mendu, S.; Barnes, L.; Brown, D. Text Classification Algorithms: A Survey. Information 2019, 10, 150.
16. Minaee, S.; Kalchbrenner, N.; Cambria, E.; Nikzad, N.; Chenaghlu, M.; Gao, J. Deep Learning-based Text Classification: A Comprehensive Review. ACM Comput. Surv. 2022, 54, 1–40.
17. Sellamy, K.; El Farouki, M.; Sabri, Z.; Nouib, H.; Qostal, A.; Fakhri, Y.; Moumen, A. Exploring the IT’s Needs in Morocco Using Online Job Ads. In Automatic Control and Emerging Technologies; El Fadil, H., Zhang, W., Eds.; Springer Nature: Singapore, 2024; pp. 665–677.
18. Đurđević Babić, I. Machine learning methods in predicting the student academic motivation. Croat. Oper. Res. Rev. 2017, 8, 443–461.
19. Qazdar, A.; Er-Raha, B.; Cherkaoui, C.; Mammass, D. A machine learning algorithm framework for predicting students performance: A case study of baccalaureate students in Morocco. Educ. Inf. Technol. 2019, 24, 3577–3589.
20. Mourdi, Y.; Sadgal, M.; Berrada Fathi, W.; El Kabtane, H. A Machine Learning Based Approach to Enhance Mooc Users’ Classification. Turk. Online J. Distance Educ. 2020, 21, 47–68.
21. Sadqui, A.; Ertel, M.; Sadiki, H.; Amali, S. Evaluating Machine Learning Models for Predicting Graduation Timelines in Moroccan Universities. Int. J. Adv. Comput. Sci. Appl. 2023, 14, 10.
22. Ouatik, F.O.; Erritali, M.E.; Jourhmane, M.J. Student orientation using machine learning under MapReduce with Hadoop. J. Ubiquitous Syst. Pervasive Netw. 2020, 13, 21–26.
23. Qostal, A.; Moumen, A.; Lakhrissi, Y. Systematic Literature Review on Big Data and Data Analytics for Employment of Youth People: Challenges and Opportunities. In Proceedings of the 2nd International Conference on Advanced Technologies for Humanity; SCITEPRESS - Science and Technology Publications: Rabat, Morocco, 2020; pp. 179–185.
24. Casuat, C.D.; Festijo, E.D. Predicting Students’ Employability using Machine Learning Approach. In Proceedings of the 2019 IEEE 6th International Conference on Engineering Technologies and Applied Sciences (ICETAS), Kuala Lumpur, Malaysia, 20–21 December 2019; pp. 1–5.
25. Mewburn, I.; Grant, W.J.; Suominen, H.; Kizimchuk, S. A Machine Learning Analysis of the Non-academic Employment Opportunities for Ph.D. Graduates in Australia. High. Educ. Policy 2020, 33, 799–813.
26. ElSharkawy, G.; Helmy, Y.; Yehia, E. Employability Prediction of Information Technology Graduates using Machine Learning Algorithms. Int. J. Adv. Comput. Sci. Appl. 2022, 13.
27. Roy, A. Recent Trends in Named Entity Recognition (NER). arXiv 2021.
28. Narendra, G.O.; Hashwanth, S. Named Entity Recognition based Resume Parser and Summarizer. Int. J. Adv. Res. Sci. Commun. Technol. 2022, 2, 728–735.
29. Gugnani, A.; Misra, H. Implicit Skills Extraction Using Document Embedding and Its Use in Job Recommendation. Proc. AAAI Conf. Artif. Intell. 2020, 34, 13286–13293.
30. Fareri, S.; Melluso, N.; Chiarello, F.; Fantoni, G. SkillNER: Mining and mapping soft skills from any text. Expert Syst. Appl. 2021, 184, 115544.
31. Casuat, C.D. Predicting Students’ Employability using Support Vector Machine: A SMOTE-Optimized Machine Learning System. Int. J. Emerg. Trends Eng. Res. 2020, 8, 2101–2106.
32. Baffa, M.H.; Miyim, M.A.; Dauda, A.S. Machine Learning for Predicting Students’ Employability. UMYU Sci. 2023, 2, 001–009.
33. Sun, T.; He, Z. Developing intelligent hybrid DNN model for predicting students’ employability - A Machine Learning approach. J. Educ. Humanit. Soc. Sci. 2023, 18, 235–248.
34. Makdoun, I.; Mezzour, G.; Carley, K.M.; Kassou, I. Analyzing the Needs of the Automotive Job Market in Morocco. In Proceedings of the 2018 13th International Conference on Computer Science & Education (ICCSE), Colombo, Sri Lanka, 8–11 August 2018; pp. 1–6.
35. Habous, A.; Nfaoui, E.H. Combining Word Embeddings and Deep Neural Networks for Job Offers and Resumes Classification in IT Recruitment Domain. Int. J. Adv. Comput. Sci. Appl. 2021, 12, 7.
36. Mgarbi, H.; Chkouri, M.; Tahiri, A. Towards a New Job Offers Recommendation System Based on the Candidate Resume. Int. J. Comput. Digit. Syst. 2023, 14, 31–38.
37. Qostal, A.; Sellamy, K.; Sabri, Z.; Nouib, H.; Lakhrissi, Y.; Moumen, A. Perceived employability of Moroccan engineering students: A PLS-SEM approach. Int. J. Instr. 2024, 17, 259–282.
38. Hopfield, J.J. Brain, neural networks, and computation. Rev. Mod. Phys. 1999, 71, S431–S437.
39. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
40. Li, W.; Wu, H.; Zhu, N.; Jiang, Y.; Tan, J.; Guo, Y. Prediction of dissolved oxygen in a fishery pond based on gated recurrent unit (GRU). Inf. Process. Agric. 2021, 8, 185–193.
41. Ren, L.; Cheng, X.; Wang, X.; Cui, J.; Zhang, L. Multi-scale Dense Gate Recurrent Unit Networks for bearing remaining useful life prediction. Future Gener. Comput. Syst. 2019, 94, 601–609.
42. Nosouhian, S.; Nosouhian, F.; Khoshouei, A.K. A Review of Recurrent Neural Network Architecture for Sequence Learning: Comparison between LSTM and GRU. Preprints 2021, 2021070252.
43. O’Shea, K.; Nash, R. An Introduction to Convolutional Neural Networks. arXiv 2015.
44. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30.
45. Alaparthi, S.; Mishra, M. Bidirectional Encoder Representations from Transformers (BERT): A sentiment analysis odyssey. arXiv 2020.
46. Subakti, A.; Murfi, H.; Hariadi, N. The performance of BERT as data representation of text clustering. J. Big Data 2022, 9, 15.
47. Roy, P.K.; Chowdhary, S.S.; Bhatia, R. A Machine Learning approach for automation of Resume Recommendation system. Procedia Comput. Sci. 2020, 167, 2318–2327.
48. Rahhal, I.; Carley, K.M.; Kassou, I.; Ghogho, M. Two Stage Job Title Identification System for Online Job Advertisements. IEEE Access 2023, 11, 19073–19092.
Study | Context | Model | Accuracy |
---|---|---|---|
[24] | Dataset from the career center of the Technological Institute of the Philippines, Manila: 27,000 data points on students (3000 observations with 9 features each) | | |
[31] | Dataset of 3000 observations and 12 features based on mock job interview results, including collected performance ratings of students in on-the-job training | | |
[26] | Dataset (296 records) from a survey of graduates and employers in Egypt on training skills, soft skills, and hard skills | | |
[32] | Models proposed for predicting students' performance and employability; primary dataset of 218 graduate students of higher education institutions (HEIs) | | |
[33] | Hybrid DNN model for predicting students' employability using a machine learning approach | | |
Department (ENSA Kenitra) | Total |
---|---|
Computer Engineering (CE) | 263 |
Networks and Systems Telecommunications (NST) | 134 |
Automotive Mechatronics Engineering (AutoMec) | 149 |
Industrial Engineering (Indus) | 114 |
Electrical Engineering (ELE) | 207 |
Total | 867 |
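The department sizes are imbalanced, so reported accuracies are best read against a simple reference point. Assuming the counts above are the number of CVs per department (they sum to 867, which also matches the file-type counts), a classifier that always predicts Computer Engineering would reach about 30.3% accuracy on the full dataset. The snippet below reproduces that arithmetic; this baseline is not a result reported in the study.

```python
# Number of CVs per department, copied from the table above
counts = {"CE": 263, "NST": 134, "AutoMec": 149, "Indus": 114, "ELE": 207}

total = sum(counts.values())
shares = {dept: n / total for dept, n in counts.items()}
majority_class = max(shares, key=shares.get)
majority_baseline = shares[majority_class]

# Print each department's share, largest first
for dept, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{dept:8s} {counts[dept]:4d} {share:6.1%}")
print(f"Always predicting {majority_class}: accuracy {majority_baseline:.4f}")
```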
File Type | Total |
---|---|
Docx | 321 |
… | 308 |
Png | 123 |
Jpg/Jpeg | 115 |
Text Representation Method | Model | Accuracy | Precision | Recall |
---|---|---|---|---|
BERT | GRU-LSTM | 0.8122 | 0.8995 | 0.8331 |
BERT | GRU-CNN | 0.8821 | 0.8722 | 0.8754 |
BERT | LSTM-GRU | 0.8354 | 0.9021 | 0.8463 |
BERT | LSTM-CNN | 0.8531 | 0.8911 | 0.8234 |
BERT | CNN-GRU | 0.9351 | 0.9310 | 0.9411 |
BERT | CNN-LSTM | 0.9242 | 0.9329 | 0.9012 |
BERT | GRU | 0.9013 | 0.9181 | 0.8051 |
BERT | LSTM | 0.8951 | 0.8886 | 0.8125 |
BERT | CNN | 0.9188 | 0.9102 | 0.8963 |
Gensim | GRU-LSTM | 0.8241 | 0.8542 | 0.7741 |
Gensim | GRU-CNN | 0.8321 | 0.8669 | 0.7725 |
Gensim | LSTM-GRU | 0.8214 | 0.8632 | 0.7921 |
Gensim | LSTM-CNN | 0.9025 | 0.8552 | 0.7626 |
Gensim | CNN-GRU | 0.8751 | 0.8224 | 0.7995 |
Gensim | CNN-LSTM | 0.8423 | 0.8821 | 0.7768 |
Gensim | GRU | 0.7742 | 0.8256 | 0.7951 |
Gensim | LSTM | 0.8287 | 0.8413 | 0.7858 |
Gensim | CNN | 0.9021 | 0.8961 | 0.7551 |
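The results table reports precision and recall but not F1. The harmonic mean $F_1 = 2PR/(P+R)$ of the reported values gives one combined score per configuration; because it is computed from the averaged precision and recall, it can differ slightly from a macro-F1 averaged over per-class F1 scores. The values in the sketch are copied from the table.

```python
# (representation, model): (precision, recall), copied from the results table
RESULTS = {
    ("BERT", "GRU-LSTM"): (0.8995, 0.8331),
    ("BERT", "GRU-CNN"): (0.8722, 0.8754),
    ("BERT", "LSTM-GRU"): (0.9021, 0.8463),
    ("BERT", "LSTM-CNN"): (0.8911, 0.8234),
    ("BERT", "CNN-GRU"): (0.9310, 0.9411),
    ("BERT", "CNN-LSTM"): (0.9329, 0.9012),
    ("BERT", "GRU"): (0.9181, 0.8051),
    ("BERT", "LSTM"): (0.8886, 0.8125),
    ("BERT", "CNN"): (0.9102, 0.8963),
    ("Gensim", "GRU-LSTM"): (0.8542, 0.7741),
    ("Gensim", "GRU-CNN"): (0.8669, 0.7725),
    ("Gensim", "LSTM-GRU"): (0.8632, 0.7921),
    ("Gensim", "LSTM-CNN"): (0.8552, 0.7626),
    ("Gensim", "CNN-GRU"): (0.8224, 0.7995),
    ("Gensim", "CNN-LSTM"): (0.8821, 0.7768),
    ("Gensim", "GRU"): (0.8256, 0.7951),
    ("Gensim", "LSTM"): (0.8413, 0.7858),
    ("Gensim", "CNN"): (0.8961, 0.7551),
}

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

scores = {key: f1(p, r) for key, (p, r) in RESULTS.items()}
best = max(scores, key=scores.get)

# Show the three configurations with the highest F1-style score
for key, score in sorted(scores.items(), key=lambda item: -item[1])[:3]:
    print(key, round(score, 4))
```

On this measure, CNN-GRU with BERT also ranks first (about 0.936), consistent with its top accuracy.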
Speciality | Precision | Recall |
---|---|---|
Electrical Engineering (ELE) | 0.975 | 0.951 |
Networks and Systems Telecommunications (NST) | 0.933 | 0.933 |
Computer Engineering (CE) | 0.903 | 0.886 |
Industrial Engineering (Indus) | 0.937 | 0.967 |
Automotive Mechatronics Engineering (AutoMec) | 0.935 | 0.966 |
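Per-class precision and recall, as in the table above, follow directly from a confusion matrix: precision divides the diagonal count by the column total (all resumes predicted as that class), and recall divides it by the row total (all resumes truly in that class). The 3 × 3 matrix in the sketch is a hypothetical toy example, not the values behind the CNN-GRU/BERT confusion matrix figure.

```python
def per_class_precision_recall(cm, labels):
    """cm[i][j] = number of resumes of true class i predicted as class j."""
    n = len(labels)
    result = {}
    for c in range(n):
        tp = cm[c][c]
        predicted = sum(cm[r][c] for r in range(n))  # column total
        actual = sum(cm[c])                          # row total
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        result[labels[c]] = (precision, recall)
    return result

# Hypothetical toy matrix (rows: true class, columns: predicted class)
labels = ["CE", "NST", "ELE"]
cm = [
    [8, 1, 1],
    [2, 9, 0],
    [0, 0, 10],
]
res = per_class_precision_recall(cm, labels)
macro_precision = sum(p for p, _ in res.values()) / len(res)
macro_recall = sum(r for _, r in res.values()) / len(res)
```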
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Qostal, A.; Moumen, A.; Lakhrissi, Y. CVs Classification Using Neural Network Approaches Combined with BERT and Gensim: CVs of Moroccan Engineering Students. Data 2024, 9, 74. https://doi.org/10.3390/data9060074