Search Results (1,316)

Search Parameters:
Keywords = NLP

23 pages, 1844 KiB  
Article
Using Artificial Intelligence to Support Peer-to-Peer Discussions in Science Classrooms
by Kelly Billings, Hsin-Yi Chang, Jonathan M. Lim-Breitbart and Marcia C. Linn
Educ. Sci. 2024, 14(12), 1411; https://doi.org/10.3390/educsci14121411 - 23 Dec 2024
Abstract
In successful peer discussions, students respond to each other and benefit from supports that focus the discussion on one another’s ideas. We explore using artificial intelligence (AI) to form groups and guide peer discussion for grade 7 students. We use natural language processing (NLP) to identify student ideas in science explanations. The identified ideas, along with Knowledge Integration (KI) pedagogy, informed the design of a question bank to support students during the discussion. We compare groups formed by maximizing the variety of ideas among participants to randomly formed groups. We embedded the chat tool in an earth science unit and tested it in two classrooms at the same school. We report on the accuracy of the NLP idea detection, the impact of maximized versus random grouping, and the role of the question bank in focusing the discussion on student ideas. We found that the similarity of student ideas limited the value of maximizing idea variety and that the question bank facilitated students’ use of knowledge integration processes.
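One way to read the grouping comparison: with each student's explanation reduced to a set of NLP-detected idea labels, forming groups that maximize idea variety can be sketched as a greedy coverage heuristic. This is an illustration under that assumption, not the authors' published algorithm, and the idea labels are invented examples.

```python
def group_by_idea_variety(students, group_size):
    """Greedy grouping: repeatedly build a group covering as many distinct
    ideas as possible. `students` maps student id -> set of idea labels."""
    remaining = dict(students)
    groups = []
    while remaining:
        ids = list(remaining)
        if len(ids) <= group_size:
            groups.append(ids)       # last, possibly short, group
            break
        # seed with the student holding the most ideas, then add whoever
        # contributes the most ideas not yet covered by the group
        seed = max(ids, key=lambda s: len(remaining[s]))
        group, covered = [seed], set(remaining[seed])
        while len(group) < group_size:
            best = max((s for s in ids if s not in group),
                       key=lambda s: len(remaining[s] - covered))
            group.append(best)
            covered |= remaining[best]
        groups.append(group)
        for s in group:
            del remaining[s]
    return groups

ideas = {"s1": {"magma", "plates"}, "s2": {"magma"},
         "s3": {"erosion"}, "s4": {"plates", "erosion"}}
print(group_by_idea_variety(ideas, 2))
```

As the abstract notes, when students' idea sets are very similar, this maximization adds little over random assignment, since every grouping covers roughly the same ideas.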
Figures:
Figure 1: Mt. Hood explanation item.
Figure 2: Chat grouping logic for the NLP-informed condition and randomized condition.
Figure 3: Chat interface and sample student discussion. Students’ initial answers to the Mt. Hood assessment item are displayed above the chat environment. Question bank prompts are displayed next to the chat environment, and students can select questions they want to add to the chat. Students used * to indicate corrections in spelling.
Figure 4: Number of KI processes groups engaged in during the chat, split between groups that used the adaptive question bank and groups that did not.
18 pages, 930 KiB  
Case Report
Ontological Representation of the Structure and Vocabulary of Modern Greek on the Protégé Platform
by Nikoletta Samaridi, Evangelos Papakitsos and Nikitas Karanikolas
Computation 2024, 12(12), 249; https://doi.org/10.3390/computation12120249 - 23 Dec 2024
Abstract
One of the issues in Natural Language Processing (NLP) and Artificial Intelligence (AI) is language representation and modeling, which aims to manage a language’s structure and find solutions to linguistic issues. Pursuing the most efficient capture of knowledge about the Modern Greek language, and given the scientifically certified usability of the ontological structuring of data in the semantic web and cognitive computing, this paper presents a new ontology of the Modern Greek language at the level of structure and vocabulary, built on the Protégé platform. Using this logical, structured form of knowledge representation, the research processes and exploits the distributed semantics of linguistic information in an accessible and useful way.
Figures:
Figure 1: The four (4) basic concepts (Μορφολογία_Morphology, Σύνταξη_Syntax, Σημασιολογία_Semantics, and Φωνητική_Phonetics) on which the ontology of Modern Greek is structured on the Protégé platform.
Figure 2: The data properties (GreekLanguageDataProperty) of the new Greek Language Ontology Dictionary on the Protégé platform.
26 pages, 359 KiB  
Review
Opportunities and Challenges of Chatbots in Ophthalmology: A Narrative Review
by Mehmet Cem Sabaner, Rodrigo Anguita, Fares Antaki, Michael Balas, Lars Christian Boberg-Ans, Lorenzo Ferro Desideri, Jakob Grauslund, Michael Stormly Hansen, Oliver Niels Klefter, Ivan Potapenko, Marie Louise Roed Rasmussen and Yousif Subhi
J. Pers. Med. 2024, 14(12), 1165; https://doi.org/10.3390/jpm14121165 - 21 Dec 2024
Viewed by 344
Abstract
Artificial intelligence (AI) is becoming increasingly influential in ophthalmology, particularly through advancements in machine learning, deep learning, robotics, neural networks, and natural language processing (NLP). Among these, NLP-based chatbots are the most readily accessible and are driven by AI-based large language models (LLMs). These chatbots have facilitated new research avenues and have gained traction in both clinical and surgical applications in ophthalmology. They are also increasingly being utilized in studies on ophthalmology-related exams, particularly those containing multiple-choice questions (MCQs). This narrative review evaluates both the opportunities and the challenges of integrating chatbots into ophthalmology research, with separate assessments of studies involving open- and closed-ended questions. While chatbots have demonstrated sufficient accuracy in handling MCQ-based studies, supporting their use in education, additional exam security measures are necessary. The research on open-ended question responses suggests that AI-based LLM chatbots could be applied across nearly all areas of ophthalmology. They have shown promise in addressing patient inquiries, offering medical advice, educating patients, supporting triage, facilitating diagnosis and differential diagnosis, and aiding in surgical planning. However, the ethical implications, confidentiality concerns, physician liability, and issues surrounding patient privacy remain pressing challenges. Although AI has demonstrated significant promise in clinical patient care, it is currently most effective as a supportive tool rather than as a replacement for human physicians.
(This article belongs to the Section Methodology, Drug and Device Discovery)
39 pages, 8207 KiB  
Article
Multidimensional Visualization of Sound–Sense Harmony for Shakespeare’s Sonnets Classification
by Rodolfo Delmonte and Nicolò Busetto
Appl. Sci. 2024, 14(24), 11949; https://doi.org/10.3390/app142411949 - 20 Dec 2024
Viewed by 259
Abstract
In this article, we focus on the association of sound and sense harmony in the collection of sonnets Shakespeare wrote in the late sixteenth and early seventeenth centuries, and propose a new four-dimensional representation to visualize them by means of the system called SPARSAR. To compute the degree of harmony and disharmony, we automatically extracted the sound grids of all the sonnets and combined them with the semantics and polarity expressed by their contents. We explain the algorithm in detail, show the representation of the whole collection of 154 sonnets, and comment on them extensively. In a second experiment, we use data from the manual annotation of the sonnets for satire detection using the Appraisal Theory Framework to gauge the system’s accuracy in matching these data with the output of the automatic algorithm for sound–sense harmony. The results, obtained with 94.6% accuracy, confirm that the poet has to account for both sound and meaning in the choice of words.
(This article belongs to the Special Issue Algorithmic Music and Sound Computing)
Figures:
Figure 1: A short sample of the sound–sense multidimensional visualization.
Figure 2: Multidimensional patterns for sonnets 1–53.
Figure 3: Multidimensional patterns for sonnets 54–105.
Figure 4: Multidimensional patterns for sonnets 106–154.
Figure 5: Distribution of ATF classes in 49 sonnets according to manual evaluation for Irony classification.
Figure 6: Distribution of ATF classes for 42 sonnets according to manual evaluation for Sarcasm classification.
Figure 7: Distribution of 63 sonnets evaluated as Neutral from manual ATF annotation.
Figure A1: Architecture of SPARSAR with the main pipeline organized into three levels.
Figure A2: Upper part of multidimensional representation for sonnets 1–53 with negative and positive harmony.
Figure A3: Upper part of multidimensional representation for sonnets 1–53 with negative and positive harmony.
Figure A4: Upper part of multidimensional representation for sonnets 1–53 with disharmonic sonnets.
Figure A5: Lower part of multidimensional representation for sonnets 1–53 with three columns.
Figure A6: Upper part of multidimensional representation for sonnets 54–105 with negatives and positives.
Figure A7: Upper part of multidimensional representation for sonnets 54–105 with negatives and positives.
Figure A8: Upper part of multidimensional representation for sonnets 54–105 with disharmony.
Figure A9: Lower part of multidimensional representation for sonnets 54–105 with three columns.
Figure A10: Upper part of multidimensional representation for sonnets 106–154 with negatives and positives.
Figure A11: Upper part of multidimensional representation for sonnets 106–154 with negatives and positives.
Figure A12: Upper part of multidimensional representation for sonnets 106–154 with disharmony.
Figure A13: Lower part of multidimensional representation for sonnets 106–154 with three columns.
36 pages, 2037 KiB  
Article
Contextual Fine-Tuning of Language Models with Classifier-Driven Content Moderation for Text Generation
by Matan Punnaivanam and Palani Velvizhy
Entropy 2024, 26(12), 1114; https://doi.org/10.3390/e26121114 - 20 Dec 2024
Viewed by 354
Abstract
In today’s digital age, ensuring the appropriateness of content for children is crucial for their cognitive and emotional development. The rise of automated text generation technologies, such as Large Language Models like LLaMA, Mistral, and Zephyr, has created a pressing need for effective tools to filter and classify suitable content. However, the existing methods often fail to effectively address the intricate details and unique characteristics of children’s literature. This study aims to bridge this gap by developing a robust framework that utilizes fine-tuned language models, classification techniques, and contextual story generation to generate and classify children’s stories based on their suitability. Employing a combination of fine-tuning techniques on models such as LLaMA, Mistral, and Zephyr, alongside a BERT-based classifier, we evaluated the generated stories against established metrics like ROUGE, METEOR, and BERT Scores. The fine-tuned Mistral-7B model achieved a ROUGE-1 score of 0.4785, significantly higher than the base model’s 0.3185, while Zephyr-7B-Beta achieved a METEOR score of 0.4154 compared to its base counterpart’s score of 0.3602. The results indicated that the fine-tuned models outperformed base models, generating content more aligned with human standards. Moreover, the BERT Classifier exhibited high precision (0.95) and recall (0.97) for identifying unsuitable content, further enhancing the reliability of content classification. These findings highlight the potential of advanced language models in generating age-appropriate stories and enhancing content moderation strategies. This research has broader implications for educational technology, content curation, and parental control systems, offering a scalable approach to ensuring children’s exposure to safe and enriching narratives.
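ROUGE-1, used above to compare fine-tuned and base models, measures unigram overlap between a generated story and a reference. A minimal sketch of its F1 variant (published evaluations typically use a standard package such as rouge-score rather than hand-rolled code):

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall,
    with overlap counts clipped per token (bag-of-words match)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())   # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(round(rouge1_f1("the cat sat on the mat", "the cat lay on the mat"), 3))  # → 0.833
```

A score of 0.4785 versus 0.3185 therefore means the fine-tuned model's outputs share substantially more vocabulary with the human-written references.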
Figures:
Figure 1: Evolution of LLMs.
Figure 2: Types of fine-tuning.
Figure 3: Traditional story generation using an LLM.
Figure 4: Pipeline of the proposed system.
Figure 5: Prompt generation module.
Figure 6: Supervised fine-tuning.
Figure 7: Gradient normalization of the LLaMA, Mistral, and Zephyr models: (a) LLaMA; (b) Mistral; (c) Zephyr.
Figure 8: Learning rate of the LLaMA, Mistral, and Zephyr models: (a) LLaMA; (b) Mistral; (c) Zephyr.
Figure 9: Training loss of the LLaMA, Mistral, and Zephyr models: (a) LLaMA; (b) Mistral; (c) Zephyr.
Figure 10: BERT Classifier.
Figure 11: Types of classifiers.
Figure 12: Confusion matrix of the BERT Classifier.
29 pages, 1921 KiB  
Article
Large Language Models and the Elliott Wave Principle: A Multi-Agent Deep Learning Approach to Big Data Analysis in Financial Markets
by Michał Wawer, Jarosław A. Chudziak and Ewa Niewiadomska-Szynkiewicz
Appl. Sci. 2024, 14(24), 11897; https://doi.org/10.3390/app142411897 - 19 Dec 2024
Viewed by 406
Abstract
Traditional technical analysis methods face limitations in accurately predicting trends in today’s complex financial markets. Meanwhile, existing AI-driven approaches, while powerful in processing large datasets, often lack interpretability due to their black-box nature. This paper presents ElliottAgents, a multi-agent system that combines the Elliott wave principle with LLMs, showcasing the application of deep reinforcement learning (DRL) and natural language processing (NLP) in financial analysis. By integrating retrieval-augmented generation (RAG) and DRL, the system processes vast amounts of market data to identify Elliott wave patterns and generate actionable insights. The system employs a coordinated team of specialized agents, each responsible for a specific aspect of the analysis, from pattern recognition to investment strategy formulation. We tested ElliottAgents on both stock and cryptocurrency markets, evaluating its effectiveness in pattern identification and trend prediction across different time scales. Our experimental results demonstrate improvements in prediction accuracy when combining classical technical analysis with AI-driven approaches, particularly when enhanced by the DRL-based backtesting process. This research contributes to the advancement of financial technology by introducing a scalable, interpretable framework that enhances market analysis capabilities, offering a promising new methodology for both practitioners and researchers.
Figures:
Figure 1: Example of a modern candlestick chart; screenshot from Ref. [5].
Figure 2: Example of a chart with technical analysis markers—Elliott waves, marked as 1-2-3-4-5; screenshot from Ref. [5].
Figure 3: The fractal character of the Elliott wave pattern, adapted from Refs. [4,15].
Figure 4: Horizontal lines show Fibonacci retracement levels of 23%, 38%, 50%, 62%, and 78%, measured from the top of the uptrend; screenshot from Ref. [5].
Figure 5: Timeline of foundation models released since 2023. Blue (upper part) indicates closed-source models and green indicates open-source models.
Figure 6: Transformer architecture, reprinted from Ref. [30].
Figure 7: LangGraph RAG algorithm, adapted from Ref. [31].
Figure 8: Diagram of the supervisor agent in a multi-agent hierarchical architecture, adapted from Ref. [34].
Figure 9: Overview of an LLM autonomous agent, adapted from Ref. [37].
Figure 10: ReAct agent components, adapted from Ref. [41].
Figure 11: The agent–environment interaction in the Markov Decision Process (MDP) for ranking information with reinforcement learning, reprinted from Ref. [43].
Figure 12: Diagram showing the components used by each agent and the flow of data between them.
Figure 13: Fragment of Python code presenting the prompt and ReAct agent definition.
Figure 14: Logic inside the Backtester agent.
Figure 15: Impulse wave (labeled 1-2-3-4-5) recognized on the AMZN 1d chart.
Figure 16: Partial corrective wave (labeled A-B) found on the Bitcoin 1d chart.
Figure 17: All waves (marked with blue lines) detected by ElliottAgents for Alphabet stock over a 2-year period, applied in one chart.
Figure 18: All waves (marked with blue lines) detected by ElliottAgents for BTC-USD over a 2-year period, applied in one chart.
Figure 19: Response from the Investment Advisor agent.
Figure 20: Response from the Elliott Wave Analyst agent.
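The Fibonacci retracement levels cited in the Figure 4 caption are plain arithmetic on a price swing, measured down from the swing high in an uptrend. A minimal sketch (the 100-to-150 swing is an illustrative example, not data from the paper):

```python
def fib_retracements(swing_low: float, swing_high: float) -> dict:
    """Retracement prices for an uptrend, measured down from the top,
    at the levels shown in Figure 4 (23%, 38%, 50%, 62%, 78%)."""
    span = swing_high - swing_low
    return {pct: round(swing_high - span * pct / 100, 2)
            for pct in (23, 38, 50, 62, 78)}

# a price that rallied from 100 to 150 would find its 38% retracement at 131
print(fib_retracements(100.0, 150.0))
```

Analysts watch these prices as potential support during a pullback; in ElliottAgents they would serve as one input among several for the pattern-recognition agents.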
24 pages, 2642 KiB  
Article
Identification of Scientific Texts Generated by Large Language Models Using Machine Learning
by David Soto-Osorio, Grigori Sidorov, Liliana Chanona-Hernández and Blanca Cecilia López-Ramírez
Computers 2024, 13(12), 346; https://doi.org/10.3390/computers13120346 - 19 Dec 2024
Viewed by 408
Abstract
Large language models (LLMs) are tools that help us in a variety of activities, from creating well-structured texts to quickly consulting information. Because these technologies are so easily accessible, however, many people use them for their own benefit without properly citing the original author. The education sector can also be heavily compromised: students may opt for a quick answer over understanding a specific topic in depth, considerably reducing their basic writing, editing, and reading comprehension skills. We therefore propose a model to identify texts produced by LLMs. To do so, we use natural language processing (NLP) and machine-learning algorithms to recognize texts that mask LLM misuse through different types of adversarial attack, such as paraphrasing or translation from one language to another. The main contribution of this work is the identification of texts generated by large language models; several experiments were developed in search of the best results, using the F1, accuracy, recall, and precision metrics, together with PCA and t-SNE diagrams to visualize the classification of each of the texts.
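A minimal sketch of the kind of pipeline the abstract describes: TF-IDF features over a labeled corpus, then a similarity-based decision between human and LLM text. The toy corpus and the nearest-neighbor rule are illustrative stand-ins; the paper's actual experiments use trained classifiers (SVM, LSTM, DistilRoBERTa) over several embedding types.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """tf-idf weighted bag-of-words for each tokenized document."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    idf = {t: math.log(n / df[t]) + 1 for t in df}
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: tf[t] / len(doc) * idf[t] for t in tf})
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse (dict) vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# toy corpus: two human-written and two LLM-styled training texts
train = ["i walked to the shop and it rained".split(),
         "my cat knocked the cup off the table".split(),
         "as an ai language model i can help with that".split(),
         "furthermore it is important to note the following".split()]
labels = ["human", "human", "llm", "llm"]
query = "it is important to note that as a language model".split()

vecs = tfidf_vectors(train + [query])
query_vec = vecs[-1]
best = max(range(len(train)), key=lambda i: cosine(query_vec, vecs[i]))
print(labels[best])  # nearest training text decides the label
```

Adversarial attacks such as paraphrasing work precisely by shifting this lexical distribution, which is why the study also evaluates contextual embeddings (BERT, RoBERTa) that are less tied to exact word choice.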
Figures:
Figure 1: Diagram illustrating the stages involved in text preprocessing, from the elimination of non-relevant elements to obtaining vector representations, allowing their use in machine-learning models.
Figure 2: Diagram illustrating the process of extracting text in Markdown format from PDF documents hosted on the arXiv platform, using an API query and a natural language processing model (Nougat).
Figure 3: Diagram showing the text preprocessing process for large language model (LLM) training. An input text is preprocessed through several stages, including cleaning, tokenization, stop word removal, spell checking, and lemmatization, before being fed to different LLM models (Gemini, LLaMA2, LLaMA3, LLaMA).
Figure 4: Diagram representing a text processing pipeline involving recursive paraphrasing, translation, and assignment of new labels. The original text goes through several stages of transformation before a new dataset is generated.
Figure 5: Diagram showing the process of generating embeddings from a linguistic corpus. The text is preprocessed, and then different techniques (TF-IDF, Word2Vec, GloVe, BERT, RoBERTa and 4xLLLM) are used to create vector representations of the words (embeddings), generating new datasets for each technique.
Figure 6: The general process of training and evaluating the baseline models, using different validation techniques and evaluation metrics.
Figure 7: A basic neural network training and evaluation process. Embeddings and labels are used to train the model, and its performance is then evaluated through various metrics and visualizations.
Figure 8: Comparison of two main approaches to working with large language models (LLMs): prompt engineering and fine-tuning. Both methods seek to optimize model performance but use different strategies.
Figure 9: Confusion matrix evaluating the performance of a classification model (LSTM with Word2Vec). Each cell of the matrix represents the number of instances classified in a certain class that actually belong to another class.
Figure 10: PCA diagram presenting a two-dimensional visualization of the training data, reducing the original dimensionality to two principal components. The colored dots represent the classes the model attempts to classify; compact, well-separated clusters indicate good model performance. In this case, some classes appear better separated than others.
Figure 11: t-SNE plot showing the distribution of the data in a low-dimensional space for the LSTM+Word2Vec model. The considerable overlap between groups indicates that the model has difficulty distinguishing between certain categories.
Figure 12: Confusion matrix showing the performance of the SVM model trained with LLaVA embeddings, evaluated by 10-fold cross-validation. The main diagonal reveals a high level of hits, though some classification errors are observed.
Figure 13: Principal component analysis (PCA) for the SVM model trained with LLaVA embeddings. Distinct groups indicate good classification performance, especially for the LLaVA and Gemini classes; some overlap is observed between human, LLaMA2, and LLaMA3.
Figure 14: t-SNE analysis for the SVM model trained with LLaVA embeddings. Distinct groups indicate good classification performance, especially for the LLaVA and Gemini classes; some overlap is observed between human, LLaMA2, and LLaMA3.
Figure 15: Confusion matrix showing the performance of the DistilRoBERTa model fitted over 10 epochs. The main diagonal reveals a high level of hits, though some misclassification is observed, especially in the LLaMA2 and LLaMA3 classes.
Figure 16: Principal component analysis (PCA) for the fitted DistilRoBERTa model. Distinct groups indicate good classification performance, especially for the human, LLaVA, and Gemini classes; some overlap is observed between LLaMA2 and LLaMA3.
Figure 17: t-SNE analysis for the fitted DistilRoBERTa model. Distinct groups indicate good classification performance, especially for the human, LLaVA, and Gemini classes; some overlap is observed between LLaMA2 and LLaMA3.
21 pages, 1728 KiB  
Article
Sentence Embedding Generation Framework Based on Kullback–Leibler Divergence Optimization and RoBERTa Knowledge Distillation
by Jin Han and Liang Yang
Mathematics 2024, 12(24), 3990; https://doi.org/10.3390/math12243990 - 18 Dec 2024
Viewed by 449
Abstract
In natural language processing (NLP) tasks, computing semantic textual similarity (STS) is crucial for capturing nuanced semantic differences in text. Traditional word vector methods, such as Word2Vec and GloVe, as well as deep learning models like BERT, face limitations in handling context dependency and polysemy and present challenges in computational resources and real-time processing. To address these issues, this paper introduces two novel methods. First, a sentence embedding generation method based on Kullback–Leibler Divergence (KLD) optimization is proposed, which enhances semantic differentiation between sentence vectors, thereby improving the accuracy of textual similarity computation. Second, this study proposes a framework incorporating RoBERTa knowledge distillation, which integrates the deep semantic insights of the RoBERTa model with prior methodologies to enhance sentence embeddings while preserving computational efficiency. Additionally, the study extends its contributions to sentiment analysis tasks by leveraging the enhanced embeddings for classification. The sentiment analysis experiments, conducted using a Stochastic Gradient Descent (SGD) classifier on the ACL IMDB dataset, demonstrate the effectiveness of the proposed methods, achieving high precision, recall, and F1 score metrics. To further augment model accuracy and efficacy, a feature selection approach is introduced, specifically through the Dynamic Principal Component Selection (DPCS) algorithm. The DPCS method autonomously identifies and prioritizes critical features, thus enriching the expressive capacity of sentence vectors and significantly advancing the accuracy of similarity computations. Experimental results demonstrate that our method outperforms existing methods in semantic similarity computation on the SemEval-2016 dataset. When evaluated using cosine similarity of average vectors, our model achieved a Pearson correlation coefficient (τ) of 0.470, a Spearman correlation coefficient (ρ) of 0.481, and a mean absolute error (MAE) of 2.100. Compared to traditional methods such as Word2Vec, GloVe, and FastText, our method significantly enhances similarity computation accuracy. Using TF-IDF-weighted cosine similarity evaluation, our model achieved a τ of 0.528, ρ of 0.518, and an MAE of 1.343. Additionally, in the cosine similarity assessment leveraging the Dynamic Principal Component Smoothing (DPCS) algorithm, our model achieved a τ of 0.530, ρ of 0.518, and an MAE of 1.320, further demonstrating the method’s effectiveness and precision in handling semantic similarity. These results indicate that our proposed method has high relevance and low error in semantic textual similarity tasks, thereby better capturing subtle semantic differences between texts.
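The evaluation quoted above scores sentence pairs with cosine similarity over averaged word vectors, while the training objective uses Kullback–Leibler divergence between embeddings. Both quantities take only a few lines. The three-dimensional word vectors below are toy values, and softmax-normalizing the embeddings before computing KLD is an assumption made here so the divergence is defined over probability distributions:

```python
import math

def avg_vector(word_vecs):
    """Sentence embedding as the element-wise mean of its word vectors."""
    dim = len(word_vecs[0])
    return [sum(v[i] for v in word_vecs) / len(word_vecs) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def kl_divergence(p, q):
    """KL(p || q) after softmax normalization: the quantity a KLD-based
    objective would increase for semantically different sentence pairs."""
    def softmax(x):
        m = max(x)
        exps = [math.exp(v - m) for v in x]
        total = sum(exps)
        return [e / total for e in exps]
    p, q = softmax(p), softmax(q)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# toy word vectors for two short, near-synonymous sentences
s1 = avg_vector([[1.0, 0.2, 0.0], [0.8, 0.4, 0.1]])
s2 = avg_vector([[0.9, 0.3, 0.1], [0.7, 0.5, 0.0]])
print(cosine(s1, s2), kl_divergence(s1, s2))
```

Near-synonymous sentences give cosine close to 1 and KLD near 0; a KLD-based objective like the one proposed would push the divergence up for unrelated sentence pairs, sharpening the separation the cosine evaluation then measures.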
Figures:
Figure 1: Construction of the word vector space and text similarity calculation.
Figure 2: The similarity between two relevant sentences: (a) before KLD optimization; (b) after KLD optimization.
Figure 3: The similarity between two irrelevant sentences: (a) before KLD optimization; (b) after KLD optimization.
Figure 4: Knowledge distillation model.
Figure 5: Comprehensive analysis of loss and metrics over training rounds.
30 pages, 4270 KiB  
Review
Unlocking Organizational Success: A Systematic Literature Review of Superintendent Selection Strategies, Core Competencies, and Emerging Technologies in the Construction Industry
by Mahdiyar Mokhlespour Esfahani, Mostafa Khanzadi, Sogand Hasanzadeh, Alireza Moradi, Igor Martek and Saeed Banihashemi
Sustainability 2024, 16(24), 11106; https://doi.org/10.3390/su162411106 - 18 Dec 2024
Viewed by 309
Abstract
An organization’s success depends on its ability to attract and retain skilled personnel. Superintendents play a critical role in overseeing project sites in the construction industry and can adapt to the increasingly complicated requirements of modern construction projects. This study examines traditional and [...] Read more.
An organization’s success depends on its ability to attract and retain skilled personnel. Superintendents play a critical role in overseeing project sites in the construction industry and can adapt to the increasingly complicated requirements of modern construction projects. This study examines traditional and modern personnel selection methods to determine effective tactics, essential competencies, and emerging trends regarding supervisory personnel. The research methodology follows the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) framework. First, this study examines traditional and modern selection methods used by organizations and engineering firms to provide a comprehensive overview of the topic and assist in selecting appropriate staff recruitment procedures. Second, the Web of Science, Scopus, and Google Scholar databases were reviewed to identify superintendent selection approaches and competencies over the period January 2000 to September 2024. A total of 22 relevant papers were analyzed. Superintendent selection processes included questionnaires (57%), interviews (26%), literature reviews (14%), and data-driven AI tools (3%). Forty competency criteria were identified, with the top five being knowledge, communication skills, leadership, health and safety expertise, and commitment. As a result, novel approaches employing Industry 4.0 technologies, including virtual reality (VR), wearable sensing devices (WSDs), natural language processing (NLP), blockchain, and computer vision, are recommended. These findings support a better understanding of how best to identify the most qualified supervisory personnel and provide enhanced methods for evaluating job applicants. Full article
Show Figures

Figure 1: Personnel selection methods across industries.
Figure 2: The process of document selection (the PRISMA paradigm).
Figure 3: Number of papers published yearly.
Figure 4: Distribution of publishers.
Figure 5: The distribution of papers among different nations and related citations.
Figure 6: Keyword relationships and relevancy based on the VOSviewer software.
Figure 7: Superintendent competencies as identified in extant literature.
24 pages, 3264 KiB  
Article
Enhancing Personalized Mental Health Support Through Artificial Intelligence: Advances in Speech and Text Analysis Within Online Therapy Platforms
by Mariem Jelassi, Khouloud Matteli, Houssem Ben Khalfallah and Jacques Demongeot
Information 2024, 15(12), 813; https://doi.org/10.3390/info15120813 - 18 Dec 2024
Viewed by 407
Abstract
Automatic speech recognition (ASR) and natural language processing (NLP) play key roles in advancing human–technology interactions, particularly in healthcare communications. This study aims to enhance French-language online mental health platforms through the adaptation of the QuartzNet 15 × 5 ASR model, selected for [...] Read more.
Automatic speech recognition (ASR) and natural language processing (NLP) play key roles in advancing human–technology interactions, particularly in healthcare communications. This study aims to enhance French-language online mental health platforms through the adaptation of the QuartzNet 15 × 5 ASR model, selected for its robust performance across a variety of French accents as demonstrated on the Mozilla Common Voice dataset. The adaptation process involved tailoring the ASR model to accommodate various French dialects and idiomatic expressions, and integrating it with an NLP system to refine user interactions. The adapted QuartzNet 15 × 5 model achieved a baseline word error rate (WER) of 14%, and the accompanying NLP system displayed weighted averages of 64.24% in precision, 63.64% in recall, and an F1-score of 62.75%. Notably, critical functionalities such as ‘Prendre Rdv’ (schedule appointment) achieved precision, recall, and F1-scores above 90%. These improvements substantially enhance the functionality and management of user interactions on French-language digital therapy platforms, indicating that continuous adaptation and enhancement of these technologies are beneficial for improving digital mental health interventions, with a focus on linguistic accuracy and user satisfaction. Full article
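The headline ASR figure above is a word error rate (WER): the word-level edit distance between a reference transcript and the model's hypothesis, divided by the reference length. A minimal sketch of that metric (the French sentence pair is invented for illustration, not drawn from the study's data):

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[-1][-1] / len(ref)

# Hypothetical transcript pair: one inserted word, one deleted word.
ref = "je voudrais prendre rendez vous demain matin"
hyp = "je voudrais prendre un rendez vous demain"
print(wer(ref, hyp))  # 2 errors over 7 reference words, about 0.286
```

A 14% WER, as reported for the adapted QuartzNet model, corresponds to roughly one word error per seven reference words.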
Show Figures

Figure 1: Automatic speech recognition process.
Figure 2: Connectionist Temporal Classification decoding algorithm.
Figure 3: Configuration of beam search decoder with N-gram language model.
Figure 4: Voice assistant flowchart.
Figure 5: NLU pipeline.
Figure 6: System architecture. An overview of the system’s infrastructure, illustrating the interplay between the automatic speech recognition component, dialogue management, and the user interface.
Figure 7: Intent recognition confusion matrix.
Figure 8: Dual Intent and Entity Transformer classifier confusion matrix.
Figure 9: Intent prediction confidence distribution.
24 pages, 6615 KiB  
Article
The Identification of AMT Family Genes and Their Expression, Function, and Regulation in Chenopodium quinoa
by Xiangxiang Wang, He Wu, Nazer Manzoor, Wenhua Dongcheng, Youbo Su, Zhengjie Liu, Chun Lin and Zichao Mao
Plants 2024, 13(24), 3524; https://doi.org/10.3390/plants13243524 - 17 Dec 2024
Viewed by 369
Abstract
Quinoa (Chenopodium quinoa) is an Andean allotetraploid pseudocereal crop with higher protein content and balanced amino acid composition in the seeds. Ammonium (NH4+), a direct source of organic nitrogen assimilation, mainly transported by specific transmembrane ammonium transporters ( [...] Read more.
Quinoa (Chenopodium quinoa) is an Andean allotetraploid pseudocereal crop with higher protein content and balanced amino acid composition in the seeds. Ammonium (NH4+), a direct source of organic nitrogen assimilation, mainly transported by specific transmembrane ammonium transporters (AMTs), plays important roles in the development, yield, and quality of crops. Many AMTs and their functions have been identified in major crops; however, no systematic analyses of AMTs and their regulatory networks, which are important to increase the yield and protein accumulation in the seeds of quinoa, have been performed to date. In this study, the CqAMTs were identified, followed by the quantification of the gene expression, while the regulatory networks were predicted based on weighted gene co-expression network analysis (WGCNA), with the putative transcription factors (TFs) having binding sites on the promoters of CqAMTs, nitrate transporters (CqNRTs), and glutamine synthases (CqGSs), as well as the putative TF expression being correlated with the phenotypes and activities of GSs, glutamate synthase (GOGAT), nitrite reductase (NiR), and nitrate reductase (NR) of quinoa roots. The results showed a total of 12 members of the CqAMT family with varying expressions in different organs and in the same organs at different developmental stages. Complementation expression analyses in the triple mep1/2/3 mutant of yeast showed that all but CqAMT2.2b (11 of 12 CqAMTs) restored NH4+ uptake in the host yeast. CqAMT1.2a was found to localize mainly to the cell membrane, while TFs (e.g., CqNLPs, CqG2Ls, B3 TFs, CqbHLHs, CqZFs, CqMYBs, CqNF-YA/YB/YC, CqNACs, and CqWRKY) were predicted to be predominantly involved in the regulation, transportation, and assimilation of nitrogen. These results establish the functions of CqAMTs and their possible regulatory networks, which will help improve nitrogen use efficiency (NUE) in quinoa as well as other major crops. Full article
(This article belongs to the Section Plant Genetics, Genomics and Biotechnology)
Show Figures

Figure 1: Phylogenetic tree of AMTs and their colinear relationship among C. watsonii, C. suecicum, and C. quinoa: (A) Phylogenetic tree of the AMT proteins of six plant species (A. thaliana, O. sativa, S. lycopersicum, P. trichocarpa, C. sinensis var. sinensis, and C. quinoa). Each group is represented by a different color, and the CqAMT proteins are marked with red dots. (B) Retention and loss of the AMT genes among C. watsonii (Cw), C. suecicum (Cs), and C. quinoa (Cq).
Figure 2: Phylogenetic relationships, conserved motifs, and gene structure analysis of CqAMT genes: (A) Phylogenetic tree of the 12 CqAMT proteins. (B) Conserved protein motifs identified using MEME; each color represents a motif, and motif lengths are proportional. (C) Exon–intron distribution of CqAMTs, with introns indicated by black lines, and exons by yellow boxes (CDS) and blue boxes (UTR).
Figure 3: Cis-elements in the promoters of the 12 CqAMTs, predicted using PlantCARE.
Figure 4: Expression patterns of CqAMT genes: (A) W32 leaves and roots under 0, 8, and 21 mM NH4+ concentrations (L-0mM-21D: leaf samples treated with 0 mM ammonium nitrogen for 21 d; L-0mM-27D, L-21mM-21D, L-21mM-27D, L-8mM-21D, and L-8mM-27D are labeled similarly; R-0mM-21D: root samples treated with 0 mM ammonium nitrogen for 21 d; R-0mM-27D, R-21mM-21D, R-21mM-27D, R-8mM-21D, and R-8mM-27D are labeled similarly). (B) CqAMTs expressed at different reproductive developmental stages of W19 and W25 planted in the field (W19-FL: leaves of W19 at the flower development stage; W19-SL: leaves of W19 at the seed-filling stage; W19-FP: panicles of W19 at the flowering stage; W19-SP: panicles of W19 at the seed formation stage); W25 samples are labeled similarly.
Figure 5: qRT-PCR analysis of the 10 CqAMTs in leaves and roots of hydroponically cultivated W32 after 21 d under 0, 8, and 21 mM NH4+ concentrations.
Figure 6: Functional verification of 11 CqAMTs and subcellular localization of CqAMT1.2a: (A) Growth of the yeast mutant 31019b complemented via heterologous expression of CqAMTs. The mutant strain was transformed with the empty vector pYES2 or with one of 11 CqAMT expression vectors: CqAMT2.2a-pYES2, CqAMT1.3a-pYES2, CqAMT1.4a-pYES2, CqAMT3.1b-pYES2, CqAMT1.2c-pYES2, CqAMT1.2a-pYES2, CqAMT1.4b-pYES2, CqAMT1.2b-pYES2, CqAMT1.2d-pYES2, CqAMT3.1a-pYES2, and CqAMT1.3b-pYES2. The mutant transformed with pYES2 alone served as a negative control. Transformants were grown on SD medium at 30 °C for 2–3 days. (B) Subcellular localization of CqAMT1.2a, detected by expressing a CqAMT1.2a–GFP fusion in tobacco leaves.
Figure 7: Co-expression network construction and identification of TFs: (A) Expression levels of screened TFs and nitrogen metabolism genes across tissues and nitrogen concentrations in the BGM. (B) Correlation between physiological traits and the expression levels of screened TFs in the BGM. (C) Expression levels of screened TFs and nitrogen metabolism genes across tissues and nitrogen concentrations in the TGM. (D) Correlation between physiological traits and the expression levels of screened TFs in the TGM. (E) Co-expression network of the top 15 TFs in the BGM. (F) Co-expression network of the top 21 TFs in the TGM. (G) TF–TF co-expression network of the BGM. (H) TF–TF co-expression network of the TGM. * 0.01 < p < 0.05; ** p < 0.01.
Figure 8: Nitrogen uptake and utilization mechanism in Chenopodium quinoa [39,40,41]. NO3−: nitrate; NO2−: nitrite; NH4+: ammonium; NRT: nitrate transporter; NiR: nitrite reductase; NR: nitrate reductase; Gln: glutamine; Glu: glutamic acid; GS: glutamine synthase; GOGAT: glutamate synthase; GDH: glutamate dehydrogenase; α-OG: α-ketoglutaric acid; NADP: nicotinamide adenine dinucleotide phosphate.
15 pages, 1694 KiB  
Article
SSMBERT: A Space Science Mission Requirement Classification Method Based on BERT
by Yiming Zhu, Yuzhu Zhang, Xiaodong Peng, Changbin Xue, Bin Chen and Yu Cao
Aerospace 2024, 11(12), 1031; https://doi.org/10.3390/aerospace11121031 - 17 Dec 2024
Viewed by 316
Abstract
Model-Based Systems Engineering (MBSE) has demonstrated importance in the aerospace field. However, the MBSE modeling process is often tedious and heavily reliant on specialized knowledge and experience; thus, a new modeling method is urgently required to enhance modeling efficiency. This article focuses on [...] Read more.
Model-Based Systems Engineering (MBSE) has demonstrated its importance in the aerospace field. However, the MBSE modeling process is often tedious and heavily reliant on specialized knowledge and experience; thus, a new modeling method is urgently required to enhance modeling efficiency. This article focuses on MBSE modeling in space science mission Phase 0, during which the mission requirements are collected, and the corresponding dataset is constructed. The dataset is utilized to fine-tune a pre-trained BERT model for the classification of requirements pertaining to space science missions. This process supports the subsequent automated creation of the MBSE requirement model, which aims to facilitate scientific objective analysis and enhance the overall efficiency of the space science mission design process. Based on the characteristics of space science missions, this paper categorizes the requirements into four categories: scientific objectives, performance, payload, and engineering requirements, and constructs a requirements dataset for space science missions. Then, utilizing this dataset, the BERT model is fine-tuned to obtain a space science mission requirements classification model (SSMBERT). Finally, SSMBERT is compared with other models, including TextCNN, TextRNN, and GPT-2, in the context of the space science mission requirement classification task. The results indicate that SSMBERT performs effectively under Few-Shot conditions, achieving a precision of 95%, at least 10% higher than the other models, demonstrating superior performance and generalization capabilities. Full article
(This article belongs to the Special Issue Artificial Intelligence in Aerospace Propulsion)
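The abstract reports precision for a four-class classifier, and the entry's Figure 5 is a confusion matrix; the standard way to derive per-class precision, recall, and F1 from such a matrix can be sketched as follows. The counts below for the four requirement categories are invented for illustration, not the paper's results.

```python
def per_class_metrics(confusion):
    """Precision/recall/F1 per class from a square confusion matrix
    where confusion[i][j] = count of true class i predicted as class j."""
    n = len(confusion)
    metrics = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # column minus diagonal
        fn = sum(confusion[k][j] for j in range(n)) - tp  # row minus diagonal
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        metrics.append((prec, rec, f1))
    return metrics

# Hypothetical counts for the four requirement categories.
cm = [
    [48, 1, 1, 0],   # true: scientific objectives
    [2, 45, 2, 1],   # true: performance
    [0, 3, 46, 1],   # true: payload
    [1, 0, 2, 47],   # true: engineering
]
labels = ["objective", "performance", "payload", "engineering"]
for label, (p, r, f) in zip(labels, per_class_metrics(cm)):
    print(f"{label}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```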
Show Figures

Figure 1: Flowchart of the BERT-based space science mission requirement classification method.
Figure 2: Dataset text fragment length distribution.
Figure 3: Space science mission requirement dataset structure.
Figure 4: Space science mission requirement classification process.
Figure 5: SSMBERT classification confusion matrix.
22 pages, 873 KiB  
Article
A Comparison of Responsive and General Guidance to Promote Learning in an Online Science Dialog
by Libby Gerard, Marcia C. Linn and Marlen Holtmann
Educ. Sci. 2024, 14(12), 1383; https://doi.org/10.3390/educsci14121383 - 17 Dec 2024
Viewed by 317
Abstract
Students benefit from dialogs about their explanations of complex scientific phenomena, and middle school science teachers cannot realistically provide all the guidance they need. We study ways to extend generative teacher–student dialogs to more students by using AI tools. We compare Responsive web-based [...] Read more.
Students benefit from dialogs about their explanations of complex scientific phenomena, and middle school science teachers cannot realistically provide all the guidance they need. We study ways to extend generative teacher–student dialogs to more students by using AI tools. We compare Responsive web-based dialogs to General web-based dialogs by evaluating the ideas students add and the quality of their revised explanations. We designed the General guidance to motivate and encourage students to revise their explanations, similar to how an experienced classroom teacher might instruct the class. We designed the Responsive guidance to emulate a student–teacher dialog, based on studies of experienced teachers guiding individual students. The analyses comparing the Responsive and the General condition are based on a randomized assignment of a total sample of 507 pre-college students. These students were taught by five different teachers in four schools. A significantly higher proportion of students added new accurate ideas in the Responsive condition compared to the General condition during the dialog. This research shows that by using NLP to identify ideas and assign guidance, students can broaden and refine their ideas. Responsive guidance, inspired by how experienced teachers guide individual students, is more valuable than General guidance. Full article
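Comparing the proportion of students who added a new accurate idea across two randomly assigned conditions is a classic two-proportion comparison. A minimal sketch of the pooled two-proportion z statistic follows; the counts are invented for illustration and are not the study's data.

```python
import math

def two_proportion_z(success_a, n_a, success_b, n_b):
    """z statistic for H0: p_a == p_b, using the pooled proportion."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Hypothetical counts of students adding a new accurate idea per condition.
z = two_proportion_z(120, 254, 90, 253)
print(round(z, 3))  # |z| > 1.96 would be significant at the 0.05 level
```

A positive z here favors the first (e.g., Responsive) condition; the study itself may well have used a different test, so this is only the generic sketch.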
Show Figures

Figure 1: Responsive dialog.
Figure 2: Proportion of students who added a new accurate idea, a new vague idea, or repeated at least one idea from their previous response at each time point.
Figure 3: The proportion of students in each condition who added this idea. Notes: mechanistic ideas are in italics; * indicates p < 0.05.
26 pages, 1463 KiB  
Article
Natural Language Processing Tools and Workflows for Improving Research Processes
by Noel Khan, David Elizondo, Lipika Deka and Miguel A. Molina-Cabello
Appl. Sci. 2024, 14(24), 11731; https://doi.org/10.3390/app142411731 - 16 Dec 2024
Viewed by 448
Abstract
The modern research process involves refining a set of keywords until sufficiently pertinent results are obtained from acceptable sources. References and citations from the most relevant results can then be traced to related works. This process iteratively develops a set of keywords to [...] Read more.
The modern research process involves refining a set of keywords until sufficiently pertinent results are obtained from acceptable sources. References and citations from the most relevant results can then be traced to related works. This process iteratively develops a set of keywords to find the most relevant literature. However, because a keyword-based search essentially samples a corpus, it may be inadequate for capturing a broad or exhaustive understanding of a topic. Further, a keyword-based search is dependent upon the underlying storage and retrieval technology and is essentially a syntactical search rather than a semantic search. To overcome such limitations, this paper explores the use of well-known natural language processing (NLP) techniques to support a semantic search and identifies where specific NLP techniques can be employed and what their primary benefits are, thus enhancing the opportunities to further improve the research process. The proposed NLP methods were tested through different workflows on different datasets and each workflow was designed to exploit latent relationships within the data to refine the keywords. The results of these tests demonstrated an improvement in the identified literature when compared to the literature extracted from the end-user-given keywords. For example, one of the defined workflows reduced the number of search results by two orders of magnitude but contained a larger percentage of pertinent results. Full article
(This article belongs to the Special Issue New Trends in Natural Language Processing)
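One simple way to exploit latent relationships in a retrieved set for keyword refinement, as the workflows above do, is to rank each document's terms by TF-IDF over the whole set. A minimal sketch on an invented toy corpus (the paper's actual workflows combine richer NLP components such as topic models and semantic maps):

```python
import math
from collections import Counter

def tfidf_keywords(docs, top_k=3):
    """Rank each document's terms by TF-IDF over the given corpus."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency: one count per document
    out = []
    for toks in tokenized:
        tf = Counter(toks)
        # tf * idf with the plain log(N/df) formulation
        scores = {t: (c / len(toks)) * math.log(n / df[t]) for t, c in tf.items()}
        ranked = sorted(scores.items(), key=lambda kv: -kv[1])
        out.append([t for t, _ in ranked[:top_k]])
    return out

# Tiny invented corpus standing in for abstracts returned by a keyword search.
docs = [
    "semantic search improves keyword search recall",
    "topic models cluster papers into subtopics",
    "keyword refinement uses topic models and semantic similarity",
]
print(tfidf_keywords(docs))
```

Terms that appear in every retrieved document get an IDF of zero, which is precisely why TF-IDF surfaces the discriminative candidate keywords rather than the ones the search already used.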
Show Figures

Figure 1: Data flow diagrams for a customary keyword search. The Level-1 DFD in subfigure (a) emphasizes the inputs and outputs. The Level-2 DFD in subfigure (b) shows a typical waterfall process that progresses from refinement to filtering and has two feedback loops. The first loop refines keywords until results are sufficiently relevant, and the second adds to the body of knowledge with only an implicit dependency on the keywords.
Figure 2: Data flow diagrams for an NLP-assisted keyword search. Subfigure (a) shows the two additional outputs, subtopics and validated keywords, which are both derived from a pertinent set of papers. The Level-2 DFD in subfigure (b) shows four additional feedback loops and two additional feed-forward loops, labeled "FB#" and "FF#", respectively. The value of the additional NLP processes and loops is evaluated in this work.
Figure 3: A spatio-semantic map of words from a corpus of papers on control systems. Search keywords are labeled, and their distribution through the corpus qualitatively indicates how focused or diffuse the ideas they represent are: greater diffusion indicates generality and greater focus indicates specialty. Note the keywords robot[s], body, agent, autonomous[ly] in the neighborhood of (−70, −30) and the keywords computation, intelligence, and autonomy towards (0, 0). The first set of keywords share a meaning and are clustered spatially. The two sets appear to share the word autonomy, but the adjective autonomous[ly] from the first set concerns a state of being, while the noun autonomy from the second set concerns independence and is therefore rightly adjacent to intelligence.
Figure 4: Spatio-semantic map of topics created with the Python library pyLDAvis, where the number inside each circle is a distinct topic identifier assigned during training. Circles in close proximity are more semantically related, and circle size indicates the prevalence of a topic in the corpus. Although not shown, the pyLDAvis GUI displays the frequency of occurrence of keywords in the corpus against their frequency within selected topics.
Figure 5: Spatio-semantic map of the top 200 relevant papers from Scopus on predicting energy consumption.
Figure 6: The Naïve Bayes filter excels at negation.
Figure 7: The AdaBoost filter works better than Naïve Bayes at identifying true positives.
24 pages, 4353 KiB  
Article
What Is the Attitude of Romanian Smallholders Towards a Ground Mole Infestation? A Study Using Topic Modelling and Sentiment Analysis on Social Media and Blog Discussions
by Alina Delia Călin and Adriana Mihaela Coroiu
Animals 2024, 14(24), 3611; https://doi.org/10.3390/ani14243611 - 14 Dec 2024
Viewed by 496
Abstract
In this paper, we analyse the attitudes and sentiments of Romanian smallholders towards mole infestations, as expressed in online contexts. A corpus of texts on the topic of ground moles and how to get rid of them was collected from social media and [...] Read more.
In this paper, we analyse the attitudes and sentiments of Romanian smallholders towards mole infestations, as expressed in online contexts. A corpus of texts on the topic of ground moles and how to get rid of them was collected from social media and blog thread discussions. The texts were analysed using topic modelling, clustering, and sentiment analysis, revealing both negative and positive sentiments and attitudes. The methods used by farmers when dealing with ground moles involve both eco-friendly repellent solutions and toxic substances and pesticides. Even well-intentioned farmers are discouraged by crop and lawn damage, resorting to environmentally aggressive solutions. The study shows that the relationship between humans and moles could be improved by active education on effective ecological agricultural approaches. Full article
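Clustering was one of the analyses applied to the corpus (the study used K-Means/K-Means++ with the elbow method on real text features). A bare-bones k-means sketch on 2-D points, standing in for projected text embeddings:

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # naive init; K-Means++ seeds more carefully
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                            + (p[1] - centroids[c][1]) ** 2)
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return centroids, labels

# Two invented, well-separated blobs.
points = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centroids, labels = kmeans(points, k=2)
print(labels)  # the first three points share one label, the last three the other
```

The elbow method the paper uses simply repeats this for increasing k and looks for the k beyond which the total within-cluster distance stops dropping sharply.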
Show Figures

Figure 1: Overground dead mole pictured in Făget Forest, Cluj, Romania, October 2023 (left) and in Dumbrava Forest, Sibiu, Romania, August 2024 (right). Photo credit: Mihai Cuibus.
Figure 2: Molehills in Dumbrava Forest, Sibiu, Romania, August 2024.
Figure 3: Wordcloud of the most frequent terms.
Figure 4: Distribution per year of the dataset posts.
Figure 5: Word-length frequency of the dataset posts (blue) and the standard distribution curve (red).
Figure 6: BERTopic similarity matrix.
Figure 7: BERTopic topics and top-frequency words.
Figure 8: The intertopic distance map.
Figure 9: Number of clusters identified automatically using the elbow method.
Figure 10: Representation of the clusters: each cluster is shown in one colour, with its centroid marked with an X.
Figure 11: Distribution of the clusters.
Figure 12: Representation of the clusters for K-Means++; the centroid of each cluster is marked with an X.
Figure 13: Word frequencies in the clusters.
Figure 14: Sentiment polarity with TextBlob, Vader, and Flair for the entire dataset.
Figure 15: Emotions using RoBERTa. Colour codes: orange, negative; green, positive; yellow, neutral.
Figure 16: Top-scoring emotions based on RoBERTa across the dataset.
Figure 17: Emotion map using DistilBERT for each of the 1402 posts.
Figure 18: Emotion distribution based on DistilBERT across the dataset.