Development of a Benchmark Odia Handwritten Character Database for an Efficient Offline Handwritten Character Recognition with a Chronological Survey
A good benchmark dataset is a primary requirement in the offline handwritten character recognition (HCR) process. Only three Odia handwritten numeral and alphabet datasets are publicly accessible for study, although many writers have used several ...
Detection of Offensive Language and Its Severity for Low Resource Language
Continuous proliferation of hate speech in different languages on social media has drawn significant attention from researchers in the past decade. Detecting hate speech is indispensable irrespective of the scale of use of language, as it inflicts huge ...
Contrastive Adversarial Training for Multi-Modal Machine Translation
The multi-modal machine translation task aims to improve translation quality with the help of additional visual input. The visual input is expected to disambiguate or complement semantics when sentences contain ambiguous words or incomplete expressions. ...
Think More Ambiguity Less: A Novel Dual Interactive Model with Local and Global Semantics for Chinese Named Entity Recognition
Chinese is a representative East Asian language. Chinese Named Entity Recognition (CNER) aims to recognize various entities, and it benefits many other NLP tasks. Recent research to develop CNER systems has been dedicated to either ...
Knowledge-enhanced Prompt-tuning for Stance Detection
Investigating public attitudes on social media is important in opinion mining systems. Stance detection aims to analyze the attitude of an opinionated text (e.g., favor, neutral, or against) toward a given target. Existing methods mainly address this ...
BayesKGR: Bayesian Few-Shot Learning for Knowledge Graph Reasoning
Reasoning over knowledge graphs (KGs) has received increasing attention recently due to its promising applications in many areas, such as semantic search and recommendation systems. However, most reasoning models are inherently transductive and ...
Image–Text Multimodal Sentiment Analysis Framework of Assamese News Articles Using Late Fusion
Before the arrival of the web as a corpus, people identified positive and negative news based on their understanding of the textual content of physical newspapers rather than through automatic identification from readily available e-newspapers. Thus, the ...
Semi-Supervised Semantic Role Labeling with Bidirectional Language Models
The recent success of neural networks in NLP applications has provided a strong impetus to develop supervised models for semantic role labeling (SRL) that forego the requirement for extensive feature engineering. Recent state-of-the-art approaches require ...
Integrating Reconstructor and Post-Editor into Neural Machine Translation
Neural machine translation (NMT) mainly comprises an encoder and a decoder. The encoder extracts the feature vector of the source-language sentence. The decoder predicts the next token according to the feature vector extracted by the ...
An Efficient and Accurate Detection of Fake News Using Capsule Transient Auto Encoder
Fake news is “news reports that are deliberately and indisputably fake.” News that uses fake information is becoming a threat, and it is increasingly challenging for humans to distinguish between fake and genuine news. It has become necessary to detect fake news, ...
LFWE: Linguistic Feature Based Word Embedding for Hindi Fake News Detection
It is essential for research communities to investigate ways for authenticating news. The use of linguistic feature based analysis to automatically detect false news is gaining popularity among the scientific community. However, such techniques are ...
Vietnamese Sentiment Analysis: An Overview and Comparative Study of Fine-tuning Pretrained Language Models
Sentiment Analysis (SA) is one of the most active research areas in the Natural Language Processing (NLP) field due to its potential for business and society. With the development of language representation models, numerous methods have shown promising ...
Part-of-Speech Tagging of Odia Language Using Statistical and Deep Learning Based Approaches
Automatic part-of-speech (POS) tagging is a preprocessing step of many natural language processing tasks, such as named entity recognition, speech processing, information extraction, word sense disambiguation, and machine translation. It has already ...
Komala and Kaṭhora: A Novel Approach Towards Classification of Hindi Poetry
Literary compositions are very often analyzed using various constituent units like words, phrases, sentences, and paragraphs. Unlike the conventional research that focuses on the aforementioned constituent units, our task is a statistical effort carried ...
Improving Multilingual Neural Machine Translation System for Indic Languages
The Machine Translation System (MTS) serves as an effective tool for communication by translating text or speech from one language to another. Recently, neural machine translation (NMT) has become popular for its performance and cost-effectiveness. ...
Prose2Poem: The Blessing of Transformers in Translating Prose to Persian Poetry
Persian poetry has consistently expressed its philosophy, wisdom, speech, and rationale based on its couplets, making it an enigmatic language on its own to both native and non-native speakers. Nevertheless, the noticeable gap between Persian prose and ...
TPoet: Topic-Enhanced Chinese Poetry Generation
Chinese poetry generation has been a challenging part of natural language processing due to the unique literariness and aesthetics of poetry. In most cases, the content of poetry is topic related. In other words, specific thoughts or emotions are usually ...
Metadial: A Meta-learning Approach for Arabic Dialogue Generation
Dialogue generation is the automatic generation of a text response, given a user’s input. Dialogue generation for low-resource languages has been a challenging task for researchers. However, advancements in deep learning models have made developing ...
Cross-lingual Text Reuse Detection at Document Level for English-Urdu Language Pair
In recent years, the problem of Cross-Lingual Text Reuse Detection (CLTRD) has gained the interest of the research community due to the availability of large digital repositories and automatic Machine Translation (MT) systems. These systems are readily ...
Enhancing RDF Verbalization with Descriptive and Relational Knowledge
RDF verbalization, which aims to generate a natural language description of the knowledge base, has received increasing interest. Sequence-to-sequence models based on the Transformer are able to obtain strong performance equipped with pre-trained language ...
Semantic Tagging for the Urdu Language: Annotated Corpus and Multi-Target Classification Methods
Extracting and analysing meaning-related information from natural language data has attracted the attention of researchers in various fields, such as natural language processing, corpus linguistics, information retrieval, and data science. An important ...
Cross-lingual Sentence Embedding for Low-resource Chinese-Vietnamese Based on Contrastive Learning
The goal of cross-lingual sentence embedding is to map sentences with similar semantics but in different languages close together, and dissimilar sentences farther apart, in the representation space. It is the basis of many downstream tasks such as cross-...
Text Polishing with Chinese Idiom: Task, Datasets and Pre-trained Baselines
This work presents the task of text polishing, which generates a sentence that is more graceful than the input sentence while retaining its semantic meaning. Text polishing has great value in real usage and is an important component in modern writing ...
Alabib-65: A Realistic Dataset for Algerian Sign Language Recognition
Sign language recognition (SLR) is a promising research field that aims to blur boundaries between Deaf and hearing people by creating a system that can transcribe signs into a written or vocal language. There is a growing body of literature that ...
Using Data Augmentation and Bidirectional Encoder Representations from Transformers for Improving Punjabi Named Entity Recognition
Named entity recognition (NER) is the task of identifying proper nouns in natural language text and classifying them into types such as location, person, and organization. Due to NER's applications in different natural language processing (NLP) ...
From Softmax to Nucleusmax: A Novel Sparse Language Model for Chinese Radiology Report Summarization
Chinese radiology report summarization is a crucial component in smart healthcare that employs language models to summarize key findings in radiology reports and communicate these findings to physicians. However, most language models for radiology ...
The Impact of Arabic Diacritization on Word Embeddings
Word embedding is used to represent words for text analysis. It plays an essential role in many Natural Language Processing (NLP) studies and has hugely contributed to the extraordinary developments in the field in the last few years. In Arabic, diacritic ...
Robust Multi-task Learning-based Korean POS Tagging to Overcome Word Spacing Errors
End-to-end neural network-based approaches have recently demonstrated significant improvements in natural language processing (NLP). However, in NLP applications such as assistant systems, NLP components are still processed to extract results using a ...
Multilingual BERT-based Word Alignment By Incorporating Common Chinese Characters
Word alignment is an important task of detecting translation equivalents in a sentence pair. Although word alignment is no longer strictly necessary for neural machine translation, it is still useful in a wealth of applications, e.g., bilingual ...
Dataset Enhancement and Multilingual Transfer for Named Entity Recognition in the Indonesian Language
Named entity recognition in the Indonesian language has significantly developed in recent years. However, it still lacks standardized publicly available corpora; a small dataset is available but suffers from inconsistent annotations. Therefore, we re-...