research-article
TIARA 2.0: an interactive tool for annotating discourse structure and text improvement
Abstract

Discourse structure annotation aims at analysing how discourse units (e.g. sentences or clauses) relate to each other and what roles they play in the overall discourse. Several annotation tools for discourse structure have been developed. However, ...

research-article
Statistical quality estimation for partially subjective classification tasks through crowdsourcing
Abstract

When constructing a large-scale data resource, the quality of artifacts has great significance, especially when they are generated by creators through crowdsourcing. A widely used approach is to estimate the quality of each artifact based on ...

research-article
Public Access
COLLIE: a broad-coverage ontology and lexicon of verbs in English
Abstract

Progress on deep language understanding is inhibited by the lack of a broad coverage lexicon that connects linguistic behavior to ontological concepts and axioms. We have developed COLLIE-V, a deep lexical resource for verbs, with the coverage of ...

research-article
The WASABI song corpus and knowledge graph for music lyrics analysis
Abstract

We present the WASABI Song Corpus, a large corpus of songs enriched with metadata extracted from music databases on the Web, and resulting from the processing of song lyrics and from audio analysis. More specifically, given that lyrics encode an ...

research-article
Between welcome culture and border fence: A dataset on the European refugee crisis in German newspaper reports
Abstract

Newspaper reports provide a rich source of information on the unfolding of public debates, which can serve as basis for inquiry in political science. Such debates are often triggered by critical events, which attract public attention and incite ...

research-article
Investigating the role of swear words in abusive language detection tasks
Abstract

Swearing plays a ubiquitous role in everyday conversations among humans, both in oral and textual communication, and occurs frequently in social media texts, typically featured by informal language and spontaneous writing. Such occurrences can be ...

research-article
EventDNA: a dataset for Dutch news event extraction as a basis for news diversification
Abstract

News organizations increasingly tailor their news offering to the reader through personalized recommendation algorithms. However, automated recommendation algorithms reflect a commercial logic based on calculated relevance to the user, rather than ...

research-article
Usage disambiguation of Turkish discourse connectives
Abstract

This paper describes a rule-based approach and a machine learning approach to disambiguate the discourse usage of connectives in Turkish, which not only has single and phrasal connectives as most languages do, but also suffixal connectives that ...

research-article
The impact of preprocessing on word embedding quality: a comparative study
Abstract

Data preprocessing is among the principal stages in virtually all text-based tasks. In this light, recent approaches have employed word embeddings in the majority of text-based tasks, wherein word co-occurrences are used as the basis of word ...

research-article
Spelling errors made by people with dyslexia
Abstract

In this paper, we present a review of studies that have collected and annotated errors produced by people with dyslexia from corpora of written texts (six studies involving English, Spanish, German and French). Such resources are useful for ...

research-article
Nonverbal communication with emojis in social media: dissociating hedonic intensity from frequency
Abstract

As a popular means of nonverbal communication in social media, emojis provide quick predictions about public sentiments towards social events. Previous analyses of emojis reported that people use positive emojis more frequently than negative ...

research-article
Managing, storing, and sharing long-form recordings and their annotations
Abstract

The technique of long-form recordings via wearables is gaining momentum in different fields of research, notably linguistics and neurology. This technique, however, poses several technical challenges, some of which are amplified by the ...

brief-report
Manipuri–English comparable corpus for cross-lingual studies
Abstract

This paper presents Mni-EnCC, a temporally aligned Manipuri–English comparable corpus, to facilitate cross-lingual studies between Manipuri and English. Mni-EnCC has been created by collating text from two publicly published news sources in ...

research-article
Resources for Turkish natural language processing: A critical survey
Abstract

This paper presents a comprehensive survey of corpora and lexical resources available for Turkish. We review a broad range of resources, focusing on the ones that are publicly available. In addition to providing information about the available ...
