DMKD: Vol 38, No 6

Volume 38, Issue 6, Nov 2024 (Current Issue)

Publisher:

Kluwer Academic Publishers
101 Philip Drive Assinippi Park Norwell, MA
United States

ISSN: 1384-5810

research-article

Discord-based counterfactual explanations for time series classification

Pages 3347–3371, https://doi.org/10.1007/s10618-024-01028-9

Abstract

The opacity inherent in machine learning models presents a significant hindrance to their widespread incorporation into decision-making processes. To address this challenge and foster trust among stakeholders while ensuring decision fairness, the ...

research-article

Robust explainer recommendation for time series classification

Pages 3372–3413, https://doi.org/10.1007/s10618-024-01045-8

Abstract

Time series classification is a task which deals with temporal sequences, a prevalent data type common in domains such as human activity recognition, sports analytics and general sensing. In this area, interest in explainability has been growing as ...

research-article

GeoRF: a geospatial random forest

Pages 3414–3448, https://doi.org/10.1007/s10618-024-01046-7

Abstract

The geospatial domain increasingly relies on data-driven methodologies to extract actionable insights from the growing volume of available data. Despite the effectiveness of tree-based models in capturing complex relationships between features and ...

research-article

Modelling event sequence data by type-wise neural point process

Bingqing Liu

Pages 3449–3472, https://doi.org/10.1007/s10618-024-01047-6

Abstract

Event sequence data widely exists in real life, where each event is typically represented as a tuple, event type and occurrence time. Recently, neural point process (NPP), a probabilistic model that learns the next event distribution with events ...

research-article

Randomnet: clustering time series using untrained deep neural networks

Pages 3473–3502, https://doi.org/10.1007/s10618-024-01048-5

Abstract

Neural networks are widely used in machine learning and data mining. Typically, these networks need to be trained, implying the adjustment of weights (parameters) within the network based on the input data. In this work, we propose a novel ...

research-article

Towards effective urban region-of-interest demand modeling via graph representation learning

Pages 3503–3530, https://doi.org/10.1007/s10618-024-01049-4

Abstract

Identifying the region’s functionalities and what the specific Point-of-Interest (POI) needs is essential for effective urban planning. However, due to the diverse and ambiguous nature of urban regions, there are still some significant ...

research-article

Knowledge graph embedding closed under composition

Pages 3531–3562, https://doi.org/10.1007/s10618-024-01050-x

Abstract

Knowledge Graph Embedding (KGE) has attracted increasing attention. Relation patterns, such as symmetry and inversion, have received considerable focus. Among them, composition patterns are particularly important, as they involve nearly all ...

research-article

On regime changes in text data using hidden Markov model of contaminated vMF distribution

Pages 3563–3589, https://doi.org/10.1007/s10618-024-01051-w

Abstract

This paper presents a novel methodology for analyzing temporal directional data with scatter and heavy tails. A hidden Markov model with contaminated von Mises-Fisher emission distribution is developed. The model is implemented using forward and ...

research-article

Negative-sample-free knowledge graph embedding

Pages 3590–3620, https://doi.org/10.1007/s10618-024-01052-9

Abstract

Recently, knowledge graphs (KGs) have been shown to benefit many machine learning applications in multiple domains (e.g. self-driving, agriculture, bio-medicine, recommender systems, etc.). However, KGs suffer from incompleteness, which motivates ...

research-article

Explainable decomposition of nested dense subgraphs

Nikolaj Tatti

Pages 3621–3642, https://doi.org/10.1007/s10618-024-01053-8

Abstract

Discovering dense regions in a graph is a popular tool for analyzing graphs. While useful, analyzing such decompositions may be difficult without additional information. Fortunately, many real-world networks have additional information, namely ...

research-article

Bayesian network Motifs for reasoning over heterogeneous unlinked datasets

Pages 3643–3689, https://doi.org/10.1007/s10618-024-01054-7

Abstract

Modern data-oriented applications often require integrating data from multiple heterogeneous sources. When these datasets share attributes, but are otherwise unlinked, there is no way to join them and reason at the individual level explicitly. ...

research-article

Gradient-based explanation for non-linear non-parametric dimensionality reduction

Pages 3690–3718, https://doi.org/10.1007/s10618-024-01055-6

Abstract

Dimensionality reduction (DR) is a popular technique that shows great results in analyzing high-dimensional data. Generally, DR is used to produce visualizations in 2 or 3 dimensions. While it can help in understanding correlations between data, ...

research-article

Evaluating outlier probabilities: assessing sharpness, refinement, and calibration using stratified and weighted measures

Pages 3719–3757, https://doi.org/10.1007/s10618-024-01056-5

Abstract

An outlier probability is the probability that an observation is an outlier. Typically, outlier detection algorithms calculate real-valued outlier scores to identify outliers. Converting outlier scores into outlier probabilities increases the ...

research-article

Sequential query prediction based on multi-armed bandits with ensemble of transformer experts and immediate feedback

Pages 3758–3782, https://doi.org/10.1007/s10618-024-01057-4

Abstract

We study the problem of predicting the next query to be recommended in interactive data exploratory analysis to guide users to correct content. Current query prediction approaches are based on sequence-to-sequence learning, exploiting past ...

research-article

De-confounding representation learning for counterfactual inference on continuous treatment via generative adversarial network

Pages 3783–3804, https://doi.org/10.1007/s10618-024-01058-3

Abstract

Counterfactual inference for continuous rather than binary treatment variables is more common in real-world causal inference tasks. While there are already some sample reweighting methods based on Marginal Structural Model for eliminating the ...

research-article

Enhancing racism classification: an automatic multilingual data annotation system using self-training and CNN

Pages 3805–3830, https://doi.org/10.1007/s10618-024-01059-2

Abstract

Accurate racism classification is crucial on social media, where racist and discriminatory content can harm individuals and society. Automated racism detection requires gathering and annotating a wide range of diverse and representative data as an ...

research-article

Statistical methods utilizing structural properties of time-evolving networks for event detection

Pages 3831–3867, https://doi.org/10.1007/s10618-024-01060-9

Abstract

With the advancement of technology, real-world networks have become vulnerable to many attacks such as cyber-crimes, terrorist attacks, and financial frauds. Accuracy and scalability are the two principal but contrary characteristics for ...

research-article

ArcMatch: high-performance subgraph matching for labeled graphs by exploiting edge domains

Pages 3868–3921, https://doi.org/10.1007/s10618-024-01061-8

Abstract

Consider a large labeled graph (network), denoted the target. Subgraph matching is the problem of finding all instances of a small subgraph, denoted the query, in the target graph. Unlike the majority of existing methods that are restricted to ...

research-article

Detach-ROCKET: sequential feature selection for time series classification with random convolutional kernels

Pages 3922–3947, https://doi.org/10.1007/s10618-024-01062-7

Abstract

Time Series Classification (TSC) is essential in fields like medicine, environmental science, and finance, enabling tasks such as disease diagnosis, anomaly detection, and stock price analysis. While machine learning models like Recurrent Neural ...

research-article

Efficient learning with projected histograms

Pages 3948–4000, https://doi.org/10.1007/s10618-024-01063-6

Abstract

High dimensional learning is a perennial problem due to challenges posed by the “curse of dimensionality”; learning typically demands more computing resources as well as more training data. In differentially private (DP) settings, this is further ...

research-article

Opinion dynamics in social networks incorporating higher-order interactions

Pages 4001–4023, https://doi.org/10.1007/s10618-024-01064-5

Abstract

The issue of opinion sharing and formation has received considerable attention in the academic literature, and a few models have been proposed to study this problem. However, existing models are limited to the interactions among nearest neighbors, ...

research-article

Random walks with variable restarts for negative-example-informed label propagation

Pages 4024–4039, https://doi.org/10.1007/s10618-024-01065-4

Abstract

Label propagation is frequently encountered in machine learning and data mining applications on graphs, either as a standalone problem or as part of node classification. Many label propagation algorithms utilize random walks (or network ...

research-article

Evaluating the disclosure risk of anonymized documents via a machine learning-based re-identification attack

Pages 4040–4075, https://doi.org/10.1007/s10618-024-01066-3

Abstract

The availability of textual data depicting human-centered features and behaviors is crucial for many data mining and machine learning tasks. However, data containing personal information should be anonymized prior to making them available for ...

research-article

Regularization-based methods for ordinal quantification

Pages 4076–4121, https://doi.org/10.1007/s10618-024-01067-2

Abstract

Quantification, i.e., the task of predicting the class prevalence values in bags of unlabeled data items, has received increased attention in recent years. However, most quantification research has concentrated on developing algorithms for binary ...

research-article

FRUITS: feature extraction using iterated sums for time series classification

Pages 4122–4156, https://doi.org/10.1007/s10618-024-01068-1

Abstract

We introduce a pipeline for time series classification that extracts features based on the iterated-sums signature (ISS) and then applies a linear classifier. These features are intrinsically nonlinear, capture chronological information, and, ...

research-article

Bounding the family-wise error rate in local causal discovery using Rademacher averages

Pages 4157–4183, https://doi.org/10.1007/s10618-024-01069-0

Abstract

Many algorithms have been proposed to learn local graphical structures around target variables of interest from observational data, focusing on two sets of variables. The first one, called Parent–Children (PC) set, contains all the variables that ...

research-article

Model-agnostic variable importance for predictive uncertainty: an entropy-based approach

Pages 4184–4216, https://doi.org/10.1007/s10618-024-01070-7

Abstract

In order to trust the predictions of a machine learning algorithm, it is necessary to understand the factors that contribute to those predictions. In the case of probabilistic and uncertainty-aware models, it is necessary to understand not only ...

research-article

FRAPPE: fast rank approximation with explainable features for tensors

Pages 4217–4232, https://doi.org/10.1007/s10618-024-01071-6

Abstract

Tensor decompositions have proven to be effective in analyzing the structure of multidimensional data. However, most of these methods require a key parameter: the number of desired components. In the case of the CANDECOMP/PARAFAC decomposition (...

correction

Correction to: Bias characterization, assessment, and mitigation in location-based recommender systems

Page 4233, https://doi.org/10.1007/s10618-024-01013-2

correction

Correction: Marginal effects for non-linear prediction functions

Pages 4234–4235, https://doi.org/10.1007/s10618-024-01034-x

Data Mining and Knowledge Discovery
