Discord-based counterfactual explanations for time series classification
The opacity inherent in machine learning models presents a significant hindrance to their widespread incorporation into decision-making processes. To address this challenge and foster trust among stakeholders while ensuring decision fairness, the ...
Robust explainer recommendation for time series classification
Time series classification is a task that deals with temporal sequences, a data type prevalent in domains such as human activity recognition, sports analytics and general sensing. In this area, interest in explainability has been growing as ...
GeoRF: a geospatial random forest
The geospatial domain increasingly relies on data-driven methodologies to extract actionable insights from the growing volume of available data. Despite the effectiveness of tree-based models in capturing complex relationships between features and ...
Modelling event sequence data by type-wise neural point process
Event sequence data is widespread in real life, where each event is typically represented as a tuple of event type and occurrence time. Recently, the neural point process (NPP), a probabilistic model that learns the next event distribution with events ...
Randomnet: clustering time series using untrained deep neural networks
Neural networks are widely used in machine learning and data mining. Typically, these networks need to be trained, implying the adjustment of weights (parameters) within the network based on the input data. In this work, we propose a novel ...
Towards effective urban region-of-interest demand modeling via graph representation learning
Identifying a region’s functionalities and its specific Point-of-Interest (POI) needs is essential for effective urban planning. However, due to the diverse and ambiguous nature of urban regions, there are still some significant ...
Knowledge graph embedding closed under composition
- Zhuoxun Zheng,
- Baifan Zhou,
- Hui Yang,
- Zhipeng Tan,
- Zequn Sun,
- Chunnong Li,
- Arild Waaler,
- Evgeny Kharlamov,
- Ahmet Soylu
Knowledge Graph Embedding (KGE) has attracted increasing attention. Relation patterns, such as symmetry and inversion, have received considerable focus. Among them, composition patterns are particularly important, as they involve nearly all ...
On regime changes in text data using hidden Markov model of contaminated vMF distribution
This paper presents a novel methodology for analyzing temporal directional data with scatter and heavy tails. A hidden Markov model with contaminated von Mises-Fisher emission distribution is developed. The model is implemented using forward and ...
Negative-sample-free knowledge graph embedding
Recently, knowledge graphs (KGs) have been shown to benefit many machine learning applications in multiple domains (e.g. self-driving, agriculture, bio-medicine, recommender systems, etc.). However, KGs suffer from incompleteness, which motivates ...
Explainable decomposition of nested dense subgraphs
Discovering dense regions in a graph is a popular tool for analyzing graphs. While useful, analyzing such decompositions may be difficult without additional information. Fortunately, many real-world networks have additional information, namely ...
Bayesian network Motifs for reasoning over heterogeneous unlinked datasets
Modern data-oriented applications often require integrating data from multiple heterogeneous sources. When these datasets share attributes, but are otherwise unlinked, there is no way to join them and reason at the individual level explicitly. ...
Gradient-based explanation for non-linear non-parametric dimensionality reduction
Dimensionality reduction (DR) is a popular and effective technique for analyzing high-dimensional data. Generally, DR is used to produce visualizations in 2 or 3 dimensions. While it can help in understanding correlations between data, ...
Evaluating outlier probabilities: assessing sharpness, refinement, and calibration using stratified and weighted measures
An outlier probability is the probability that an observation is an outlier. Typically, outlier detection algorithms calculate real-valued outlier scores to identify outliers. Converting outlier scores into outlier probabilities increases the ...
Sequential query prediction based on multi-armed bandits with ensemble of transformer experts and immediate feedback
We study the problem of predicting the next query to be recommended in interactive data exploratory analysis to guide users to correct content. Current query prediction approaches are based on sequence-to-sequence learning, exploiting past ...
De-confounding representation learning for counterfactual inference on continuous treatment via generative adversarial network
Counterfactual inference for continuous rather than binary treatment variables is more common in real-world causal inference tasks. While there are already some sample reweighting methods based on the Marginal Structural Model for eliminating the ...
Enhancing racism classification: an automatic multilingual data annotation system using self-training and CNN
- Ikram El Miqdadi,
- Soufiane Hourri,
- Fatima Zahra El Idrysy,
- Assia Hayati,
- Yassine Namir,
- Nikola S. Nikolov,
- Jamal Kharroubi
Accurate racism classification is crucial on social media, where racist and discriminatory content can harm individuals and society. Automated racism detection requires gathering and annotating a wide range of diverse and representative data as an ...
Statistical methods utilizing structural properties of time-evolving networks for event detection
With the advancement of technology, real-world networks have become vulnerable to many attacks such as cyber-crimes, terrorist attacks, and financial frauds. Accuracy and scalability are the two principal but contrary characteristics for ...
ArcMatch: high-performance subgraph matching for labeled graphs by exploiting edge domains
- Vincenzo Bonnici,
- Roberto Grasso,
- Giovanni Micale,
- Antonio di Maria,
- Dennis Shasha,
- Alfredo Pulvirenti,
- Rosalba Giugno
Consider a large labeled graph (network), denoted the target. Subgraph matching is the problem of finding all instances of a small subgraph, denoted the query, in the target graph. Unlike the majority of existing methods that are restricted to ...
Detach-ROCKET: sequential feature selection for time series classification with random convolutional kernels
Time Series Classification (TSC) is essential in fields like medicine, environmental science, and finance, enabling tasks such as disease diagnosis, anomaly detection, and stock price analysis. While machine learning models like Recurrent Neural ...
Efficient learning with projected histograms
High dimensional learning is a perennial problem due to challenges posed by the “curse of dimensionality”; learning typically demands more computing resources as well as more training data. In differentially private (DP) settings, this is further ...
Opinion dynamics in social networks incorporating higher-order interactions
The issue of opinion sharing and formation has received considerable attention in the academic literature, and a few models have been proposed to study this problem. However, existing models are limited to the interactions among nearest neighbors, ...
Random walks with variable restarts for negative-example-informed label propagation
Label propagation is frequently encountered in machine learning and data mining applications on graphs, either as a standalone problem or as part of node classification. Many label propagation algorithms utilize random walks (or network ...
Evaluating the disclosure risk of anonymized documents via a machine learning-based re-identification attack
The availability of textual data depicting human-centered features and behaviors is crucial for many data mining and machine learning tasks. However, data containing personal information should be anonymized prior to making them available for ...
FRUITS: feature extraction using iterated sums for time series classification
We introduce a pipeline for time series classification that extracts features based on the iterated-sums signature (ISS) and then applies a linear classifier. These features are intrinsically nonlinear, capture chronological information, and, ...
Bounding the family-wise error rate in local causal discovery using Rademacher averages
Many algorithms have been proposed to learn local graphical structures around target variables of interest from observational data, focusing on two sets of variables. The first one, called Parent–Children (PC) set, contains all the variables that ...
Model-agnostic variable importance for predictive uncertainty: an entropy-based approach
In order to trust the predictions of a machine learning algorithm, it is necessary to understand the factors that contribute to those predictions. In the case of probabilistic and uncertainty-aware models, it is necessary to understand not only ...