- survey, December 2024
Systematic Review of Generative Modelling Tools and Utility Metrics for Fully Synthetic Tabular Data
ACM Computing Surveys (CSUR), Volume 57, Issue 4, Article No.: 90, Pages 1–38
https://doi.org/10.1145/3704437
Sharing data with third parties is essential for advancing science, but it is becoming more and more difficult with the rise of data protection regulations, ethical restrictions, and growing fear of misuse. Fully synthetic data, which transcends ...
- Article, November 2024
A Dynamic Evaluation Metric for Feature Selection
Abstract: Expressive evaluation metrics are indispensable for informative experiments in all areas, and while several metrics are established in some areas, in others, such as feature selection, only indirect or otherwise limited evaluation metrics are ...
- Article, November 2024
Robust Statistical Scaling of Outlier Scores: Improving the Quality of Outlier Probabilities for Outliers
Abstract: Outlier detection algorithms typically assign an outlier score to each observation in a dataset, indicating the degree to which an observation is an outlier. However, these scores are often not comparable across algorithms and can be difficult for ...
- Article, November 2024
On the Design of Scalable Outlier Detection Methods Using Approximate Nearest Neighbor Graphs
Abstract: Efficient and reliable methods for distinguishing outliers in data remain crucial for data analysis. Although supervised methods based on neural networks have gained recent traction, unsupervised methods such as the kNN outlier method and local ...
- research-article, July 2024
Evaluating outlier probabilities: assessing sharpness, refinement, and calibration using stratified and weighted measures
Data Mining and Knowledge Discovery (DMKD), Volume 38, Issue 6, Pages 3719–3757
https://doi.org/10.1007/s10618-024-01056-5
Abstract: An outlier probability is the probability that an observation is an outlier. Typically, outlier detection algorithms calculate real-valued outlier scores to identify outliers. Converting outlier scores into outlier probabilities increases the ...
- research-article, December 2023
Anomaly detection in streaming data: A comparison and evaluation study
Expert Systems with Applications: An International Journal (EXWA), Volume 233, Issue C
https://doi.org/10.1016/j.eswa.2023.120994
Abstract: The detection of anomalies in streaming data faces complexities that make traditional static methods unsuitable due to computational costs and nonstationarity. We test and evaluate eight state of the art algorithms against prominent challenges ...
Highlights:
  - Comparison and study of 8 unsupervised outlier detection methods for streaming data.
  - Evaluation with 180 synthetic datasets and 4 real/application datasets.
  - Study of challenges related to space geometries, nonstationarity and ...
- Article, October 2023
SDOclust: Clustering with Sparse Data Observers
Abstract: Sparse Data Observers (SDO) is an unsupervised learning approach developed to cover the need for fast, highly interpretable and intuitively parameterizable anomaly detection. We present SDOclust, an extension that performs clustering while ...
- research-article, October 2022
Unsupervised Representation Learning on Attributed Multiplex Network
CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Pages 2610–2619
https://doi.org/10.1145/3511808.3557486
Embedding learning in multiplex networks has drawn increasing attention in recent years and achieved outstanding performance in many downstream tasks. However, most existing network embedding methods either only focus on the structured information of ...
- research-article, October 2022
A Simple Meta-path-free Framework for Heterogeneous Network Embedding
CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Pages 2600–2609
https://doi.org/10.1145/3511808.3557223
Network embedding has recently attracted attention a lot since networks are widely used in various data mining applications. Attempting to break the limitations of pre-set meta-paths and non-global node learning in existing models, we propose a simple ...
- Article, October 2022
Similarity-Based Unsupervised Evaluation of Outlier Detection
Abstract: The evaluation of unsupervised algorithm results is one of the most challenging tasks in data mining research. Where labeled data are not available, one has to use in practice the so-called internal evaluation, which is based solely on the data ...
- Article, September 2021
Handling Class Imbalance in k-Nearest Neighbor Classification by Balancing Prior Probabilities
Abstract: It is well known that recall rather than precision is the performance measure to optimize in imbalanced classification problems, yet most existing methods that adjust for class imbalance do not particularly address the optimization of recall. Here ...
- Article, September 2021
Non-parametric Semi-supervised Learning by Bayesian Label Distribution Propagation
Abstract: Semi-supervised classification methods are specialized to use a very limited amount of labelled data for training and ultimately for assigning labels to the vast majority of unlabelled data. Label propagation is such a technique that assigns ...
- correction, November 2020
Correction to: A unified view of density-based methods for semi-supervised clustering and classification
Data Mining and Knowledge Discovery (DMKD), Volume 34, Issue 6, Pages 1984–1985
https://doi.org/10.1007/s10618-020-00707-7
The article, A unified view of density-based methods for semi-supervised.
- research-article, September 2020
Absolute Cluster Validity
IEEE Transactions on Pattern Analysis and Machine Intelligence (ITPM), Volume 42, Issue 9, Pages 2096–2112
https://doi.org/10.1109/TPAMI.2019.2912970
The application of clustering involves the interpretation of objects placed in multi-dimensional spaces. The task of clustering itself is inherently submitted to subjectivity, the optimal solution can be extremely costly to discover and ...
- research-article, June 2020
Internal Evaluation of Unsupervised Outlier Detection
ACM Transactions on Knowledge Discovery from Data (TKDD), Volume 14, Issue 4, Article No.: 47, Pages 1–42
https://doi.org/10.1145/3394053
Although there is a large and growing literature that tackles the unsupervised outlier detection problem, the unsupervised evaluation of outlier detection results is still virtually untouched in the literature. The so-called internal evaluation, based ...
- research-article, November 2019
A unified view of density-based methods for semi-supervised clustering and classification
Data Mining and Knowledge Discovery (DMKD), Volume 33, Issue 6, Pages 1894–1952
https://doi.org/10.1007/s10618-019-00651-1
Abstract: Semi-supervised learning is drawing increasing attention in the era of big data, as the gap between the abundance of cheap, automatically collected unlabeled data and the scarcity of labeled data that are laborious and expensive to obtain is ...
- Article, October 2019
Subspace Determination Through Local Intrinsic Dimensional Decomposition
Abstract: Axis-aligned subspace clustering generally entails searching through enormous numbers of subspaces (feature combinations) and evaluation of cluster quality within each subspace. In this paper, we tackle the problem of identifying subsets of ...
- Article, October 2018
On the Correlation Between Local Intrinsic Dimensionality and Outlierness
Abstract: Data mining methods for outlier detection are usually based on non-parametric density estimates in various variations. Here we argue for the use of local intrinsic dimensionality as a measure of outlierness and demonstrate empirically that it is a ...