A k–Skyband Approach for Feature Selection

Marcos Bedo¹²,
Paolo Ciaccia¹³,
Davide Martinenghi¹⁴ &
Daniel de Oliveira¹⁵

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11807)

Included in the following conference series:

International Conference on Similarity Search and Applications

Abstract

Distance concentration is a phantom menace for the labeling of high-dimensional data by distance-based classifiers. Filter methods reduce data dimensionality, but they also indirectly introduce their ranking bias into the classification procedure. In this study, we examine the filtering problem from another perspective, in which multiple filters are aggregated according to the classifiers' constraints. Our approach, named S-Filter, is designed as a top-k skyline (k-skyband) search over multiple rankings and relies on the concept of \(\mathcal {F}\)-dominance for weighted and monotone linear functions. Unlike existing approaches, S-Filter provides a deterministic strategy for joining multiple filters and avoids the semantic problem of breaking top-k ties. In its first stage, S-Filter uses labeling-driven measures, e.g., F1-Score, to assess the quality of each filter with regard to a particular classifier, and range-tolerance intervals around these initial quality measures define the partial search weights. Next, S-Filter applies the instance-optimal FSA algorithm to select all the dimensions that can be among the top-k for some weight within the range-tolerance intervals. Experiments on high-dimensional datasets show that S-Filter outperforms state-of-the-art filters in two scenarios: (i) exploratory analysis with varying k and range-tolerance intervals, and (ii) data reduction to the intrinsic dimensionality of the data.
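The idea of a k-skyband under \(\mathcal {F}\)-dominance can be illustrated with a small brute-force sketch. This is not the FSA algorithm and not the authors' implementation; it only shows the dominance test when each filter weight ranges independently within an interval. The assumptions are ours: the names `f_dominates` and `k_skyband` are hypothetical, weights are box-constrained with no normalization, every filter assigns higher scores to better features, and the toy scores, quality values, and tolerance are made up for illustration.

```python
import numpy as np

def f_dominates(t, s, lo, hi):
    """Return True if feature t F-dominates feature s for every weighted
    sum whose weights lie in the box [lo, hi] (assumed weight family)."""
    d = t - s
    # Smallest and largest value of sum(w * d) over the weight box.
    worst = np.sum(np.where(d >= 0, lo * d, hi * d))
    best = np.sum(np.where(d >= 0, hi * d, lo * d))
    # t is never worse than s, and strictly better for some weight vector.
    return bool(worst >= 0 and best > 0)

def k_skyband(scores, lo, hi, k):
    """Indices of features F-dominated by fewer than k other features.
    scores has one row per feature and one column per filter."""
    n = scores.shape[0]
    keep = []
    for j in range(n):
        dominators = sum(
            f_dominates(scores[i], scores[j], lo, hi)
            for i in range(n) if i != j
        )
        if dominators < k:
            keep.append(j)
    return keep

# Toy data (hypothetical): 4 features scored by 2 filters.
scores = np.array([[0.9, 0.2],
                   [0.5, 0.6],
                   [0.4, 0.5],
                   [0.1, 0.1]])

# Hypothetical per-filter quality (e.g., an F1-Score) and tolerance.
quality = np.array([0.5, 0.5])
tolerance = 0.1
lo, hi = quality - tolerance, quality + tolerance

narrow = k_skyband(scores, lo, hi, k=2)
# Weights anywhere in [0, 1] reduce the test to plain Pareto dominance.
wide = k_skyband(scores, np.zeros(2), np.ones(2), k=2)
print(narrow, wide)
```

In this toy setting the narrow intervals retain fewer features than the unconstrained weights, which is the pruning effect that tighter range-tolerance intervals are meant to provide. The pairwise comparison is quadratic in the number of features, so it is only suitable for illustration.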

The authors thank the National Council for Scientific and Technological Development and Faperj (G. E-26/203.215/2016 and I. Sed. 2018) for their financial support.

Notes

1. The most relevant dimensions for a particular set of points are the most prominent data features. Accordingly, we use the terms dimensions and features interchangeably.
2. archive.ics.uci.edu/ml/datasets.

Author information

Authors and Affiliations

INFES, Fluminense Federal University, Santo Antônio de Pádua, Brazil
Marcos Bedo
DISI, Università di Bologna, Bologna, Italy
Paolo Ciaccia
DEIB, Politecnico di Milano, Milan, Italy
Davide Martinenghi
IC, Fluminense Federal University, Niterói, Brazil
Daniel de Oliveira

Corresponding author

Correspondence to Marcos Bedo.

Editor information

Editors and Affiliations

ISTI-CNR, Pisa, Italy
Giuseppe Amato
ISTI-CNR, Pisa, Italy
Claudio Gennaro
New Jersey Institute of Technology, Newark, NJ, USA
Vincent Oria
University of Novi Sad, Novi Sad, Serbia
Miloš Radovanović

About this paper

Cite this paper

Bedo, M., Ciaccia, P., Martinenghi, D., de Oliveira, D. (2019). A k–Skyband Approach for Feature Selection. In: Amato, G., Gennaro, C., Oria, V., Radovanović, M. (eds) Similarity Search and Applications. SISAP 2019. Lecture Notes in Computer Science, vol 11807. Springer, Cham. https://doi.org/10.1007/978-3-030-32047-8_15

DOI: https://doi.org/10.1007/978-3-030-32047-8_15
Published: 23 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32046-1
Online ISBN: 978-3-030-32047-8
eBook Packages: Computer Science, Computer Science (R0)
