Abstract
nuggets is a framework for subgroup discovery, contrast and emerging patterns, association rules, and more. Developed as a package for the R statistical environment, nuggets provides a novel and extensible toolkit for rule-based analyses. Both crisp (Boolean) and fuzzy data are supported. The package generates conditions in the form of elementary conjunctions, evaluates them on a dataset, and checks the induced sub-data for interesting statistical properties. A user-defined function may be evaluated on each generated sub-dataset, which gives the framework a new level of generality. The aim of this paper is to present this free software to the soft computing community. The tool could be useful to both researchers and analysts in pattern mining: besides searching for various existing pattern types, it lets brand new ideas be easily implemented and evaluated within the same framework.
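The generate, evaluate, and callback loop described above can be illustrated with a minimal Python sketch. It is not the nuggets API (nuggets is an R package). The names `dig_conditions`, `product_tnorm`, and `support` are hypothetical, and the product t-norm is an assumed choice for combining fuzzy truth degrees in a conjunction (the minimum t-norm would work just as well). Each condition is an elementary conjunction of predicates; the per-row degrees it yields describe the induced (possibly fuzzy) sub-dataset, and a user-defined function is evaluated on them.

```python
# Illustrative sketch only; not the nuggets API. All names are hypothetical.
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed data layout: each predicate (column) holds one truth degree in
# [0, 1] per row. Crisp (Boolean) predicates simply use 0.0 and 1.0.
Data = Dict[str, List[float]]
Condition = Tuple[str, ...]


def product_tnorm(values: Sequence[float]) -> float:
    """Product (Goguen) t-norm; on crisp 0/1 inputs it equals logical AND."""
    result = 1.0
    for v in values:
        result *= v
    return result


def dig_conditions(
    data: Data,
    max_length: int,
    callback: Callable[[Condition, List[float]], object],
    tnorm: Callable[[Sequence[float]], float] = product_tnorm,
) -> Dict[Condition, object]:
    """Enumerate elementary conjunctions of up to max_length predicates and
    pass the row weights of each induced sub-dataset to a user callback."""
    names = sorted(data)
    n_rows = len(data[names[0]]) if names else 0
    results: Dict[Condition, object] = {}
    for length in range(1, max_length + 1):
        for condition in combinations(names, length):
            # Degree to which each row satisfies the conjunction; rows with
            # weight 0 lie outside the induced sub-dataset.
            weights = [tnorm([data[p][i] for p in condition]) for i in range(n_rows)]
            results[condition] = callback(condition, weights)
    return results


def support(condition: Condition, weights: List[float]) -> float:
    """Example user-defined function: relative (fuzzy) support."""
    return sum(weights) / len(weights) if weights else 0.0


if __name__ == "__main__":
    toy = {
        "young": [1.0, 0.8, 0.2, 0.0],   # fuzzy predicate
        "smoker": [1.0, 0.0, 1.0, 1.0],  # crisp predicate
        "high_bp": [0.9, 0.1, 0.6, 0.7],
    }
    for cond, value in dig_conditions(toy, max_length=2, callback=support).items():
        print(" & ".join(cond), round(value, 3))
```

Passing the user function as a parameter is what keeps such a loop general: replacing `support` with, for example, a confidence or contrast measure changes the kind of pattern sought without touching the enumeration. A practical implementation would also prune the search space, which this sketch omits.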
Notes
- 1. CRAN is the Comprehensive R Archive Network, a network of servers providing identical and up-to-date versions of R and R packages.
- 2.
- 3.
Acknowledgment
The study described here is part of the project “Research of Excellence on Digital Technologies and Wellbeing CZ.02.01.01/00/22_008/0004583”, which is co-financed by the European Union.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Burda, M. (2024). nuggets: Data Pattern Extraction Framework in R. In: Torra, V., Narukawa, Y., Kikuchi, H. (eds) Modeling Decisions for Artificial Intelligence. MDAI 2024. Lecture Notes in Computer Science, vol 14986. Springer, Cham. https://doi.org/10.1007/978-3-031-68208-7_10
DOI: https://doi.org/10.1007/978-3-031-68208-7_10
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-68207-0
Online ISBN: 978-3-031-68208-7
eBook Packages: Computer Science, Computer Science (R0)