nuggets: Data Pattern Extraction Framework in R

  • Conference paper
Modeling Decisions for Artificial Intelligence (MDAI 2024)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14986)

Abstract

nuggets is a framework for subgroup discovery, contrast and emerging patterns, association rules, and more. Developed as a package for the R statistical environment, nuggets provides a novel and extensible toolkit for performing rule-based analyses. Both crisp (Boolean) and fuzzy data are supported. The package generates conditions in the form of elementary conjunctions, evaluates them on a dataset, and checks the induced sub-data for interesting statistical properties. A user-defined function may be evaluated on each generated sub-dataset, which gives the framework a novel degree of generality. The aim of this paper is to present this free software to the soft computing community. The tool could be useful to both researchers and analysts in the domain of pattern mining because, besides searching for various existing pattern types, brand new ideas may be easily implemented and evaluated within the framework.
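The workflow described in the abstract (generate elementary conjunctions, evaluate them on the data, run a user-defined function on each induced sub-dataset) can be illustrated with a minimal, language-neutral sketch. This is not the nuggets R API: the function `enumerate_conditions`, its parameters, the toy data, and the choice of the minimum t-norm for fuzzy conjunction are all assumptions made for illustration only.

```python
from itertools import combinations

def enumerate_conditions(data, callback, max_length=2, min_support=0.0):
    """Hypothetical helper: evaluate `callback` on every conjunction of predicates.

    data maps a predicate name to a list of membership degrees in [0, 1];
    crisp (Boolean) data is the special case where every degree is 0 or 1.
    """
    names = sorted(data)
    n = len(next(iter(data.values())))
    results = []
    for k in range(1, max_length + 1):
        for condition in combinations(names, k):
            # Assumed conjunction: minimum t-norm (plain logical AND on 0/1 data).
            weights = [min(data[p][i] for p in condition) for i in range(n)]
            support = sum(weights) / n
            # Simple post-filter; a real miner would also prune the search space.
            if support >= min_support:
                results.append((condition, support, callback(condition, weights)))
    return results

# Toy dataset: two crisp predicates and one fuzzy predicate over four rows.
data = {
    "young": [1, 1, 0, 0],
    "smoker": [1, 0, 1, 0],
    "tall": [0.8, 0.2, 0.5, 1.0],
}
outcome = [10.0, 20.0, 30.0, 40.0]

def weighted_mean(condition, weights):
    """User-defined statistic: mean outcome weighted by condition membership."""
    total = sum(weights)
    if total == 0:
        return None
    return sum(w * y for w, y in zip(weights, outcome)) / total

for condition, support, value in enumerate_conditions(
        data, weighted_mean, max_length=2, min_support=0.3):
    print(" & ".join(condition), round(support, 3), round(value, 2))
```

Passing the statistic as a callback is what keeps the enumeration generic in this sketch: swapping `weighted_mean` for a different function changes the kind of pattern being searched for without touching the condition generator.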

Notes

  1. CRAN is the Comprehensive R Archive Network, a network of servers providing identical and up-to-date versions of R and R packages.

  2. https://github.com/beerda/nuggets

  3. https://cran.r-project.org/package=nuggets

Acknowledgment

The study described is from the project “Research of Excellence on Digital Technologies and Wellbeing CZ.02.01.01/00/22_008/0004583” which is co-financed by the European Union.

Author information

Corresponding author

Correspondence to Michal Burda.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Burda, M. (2024). nuggets: Data Pattern Extraction Framework in R. In: Torra, V., Narukawa, Y., Kikuchi, H. (eds) Modeling Decisions for Artificial Intelligence. MDAI 2024. Lecture Notes in Computer Science, vol 14986. Springer, Cham. https://doi.org/10.1007/978-3-031-68208-7_10

  • DOI: https://doi.org/10.1007/978-3-031-68208-7_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-68207-0

  • Online ISBN: 978-3-031-68208-7

  • eBook Packages: Computer Science, Computer Science (R0)