
Tracing and Fixing Inconsistencies in Clone-and-Own Tabular Data Models

Published: 02 September 2024

Abstract

Many data-intensive applications handle tabular data with more advanced structuring and processes than spreadsheets, enabling end-users to copy and adapt tabular data and processes to create new templates or datasets at any time. Recent research has demonstrated that, in such clone-and-own scenarios, actions performed on the data structure, together with cloning and adaptation actions, can be captured within an operation-based model to prevent the drift of the internal tabular data model. However, this approach is limited by the assumption that each operation must maintain consistency with respect to the dependencies generated by the domain-specific languages that connect the observed and computed data.
To address this challenge, this paper first introduces an evolved operation-based model that is designed to capture inconsistent tabular data while keeping a fine-grained trace of which parts of the model are inconsistent. We then define specific trace operations to either fix a dependency in a model or remove one if the process that created it is no longer relevant to the user. These operations support high-level editing scenarios on the tabular data, making it easy to fix the equivalent of a spreadsheet formula or a process statement, or to make the user aware that some part of the model is inconsistent while it is cloned. Additionally, we report a scalability experiment with positive results on the tracing of large tabular data models with inconsistencies.
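
To make the idea above concrete, the following is a minimal Python sketch of an operation log that tolerates inconsistent dependencies and traces them. It is an illustration under stated assumptions, not the model defined in the paper: the class names (`TabularModel`, `Dependency`), the operation names, and the agronomy-style column names are all hypothetical, and a dependency is simplified to "a computed column reads a set of source columns".

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Dependency:
    """Hypothetical shape: a computed column and the columns its formula reads."""
    target: str
    sources: tuple
    consistent: bool = True


@dataclass
class TabularModel:
    """Hypothetical operation-based model; every edit is appended to `log`."""
    columns: set = field(default_factory=set)
    dependencies: dict = field(default_factory=dict)  # target -> Dependency
    log: list = field(default_factory=list)           # applied operations, in order

    def add_column(self, name):
        self.columns.add(name)
        self.log.append(("add_column", name))
        self._retrace()

    def add_dependency(self, target, sources):
        self.columns.add(target)
        self.dependencies[target] = Dependency(target, tuple(sources))
        self.log.append(("add_dependency", target, tuple(sources)))
        self._retrace()

    def remove_column(self, name):
        # Allowed even when a formula still reads the column:
        # the operation is recorded and the trace flags what broke.
        self.columns.discard(name)
        self.log.append(("remove_column", name))
        self._retrace()

    def fix_dependency(self, target, new_sources):
        # Trace operation: repair a dependency by rewiring its sources.
        self.dependencies[target].sources = tuple(new_sources)
        self.log.append(("fix_dependency", target, tuple(new_sources)))
        self._retrace()

    def remove_dependency(self, target):
        # Trace operation: drop a dependency that is no longer relevant.
        del self.dependencies[target]
        self.log.append(("remove_dependency", target))

    def _retrace(self):
        # A dependency is consistent when all of its sources still exist.
        for dep in self.dependencies.values():
            dep.consistent = all(s in self.columns for s in dep.sources)

    def inconsistent(self):
        return sorted(t for t, d in self.dependencies.items() if not d.consistent)

    def clone(self):
        # Cloning copies the log and the trace, so known inconsistencies
        # stay visible in the clone.
        return copy.deepcopy(self)


model = TabularModel()
model.add_column("yield_kg")
model.add_column("area_ha")
model.add_dependency("yield_per_ha", ["yield_kg", "area_ha"])
model.remove_column("area_ha")          # breaks the formula, but is accepted

variant = model.clone()                 # the clone inherits the traced inconsistency
variant.add_column("surface_ha")
variant.fix_dependency("yield_per_ha", ["yield_kg", "surface_ha"])

model.remove_dependency("yield_per_ha") # the original drops the formula instead
```

In this sketch the clone and the original resolve the same inconsistency differently, one by fixing the dependency and the other by removing it, which mirrors the two kinds of trace operations the abstract describes.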


    Information

    Published In

    SPLC '24: Proceedings of the 28th ACM International Systems and Software Product Line Conference
    September 2024
    103 pages
    Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 02 September 2024

    Author Tags

    1. Tabular data
    2. agronomy
    3. clone-and-own
    4. model-driven engineering
    5. operation-based modeling
    6. variability management

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    SPLC '24

    Acceptance Rates

    Overall Acceptance Rate 167 of 463 submissions, 36%
