research-article

Tracing and Fixing Inconsistencies in Clone-and-Own Tabular Data Models

Authors:

Nassim Bounouas,

Mireille Blay-Fornarino,

Philippe Collet

SPLC '24: Proceedings of the 28th ACM International Systems and Software Product Line Conference

Pages 191 - 202

https://doi.org/10.1145/3646548.3672595

Published: 02 September 2024

Abstract

Many data-intensive applications handle tabular data with more advanced structuring and processes than spreadsheets, enabling end-users to copy and adapt tabular data and processes to create new templates or datasets anytime. Recent research advances demonstrated that, in such clone-and-own scenarios, actions performed on the data structure, together with cloning and adaptation actions, can be captured within an operation-based model to prevent the drift of the internal tabular data model. However, this approach is limited by the assumption that each operation must maintain consistency regarding dependencies generated by the domain-specific languages that connect the observed and computed data.

To address this challenge, this paper first introduces an evolved operation-based model designed to capture inconsistent tabular data while keeping a fine-grained trace of which parts of the model are inconsistent. We then define specific trace operations to either fix a dependency in a model or remove it if the process that created it is no longer relevant to the user. These operations support high-level editing scenarios on the tabular data, making it easy to fix the equivalent of a spreadsheet formula or a process statement, or to make the user aware that some part of the model is inconsistent while it is being cloned. Additionally, we report on a positive scalability experiment on the tracing of large tabular data models with inconsistencies.
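The idea of accepting an operation that breaks a dependency, and tracing the breakage instead of rejecting the operation, can be illustrated with a small sketch. This is not the paper's formal model: the class `TabularModel`, its method names, and the column-level granularity are all hypothetical choices made for illustration, assuming that each computed column simply lists the columns it depends on.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    depends_on: set = field(default_factory=set)  # empty for observed data


@dataclass
class TabularModel:
    """Hypothetical operation-based model; every change is appended to a log."""
    columns: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def add_column(self, name, depends_on=()):
        self.columns[name] = Column(name, set(depends_on))
        self.log.append(("add_column", name, tuple(sorted(depends_on))))

    def remove_column(self, name):
        # Accepted even if computed columns still refer to this column.
        del self.columns[name]
        self.log.append(("remove_column", name))

    def inconsistencies(self):
        """Map each column to the dependencies it can no longer resolve."""
        trace = {}
        for col in self.columns.values():
            missing = col.depends_on - set(self.columns)
            if missing:
                trace[col.name] = sorted(missing)
        return trace

    def fix_dependency(self, name, missing, replacement):
        """Trace operation: re-point a dangling dependency to an existing column."""
        if replacement not in self.columns:
            raise KeyError(replacement)
        deps = self.columns[name].depends_on
        deps.discard(missing)
        deps.add(replacement)
        self.log.append(("fix_dependency", name, missing, replacement))

    def remove_dependency(self, name, missing):
        """Trace operation: drop a dependency that is no longer relevant."""
        self.columns[name].depends_on.discard(missing)
        self.log.append(("remove_dependency", name, missing))

    def clone(self):
        """Replay the log into a new model, so inconsistencies travel with the clone."""
        copy = TabularModel()
        for op, *args in self.log:
            getattr(copy, op)(*args)
        return copy


model = TabularModel()
model.add_column("price")
model.add_column("qty")
model.add_column("total", depends_on=["price", "qty"])
model.remove_column("qty")            # leaves "total" with a dangling dependency
print(model.inconsistencies())        # {'total': ['qty']}

cloned = model.clone()                # the clone carries the same inconsistency
cloned.add_column("quantity")
cloned.fix_dependency("total", "qty", "quantity")
print(cloned.inconsistencies())       # {}
```

In this sketch the clone is rebuilt by replaying the operation log, so the user can be warned about the inconsistent column at cloning time and repair it in the copy without touching the original.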

Index Terms

Tracing and Fixing Inconsistencies in Clone-and-Own Tabular Data Models
1. Software and its engineering
  1. Software notations and tools
    1. Software configuration management and version control systems

Information & Contributors

Information

Published In

SPLC '24: Proceedings of the 28th ACM International Systems and Software Product Line Conference

September 2024

103 pages

ISBN: 9798400705939

DOI: 10.1145/3646548

Copyright © 2024 ACM.

Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

SPLC '24

Sponsor:

SIGSOFT

SPLC '24: 28th ACM International Systems and Software Product Line Conference

September 2 - 6, 2024

Dommeldange, Luxembourg

Acceptance Rates

Overall Acceptance Rate 167 of 463 submissions, 36%

Bibliometrics

Total Citations: 0
Total Downloads: 20
Downloads (Last 12 months): 20
Downloads (Last 6 weeks): 5

Reflects downloads up to 15 Jan 2025