A study of the performance of general compressors on log files
Large-scale software systems and cloud services continue to produce a large amount of log data. Such log data is usually preserved for a long time (e.g., for auditing purposes). General compressors, like the LZ77 compressor used in gzip, are ...
On the relationship between bug reports and queries for text retrieval-based bug localization
As societal dependence on software continues to grow, bugs are becoming increasingly costly in terms of financial resources as well as human safety. Bug localization is the process by which a developer identifies buggy code that needs to be fixed ...
A practical guide on conducting eye tracking studies in software engineering
For several years, the software engineering research community used eye trackers to study program comprehension, bug localization, pair programming, and other software engineering tasks. Eye trackers provide researchers with insights on software ...
Detection, assessment and mitigation of vulnerabilities in open source dependencies
Open source software (OSS) libraries are widely used in the industry to speed up the development of software products. However, these libraries are subject to an ever-increasing number of vulnerabilities that are publicly disclosed. It is thus ...
The practitioners’ point of view on the concept of technical debt and its causes and consequences: a design for a global family of industrial surveys and its first results from Brazil
Studying the causes of technical debt (TD) could aid in TD prevention, thus easing the job of TD management. On the other hand, better understanding of the effects of TD could also aid in TD management by facilitating more informed ...
Standing on shoulders or feet? An extended study on the usage of the MSR data papers
The establishment of the Mining Software Repositories (MSR) data showcase conference track has encouraged researchers to provide data sets as a basis for further empirical studies. The objective of this study is to examine the usage of data papers ...
Do code review measures explain the incidence of post-release defects?: Case study replications and Bayesian networks
In contrast to studies of defects found during code review, we aim to clarify whether code review measures can explain the prevalence of post-release defects.
Method: We replicate McIntosh et al.’s (Empirical Softw. Engg. 21(5): 2146–2189, 2016) ...
A comprehensive study on software aging across android versions and vendors
This paper analyzes the phenomenon of software aging – namely, the gradual performance degradation and resource exhaustion in the long run – in the Android OS. The study intends to highlight if, and to what extent, devices from different vendors, ...
An empirical study of the characteristics of popular Minecraft mods
It is becoming increasingly difficult for game developers to manage the cost of developing a game, while meeting the high expectations of gamers. One way to balance the increasing gamer expectation and development stress is to build an active ...
The ‘as code’ activities: development anti-patterns for infrastructure as code
The ‘as code’ suffix in infrastructure as code (IaC) refers to applying software engineering activities, such as version control, to maintain IaC scripts. Without the application of these activities, defects that can have serious ...
Learning actionable analytics from multiple software projects
The current generation of software analytics tools consists mostly of prediction algorithms (e.g., support vector machines, naive Bayes, logistic regression). While prediction is useful, after prediction comes planning about what actions to take in ...
Towards an evidence-based theoretical framework on factors influencing the software development productivity
Productivity refers to the rate at which a company produces goods, and its observation takes into account the number of people and the amount of other necessary resources to deliver such goods. However, it is not clear how to observe ...
Automated issue assignment: results and insights from an industrial case
We automate the process of assigning issue reports to development teams by using data mining approaches and share our experience gained by deploying the resulting system, called IssueTAG, at Softtech. Being a subsidiary of the largest private bank ...
The impact of automated feature selection techniques on the interpretation of defect models
The interpretation of defect models heavily relies on software metrics that are used to construct them. Prior work often uses feature selection techniques to remove metrics that are correlated and irrelevant in order to improve model performance. ...
A tailored participatory action research for FOSS communities
Participatory Action Research (PAR) is an established method to implement change in organizations. However, it cannot be applied in the open source (FOSS) communities, without adaptation to their particularities, especially to the specific control ...
Towards a fictional collective programming scenario: an approach based on the EIF loop
In this paper, we base our research on a fictional collective programming scenario: A group of physically-distributed programmers try to collaboratively solve a programming problem in a web-based development environment, through a continually ...
Automating system test case classification and prioritization for use case-driven testing in product lines
Product Line Engineering (PLE) is a crucial practice in many software development environments where software systems are complex and developed for multiple customers with varying needs. At the same time, many development processes are use case-...
Wait for it: identifying “On-Hold” self-admitted technical debt
Self-admitted technical debt refers to situations where a software developer knows that their current implementation is not optimal and indicates this using a source code comment. In this work, we hypothesize that it is possible to develop ...
What to share, when, and where: balancing the objectives and complexities of open source software contributions
Software-intensive organizations’ rationale for sharing Open Source Software (OSS) may be driven by idealistic, strategic, and commercial objectives, and may include both monetary and non-monetary benefits. To gain the potential ...
Data-driven software design with Constraint Oriented Multi-variate Bandit Optimization (COMBO)
Software design in e-commerce can be improved with user data through controlled experiments (i.e., A/B tests) to better meet user needs. Machine learning-based algorithmic optimization techniques extend the approach to a large number of ...
CGT-FL: using cooperative game theory to effective fault localization in presence of coincidental correctness
In this article we emphasize that most of the faults, appearing in real-world programs, are complicated and there exists a high interaction between faulty and other correlated statements, that is likely to cause coincidental correctness in many ...
The Teamwork Process Antecedents (TPA) questionnaire: developing and validating a comprehensive measure for assessing antecedents of teamwork process quality
Most models of teamwork describe team behavior and effectiveness using an Input-Process-Output approach. In software engineering, the use of such models has focused on understanding and operationalizing the Process-Output components while ...
On the assessment of software defect prediction models via ROC curves
Software defect prediction models are classifiers often built by setting a threshold t on a defect proneness model, i.e., a scoring function. For instance, they classify a software module non-faulty if its defect proneness is below t and positive ...
An empirical investigation on the relationship between design and architecture smells
Architecture of a software system represents the key design decisions and therefore its quality plays an important role to keep the software maintainable. Code smells are indicators of quality issues in a software system and are classified ...
Information correspondence between types of documentation for APIs
Documentation for programming languages and their APIs takes many forms, such as reference documentation, blog posts or other textual and visual media. Prior research has suggested that developers switch between reference and tutorial-like ...
The who, what, how of software engineering research: a socio-technical framework
Software engineering is a socio-technical endeavor, and while many of our contributions focus on technical aspects, human stakeholders such as software developers are directly affected by and can benefit from our research and tool innovations. In ...
Using black-box performance models to detect performance regressions under varying workloads: an empirical study
- Lizhi Liao,
- Jinfu Chen,
- Heng Li,
- Yi Zeng,
- Weiyi Shang,
- Jianmei Guo,
- Catalin Sporea,
- Andrei Toma,
- Sarah Sajedi
Performance regressions of large-scale software systems often lead to both financial and reputational losses. In order to detect performance regressions, performance tests are typically conducted in an in-house (non-production) environment using ...
Evaluating the agreement among technical debt measurement tools: building an empirical benchmark of technical debt liabilities
- Theodoros Amanatidis,
- Nikolaos Mittas,
- Athanasia Moschou,
- Alexander Chatzigeorgiou,
- Apostolos Ampatzoglou,
- Lefteris Angelis
Software teams are often asked to deliver new features within strict deadlines leading developers to deliberately or inadvertently serve “not quite right code” compromising software quality and maintainability. This non-ideal state of software is ...
Systematic mapping study on domain-specific language development tools
- Aníbal Iung,
- João Carbonell,
- Luciano Marchezan,
- Elder Rodrigues,
- Maicon Bernardino,
- Fabio Paulo Basso,
- Bruno Medeiros
Domain-specific languages (DSLs) are programming or modeling languages devoted to a given application domain. Many tools support the implementation of a DSL, which makes it hard to decide on one or another. In this ...
Too many images on DockerHub! How different are images for the same system?
Containerization is a technique used to encapsulate a software system and its dependencies into one isolated package, which is called a container. The goal of these containers is to deploy or replicate a software system on various platforms and ...