Abstract
A major problem faced by software project managers is developing good-quality software products within tight schedule and budget constraints [1]. In software engineering, predictive modeling refers to the construction of models that estimate software quality attributes such as defect-proneness, maintainability and effort. Software metrics act as the predictor variables in such models because they capture design characteristics of software such as coupling, cohesion, inheritance and polymorphism. Several families of techniques, including statistical and machine learning techniques, are available for developing predictive models.
However, empirical studies cannot produce successful predictive models unless a proper research methodology is followed. This work introduces a stepwise procedure for applying these techniques to predictive modeling efficiently. It also discusses, with the help of an example, research issues that must be addressed when conducting empirical studies, such as data collection, the validation method, the use of statistical tests and the choice of an effective performance evaluator.
The tutorial presents an overview of the research process and methodology followed in empirical research [2]. It describes all the steps needed to perform an effective empirical study and demonstrates the methodology with an example based on a data set for defect prediction.
This work focuses on the following research questions:
RQ1: Which repositories are available for extracting software engineering data?
RQ2: What type of data pre-processing and feature selection techniques should be used before developing predictive models?
RQ3: Which possible tools are freely available for mining and analysis of data for developing software quality predictive models?
RQ4: Which techniques are available for developing software quality predictive models?
RQ5: Which performance measures should be used to evaluate software quality predictive models?
RQ6: Which statistical tests can be used effectively for hypothesis testing when search-based techniques are applied?
RQ7: How can we effectively use search-based techniques for predictive modeling?
RQ8: Which fitness functions can be used when applying search-based techniques to predictive modeling?
RQ9: How can researchers account for the stochastic nature of search-based techniques?
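As a rough illustration of RQ5, RQ8 and RQ9, the following minimal sketch repeats a toy search-based run over several random seeds and summarizes the spread of a fitness value. Everything in it is an assumption made for illustration only: the synthetic data generator, the single coupling-like metric, the threshold classifier, the choice of the geometric mean of sensitivity and specificity as the fitness function, the plain random search, and the figure of 30 runs. For brevity the fitness is computed on the same data used for the search; a real study would evaluate on held-out data using a proper validation method.

```python
import random
import statistics


def make_data(n=200, seed=0):
    """Build synthetic (metric value, defective?) pairs; purely illustrative."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        metric = rng.uniform(0, 20)
        # Assumed relation: higher metric values are more likely to be defective.
        defective = rng.random() < min(0.9, metric / 20)
        data.append((metric, defective))
    return data


def g_mean(threshold, data):
    """Fitness: geometric mean of sensitivity and specificity for a threshold rule."""
    tp = sum(1 for m, d in data if d and m >= threshold)
    fn = sum(1 for m, d in data if d and m < threshold)
    tn = sum(1 for m, d in data if not d and m < threshold)
    fp = sum(1 for m, d in data if not d and m >= threshold)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return (sensitivity * specificity) ** 0.5


def random_search(data, seed, iterations=50):
    """One stochastic run: sample candidate thresholds and keep the best fitness."""
    rng = random.Random(seed)
    candidates = [rng.uniform(0, 20) for _ in range(iterations)]
    return max(g_mean(t, data) for t in candidates)


def repeated_runs(data, runs=30):
    """Repeat the stochastic search with a different seed for each run."""
    return [random_search(data, seed) for seed in range(runs)]


if __name__ == "__main__":
    scores = repeated_runs(make_data())
    print("median fitness:", statistics.median(scores))
    print("standard deviation:", statistics.stdev(scores))
```

Reporting a central value together with the spread over many seeded runs, rather than the result of a single run, is one simple way to account for randomness; the per-run scores could then be passed to a statistical test when comparing techniques.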
This study is relevant for several reasons. Empirical validation of object-oriented (OO) metrics is a critical research area today, with a large number of academics and practitioners working to predict software quality attributes in the early phases of software development. We therefore explore the steps involved in developing an effective software quality predictive model, using a modeling technique and an example data set. Performing successful empirical studies in software engineering is important for the following reasons:
• To identify defective classes at the initial phases of software development so that more resources can be allocated to these classes to remove errors.
• To analyze the metrics that are important for predicting software quality attributes and to use them as quality benchmarks, so that the software process can be standardized and can deliver effective products.
• To plan testing, walkthroughs, reviews and inspection activities efficiently, so that limited resources are used well to deliver good-quality software.
• To use and adapt different techniques (statistical, machine learning and search-based) in predicting software quality attributes.
• To analyze existing trends for software quality predictive modeling and suggest future directions for researchers.
• To document the research methodology so that replicated studies can be performed effectively and with ease.