Statistical analysis on faults localization

Date

2014-12

Abstract

Faults must be removed from software programs, and fault localization is one of the most time-consuming and expensive activities in software testing. Therefore, any improvement in automatic fault localization can significantly reduce the cost of software testing. In addition, results obtained from empirical studies on this topic can provide useful information for the software testing community.

Although existing fault localization techniques, especially coverage-based fault localization, have been reported to be effective, it remains an open research question whether more effective techniques and algorithms, better suited to the fault localization scenario, can be found. For example, some data mining and feature selection algorithms that have been applied to bioinformatics problems could be adopted. On the other hand, it is believed that coverage-based fault localization techniques are less effective in the presence of multiple faults. The type and complexity of the faults in a given program also influence the performance of these techniques. The immediate research questions are: to what extent are fault interactions influential, and how does performance differ across different kinds of faults? Furthermore, fault localization techniques may be degraded by the presence of coincidental correctness, where one or more passing test cases exercise a faulty statement. Coverage-based fault localization can be improved if all possible instances of coincidental correctness are identified and proper strategies are employed to either a) avoid these troublesome test cases or b) flip their test status.
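The effect of coincidental correctness on a coverage-based score can be illustrated with a small sketch. It uses the Ochiai suspiciousness formula as a stand-in for a generic coverage-based technique (an assumption; the abstract does not name a specific metric), and the test suite and the set of coincidentally correct tests are hypothetical.

```python
import math

def ochiai(tests):
    """Ochiai suspiciousness of one statement.

    tests: list of (covers_statement, passed) pairs, one per test case.
    """
    ef = sum(1 for cov, ok in tests if cov and not ok)   # failing tests that cover it
    ep = sum(1 for cov, ok in tests if cov and ok)       # passing tests that cover it
    total_failed = sum(1 for _, ok in tests if not ok)
    denom = math.sqrt(total_failed * (ef + ep))
    return ef / denom if denom else 0.0

# Hypothetical suite for a single faulty statement: two failing tests cover it,
# three passing tests also cover it (coincidentally correct), one passing test does not.
tests = [(True, False), (True, False),
         (True, True), (True, True), (True, True),
         (False, True)]
cc = {2, 3, 4}  # indices assumed to be identified as coincidentally correct

original = ochiai(tests)
# Strategy a): trim the coincidentally correct tests from the suite.
trimmed = ochiai([t for i, t in enumerate(tests) if i not in cc])
# Strategy b): flip their status from passing to failing.
flipped = ochiai([(cov, False) if i in cc else (cov, ok)
                  for i, (cov, ok) in enumerate(tests)])

print(original, trimmed, flipped)
```

In this toy suite the faulty statement's score rises under both strategies, because the passing executions that diluted it are either removed or counted as failures. Real suites would also shift the scores of other statements, which this sketch does not model.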

This dissertation focuses on three aspects of effective fault localization: 1) proposing more effective techniques; 2) statistically analyzing factors that affect fault localization; 3) identifying coincidental correctness. Two novel fault localization techniques are proposed. First, this dissertation introduces the odds ratio metric and its application to the fault localization problem. Second, the notion of a program's features is introduced for the purpose of debugging; feature selection algorithms, in particular feature ranking methods based on statistical hypothesis testing, are then applied to mining bug signatures. This dissertation also statistically measures the influence and significance of fault interactions on the performance of coverage-based fault localization, and investigates the effect of fault types on coverage-based fault localization. Finally, this dissertation proposes a technique to effectively identify coincidentally correct test cases. The proposed technique combines support vector machines and ensemble learning to detect mislabeled cases, i.e., coincidentally correct test cases. The ensemble-based support vector machine can then be used to trim a test suite or to flip the test status of coincidentally correct test cases, thereby improving the effectiveness of fault localization.
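As a rough illustration of how an odds ratio can rank statements, the sketch below computes the standard odds ratio of a 2x2 table (statement executed or not, test failed or passed) for each statement. The abstract does not give the dissertation's exact formulation, so the 0.5 continuity correction (Haldane-Anscombe), the function name, and the sample counts are all assumptions made for illustration.

```python
def odds_ratio(ef, ep, nf, np_, correction=0.5):
    """Odds ratio of failure given execution of a statement.

    ef / ep: failing / passing tests that execute the statement
    nf / np_: failing / passing tests that do not execute it
    correction: assumed continuity correction added to every cell
                so that zero counts do not cause division by zero
    """
    a, b, c, d = (x + correction for x in (ef, ep, nf, np_))
    return (a * d) / (b * c)

# Hypothetical program spectra: statement -> (ef, ep, nf, np)
spectra = {
    "s1": (4, 1, 0, 5),  # executed mostly by failing tests
    "s2": (2, 3, 2, 3),  # execution unrelated to the outcome
    "s3": (1, 5, 3, 1),  # executed mostly by passing tests
}

scores = {stmt: odds_ratio(*counts) for stmt, counts in spectra.items()}
# Higher odds ratio means more suspicious; inspect statements in this order.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```

A value near 1 indicates that executing the statement tells us little about whether a test fails, while values well above 1 mark statements whose execution is associated with failure.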

Keywords

Fault localization, Statistical analysis