Feature selection and embedding based cross project framework for identifying crashing fault residence

Xu, Z., Zhang, T., Keung, J., Yan, M., Luo, X., Zhang, X., Xu, L. and Tang, Y. (2021) Feature selection and embedding based cross project framework for identifying crashing fault residence. Information and Software Technology, 131, 106452. (doi: 10.1016/j.infsof.2020.106452)

Full text not currently available from Enlighten.


Context: Automatically produced crash reports make it possible to analyze the root cause of the fault behind a crash (crashing fault for short), which is a critical activity for software quality assurance. Objective: Correctly predicting whether the crashing fault resides in the stack trace of a crash report can speed up the debugging process and optimize debugging effort. Existing work has focused on label information collected from bug-fixing logs and on features of crash instances extracted from stack traces and source code for the Identification of Crashing Fault Residence (ICFR) of newly submitted crashes. This work develops a novel cross-project ICFR framework that addresses the data scarcity problem by using labeled crash data from other projects for the ICFR task of the project at hand. The framework removes irrelevant features, reduces distribution differences, and eases the class imbalance issue of cross-project data, since these factors may negatively impact ICFR performance. Method: The proposed framework, called FSE, combines Feature Selection and feature Embedding techniques. FSE first uses an information gain ratio based feature ranking method to select a relevant feature subset for the cross-project data, and then employs a state-of-the-art Weighted Balanced Distribution Adaptation (WBDA) method to map the features of the cross-project data into a common space. WBDA considers both marginal and conditional distributions, as well as their weights, to reduce data distribution discrepancies. In addition, WBDA balances the class proportion of each project's data to alleviate the class imbalance issue. Results: We conduct experiments on 7 projects to evaluate the performance of the FSE framework. The results show that FSE outperforms the 25 methods under comparison. Conclusion: This work proposes a cross-project learning framework for ICFR, which uses feature selection and feature embedding to remove irrelevant features and reduce distribution differences, respectively. The results illustrate the superior performance of the FSE framework.
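The information gain ratio ranking named in the Method can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes features have already been discretized, and the function names (`entropy`, `gain_ratio`, `rank_features`) and the feature names in the usage note are hypothetical. It scores each feature by the information gain of the class label given that feature, divided by the feature's own entropy, and keeps the top-ranked features.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain of `labels` given `feature`, divided by the
    entropy of `feature` itself (the split information)."""
    split_info = entropy(feature)
    if split_info == 0.0:
        # A constant feature cannot separate the classes.
        return 0.0
    # Group the labels by feature value to get the conditional entropy.
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    total = len(labels)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return (entropy(labels) - conditional) / split_info


def rank_features(columns, labels, top_k):
    """Return the names of the `top_k` features with the highest gain ratio.
    `columns` maps a (hypothetical) feature name to its list of values."""
    scores = {name: gain_ratio(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

For example, with labels `[1, 1, 0, 0]`, a hypothetical feature `in_trace = [1, 1, 0, 0]` that mirrors the labels scores higher than `is_public = [0, 1, 0, 1]`, which is independent of them, so `rank_features` with `top_k=1` keeps only `in_trace`. Continuous features would need to be discretized before this scoring applies.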

Item Type:Articles
Additional Information:This work is supported by the National Key Research and Development Project (No. 2018YFB2101200), the National Natural Science Foundation of China (No. 62002034), China Postdoctoral Science Foundation (No. 2020M673137, No. 2017M621247), the Natural Science Foundation of Chongqing in China (No. cstc2020jcyj-bshX0114), the Science and Technology Development Fund of Macau (No. 0047/2020/A1), Faculty Research Grant Projects of MUST (No. FRG-20-008-FI), Hong Kong Research Grant Council Project (No. 152239/18E), the General Research Fund of the Research Grant Council of Hong Kong (No. 11208017), the Fundamental Research Funds for the Central Universities (No. 2020CDJQY-A021, No. 2019CDYGYB014).
Glasgow Author(s) Enlighten ID:Tang, Dr Yutian
Creator Roles:
Tang, Y.: Writing – review and editing
Authors: Xu, Z., Zhang, T., Keung, J., Yan, M., Luo, X., Zhang, X., Xu, L., and Tang, Y.
College/School:College of Science and Engineering > School of Computing Science
Journal Name:Information and Software Technology
ISSN (Online):1873-6025
Published Online:15 October 2020
