Speaker: Dr. Sunghun Kim
Time: 2008-07-17 10:00:00
Place: Room 309-1, Bldg 302, SNU

Abstract

Almost all software contains undiscovered bugs, ones that have not yet been exposed by testing or by users. Where are these bugs located? This talk presents two approaches for predicting the location of bugs by analyzing software history. The first is the bug cache, which holds 10% of the files in a software project. Through an analysis of the software's development history and the location of past bugs, files are added to and removed from the cache based on four bug localities: temporal, spatial, changed-entity, and new-entity locality. After processing, the files in the bug cache contain 73-95% of the undiscovered bugs. Second, to further improve the localization of predicted bugs, automatic change classification uses information from configuration management commit transactions. Using machine learning techniques (Bayes Net, Support Vector Machines), we classify each commit as likely or unlikely to contain a fault. The best precision and recall figures for each project are typically in the mid-70s (percent). Hence, it is possible for a configuration management system to inform a developer, post-commit, that they have just created a bug (with approximately 75% likelihood, in line with the reported precision).
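
The bug cache idea can be illustrated with a minimal sketch. It is not the speaker's implementation: the class name BugCache, its method names, the least-recently-loaded eviction policy, and the reading of spatial locality as "files changed together with the faulty file" are all assumptions made for illustration. The abstract only states that the cache holds 10% of the files and is updated according to the four localities.

```python
from collections import OrderedDict


def cache_capacity(total_files, fraction=0.10):
    """Cache size as a fraction of the project's files (10% per the abstract)."""
    return max(1, round(total_files * fraction))


class BugCache:
    """Hypothetical fixed-size cache of files suspected to contain bugs."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._files = OrderedDict()  # oldest entry first

    def _load(self, path):
        # Refresh an existing entry, or insert a new one.
        if path in self._files:
            self._files.move_to_end(path)
            return
        # Assumed policy: evict the least recently loaded file when full.
        if len(self._files) >= self.capacity:
            self._files.popitem(last=False)
        self._files[path] = True

    def on_bug_fix(self, faulty, co_changed=()):
        """Record a fixed bug; return True if the file was already cached (a hit)."""
        hit = faulty in self._files
        self._load(faulty)          # temporal locality
        for path in co_changed:     # spatial locality (assumed: co-changed files)
            self._load(path)
        return hit

    def on_change(self, path):
        self._load(path)            # changed-entity locality

    def on_new_file(self, path):
        self._load(path)            # new-entity locality

    def on_delete(self, path):
        self._files.pop(path, None)  # deleted files leave the cache

    def contents(self):
        return list(self._files)
```

Replaying a project's history through on_bug_fix and counting the hits gives a hit rate, which is one plausible way to arrive at a figure comparable to the 73-95% quoted above.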
