Abstract:
Software now plays an influential role in many sectors of life, serving both business and personal needs. Producing high-quality software requires testing to avoid software defects. Many researchers are currently studying software defects with Machine Learning, an approach in which feature selection is one important step. In this study, the researchers performed feature selection based on software metric categories to determine the accuracy of software defect prediction, using 13 (thirteen) datasets from NASA MDP: CM1, JM1, KC1, KC3, KC4, MC1, MC2, MW1, PC1, PC2, PC3, PC4, and PC5. For classification, the researchers used 5 (five) classifiers: Naive Bayes, Decision Tree, Random Forest, K-Nearest Neighbor, and Support Vector Machine. The results show that the attributes in each software metric category have an effect on every dataset. With metric-based feature selection, the Naive Bayes and Random Forest algorithms perform better than the other algorithms in classifying software defects. For each classifier algorithm, the best metric category is Misc. In terms of average AUC value, however, the metric category that gives the best performance is LoC, followed by Misc. Both categories achieve their highest AUC value with the Random Forest classifier.
Keywords: Software Defect Prediction, Metrics Based Feature Selection, Software Quality
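The category-based feature selection and AUC comparison summarized in the abstract can be illustrated with a minimal Python sketch. The mapping METRIC_CATEGORIES is hypothetical: the abstract names only the LoC and Misc categories and does not list which attributes belong to them, so the attribute assignments and the McCabe entry are assumptions made for illustration. The sketch does not train any of the five classifiers; a raw attribute value stands in as the defect score, and the helper names select_by_category, auc, and mean_auc are likewise illustrative.

```python
from typing import Dict, List, Sequence

# Hypothetical mapping from metric category to attribute names.
# The grouping actually used in the study is not given in the abstract.
METRIC_CATEGORIES: Dict[str, List[str]] = {
    "LoC": ["LOC_TOTAL", "LOC_BLANK"],
    "McCabe": ["CYCLOMATIC_COMPLEXITY"],
    "Misc": ["BRANCH_COUNT"],
}


def select_by_category(rows: Sequence[Dict[str, float]],
                       category: str) -> List[Dict[str, float]]:
    """Keep only the attributes that belong to one metric category."""
    keep = METRIC_CATEGORIES[category]
    return [{name: row[name] for name in keep} for row in rows]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (defective, clean) pairs ranked correctly.

    Ties between a defective and a clean module count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one defective and one clean module")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_auc(auc_by_dataset: Dict[str, float]) -> float:
    """Average AUC of one metric category across datasets."""
    return sum(auc_by_dataset.values()) / len(auc_by_dataset)


if __name__ == "__main__":
    # Toy modules, not taken from any NASA MDP dataset.
    rows = [
        {"LOC_TOTAL": 120, "LOC_BLANK": 9, "CYCLOMATIC_COMPLEXITY": 14, "BRANCH_COUNT": 27},
        {"LOC_TOTAL": 15, "LOC_BLANK": 1, "CYCLOMATIC_COMPLEXITY": 2, "BRANCH_COUNT": 3},
        {"LOC_TOTAL": 80, "LOC_BLANK": 6, "CYCLOMATIC_COMPLEXITY": 9, "BRANCH_COUNT": 17},
        {"LOC_TOTAL": 40, "LOC_BLANK": 4, "CYCLOMATIC_COMPLEXITY": 5, "BRANCH_COUNT": 9},
    ]
    labels = [1, 0, 1, 0]  # 1 = defective, 0 = clean
    loc_only = select_by_category(rows, "LoC")
    # Stand-in score: larger modules are treated as more defect-prone.
    scores = [row["LOC_TOTAL"] for row in loc_only]
    print("AUC using LoC category only:", auc(labels, scores))
```

In a fuller experiment, the stand-in score would be replaced by the predicted defect probability of each classifier, and mean_auc would be computed per category over all thirteen datasets to rank the categories.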