Exploring Principal Component Analysis in Defect Prediction: A Survey

Authors

  • Varsha G Palatse, Department of Computer Engineering, Government Polytechnic Pune, Maharashtra, India

DOI:

https://doi.org/10.5281/zenodo.3974580

Keywords:

Defect Prediction Model, Principal Component Analysis, Feature Selection, Dimension Reduction, Machine Learning Algorithms, Eigenvector, Eigenvalue

Abstract

The performance of a defect prediction model depends on its dataset, which consists of software metrics. These metrics, or features, are sometimes so numerous that they complicate the dataset and reduce the efficiency of the classifier or regressor. Dimension reduction techniques address this problem of high feature dimensionality. One of the most commonly used is Principal Component Analysis (PCA), a statistical technique for reducing the dimensionality of large datasets in machine learning. A large body of research has applied principal component analysis to predict defective modules. Its main function is to reduce a large number of features by combining correlated features into a smaller set of uncorrelated components. The result is a simpler dataset that is easier to handle and visualize, sometimes at the cost of accuracy. The computational complexity of the predictors also decreases, so they take less time to execute. This survey paper explores the various studies that have used principal component analysis for defect prediction. The survey is based on 28 studies published from 2002 to 2020. It covers pre-release and post-release defects in software as well as defects in machine equipment. The primary focus of this study is to examine the significance of principal component analysis and its contribution to defect prediction, and to identify how research on this topic has expanded. The study provides a helpful outline of the subject for scholars and practitioners working on this topic.
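
As a rough illustration of the reduction described above, the following minimal Python sketch computes the first principal component of a small table of software metrics. It is not taken from the surveyed studies: the metric values (lines of code, cyclomatic complexity, method count) are made up, the standardization step is an assumption, and power iteration stands in for a full eigendecomposition so that the example needs only the standard library.

```python
import math
import random


def standardize(rows):
    """Center each feature and scale it to unit sample variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
        for j in range(d)
    ]
    # A constant feature has zero spread; leave it at zero instead of dividing by 0.
    return [[(r[j] - means[j]) / (stds[j] or 1.0) for j in range(d)] for r in rows]


def covariance(z):
    """Sample covariance matrix of rows that are already centered."""
    n, d = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(d)] for i in range(d)]


def mat_vec(m, v):
    """Multiply a square matrix by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def top_eigenpair(m, iterations=500, seed=0):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in m]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = mat_vec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # zero matrix: nothing to iterate on
            break
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(m, v)))  # Rayleigh quotient
    return eigenvalue, v


def first_principal_component(rows):
    """Return (unit eigenvector, explained-variance ratio, per-row scores)."""
    z = standardize(rows)
    cov = covariance(z)
    eigenvalue, component = top_eigenpair(cov)
    total_variance = sum(cov[i][i] for i in range(len(cov)))
    explained = eigenvalue / total_variance if total_variance else 0.0
    scores = [sum(a * b for a, b in zip(r, component)) for r in z]
    return component, explained, scores


# Hypothetical modules: [lines of code, cyclomatic complexity, method count].
metrics = [
    [120, 10, 8],
    [300, 25, 20],
    [80, 6, 5],
    [450, 40, 31],
    [200, 18, 12],
    [60, 4, 4],
]

component, explained, scores = first_principal_component(metrics)
print("first component:", [round(c, 3) for c in component])
print("share of variance explained:", round(explained, 3))
print("one score per module:", [round(s, 3) for s in scores])
```

Because the three made-up metrics are strongly correlated, a single component captures most of their variance, so each module can be described by one score instead of three features. In practice a library routine for eigendecomposition or PCA would normally replace the hand-written power iteration.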

Published

2020-08-05

How to Cite

[1]
V. G. Palatse, “Exploring Principal Component Analysis in Defect Prediction: A Survey”, pices, vol. 4, no. 4, pp. 56-63, Aug. 2020.