Improving the Accuracy of the Logistic Regression Algorithm Model using SelectKBest in Customer Prediction Based on Purchasing Behavior Patterns

Rofik Rofik
Nurul Hidayat

Abstract

Advances in science and technology have made it easy for almost anyone to start and run a business. Consumers benefit from a wider range of shopping options, but businesses face increasingly fierce competition. Companies therefore need effective marketing strategies to achieve profitability and sustainable growth, and those strategies should be aimed at meeting customer needs. Several studies have classified customers based on their purchasing patterns, but they have not applied a fixed combination of features, so the accuracy obtained is still not optimal. The purpose of this research is to improve the accuracy of a Logistic Regression model that predicts customers based on their purchasing behavior patterns by using SelectKBest. The proposed model is Logistic Regression with feature selection based on chi-square scores, which is intended to yield a combination of features that better fits the characteristics of the prediction model. The research consists of four stages. The first is pre-processing: feature selection with chi-square scores and normalization of the data with a standard scaler. The second is data splitting, which divides the data into training and testing sets. The third is modeling, in which seven algorithms (KNN, Gradient Boosting, Logistic Regression, Decision Tree, Naïve Bayes, SVM, and Random Forest) are trained so that their performance can be compared. The fourth is model evaluation. The model is tested using datasets from the UCI Machine Learning Repository. The evaluation results show that Logistic Regression achieves the highest accuracy, 93.18%, with precision, recall, and F1-score of 95% each. These results indicate that optimizing the Logistic Regression model with SelectKBest can improve the accuracy of predicting customers based on their purchasing patterns.
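
The chi-square feature selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' code. It is a dependency-free illustration of what "keep the k features with the highest chi-square score" means, using the count-based form of the statistic (observed per-class feature sums compared with the sums expected if the feature were independent of the class), which is, to our understanding, the form scikit-learn's chi2 scorer uses with SelectKBest. The function names, the toy data, and the choice of k = 2 are all assumptions made for illustration; the abstract does not state the value of k or list the features used.

```python
from collections import defaultdict


def chi2_scores(X, y):
    """Chi-square score of each non-negative feature against the class label."""
    n_samples = len(X)
    n_features = len(X[0])

    # Observed: sum of each feature's values within each class.
    observed = defaultdict(lambda: [0.0] * n_features)
    class_counts = defaultdict(int)
    for row, label in zip(X, y):
        class_counts[label] += 1
        for j, value in enumerate(row):
            observed[label][j] += value

    # Total of each feature over all samples.
    feature_totals = [sum(row[j] for row in X) for j in range(n_features)]

    scores = [0.0] * n_features
    for label, count in class_counts.items():
        class_prob = count / n_samples
        for j in range(n_features):
            # Expected sum if feature j were independent of the class.
            expected = class_prob * feature_totals[j]
            if expected > 0:
                scores[j] += (observed[label][j] - expected) ** 2 / expected
    return scores


def select_k_best(X, y, k):
    """Return the indices of the k highest-scoring features and the reduced data."""
    scores = chi2_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:k])
    return keep, [[row[j] for j in keep] for row in X]


# Hypothetical toy data: three non-negative features, binary purchase label.
X = [
    [9, 3, 2],
    [8, 3, 1],
    [7, 3, 2],
    [1, 3, 0],
    [2, 3, 1],
    [0, 3, 0],
]
y = [1, 1, 1, 0, 0, 0]

scores = chi2_scores(X, y)
keep, X_selected = select_k_best(X, y, k=2)
print("chi-square scores:", scores)
print("selected feature indices:", keep)
```

Because the chi-square score is defined only for non-negative values, this kind of selection has to be applied before standardization, which matches the order given in the abstract (feature selection first, then the standard scaler). In the toy data the second feature is constant across classes, so it scores zero and is the one dropped.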

How to Cite
Rofik, R., & Hidayat, N. (2023). Improving the Accuracy of the Logistic Regression Algorithm Model using SelectKBest in Customer Prediction Based on Purchasing Behavior Patterns. Future Computer Science Journal, 1(1), 9–17. Retrieved from https://asasijournal.com/index.php/fcsj/article/view/8
