Follow-up Period Classification of Type 2 Diabetes Patients using Data Mining Techniques
Abstract
Objective: This study investigates the use of high-performance data mining techniques to predict the follow-up period of diabetes patients.
Material and Methods: The diabetes dataset was obtained from Pak Phanang Hospital in Nakhon Si Thammarat, Thailand, and covers records collected by the hospital between January 1 and December 31, 2022. This hospital-based retrospective study used 2,042 records with 14 independent factors: age, gender, systolic blood pressure, diastolic blood pressure, body mass index, pulse, weight, height, waist circumference, smoking, drinking, parental history of diabetes, fasting blood sugar, and creatinine level. To predict the follow-up period of diabetes patients, six well-known classification models were employed: Random Forest (RF), Extra Trees Classifier (ETC), Adaptive Boosting (AdaBoost), Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Artificial Neural Network (ANN). Class imbalance was addressed with the Synthetic Minority Oversampling Technique (SMOTE), and feature selection was performed using feature importance scores from the RF model.
Results: The experimental results demonstrated that, when SMOTE was applied together with RF-based feature selection, the SVM outperformed the other models, achieving the highest performance with a weighted precision of 0.9296.
Conclusion: Incorporating both SMOTE and feature selection significantly improved accuracy in predicting the follow-up period of diabetes patients for most models. Doctors and other healthcare providers could therefore use the proposed web-based tool to schedule follow-up care for diabetes patients effectively.
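As an illustration of two ideas named in the abstract, the sketch below shows how SMOTE creates synthetic minority-class records by interpolating between neighbouring samples, and how a weighted precision score is computed. It is a minimal standard-library sketch and not the study's implementation: the function names, the toy feature values, and the follow-up labels ("1m", "3m", "6m") are all hypothetical, and the abstract does not state which follow-up classes were used.

```python
import math
import random
from collections import Counter


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` inside the minority class
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def weighted_precision(y_true, y_pred):
    """Per-class precision, averaged with weights equal to class support."""
    support = Counter(y_true)
    total = 0.0
    for label, n_true in support.items():
        predicted = sum(1 for p in y_pred if p == label)
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == p == label)
        precision = correct / predicted if predicted else 0.0
        total += precision * n_true
    return total / len(y_true)


# Hypothetical toy data: two features per record, three follow-up classes.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote(minority, n_new=6)

y_true = ["1m", "1m", "1m", "3m", "3m", "6m"]
y_pred = ["1m", "1m", "3m", "3m", "3m", "6m"]
score = weighted_precision(y_true, y_pred)
print(len(new_points), round(score, 4))
```

In practice, library implementations such as `SMOTE` in imbalanced-learn and `precision_score(..., average="weighted")` in scikit-learn would normally be used instead of hand-written versions. Oversampling is usually applied to the training split only, so that synthetic records do not leak into the evaluation data.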
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.