Impact of Principal Component Analysis on the Performance of Machine Learning Models for the Prediction of Length of Stay of Patients
Abstract
Patient inflow, limited resources, disease criticality, and service-quality requirements have made it essential for hospital administrators to predict the length of stay (LOS) of inpatients as well as outpatients. An efficient and effective LOS prediction tool can improve patient care and reduce the cost of service by allocating the hospital's available resources optimally. Machine learning (ML) models can give encouraging results when predicting a patient's LOS. In this paper, five ML algorithms, namely linear regression (LR), k-nearest neighbours (KNN), decision trees (DT), random forest (RF), and gradient boosting (GB) regression, are used to predict the LOS of patients admitted to hospital, based on medical history, laboratory measurements, and vital signs collected before admission. The impact of principal component analysis (PCA) on the predictive performance of each algorithm is also analyzed, and five-fold cross-validation is used to validate the results of the proposed models. Without PCA, the RF and GB models perform best among all the ML models, with scores of 0.856 and 0.855, respectively. With PCA, the accuracy of every model except KNN and LR increases: the GB model trained on the principal components reaches a score of approximately 0.908 and a mean squared error (MSE) of approximately 0.49, improving on the GB model trained on the original data. PCA therefore has an advantageous effect on the DT, RF, and GB models, and the LOS of new patients can be predicted effectively using the proposed tree-based RF and GB models combined with PCA.
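The comparison described in the abstract can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: it assumes scikit-learn, uses synthetic data in place of the LOS dataset, assumes the reported score is the coefficient of determination (R²), and picks hypothetical settings (standardization before PCA, 95% retained variance, 50 trees per ensemble) that the abstract does not specify.

```python
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_validate
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the LOS data (assumption: real features are not reproduced here).
X, y = make_regression(n_samples=300, n_features=12, n_informative=6,
                       noise=10.0, random_state=0)

# The five regressors named in the abstract; hyperparameters are illustrative only.
models = {
    "LR": LinearRegression(),
    "KNN": KNeighborsRegressor(n_neighbors=5),
    "DT": DecisionTreeRegressor(random_state=0),
    "RF": RandomForestRegressor(n_estimators=50, random_state=0),
    "GB": GradientBoostingRegressor(n_estimators=50, random_state=0),
}

# Five-fold cross-validation, shuffled with a fixed seed for repeatability.
cv = KFold(n_splits=5, shuffle=True, random_state=0)


def evaluate(model, use_pca):
    """Return (mean R^2, mean MSE) over the five folds for one configuration."""
    steps = [StandardScaler()]
    if use_pca:
        # Keep enough components to explain 95% of the variance (assumed threshold).
        steps.append(PCA(n_components=0.95, svd_solver="full"))
    steps.append(model)
    pipeline = make_pipeline(*steps)
    scores = cross_validate(pipeline, X, y, cv=cv,
                            scoring=("r2", "neg_mean_squared_error"))
    return scores["test_r2"].mean(), -scores["test_neg_mean_squared_error"].mean()


results = {
    (name, use_pca): evaluate(model, use_pca)
    for name, model in models.items()
    for use_pca in (False, True)
}

for (name, use_pca), (r2, mse) in results.items():
    label = "with PCA" if use_pca else "without PCA"
    print(f"{name:>3} {label:<11} R2={r2:.3f} MSE={mse:.2f}")
```

Placing the scaler and PCA inside the pipeline means both are refit on each training fold, so no information from the held-out fold leaks into the projection. The numbers printed for the synthetic data are not comparable to the scores reported in the abstract.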
Copyright (c) 2024 EMITTER International Journal of Engineering Technology
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.