Department of Biostatistics and Epidemiology, Social Determinants of Health Research Center, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran
Department of Environmental Health Engineering, School of Public Health AND Environmental Technologies Research Center (ETRC), Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran
Assessing the possibility of patient discharge based on data-mining models is one of the common, user-friendly approaches to optimally exploit the limited capacity of hospital beds.
Objective
The aim of this study was to determine the predictors of length of stay (LOS) in cardiac care wards using data-mining approaches.
Methods
Data from 136 patient records were evaluated using data-mining approaches including the multilayer perceptron artificial neural network (MLP-ANN), quick unbiased and efficient statistical tree (QUEST), support vector machine (SVM), classification and regression tree (CRT), advanced decision tree (C5.0), Auto Classifier (AC) and logistic regression models.
Results
The median and mean LOS were 4 and 4.15 days (95% CI [3.99, 4.30]), respectively. Predictors associated with an increase in LOS (more than 4 days) were: ST-segment elevation myocardial infarction (STEMI) diagnosis at the time of referral, being in the 50–70 years age group, history of smoking, high blood lipids, history of hypertension, hypertension at the time of admission, and high serum troponin levels.
Conclusion
Using classical models to explain the predictors of an outcome is inefficient when the number of predictors is high and the sample size is low. Therefore, analysis based on newer data-mining approaches is a desirable alternative. Behavioral factors, especially smoking, are among the important factors in determining long-term stay in the cardiac care ward.
1. Introduction
In many countries, non-communicable diseases show a rising trend for reasons such as increasing longevity, increasing and prolonged exposure to risk factors, and changing lifestyles. In terms of the global burden of disease, non-communicable diseases have ranked first since 1990.
Among non-communicable diseases, ischemic heart disease (IHD), in the developed countries, causes the highest rates of death, disability and financial burden compared to other diseases.
According to World Health Organization (WHO) reports, at the end of the second millennium much of the healthcare budget in developing countries was earmarked for cardiovascular diseases.
In this study, ischemic heart disease refers to a series of cardiovascular diseases whose common characteristic is an imbalance between the oxygen demand and the oxygen supply of the heart tissue. Ischemic heart diseases include two major groups: chronic coronary artery disease (often presenting as stable angina) and acute coronary syndromes (ACSs). The second group consists of patients with ST-segment elevation myocardial infarction (STEMI), unstable angina (UA), and non-ST-segment elevation myocardial infarction (NSTEMI).
One of the important issues in the management of patients with acute coronary syndrome, especially in developing countries, is the management of cardiovascular beds. Given the high cost of such beds, their optimal use is indispensable for providing services appropriate to patients' needs. Unnecessary hospitalization, the lack of scientific protocols at the time of discharge, non-admission of referrals to coronary care units, and the lack of a uniform guideline for the optimal length of stay in these wards lead to longer hospitalization and thus additional costs for patients as well as hospitals. The variety of factors affecting the length of stay (LOS), and the inseparable role of each of these factors, led the researchers to formulate the present study as a first academic approach to these issues. Moreover, when the sample size is small and the number of predictor variables is large, classical models are of limited use, so data-mining approaches are essential, yet they have been somewhat neglected. Although various studies have examined the factors related to LOS in the coronary care unit (CCU), the intensive care unit (ICU) and other hospital wards using classical approaches, we focused on this issue to fill the gap left by the lack of a comparable data-mining study.
Data-mining analysis refers to a variety of different approaches that can be categorized in two areas of classification approaches and clustering approaches. These approaches have been used frequently in various areas such as survival of kidney transplantation, survival of dialysis patients, survival of the heart and lung transplantation, organ failure rate, cancer diagnosis and prognosis, and classification of breast cancer.
The approaches used in the present study are a combination of the MLP-ANN, logistic regression, C5.0, CRT, QUEST, and SVM models. This study was designed to determine the factors associated with LOS in the CCU.
2. Methods
The data of this study included information about 136 patients with acute coronary syndrome diagnosed in the CCU of Emam Sajjad Hospital in Yasuj, Kohgiluyeh and Boyer-Ahmad province, Southwest of Iran, between 2014 and 2015. Due to the limited number of patients, all patients were included in the study by the census method. The data were collected using a researcher-designed questionnaire covering two areas: patient information and information related to the nature of the disease.
Patient information included two three-level variables: age group (Code 0 = less than 50 years, Code 1 = 50–70 years, Code 2 = over 70 years) and electrocardiogram (ECG)-based diagnosis (Code 0 = unstable angina, Code 1 = NSTEMI, Code 2 = STEMI); and 10 two-level variables (Code 0 = with attribute, Code 1 = no attribute) including gender, history of smoking, history of diabetes, history of hypertension, history of antiplatelet medicines (aspirin, clopidogrel), blood lipid controllers, beta-blocker drugs, history of taking angiotensin-converting enzyme (ACE) drugs, history of taking calcium channel blocker (CCB) medications, history of heart attack, and history of percutaneous coronary intervention (PCI) and coronary artery bypass grafting (CABG) diagnostic and therapeutic interventions. Disease information included six two-level variables (Code 0 = yes; Code 1 = none): the time between the onset of pain and referral to the emergency ward, the initial diagnosis based on changes in heartbeat, hypertension at the time of admission, the patient's initial troponin status, creatine kinase-MB (CK-MB) status, and the need to receive streptokinase.
The outcome variable in this study was the LOS, calculated and used in the model as a binary variable (up to 4 days and more than 4 days). The median and mean LOS of patients in this study and in some other studies were about 4 days. Since there was no definitive cut-off point, this value was chosen as the cut-off point. Hereafter, the conventional names early discharge and late discharge are used to refer to the two levels of the outcome variable.
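The dichotomization described above can be sketched as follows. This is a minimal illustration, not the authors' SPSS Modeler workflow: the function name, the 0/1 label assignment and the sample LOS values are assumptions, and the boundary (4 days counted as early discharge) follows the "≤4 day" and ">4 day" column headings of Table 1.

```python
def discharge_class(los_days: float, cutoff: float = 4.0) -> int:
    """Return 0 for early discharge (LOS <= cutoff) and 1 for late discharge (LOS > cutoff).

    The 0/1 label assignment is an assumption made for this sketch.
    """
    return int(los_days > cutoff)


# Hypothetical LOS values in days, for illustration only.
example_los = [2, 4, 5, 7]
labels = [discharge_class(days) for days in example_los]
print(labels)
```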
Data analysis was performed using SPSS Modeler V.14. The data flow designed in the study consists of 10 process nodes including the data file node, the Type node, the Filter node, the Auto Data Preparation node, the SVM node, the C5.0 node, the CRT node, the QUEST node and the Artificial Neural Network (NNT) node. To analyze and evaluate the outputs of each node, two further nodes (evaluation and analysis) were used. We used the Type node to define the characteristics of the variables and prepare them for the analytic nodes, the Filter node to control the flow of variables into the analysis nodes, and the Auto Data Preparation node to manage outliers and missing data in the predictor and target variables.
Artificial neural network (Multi-Layer Perceptron Artificial Neural Network) (ANN): As a classification technique, an artificial neural network is used to study complex relationships between predictors and dependent variables, such as nonlinear and multivariable functions.
In this study, multilayer perceptron neural networks trained by supervised learning with back-propagation were used. The neural network included an input layer (18 neurons) with a bias node, a hidden (latent) layer with 3 neurons and a bias, and an output layer for a binary variable (0 and 1).
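To make the reported architecture concrete, the sketch below counts the trainable parameters of an 18-3-1 network and runs one forward pass. It is only an illustration under stated assumptions: the sigmoid activations, the single output neuron, the random (untrained) weights and all function names are hypothetical and are not taken from the SPSS Modeler configuration used in the study.

```python
import math
import random

N_INPUTS, N_HIDDEN = 18, 3  # layer sizes reported in the text


def count_parameters(n_inputs: int, n_hidden: int, n_outputs: int = 1) -> int:
    """Weights plus one bias per hidden neuron and per output neuron."""
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out) -> float:
    """One forward pass: inputs -> hidden neurons -> score for class 1."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


# Random, untrained weights: the output below is not a fitted prediction.
rng = random.Random(0)
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(N_INPUTS)] for _ in range(N_HIDDEN)]
b_hidden = [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)]
w_out = [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)]
b_out = rng.uniform(-0.5, 0.5)

x = [rng.choice([0, 1]) for _ in range(N_INPUTS)]  # hypothetical coded patient record
n_params = count_parameters(N_INPUTS, N_HIDDEN)
p_late = forward(x, w_hidden, b_hidden, w_out, b_out)
print(n_params, round(p_late, 3))
```

With 136 records and this many free parameters, the network has roughly two observations per parameter, which is one reason the small sample size is listed as a limitation.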
Support Vector Machine (SVM): Another data-mining classification technique is the support vector machine.
Using this technique has led to good modeling successes, which in some cases covers the neural network defects.
Model C5.0: This node belongs to the decision tree techniques and can produce a decision tree or a rule set. The target field must be categorical. This model makes it possible to classify more than two groups.
Classification and regression tree model (CRT): This node is a subset of decision tree techniques that, in addition to classifying, provides the possibility of prediction and association rules (if … then). Input and target fields in this model can be numeric or categorical. In the output tree, predictors are split and represented as binary groups.
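The binary splitting that CRT performs can be illustrated with a small impurity calculation. The sketch assumes the Gini impurity criterion that is customary for CART-type trees; the text does not state which criterion was configured. The counts are the smoking-history rows of Table 1, and the function names are hypothetical.

```python
def gini(counts) -> float:
    """Gini impurity of a node given its class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def impurity_decrease(left, right) -> float:
    """Reduction in Gini impurity when a parent node is split into two children."""
    parent = [l + r for l, r in zip(left, right)]
    n_left, n_right = sum(left), sum(right)
    weighted = (n_left * gini(left) + n_right * gini(right)) / (n_left + n_right)
    return gini(parent) - weighted


# [LOS <= 4 days, LOS > 4 days] counts from Table 1, history of smoking.
smokers = [47, 35]
non_smokers = [39, 15]
decrease = impurity_decrease(smokers, non_smokers)
print(round(gini([86, 50]), 3), round(decrease, 4))
```

A tree-growing algorithm of this kind evaluates such a decrease for every candidate binary split and keeps the split with the largest value.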
Quick unbiased and efficient statistical tree (QUEST) model: This node is a subset of the decision tree techniques and provides a binary classification for the decision tree. The performance of this model for long-standing trees is better than the CRT model. The input field can be continuous, but the output field must be a class. In the decision tree, it is possible to divide predictors into more than two classes which allows for its presentation.
Auto Classifier node (AC): This node belongs to the category of classifier models. It generates and compares a number of different models for binary outputs and allows the researcher to choose the best approach for analysis.
In the present study, this node was used after automatic data preparation to extract the best proposed models, which were then used in the data flow. The significance level for the tests was less than 0.05. The statistics used to measure the models' goodness-of-fit, or the strength and accuracy of their predictions, were the area under the ROC curve (AUC) and, if necessary, the Gini coefficient. The criterion for selecting the best models was the conventional cut-off of an area under the ROC curve above 75%.
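As an illustration of these two statistics, the sketch below computes the AUC as the proportion of (late, early) patient pairs that a model ranks correctly and derives the Gini coefficient with the standard relation Gini = 2 × AUC − 1. The labels and scores are hypothetical and the function names are assumptions; this is not the output of the study's models.

```python
def auc(labels, scores) -> float:
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def gini_coefficient(auc_value: float) -> float:
    """Gini coefficient derived from the AUC."""
    return 2.0 * auc_value - 1.0


# Hypothetical outcomes (1 = late discharge) and model scores.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

model_auc = auc(y_true, y_score)
model_gini = gini_coefficient(model_auc)
passes_cutoff = model_auc > 0.75  # "above 75%" criterion from the text
print(model_auc, model_gini, passes_cutoff)
```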
After determining the predictive percentage of each variable in each model, an adjusted predictive percentage was calculated, and the variables were ranked based on the scores obtained. By convention, a predictive value of 3% and above was defined as a strong (acceptable) predictor, a predictive value of 2–3% as a moderate predictor, and a value of less than 2% as a poor predictor.
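The ranking convention above amounts to a simple threshold rule, sketched here. The function name is hypothetical; treating exactly 2% as moderate is an assumption that matches how the 2.00% row is labelled in Table 2.

```python
def predictor_strength(predictive_pct: float) -> str:
    """Classify a predictor by its adjusted predictive percentage."""
    if predictive_pct >= 3.0:
        return "strong"
    if predictive_pct >= 2.0:
        return "moderate"
    return "weak"


# Three predictive percentages as reported in Table 2.
examples = {
    "Serum troponin at the time of admission": 3.50,
    "History of Diabetes mellitus": 2.00,
    "Background: Sex": 1.96,
}
strengths = {name: predictor_strength(pct) for name, pct in examples.items()}
print(strengths)
```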
Ethical issues (including plagiarism, informed consent, misconduct, data fabrication and/or falsification, double publication and/or submission, redundancy, etc.) have been completely observed and considered by the authors. The Ethics Committee of Yasuj University of Medical Sciences approved the study protocol (Ethic Code: IR.YUMS.1394.19).
3. Results
The mean and standard deviation of patients' age were 60.47 ± 13.4 years, with a range of 32–90 years. 63.2% of the patients were male and 60% had a history of smoking before admission. The median and mean LOS of patients in the CCU were estimated to be 4 and 4.15 days, respectively. Sixteen percent of the patients were hospitalized with a UA diagnosis, 18% with NSTEMI and 65.5% with STEMI.
LOS longer than four days was observed in 4.5% of UA, 44% of NSTEMI and 42.7% of STEMI patients. Eleven percent of the patients presented to the hospital between half an hour and 3 h after the onset of pain, and the rest presented less than half an hour after the onset of pain. A history of PCI or CABG was recorded for 9.5% of the patients. Serum troponin levels were higher than normal in 57.3% of the patients, and 38.9% of the patients needed streptokinase during hospitalization. Table 1 shows the frequency distribution of the LOS in the ward according to the primary variables.
Table 1. The frequency distribution of predictors and outcome in the study samples.

| Variable | Level | LOS in the CCU ≤4 days, N (%) | LOS in the CCU >4 days, N (%) | Total, N (%) |
|---|---|---|---|---|
| Sex | Male | 52 (60.5) | 34 (39.5) | 86 (100) |
| | Female | 34 (68.0) | 16 (32.0) | 50 (100) |
| | Total | 86 (63.2) | 50 (36.8) | 136 (100) |
| History of ACE drugs | No | 63 (67.3) | 33 (35.7) | 96 (100) |
| | Yes | 23 (60.5) | 15 (39.5) | 38 (100) |
| | Total | 86 (63.2) | 50 (36.8) | 136 (100) |
| History of smoking | Yes | 47 (57.3) | 35 (42.7) | 82 (100) |
| | No | 39 (72.2) | 15 (27.8) | 54 (100) |
| | Total | 86 (63.8) | 50 (36.8) | 136 (100) |
| History of β-blocker drugs | Yes | 18 (64.3) | 10 (35.7) | 28 (100) |
| | No | 68 (63.0) | 40 (37.0) | 108 (100) |
| | Total | 86 (63.2) | 50 (36.8) | 136 (100) |
| History of IHD | Yes | 63 (61.2) | 40 (38.8) | 103 (100) |
| | No | 23 (69.7) | 10 (30.3) | 33 (100) |
| | Total | 86 (63.2) | 50 (36.8) | 136 (100) |
| History of antiplatelet drugs | Yes | 20 (69.0) | 9 (31.0) | 29 (100) |
| | No | 66 (61.7) | 41 (38.3) | 107 (100) |
| | Total | 86 (63.2) | 50 (36.8) | 136 (100) |
| History of PCI or CABG | No | 77 (62.6) | 46 (37.4) | 123 (100) |
| | Yes | 9 (69.2) | 4 (30.8) | 13 (100) |
| | Total | 86 (63.2) | 50 (36.8) | 136 (100) |
| Serum CK-MB at the time of admission | High | 59 (60.2) | 39 (39.8) | 98 (100) |
| | Normal | 27 (71.1) | 11 (28.9) | 38 (100) |
| | Total | 86 (63.2) | 50 (36.8) | 136 (100) |
| Serum troponin at the time of admission | High | 43 (55.1) | 35 (44.9) | 78 (100) |
| | Normal | 42 (73.7) | 15 (26.3) | 57 (100) |
| | Total | 85 (63.0) | 50 (37.0) | 135 (100) |
| History of hypertension | Yes | 39 (58.2) | 28 (41.8) | 67 (100) |
| | No | 47 (68.1) | 22 (31.9) | 69 (100) |
| | Total | 86 (63.2) | 50 (36.8) | 136 (100) |
| Hypertension at the time of admission | Normal | 67 (65.7) | 35 (34.3) | 102 (100) |
| Time interval between the onset of pain and the referral to hospital | | | | |
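The N(%) cells in Table 1 are row percentages, that is, each count divided by the total of its own row. The short sketch below reproduces the serum troponin "High" row from its counts; the function name is hypothetical and the snippet is only a reading aid for the table, not part of the authors' analysis.

```python
def row_percentages(early: int, late: int) -> tuple:
    """Return (early %, late %) within one table row, rounded to one decimal."""
    total = early + late
    return (round(100 * early / total, 1), round(100 * late / total, 1))


# Serum troponin at admission, "High" row of Table 1: 43 early and 35 late out of 78.
high_troponin = row_percentages(43, 35)
normal_troponin = row_percentages(42, 15)
print(high_troponin, normal_troponin)
```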
Table 2 shows the accuracy indicators based on the area under the ROC curve (AUC) for the different models, along with the predictive percentage of each model and its variables. Of the 19 input variables of the model, 7 were identified as strong predictors: a diagnosis of STEMI at the time of referral, the age group of 50–70 years, history of smoking, history of antihyperlipidemic drug use, history of hypertension, hypertension at the time of admission, and high levels of serum troponin. Five variables (history of beta-blocker intake, calcium channel blockers, need for streptokinase at the beginning of treatment, history of hyperlipidemia, and history of Type-2 diabetes) were determined to be mid-range predictors of LOS in the ward. The seven strong predictors had 55.5% predictive power, the five moderate predictors had 11%, and the seven weak predictors had only 8.9%. All 19 input variables together had 75.5% predictive power.
Table 2. Power and degree of importance (%) of predictors used in different models.

| Category | SVM | C5.0 | LOGIT | CRT | ANN | QUEST | Score | Predictive percentage | Strength of predictor |
|---|---|---|---|---|---|---|---|---|---|
| EKG diagnosis at the time of admission | 9 | 41 | 13 | 31 | 9 | 10 | 86.03 | 14.40 | Strong |
| Background: Age | 16 | 9 | 12 | 21 | 11 | 32 | 78.27 | 13.04 | Strong |
| Behavioral: History of smoking | 13 | 16 | 6 | 9 | 7 | 11 | 48.98 | 8.10 | Strong |
| History of anti-hyperlipidemic drug | 7 | 20 | 19 | 0.01 | 8 | 7 | 47.25 | 7.88 | Strong |
| Hypertension at the time of admission | 13 | 4 | 2 | 8 | 6 | 3 | 29.37 | 4.89 | Strong |
| History of hypertension | 3 | 0.01 | 4 | 4 | 7 | 11 | 22.02 | 3.67 | Strong |
| Serum troponin at the time of admission | 3 | 0.01 | 4 | 4 | 12 | 5 | 21.09 | 3.50 | Strong |
| History of β-blockers | 4 | 0.01 | 5 | 4 | 6 | 0.1 | 14.85 | 2.48 | Moderate |
| History of CCB drug | 0.1 | 0.01 | 5 | 4 | 8 | 2 | 13.94 | 2.30 | Moderate |
| Use of streptokinase in the CCU | 3 | 0.01 | 6 | 4 | 4 | 0.01 | 13.14 | 2.19 | Moderate |
| History of hyperlipidemia | 7 | 0.01 | 4 | 0.01 | 2 | 2 | 12.68 | 2.11 | Moderate |
| History of diabetes mellitus | 0.001 | 8 | 2 | 4 | 2 | 0.1 | 12.00 | 2.00 | Moderate |
| Background: Sex | 0.1 | 0.01 | 3 | 4 | 6 | 3 | 11.77 | 1.96 | Weak |
| Serum CK-MB at the time of admission | 3 | 0.01 | 5 | 4 | 3 | 0.01 | 11.68 | 1.94 | Weak |
| Time interval between the onset of pain and the referral to hospital | | | | | | | | | |
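The group totals quoted in the text can be checked against the predictive percentages listed in Table 2. The sketch below sums the strong and moderate groups using the 3% and 2% thresholds from the Methods; the dictionary keys are shortened labels chosen for this illustration. Only the 14 rows with reported values are included, so the weak group cannot be totalled here.

```python
# Predictive percentages as listed in Table 2 (rows with reported values only).
table2_pct = {
    "EKG diagnosis": 14.40,
    "Age": 13.04,
    "Smoking": 8.10,
    "Anti-hyperlipidemic drug": 7.88,
    "Hypertension at admission": 4.89,
    "History of hypertension": 3.67,
    "Serum troponin": 3.50,
    "Beta-blockers": 2.48,
    "CCB drug": 2.30,
    "Streptokinase": 2.19,
    "Hyperlipidemia": 2.11,
    "Diabetes mellitus": 2.00,
    "Sex": 1.96,
    "Serum CK-MB": 1.94,
}

strong_total = sum(v for v in table2_pct.values() if v >= 3.0)
moderate_total = sum(v for v in table2_pct.values() if 2.0 <= v < 3.0)
n_strong = sum(1 for v in table2_pct.values() if v >= 3.0)
n_moderate = sum(1 for v in table2_pct.values() if 2.0 <= v < 3.0)
print(n_strong, round(strong_total, 2), n_moderate, round(moderate_total, 2))
```

The sums come to 55.48 for the seven strong predictors and 11.08 for the five moderate predictors, in line with the 55.5% and 11% reported in the text.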
Data from Table 3 show that behavioral indicators (smoking and seeking help) accounted for 9.5% of the predictive power, background (age and sex) for 15%, medical and laboratory history for 25%, and diagnostic-therapeutic interventions during hospitalization for 26.92% of the predictive power for long-term stay in the CCU.
Table 3. The association rules for C5.0, CRT and rule association models.
4. Discussion
The results of the study indicate the predictive role of 7 of the 19 input variables in the model: a diagnosis of STEMI at the time of referral, the age group of 50–70 years, history of smoking, history of hyperlipidemia, history of hypertension, hypertension at the time of admission, and high serum troponin levels were identified as potent predictors. A review of the literature revealed that few studies have been carried out with a similar approach and target population.
In a data-mining study by Rezaei et al., carried out on 4948 patients to identify the role of three data-mining models in predicting the LOS of heart disease patients in the CCU, the researchers used three approaches: C5.0, SVM and ANN. In that study, the outcome variable, hospital stay, had three categories (less than 5 days, 6–9 days and more than 10 days), and 36 variables were used as predictors. The accuracy of the SVM, ANN and C5.0 models was 83.5, 53.9, and 96.4, respectively. According to the SVM model, the researchers introduced 16 predictors of the patients' LOS in hospital: history of the use of anticoagulants, history of use of nitrate compounds, diagnosis at admission, diastolic blood pressure, cardiac file in echocardiography, disabilities, marital status, chest pain, gender, high-density lipoprotein (HDL), hemoglobin levels, smoking, insurance type, cholesterol, age, and changes in the ST segment of the ECG.
EKG-based diagnosis and smoking history are predictors of LOS in the CCU common to the present study and that study. The difference in sample size is one of the points that can be considered in explaining the differences between the two studies.
In a study by Heller et al., which evaluated the LOS after heart attack in 438 patients, the mean hospitalization time was 13.6 days and 74% of patients stayed in the hospital for more than 10 days. The average LOS in the CCU was 4.5 days. In a regression model, an increase in CK enzymes, Q-wave infarction, anterior infarction, and the use of digoxin and nitrates during admission were associated with an increase in the LOS in the hospital.
Although the mean LOS of the patients in the CCU obtained in that study is similar to ours, our results are not very consistent with it regarding the predictors of longer LOS in the ward. The differences between the predictors entered into the models and between the analytical approaches could be among the most important explanations for this discrepancy. However, EKG-based diagnosis is a predictor common to the two studies.
In a study of predictors of post-stroke hospital LOS in 4113 patients in Japan between 1998 and 2003, Kinjo reported a mean LOS of 31.2 days and found that clinical variables (patient characteristics, severity of stroke, treatment and intra-hospital complications) predicted a 26% increase in LOS, a finding that differs from the present study, in which the variables entered into the model had about 75% predictive power.
Laurencet et al. reviewed 370 patients with an ACS diagnosis to examine delays in the discharge of low-risk patients and reported that the most important reasons for delayed discharge were the need for additional testing, regular decrease in medication, and admission during holidays.
Saczynski et al. studied the effect of reducing the LOS on mortality and on rehospitalization in 4184 patients between 1995 and 2005. Their results revealed that during the 10-year period of the study, hospitalization was reduced by one third, from 7.2 days to 5 days. Younger patients, men and patients without complications had a shorter admission period.
Among other studies on the LOS in cardiac patients, Li et al. showed that the median and mean LOS of patients with myocardial infarction were 13 and 14.6 days in 2001, and 11 and 11.9 days in 2011, respectively.
On the other hand, Karabulut et al. attempted to determine the optimal LOS and the predictors of LOS for 267 patients admitted with a diagnosis of STEMI. In that study, patients were classified according to a LOS of one, two, three, and four or more days and compared with respect to complications. Killip classification, left ventricular ejection, multiple coronary artery disease, and diabetes were the most important predictors of LOS in the study.
Swaminathan et al., in a study of 33920 patients diagnosed with STEMI, classified hospital stay into short, medium and long categories, comprising 26.9%, 46.3% and 26.8% of patients, respectively, and argued that the predictors associated with long stay include older age, female gender, cardiogenic shock and involvement of several coronary arteries.
In a 2001 study of 296 CCU patients, Di Chiara et al. reported that STEMI, NSTEMI and undefined cases accounted for 65%, 30% and 5% of patients, respectively, which is consistent with the findings of the present study. The median LOS was 10 days overall, and 9, 10 and 11 days for STEMI, NSTEMI and undefined cases, respectively.
Finally, a study by Every et al. (1996) of 11,932 patients with myocardial infarction in 19 regional hospitals between 1988 and 1994 showed that the mean LOS in the ward decreased from 8.5 days to 6 days in the final year (1994). The demographic and individual characteristics of the patients accounted for 6% of this decrease, and hospital complications and treatment approaches for 27% of the rest.
Among studies conducted in Iran, Vejdani et al., in a study of 3330 elderly patients, showed that the mean LOS in all wards and in the CCU was 4.8 and 5.1 days, respectively, which is consistent with our results. Regarding the factors related to the LOS in the hospital, there was no relationship between sex and LOS, but there was a direct relationship between age and LOS.
Also, FarajiKhayavi reported a mean LOS in the CCU of 4.15 and 4.22 days in public and private hospitals, respectively. In that study, a significant relationship was found between LOS and the type of stroke, the patient's blood pressure during admission (inverse relation) and age (direct relation).
Finally, although the low sample size and the large number of input predictors of the models are considered to be limitations of this study, the goodness-of-fit of the models and predictive capability of the data show the efficiency of data-mining models and algorithms at a time when classical approaches face analytical challenges.
5. Conclusion
No similar study was found, particularly one that uses data-mining approaches. The important point here is the simplicity and importance of exploiting these approaches in the field of medicine, where they have gained considerable prominence over the last few decades. The next important point is attention to the factors that determine the delay in the discharge of patients from the CCU. The top factor here is smoking as a behavioral predictor. Clinical diagnosis, along with the behavioral tendency to smoke, history of hypertension, and serum troponin level, are the most important predictors. Planning to control smoking and hypertension needs to be emphasized.
Ethics approval and consent to participate
Ethical issues (including plagiarism, informed consent, misconduct, data fabrication and/or falsification, double publication and/or submission, redundancy, etc.) have been completely observed and considered by the authors. The Ethics Committee of Yasuj University of Medical Sciences approved the study protocol (Ethic Code: IR.YUMS.1394.19).
Funding
This work was funded by Yasuj University of Medical Sciences (Grant number: 1916).
Declaration of competing interest
The authors declare that they have no competing interests.
Acknowledgements
The authors are grateful to all respondents in this study.