Abstract
Heart disease remains one of the leading causes of mortality worldwide, making early and accurate prediction vital for effective treatment and prevention. Machine learning and deep learning models have shown promise in medical diagnosis; however, their performance often suffers from noisy or redundant features in clinical datasets. This study proposes a hybrid feature selection and deep neural network (DNN) framework to improve the accuracy of heart disease prediction. The hybrid feature selection combines filter methods (correlation analysis, the chi-square test, and mutual information) with wrapper and embedded approaches (random forest feature importance, recursive feature elimination, and LASSO regularization) to identify the most relevant subset of attributes. The selected features are then used to train a DNN that relies on dropout, L2 regularization, and hyperparameter tuning to limit overfitting. The framework is evaluated on two benchmark datasets, the UCI Cleveland Heart Disease dataset and the Framingham Heart Study dataset, using accuracy, precision, recall, F1-score, and AUC, and is compared with baseline machine learning classifiers (Naïve Bayes, KNN, SVM, and Random Forest). With hybrid feature selection, the DNN reaches 97.67% accuracy (AUC 0.9780) on the Cleveland dataset and 94.70% accuracy (AUC 0.9422) on the Framingham dataset, achieving higher accuracy than both the DNN trained on all features and the baseline classifiers. These results demonstrate the potential of combining feature selection with deep learning to support clinical decision-making in cardiovascular healthcare.
Keywords: Heart disease, Deep Neural Network (DNN), Machine Learning, Hybrid Feature Selection, Deep Learning, baseline models.
1. Introduction
Cardiovascular diseases (CVDs) remain one of the foremost causes of mortality and long-term morbidity worldwide, accounting for a significant proportion of premature deaths and healthcare expenditure. The persistent rise in heart disease incidence is closely linked to lifestyle transitions, aging populations, and the growing prevalence of metabolic disorders such as hypertension, diabetes, and obesity. These trends underscore the urgent need for early detection and accurate risk assessment mechanisms to support timely clinical intervention and reduce adverse outcomes [1]. Although conventional diagnostic practices rely on clinical examinations, expert judgment, and rule-based scoring systems, their effectiveness is often limited by subjectivity and reduced adaptability across diverse patient populations and complex clinical profiles [2].
With the increasing availability of digitized clinical records and large-scale heart disease–related datasets, machine learning (ML) and deep learning (DL) techniques have emerged as powerful tools for computer-aided cardiac diagnosis. A wide range of ML models, including support vector machines, decision trees, random forests, and probabilistic classifiers, have demonstrated improved predictive capability compared with traditional statistical approaches [3], [4]. These techniques leverage clinical attributes such as age, blood pressure, cholesterol levels, chest pain characteristics, and heart rate–related parameters to identify patterns associated with cardiac risk. However, the performance of such models is highly dependent on the relevance and quality of the input features used for training. Clinical datasets frequently contain redundant, weakly informative or noisy attributes that increase model complexity and adversely affect learning efficiency. The presence of irrelevant features can lead to prolonged training time, reduced interpretability, and a higher risk of overfitting, particularly when models are applied to heterogeneous or moderately sized heart study datasets. Feature selection therefore plays a crucial role in identifying clinically meaningful variables that contribute most significantly to disease prediction. Filter-based approaches offer computational simplicity but often ignore feature interdependencies, while wrapper and embedded methods provide better feature evaluation at the cost of higher computational demand [5]. When applied independently, these techniques may yield suboptimal feature subsets.
Deep Neural Networks (DNNs) have gained increasing attention in medical decision support systems due to their ability to capture complex non-linear relationships among clinical variables. Despite their strong representational power, DNNs are sensitive to high-dimensional inputs and irrelevant features, which can negatively impact generalization performance and model stability. These limitations become more pronounced when dealing with imbalanced or diverse heart disease datasets, where predictive accuracy may degrade without appropriate feature optimization. Consequently, integrating feature selection with deep learning has become an important research direction for improving diagnostic reliability [6], [7]. In response to these challenges, this study proposes a hybrid feature selection framework integrated with a Deep Neural Network to enhance heart disease prediction. The proposed approach combines statistical filter methods with wrapper and embedded techniques to effectively reduce feature redundancy while preserving clinically relevant information. By refining the input feature space prior to deep learning–based classification, the framework aims to improve predictive accuracy, robustness, and computational efficiency. Experimental evaluation on representative heart disease and longitudinal heart study datasets demonstrates that the hybrid feature selection–based DNN consistently outperforms baseline models. The findings highlight the potential of combining optimized feature selection strategies with deep learning to support reliable and interpretable clinical decision-making in cardiovascular healthcare.
2. Literature Review
In this section, we review recent and relevant studies on heart disease prediction using machine learning and deep learning techniques, focusing on feature selection strategies, classification models, and performance evaluation across commonly used cardiovascular datasets. The analysis highlights current achievements and identifies the methodological limitations that motivate the proposed hybrid feature selection and deep learning framework.
R. Parthiban et al. [3] analyzed key cardiovascular risk factors using two heterogeneous datasets and multiple ML classifiers. After pre-processing and feature selection, SVM achieved the best performance (91.67% accuracy), highlighting the importance of optimized feature subsets in prediction tasks.
Nithya Shree A. P. et al. [8] proposed an optimized deep multilayer perceptron combined with modified random forest-based feature selection. Validation on the Cleveland dataset showed superior accuracy (97.89%) with reduced error, demonstrating the benefit of integrating feature selection with deep learning.
Andrés Bell-Navasa et al. [9] introduced a deep learning system for heart failure time prediction from echocardiography videos. By combining HODMD-based feature extraction with a self-supervised Vision Transformer, the approach outperformed conventional CNN-based methods.
Zeinab Noroozi et al. [5] evaluated filter, wrapper, and evolutionary feature selection techniques on the Cleveland dataset across multiple classifiers. Results showed that filter-based methods improved accuracy, while wrapper and evolutionary approaches enhanced sensitivity and specificity.
Jalil Nourmohammadi-Khiarak et al. [10] applied a metaheuristic imperialist competitive algorithm for feature selection with KNN classification. The approach effectively reduced dimensionality and improved predictive accuracy compared to conventional optimization techniques.
Bhanu Prakash Doppala et al. [11] proposed a GA-based feature selection method integrated with an RBF neural network for coronary disease detection. Feature reduction significantly improved accuracy from 85.40% to 94.20%, emphasizing the role of attribute optimization.
Iman S. Al-Mahdi et al. [7] introduced a hybrid framework combining genetic algorithm-based feature selection with an ensemble deep learning model optimized by swarm intelligence. The model achieved high accuracy (97.5%) with improved computational efficiency on large datasets.
Mana Saleh Al Reshan et al. [12] designed a hybrid deep neural network integrating CNN, LSTM, and dense layers for heart disease prediction. Evaluation across multiple datasets showed strong performance (98.86% accuracy), validating the effectiveness of hybrid deep architectures.
Senthilkumar Mohan et al. [6] proposed a hybrid random forest–linear model to identify significant cardiovascular features. The method improved prediction accuracy to 88.7%, demonstrating the benefit of combining ensemble and linear techniques.
Muhammad Salman Pathan et al. [13] studied the impact of filter-based feature selection on Cleveland and Framingham datasets. The reduced feature sets improved classification accuracy and reduced training time, especially for complex datasets.
Sushree Chinmayee Patra et al. [4] developed a two-stage ensemble framework using feature-weighted meta-models and hybrid voting classifiers. With only seven features, the system achieved 95.87% accuracy, reducing complexity while maintaining strong predictive performance.
Despite advances in heart disease prediction, current studies often lack systematic feature optimization, rely on single datasets, and apply feature selection methods in isolation. Limited attention to complex datasets and insufficient use of clinically refined features reduce model generalization and interpretability. To address these issues, in this work, we apply a hybrid feature selection approach with an optimized deep neural network.
3. Methodology
In this study, we developed a structured methodology for heart disease prediction by integrating hybrid feature selection with a deep neural network. We first selected appropriate clinical datasets and performed comprehensive preprocessing to handle noise, redundancy, and high dimensionality. We then applied a hybrid feature selection approach to retain only the most relevant attributes, which helped simplify the feature space and strengthen learning efficiency. Based on the selected features, we built a DNN model and evaluated its performance using standard metrics to ensure reliability and generalization.
3.1 Datasets
To evaluate the proposed hybrid framework, we use two widely recognized benchmark heart disease datasets: the UCI Cleveland Heart Disease dataset and the Framingham Heart Study (FHS) dataset. Both datasets contain essential clinical indicators commonly associated with cardiovascular health, enabling the classification of patients into disease and healthy classes.
3.1.1 UCI Cleveland Heart Disease Dataset
We used the UCI Cleveland Heart Disease dataset [14], which consists of 303 patient records described by 14 attributes (13 clinical features and the target), as summarized in Table 1. The outcome variable indicates the presence or absence of heart disease and was treated as a binary class label, where 1 represents disease and 0 represents a normal case.
| Feature | Description |
|---|---|
| age | Age of the patient in years |
| sex | Gender (1 = male, 0 = female) |
| cp | Chest pain type (4 categories) |
| trestbps | Resting blood pressure (mm Hg) |
| chol | Serum cholesterol (mg/dl) |
| fbs | Fasting blood sugar > 120 mg/dl (1 = true, 0 = false) |
| restecg | Resting electrocardiography results |
| thalach | Maximum heart rate achieved |
| exang | Exercise-induced angina (1 = yes, 0 = no) |
| oldpeak | ST depression induced by exercise relative to rest |
| slope | Slope of the peak exercise ST segment |
| ca | Number of major vessels (0–3) colored by fluoroscopy |
| thal | Thallium stress test result |
| target | Heart disease status (1 = present, 0 = not present) |
3.1.2 Framingham Heart Study Dataset
We used the Framingham Heart Study dataset [15], which contains 4240 patient records described by 16 attributes (15 clinical features and the target), as presented in Table 2.
| Attribute | Category | Medical Relevance |
|---|---|---|
| male | Demographic | Gender influences cardiovascular risk; males generally exhibit higher early-onset CVD prevalence |
| age | Demographic | Aging is a significant CVD risk factor |
| education | Socio-economic | Lifestyle and health awareness correlate with CVD |
| currentSmoker | Lifestyle | Smoking contributes to arterial damage |
| cigsPerDay | Lifestyle | Smoking intensity directly increases cardiovascular risk and accelerates plaque formation |
| BPMeds | Medication | Affects blood pressure regulation |
| prevalentStroke | Clinical history | A history of stroke indicates existing cerebrovascular disease |
| prevalentHyp | Clinical history | Pre-existing hypertension is a major CVD risk factor |
| diabetes | Metabolic disorder | Diabetes significantly elevates cardiovascular risk due to vascular inflammation |
| totChol | Biochemical | Elevated total cholesterol is linked to atherosclerosis and CVD |
| sysBP | Blood pressure | Systolic blood pressure; elevated values indicate hypertension |
| diaBP | Blood pressure | Diastolic blood pressure; elevated values indicate hypertension |
| BMI | Physical profile | Obesity correlates with heart disease |
| heartRate | Cardiovascular control | Indicator of stress and cardiac workload |
| glucose | Diabetes indicator | Elevated blood glucose increases CVD risk |
| TenYearCHD | Target variable | Binary indicator of 10-year risk of coronary heart disease (1 = yes, 0 = no) |
3.2 Data Preprocessing
Before training the baseline models, performing hybrid feature selection, and training the DNN, we applied several preprocessing procedures: handling missing values, normalizing the data, and encoding categorical attributes.
3.2.1 Handling Missing Values
The Cleveland dataset contains 6 missing values: 4 in ca and 2 in thal. The Framingham dataset contains 645 missing values in total: 105 in education, 29 in cigsPerDay, 53 in BPMeds, 50 in totChol, 19 in BMI, 1 in heartRate, and 388 in glucose.
In the Cleveland dataset, we imputed ca using KNN imputation and thal using the mode, to obtain clinically meaningful estimates that preserve each attribute's distribution. For the Framingham dataset, we used a hybrid imputation scheme: education and BPMeds were imputed with the mode, totChol and cigsPerDay with the median, BMI and heartRate with the mean, and glucose with KNN-based imputation. This strategy minimized information loss and improved dataset quality for subsequent analysis.
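The imputation plan above can be sketched in plain Python as follows. This is a minimal illustration, not the implementation used in this study: rows are assumed to be dictionaries with `None` marking a missing value, and the neighbour count `k` and the reference columns used for the KNN distance are hypothetical choices, since the text does not specify them. Scikit-Learn's `SimpleImputer` and `KNNImputer` provide the same strategies, and features should be scaled before KNN distances are computed.

```python
import math
import statistics
from collections import Counter


def impute_simple(rows, column, strategy):
    """Fill missing (None) values of `column` in place using mode, median, or mean."""
    observed = [r[column] for r in rows if r[column] is not None]
    if strategy == "mode":
        fill = Counter(observed).most_common(1)[0][0]
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "mean":
        fill = statistics.fmean(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    for r in rows:
        if r[column] is None:
            r[column] = fill


def impute_knn(rows, column, reference_columns, k=5):
    """Fill missing `column` values with the mean of the k nearest donor rows.

    Donors are rows where `column` was originally observed; distance is Euclidean
    over `reference_columns`, which are assumed complete (and ideally scaled).
    """
    donors = [r for r in rows if r[column] is not None]
    for r in rows:
        if r[column] is None:
            point = [r[c] for c in reference_columns]
            nearest = sorted(
                donors,
                key=lambda d: math.dist(point, [d[c] for c in reference_columns]),
            )[:k]
            r[column] = statistics.fmean(d[column] for d in nearest)


# Hypothetical toy records; k=2 and the reference columns are illustrative choices.
rows = [
    {"age": 50, "sysBP": 130.0, "BMI": 25.0, "glucose": 80.0},
    {"age": 52, "sysBP": 135.0, "BMI": None, "glucose": 90.0},
    {"age": 70, "sysBP": 170.0, "BMI": 31.0, "glucose": 120.0},
    {"age": 51, "sysBP": 132.0, "BMI": 28.0, "glucose": None},
]
impute_simple(rows, "BMI", "mean")                  # BMI gap -> mean of 25, 31, 28
impute_knn(rows, "glucose", ["age", "sysBP"], k=2)  # glucose gap -> mean of 2 nearest donors
```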
3.2.2 Normalization Techniques
We applied Min–Max normalization to scale all feature values into the range 0 to 1, as defined in Equation 1. Because this is a linear transformation, it preserves the shape of each feature's distribution while preventing high-magnitude features (such as cholesterol) from dominating model training.
$$X_{scaled} = \frac{X - X_{min}}{X_{max} - X_{min}} \tag{1}$$
Here $X$ is the original feature value, $X_{min}$ and $X_{max}$ are the minimum and maximum values of the feature, and $X_{scaled}$ is the value after scaling (range: 0 to 1).
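Equation 1 can be applied per feature as in the sketch below. It assumes, as is common practice but not stated above, that the minimum and maximum are computed on the training split only and reused for the test split, so test values may fall slightly outside [0, 1]; the constant-column guard is likewise an added assumption.

```python
def fit_min_max(train_column):
    """Return (X_min, X_max) computed on the training data only (assumed practice)."""
    return min(train_column), max(train_column)


def min_max_scale(column, x_min, x_max):
    """Apply Equation 1: (X - X_min) / (X_max - X_min).

    A constant feature (X_max == X_min) is mapped to 0.0 to avoid division by zero;
    this guard is an added assumption.
    """
    span = x_max - x_min
    if span == 0:
        return [0.0 for _ in column]
    return [(x - x_min) / span for x in column]


train_chol = [200, 250, 300]  # hypothetical serum cholesterol values (mg/dl)
test_chol = [225, 320]
lo, hi = fit_min_max(train_chol)
print(min_max_scale(train_chol, lo, hi))  # [0.0, 0.5, 1.0]
print(min_max_scale(test_chol, lo, hi))   # [0.25, 1.2]
```

Reusing the training-set bounds on the test set keeps information about the test data from leaking into preprocessing.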
3.2.3 Balancing Datasets


3.2.4 Features Correlation
We analyzed the correlation patterns in both the Cleveland and Framingham datasets to understand the individual influence of clinical attributes on heart disease outcomes. As illustrated in Figure 3, the Cleveland dataset exhibits strong to moderate associations between the target variable and features such as thal, ca, exang, oldpeak, and chest pain type, indicating their significant role in disease prediction. In contrast, the Framingham dataset shows relatively weaker correlations overall, as presented in Figure 4, where age, systolic and diastolic blood pressure, and prevalent hypertension emerge as the most influential predictors. Other metabolic and lifestyle factors, including cholesterol, BMI, smoking-related variables, and heart rate, demonstrate limited standalone impact. These observations highlight the distinct feature–outcome relationships across the two datasets and justify the need for dataset-specific modeling and feature selection strategies.
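Feature-to-target correlations like those discussed above are commonly measured with the Pearson coefficient (the default of pandas `DataFrame.corr()`); the text does not state which coefficient Figures 3 and 4 use, so Pearson is an assumption here. The sketch below ranks features by the absolute value of that coefficient; the four-patient toy data are invented purely for illustration and are not drawn from either dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def rank_by_target_correlation(features, target):
    """Return (name, r) pairs sorted by |r| with the target, strongest first."""
    scores = {name: pearson(column, target) for name, column in features.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Invented four-patient example (not from either dataset).
target = [0, 0, 1, 1]
features = {
    "chol": [210, 250, 230, 240],
    "thalach": [170, 160, 130, 120],
    "exang": [0, 0, 1, 1],
}
for name, r in rank_by_target_correlation(features, target):
    print(f"{name}: {r:+.3f}")
```

Sorting by absolute value keeps strongly negative predictors (here the toy thalach column) near the top alongside positive ones.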


3.3 Hybrid Feature Selection Framework
Clinical cardiovascular datasets often contain redundant and noisy attributes that increase model complexity, training time, and the risk of overfitting. To overcome these limitations, we applied a two-phase hybrid feature selection strategy to identify the most relevant attributes for heart disease prediction, as shown in Figure 5.
In the first phase, we employed filter-based methods that evaluate features using statistical relationships with the target variable. We used correlation analysis to remove features with very low association to the disease outcome and to eliminate highly correlated variables that introduce redundancy. The chi-square test was applied to assess the significance of categorical attributes, retaining only those with strong predictive relevance. In addition, mutual information was used to capture both linear and nonlinear dependencies, allowing us to preserve clinically important features that may not show simple linear relationships. Based on these filter techniques, we selected thalach, oldpeak, sex, cp, exang, thal, ca, slope, and chol from the Cleveland dataset, and cigsPerDay, diaBP, BPMeds, sysBP, male, BMI, diabetes, age, prevalentStroke, prevalentHyp, and glucose from the Framingham dataset.
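Two of the filter scores described above, the chi-square statistic and mutual information, can be computed for a discrete feature against the binary target as sketched below. This is an illustration rather than the study's code: the threshold `min_mi` is hypothetical, the toy columns are invented, and continuous attributes such as thalach or chol would first need discretization (Scikit-Learn's `chi2` requires non-negative features, and `mutual_info_classif` estimates mutual information for continuous features using nearest neighbours).

```python
import math
from collections import Counter


def chi_square(feature, target):
    """Pearson chi-square statistic of the feature-by-target contingency table."""
    n = len(feature)
    joint = Counter(zip(feature, target))
    feature_counts, target_counts = Counter(feature), Counter(target)
    stat = 0.0
    for f, f_count in feature_counts.items():
        for t, t_count in target_counts.items():
            expected = f_count * t_count / n
            stat += (joint[(f, t)] - expected) ** 2 / expected
    return stat


def mutual_information(feature, target):
    """Mutual information (in nats) between two discrete variables."""
    n = len(feature)
    joint = Counter(zip(feature, target))
    feature_counts, target_counts = Counter(feature), Counter(target)
    return sum(
        (c / n) * math.log((c / n) / ((feature_counts[f] / n) * (target_counts[t] / n)))
        for (f, t), c in joint.items()
    )


def filter_select(features, target, min_mi=0.01):
    """Keep features whose mutual information with the target exceeds `min_mi` (hypothetical threshold)."""
    return [name for name, col in features.items() if mutual_information(col, target) > min_mi]


# Invented binary columns for illustration only.
target = [0, 0, 1, 1]
exang = [0, 0, 1, 1]  # perfectly aligned with the target
fbs = [0, 1, 0, 1]    # independent of the target in this toy sample
print(chi_square(exang, target), chi_square(fbs, target))  # 4.0 0.0
print(filter_select({"exang": exang, "fbs": fbs}, target))  # ['exang']
```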
In the second phase, we refined the feature set using wrapper and embedded methods that account for feature interactions and model performance. Random Forest feature importance ranked attributes by their contribution to impurity reduction, capturing nonlinear relationships among risk factors. Recursive Feature Elimination iteratively removed the least informative features to retain only the most influential predictors, while LASSO regularization further simplified the feature space by shrinking the coefficients of non-contributing features to exactly zero. After applying these methods, the final selected features were thalach, oldpeak, cp, thal, ca, and chol for the Cleveland dataset, and cigsPerDay, diaBP, sysBP, male, BMI, age, and glucose for the Framingham dataset. These hybrid-selected subsets reduce dimensionality and redundancy while preserving the most clinically and statistically significant predictors, providing a robust foundation for subsequent machine learning and deep learning model training.
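Of the three second-phase methods, LASSO is the easiest to show in a few lines. The sketch below fits an L1-penalized linear model by cyclic coordinate descent with soft-thresholding, which is how the coefficients of uninformative features end up exactly zero. It is a simplified stand-in, not the study's implementation: it uses a squared-error loss on centred 0/1 labels rather than an L1-penalized logistic model, fits no intercept, and the penalty `lam` and toy data are hypothetical. Library equivalents include Scikit-Learn's `Lasso`, `LogisticRegression(penalty="l1", solver="liblinear")`, and `RFE` for recursive feature elimination.

```python
def soft_threshold(rho, lam):
    """Soft-thresholding operator S(rho, lam) = sign(rho) * max(|rho| - lam, 0)."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/2n) * ||y - Xw||^2 + lam * ||w||_1 by cyclic coordinate descent.

    Simplifying assumptions: columns of X and y are already centred, so no intercept
    is fitted, and a squared-error loss is used on the (centred) 0/1 labels.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # y - Xw, with w initialised to zero
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0:
                continue
            # Correlation of feature j with the residual, adding back its own contribution.
            rho = sum(c * (r + w[j] * c) for c, r in zip(col, residual)) / n
            new_wj = soft_threshold(rho, lam) / z
            delta = new_wj - w[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                w[j] = new_wj
    return w


# Hypothetical centred toy data: feature 0 tracks the label, feature 1 is noise.
X = [[-1.5, 1.0], [-0.5, -1.0], [0.5, -1.0], [1.5, 1.0]]
y = [-0.5, -0.5, 0.5, 0.5]  # 0/1 labels minus their mean
weights = lasso_coordinate_descent(X, y, lam=0.1)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
print(weights, selected)  # feature 1 receives a weight of exactly 0.0
```

Raising `lam` zeroes out more coefficients, so in practice the penalty would be tuned (for example by cross-validation) to control how many features survive.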

3.4 Proposed Deep Neural Network (DNN) Architecture
We designed a Deep Neural Network (DNN)–based binary classification model to predict the presence or absence of heart disease using the most informative features selected by the hybrid feature selection framework, as shown in Figure 6. Reducing input dimensionality before training restricts the model to clinically relevant inputs, which improves generalization and supports reliable prediction. The DNN consists of an input layer sized to the selected clinical features, followed by two fully connected hidden layers with 64 and 32 neurons, respectively. The hidden layers use the ReLU activation (Equation 2) to capture nonlinear relationships among clinical attributes. To control overfitting and improve robustness, we applied dropout with a rate of 0.3 together with L2 regularization. The output layer is a single neuron with a sigmoid activation (Equation 3) that produces the predicted probability of heart disease.
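A forward pass through the 64–32–1 architecture described above can be sketched in plain Python as follows. Several details are assumptions rather than statements from the text: dropout is placed after each hidden layer, the weights use a He-style uniform initialization, and the L2 coefficient `lam` is hypothetical. With TensorFlow/Keras, the same layers would typically be built from `tf.keras.layers.Dense` (with a `tf.keras.regularizers.l2` kernel regularizer) and `tf.keras.layers.Dropout(0.3)`.

```python
import math
import random


def dense(inputs, weights, biases):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(z):
    # Numerically stable two-branch form of 1 / (1 + e^-z).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dropout(values, rate, rng, training):
    """Inverted dropout: zero each unit with probability `rate` and rescale survivors; identity at inference."""
    if not training:
        return values
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def init_layer(n_in, n_out, rng):
    # Assumed He-style uniform initialisation; the text does not state the initialiser.
    limit = math.sqrt(6.0 / n_in)
    weights = [[rng.uniform(-limit, limit) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_dnn(n_features, rng):
    """Input -> Dense(64, ReLU) -> Dense(32, ReLU) -> Dense(1, sigmoid)."""
    return [init_layer(n_features, 64, rng), init_layer(64, 32, rng), init_layer(32, 1, rng)]


def forward(params, x, rng, training=False, rate=0.3):
    """Return the predicted probability of heart disease for one input vector."""
    (w1, b1), (w2, b2), (w3, b3) = params
    h1 = dropout(relu(dense(x, w1, b1)), rate, rng, training)  # dropout placement assumed
    h2 = dropout(relu(dense(h1, w2, b2)), rate, rng, training)
    return sigmoid(dense(h2, w3, b3)[0])


def l2_penalty(params, lam=1e-3):
    """L2 term added to the training loss; `lam` is a hypothetical value."""
    return lam * sum(w * w for weights, _ in params for row in weights for w in row)


rng = random.Random(0)
params = build_dnn(6, rng)  # 6 = size of the hybrid-selected Cleveland feature set
x = [0.70, 0.20, 0.33, 1.00, 0.00, 0.45]  # hypothetical min-max-scaled patient record
p = forward(params, x, rng)
print(f"P(disease) = {p:.3f}")
```

With `training=False` dropout is disabled, so the prediction is deterministic; the L2 penalty would be added to the loss only during training.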

We trained the model with forward and backward propagation, minimizing the binary cross-entropy loss (Equation 4) with gradient-based learning. We used the Adam optimizer because of its fast convergence on clinical data. Hyperparameters such as the learning rate, batch size, and number of epochs were tuned through systematic experimentation, and early stopping halted training when validation performance ceased to improve. Overall, this optimized DNN framework enabled accurate and reliable heart disease prediction while maintaining computational efficiency.
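Patience-based early stopping, as described above, can be sketched as a scan over per-epoch validation losses. The `patience` and `min_delta` values are hypothetical because the text does not report them; in Keras the equivalent is the `tf.keras.callbacks.EarlyStopping` callback with `monitor="val_loss"` and `restore_best_weights=True`.

```python
def early_stopping_epoch(val_losses, patience=10, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for patience-based early stopping.

    Training stops once `patience` consecutive epochs fail to improve the best
    validation loss by more than `min_delta`; weights from best_epoch are kept.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1


# Hypothetical validation-loss curve that starts to rise after epoch 3.
losses = [0.69, 0.52, 0.44, 0.41, 0.42, 0.43, 0.45, 0.47]
print(early_stopping_epoch(losses, patience=3))  # (3, 6)
```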
$$f(x) = \max(0, x) \tag{2}$$
Here $f(x)$ is the ReLU output and $x$ is the input to the neuron: $f(x) = x$ when $x$ is positive, and $f(x) = 0$ when $x$ is negative or zero.
$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{3}$$
Here $\sigma(x)$ is the output of the sigmoid function, $x$ is the input to the neuron, and $e$ is Euler's number (≈ 2.71828). The function squashes its input into the range 0 to 1, which suits binary classification tasks such as heart disease prediction.
$$\text{Loss} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i)\right] \tag{4}$$
Here $N$ is the total number of samples, $y_i$ is the true label of sample $i$ (0 or 1), and $\hat{y}_i$ is the predicted probability of class 1 for sample $i$. This function measures how well the predicted probabilities match the actual binary class labels.
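Equations 2 to 4 translate directly into code, as in the minimal sketch below. The probability clipping constant `eps` is an added assumption that guards against `log(0)` (deep learning frameworks apply a similar small-epsilon clip), and the sigmoid is written in a numerically stable two-branch form that is mathematically identical to Equation 3.

```python
import math


def relu(x):
    """Equation 2."""
    return max(0.0, x)


def sigmoid(x):
    """Equation 3, in a numerically stable two-branch form."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Equation 4, averaged over N samples; probabilities are clipped to [eps, 1 - eps]."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


print(relu(-2.0), relu(3.0))                     # 0.0 3.0
print(sigmoid(0.0))                              # 0.5
print(binary_cross_entropy([1, 0], [0.9, 0.2]))  # ≈ 0.1643
```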

3.5 Performance Evaluation Metrics
The proposed hybrid feature selection and Deep Neural Network framework is evaluated using the classification performance indicators most widely used in the healthcare domain. These metrics are derived from the confusion matrix shown in Figure 8.
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | TP | FN |
| Actual Negative | FP | TN |

Fig 8: Confusion matrix
Here, True Positives (TP) are patients correctly identified as having heart disease, True Negatives (TN) are correctly classified healthy individuals, False Positives (FP) occur when healthy persons are incorrectly predicted as diseased, and False Negatives (FN) occur when heart disease patients are wrongly classified as healthy. Based on these counts, the following evaluation metrics are computed:
Accuracy: It is the proportion of correct predictions among all classified cases.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{5}$$
Precision: It is the proportion of correctly identified positive cases out of all cases predicted as positive.
$$\text{Precision} = \frac{TP}{TP + FP} \tag{6}$$
Recall: Recall is also known as sensitivity or true positive rate and it is the proportion of correctly identified positive cases out of all actual positive cases.
$$\text{Recall} = \frac{TP}{TP + FN} \tag{7}$$
This metric is highly critical because missing a positive case can lead to severe medical consequences. A higher recall indicates that the model effectively identifies high-risk patients, reducing life-threatening misdiagnoses.
F1-Score: The F1-score is the harmonic mean of Precision and Recall. It balances the trade-off between avoiding false positives and false negatives.
$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{8}$$
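Equations 5 to 8 can be computed from predicted labels as sketched below. The 0.5 decision threshold used to turn sigmoid probabilities into labels is an assumption (the text does not state one), and the confusion-matrix counts in the usage example are hypothetical.

```python
def threshold(probabilities, cutoff=0.5):
    """Convert sigmoid outputs to 0/1 labels; the 0.5 cutoff is an assumption."""
    return [1 if p >= cutoff else 0 for p in probabilities]


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) with 1 = heart disease, 0 = healthy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Equations 5-8; undefined ratios (zero denominators) are reported as 0.0."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for a 100-patient test set.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print({name: round(value, 4) for name, value in metrics.items()})
# {'accuracy': 0.85, 'precision': 0.8889, 'recall': 0.8, 'f1': 0.8421}
```

Since the F1-score also equals 2TP / (2TP + FP + FN), precision, recall, and F1-score reported together should be mutually consistent, which makes this a convenient sanity check on result tables.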
AUC-ROC: The ROC (Receiver Operating Characteristic) curve plots True Positive Rate vs. False Positive Rate at different classification thresholds. The AUC (Area Under the Curve) measures the ability of the model to distinguish between positive and negative classes. An AUC value close to 1.0 indicates excellent separability between heart disease-positive and negative cases, whereas an AUC value near 0.5 suggests that the model is performing no better than random guessing.
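The AUC can be computed without explicitly tracing the ROC curve, using its equivalent interpretation as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (ties count as one half). The sketch below is O(P·N) and meant for illustration; Scikit-Learn's `roc_auc_score` computes the same quantity efficiently.

```python
def roc_auc(y_true, scores):
    """AUC as P(score of random positive > score of random negative), ties counted as 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities for four patients (1 = heart disease).
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

A perfectly separating model scores 1.0 and constant scores give 0.5, matching the interpretation in the paragraph above.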
4. Experimental Setup and Results Analysis
4.1 Experimental Setup
We designed the experimental setup to implement and evaluate the proposed hybrid feature selection and DNN-based classification model for heart disease prediction. The implementation is written in Python: TensorFlow and Keras are used for the DNN model, Scikit-Learn for preprocessing and hybrid feature selection, and Pandas, NumPy, and visualization libraries for data handling and performance analysis. The experiments were executed on a suitable hardware setup to ensure efficient computation and reproducible results. Each dataset is divided into training (80%) and testing (20%) subsets.
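An 80%/20% split can be sketched as below. Stratifying by class and fixing the random seed are assumptions, since the text only gives the split ratio; with Scikit-Learn the same split is typically obtained with `train_test_split` using `test_size=0.2` and `stratify=y`.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return sorted (train_idx, test_idx), keeping each class's proportion similar in both parts.

    The seed value is hypothetical; fixing it makes the split reproducible.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical label vector: 60 disease cases and 40 healthy cases.
labels = [1] * 60 + [0] * 40
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 80 20
```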
4.2 Results Analysis: With vs. Without Hybrid Feature Selection
In this work, we evaluated the performance of the DNN model both before and after applying the proposed hybrid feature selection approach. We carried out a comparative analysis to clearly examine how feature optimization influences the predictive capability of the model in heart disease classification. By analyzing the results obtained with and without hybrid feature selection, we demonstrate a noticeable improvement in prediction accuracy after feature refinement. The detailed performance outcomes for both datasets are reported in Table 4. Furthermore, Figure 4 illustrates the performance of the DNN model without feature selection, while Figure 5 shows the improved results achieved using the proposed hybrid feature selection combined with the DNN classifier.




| Model Setting | Dataset | Accuracy | Precision | Recall | F1-score | AUC |
|---|---|---|---|---|---|---|
| DNN without Feature Selection | UCI Cleveland Heart Disease | 88.33% | 90.57% | 87.81% | 88.52% | 0.9620 |
| DNN without Feature Selection | Framingham Heart Study | 87.06% | 81.54% | 84.79% | 85.89% | 0.9334 |
| DNN with Hybrid Feature Selection | UCI Cleveland Heart Disease | 97.67% | 95.50% | 93.33% | 95.37% | 0.9780 |
| DNN with Hybrid Feature Selection | Framingham Heart Study | 94.70% | 87.86% | 89.38% | 91.49% | 0.9422 |


4.3 Comparative Analysis with Baseline Models
We used commonly applied machine learning classifiers, namely SVM, Random Forest, KNN, and Naïve Bayes, as baseline models to evaluate the effectiveness of the proposed framework. All models were trained with the same dataset configuration and identical preprocessing steps, and were assessed with the same evaluation metrics to ensure a fair comparison. The evaluation was conducted in two stages: first, training the models without feature selection, and second, incorporating the proposed hybrid feature selection approach. This structured comparison shows the performance gains achieved through feature selection and highlights the effectiveness of combining hybrid feature selection with the DNN model. The comparative results, summarized in Table 5, show that the proposed approach outperforms the conventional machine learning methods in accuracy and AUC for heart disease prediction on both datasets.
| Algorithm | Dataset | Accuracy | Precision | Recall | F1-score | AUC |
|---|---|---|---|---|---|---|
| Naïve Bayes | UCI Cleveland Heart Disease | 87.78% | 87.80% | 85.71% | 86.75% | 0.9345 |
| KNN | UCI Cleveland Heart Disease | 75.82% | 72.73% | 76.73% | 74.42% | 0.8379 |
| SVM | UCI Cleveland Heart Disease | 68.89% | 71.88% | 54.76% | 62.16% | 0.7897 |
| RF | UCI Cleveland Heart Disease | 83.33% | 86.49% | 76.19% | 81.01% | 0.9330 |
| DNN (Proposed) | UCI Cleveland Heart Disease | 88.33% | 90.57% | 87.81% | 88.52% | 0.9620 |
| Naïve Bayes | Framingham Heart Study | 82.88% | 79.18% | 82.75% | 78.79% | 0.8113 |
| KNN | Framingham Heart Study | 84.94% | 75.00% | 76.22% | 89.96% | 0.8140 |
| SVM | Framingham Heart Study | 84.79% | 77.80% | 78.70% | 79.90% | 0.8499 |
| RF | Framingham Heart Study | 84.97% | 62.50% | 62.99% | 65.71% | 0.7890 |
| DNN (Proposed) | Framingham Heart Study | 87.06% | 81.54% | 84.79% | 85.89% | 0.9334 |

| Algorithm | Dataset | Accuracy | Precision | Recall | F1-score | AUC |
|---|---|---|---|---|---|---|
| Naïve Bayes | UCI Cleveland Heart Disease | 85.56% | 85.37% | 83.33% | 84.34% | 0.9142 |
| KNN | UCI Cleveland Heart Disease | 72.22% | 67.35% | 78.57% | 72.53% | 0.7765 |
| SVM | UCI Cleveland Heart Disease | 83.33% | 86.49% | 76.19% | 81.016% | 0.9281 |
| RF | UCI Cleveland Heart Disease | 81.11% | 83.78% | 73.81% | 78.48% | 0.9187 |
| Hybrid FS + DNN (Proposed) | UCI Cleveland Heart Disease | 97.67% | 95.50% | 93.33% | 95.37% | 0.9780 |
| Naïve Bayes | Framingham Heart Study | 82.79% | 77.50% | 79.76% | 80.88% | 0.7921 |
| KNN | Framingham Heart Study | 83.52% | 79.04% | 81.98% | 88.10% | 0.8388 |
| SVM | Framingham Heart Study | 84.88% | 1.000% | 80.60% | 81.19% | 0.8821 |
| RF | Framingham Heart Study | 84.34% | 82.42% | 88.38% | 84.00% | 0.8956 |
| Hybrid FS + DNN (Proposed) | Framingham Heart Study | 94.70% | 87.86% | 89.38% | 91.49% | 0.9422 |




On the Cleveland dataset, the DNN without feature selection achieved an accuracy of 88.33%, precision of 90.57%, recall of 87.81%, F1-score of 88.52%, and AUC of 0.9620, outperforming the Naïve Bayes, KNN, SVM, and Random Forest classifiers. On the Framingham dataset, it attained an accuracy of 87.06%, precision of 81.54%, recall of 84.79%, F1-score of 85.89%, and AUC of 0.9334. Adding hybrid feature selection produced substantial improvements on both datasets. On the UCI Cleveland dataset, the proposed Hybrid Feature Selection with DNN model achieved an accuracy of 97.67%, precision of 95.50%, recall of 93.33%, F1-score of 95.37%, and AUC of 0.9780, surpassing all baseline classifiers and the DNN without feature selection. On the Framingham dataset, the optimized model recorded an accuracy of 94.70%, precision of 87.86%, recall of 89.38%, F1-score of 91.49%, and AUC of 0.9422. These results show that the proposed hybrid feature selection with DNN model predicts heart disease more accurately than the traditional machine learning models evaluated in this study.
5. Conclusion and Future Work
In this study, we presented a Deep Neural Network–based framework for heart disease prediction and analyzed the impact of hybrid feature selection using the UCI Cleveland and Framingham Heart Study datasets. Without feature selection, the DNN achieved 88.33% accuracy on the Cleveland dataset and 87.06% accuracy on the Framingham dataset, higher accuracy than conventional classifiers such as Naïve Bayes, KNN, SVM, and Random Forest. Integrating hybrid feature selection further improved predictive performance by removing redundant and irrelevant features: the Hybrid Feature Selection with DNN model achieved 97.67% accuracy with an AUC of 0.9780 on the Cleveland dataset and 94.70% accuracy with an AUC of 0.9422 on the Framingham dataset, with consistent gains in precision, recall, and F1-score across both datasets. These results indicate that feature optimization plays a crucial role in improving learning efficiency, generalization, and clinical relevance in deep learning–based cardiac risk prediction. Future work will focus on validating the framework on larger and more diverse clinical datasets and extending it to multimodal data sources such as ECG signals and medical imaging. Incorporating explainable AI techniques will also improve interpretability and support clinical decision-making, helping the model evolve into a practical and reliable tool for early heart disease detection.
References
[1] World Health Organization, "Cardiovascular diseases (CVDs): Fact sheet," 2023. DOI: 10.1097/grh.0000000000000052
[2] R. B. D'Agostino Sr, R. S. Vasan, M. J. Pencina, P. A. Wolf, M. Cobain, J. M. Massaro, and W. B. Kannel, "General Cardiovascular Risk Profile for Use in Primary Care: The Framingham Heart Study," Circulation, vol. 117, pp. 743–753, 2008. DOI: 10.1161/CIRCULATIONAHA.107.699579
[3] R. Parthiban and K. Santhosh Kumar, "Improved Heart Disease Prediction Accuracy Using Optimized Feature Selection Techniques in Neural Networks," NeuroQuantology, vol. 20, no. 15, pp. 1341–1349, Nov. 2022. DOI: 10.14704/NQ.2022.20.15.NQ88122
[4] Sushree Chinmayee Patra, B. Uma Maheswari, and Peeta Basa Pati, "Forecasting Coronary Heart Disease Risk With a 2-Step Hybrid Ensemble Learning Method and Forward Feature Selection Algorithm," IEEE Access, 30 Nov. 2023. DOI: 10.1109/ACCESS.2023.3338369
[5] Zeinab Noroozi, Azam Orooji, and Leila Erfannia, "Analyzing the impact of feature selection methods on machine learning algorithms for heart disease prediction," Scientific Reports, vol. 13, 22588, 2023. DOI: 10.1038/s41598-023-49962-w
[6] Senthilkumar Mohan, Chandrasegar Thirumalai, and Gautam Srivastava, "Effective Heart Disease Prediction Using Hybrid Machine Learning Techniques," IEEE Access, 19 June 2019. DOI: 10.1109/ACCESS.2019.2923707
[7] Iman S. Al-Mahdi, Saad M. Darwish, and Magda M. Madbouly, "Heart Disease Prediction Model Using Feature Selection and Ensemble Deep Learning with Optimized Weight," Computer Modeling in Engineering & Sciences, 11 Apr. 2025. DOI: 10.32604/cmes.2025.061623
[8] Nithya Shree A. P. and R. Kannan, "A Novel Heart Disease Prediction System using Deep Multi-Layer Perceptron and Optimal Feature Selection Mechanism," International Journal of Membrane Science and Technology, vol. 10, no. 1, pp. 1813–1822, 2023. DOI: 10.15379/ijmst.v10i1.3410
[9] Andrés Bell-Navasa, María Villalba-Orero, Enrique Lara-Pezzi, Jesús Garicano-Mena, and Soledad Le Clainche, "Heart Failure Prediction using Modal Decomposition and Masked Autoencoders for Scarce Echocardiography Databases," arXiv:2504.07606v2 [eess.IV], 7 May 2025. DOI: 10.1016/j.eswa.2024.125849
[10] Jalil Nourmohammadi-Khiarak, Mohammad-Reza Feizi-Derakhshi, Khadijeh Behrouzi, Samaneh Mazaheri, Yashar Zamani-Harghalani, and Rohollah Moosavi Tayebi, "New hybrid method for heart disease diagnosis utilizing optimization algorithm in feature selection," Health and Technology, vol. 10, pp. 667–678, 2020. DOI: 10.1007/s12553-019-00396-3
[11] Bhanu Prakash Doppala, Debnath Bhattacharyya, Midhun Chakkravarthy, and Tai-hoon Kim, "A hybrid machine learning approach to identify coronary diseases using feature selection mechanism on heart disease dataset," Distributed and Parallel Databases, vol. 41, pp. 1–20, 2023. DOI: 10.1007/s10619-021-07329-y
[12] Mana Saleh Al Reshan, Samina Amin, Muhammad Ali Zeb, Adel Sulaiman, Hani Alshahrani, and Asadullah Shaikh, "A Robust Heart Disease Prediction System Using Hybrid Deep Neural Networks," IEEE Access, 31 Oct. 2023. DOI: 10.1109/ACCESS.2023.3328909
[13] Muhammad Salman Pathan, Avishek Nag, Muhammad Mohisn Pathan, and Soumyabrata Dev, "Analyzing the impact of feature selection on the accuracy of heart disease prediction," Healthcare Analytics, vol. 2, 100060, 28 Apr. 2022. DOI: 10.1016/j.health.2022.100060
[14] Heart Disease Data Set, UCI Machine Learning Repository. DOI: 10.5220/0013516000004619
[15] Framingham heart disease dataset. DOI: 10.4159/harvard.9780674492097.c13