Abstract

This study investigated the performance of machine learning based data leakage detection and prevention techniques under enterprise-oriented conditions. Supervised and unsupervised models shared the same pre-processing pipeline, and feature engineering, model implementation, and evaluation metrics were aligned within a single experimental design to ensure reproducibility. The feature set, which combines TF-IDF representations, behavioural ratios, and contextual indicators, was engineered to realistically reflect insider leakage. The evaluated models comprised supervised models (Logistic Regression, Support Vector Machine, and Random Forest), unsupervised models (Isolation Forest and K-Means), and deep learning models (Deep Neural Network (DNN) and Long Short-Term Memory (LSTM)), all assessed using a 70:30 train-test split and 5-fold cross validation. The experimental findings show that the deep learning models outperformed the traditional approaches, with LSTM performing best: a detection accuracy of 96.1%, an F1-score of 96.1%, and a ROC–AUC of 0.97 at a low false positive rate of 4.5%. The resulting hybrid ML framework improved performance further, reaching an accuracy of 96.8%, an F1-score of 96.6%, and a false alarm rate of 3.9%. Random Forest offered a balance between accuracy (94.5%) and interpretability. By contrast, the unsupervised Isolation Forest produced a high false positive rate of 13.5%, which makes it unsuitable for real-time prevention. The study also revealed an important trade-off between accuracy and latency, highlighting the need for hybrid, explainable, and scalable frameworks for real-time data leakage detection and prevention in enterprise settings.
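The evaluation protocol summarised above (a 70:30 hold-out split, 5-fold cross validation, and accuracy, F1-score, ROC–AUC and false positive rate as metrics) can be sketched in plain Python. This is a minimal illustration, not the authors' code: it assumes binary labels (1 = leakage event, 0 = benign), model scores in [0, 1], and a 0.5 decision threshold. The helper names (`confusion_counts`, `detection_metrics`, `roc_auc`, `split_70_30`, `kfold_indices`) and the toy data are hypothetical, and the splits are unstratified for brevity.

```python
import random
from typing import Dict, Iterator, List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (tp, fp, tn, fn) for binary labels, where 1 = leakage event and 0 = benign."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1  # benign event flagged as leakage (false alarm)
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1  # missed leakage event
    return tp, fp, tn, fn


def detection_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, F1-score and false positive rate from hard predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,  # share of benign events raised as alerts
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random positive scores above a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def split_70_30(n: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle indices 0..n-1 and cut them into a 70% train / 30% test hold-out split."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(0.7 * n)
    return idx[:cut], idx[cut:]


def kfold_indices(n: int, k: int = 5, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, validation) position lists for unstratified k-fold cross validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


if __name__ == "__main__":
    # Toy labels and scores, invented for illustration only.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.2, 0.8, 0.4, 0.1, 0.6, 0.7, 0.3]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print(detection_metrics(y_true, y_pred))
    print(roc_auc(y_true, scores))
```

The sketch uses only the standard library so that every formula is visible. In practice, scikit-learn offers tested equivalents (`train_test_split`, `StratifiedKFold`, `f1_score`, `roc_auc_score`), and a stratified split is usually preferable when leakage events are rare.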

Keywords

  • Data Leakage Detection
  • Data Leakage Prevention (DLP)
  • Machine Learning
  • Deep Learning
  • Insider Threat
