A SYSTEMATIC LITERATURE REVIEW ON MACHINE LEARNING MODELS FOR ACTIVE DIRECTORY ATTACK DETECTION USING WINDOWS SECURITY EVENT LOGS
DOI: https://doi.org/10.66571/tsarka-3134-6057-04
Keywords: Active Directory, threat detection, hybrid machine learning, deep learning, intrusion detection, Windows Event Log, Kerberoasting, lateral movement, provenance graph, ensemble learning, anomaly detection, SIEM, APT detection
Abstract
Active Directory (AD) is the foundational identity and access management infrastructure in the vast majority of enterprise Windows environments and has become a primary target of sophisticated cyberattacks, including Kerberoasting, Pass-the-Hash, Golden Ticket, DCSync, lateral movement, and Advanced Persistent Threat (APT) campaigns. Traditional signature-based detection mechanisms have proven systematically inadequate against stealthy, behaviorally adaptive adversaries who exploit legitimate protocol functionality to evade detection. This literature review systematically examines 30 peer-reviewed and technically validated works published between 2018 and 2025 that focus on detecting threats in AD environments using machine learning, deep learning, and hybrid model architectures. The reviewed works are analyzed across five dimensions: AD-specific attack taxonomies, Windows Security Event Log feature engineering, classical and deep learning detection algorithms, hybrid and ensemble architectures, and graph-based approaches integrating provenance analysis and attack graphs. Key findings indicate that no single algorithm achieves comprehensive coverage across the diversity of AD attack types; instead, hybrid architectures combining sequential modeling (LSTM, BiLSTM), graph-based analysis (GNN), and ensemble classification (Random Forest, XGBoost) demonstrate superior detection accuracy and practical deployability. Provenance-graph-based systems show particular promise for detecting multi-stage APT campaigns in real AD environments. Five critical research gaps are identified: the absence of large-scale labeled AD-specific Security Event Log datasets, the lack of unified hybrid frameworks validated on AD authentication data, insufficient multi-stage kill-chain detection capability, unresolved real-time deployment challenges, and limited application of explainable AI techniques to AD security contexts. These gaps define the research agenda for developing novel hybrid machine learning models for AD-based threat prediction and detection.







