Review of Machine Learning Methods for Preventing False-Positive SQL Injection Detections
DOI: https://doi.org/10.66571/tsarka-3134-6057-07
Keywords: SQL injection, machine learning, false positives, web application firewall, adversarial training, calibration, dataset realism, anomaly detection.
Abstract
SQL injection remains a persistent web-application threat, yet practical detection systems often produce false positives that block legitimate traffic and cause analyst fatigue. This review surveys machine learning methods for SQL injection detection, focusing on how to prevent false positives in deployment, especially in Web Application Firewall (WAF) contexts. We define the scope across three observation layers (SQL query strings, HTTP request payloads and parameters, and derived network-flow features) and analyze how dataset design, feature engineering, and model selection shape false-positive behavior. We describe commonly used benchmark datasets and highlight limitations that hinder generalization, including synthetic traffic generation, label noise, multi-label ambiguity, and extreme class imbalance in web-attack traces. We compare classical machine learning models, deep learning approaches, hybrid systems, and anomaly-detection pipelines. Our emphasis is on techniques that explicitly target false-positive reduction: threshold and risk-score tuning, probability calibration, cost-sensitive learning, ensemble strategies, adversarial training, and explainability for rule refinement. We summarize evaluation practices and recommend metrics suited to imbalanced, low-base-rate settings and to cost-aware decision-making. Finally, we discuss deployment constraints (latency, throughput, online learning, and concept drift) and set out open research challenges and concrete recommendations for building reproducible, robust, low-false-positive SQL injection defenses.
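The abstract cites low-base-rate evaluation, threshold tuning, and cost-sensitive decisions. The minimal Python sketch below illustrates those three ideas. It assumes a detector that outputs a numeric risk score per request and blocks the request when the score is at or above a threshold. The function names, the example numbers, and that blocking rule are illustrative assumptions, not the method of any specific system covered by the review. The code computes three things:

- the expected precision at a given attack base rate;
- the lowest threshold that keeps the false-positive rate on benign validation traffic within a budget;
- the threshold that minimizes a weighted cost of false positives and false negatives.

```python
import math
from typing import Sequence


def precision_at_base_rate(tpr: float, fpr: float, base_rate: float) -> float:
    """Expected share of alerts that are real attacks, given the detector's
    true/false-positive rates and the fraction of traffic that is malicious."""
    tp = tpr * base_rate
    fp = fpr * (1.0 - base_rate)
    return tp / (tp + fp) if tp + fp > 0 else 0.0


def threshold_for_max_fpr(benign_scores: Sequence[float], max_fpr: float) -> float:
    """Smallest threshold t such that at most a max_fpr fraction of benign
    validation scores satisfy score >= t (assumed rule: block if score >= t)."""
    ranked = sorted(benign_scores, reverse=True)
    allowed = math.floor(max_fpr * len(ranked))  # benign blocks we can tolerate
    if allowed >= len(ranked):
        return -math.inf  # the budget covers every benign request
    # Any t <= ranked[allowed] would block allowed + 1 benign requests,
    # so step just above that score (this also handles tied scores).
    return math.nextafter(ranked[allowed], math.inf)


def min_cost_threshold(scores: Sequence[float], labels: Sequence[int],
                       cost_fp: float, cost_fn: float) -> float:
    """Pick the observed-score threshold with the lowest total cost.
    Labels: 1 = attack, 0 = benign; math.inf means 'never block'."""
    def cost(t: float) -> float:
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        return cost_fp * fp + cost_fn * fn

    candidates = sorted(set(scores)) + [math.inf]
    return min(candidates, key=cost)


if __name__ == "__main__":
    # Illustrative numbers only: 99% TPR, 1% FPR, 1 attack per 1,000 requests.
    print(round(precision_at_base_rate(0.99, 0.01, 0.001), 3))  # 0.09

    benign = [0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.95]
    t = threshold_for_max_fpr(benign, max_fpr=0.1)
    print(sum(s >= t for s in benign))  # 1 benign request blocked

    # Making a false positive 10x as costly as a miss pushes the threshold up.
    print(min_cost_threshold([0.1, 0.7, 0.6, 0.9], [0, 0, 1, 1], 10.0, 1.0))  # 0.9
```

The first output shows why the abstract stresses low-base-rate metrics. With these assumed rates, roughly nine in ten alerts would be false positives even though the detector looks accurate on balanced data. In practice, a calibrated model and a larger, realistic benign validation set would be needed before such thresholds mean much.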