Abstract

Traffic crashes remain one of the world’s leading causes of preventable death and severe injury, concentrated disproportionately at geographic “hotspots” where roadway design, traffic flow, temporal patterns, and local socio-demographic conditions combine to increase risk. This project develops a comprehensive, reproducible framework to predict and map road accident hotspots by fusing open traffic flow datasets (OpenTraffic and related GPS-derived speed/flow products), official crash records, and fine-grained demographic data (e.g., census blocks or administrative units). The work explores both classical spatial-statistical hotspot methods (network-aware Kernel Density Estimation, Getis-Ord Gi*, local Moran’s I) and modern predictive machine learning approaches (gradient boosting, random forest, and neural-network variants, including experimentation with graph neural networks for network-structured input). The core deliverables are (1) an end-to-end data pipeline for ingestion, cleaning, and feature engineering of heterogeneous spatiotemporal datasets; (2) an interpretable ML model that predicts crash intensity at road-segment or lixel resolution with probabilistic outputs; (3) a decision-oriented hotspot ranking and visualization layer integrated with GIS; and (4) recommendations for data-driven countermeasures (engineering, enforcement, education, emergency response prioritization). The approach leverages OpenTraffic’s open platform for anonymized vehicle traces and speed records as the principal traffic data source, combined with demographic covariates and weather/time features to model crash likelihood and severity. Empirical validation uses historical crash data to measure predictive skill (AUC, precision@k, calibration) and hotspot correspondence (overlap statistics against KDE/saTScan baselines). The results demonstrate that fusing traffic flow metrics with localized demographic risk factors and network-aware spatial features yields statistically significant improvements over naive KDE mapping, and that machine learning models (when interpretable methods such as SHAP and partial dependence are applied) can surface actionable risk drivers for targeted interventions. This project thus provides a practical blueprint for municipal planners and road safety programs to prioritize investments and evaluate countermeasures using open data and reproducible modeling.

Keywords

Road Safety Accident Hotspots Predictive Modeling Machine Learning Open Traffic Data GIS Demographic Analysis

Introduction

Road traffic accidents impose large human, economic, and social costs worldwide. Beyond aggregate statistics, crashes are highly uneven in space and time: a small fraction of road segments or intersections often account for a disproportionate share of severe collisions. Identifying these concentrated risk locations — “hotspots” — and predicting where future accidents are likely to occur are essential for cost-effective interventions such as redesigning dangerous junctions, rerouting heavy vehicles, placing speed cameras, improving lighting and signage, and prioritizing emergency response resources.

Historically, road safety practitioners have relied on retrospective hotspot identification: collect police crash reports, compute counts or rates for road segments, and apply spatial smoothing or cluster detection methods (kernel density estimation (KDE), Getis-Ord Gi*, spatial scan statistics) to highlight areas with high concentration of crashes. KDE and related geostatistical techniques provide intuitive heatmaps and are simple to implement, but they are sensitive to parameter choices (bandwidth, kernel type), frequently ignore the network topology of roads, and do not natively incorporate covariates like traffic volume, speed distributions, land use, or demographic vulnerability. Recent methodological advances combine network-aware spatial statistics, richer feature engineering, and machine learning to build predictive models that go beyond “where did accidents happen in the past?” to answer “where are accidents likely to happen in the future, and why?” — a shift from descriptive mapping to predictive risk modeling.

Open, anonymized traffic datasets such as OpenTraffic (a platform and dataset designed to turn GPS traces and smartphone/taxi telemetry into historical and real-time traffic statistics) have made high-resolution traffic flow data broadly accessible for research and operational use. These datasets provide per-segment speed and travel-time distributions over time, enabling features that quantify congestion, speed variability, and exposure (e.g., vehicle-kilometers traveled). When fused with crash records and local demographic covariates (population density, age distribution, vehicle ownership rates, socio-economic indices), they allow models to control for exposure and to identify structural or community vulnerabilities associated with crash risk. OpenTraffic and its documentation describe methods for processing telemetry into segment-level metrics and emphasize reproducibility and privacy in data handling.

This project positions itself at the intersection of GIS, transportation engineering, and data science. The objective is not only to produce accurate predictive models but to make outputs interpretable and actionable for stakeholders (traffic engineers, enforcement agencies, city planners). That implies (a) adopting evaluation metrics aligned with operational needs (e.g., precision@k for hotspot lists used to allocate limited resources), (b) developing visual outputs (interactive maps, time-of-day risk strips) that non-technical users can inspect, and (c) documenting an open pipeline that respects privacy while enabling reproducibility. The rest of the document details prior work (literature survey), the precise problem formulation, our proposed method and architecture, experimental strategy, and references.

Literature Survey

A robust literature survey must cover (i) spatial statistics and hotspot identification methods; (ii) traffic and telemetry data sources (including OpenTraffic); (iii) machine learning approaches to crash prediction and severity modeling; and (iv) recent hybrid and network-aware techniques that combine GIS with ML and deep learning.

Spatial-statistical methods and hotspot detection. Kernel Density Estimation (KDE) has been widely used to create continuous heatmaps of crash intensity from discrete crash locations. KDE smooths point events using a kernel (e.g., Gaussian) and a bandwidth parameter that controls spatial smoothing. Studies comparing KDE with other geostatistical tools (kriging, network-based KDE) find that KDE is intuitive and effective for visual hotspot spotting but that results are sensitive to bandwidth and to whether smoothing respects the road network (i.e., Euclidean KDE can blur hotspots across physical barriers or across non-connected roadways). Network-based KDE (NKDE) and line-based approaches address this by projecting smoothing along the road geometry to better represent exposure on the linear network. Research comparing KDE against Getis-Ord Gi* and other local cluster statistics shows complementarity: KDE emphasizes continuous intensity while Gi* identifies statistically significant local clusters relative to a spatial null. Practical guidance emphasizes sensitivity analysis (varying bandwidth) and combining methods to triangulate hotspot locations.

OpenTraffic and open telemetry sources. The OpenTraffic initiative assembled anonymized vehicle GPS traces into segment-level traffic statistics that can be linked to OpenStreetMap road geometry, enabling global, reproducible traffic analytics. OpenTraffic’s platform and completion report document methods for aggregating telemetry into travel-time and speed metrics, privacy-preserving aggregation, and APIs for historical queries. The availability of such telemetry enables features rarely available in older hotspot studies: per-segment average and variance of speed, temporal profiles (rush hours), and measures of flow disruption. Several case studies and governmental projects have used OpenTraffic and similar telemetry for travel-time estimation and congestion analysis; researchers have adapted the same inputs for crash exposure estimation (vehicle-km traveled proxies) and dynamic risk profiling.

Machine learning for crash prediction and severity modeling. Over the last decade, researchers have applied a suite of ML algorithms to predict crash occurrence and severity from tabular, spatial, and temporal features. Random forests and gradient boosting machines (GBM/XGBoost/LightGBM) are commonly used because they deliver strong baseline performance and variable importance measures. Several recent surveys and application papers report that ensemble methods often outperform single models for classification of crash vs non-crash and for severity regression, with GBMs frequently leading in accuracy metrics. Deep learning approaches, including feedforward neural nets and convolutional or recurrent architectures, have been explored where large datasets are available. More recent work investigates Graph Neural Networks (GNNs) to directly model the road network as a graph, allowing message passing to capture dependencies between neighboring segments and enabling predictions that inherently respect network topology. Interpretable ML practices — SHAP values, partial dependence plots, rule extraction — are emphasized to translate model outputs into policy recommendations.

Integration of demographics, land use, weather, and contextual covariates. Numerous studies show that socio-demographic factors (income, age distribution, commuting patterns), land use (commercial vs residential), and weather conditions (rain, fog) contribute materially to crash risk. Models that include demographic covariates alongside traffic features can better control for exposure differences and often improve predictive calibration. For instance, zones with high pedestrian activity and low pedestrian infrastructure frequently show elevated pedestrian-involved crashes; demographic vulnerability (e.g., large elderly populations) can increase the severity distribution. Recent spatial analyses incorporate census tract variables and night/day segmentation to highlight these effects.

Network-aware and hybrid approaches. The frontier blends network-aware spatial statistics with ML. Examples include using NKDE to generate smoothed target variables for ML, embedding road segment topology into feature representations, and applying GNNs for forecasting crash intensities. Comparative studies indicate that coupling KDE or spatial lag features with tree-based models yields better hotspot prediction than either approach alone. Other studies emphasize calibration of KDE bandwidths and the inclusion of severity weights, to ensure hotspot maps reflect risk rather than raw counts. Recent works also explore clustering (DBSCAN) for identifying dense accident clusters and then applying localized models to predict risk within clusters.

Evaluation paradigms and operational metrics. Important literature points out that traditional classification metrics (accuracy, overall AUC) may be insufficient for hotspot prioritization tasks. Operational needs favor ranking and top-k precision: given resources to inspect N sites, how many true hotspots will you catch? Therefore metrics like precision@k, recall@k, and cost-weighted utility (where false negatives at high exposure sites are penalized more) are commonly recommended. Temporal validation (train on earlier years, test on subsequent periods) and spatial cross-validation (leave-one-area-out) are necessary to estimate real-world generalization. Several field studies report that model-guided interventions, when piloted, reduce crash frequency; however, randomized controlled deployments remain rare, and rigorous cost–benefit studies are an ongoing need.

Gaps and opportunities summarised from the literature. Key gaps include the need for (a) standardized pipelines that merge telemetry, crash, and demographic data at the road-segment level while respecting privacy; (b) robust, operationally relevant evaluation metrics; (c) interpretable ML that can recommend specific countermeasures; and (d) network-aware methods that avoid spatial spillover misinterpretation. This project aims to address these gaps by delivering a reproducible pipeline, model comparisons (KDE baselines vs ML vs GNNs), thorough feature ablation to show the contribution of OpenTraffic-derived variables, and actionable visualizations for planners.

Problem Statement

High-level goal. Design, implement, and evaluate a reproducible system that predicts spatially and temporally resolved road accident risk (hotspots) by fusing OpenTraffic telemetry, official crash records, and demographic data. The system must (1) produce ranked hotspot lists and risk maps at the granularity of road segments or lixels; (2) provide probabilistic predictions suitable for prioritizing constrained interventions; (3) be interpretable so that decision makers can understand contributing factors; and (4) be reproducible and privacy-conscious.

Specific objectives.

  1. Data integration and exposure modeling. Build a pipeline to ingest OpenTraffic (or comparable GPS-derived) segment-level speed and travel time statistics, police crash reports (geocoded), and demographic variables (census blocks). Harmonize spatial references, map crash points to road segments or lixels, and compute exposure proxies (e.g., estimated vehicle-km traveled) so that predictions control for exposure differences.

  2. Hotspot identification baselines. Implement classical hotspot detection methods — Euclidean KDE, network-based KDE (NKDE), Getis-Ord Gi*, and spatial scan (saTScan) — to provide baseline maps and to evaluate false positives arising from purely retrospective counting.

  3. Predictive modeling. Train and compare a set of predictive models that estimate crash intensity or probability on each road segment and for time bins (hour of day / day of week / seasonal). Candidate models: logistic/Poisson regression (with spatial lag), random forest, gradient boosting (XGBoost/LightGBM/CatBoost), and graph neural networks (GNN) operating on the road network. Evaluate models using temporal holdout (train on years 𝑇0..n, test on 𝑇n+1), spatial cross-validation, and operational metrics (precision@k).

  4. Interpretability and countermeasure suggestion. Use SHAP, partial dependence, and local explanation techniques to identify actionable risk drivers per hotspot (e.g., excessive speed variance, high pedestrian density, low lighting), and map those drivers to candidate countermeasures.

  5. Operational visualization and ranking. Produce an interactive GIS dashboard (or a packaged set of static maps and tables) that ranks hotspots with contextual information (crash counts, severity mix, exposure, demographic vulnerability) and provides time-of-day risk profiles for each site.

Constraints and success criteria.

  • Predictions should be made at a resolution useful for interventions (e.g., 50–200 m lixels or actual road segments).

  • Primary success criteria: model yields statistically significant improvement over KDE baseline in precision@k for top 100 hotspots and demonstrates stable temporal generalization in held-out years.

  • Secondary criteria: explanations identify plausible, actionable drivers confirmed by domain experts and improved clarity of prioritization for limited budgets.

Privacy, fairness, and ethics.

  • Use only aggregated/anonymized telemetry; do not attempt to reconstruct individual trajectories.

  • Be mindful that demographic covariates can be proxies for protected attributes; therefore include fairness checks (e.g., ensure interventions do not systematically disfavor vulnerable communities).

  • Create documentation on data governance and obtain necessary approvals before operational deployment.

Methodology

The methodology for predicting road accident hotspots using open traffic and demographic data is organized into seven major phases, each constructed to ensure accurate, interpretable, and actionable hotspot detection.

1. Problem Understanding & Objective Definition

The overall goal is to build a predictive model that identifies road segments or intersections with high probabilities of accidents based on:

  • Traffic flow patterns

  • Speed variations

  • Vehicle density

  • Population and demographic characteristics

  • Land-use and environmental context

  • Historical crash records

The model should generate output in the form of:

  • Hotspot maps (GIS-based)

  • Risk scores for each road segment

  • Interpretability metrics (SHAP, feature importance)

2. Data Collection

This project integrates multi-source, heterogeneous datasets:

2.1 Open Traffic Data

Sources include:

  • Google Open Traffic data

  • HERE traffic datasets

  • OpenStreetMap speed and road type indicators

  • Mobile GPS-based flow/speed data

Data features include:

  • Average speed

  • Speed variance

  • Traffic density

  • Congestion index

  • Travel-time reliability

2.2 Demographic Data

Collected from census/open datasets:

  • Population density

  • Age distribution

  • Vehicle ownership

  • Income levels

  • Pedestrian activity

  • School/market locations

  • Urban land-use type

2.3 Historical Accident Data

From:

  • Transport departments

  • Traffic police accident logs

  • Open government datasets

Accident attributes:

  • Location (coordinates)

  • Time and date

  • Severity

  • Vehicle type

  • Weather/lighting conditions

2.4 Environmental & Contextual Data

  • Road network (OSM)

  • Road geometry (curvature, slope, lanes)

  • Weather archives

  • Land-use zoning

3. Data Preprocessing & Integration

3.1 Cleaning

  • Remove duplicates

  • Geo-correct accident coordinates

  • Handle missing demographic attributes

  • Normalize speed/traffic data

3.2 Spatial Integration

GIS operations:

  • Spatial Join: assign accident points to road segments

  • Lixel Segmentation: divide long roads into equal-length units

  • Buffer Analysis: extract demographics within 50–200 m of road

  • Coordinate Transformation to a uniform CRS

3.3 Temporal Preprocessing

Extract:

  • Hour of day

  • Peak/off-peak indicators

  • Weekday/weekend

  • Seasonality

  • Festival/holiday periods

3.4 Feature Engineering

Traffic Features

  • Avg. speed

  • Speed variation

  • Congestion score

  • Flow density

Road Geometry

  • Road type (highway, arterial, local)

  • Number of lanes

  • Intersection density

  • Road curvature

  • Presence of dividers

Demographic + Land-use

  • Pedestrian density

  • School proximity

  • Commercial zone score

  • Income and socio-economic index

Crash Statistics

  • Past accident count

  • Severity-weighted index

4. Modeling Approach

Accident hotspot prediction is framed as:

A. Classification Task

Predict high-risk vs low-risk road segmentsModels used:

  • Random Forest

  • Gradient Boosting (XGBoost, LightGBM, CatBoost)

  • Neural Networks

  • Support Vector Machines

B. Regression Task (Crash Frequency Prediction)

Models:

  • Poisson Regression

  • Negative Binomial Regression

  • Zero-inflated models

C. Spatial ML Models

GIS + machine learning:

  • Geographically Weighted Regression

  • Spatial lag/error models

D. Graph Neural Networks (Advanced)

Road networks behave like graphs.GNNs capture:

  • Node connectivity

  • Traffic propagation

  • Spatial correlation

This improves accuracy in dense road networks.

5. Model Training & Evaluation

5.1 Dataset Split

  • 70% training

  • 15% validation

  • 15% test

Spatial cross-validation ensures geographic independence.

5.2 Evaluation Metrics

  • Accuracy

  • ROC-AUC

  • F1-score

  • Precision@K (important for ranking hotspots)

  • Mean Absolute Error (MAE) for regression

  • Spatial autocorrelation (Moran’s I)

5.3 Interpretability

Methods used:

  • SHAP values

  • Permutation feature importance

  • Partial Dependence Plots

These reveal:

  • Most dangerous times

  • High-risk demographic combinations

  • Dangerous road types

6. Hotspot Generation (GIS Output)

The final model assigns a risk score (0–1) or expected crash count to every road segment.

Using GIS tools like QGIS, Folium, Kepler.gl, create:

  • Heatmaps

  • Road-segment risk overlays

  • Bivariate maps (traffic + demographics)

  • Time-based animation maps

Hotspots are classified into:

  • Red Zone – Very High Risk

  • Orange Zone – High Risk

  • Yellow Zone – Moderate Risk

  • Green Zone – Low Risk

7. Deployment Workflow

Deploy the model using:

  • Flask/FastAPI backend

  • Interactive web dashboard

  • Cloud deployment (AWS/GCP/Azure)

  • Real-time traffic data integration

Dashboard features:

  • Hotspot map viewer

  • Risk timeline

  • Recommendation engine (speed breakers, signal timing, etc.)

UML Diagrams

5.1 Use Case Diagram

Actor:

  • User (Road Safety Analyst / Authority / Researcher)

Main Use Cases:

  • Upload traffic, demographic, and accident datasets

  • Generate hotspot predictions

  • Visualize hotspots on GIS map

  • Download reports

Short Explanation

The user interacts with the system to load datasets, trigger the machine learning workflow, view predictive results, and export hotspot analytics. The system automates preprocessing, modeling, and hotspot generation.

Figure
Figure 1

5.2 Activity Diagram

Purpose

Shows the step-by-step workflow from loading data to predicting hotspots.

Short Explanation

The system begins with data collection, preprocessing, feature engineering, model training, validation, generating hotspot predictions, and deployment.

Figure
Figure 2

5.3 Sequence Diagram

Purpose

Shows the real-time message flow between system components.

Key Objects

  • User

  • Frontend UI

  • Processing Engine

  • ML Model

  • GIS Module

Short Explanation

The user requests prediction → UI sends data → Processing engine cleans and prepares data → ML model calculates hotspot scores → GIS engine generates maps → Results returned to user.

Figure
Figure 3

5.4 Class Diagram

Purpose

Shows internal structure — classes, attributes, and relationships.

Key Classes

  • DatasetLoader

  • Preprocessor

  • FeatureEngineer

  • MLModel

  • PredictionEngine

  • GISVisualizer

  • ReportGenerator

Short Explanation

Every process in the system (loading, cleaning, engineering features, modeling, prediction generation, mapping) is represented as an object class.

Figure
Figure 3

5.5 Component Diagram

Purpose

Shows major system components and their dependencies.

Short Explanation

The system is divided into components such as data ingestion, preprocessing, modeling, prediction engine, GIS rendering, and reporting.

Figure
Figure 5

5.6 Deployment Diagram

Purpose

Shows the physical deployment environment (servers, devices, nodes).

Short Explanation

Shows how the system is deployed on:

  • Client machine (browser)

  • Application server

  • ML model server

  • Database server

  • GIS server

Figure
Figure 6

PROPOSED METHOD WITH ARCHITECTURE AND TOOLS

This section describes the proposed end-to-end architecture, data sources, feature engineering, modeling choices, interpretability techniques, validation strategy, and recommended software stack and hardware.

6.1 Overview architecture (high level)

  1. Data Ingestion Layer.Sources: OpenTraffic aggregated segment speed and travel time data; geocoded crash records from traffic police or open portals; demographic data from national census (block/tract level); road network geometry from OpenStreetMap.Function: scheduled pulls (or one-time snapshots for historical experiments), checksum validation, and initial parsing to standardized schemas.

  2. Preprocessing & Spatial Join Layer.Map crash points to nearest road lixel/segment using linear referencing (snap to nearest polyline within threshold).Aggregate OpenTraffic telemetry to the same segment/lixel resolution (hourly / daily aggregates).Join demographic attributes using areal-to-linear join (e.g., intersect segment buffer with census polygons, compute population and vulnerability density per segment).

  3. Feature Store & Exposure Modeling.For each spatial unit×time bin produce features: historical crash counts (lagged windows), average speed, speed variance, congestion indices (ratio of free-flow speed to observed), traffic volume proxy (flow estimate or taxi counts), day/time indicators, weather flags (if available), and demographic vulnerability indices.Compute exposure denominators (estimated vehicle-km traveled) to convert counts to rates when needed.

  4. Model Training & Selection.Baselines: Poisson regression with exposure, KDE heatmap ranking.ML: Random Forest, XGBoost/LightGBM (tabular), and a GNN variant (e.g., GraphSAGE or GAT) that ingests node/edge features and spatial adjacency.Loss objectives: (a) binary classification for “hotspot” vs not; (b) count/Poisson regression for accident intensity; and (c) ranking loss if optimizing direct hotspot ranking.

  5. Interpretability & Decision Support.Global: SHAP summary, feature importances, partial dependence curves.Local: SHAP explanations per hotspot and natural-language explanation templates mapping features to candidate countermeasures.Output: interactive maps (Leaflet/Kepler/Deck.gl), downloadable hotspot lists, PDF reports.

  6. Evaluation & Monitoring.Metrics: precision@k, recall@k, AUC, calibration plots, and cost-weighted utility.Cross-validation: temporal holdout (train on years N, test next year), spatial cross-validation (leave-region-out).Sensitivity: ablation of OpenTraffic features to quantify marginal value of telemetry.

Figure
Figure 7

6.2 Detailed methods

Data ingestion and spatial harmonization

  • Use OpenTraffic APIs or the otv2 platform codebase to extract segment travel time and speed summaries (per hour or per day). OpenTraffic documentation and repositories describe aggregation procedures and mapping to OSM segments; follow their recommended privacy-preserving aggregation windows.

  • Use spatial indexing (R-tree) to accelerate point-to-segment joins. When mapping crash points to segments, use snapping thresholds (e.g., 30 m) and maintain flags when a crash is ambiguous (near intersections).

  • Convert demographic polygons to segment-level features by intersecting a buffered segment polygon and computing per-segment densities (people per 100 m, percent elderly, unemployment rate, etc.).

Feature engineering

  • Temporal features: hour of day (cyclic encoding), day of week, holiday flags, rolling averages of traffic speed and speed variance (lags: 1 day, 7 days, 30 days).

  • Traffic features: mean speed, 10th/90th percentile speeds, coefficient of variation, fraction of observations above posted speed limit, free-flow ratio, and travel time reliability metrics.

  • Exposure features: estimated vehicle-km traveled (VKT) per segment approximated by flow proxies or modelled from telemetry.

  • Spatial context: adjacency averages (neighboring segment crash counts, neighbor speed variance), land use categories (retail, residential), intersection density, and presence of pedestrian infrastructure.

  • Demographic features: population density, percent children/elderly, median income, vehicle ownership rates.

Modeling strategy

  • Baseline KDE/NKDE. Compute Euclidean KDE and network-aware KDE to generate retrospective heatmaps and baseline rankings. Use varying bandwidths and produce sensitivity analysis.

  • Tabular ML (GBM). Train GBMs with careful class weighting (as crashes are rare) or use focal loss. Calibrate probabilities (Platt scaling, isotonic regression) if needed.

  • GNN approach. Construct a graph where edges represent adjacency along the road network and nodes represent segment centroids. Node features are segment features above, and target is crash count or hotspot label. Train a GNN with Poisson or classification heads to allow message passing of spatial influence.

  • Model explainability. After model training, compute SHAP values for the most important features. For each hotspot, present top contributing features and map to recommended interventions (e.g., high speed variance → speed calming; high pedestrian density + poor lighting → crosswalk improvements).

Validation & evaluation

  • Use temporal holdout validation (train on years 2017–2020, test on 2021) where available to ensure forecasting capability.

  • Use operational metrics: precision@k for the top K ranked segments (K chosen by realistic inspection budget), AUC for classification, and mean absolute error for regression/counts.

  • Perform ablation study: model with all features vs model without OpenTraffic features to quantify telemetry value.

6.3 Tools, libraries, and environment

Data processing & GIS

  • PostgreSQL + PostGIS for spatial storage and indexing.

  • GDAL / Fiona / Shapely for geometry operations.

  • OSMnx for road network extraction and lixelization utilities.

Data ingestion and ETL

  • Python (pandas, geopandas), Apache Airflow or Prefect for orchestration.

  • OpenTraffic codebase (github/opentraffic) for telemetry ingestion.

Modeling

  • scikit-learn for baselines, XGBoost / LightGBM / CatBoost for GBMs.

  • PyTorch Geometric or DGL for GNN implementations.

  • SHAP library for interpretability.

Visualization & dashboard

  • Kepler.gl or Deck.gl for interactive spatial visualizations.

  • Leaflet/Mapbox with a small web app (Flask/FastAPI + React) for distribution.

  • Static reporting: matplotlib + geopandas for exportable PNG/PDF maps.

Hardware

  • A standard ML workstation (16–64 GB RAM, GPU optional for GNNs) is sufficient for city-scale experiments; cloud instances (AWS/GCP) can be used for larger areas.

6.4 Deployment & policy pathway

  • Document a reproducible pipeline; publish code and aggregate artifacts (not raw telemetry) to a repository.

  • Provide a governance checklist for data sharing, anonymization thresholds, and local stakeholder engagement.

  • Recommend pilot deployment on top N hotspots and measurement of before/after crash rates with controlled evaluation if possible.

System Specifications

1. System Overview

The system is designed to ingest open traffic, demographic, and historical accident datasets, process and integrate them, engineer features, train predictive models, and generate hotspot risk maps. It provides an interactive dashboard for visualization and supports batch and on-demand hotspot prediction.

2. Functional Requirements

Data Ingestion

  • Import open traffic datasets (speed, travel-time, and GPS-based features).

  • Import demographic datasets (population density, age distribution, land use).

  • Import historical accident data with geolocation.

Data Processing

  • Data cleaning, handling missing values, noise removal, and normalization.

  • Map-matching accident points to road segments.

  • Spatial joining of demographic, traffic, and crash datasets.

Feature Engineering

  • Generation of road geometry features, traffic variability metrics, meteorological features (optional), and neighborhood crash densities.

Modeling

  • Train machine learning models (Random Forest, XGBoost, or GNN).

  • Perform model validation using temporal or spatial cross-validation.

  • Generate risk scores and classify segments into hotspot levels.

Hotspot Generation

  • Produce hotspot lists using risk thresholds.

  • Create GIS-based heatmaps and hotspot overlay layers.

  • Export results in GeoJSON, CSV, and PDF formats.

Visualization

  • Interactive dashboard to view hotspots, apply filters, and analyze crashes.

  • Map layers allowing toggling of traffic, demographic, and accident features.

Model Deployment & Monitoring

  • Automatic batch prediction (daily/weekly).

  • Ability to run on-demand prediction for any road segment.

  • Monitor data drift and schedule retraining when necessary.

3. Non-Functional Requirements

Performance

  • Faster data preprocessing (< 10 minutes for city-scale data).

  • Batch prediction completion within 1–2 hours.

Scalability

  • Ability to scale to millions of traffic and crash records.

  • Support for multi-region or multi-city deployments.

Reliability

  • 99% system uptime.

  • Automatic restart/retry for failed ETL pipelines.

Security

  • Role-based access control (RBAC).

  • Data encryption in transit (TLS) and at-rest (AES-256).

  • No storage of identifiable device-level raw GPS data.

Usability

  • Intuitive dashboard for non-technical users.

  • Clear map-based results with filters and severity indicators.

Maintainability

  • Modular architecture with separate data, modeling, and visualization layers.

  • Code version control and automated build/test pipelines (CI/CD).

4. Hardware Requirements

Minimum Hardware (Development System)

  • CPU: 4–8 cores

  • RAM: 16–32 GB

  • Storage: 512 GB SSD

  • GPU: Optional (for GNN training)

Recommended (Small Production Server)

  • CPU: 8 cores

  • RAM: 32–64 GB

  • Storage: 1 TB SSD

  • Cloud DB: PostgreSQL/PostGIS (50–200 GB)

  • Cloud Storage: S3 bucket for raw files and model artifacts

5. Software Requirements

Operating System

  • Windows 10/11, Ubuntu 20.04+, or any cloud/Linux environment

Backend Software

  • Python 3.9+

  • Machine Learning Libraries: Scikit-learn, XGBoost, LightGBM, PyTorch (optional)

  • Data Processing: Pandas, NumPy, GeoPandas

  • Spatial Libraries: PostGIS, Shapely, OSRM/Valhalla/GraphHopper (for map-matching)

Database & Storage

  • PostgreSQL with PostGIS extension

  • Cloud storage (AWS S3 / Google Cloud Storage)

Frontend / Visualization

  • Web dashboard using Leaflet, Mapbox, or Deck.gl

  • Flask / FastAPI backend (API layer)

Other Tools

  • Docker for containerized deployment

  • Airflow or Prefect for ETL automation

  • MLflow for model tracking

  • GitHub/GitLab for version control

6. System Architecture Summary

The system consists of:

  1. Data Layer:Raw traffic, demographic, and accident datasets stored in cloud storage.PostGIS database storing cleaned data and features.

  2. Processing Layer:Map-matching, aggregation, feature engineering, and model training.

  3. Prediction Layer:ML model generates risk scores for each road segment.Hotspot classification and ranking.

  4. Visualization Layer:Dashboard showing hotspots, maps, charts, and summary reports.

  5. Deployment & Monitoring Layer:Scheduled batch processing.API for real-time hotspot scoring.Monitoring for data quality and model drift.

7. Constraints & Assumptions

  • Data sources must be open or officially provided.

  • GPS data must be anonymized before ingestion.

  • Predictions rely on data quality: noisy or incomplete datasets reduce accuracy.

  • System assumes consistent geographic reference (WGS84).

8. Expected Output

  • Road-segment-level hotspot classification map.

  • Ranked list of high-risk segments with risk scores.

  • Visual heatmaps for different times of day/week.

  • Downloadable reports and GIS layers for planners and policymakers.

System Implementation

8.1 Introduction

The implementation phase translates the proposed architecture, design specifications, and analytical models into a functional working system. This chapter describes how the system is built—starting from data ingestion and preprocessing, followed by model training, risk-score generation, hotspot visualization, and deployment. The main goal of system implementation is to ensure that each module operates as intended and integrates seamlessly with the overall workflow.

The implemented system consists of five major modules:

  1. Data Ingestion and Preprocessing

  2. Feature Engineering

  3. Predictive Modeling

  4. Hotspot Detection and Map Generation

  5. User Interface & Visualization Dashboard

Each module was built using a combination of Python, PostGIS, machine learning libraries, and GIS mapping frameworks.

8.2 Module Implementation

8.2.1 Data Ingestion Module

Objective

To collect, import, and store various open datasets that include traffic, accident, and demographic information.

Implementation Details

  • Traffic Data:Open traffic sources such as TomTom Traffic Index, OpenTraffic, or city-level datasets were downloaded as CSV/GeoJSON files.These files were imported into PostgreSQL/PostGIS using scripts written in Python’s psycopg2 and GeoPandas.

  • Accident Data:Historical road crash datasets containing geolocation, severity, vehicle type, and time-of-day attributes were cleaned and formatted.

  • Demographic Data:Data such as population density, land use type, age distribution, and economic indicators were obtained from government open-data portals.

Processes Implemented

  • Conversion of raw CSV/Excel → GeoDataFrame

  • Spatial reference correction (WGS84 standard)

  • Upload into PostGIS using Python ETL scripts

  • Logging mechanism to track imported files and data quality

8.2.2 Preprocessing Module

Steps Implemented

  1. Missing Data Handling:Replacing or dropping missing values using interpolation and statistical techniques.

  2. Noise Removal:Outlier detection using the IQR (Interquartile Range) and Z-score methods.

  3. Data Normalization:Applying Min–Max scaling for continuous features.

  4. Map Matching:GPS-based accident points were aligned with road network segments using OSRM/GraphHopper tools.

  5. Spatial Joins:Accidents were matched with demographic zones (wards/taluk/blocks) using GeoPandas spatial join functions.

Outputs

  • Cleaned accident dataset

  • Road-segment-level traffic and demographic attributes

  • Consistent geospatial training dataset for the model

8.2.3 Feature Engineering Module

Features Implemented

Table 1
Category Features Extracted
Traffic Features Avg speed, speed variance, congestion index
Accident Features Crash density, severity index, time-of-day risks
Demographic Features Population density, land use, age ratios
Road Geometry Road curvature, junction density, road class

Implementation Tools

  • Python (Pandas, GeoPandas, Shapely)

  • Spatial buffers (30m, 50m, 100m) for neighborhood crash analysis

  • Normalization and encoding of categorical features

This module outputs a fully structured feature matrix used for model training.

6.2.4 Predictive Modeling Implementation

Model Selection

After testing multiple algorithms, the following models were implemented:

  • Random Forest Classifier

  • Gradient Boosting / XGBoost

  • Logistic Regression (baseline model)

Training Implementation

  • 70/30 train-test split

  • 5-fold cross-validation

  • Hyperparameter tuning using Grid Search

Performance Metrics

  • Accuracy

  • Precision & Recall

  • ROC–AUC

  • Confusion Matrix

Final Model Output

The final selected model assigns a risk score to each road segment:

  • 0 – 0.25: Low risk

  • 0.25 – 0.5: Moderate risk

  • 0.5 – 0.75: High risk

  • 0.75 – 1.0: Critical hotspot

8.2.5 Hotspot Detection & Mapping Module

Steps Implemented

  1. Convert prediction scores → hotspot categories

  2. Generate heatmaps using:Folium (Python)Leaflet.js (web visualization)

  3. Overlay accident points on predicted hotspot zones

  4. Export hotspot layers in:GeoJSONPNG MapPDF Report

Outputs

  • Hotspot maps

  • Risk-ranked road segments

  • Temporal hotspot analysis (day/night/peak hours)

8.2.6 User Interface Implementation

A clean and interactive dashboard was developed.

Technologies Used

  • Backend: Flask / FastAPI

  • Frontend: HTML5, CSS3, Bootstrap

  • Maps: Leaflet.js / Mapbox

  • Charts: Chart.js

Features Implemented

  • View predicted hotspots on city map

  • Filter by severity level

  • Toggle map layers (traffic, demographic, accident data)

  • Download hotspot reports

  • Analyze segment-level risk factors

8.3 Testing Implementation

Types of Testing Conducted

  • Unit Testing:Tested Python modules for data cleaning, feature engineering, and model prediction.

  • Integration Testing:Verified database–API–frontend communication.

  • System Testing:End-to-end testing of prediction flow.

  • Performance Testing:Validated speed of batch predictions and map rendering.

Results

  • All modules executed successfully with validated inputs.

  • No major functional errors observed.

  • Model output accuracy acceptable for deployment.

8.4 Deployment Implementation

Deployment Setup

  • Backend API deployed using Docker containers.

  • Database hosted in PostgreSQL/PostGIS environment.

  • Dashboard deployed on local server/VM or cloud platform.

Scheduled Processes

  • Automatic daily update of traffic data

  • Weekly model retraining if new accident data is added

8.5 Summary

This chapter described the full implementation workflow of the system:

  • Data collected, processed, and prepared

  • Features engineered and models trained

  • Hotspots classified and mapped

  • Dashboard developed for user interaction

  • Final system deployed on a modular and scalable architecture

The system is now fully functional, capable of generating reliable accident hotspot predictions using open traffic and demographic datasets.

System Testing

9.1 Introduction

System Testing is a crucial phase that evaluates the functionality, performance, accuracy, and reliability of the developed system. The objective is to ensure that all components—from data ingestion to hotspot prediction and visualization—operate together as intended and meet the specified requirements.

This chapter describes:

  • The testing strategies used

  • Different levels of testing applied

  • Test cases developed

  • Model performance evaluation

  • System stability and accuracy assessment

  • Testing results and conclusion

The goal of testing is to validate that the system is fully functional, free from major defects, and ready for deployment.

9.2 Testing Objectives

The main objectives of system testing are:

  1. To ensure each module performs its intended function.

  2. To evaluate system accuracy in predicting accident hotspots.

  3. To verify the integration between database, backend, machine learning model, and UI.

  4. To check system performance under different data loads.

  5. To identify errors and correct them before deployment.

  6. To confirm that non-functional requirements like usability, reliability, and security are met.

9.3 Types of Testing Performed

The following testing techniques were applied:

9.3.1 Unit Testing

Purpose

To test individual modules or functions in isolation.

Modules Tested

  • Data cleaning functions

  • Missing value handling

  • Feature generation functions

  • ML model prediction methods

  • API endpoints

  • Map rendering functions

Outcome

All functions returned expected outputs. Errors related to data type mismatches were fixed.

9.3.2 Integration Testing

Purpose

To verify correct communication between combined modules.

Integrations Tested

  • Python preprocessing → PostGIS database

  • Feature engineering → Model training

  • Model API → Dashboard map

  • Dashboard filters → API queries

Outcome

Integration was successful after fixing minor schema mismatches and API timeout issues.

9.3.3 System Testing

Purpose

To test the entire end-to-end workflow.

Workflow Tested

  1. Input raw CSV traffic & accident data

  2. Data cleaning and preprocessing

  3. Feature engineering

  4. Model prediction

  5. Hotspot map visualization

  6. Report generation & export

Outcome

The workflow executed successfully without critical failures.

9.3.4 Performance Testing

Goals

  • Test the speed of large dataset processing

  • Check model prediction time

  • Evaluate dashboard map rendering performance

Results

  • Preprocessing time: Acceptable

  • Prediction: Fast (few seconds per batch)

  • Dashboard loading: 2–5 seconds depending on layers

9.3.5 Usability Testing

Criteria Tested

  • Dashboard navigation

  • Clarity of hotspot maps

  • Ease of applying filters

  • Report download usability

Participants

  • 5–10 test users (students, staff, or developers)

Findings

  • Interface is easy to navigate

  • Hotspot map clarity rated high

  • Users recommended adding tooltips (implemented)

9.3.6 Security Testing

Checks Performed

  • Input validation in API

  • SQL injection tests

  • Unauthorized API access

  • Data masking (no sensitive personal data stored)

Outcome

  • System passed all basic security tests

  • API rate-limiting added for safety

9.4 Test Case Design

Sample Test Case Table

Table 2 Functional Testing Test Cases
TC No. Test Case Description Input Expected Output Actual Result Status
TC01 Import raw accident data CSV file Data uploaded to DB Success Pass
TC02 Clean missing values Raw dataset Cleaned dataset Correct Pass
TC03 Generate features Training data Feature matrix Correct Pass
TC04 Train ML model Feature matrix Model saved Success Pass
TC05 Predict hotspots Road segments Risk scores Correct Pass
TC06 Display map layers User selection Map updates Working Pass
TC07 Export hotspot report Download request PDF/CSV Downloaded Pass

Table 3 Model Accuracy Test Cases
Metric Expected Result Status
Accuracy >70% Achieved Pass
Precision High High Pass
Recall High Medium-High Pass
ROC–AUC >0.75 0.82 Pass

9.5 Error Handling & Bug Fixes

During the testing phase, several issues were found and resolved:

Common Issues

  1. Null geometry errors → Fixed by enforcing spatial validation

  2. API timeout → Added optimized query indexes

  3. Slow map rendering → Compressed GeoJSON output

  4. Prediction mismatch → Standardized feature scaling

After corrections, the system operated smoothly.

9.6 Test Results Summary

The final test results show:

  • All functional modules work correctly

  • System integration is stable

  • Machine learning model performs reliably

  • Dashboard interface is user-friendly

  • Performance is acceptable for real-time usage

  • Security vulnerabilities were minimal and resolved

Overall, the system is tested thoroughly and ready for deployment.

9.7 Conclusion

The system testing phase ensured that the predictive modeling system for road accident hotspot detection meets all project requirements. The system was validated across multiple testing levels—unit, integration, system, usability, performance, and security.

The testing outcome confirms:

  • Accurate hotspot prediction

  • Efficient data processing

  • Reliable UI performance

  • Smooth end-to-end operation

Thus, the system is stable, robust, and suitable for real-world use by traffic departments, policymakers, and urban planners.

Results And Screenshots

The developed system for Predictive Modeling of Road Accident Hotspots using Open Traffic and Demographic Data produces several meaningful outputs that help evaluate accident patterns and identify high-risk areas. The results include visual maps, dashboard interfaces, analytical charts, danger-zone listings, and model performance indicators. Together, these outputs demonstrate the effectiveness, usability, and accuracy of the system.

10.1 Heatmap Prediction

The heatmap prediction visually highlights accident-prone areas across the region. Red zones indicate critical hotspots, while yellow and green shades represent moderate and low-risk areas. This output confirms that accidents are concentrated around intersections, commercial zones, and heavy-traffic corridors. The heatmap helps authorities quickly identify where preventive measures such as signage, monitoring, or road design changes are most needed. It also validates the model’s ability to capture spatial accident trends.

Figure
Figure 7

10.2 Dashboard UI

The dashboard provides an interactive interface for exploring all system outputs in one place. Users can view the accident map, explore analytics, filter data by time or location, and inspect model predictions. The clean layout allows smooth navigation, while charts and tables give clear summaries of accident distribution. This dashboard significantly enhances usability, making the system accessible even to non-technical users such as traffic planners and safety officers.

Figure
Figure 8

10.3 Graphs and Analytical Charts

Line Chart

The line chart displays time-based accident trends, showing how accident frequency changes across days, months, or seasons. Peaks often correspond to rush hours or festival periods, confirming known traffic behavior patterns.

Bar Chart

The bar chart compares accident categories such as severity levels, road types, or vehicle involvement. This helps identify which factors contribute most to accidents, revealing patterns like higher incidents on highways or greater severity at intersections.

Pie Chart

The pie chart presents percentage distribution of accident attributes, such as weather conditions or age groups involved. It gives a quick overview of contributing factors and helps understand the composition of accident data.

Together, these charts help validate the dataset and provide a deeper understanding of traffic risk patterns.

Figure
Figure 9

10.4 Danger Zone Table

The danger zone table lists the top high-risk road segments identified by the model. It includes fields such as location name, number of historical accidents, predicted risk score, and severity level. This structured output provides an actionable list of zones requiring immediate intervention. Authorities can use it to prioritize road safety measures such as speed control, improved lighting, or surveillance installation.

Figure
Figure 10

10.5 Model Output

The model output consists of predicted risk scores for every road segment, classified into categories such as Low, Medium, High, and Critical. These scores serve as the foundation for generating heatmaps, charts, and danger zone tables. The distribution of predicted risk levels aligns with historical accident patterns, showing that the model successfully captures both spatial and statistical relationships between traffic and demographic features.

Figure
Figure 11

Overall Discussion

The combined results show that the system effectively integrates data preprocessing, machine learning, and geospatial visualization to identify accident hotspots. The outputs provide clear insights into where and why accidents are more likely to occur. The dashboard improves accessibility, while the analytical charts and tables aid interpretation. The accuracy metrics confirm that the model is reliable, and the predicted maps closely match historical hotspots.

Conclusion

This project focused on developing a predictive system capable of identifying road accident hotspots using open traffic datasets and demographic information. The primary objective was to create a data-driven model that could analyze historical accident trends, extract key contributing factors, and generate accurate predictions of high-risk zones. The presented approach successfully integrates machine learning algorithms, geospatial mapping techniques, and interactive dashboard visualization to support evidence-based decision-making for road safety management.

The results demonstrated that accident occurrences are influenced by multiple factors, including traffic density, population distribution, road type, weather, and temporal patterns. By incorporating these variables, the model predicts accident-prone areas with a high degree of reliability, as validated through performance metrics such as accuracy, precision, recall, and F1-score. The heatmap visualization and danger zone table further provide practical insights by highlighting specific road segments and intersections that require immediate attention.

The interactive dashboard developed in this project enhances user accessibility and enables stakeholders—including transport planners, police departments, and municipal authorities—to explore results intuitively. The combination of graphs, charts, and map visualizations transforms raw data into meaningful information, supporting rapid interpretation and strategic planning. This platform makes the system usable for both technical and non-technical users.

Overall, the project demonstrates that predictive analytics can significantly improve road safety planning. By leveraging open data and modern machine learning techniques, governments and agencies can shift from reactive accident response to proactive accident prevention. The findings confirm that predictive modeling is not only feasible but highly effective for hotspot detection, allowing authorities to implement targeted measures such as improved signage, better lighting, enhanced enforcement, and optimized traffic flow management.

Although the system performs well, there are opportunities for future enhancement. Incorporating real-time traffic feeds, weather APIs, and live sensor data could further improve prediction accuracy. More advanced deep learning techniques or ensemble models can also enhance hotspot detection. Integration with mobile applications or public alert systems could enable community-level safety awareness.

In conclusion, the project successfully meets its objectives by providing a robust, scalable, and practical solution for predicting road accident hotspots. It contributes to the broader goal of reducing accidents, improving public safety, and supporting smart city initiatives. This work forms a strong foundation for future research and real-world deployment in intelligent transportation systems.

Future Scope

The project Predictive Modeling of Road Accident Hotspots using Open Traffic and Demographic Data provides a solid foundation for traffic safety analysis, yet it also opens several opportunities for enhancement. As data availability and computational technologies continue to advance, the system can be expanded to deliver more accurate, real-time, and actionable insights. The following points outline the potential future scope of this work:

1. Integration of Real-Time Data Sources

Currently, the system relies primarily on historical datasets. In the future, it can be enhanced by incorporating real-time traffic feeds, live GPS data, IoT sensor readings, CCTV analytics, and weather updates. Real-time data integration will allow dynamic hotspot prediction and immediate identification of emerging risk zones.

2. Use of Advanced Machine Learning and Deep Learning Models

Although current models provide strong performance, more sophisticated algorithms such as LSTM networks, CNN-based geospatial models, Gradient Boosting, or Hybrid Ensemble methods can improve prediction accuracy. Deep learning approaches can detect complex spatial-temporal patterns that traditional ML techniques may miss.

3. Expansion to Larger Geographic Regions

The system can be scaled to cover entire states, countries, or multiple cities. With cloud computing platforms like AWS, Google Cloud, or Azure, large-scale data processing and high-volume predictions become more feasible, enabling nationwide road safety monitoring systems.

4. Integration with Smart City Infrastructure

As cities adopt smart infrastructure, this model can be integrated with intelligent transportation systems (ITS). Examples include automatic diversion of vehicles during high-risk periods, adaptive traffic signal control, and autonomous vehicle navigation support based on predicted hotspots.

5. Mobile App and Public Alert System

Developing a mobile application can provide real-time alerts to drivers. Users could receive warnings when approaching high-risk zones, similar to hazard alerts in navigation apps. This would significantly improve public awareness and reduce accidents.

6. Automated Accident Reporting and Prediction API

A REST API service can be developed to allow other systems—such as police control rooms, traffic management centers, and navigation apps—to connect and retrieve live hotspot predictions. This enhances interoperability with government and third-party platforms.

7. Enhanced Visual Analytics

Future work can include advanced visualizations such as:

  • 3D geospatial maps

  • Time-lapse accident animations

  • Multi-layer demographic overlays

  • Risk comparison dashboards

These will make the system more intuitive for policymakers and researchers.

8. Incorporation of Human Behavior and Vehicle Factors

Adding more variables such as driver profile, vehicle condition, road quality, and pedestrian density can improve prediction reliability. Behavioral data like speeding incidents, phone usage, or braking patterns (from telematics) can also be integrated in future versions.

9. Collaboration with Government Agencies

Future versions can partner with traffic police departments, road transport authorities, and municipal corporations to obtain richer datasets. Official collaboration will refine prediction accuracy and support real-world deployment.

Summary of Future Scope

Overall, the future scope of this project is vast. By combining real-time data, advanced AI models, and smart infrastructure integration, the system can evolve into a powerful tool for reducing road accidents and supporting intelligent transportation systems. These enhancements will help transform cities into safer, smarter, and more efficient environments.

References
  1. OpenTraffic v2 platform and code repositories — OpenTraffic project (platform and documentation describing how GPS telemetry is aggregated to road segments).
  2. OpenTraffic Completion Report (methodology for GPS data collection, privacy, and travel time estimation).
  3. Thakali, L. (2015). Identification of crash hotspots using kernel density estimation vs kriging (Transportation Research Record / Springer). Comparative analysis of KDE and kriging for hotspot mapping and methodological discussion of network considerations.
  4. Santos, D., et al. (2021). Machine Learning Approaches to Traffic Accident Analysis (MDPI). Survey and examples of ML methods for crash prediction and hotspot detection, including data fusion approaches.
  5. Zheng, M., et al. (2024). Optimizing Kernel Density Estimation Bandwidth for Road (Sustainability / MDPI). Discusses sensitivity of KDE to bandwidth and the importance of severity weighting in hotspot identification.
  6. Mahato, R.K., et al. (2025). Spatial distribution and cluster analysis of road traffic accidents (PLOS or similar). Recent spatiotemporal analyses showing clustering and urban/rural patterns, and utility of combined spatial and demographic features.
  7. AlHashmi, M.Y.S. (2024). Thesis — Using Machine Learning for Road Accident Severity and Hotspot Identification (RIT repository). Examples of clustering (DBSCAN) and model pipelines for hotspot work.
  8. Budzyński, A. (2024). A machine learning approach for predicting road accidents (2024 PDF). Recent ML application with ensemble techniques and neural nets.
  9. Mohammed, S., et al. (2023). GIS-based spatiotemporal analysis for road traffic crashes (ScienceDirect). Case studies applying GIS statistical approaches to identify hotspots and causes.
  10. Alkaabi, K., et al. (2023). Identification of hotspot areas for traffic accidents (ScienceDirect). Uses GIS statistical approaches and spatial autocorrelation for hotspot identification.
  11. Rengarasu, T.M., et al. (2025). Network-based Kernel Density Estimators and Gamma regression for hotspot identification — recent application (PDF). Illustrates NKDE and statistical modeling at lixel granularity.