# Backtesting and benchmarks

`motac` now treats forecasting as a **probabilistic** task, not only a deterministic mean-field rollout.

## Rolling-origin protocol

Backtests are run with rolling train/test folds:

1. fit on `y[:, :train_end]`
2. forecast `horizon` steps ahead with Monte Carlo paths
3. score on held-out `y[:, train_end:train_end+horizon]`
4. repeat with a configurable rolling step

This is implemented via:

- `motac.eval.backtest_fit_forecast_nll`
- `motac.eval.run_backtest_report`

## Probabilistic forecasts

The model forecast path now supports sampling future count trajectories:

- `forecast_count_paths_horizon` returns sampled paths and latent intensity paths.
- `forecast_probabilistic_horizon` returns paths plus mean/quantile summaries.

Key outputs:

- forecast mean counts
- quantile envelopes (default `q=(0.05, 0.5, 0.95)`)
- fold-level and aggregate metrics

## Metrics

Backtests report:

- Negative log-likelihood (NLL)
- RMSE
- MAE
- empirical interval coverage

## Baselines

Reports include simple benchmark baselines:

- `last_value`
- `seasonal_naive`
- `moving_average`

These provide context for model gains and avoid single-model performance claims.

## Reproducible artifacts

`run_backtest_report` writes:

- `report.json`
- a fold-metric figure
- a baseline-comparison figure

The report structure is designed for machine-readable comparison across Chicago and ACLED benchmark runs.
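The rolling-origin protocol above can be sketched as a fold generator. This is an illustrative standalone version, not `motac`'s internal implementation; the function name `rolling_origin_folds` and its parameters are hypothetical.

```python
import numpy as np


def rolling_origin_folds(n_steps, train_end, horizon, step):
    """Yield (train_end, test_slice) pairs for rolling-origin backtesting.

    Each fold trains on y[:, :train_end] and scores on the held-out
    window y[:, train_end:train_end + horizon]; the origin then rolls
    forward by `step` until the test window would run past the data.
    """
    t = train_end
    while t + horizon <= n_steps:
        yield t, slice(t, t + horizon)
        t += step


# Example: 100 time steps, first fold trains on 60, forecasts 10, rolls by 10.
folds = list(rolling_origin_folds(n_steps=100, train_end=60, horizon=10, step=10))
# -> 4 folds: origins at 60, 70, 80, 90
```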
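The mean/quantile summaries and empirical interval coverage can be computed from sampled count paths roughly as follows. This is a sketch assuming paths arrive as a `(n_paths, horizon)` array; the helper names `summarize_paths` and `interval_coverage` are illustrative, not part of the `motac` API.

```python
import numpy as np


def summarize_paths(paths, q=(0.05, 0.5, 0.95)):
    """Reduce Monte Carlo paths of shape (n_paths, horizon) to
    a per-step mean and a quantile envelope of shape (len(q), horizon)."""
    mean = paths.mean(axis=0)
    quantiles = np.quantile(paths, q, axis=0)
    return mean, quantiles


def interval_coverage(y_true, lo, hi):
    """Fraction of held-out observations falling inside [lo, hi]."""
    return float(np.mean((y_true >= lo) & (y_true <= hi)))


rng = np.random.default_rng(0)
paths = rng.poisson(lam=5.0, size=(1000, 10))   # 1000 sampled trajectories
mean, qs = summarize_paths(paths)
cov = interval_coverage(rng.poisson(5.0, size=10), qs[0], qs[-1])
```

With the default `q=(0.05, 0.5, 0.95)`, comparing `cov` against the nominal 90% tells you whether the envelope is well calibrated.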
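The reported metrics have standard definitions; a minimal sketch is below. The NLL shown here assumes a Poisson predictive distribution per step, which is an assumption for illustration — the model's actual likelihood may differ.

```python
from math import lgamma

import numpy as np


def poisson_nll(y, lam):
    """Mean negative log-likelihood of counts y under Poisson(lam).

    NOTE: assumes a Poisson predictive; illustrative only.
    """
    log_fact = np.array([lgamma(v + 1.0) for v in y])
    return float(np.mean(lam - y * np.log(lam) + log_fact))


def rmse(y, yhat):
    return float(np.sqrt(np.mean((np.asarray(y) - np.asarray(yhat)) ** 2)))


def mae(y, yhat):
    return float(np.mean(np.abs(np.asarray(y) - np.asarray(yhat))))
```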
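The three baselines can be implemented in a few lines each. These are generic textbook versions for context; the `period` and `window` defaults are assumptions, and `motac`'s own baselines may parameterize them differently.

```python
import numpy as np


def last_value(y_train, horizon):
    """Repeat the last observed value across the forecast horizon."""
    return np.full(horizon, y_train[-1])


def seasonal_naive(y_train, horizon, period=7):
    """Repeat the last full seasonal cycle (assumed period) over the horizon."""
    reps = -(-horizon // period)  # ceiling division
    return np.tile(y_train[-period:], reps)[:horizon]


def moving_average(y_train, horizon, window=7):
    """Forecast the mean of the last `window` observations at every step."""
    return np.full(horizon, y_train[-window:].mean())
```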