Backtesting and benchmarks

motac now treats forecasting as a probabilistic task rather than a single deterministic mean-field rollout.

Rolling-origin protocol

Backtests are run with rolling train/test folds:

  1. fit on y[:, :train_end]

  2. forecast horizon steps ahead with Monte Carlo paths

  3. score on held-out y[:, train_end:train_end+horizon]

  4. repeat with a configurable rolling step
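The fold logic above can be sketched as a small index generator. This is an illustrative stand-alone sketch, not the motac implementation; the function name and arguments are hypothetical.

```python
# Hypothetical sketch of the rolling-origin fold indices (not the motac API):
# yield (train_end, test_start, test_end) triples for a series of length T.
def rolling_origin_folds(T, initial_train, horizon, step):
    train_end = initial_train
    while train_end + horizon <= T:
        # train on [:train_end], score on [train_end:train_end + horizon]
        yield train_end, train_end, train_end + horizon
        train_end += step

folds = list(rolling_origin_folds(T=100, initial_train=60, horizon=10, step=10))
# -> [(60, 60, 70), (70, 70, 80), (80, 80, 90), (90, 90, 100)]
```

Each triple maps directly onto steps 1-3: fit on y[:, :train_end], then score on y[:, test_start:test_end].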

This is implemented via:

  • motac.eval.backtest_fit_forecast_nll

  • motac.eval.run_backtest_report

Probabilistic forecasts

The model's forecasting code path now supports sampling future count trajectories:

  • forecast_count_paths_horizon returns sampled paths and latent intensity paths.

  • forecast_probabilistic_horizon returns paths plus mean/quantile summaries.

Key outputs:

  • forecast mean counts

  • quantile envelopes (default q=(0.05, 0.5, 0.95))

  • fold-level and aggregate metrics
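Summarising sampled paths into a mean and quantile envelope is plain array arithmetic. The sketch below is illustrative only (it uses synthetic Poisson draws in place of real model samples) and does not call the motac API:

```python
import numpy as np

# Illustrative only: stand in for sampled count paths with synthetic Poisson
# draws of shape (n_paths, horizon), then summarise them the way the
# mean/quantile outputs above are described.
rng = np.random.default_rng(0)
paths = rng.poisson(lam=3.0, size=(500, 12))

mean_counts = paths.mean(axis=0)                     # forecast mean counts, shape (12,)
q_lo, q_med, q_hi = np.quantile(paths, [0.05, 0.5, 0.95], axis=0)
```

The default quantiles q=(0.05, 0.5, 0.95) give a 90% central envelope plus the median.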

Metrics

Backtests report:

  • negative log-likelihood (NLL)

  • root-mean-square error (RMSE)

  • mean absolute error (MAE)

  • empirical interval coverage
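For reference, the four metrics can be written down in a few lines. These are generic textbook formulas, not motac's implementation; the NLL shown assumes a Poisson predictive distribution, which may differ from the model's actual likelihood.

```python
import numpy as np
from math import lgamma

def poisson_nll(y, lam):
    """Mean negative log-likelihood of counts y under Poisson rates lam."""
    y, lam = np.asarray(y, float), np.asarray(lam, float)
    ll = y * np.log(lam) - lam - np.array([lgamma(v + 1) for v in y])
    return float(-ll.mean())

def rmse(y, yhat):
    return float(np.sqrt(np.mean((np.asarray(y) - np.asarray(yhat)) ** 2)))

def mae(y, yhat):
    return float(np.mean(np.abs(np.asarray(y) - np.asarray(yhat))))

def interval_coverage(y, lo, hi):
    """Fraction of held-out points that fall inside the [lo, hi] envelope."""
    y = np.asarray(y)
    return float(np.mean((y >= lo) & (y <= hi)))
```

With a well-calibrated 90% envelope, interval_coverage should land near 0.90 on held-out folds.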

Baselines

Reports include simple benchmark baselines:

  • last_value

  • seasonal_naive

  • moving_average

These provide context for model gains and avoid single-model performance claims.
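The three baselines are one-liners over the training tail. The implementations below are illustrative stand-ins matching the baseline names in the report, not motac's code; the season and window parameters are assumptions.

```python
import numpy as np

def last_value(y_train, horizon):
    # repeat the final observed value across the horizon
    return np.full(horizon, y_train[-1], dtype=float)

def seasonal_naive(y_train, horizon, season):
    # tile the last full season across the horizon
    last_season = np.asarray(y_train[-season:], dtype=float)
    reps = int(np.ceil(horizon / season))
    return np.tile(last_season, reps)[:horizon]

def moving_average(y_train, horizon, window=4):
    # repeat the mean of the last `window` observations
    return np.full(horizon, np.mean(y_train[-window:]), dtype=float)

y = np.array([1, 2, 3, 4, 1, 2, 3, 4])
last_value(y, 3)           # -> [4., 4., 4.]
seasonal_naive(y, 6, 4)    # -> [1., 2., 3., 4., 1., 2.]
moving_average(y, 2, 4)    # -> [2.5, 2.5]
```

Beating last_value and seasonal_naive on NLL and RMSE is the minimum bar for claiming a model gain.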

Reproducible artifacts

run_backtest_report writes:

  • report.json

  • a per-fold metrics figure

  • a baseline comparison figure

The report structure is designed for machine-readable comparison across Chicago and ACLED benchmark runs.
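As an illustration only, a machine-readable report of this kind might be shaped as below. Every field name and value here is hypothetical; the actual schema written by run_backtest_report may differ.

```python
import json

# Purely illustrative report shape: fold-level metrics, aggregates, and
# baseline comparisons, serialisable to report.json. All field names and
# values are placeholders, not the real run_backtest_report schema.
report = {
    "dataset": "chicago",
    "folds": [
        {"train_end": 60, "horizon": 10,
         "nll": 1.42, "rmse": 2.1, "mae": 1.6, "coverage_90": 0.88},
    ],
    "aggregate": {"nll": 1.42, "rmse": 2.1, "mae": 1.6, "coverage_90": 0.88},
    "baselines": {
        "last_value": {"rmse": 3.0},
        "seasonal_naive": {"rmse": 2.4},
        "moving_average": {"rmse": 2.6},
    },
}
serialised = json.dumps(report, indent=2)
```

Keeping fold-level and aggregate metrics under stable keys is what makes cross-run (Chicago vs. ACLED) comparison scriptable.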