# Model overview

`motac` implements a **road-constrained, discrete-time Hawkes process** for event
counts on a grid laid over a road network. The model decomposes into:

1. A **substrate** defining spatial cells and travel-time neighbours.
2. A **Hawkes intensity** driven by lagged counts and travel-time decay.
3. A **count likelihood** (Poisson or Negative Binomial).
4. Optional **observation noise** for generating observed counts from latent
   counts in simulation workflows.

Throughout, let $y_{j,t}$ be the count of events in cell $j$ during time bin $t$.

## Substrate (road constraints)

The road network induces a travel-time distance $d_{jk}$ between grid cells
$j$ and $k$ (e.g. shortest-path travel time). Only cells within a travel-time
cutoff of $j$ count as neighbours, which defines a sparse neighbourhood set
$\mathcal{N}(j)$ and a sparse travel-time matrix.
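A minimal sketch of the cutoff step, assuming a dense travel-time matrix is already available (in practice it would come from shortest-path queries on the road graph). The helper `build_neighbourhoods` and the dict-of-dicts sparse layout are hypothetical illustrations, not the `motac` API. Each cell keeps itself as a neighbour with $d_{jj} = 0$; this is an assumption that lets self-excitation enter the sum over $\mathcal{N}(j)$.

```python
import math

def build_neighbourhoods(travel_time, cutoff):
    """Sparse neighbourhoods {j: {k: d_jk}} keeping only pairs with d_jk <= cutoff.

    Hypothetical helper for illustration; unreachable pairs are marked math.inf.
    """
    neighbours = {}
    for j, row in enumerate(travel_time):
        neighbours[j] = {k: d for k, d in enumerate(row) if math.isfinite(d) and d <= cutoff}
    return neighbours

# Toy 4-cell network with arbitrary travel times; cell 3 has no road link to the others.
INF = math.inf
travel_time = [
    [0.0, 2.0, 5.0, INF],
    [2.0, 0.0, 3.0, INF],
    [5.0, 3.0, 0.0, INF],
    [INF, INF, INF, 0.0],
]
nbrs = build_neighbourhoods(travel_time, cutoff=4.0)
print(nbrs[0])  # {0: 0.0, 1: 2.0}
```

Cells 0 and 2 are reachable from each other but lie beyond the cutoff, so neither appears in the other's neighbourhood.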

Evaluating a **nonnegative travel-time kernel** $W(d_{jk})$ on these distances
produces a sparse influence matrix that respects road-network connectivity:
cells that are not neighbours exert no influence on each other.
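Given sparse travel times stored as `{j: {k: d_jk}}` (an illustrative layout, not necessarily the package's), the kernel can be applied entry-wise, so the influence matrix inherits the same sparsity pattern. `influence_matrix` is a hypothetical name; the check on negative weights mirrors the nonnegativity requirement on $W$.

```python
import math

def influence_matrix(neighbours, kernel):
    """Map sparse travel times {j: {k: d_jk}} to sparse weights {j: {k: W(d_jk)}}."""
    weights = {}
    for j, row in neighbours.items():
        weights[j] = {}
        for k, d in row.items():
            w = kernel(d)
            if w < 0:
                raise ValueError(f"kernel returned negative weight {w} at d={d}")
            weights[j][k] = w
    return weights

neighbours = {0: {0: 0.0, 1: 2.0}, 1: {0: 2.0, 1: 0.0, 2: 3.0}, 2: {1: 3.0, 2: 0.0}}
W = influence_matrix(neighbours, kernel=lambda d: math.exp(-0.5 * d))
print(W[0])  # {0: 1.0, 1: <approx. 0.368>}
```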

## Hawkes intensity (discrete time)

The parametric model defines the intensity (the conditional mean of $y_{j,t}$)
for each cell $j$ and time bin $t$:

$$
\lambda_{j,t} = \mu_j + \alpha \sum_{k \in \mathcal{N}(j)} W(d_{jk})\, h_{k,t}
$$

with the lagged history term

$$
h_{k,t} = \sum_{\ell=1}^{L} g(\ell)\, y_{k,t-\ell}
$$

where:

- $\mu_j \ge 0$ is the baseline intensity per cell,
- $\alpha \ge 0$ scales self- and cross-excitation,
- $g(\ell)$ is a nonnegative lag kernel over discrete lags,
- $W(d_{jk})$ downweights excitation by road travel-time distance.
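The two equations above translate almost line for line into code. This sketch assumes counts are stored time-major as `y[t][k]`, that `g[l - 1]` holds $g(\ell)$ for $\ell = 1, \dots, L$, and that lags reaching before the first bin contribute zero; these are illustrative choices, not the package's internals, and the function names are hypothetical.

```python
def lagged_history(y, k, t, g):
    """h_{k,t} = sum_{l=1}^{L} g(l) * y_{k,t-l}, with g(l) stored as g[l - 1].

    Lags that would reach before t = 0 are treated as zero (an assumption).
    """
    return sum(g[l - 1] * y[t - l][k] for l in range(1, len(g) + 1) if t - l >= 0)

def intensity(y, t, mu, alpha, weights, g):
    """lambda_{j,t} = mu_j + alpha * sum_{k in N(j)} W(d_jk) * h_{k,t} for every cell j.

    `weights` is a sparse map {j: {k: W(d_jk)}} over each neighbourhood N(j).
    """
    return [
        mu[j] + alpha * sum(w * lagged_history(y, k, t, g) for k, w in weights[j].items())
        for j in range(len(mu))
    ]

# Two cells that excite themselves (weight 1.0) and each other (weight 0.5).
weights = {0: {0: 1.0, 1: 0.5}, 1: {0: 0.5, 1: 1.0}}
y = [[1, 0], [2, 1], [0, 0]]  # y[t][k]
lam = intensity(y, t=2, mu=[0.2, 0.1], alpha=0.5, weights=weights, g=[0.6, 0.3])
print(lam)  # approximately [1.1, 0.775]
```

Here $h_{0,2} = 0.6 \cdot 2 + 0.3 \cdot 1 = 1.5$ and $h_{1,2} = 0.6 \cdot 1 = 0.6$, so $\lambda_{0,2} = 0.2 + 0.5\,(1.5 + 0.5 \cdot 0.6) = 1.1$.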

In the parametric baseline, $W(d) = \exp(-\beta d)$, with $\beta > 0$ controlling
decay. The code also supports swapping in custom kernel functions $W(d)$ (e.g.
neural or alternative deterministic kernels) while preserving the same
likelihood.
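Because the likelihood sees $W$ only through the influence weights, a kernel can be modelled as a plain callable $d \mapsto W(d)$. The sketch below places the exponential baseline next to a hypothetical power-law alternative; the power-law form and the factory-function style are illustrative assumptions, not kernels shipped by `motac`.

```python
import math

def exponential_kernel(beta):
    """Parametric baseline W(d) = exp(-beta * d), requiring beta > 0."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    return lambda d: math.exp(-beta * d)

def power_law_kernel(scale, power):
    """Hypothetical alternative W(d) = (1 + d / scale) ** -power.

    Nonnegative and decreasing in d for scale > 0 and power > 0.
    """
    if scale <= 0 or power <= 0:
        raise ValueError("scale and power must be positive")
    return lambda d: (1.0 + d / scale) ** -power

for name, kern in [("exp", exponential_kernel(0.5)), ("power", power_law_kernel(2.0, 1.5))]:
    print(name, [round(kern(d), 3) for d in (0.0, 2.0, 5.0)])
```

With these arbitrary parameters the power law decays more slowly than the exponential at long travel times, which is the kind of behavioural difference a kernel swap lets you explore.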

## Count likelihood

Given the intensity, counts are modelled as either:

- **Poisson:** $y_{j,t} \sim \text{Poisson}(\lambda_{j,t})$.
- **Negative Binomial (NB2):** $y_{j,t} \sim \text{NegBin}(\text{mean}=\lambda_{j,t}, \text{dispersion}=\kappa)$.

The NB2 parameterisation used throughout has variance
$\text{Var}[Y] = \lambda + \lambda^2 / \kappa$, so smaller $\kappa$ means stronger
overdispersion and the Poisson case is recovered in the limit $\kappa \to \infty$.
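In terms of the mean $\lambda$ and dispersion $\kappa$, the NB2 probability mass function is
$P(Y = y) = \frac{\Gamma(y + \kappa)}{\Gamma(\kappa)\, y!} \left(\frac{\kappa}{\kappa + \lambda}\right)^{\kappa} \left(\frac{\lambda}{\kappa + \lambda}\right)^{y}$.
A standard-library sketch of its log, assuming $\lambda > 0$ and $\kappa > 0$, lets the variance formula and the Poisson limit be checked numerically; it is not taken from the `motac` source.

```python
import math

def poisson_logpmf(y, lam):
    """log Poisson(y | lam) for lam > 0."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)

def nb2_logpmf(y, mean, kappa):
    """log NegBin(y | mean, kappa) in the NB2 form, Var[Y] = mean + mean**2 / kappa."""
    return (
        math.lgamma(y + kappa) - math.lgamma(kappa) - math.lgamma(y + 1)
        - kappa * math.log1p(mean / kappa)  # equals kappa * log(kappa / (kappa + mean))
        + y * (math.log(mean) - math.log(kappa + mean))
    )

# Numerical moments for mean = 3, kappa = 2 (support truncated at 400; the tail is negligible).
probs = [math.exp(nb2_logpmf(y, 3.0, 2.0)) for y in range(400)]
m = math.fsum(y * p for y, p in enumerate(probs))
v = math.fsum((y - m) ** 2 * p for y, p in enumerate(probs))
print(round(m, 6), round(v, 6))  # 3.0 7.5  (theory: 3 + 3**2 / 2 = 7.5)
```

Raising $\kappa$ shrinks the $\lambda^2/\kappa$ term: `nb2_logpmf(2, 3.0, 1e6)` agrees with `poisson_logpmf(2, 3.0)` to roughly six decimal places.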

## Simulation and observation noise

Simulation utilities generate latent counts from the Hawkes recursion and can
optionally apply observation noise (a detection probability plus false positives)
to produce observed counts. This is primarily used for synthetic evaluation and
parameter recovery.
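A minimal sketch of that workflow, assuming a Poisson likelihood, binomial thinning for the detection probability, and Poisson-distributed false positives added independently in each cell and bin. The observation model, the function names and the parameters `p_detect` and `fp_rate` are hypothetical; the package may parameterise noise differently. `sample_poisson` uses Knuth's multiplication method, which is fine for the small rates of a toy example but slow and numerically fragile for large rates.

```python
import math
import random

def sample_poisson(lam, rng):
    """Knuth's multiplication method; suitable only for small rates."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def simulate(T, mu, alpha, weights, g, p_detect, fp_rate, seed=0):
    """Return (latent, observed) count arrays indexed [t][j]."""
    rng = random.Random(seed)
    latent, observed = [], []
    for t in range(T):
        # Hawkes recursion: intensity from lagged latent counts (lags before t = 0 are zero).
        lam = [
            mu[j] + alpha * sum(
                w * sum(g[l - 1] * latent[t - l][k] for l in range(1, len(g) + 1) if t - l >= 0)
                for k, w in weights[j].items()
            )
            for j in range(len(mu))
        ]
        y_t = [sample_poisson(x, rng) for x in lam]
        # Observation noise: keep each true event with prob p_detect, then add false positives.
        detected = [sum(rng.random() < p_detect for _ in range(n)) for n in y_t]
        observed.append([d + sample_poisson(fp_rate, rng) for d in detected])
        latent.append(y_t)
    return latent, observed

weights = {0: {0: 1.0, 1: 0.5}, 1: {0: 0.5, 1: 1.0}}
latent, observed = simulate(50, mu=[0.5, 0.3], alpha=0.4, weights=weights,
                            g=[0.5, 0.25], p_detect=0.8, fp_rate=0.1)
```

Setting `p_detect=1.0` and `fp_rate=0.0` makes the observed counts identical to the latent ones, which is a convenient sanity check for a parameter-recovery experiment.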

## Forecasting and evaluation

Forecasting rolls the intensity forward one step at a time using the fitted
parameters and the latest observed history. Evaluation utilities support
backtests (train window + held-out horizon) and log-likelihood / RMSE / MAE
scoring for quick sanity checks.
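The backtest loop can be sketched as follows, assuming parameters were already fitted on the training window and using a Poisson log-likelihood for scoring. Each held-out bin is forecast from the observed counts up to the previous bin, which is one reading of rolling forward one step at a time. `one_step_forecast`, `backtest` and the parameter layout are hypothetical names, not the package's evaluation API.

```python
import math

def one_step_forecast(history, mu, alpha, weights, g):
    """Intensity for the bin immediately after `history` (a list of per-bin count vectors)."""
    t = len(history)
    return [
        mu[j] + alpha * sum(
            w * sum(g[l - 1] * history[t - l][k] for l in range(1, len(g) + 1) if t - l >= 0)
            for k, w in weights[j].items()
        )
        for j in range(len(mu))
    ]

def backtest(y, train_end, params):
    """Score one-step-ahead forecasts over the held-out bins y[train_end:]."""
    preds, actual = [], []
    for t in range(train_end, len(y)):
        preds.extend(one_step_forecast(y[:t], **params))  # condition on observed history only
        actual.extend(y[t])
    errors = [a - p for a, p in zip(actual, preds)]
    n = len(errors)
    return {
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "mae": sum(abs(e) for e in errors) / n,
        # Poisson log-likelihood of the held-out counts (requires every forecast > 0).
        "loglik": sum(a * math.log(p) - p - math.lgamma(a + 1) for a, p in zip(actual, preds)),
    }

params = {
    "mu": [0.5, 0.3],
    "alpha": 0.4,
    "weights": {0: {0: 1.0, 1: 0.5}, 1: {0: 0.5, 1: 1.0}},
    "g": [0.5, 0.25],
}
y = [[0, 1], [2, 0], [1, 1], [0, 0], [3, 1], [1, 0]]
print(backtest(y, train_end=4, params=params))
```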

The package also provides a probabilistic forecast interface via Monte Carlo
path sampling (`forecast_probabilistic_horizon`) and rolling-origin backtests
with baseline comparisons.
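Path sampling over a multi-step horizon can be sketched by drawing counts bin by bin and feeding each draw back into the history, so that later bins carry the uncertainty of earlier ones. This illustration assumes a Poisson likelihood; `sample_paths` and `interval` are hypothetical helpers and do not reproduce the signature or internals of `forecast_probabilistic_horizon`.

```python
import math
import random

def sample_poisson(lam, rng):
    """Knuth's multiplication method; suitable only for small rates."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def sample_paths(history, horizon, n_paths, mu, alpha, weights, g, seed=0):
    """Return n_paths simulated futures, each a list of `horizon` per-bin count vectors."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        h = [list(row) for row in history]  # copy so each path extends its own history
        path = []
        for _ in range(horizon):
            t = len(h)
            lam = [
                mu[j] + alpha * sum(
                    w * sum(g[l - 1] * h[t - l][k] for l in range(1, len(g) + 1) if t - l >= 0)
                    for k, w in weights[j].items()
                )
                for j in range(len(mu))
            ]
            y_t = [sample_poisson(x, rng) for x in lam]
            h.append(y_t)  # the sampled bin becomes history for the next step
            path.append(y_t)
        paths.append(path)
    return paths

def interval(values, lower=0.05, upper=0.95):
    """Crude empirical quantile interval (nearest-rank style, no interpolation)."""
    s = sorted(values)

    def pick(q):
        return s[min(len(s) - 1, int(q * len(s)))]

    return pick(lower), pick(upper)

weights = {0: {0: 1.0, 1: 0.5}, 1: {0: 0.5, 1: 1.0}}
paths = sample_paths([[1, 0], [2, 1]], horizon=3, n_paths=1000,
                     mu=[0.5, 0.3], alpha=0.4, weights=weights, g=[0.5, 0.25])
for step in range(3):
    cell0 = [p[step][0] for p in paths]
    print(step + 1, sum(cell0) / len(cell0), interval(cell0))
```

The first step is conditioned only on observed counts, so its sample mean should sit near the deterministic one-step intensity (here $\lambda_0 = 0.5 + 0.4\,(1.25 + 0.5 \cdot 0.5) = 1.1$); later steps widen as sampled counts feed back into the recursion.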