Likelihood and inference

This section details the likelihood used by the road-constrained Hawkes model and the corresponding estimation objective used in motac.

Intensity recap

We model discrete-time counts \(y_{j,t}\) over cells \(j = 1, \dots, N\) and time steps \(t = 1, \dots, T\). The conditional mean (intensity) is

\[ \lambda_{j,t} = \mu_j + \alpha \sum_{k \in \mathcal{N}(j)} W(d_{jk})\, h_{k,t} \quad \text{with} \quad h_{k,t} = \sum_{\ell=1}^{L} g(\ell)\, y_{k,t-\ell}. \]

The lag kernel \(g(\ell)\) is a fixed nonnegative vector (length \(L\)). The road kernel \(W(d)\) is a nonnegative travel-time weighting function (parametric baseline: \(W(d) = \exp(-\beta d)\)).
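As a concrete illustration, the intensity above can be sketched in a few lines of NumPy. This is a minimal sketch, not the motac implementation; the dense travel-time matrix D (with np.inf for non-neighbouring cells) and all variable names are assumptions made for clarity.

```python
import numpy as np

def intensity(y, t, mu, alpha, beta, g, D):
    """Sketch of lambda_{j,t} = mu_j + alpha * sum_k W(d_jk) * h_{k,t}.

    y  : (N, T) count matrix
    g  : length-L lag kernel
    D  : (N, N) travel times, np.inf marks non-neighbours
    """
    L = len(g)
    # h_{k,t} = sum_{l=1}^{L} g(l) * y_{k,t-l}; lags reaching before t=0 drop out
    lags = [g[l - 1] * y[:, t - l] for l in range(1, L + 1) if t - l >= 0]
    h = np.sum(lags, axis=0) if lags else np.zeros(y.shape[0])
    W = np.exp(-beta * D)        # parametric baseline kernel W(d) = exp(-beta d)
    W[~np.isfinite(D)] = 0.0     # non-neighbours contribute nothing
    return mu + alpha * (W @ h)
```

The explicit zeroing of non-neighbour entries matters when beta = 0, where exp(-beta * inf) is NaN rather than 0.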

Poisson likelihood

Under the Poisson observation model,

\[ P(y_{j,t} \mid \lambda_{j,t}) = \frac{\lambda_{j,t}^{y_{j,t}}}{y_{j,t}!}\, e^{-\lambda_{j,t}}. \]

The joint log-likelihood for a full count matrix \(Y\) is

\[ \log p(Y \mid \mu, \alpha, \beta) = \sum_{j=1}^N \sum_{t=1}^T \left(y_{j,t} \log \lambda_{j,t} - \lambda_{j,t} - \log(y_{j,t}!)\right). \]

In code, this is implemented by motac.models.likelihood.poisson_logpmf and summed over the grid.
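The summed log-likelihood can be sketched directly from the formula. This is an illustrative NumPy/SciPy version; motac's poisson_logpmf may differ in signature and broadcasting details.

```python
import numpy as np
from scipy.special import gammaln

def poisson_loglik(Y, Lam):
    """Sum of y*log(lam) - lam - log(y!) over all cells and time steps."""
    Y = np.asarray(Y, dtype=float)
    Lam = np.asarray(Lam, dtype=float)
    # log(y!) = gammaln(y + 1); gammaln stays numerically safe for large counts
    return np.sum(Y * np.log(Lam) - Lam - gammaln(Y + 1.0))
```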

Negative Binomial likelihood (NB2)

For overdispersed data, we use the NB2 parameterisation with dispersion \(\kappa > 0\), so that

\[ \text{Var}[Y] = \lambda + \frac{\lambda^2}{\kappa}. \]

The NB2 log-PMF for a single observation is

\[ \log p(y \mid \lambda, \kappa) = \log \Gamma(y+\kappa) - \log \Gamma(\kappa) - \log(y!) + \kappa \log \left(\frac{\kappa}{\kappa+\lambda}\right) + y \log \left(\frac{\lambda}{\kappa+\lambda}\right). \]

The full log-likelihood is the sum across cells and time. This is implemented by motac.models.likelihood.negbin_logpmf.
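The NB2 log-PMF above translates term by term into code. This is a sketch from the formula, not the motac negbin_logpmf implementation itself.

```python
import numpy as np
from scipy.special import gammaln

def nb2_logpmf(y, lam, kappa):
    """NB2 log-PMF with mean lam and variance lam + lam**2 / kappa."""
    y = np.asarray(y, dtype=float)
    return (gammaln(y + kappa) - gammaln(kappa) - gammaln(y + 1.0)
            + kappa * np.log(kappa / (kappa + lam))
            + y * np.log(lam / (kappa + lam)))
```

This parameterisation matches scipy.stats.nbinom with n = kappa and p = kappa / (kappa + lam), which gives a convenient cross-check.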

Maximum likelihood estimation

The parametric road-Hawkes fitter solves

\[ \max_{\mu, \alpha, \beta, \kappa} \; \log p(Y \mid \mu, \alpha, \beta, \kappa). \]

We enforce positivity with a softplus transform on parameters. Optimization uses L-BFGS-B on the unconstrained parameterisation. The fitted parameters are then used for forecasting and backtesting.
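The softplus reparameterisation can be sketched on a toy problem. The example below fits a single Poisson rate rather than the full road-Hawkes objective; the function names and the choice of a one-parameter objective are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def softplus(z):
    # numerically stable log(1 + exp(z)); maps unconstrained z to (0, inf)
    return np.logaddexp(0.0, z)

def fit_rate(y):
    """Toy MLE of a single Poisson rate via softplus + L-BFGS-B."""
    y = np.asarray(y, dtype=float)

    def neg_loglik(z):
        lam = softplus(z[0])                 # unconstrained z -> lam > 0
        return -np.sum(y * np.log(lam) - lam - gammaln(y + 1.0))

    res = minimize(neg_loglik, x0=np.zeros(1), method="L-BFGS-B")
    return softplus(res.x[0])                # map back to the positive scale
```

For this toy case the Poisson MLE is the sample mean, which makes the round trip through the transform easy to verify.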

Regularization and stability guardrails

The fitter supports optional regularization on the baseline field \(\mu\):

  • mu_ridge: isotropic L2 penalty on \(\mu\)

  • mu_laplacian: graph-smoothness penalty over the substrate adjacency

For conservative subcriticality diagnostics, motac computes a branching bound

\[ b = \alpha \cdot \sum_{\ell=1}^{L} g(\ell) \cdot \max_i \sum_j W_{ij} \]

where \(b < 1\) is a sufficient (conservative) stability condition.
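The bound takes only a few lines to compute. The sketch below assumes W is a dense array of road-kernel weights with zeros for non-neighbours, which is an illustrative simplification.

```python
import numpy as np

def branching_bound(alpha, g, W):
    """b = alpha * sum_l g(l) * max_i sum_j W_ij; b < 1 implies subcriticality."""
    return alpha * np.sum(g) * np.max(np.sum(W, axis=1))
```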

Fit-time handling is configurable with stability_mode:

  • off: disable checks

  • warn: emit a warning if the conservative bound is not subcritical

  • penalty: add smooth barrier penalty in the objective

  • reject: strongly penalize supercritical parameter regions
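One way to realise the penalty mode is a smooth softplus barrier on the bound b. This is a sketch of the idea; the margin and weight values are illustrative, not motac defaults.

```python
import numpy as np

def stability_penalty(b, margin=0.05, weight=100.0):
    """Smooth barrier: near zero while b is safely subcritical, growing
    rapidly as b approaches 1, so the objective stays differentiable."""
    return weight * np.logaddexp(0.0, (b - (1.0 - margin)) / margin)
```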

Smooth speed-of-movement gate

Besides the base exponential travel-time kernel, the model supports an optional soft travel-time cutoff:

\[ W(d) = \exp(-\beta d)\;\sigma\!\left(\frac{\tau_{\max}-d}{s}\right) \]

with threshold \(\tau_{\max}\) = max_travel_time_s, smoothness scale \(s\) = speed_gate_smoothness_s, and \(\sigma\) the logistic sigmoid.

This keeps optimization differentiable while encoding practical movement limits.
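The gated kernel is straightforward to sketch; parameter names mirror max_travel_time_s and speed_gate_smoothness_s, but this is an illustration rather than motac's implementation.

```python
import numpy as np

def gated_kernel(d, beta, tau_max, s):
    """W(d) = exp(-beta d) * sigmoid((tau_max - d) / s)."""
    gate = 1.0 / (1.0 + np.exp(-(tau_max - d) / s))   # logistic sigmoid
    return np.exp(-beta * d) * gate
```

Well below the threshold the gate is essentially 1 and the kernel reduces to the plain exponential; well above it the gate smoothly drives the weight to 0.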

Practical notes

  • The fitted \(\mu_j\) are per-cell baselines, so they scale with the spatial resolution (cell size) and the time bin width.

  • The road kernel \(W(d)\) is sparse because it respects the neighbourhood graph; this keeps computation feasible when the grid grows.

  • When using a custom kernel function, it must return nonnegative weights with the same shape as the travel-time data. Use motac.models.neural_kernels.validate_kernel_fn as a sanity check.