Likelihood and inference
This section details the likelihood used by the road-constrained Hawkes model
and the corresponding estimation objective used in motac.
Intensity recap
We model discrete-time counts \(y_{j,t}\) over cells \(j=1..N\) and time steps \(t=1..T\). The conditional mean (intensity) is
The lag kernel \(g(\ell)\) is a fixed nonnegative vector (length \(L\)). The road kernel \(W(d)\) is a nonnegative travel-time weighting function (parametric baseline: \(W(d) = \exp(-\beta d)\)).
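As a rough illustration of how such an intensity can be evaluated, the sketch below assumes the common additive form \(\lambda_{j,t} = \mu_j + \alpha \sum_k W(d_{jk}) \sum_{\ell=1}^{L} g(\ell)\, y_{k,t-\ell}\). The excitation scale alpha, the helper name intensity, and the dense travel-time matrix are assumptions made for this example and are not taken from motac.

```python
import numpy as np

def intensity(y, mu, alpha, beta, travel_time, g):
    """Hypothetical helper: lambda[j, t] for a count matrix y of shape (N, T)."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    # Parametric baseline road kernel W(d) = exp(-beta * d), shape (N, N)
    W = np.exp(-beta * np.asarray(travel_time, dtype=float))
    # h[k, t] = sum_{l=1..L} g[l-1] * y[k, t-l]; counts before t=0 are treated as zero
    h = np.zeros_like(y)
    for lag, weight in enumerate(g, start=1):
        h[:, lag:] += weight * y[:, :-lag]
    # Baseline plus road-weighted, lagged excitation from all cells
    return mu[:, None] + alpha * (W @ h)
```

With beta = 0 every cell excites every other cell equally, which makes a convenient hand check for small inputs.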
Poisson likelihood
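Under the Poisson observation model, each count is conditionally Poisson with mean equal to the intensity, written \(\lambda_{j,t}\) here:
\[ y_{j,t} \mid \lambda_{j,t} \sim \mathrm{Poisson}(\lambda_{j,t}). \]
The joint log-likelihood for a full count matrix \(Y\) is
\[ \log p(Y \mid \lambda) = \sum_{j=1}^{N} \sum_{t=1}^{T} \left[ y_{j,t} \log \lambda_{j,t} - \lambda_{j,t} - \log(y_{j,t}!) \right]. \]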
In code, this is implemented by motac.models.likelihood.poisson_logpmf and
summed over the grid.
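For reference, the Poisson log-likelihood summed over the grid can be sketched in a few lines of NumPy. This is a standalone illustration with a hypothetical function name, not the body of motac.models.likelihood.poisson_logpmf, and it assumes every intensity is strictly positive.

```python
import numpy as np
from scipy.special import gammaln

def poisson_loglik(y, lam):
    """Sum of Poisson log-PMFs over all cells and time steps (standalone sketch)."""
    y = np.asarray(y, dtype=float)
    lam = np.asarray(lam, dtype=float)
    # log p(y | lam) = y * log(lam) - lam - log(y!), using log(y!) = gammaln(y + 1)
    return float(np.sum(y * np.log(lam) - lam - gammaln(y + 1.0)))
```

Using gammaln for the factorial term avoids overflow for large counts.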
Negative Binomial likelihood (NB2)
For overdispersed data, we use the NB2 parameterisation with dispersion \(\kappa > 0\), so that
The NB2 log-PMF for a single observation is
The full log-likelihood is the sum across cells and time. This is implemented
by motac.models.likelihood.negbin_logpmf.
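A minimal sketch of an NB2 log-PMF follows. It assumes the convention in which the variance is \(\lambda + \lambda^2/\kappa\), so that large \(\kappa\) approaches the Poisson case; if motac instead uses the inverse convention, \(\kappa\) should be replaced by \(1/\kappa\). The function name is hypothetical and this is not the body of motac.models.likelihood.negbin_logpmf.

```python
import numpy as np
from scipy.special import gammaln

def nb2_logpmf(y, lam, kappa):
    """Elementwise NB2 log-PMF, assuming Var[y] = lam + lam**2 / kappa."""
    y = np.asarray(y, dtype=float)
    lam = np.asarray(lam, dtype=float)
    log_norm = gammaln(y + kappa) - gammaln(kappa) - gammaln(y + 1.0)
    # kappa * log(kappa / (kappa + lam)) + y * log(lam / (kappa + lam))
    log_kl = np.log(kappa + lam)
    return log_norm + kappa * (np.log(kappa) - log_kl) + y * (np.log(lam) - log_kl)
```

Summing the returned array over cells and time gives the full log-likelihood described above.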
Maximum likelihood estimation
The parametric road-Hawkes fitter solves
We enforce positivity with a softplus transform on parameters. Optimization uses L-BFGS-B on the unconstrained parameterisation. The fitted parameters are then used for forecasting and backtesting.
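The estimation recipe above (softplus for positivity, L-BFGS-B on the unconstrained parameters) can be sketched on toy data as follows. The parameter layout (mu, alpha, beta), the Poisson objective, the synthetic counts, and the toy travel times are all assumptions made for this example; motac's fitter may differ in every one of these details.

```python
import numpy as np
from scipy.optimize import minimize

def softplus(x):
    # Numerically stable log(1 + exp(x)); maps the real line to (0, inf)
    return np.logaddexp(0.0, x)

def neg_loglik(raw, y, travel_time, g):
    """Poisson negative log-likelihood in the unconstrained parameters `raw`."""
    n_cells = y.shape[0]
    theta = softplus(raw)  # enforce positivity of mu, alpha, beta
    mu, alpha, beta = theta[:n_cells], theta[n_cells], theta[n_cells + 1]
    W = np.exp(-beta * travel_time)
    h = np.zeros_like(y)
    for lag, weight in enumerate(g, start=1):
        h[:, lag:] += weight * y[:, :-lag]
    lam = np.maximum(mu[:, None] + alpha * (W @ h), 1e-12)  # guard against log(0)
    # log(y!) does not depend on the parameters, so it is dropped
    return float(np.sum(lam - y * np.log(lam)))

# Toy data: 4 cells on a line, 50 time steps (purely illustrative)
rng = np.random.default_rng(0)
y = rng.poisson(1.0, size=(4, 50)).astype(float)
travel_time = np.abs(np.subtract.outer(np.arange(4.0), np.arange(4.0)))
g = np.array([0.6, 0.3, 0.1])

raw0 = np.zeros(4 + 2)  # softplus(0) = log(2) for every parameter
# No analytic gradient is supplied, so SciPy falls back to finite differences
result = minimize(neg_loglik, raw0, args=(y, travel_time, g), method="L-BFGS-B")
theta_hat = softplus(result.x)  # map back to the constrained scale
```

Because the optimizer only ever sees raw, no box constraints are needed; positivity holds by construction after the softplus map.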
Regularization and stability guardrails
The fitter supports optional regularization on the baseline field mu:
- mu_ridge: isotropic L2 penalty on mu
- mu_laplacian: graph-smoothness penalty over the substrate adjacency
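The two penalties can be sketched as below, assuming mu_ridge and mu_laplacian act as nonnegative weights, the ridge term is \(\sum_j \mu_j^2\), and the smoothness term is \(\sum_{(i,j) \in E} (\mu_i - \mu_j)^2\) over undirected edges listed once each. The function name, the edge-list input, and the exact scaling are assumptions for illustration.

```python
import numpy as np

def mu_penalty(mu, edges, mu_ridge=0.0, mu_laplacian=0.0):
    """Hypothetical regulariser on the baseline field mu."""
    mu = np.asarray(mu, dtype=float)
    edges = np.asarray(edges, dtype=int).reshape(-1, 2)  # (E, 2) index pairs
    # Isotropic L2 term: shrinks every baseline towards zero
    ridge = np.sum(mu ** 2)
    # Graph-smoothness term: equals mu^T L mu for the graph Laplacian L
    smooth = np.sum((mu[edges[:, 0]] - mu[edges[:, 1]]) ** 2)
    return float(mu_ridge * ridge + mu_laplacian * smooth)
```

A spatially constant mu incurs no smoothness penalty, so the Laplacian term only discourages differences between adjacent cells.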
For conservative subcriticality diagnostics, motac computes a branching bound
where b < 1 is a sufficient (conservative) stability condition.
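One standard way to obtain a conservative bound of this kind is to bound the spectral radius of the offspring matrix by an induced matrix norm. The sketch below assumes an offspring matrix of the form \(\alpha \left(\sum_\ell g(\ell)\right) W\) and uses the maximum column sum; the excitation scale alpha, the function name, and this particular choice of norm are assumptions, and the bound motac computes may be defined differently.

```python
import numpy as np

def branching_bound(alpha, W, g):
    """Hypothetical bound: induced 1-norm of alpha * sum(g) * W."""
    W = np.asarray(W, dtype=float)
    # Column k sums the weights an event in cell k contributes to all cells,
    # i.e. the worst-case expected number of direct offspring per event
    max_col_sum = np.max(np.sum(W, axis=0))
    return float(alpha * np.sum(g) * max_col_sum)
```

Any induced norm upper-bounds the spectral radius, so a value below 1 is sufficient for subcriticality but not necessary.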
Fit-time handling is configurable with stability_mode:
- off: disable checks
- warn: emit warning if the conservative bound is not subcritical
- penalty: add smooth barrier penalty in the objective
- reject: strongly penalize supercritical parameter regions
Smooth speed-of-movement gate
Besides the base exponential travel-time kernel, the model supports an optional soft travel-time cutoff:
with threshold tau_max = max_travel_time_s and smoothness scale
s = speed_gate_smoothness_s.
This keeps optimization differentiable while encoding practical movement limits.
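A soft cutoff of this kind can be sketched by multiplying the exponential kernel by a logistic gate, \(W(d) = \exp(-\beta d)\, \sigma\!\left((\tau_{\max} - d)/s\right)\). The logistic form is an assumption chosen for illustration; only the parameter names max_travel_time_s and speed_gate_smoothness_s come from the text above.

```python
import numpy as np

def gated_kernel(d, beta, max_travel_time_s, speed_gate_smoothness_s):
    """Hypothetical gated kernel: exponential decay times a smooth cutoff."""
    d = np.asarray(d, dtype=float)
    z = (max_travel_time_s - d) / speed_gate_smoothness_s
    # Logistic sigmoid written via tanh, which stays finite for large |z|
    gate = 0.5 * (1.0 + np.tanh(0.5 * z))
    return np.exp(-beta * d) * gate
```

The gate is close to 1 well below the threshold, exactly 0.5 at d = tau_max, and close to 0 well above it; a smaller s makes the transition sharper.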
Practical notes
The fitted \(\mu_j\) are per-cell baselines, so they scale with the spatial resolution (cell size) and the time bin width.
The road kernel \(W(d)\) is sparse because it respects the neighbourhood graph; this keeps computation feasible when the grid grows.
When using a custom kernel function, you must return nonnegative weights with the same shape as the travel-time data. Use
motac.models.neural_kernels.validate_kernel_fn to sanity-check it.
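To make the contract concrete, here is a small custom kernel together with a standalone check of the two stated requirements (same shape, nonnegative). Both rational_kernel and check_kernel_fn are hypothetical names for this example; the check does not call validate_kernel_fn, whose signature is not shown here, and the finiteness test is an extra assumption.

```python
import numpy as np

def rational_kernel(d, scale=300.0):
    """Example custom kernel: heavy-tailed decay in travel time (seconds)."""
    return 1.0 / (1.0 + np.asarray(d, dtype=float) / scale)

def check_kernel_fn(kernel_fn, sample_travel_times):
    """Hypothetical sanity check mirroring the requirements stated above."""
    d = np.asarray(sample_travel_times, dtype=float)
    w = np.asarray(kernel_fn(d))
    if w.shape != d.shape:
        raise ValueError(f"kernel returned shape {w.shape}, expected {d.shape}")
    if not np.all(np.isfinite(w)) or np.any(w < 0):
        raise ValueError("kernel weights must be finite and nonnegative")
    return w
```

Running the check on a few representative travel times before fitting catches shape and sign mistakes early.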