# Likelihood and inference

This section details the likelihood used by the road-constrained Hawkes model and the corresponding estimation objective used in `motac`.

## Intensity recap

We model discrete-time counts $y_{j,t}$ over cells $j = 1, \dots, N$ and time steps $t = 1, \dots, T$. The conditional mean (intensity) is

$$
\lambda_{j,t} = \mu_j + \alpha \sum_{k \in \mathcal{N}(j)} W(d_{jk})\, h_{k,t}
\quad \text{with} \quad
h_{k,t} = \sum_{\ell=1}^{L} g(\ell)\, y_{k,t-\ell}.
$$

The lag kernel $g(\ell)$ is a fixed nonnegative vector of length $L$. The road kernel $W(d)$ is a nonnegative travel-time weighting function (parametric baseline: $W(d) = \exp(-\beta d)$).

## Poisson likelihood

Under the Poisson observation model,

$$
P(y_{j,t} \mid \lambda_{j,t}) = \frac{\lambda_{j,t}^{y_{j,t}}}{y_{j,t}!}\, e^{-\lambda_{j,t}}.
$$

The joint log-likelihood for a full count matrix $Y$ is

$$
\log p(Y \mid \mu, \alpha, \beta) = \sum_{j=1}^N \sum_{t=1}^T \left( y_{j,t} \log \lambda_{j,t} - \lambda_{j,t} - \log(y_{j,t}!) \right).
$$

In code, this is implemented by `motac.models.likelihood.poisson_logpmf` and summed over the grid.

## Negative Binomial likelihood (NB2)

For overdispersed data, we use the NB2 parameterisation with dispersion $\kappa > 0$, so that

$$
\text{Var}[Y] = \lambda + \frac{\lambda^2}{\kappa}.
$$

The NB2 log-PMF for a single observation is

$$
\log p(y \mid \lambda, \kappa) = \log \Gamma(y+\kappa) - \log \Gamma(\kappa) - \log(y!)
+ \kappa \log\!\left(\frac{\kappa}{\kappa+\lambda}\right)
+ y \log\!\left(\frac{\lambda}{\kappa+\lambda}\right).
$$

The full log-likelihood is the sum across cells and time. This is implemented by `motac.models.likelihood.negbin_logpmf`.

## Maximum likelihood estimation

The parametric road-Hawkes fitter solves

$$
\max_{\mu, \alpha, \beta, \kappa} \; \log p(Y \mid \mu, \alpha, \beta, \kappa).
$$

We enforce positivity with a softplus transform on parameters. Optimization uses L-BFGS-B on the unconstrained parameterisation.
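To make the pieces of the objective concrete, here is a minimal NumPy sketch of the intensity recursion and the NB2 log-PMF. This is illustrative only — it is not the `motac.models.likelihood` implementation, and it assumes dense arrays of shape `(N, T)` with a precomputed weight matrix `W` and lag kernel `g`:

```python
import numpy as np
from scipy.special import gammaln

def negbin2_logpmf(y, lam, kappa):
    """NB2 log-PMF with Var[Y] = lam + lam**2 / kappa (illustrative sketch)."""
    return (gammaln(y + kappa) - gammaln(kappa) - gammaln(y + 1)
            + kappa * np.log(kappa / (kappa + lam))
            + y * np.log(lam / (kappa + lam)))

def intensity(y, mu, alpha, W, g):
    """lambda_{j,t} = mu_j + alpha * sum_k W_{jk} * sum_l g(l) * y_{k,t-l}.

    y: (N, T) counts, mu: (N,) baselines, W: (N, N) road-kernel weights,
    g: length-L lag kernel. Lags before t=0 are treated as zero counts.
    """
    N, T = y.shape
    lam = np.tile(mu[:, None], (1, T))
    for ell in range(1, len(g) + 1):
        # Shift counts back by ell steps, zero-padding the first ell columns.
        shifted = np.zeros_like(y, dtype=float)
        shifted[:, ell:] = y[:, :-ell]
        lam += alpha * g[ell - 1] * (W @ shifted)
    return lam
```

The full log-likelihood is then `negbin2_logpmf(y, intensity(y, mu, alpha, W, g), kappa).sum()` (or the Poisson analogue), which is the quantity the fitter maximizes.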
The fitted parameters are then used for forecasting and backtesting.

## Regularization and stability guardrails

The fitter supports optional regularization on the baseline field `mu`:

- `mu_ridge`: isotropic L2 penalty on `mu`
- `mu_laplacian`: graph-smoothness penalty over the substrate adjacency

For conservative subcriticality diagnostics, `motac` computes a branching bound

$$
b = \alpha \cdot \sum_{\ell=1}^{L} g(\ell) \cdot \max_i \sum_j W_{ij},
$$

where `b < 1` is a sufficient (conservative) stability condition. Fit-time handling is configurable with `stability_mode`:

- `off`: disable checks
- `warn`: emit a warning if the conservative bound is not subcritical
- `penalty`: add a smooth barrier penalty to the objective
- `reject`: strongly penalize supercritical parameter regions

## Smooth speed-of-movement gate

Besides the base exponential travel-time kernel, the model supports an optional soft travel-time cutoff:

$$
W(d) = \exp(-\beta d)\;\sigma\!\left(\frac{\tau_{\max}-d}{s}\right)
$$

with threshold `tau_max = max_travel_time_s` and smoothness scale `s = speed_gate_smoothness_s`. This keeps optimization differentiable while encoding practical movement limits.

## Practical notes

- The fitted $\mu_j$ are per-cell baselines, so they scale with the spatial resolution (cell size) and the time-bin width.
- The road kernel $W(d)$ is sparse because it respects the neighbourhood graph; this keeps computation feasible as the grid grows.
- A custom kernel function must return **nonnegative** weights with the same shape as the travel-time data. Use `motac.models.neural_kernels.validate_kernel_fn` to sanity-check it.
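The gated kernel and the branching bound above are both one-liners in practice. The sketch below is illustrative (the helper names are hypothetical, not the `motac` API) and follows the formulas as stated:

```python
import numpy as np

def gated_kernel(d, beta, tau_max, s):
    """W(d) = exp(-beta * d) * sigmoid((tau_max - d) / s).

    Exponential travel-time decay with a smooth, differentiable cutoff:
    near d = tau_max the gate is 0.5, and it decays towards 0 beyond it.
    """
    return np.exp(-beta * d) / (1.0 + np.exp(-(tau_max - d) / s))

def branching_bound(alpha, g, W):
    """Conservative bound b = alpha * sum_l g(l) * max_i sum_j W_ij.

    b < 1 is a sufficient (but not necessary) subcriticality condition;
    it bounds the worst-case excitation mass any single cell can receive.
    """
    return alpha * np.sum(g) * np.max(W.sum(axis=1))
```

For example, with `alpha = 0.5`, a lag kernel summing to one, and maximum row sum one, the bound evaluates to `b = 0.5`, comfortably subcritical.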