# Likelihood and inference

This section details the likelihood used by the road-constrained Hawkes model and the corresponding estimation objective used in `motac`.

## Intensity recap

We model discrete-time counts $y_{j,t}$ over cells $j = 1, \dots, N$ and time steps $t = 1, \dots, T$. The conditional mean (intensity) is

$$
\lambda_{j,t} = \mu_j + \alpha \sum_{k \in \mathcal{N}(j)} W(d_{jk})\, h_{k,t}
\quad \text{with} \quad
h_{k,t} = \sum_{\ell=1}^{L} g(\ell)\, y_{k,t-\ell}.
$$

The lag kernel $g(\ell)$ is a fixed nonnegative vector of length $L$. The road kernel $W(d)$ is a nonnegative travel-time weighting function (parametric baseline: $W(d) = \exp(-\beta d)$).

## Poisson likelihood

Under the Poisson observation model,

$$
P(y_{j,t} \mid \lambda_{j,t}) = \frac{\lambda_{j,t}^{y_{j,t}}}{y_{j,t}!}\, e^{-\lambda_{j,t}}.
$$

The joint log-likelihood for a full count matrix $Y$ is

$$
\log p(Y \mid \mu, \alpha, \beta) = \sum_{j=1}^N \sum_{t=1}^T \left( y_{j,t} \log \lambda_{j,t} - \lambda_{j,t} - \log(y_{j,t}!) \right).
$$

In code, this is implemented by `motac.models.likelihood.poisson_logpmf` and summed over the grid.

## Negative Binomial likelihood (NB2)

For overdispersed data, we use the NB2 parameterisation with dispersion $\kappa > 0$, so that

$$
\text{Var}[Y] = \lambda + \frac{\lambda^2}{\kappa}.
$$

The NB2 log-PMF for a single observation is

$$
\log p(y \mid \lambda, \kappa) = \log \Gamma(y+\kappa) - \log \Gamma(\kappa) - \log(y!)
+ \kappa \log\!\left(\frac{\kappa}{\kappa+\lambda}\right)
+ y \log\!\left(\frac{\lambda}{\kappa+\lambda}\right).
$$

The full log-likelihood is the sum across cells and time. This is implemented by `motac.models.likelihood.negbin_logpmf`.

## Maximum likelihood estimation

The parametric road-Hawkes fitter solves

$$
\max_{\mu, \alpha, \beta, \kappa} \; \log p(Y \mid \mu, \alpha, \beta, \kappa).
$$

We enforce positivity with a softplus transform on parameters. Optimization uses L-BFGS-B on the unconstrained parameterisation.
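To make the pieces of the objective concrete, here is a minimal NumPy sketch of the intensity recursion and the NB2 log-PMF. This is illustrative only — it is not the `motac.models.likelihood` implementation, and it assumes dense arrays of shape `(N, T)` with a precomputed weight matrix `W` and lag kernel `g`:

```python
import numpy as np
from scipy.special import gammaln

def negbin2_logpmf(y, lam, kappa):
    """NB2 log-PMF with Var[Y] = lam + lam**2 / kappa (illustrative sketch)."""
    return (gammaln(y + kappa) - gammaln(kappa) - gammaln(y + 1)
            + kappa * np.log(kappa / (kappa + lam))
            + y * np.log(lam / (kappa + lam)))

def intensity(y, mu, alpha, W, g):
    """lambda_{j,t} = mu_j + alpha * sum_k W_{jk} * sum_l g(l) * y_{k,t-l}.

    y: (N, T) counts, mu: (N,) baselines, W: (N, N) road-kernel weights,
    g: length-L lag kernel. Lags before t=0 are treated as zero counts.
    """
    N, T = y.shape
    lam = np.tile(mu[:, None], (1, T))
    for ell in range(1, len(g) + 1):
        # Shift counts back by ell steps, zero-padding the first ell columns.
        shifted = np.zeros_like(y, dtype=float)
        shifted[:, ell:] = y[:, :-ell]
        lam += alpha * g[ell - 1] * (W @ shifted)
    return lam
```

The full log-likelihood is then `negbin2_logpmf(y, intensity(y, mu, alpha, W, g), kappa).sum()` (or the Poisson analogue), which is the quantity the fitter maximizes.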
The fitted parameters are then used for forecasting and backtesting.

## Regularization and stability guardrails

The fitter supports optional regularization on the baseline field `mu`:

- `mu_ridge`: isotropic L2 penalty on `mu`
- `mu_laplacian`: graph-smoothness penalty over the substrate adjacency

For conservative subcriticality diagnostics, `motac` computes a branching bound

$$
b = \alpha \cdot \sum_{\ell=1}^{L} g(\ell) \cdot \max_i \sum_j W_{ij},
$$

where `b < 1` is a sufficient (conservative) stability condition. Fit-time handling is configurable with `stability_mode`:

- `off`: disable checks
- `warn`: emit a warning if the conservative bound is not subcritical
- `penalty`: add a smooth barrier penalty to the objective
- `reject`: strongly penalize supercritical parameter regions

## Smooth speed-of-movement gate

Besides the base exponential travel-time kernel, the model supports an optional soft travel-time cutoff:

$$
W(d) = \exp(-\beta d)\;\sigma\!\left(\frac{\tau_{\max}-d}{s}\right)
$$

with threshold `tau_max = max_travel_time_s` and smoothness scale `s = speed_gate_smoothness_s`. This keeps optimization differentiable while encoding practical movement limits.

## Practical notes

- The fitted $\mu_j$ are per-cell baselines, so they scale with the spatial resolution (cell size) and the time-bin width.
- The road kernel $W(d)$ is sparse because it respects the neighbourhood graph; this keeps computation feasible as the grid grows.
- A custom kernel function must return **nonnegative** weights with the same shape as the travel-time data. Use `motac.models.neural_kernels.validate_kernel_fn` to sanity-check it.
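The gated kernel and the branching bound above are both one-liners in practice. The sketch below is illustrative (the helper names are hypothetical, not the `motac` API) and follows the formulas as stated:

```python
import numpy as np

def gated_kernel(d, beta, tau_max, s):
    """W(d) = exp(-beta * d) * sigmoid((tau_max - d) / s).

    Exponential travel-time decay with a smooth, differentiable cutoff:
    near d = tau_max the gate is 0.5, and it decays towards 0 beyond it.
    """
    return np.exp(-beta * d) / (1.0 + np.exp(-(tau_max - d) / s))

def branching_bound(alpha, g, W):
    """Conservative bound b = alpha * sum_l g(l) * max_i sum_j W_ij.

    b < 1 is a sufficient (but not necessary) subcriticality condition;
    it bounds the worst-case excitation mass any single cell can receive.
    """
    return alpha * np.sum(g) * np.max(W.sum(axis=1))
```

For example, with `alpha = 0.5`, a lag kernel summing to one, and maximum row sum one, the bound evaluates to `b = 0.5`, comfortably subcritical.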