# Substrate

The substrate is the road-constrained spatial scaffold used by models and simulators:

- a road graph (OSMnx GraphML)
- a regular grid (cell centroids)
- sparse travel-time neighbourhoods between grid cells
- optional POI feature matrix aligned to the grid

## Building

Use `SubstrateBuilder(SubstrateConfig(...)).build()` (or the CLI wrapper) to build and optionally cache artefacts.

`SubstrateConfig` supports three ways to specify the region:

- `bbox=(north,south,east,west)` via the individual fields `north/south/east/west`
- `place="..."` (OSM place query)
- `graphml_path="..."` (offline / tests)

## Cache artefacts

If `cache_dir` is set, the builder writes a self-contained cache directory:

- `graph.graphml`
- `grid.npz`
- `neighbours.npz`
- `meta.json`
- `poi.npz` (optional)

### Cache contents (v2)

`graph.graphml`
: Road network saved via `osmnx.save_graphml`.

`grid.npz`
: A compressed NumPy archive with:

  - `lat`: `float64`, shape `(n_cells,)`, grid cell centroid latitudes (EPSG:4326)
  - `lon`: `float64`, shape `(n_cells,)`, grid cell centroid longitudes (EPSG:4326)
  - `cell_size_m`: `float64`, shape `(1,)`, grid spacing in metres

`neighbours.npz`
: A SciPy sparse matrix saved via `scipy.sparse.save_npz`.

  - matrix type: CSR (`.tocsr()` on load)
  - shape: `(n_cells, n_cells)`
  - entries: `travel_time_s[i, j]` = shortest-path travel time (seconds) from cell `i` to `j`, for all `j` reachable within `max_travel_time_s` (plus the diagonal)

`poi.npz` (optional)
: Present when POIs are enabled.

  - `x`: `float64`, shape `(n_cells, n_features)`, POI feature matrix aligned to the grid
  - `feature_names`: `object` array of strings, length `n_features`

`meta.json`
: Human-readable cache metadata.
  Keys:

  - `cache_format_version` (int)
  - `built_at_utc` (UTC timestamp, `YYYY-MM-DDTHH:MM:SSZ`)
  - `motac_version` (string)
  - `config` (dict): normalized subset of `SubstrateConfig` fields
  - `graphml_path` (string): path of the GraphML used by the cache (when cached, this points at `cache_dir/graph.graphml`)
  - `has_poi` (bool)

### Cache format versioning

The cache includes `meta.json["cache_format_version"]`. The loader accepts supported versions (currently v1 and v2) and raises for unsupported versions.

### Loading a cached directory (with version validation)

#### Minimal example: load cache + validate version {#cache-load-validate-version}

`SubstrateBuilder.build()` validates the cache format version automatically, but if you want an explicit, human-readable guard before doing any work you can read `meta.json` yourself:

```python
from __future__ import annotations

import json
from pathlib import Path

from motac.spatial import SubstrateBuilder, SubstrateConfig

cache_dir = Path("./cache/camden")
meta = json.loads((cache_dir / "meta.json").read_text())

# Fast fail if the on-disk cache is from an unsupported format.
if meta.get("cache_format_version") not in SubstrateBuilder.SUPPORTED_CACHE_FORMAT_VERSIONS:
    raise ValueError(
        "Unsupported substrate cache format version: "
        f"{meta.get('cache_format_version')} (supported {SubstrateBuilder.SUPPORTED_CACHE_FORMAT_VERSIONS})"
    )

# Loads graph.graphml, grid.npz, neighbours.npz (and optionally poi.npz).
substrate = SubstrateBuilder(SubstrateConfig(cache_dir=str(cache_dir))).build()
print(substrate.grid.lat.shape, substrate.neighbours.travel_time_s.shape)
```

If `cache_dir` already contains a cache directory, `SubstrateBuilder.build()` loads it and validates `cache_format_version` automatically:

```python
from motac.spatial import SubstrateBuilder, SubstrateConfig

# Point at a previously built cache directory containing:
#   graph.graphml, grid.npz, neighbours.npz, meta.json (and optionally poi.npz)
cache_dir = "./cache/camden"

try:
    substrate = SubstrateBuilder(SubstrateConfig(cache_dir=cache_dir)).build()
except ValueError as e:
    # Raised e.g. when meta.json["cache_format_version"] is unsupported.
    raise

# Use the loaded substrate.
print(substrate.grid.lat.shape, substrate.neighbours.travel_time_s.shape)
```

## POI features

If POIs are enabled, `Substrate.poi` is a `POIFeatures` object with:

- `x`: shape `(n_cells, n_features)`
- `feature_names`: list of feature names, aligned to columns of `x`

By default we always include:

- `poi_count`: total number of POIs assigned to each grid cell

### Tag/value breakout counts

If `SubstrateConfig.poi_tags` is provided, we also add optional breakout count features based on POI properties:

- If `{"amenity": True}` then we add a feature named `amenity` counting POIs with a non-null `amenity` property.
- If `{"amenity": ["cafe", "restaurant"]}` then we add features named `amenity=cafe` and `amenity=restaurant`.

This works both for OSM downloads (where those properties are columns in the GeoDataFrame) and for local GeoJSON inputs (as long as `properties` contain the relevant keys).
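To make the breakout naming rules concrete, here is a minimal pure-Python sketch of how such counts could be derived from POIs that have already been assigned to grid cells. The helper `breakout_counts` and its inputs (`cell_ids`, `properties`) are hypothetical names chosen for illustration; this is not motac's implementation, and the real feature matrix is a `float64` array rather than nested lists.

```python
from __future__ import annotations


def breakout_counts(cell_ids, properties, poi_tags, n_cells):
    """Count POIs per grid cell, in total and per tag/value breakout.

    Hypothetical helper for illustration only; not part of motac.
    `cell_ids[i]` is the grid cell of POI i, `properties[i]` its property dict.
    """
    names = ["poi_count"]
    # One predicate per feature column, in the same order as `names`.
    predicates = [lambda props: True]
    for key, spec in poi_tags.items():
        if spec is True:
            # {"amenity": True} -> one column counting non-null values.
            names.append(key)
            predicates.append(lambda props, k=key: props.get(k) is not None)
        else:
            # {"amenity": ["cafe", ...]} -> one column per listed value.
            for value in spec:
                names.append(f"{key}={value}")
                predicates.append(lambda props, k=key, v=value: props.get(k) == v)

    x = [[0.0] * len(names) for _ in range(n_cells)]
    for cell, props in zip(cell_ids, properties):
        for col, matches in enumerate(predicates):
            if matches(props):
                x[cell][col] += 1.0
    return x, names


# Four POIs assigned to three grid cells.
cell_ids = [0, 0, 1, 2]
properties = [
    {"amenity": "cafe"},
    {"amenity": "restaurant"},
    {"amenity": "cafe"},
    {"shop": "bakery"},
]

x, feature_names = breakout_counts(
    cell_ids, properties, {"amenity": ["cafe", "restaurant"]}, n_cells=3
)
x_any, names_any = breakout_counts(cell_ids, properties, {"amenity": True}, n_cells=3)

print(feature_names)
print(x)
```

Each row of `x` corresponds to one grid cell and each column to the entry of `feature_names` at the same index, mirroring the alignment described for `POIFeatures`.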
### Example config

```json
{
  "place": "Camden, London, UK",
  "cell_size_m": 250.0,
  "max_travel_time_s": 900.0,
  "poi_tags": {"amenity": ["cafe", "restaurant"]},
  "cache_dir": "./cache/camden"
}
```

### Travel-time-to-nearest-POI (min travel time)

If `SubstrateConfig.poi_travel_time_features=true`, the builder appends travel-time-based features computed from the sparse neighbourhood matrix `neighbours.travel_time_s`:

- `poi_min_travel_time_s`: minimum travel time (seconds) from each cell to *any* cell that contains at least one POI.

If `poi_tags` defines breakout categories, we also add category-specific features using the same naming convention as the count breakouts:

- `poi_<key>_min_travel_time_s` for `<key>: True` (e.g. `poi_amenity_min_travel_time_s`)
- `poi_<key>=<value>_min_travel_time_s` for `<key>: [values]` (e.g. `poi_amenity=school_min_travel_time_s`)

If no POI target is reachable for a cell within the cached neighbourhood cutoff, we set the feature to `max_travel_time_s`.

---

## API reference

```{eval-rst}
.. automodule:: motac.spatial
   :members:
   :undoc-members:
   :show-inheritance:
```

```{eval-rst}
.. automodule:: motac.spatial.substrate
   :members:
   :undoc-members:
   :show-inheritance:
```

```{eval-rst}
.. automodule:: motac.spatial.types
   :members:
   :undoc-members:
   :show-inheritance:
```