Sidecar rollout: per-source thumbnails + license + is_public fields #140

@rdhyee

Context

Raymond endorsed the sidecar enrichment pattern on 2026-04-17: per-source parquet sidecars keyed by pid, LEFT-JOINed into the wide parquet at build time. This keeps source-specific enrichment out of the canonical PQG pipeline while making it queryable from a single wide table.
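
The join semantics described above can be sketched in a few lines. This is only an illustration: it uses in-memory SQLite tables from the Python standard library as a stand-in for the parquet files, and the table names `wide` and `sidecar`, the `label` column, and the sample rows are all hypothetical. The actual build step and query engine are not specified in this issue.

```python
import sqlite3

# Stand-in for the wide parquet and one per-source sidecar (names are hypothetical).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE wide (pid TEXT PRIMARY KEY, label TEXT);
    CREATE TABLE sidecar (
        pid TEXT PRIMARY KEY,
        thumbnail_url TEXT,
        license TEXT,
        is_public INTEGER
    );
    INSERT INTO wide VALUES ('pid:1', 'sample A'), ('pid:2', 'sample B');
    -- Only pid:1 has enrichment; pid:2 has no sidecar row.
    INSERT INTO sidecar VALUES
        ('pid:1', 'https://example.org/t1.jpg', 'CC-BY-4.0', 1);
    """
)

# LEFT JOIN keeps every wide row; sidecar columns are NULL where no match exists.
rows = conn.execute(
    """
    SELECT w.pid, w.label, s.thumbnail_url, s.license, s.is_public
    FROM wide AS w
    LEFT JOIN sidecar AS s ON s.pid = w.pid
    ORDER BY w.pid
    """
).fetchall()

for row in rows:
    print(row)
```

Because the join is a LEFT JOIN keyed on `pid`, records without a sidecar entry stay in the wide table with empty enrichment columns rather than being dropped.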

Unlike a thumbnail-only sidecar, the schema should carry richer fields that the Explorer and downstream consumers (e.g., Charismatic samples audit #130) actually need.

Sidecar schema (minimum)

| Field | Type | Notes |
| --- | --- | --- |
| pid | string | Join key |
| thumbnail_url | string | Image URL for Explorer card & homepage showcase |
| license | string | SPDX or source-specific license identifier |
| is_public | bool | Whether record can be surfaced publicly (embargo handling) |
| media_url | string | Full-res media, if distinct from thumbnail |
| harvested_at | timestamp | When sidecar was generated (provenance) |
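
As a rough sketch of how a sidecar row with these fields might be represented and checked before being written out, the dataclass below mirrors the minimum schema. The class name `SidecarRow`, the helper `public_rows`, the choice of which fields may be empty, and the sample values are assumptions made for illustration; the issue itself only specifies the field names and types.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class SidecarRow:
    """One sidecar record; nullability of the optional fields is an assumption."""

    pid: str                      # join key
    thumbnail_url: Optional[str]  # image URL for Explorer card / showcase
    license: Optional[str]        # SPDX or source-specific identifier
    is_public: bool               # False means embargoed, do not surface
    media_url: Optional[str]      # full-res media, if distinct from thumbnail
    harvested_at: datetime        # when the sidecar was generated

    def __post_init__(self) -> None:
        if not self.pid:
            raise ValueError("pid is the join key and must be non-empty")
        if not isinstance(self.is_public, bool):
            raise TypeError("is_public must be a bool")


def public_rows(rows: List[SidecarRow]) -> List[SidecarRow]:
    """Keep only rows that may be surfaced publicly (hypothetical helper)."""
    return [r for r in rows if r.is_public]


now = datetime.now(timezone.utc)
sample = [
    SidecarRow("pid:1", "https://example.org/t1.jpg", "CC-BY-4.0", True, None, now),
    SidecarRow("pid:2", "https://example.org/t2.jpg", None, False, None, now),
]
visible = public_rows(sample)
print([r.pid for r in visible])
```

Filtering on `is_public` at read time is one way to handle embargoes; whether that filter belongs in the sidecar generator, the wide-table build, or the Explorer is left open here.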

Priority order (per Raymond)

  1. Smithsonian IPT (partner-first — relationship-driven priority)
  2. GEOME
  3. SESAR
  4. OpenContext — already done (reference implementation)

Acceptance

Downstream consumers

Related
