From 84ca69b314566438ae1798f91c61776079fc7d9d Mon Sep 17 00:00:00 2001
From: Raymond Yee <raymond.yee@gmail.com>
Date: Fri, 24 Apr 2026 07:59:29 -0700
Subject: [PATCH 1/4] Add QUERY_SPEC.md v0.1 (draft)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Substrate-neutral query contract spanning DuckDB-WASM (web), DuckDB/Ibis
(Python), and Apache Solr (legacy). Names mirror the Solr schema
vocabulary (authoritative precedent) with substrate-specific aliases
provided in §5.

Scope:
- Canonical facet / filter dimensions (§2)
- Abstract filter grammar (§3)
- Full-text search semantics (§3.2, the 16-field Solr searchText target)
- Sample-card projection (§4.2)
- Substrate binding tables (§5)
- Open questions for v0.2 (§7)

Out of scope: PQG graph traversal (see QUERY_COMPARISON.md), bulk
export, ingestion.

Refs isamplesorg.github.io#138.
---
 query-spec.qmd | 344 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 344 insertions(+)
 create mode 100644 query-spec.qmd

diff --git a/query-spec.qmd b/query-spec.qmd
new file mode 100644
index 0000000..073e31f
--- /dev/null
+++ b/query-spec.qmd
@@ -0,0 +1,344 @@
+---
+title: "iSamples Query Specification"
+subtitle: "A substrate-neutral contract for searching and filtering iSamples data"
+author: "iSamples team"
+date: today
+toc: true
+sidebar: false
+categories: [spec, architecture, query]
+---
+
+::: {.callout-warning}
+## Draft — v0.1
+
+This is a skeleton. Field inventories are drawn from the Solr schema
+(authoritative precedent) and the PQG metadata model, but gaps remain.
+Comments and PRs welcome — see [issue tracker][issues].
+
+[issues]: https://github.com/isamplesorg/isamplesorg.github.io/issues
+:::
+
+## 1. Purpose and scope {#sec-scope}
+
+iSamples data is reached today through at least three substrates — and
+potentially more in the future:
+
+- **DuckDB-WASM against parquet** (this website's Interactive Explorer)
+- **DuckDB / Ibis against parquet** (the Python client and notebooks)
+- **Apache Solr** (legacy iSamples Central; potentially revived)
+
+Each substrate has its own query dialect. Users and maintainers shouldn't
+have to relearn the facet vocabulary, the text-search semantics, or the
+spatial filter grammar when moving between them. This document specifies
+a **substrate-neutral query model** that each implementation can bind to.
+
+**What this spec covers:**
+
+- Canonical facet / filter dimensions and their names
+- Filter grammar (an abstract syntax, not a wire format)
+- Full-text search semantics (which fields participate)
+- Spatial and temporal primitives
+- Sample-card projection (what a clicked sample returns)
+- Substrate binding tables (spec → DuckDB, spec → Solr)
+
+**What it does NOT cover:**
+
+- PQG graph traversal queries (edge walking, multi-hop joins). See
+  [QUERY_COMPARISON.md][qc] in the monorepo root for that work and the
+  Eric-vs-Observable alignment notes.
+- Bulk export / download mechanics. See [how-to-use](how-to-use.qmd).
+- Ingestion and metadata normalization.
+
+[qc]: https://github.com/isamplesorg/isamplesorg.github.io/blob/main/QUERY_COMPARISON.md
+
+**Normative precedent.** Where this spec names a field, the name mirrors
+the iSamples metadata model's dotted-path form as used in the Solr schema
+(`isamples_inabox/solr_schema_init/create_isb_core_schema.py`), because
+that's the most complete, externally-documented query vocabulary the
+project has shipped. Aliases for substrate-specific naming are provided
+in §5.
+
+## 2. Canonical dimensions {#sec-dimensions}
+
+A **dimension** is an attribute of a material sample record that users
+filter, facet, or search on. Every binding (§5) must provide at least
+the **required** dimensions.
+
+### 2.1 Identity and provenance
+
+| Dimension | Type | Required | Solr field | PQG path | Notes |
+|---|---|---|---|---|---|
+| `pid` | string | ✅ | `id` | `MaterialSampleRecord.pid` | Primary key |
+| `source` | enum | ✅ | `source` | `MaterialSampleRecord.source_name` | `SESAR\|OPENCONTEXT\|GEOME\|SMITHSONIAN` |
+| `label` | string | ✅ | `label` | `MaterialSampleRecord.label` | Display name |
+| `description` | text | ✅ | `description` | `MaterialSampleRecord.description` | Free text |
+| `registrant` | string | | `registrant` | `MaterialSampleRecord.registrant` | Who registered |
+| `sourceUpdatedTime` | instant | | `sourceUpdatedTime` | — | Freshness |
+
+### 2.2 Classification (the four facets)
+
+| Dimension | Type | Required | Solr field | PQG path |
+|---|---|---|---|---|
+| `material` | enum | ✅ | `hasMaterialCategory` | `MaterialSampleRecord.has_material_category.label` |
+| `context` | enum | ✅ | `hasContextCategory` | `MaterialSampleRecord.has_context_category.label` |
+| `specimen` | enum | ⚠️ (see below) | `hasSpecimenCategory` | `MaterialSampleRecord.has_specimen_category.label` |
+| `keywords` | multi-string | | `keywords` | `MaterialSampleRecord.keywords[]` |
+| `informalClassification` | multi-string | | `informalClassification` | `MaterialSampleRecord.informal_classification[]` |
+
+::: {.callout-note}
+`specimen` (**hasSpecimenCategory**) is in the blessed Solr vocabulary
+but is **not currently exposed** in the web Explorer. Adding it is on
+the P1 stack.
+:::
+
+Each of these has a paired **confidence** field (`…Confidence`, `pfloat`)
+in Solr. The spec allows filters to reference confidence (e.g.
+`material.confidence >= 0.8`) but implementations MAY omit it if the
+substrate doesn't carry the field.
+
+### 2.3 Sampling event and site
+
+| Dimension | Type | Solr field | PQG path |
+|---|---|---|---|
+| `resultTime` | instant | `producedBy_resultTime` (`pdate`) | `SamplingEvent.result_time` |
+| `resultTimeRange` | interval | `producedBy_resultTimeRange` (`date_range`) | derived |
+| `samplingPurpose` | string | `samplingPurpose` | `SamplingEvent.sampling_purpose` |
+| `featureOfInterest` | string | `producedBy_hasFeatureOfInterest` | `SamplingEvent.has_feature_of_interest` |
+| `responsibility` | multi-string | `producedBy_responsibility` | `SamplingEvent.responsibility[]` |
+| `siteLabel` | string | `producedBy_samplingSite_label` | `SamplingSite.label` |
+| `siteDescription` | text | `producedBy_samplingSite_description` | `SamplingSite.description` |
+| `placeName` | string | `producedBy_samplingSite_placeName` | `SamplingSite.place_name[]` |
+| `elevation` | float | `producedBy_samplingSite_location_elevationInMeters` | `GeospatialCoordLocation.elevation` |
+
+### 2.4 Spatial {#sec-spatial}
+
+| Dimension | Type | Solr field | PQG path |
+|---|---|---|---|
+| `latitude` | float | `producedBy_samplingSite_location_latitude` | `GeospatialCoordLocation.latitude` |
+| `longitude` | float | `producedBy_samplingSite_location_longitude` | `GeospatialCoordLocation.longitude` |
+| `bbox` | bbox | `producedBy_samplingSite_location_bb` | derived |
+| `h3[resN]` | h3-index | `producedBy_samplingSite_location_h3_{0..13}` | `samples_wide.h3_res{N}` |
+
+**H3 tier convention.** Resolutions 4, 6, and 8 are the spec-recommended
+tier breakpoints for zoom-adaptive visualization. Other resolutions MAY
+be materialized but 4/6/8 are load-bearing.
+
+### 2.5 Curation
+
+| Dimension | Type | Solr field |
+|---|---|---|
+| `curationLocation` | string | `curation_location` |
+| `curationResponsibility` | string | `curation_responsibility` |
+| `curationAccessConstraints` | string | `curation_accessContraints` |
+
+## 3. Filter grammar {#sec-grammar}
+
+A query is a conjunction (AND) of filters. Each binding is responsible
+for translating the abstract filter into its dialect.
+
+### 3.1 Filter primitives
+
+```text
+Filter       := FieldFilter | TextFilter | SpatialFilter | TemporalFilter
+
+FieldFilter  := dim  IN  (value, ...)
+              | dim  =   value
+              | dim  >=  value        ( numeric / date only )
+              | dim  <=  value
+              | dim  CONTAINS  token  ( multi-string / keywords )
+
+TextFilter   := text MATCHES  "phrase"
+
+SpatialFilter:= bbox WITHIN  (min_lat, min_lon, max_lat, max_lon)
+              | h3   AT RES n  IN  (h3_cell, ...)
+
+TemporalFilter
+             := time BETWEEN  t1  AND  t2
+              | time_range OVERLAPS  (t1, t2)
+```
+
+### 3.2 Full-text search semantics {#sec-text}
+
+`text MATCHES "phrase"` searches the aggregate of these fields (the
+Solr `searchText` copy-field target, canonical list):
+
+- `source`, `label`, `description`
+- `keywords`, `informalClassification`
+- `producedBy_label`, `producedBy_description`, `producedBy_hasFeatureOfInterest`,
+  `producedBy_responsibility`
+- `producedBy_samplingSite_label`, `producedBy_samplingSite_description`,
+  `producedBy_samplingSite_placeName`
+- `registrant`, `samplingPurpose`
+- `curation_label`, `curation_description`, `curation_location`
+
+Substrates that can't index all 16 fields MUST document which subset
+they cover and surface the limitation in UI. (The current web Explorer
+covers `label` + `description` + `place_name` only — a known gap.)
+
+Multi-term queries default to **AND** with relevance ranking where the
+substrate supports it (Solr, DuckDB FTS). See PR #95 for web-side FTS
+work.
+
+### 3.3 Cross-filter counts
+
+A faceted UI exposing a dimension SHOULD show, next to each facet value,
+the count of records matching **the current query *excluding* that
+dimension's own filter**. This lets users see the effect of selecting
+additional values without shrinking the list to zero.
+
+Substrates may pre-compute these counts (see
+`isamples_202601_facet_cross_filter.parquet` for the single-filter
+cache) or compute them on the fly.
+
+## 4. Result projections {#sec-projections}
+
+### 4.1 Map / globe point
+
+Minimum projection for a point on a map:
+
+```text
+{ pid, label, source, latitude, longitude }
+```
+
+This is what the web Explorer's "lite parquet" already provides.
+
+### 4.2 Sample card
+
+Projection for a clicked / selected sample:
+
+```text
+{
+  pid, label, source,
+  description,
+  latitude, longitude, placeName, elevation,
+  material, context, specimen, keywords,
+  resultTime, samplingPurpose,
+  registrant, responsibility,
+  curationLocation, curationResponsibility,
+  sourceRecordURL,
+  thumbnailURL            // via per-source sidecar; see issue #131
+}
+```
+
+Fields MAY be null. The sample card UI in every binding SHOULD handle
+missing values gracefully.
+
+### 4.3 Facet counts
+
+```text
+{ dimension, value, count }[]
+```
+
+## 5. Substrate bindings {#sec-bindings}
+
+### 5.1 DuckDB-WASM on parquet (web)
+
+| Spec | Binding |
+|---|---|
+| `source IN (…)` | `source IN (…)` on wide / lite parquet |
+| `material IN (…)` | `pid IN (SELECT pid FROM sample_facets WHERE material IN (…))` |
+| `text MATCHES "q"` | `(label ILIKE '%q%' OR description ILIKE '%q%' OR place_name ILIKE '%q%')` — currently a subset of §3.2 |
+| `bbox WITHIN (…)` | `latitude BETWEEN … AND … AND longitude BETWEEN … AND …` |
+| `h3 AT RES 6 IN (…)` | `h3_res6 IN (…)` on H3-annotated parquet |
+| `time BETWEEN …` | TBD — `producedBy_resultTime` not yet in lite parquet |
+
+**Canonical data URL base**: `https://data.isamples.org/` (Cloudflare
+Worker in front of the R2 bucket). Two layers:
+
+- **Versioned** `/isamples_YYYYMM_<file>.parquet` — 1-yr immutable cache,
+  safe to pin in papers, spec examples, or reproducibility notebooks.
+- **Alias** `/current/<alias>` — 302 redirect with 5-minute cache; tracks
+  whatever the latest snapshot is. Use for "always fresh" consumers.
+
+Never reference the raw `pub-a18234d962364c22a50c787b7ca09fa5.r2.dev/...`
+URL — it bypasses the Worker and defeats the alias layer.
+
+Data files: see [catalog in how-to-use](how-to-use.qmd#data-files).
+
+### 5.2 DuckDB / Ibis on parquet (Python)
+
+| Spec | Binding |
+|---|---|
+| Same DuckDB SQL as §5.1 | Same URLs under `https://data.isamples.org/` |
+| Ibis expressions | `t.source.isin([...])` and so on |
+
+See `isamples-python/examples/basic/isamples_explorer.ipynb` for the
+reference implementation. An `isamples_query.py` module extracting the
+filter builder is planned.
+
+### 5.3 Apache Solr (if Central returns)
+
+| Spec | Binding |
+|---|---|
+| `source IN (a, b)` | `fq=source:(a OR b)` |
+| `material IN (…)` | `fq=hasMaterialCategory:(…)` |
+| `text MATCHES "q"` | `q=searchText:q` (relevance-ranked by default) |
+| `bbox WITHIN (…)` | `fq={!field f=producedBy_samplingSite_location_rpt}Intersects(ENVELOPE(...))` |
+| `time BETWEEN …` | `fq=producedBy_resultTime:[t1 TO t2]` |
+| `time_range OVERLAPS (…)` | `fq=producedBy_resultTimeRange:[t1 TO t2]` — date_range field |
+
+See `isamples_inabox/isb_web/isb_solr_query.py` for the full client.
+
+## 6. Versioning and compatibility {#sec-versioning}
+
+This spec follows a loose form of semantic versioning:
+
+- **Major** (1.0, 2.0): new required dimensions, renames, or grammar
+  changes that break existing clients.
+- **Minor** (0.2, 0.3): new optional dimensions, clarifications,
+  additional binding rows.
+- **Patch**: typo fixes.
+
+Breaking changes MUST be accompanied by a migration note and a sunset
+window for the prior spec version.
+
+## 7. Open questions (for v0.2) {#sec-open}
+
+1. **Specimen filter in the web Explorer.** Canonical vocabulary is
+   `hasSpecimenCategory`. Which display labels should the UI use?
+2. **Time filter in lite parquet.** `producedBy_resultTime` is not yet
+   in the lite parquet; decide whether to add it or query the wide
+   parquet on demand.
+3. **Text-search field coverage** in the web Explorer (currently 3 of
+   16). Which of the remaining 13 are worth indexing in a browser
+   FTS? See PR #95.
+4. **Cross-filter cache shape** for multi-dimension filter combinations
+   (current cache handles single-filter only).
+5. **Confidence thresholds** — should the spec define a default for
+   `*.Confidence` fields, or leave it per-client?
+6. **H3 tier breakpoints** — when filters are active, what zoom level
+   triggers the switch from H3 clusters to individual points? The web
+   Explorer currently uses ~120 km; the Python notebooks use viewport
+   bounding box size.
+7. **Sample-card thumbnail provenance** — see issue #131 and the
+   sidecar pattern memo.
+
+## Appendix A. Metadata model at a glance
+
+iSamples treats these as the core entity types (domain-agnostic):
+
+- `MaterialSampleRecord` — the sample itself
+- `SamplingEvent` — the act of collection
+- `SamplingSite` — the place
+- `GeospatialCoordLocation` — lat/lon/elevation
+- `MaterialSampleCuration` — curation metadata
+- `IdentifiedConcept` — vocabulary terms (materials, contexts, specimens)
+- `Agent` — people / institutions
+
+The canonical UML is in the
+[isamplesorg-metadata](https://github.com/isamplesorg/metadata) repo.
+PQG (the parquet property-graph binding) is specified in
+[`pqg/docs/PQG_SPECIFICATION.md`](https://github.com/isamplesorg/isamples-python/blob/main/pqg/docs/PQG_SPECIFICATION.md).
+
+## Appendix B. Related documents
+
+- `QUERY_COMPARISON.md` — PQG traversal query alignment (Eric's Python
+  vs. the Observable JS, Oct 2025)
+- `test_cesium_queries.js`, `test_python_js_alignment.py` — alignment
+  test harness at the monorepo root
+- [Interactive Explorer](tutorials/progressive_globe.qmd) — the reference
+  web UI
+- `isamples-python/examples/basic/isamples_explorer.ipynb` — the
+  reference Python UI
+- `isamples_inabox/solr_schema_init/create_isb_core_schema.py` — the
+  authoritative Solr schema

From 8faa4db7c0da8b093f8742d61e7ee7b8a4efcb06 Mon Sep 17 00:00:00 2001
From: Raymond Yee <raymond.yee@gmail.com>
Date: Fri, 24 Apr 2026 08:02:18 -0700
Subject: [PATCH 2/4] Apply QUERY_SPEC v0.2 amendments from PQG conformance
 matrix
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Amendments informed by isamplesorg/pqg#22 (conformance_matrix.md §4-§5),
which audited which shipped parquet files actually carry which spec
dimensions:

1. Rename `specimen` → `objectType` (§2.2). Every shipped parquet uses
   `object_type` / `hasSampleObjectType`; adopt the data-side name as
   canonical, keep `hasSpecimenCategory` as Solr alias.
2. Drop ghosts: `informalClassification` (§2.2) and `resultTimeRange`
   (§2.3) — both were in Solr but never migrated to any parquet. Also
   drop `time_range OVERLAPS` from §3.1 grammar and §5.3 Solr binding.
3. Add `thumbnailURL` to §2.1 as optional (ships in `wide` today for
   OpenContext only; moving to per-source sidecars — issue #131).
4. Update §5.1 `time BETWEEN` binding from "TBD" to real DuckDB cast:
   `TRY_CAST(result_time AS TIMESTAMP) BETWEEN t1 AND t2`. `result_time`
   IS in lite (as VARCHAR).
5. Document H3 column availability in §2.4: `wide_h3` and
   `h3_summary_res{4,6,8}` carry res 4/6/8; `lite` has res 8 only;
   plain `wide` / `narrow` carry no H3 columns.
6. Pick `tmodified` (INTEGER epoch) over `last_modified_time` (VARCHAR)
   for `sourceUpdatedTime` in §2.1; alias the VARCHAR as deprecated.
7. Bump version callout to v0.2.
8. §7 open questions: close Q2 (time filter in lite — now resolved);
   reframe Q1 around the new `objectType` naming.
9. Appendix B: reference conformance_matrix.md and SERIALIZATIONS.md
   (pqg#143) as companion documents.
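
A minimal sketch of amendment 4, for reviewers who want to see the cast
in context. The helper name `time_between` and the use of Python
`datetime` arguments are assumptions for illustration; the function only
renders the DuckDB predicate quoted above and is not part of the spec or
any shipped client:

```python
from datetime import datetime


def time_between(t1: datetime, t2: datetime) -> str:
    """Render `time BETWEEN t1 AND t2` as a DuckDB predicate (hypothetical helper)."""
    # result_time ships as VARCHAR, so TRY_CAST turns unparseable values
    # into NULL instead of failing the whole query.
    lo = t1.isoformat(sep=" ")
    hi = t2.isoformat(sep=" ")
    return (
        "TRY_CAST(result_time AS TIMESTAMP) "
        f"BETWEEN TIMESTAMP '{lo}' AND TIMESTAMP '{hi}'"
    )


print(time_between(datetime(2020, 1, 1), datetime(2021, 1, 1)))
```

Rows whose `result_time` does not parse become NULL under `TRY_CAST`,
so they drop out of the filter rather than raising an error.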

Refs isamplesorg/pqg#22, isamplesorg.github.io#138.
---
 query-spec.qmd | 131 ++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 98 insertions(+), 33 deletions(-)

diff --git a/query-spec.qmd b/query-spec.qmd
index 073e31f..acd1126 100644
--- a/query-spec.qmd
+++ b/query-spec.qmd
@@ -9,13 +9,17 @@ categories: [spec, architecture, query]
 ---
 
 ::: {.callout-warning}
-## Draft — v0.1
+## Draft — v0.2
 
-This is a skeleton. Field inventories are drawn from the Solr schema
-(authoritative precedent) and the PQG metadata model, but gaps remain.
-Comments and PRs welcome — see [issue tracker][issues].
+Field inventories are drawn from the Solr schema (authoritative
+precedent) and the PQG metadata model. v0.2 incorporates findings from
+the [PQG conformance matrix][cmatrix] (which parquet files actually
+carry which dimensions) to resolve naming drift, drop ghosts, and
+tighten substrate bindings. Comments and PRs welcome — see
+[issue tracker][issues].
 
 [issues]: https://github.com/isamplesorg/isamplesorg.github.io/issues
+[cmatrix]: https://github.com/isamplesorg/pqg/blob/main/docs/conformance_matrix.md
 :::
 
 ## 1. Purpose and scope {#sec-scope}
@@ -73,7 +77,16 @@ the **required** dimensions.
 | `label` | string | ✅ | `label` | `MaterialSampleRecord.label` | Display name |
 | `description` | text | ✅ | `description` | `MaterialSampleRecord.description` | Free text |
 | `registrant` | string | | `registrant` | `MaterialSampleRecord.registrant` | Who registered |
-| `sourceUpdatedTime` | instant | | `sourceUpdatedTime` | — | Freshness |
+| `sourceUpdatedTime` | instant | | `sourceUpdatedTime` | `MaterialSampleRecord.tmodified` | Freshness; bind to `tmodified` (INTEGER epoch) — see note below |
+| `thumbnailURL` | string | | — | `MaterialSampleRecord.thumbnail_url` | Optional; shipped in `wide` today (OpenContext only). Expected to move to per-source sidecars over time (see §4.2 sample card, issue #131) |
+
+::: {.callout-note}
+**`sourceUpdatedTime` binding**: the `wide` parquet ships both
+`last_modified_time` (VARCHAR) and `tmodified` (INTEGER unix epoch).
+v0.2 picks `tmodified` as canonical because epoch is easier to filter
+and sort; `last_modified_time` is kept as a deprecated alias for
+backwards compatibility and will be removed in a future major release.
+:::
 
 ### 2.2 Classification (the four facets)
 
@@ -81,14 +94,28 @@ the **required** dimensions.
 |---|---|---|---|---|
 | `material` | enum | ✅ | `hasMaterialCategory` | `MaterialSampleRecord.has_material_category.label` |
 | `context` | enum | ✅ | `hasContextCategory` | `MaterialSampleRecord.has_context_category.label` |
-| `specimen` | enum | ⚠️ (see below) | `hasSpecimenCategory` | `MaterialSampleRecord.has_specimen_category.label` |
+| `objectType` | enum | ⚠️ (see below) | `hasSampleObjectType` (alias `hasSpecimenCategory`) | `MaterialSampleRecord.has_sample_object_type.label` |
 | `keywords` | multi-string | | `keywords` | `MaterialSampleRecord.keywords[]` |
-| `informalClassification` | multi-string | | `informalClassification` | `MaterialSampleRecord.informal_classification[]` |
 
 ::: {.callout-note}
-`specimen` (**hasSpecimenCategory**) is in the blessed Solr vocabulary
-but is **not currently exposed** in the web Explorer. Adding it is on
-the P1 stack.
+**Naming resolution (v0.2)**: v0.1 named this dimension `specimen` with
+Solr field `hasSpecimenCategory`. Every shipped parquet file uses
+`object_type` / `hasSampleObjectType`. v0.2 adopts the data-side name
+(`objectType`) as canonical and keeps `hasSpecimenCategory` as a Solr
+alias. See [PQG conformance matrix §3.2][cmatrix-3-2] for the audit
+that prompted this rename.
+
+`objectType` is in the blessed vocabulary but is **not currently
+exposed** in the web Explorer. Adding it is on the P1 stack.
+
+[cmatrix-3-2]: https://github.com/isamplesorg/pqg/blob/main/docs/conformance_matrix.md#32-classification-query_spec-22
+:::
+
+::: {.callout-note}
+**Dropped from v0.2**: `informalClassification` was named in v0.1 but
+no shipped parquet file carries it (it was a Solr-era remnant). It is
+removed from the canonical dimension list until/unless the pipeline
+adds it.
 :::
 
 Each of these has a paired **confidence** field (`…Confidence`, `pfloat`)
@@ -101,7 +128,6 @@ substrate doesn't carry the field.
 | Dimension | Type | Solr field | PQG path |
 |---|---|---|---|
 | `resultTime` | instant | `producedBy_resultTime` (`pdate`) | `SamplingEvent.result_time` |
-| `resultTimeRange` | interval | `producedBy_resultTimeRange` (`date_range`) | derived |
 | `samplingPurpose` | string | `samplingPurpose` | `SamplingEvent.sampling_purpose` |
 | `featureOfInterest` | string | `producedBy_hasFeatureOfInterest` | `SamplingEvent.has_feature_of_interest` |
 | `responsibility` | multi-string | `producedBy_responsibility` | `SamplingEvent.responsibility[]` |
@@ -110,6 +136,13 @@ substrate doesn't carry the field.
 | `placeName` | string | `producedBy_samplingSite_placeName` | `SamplingSite.place_name[]` |
 | `elevation` | float | `producedBy_samplingSite_location_elevationInMeters` | `GeospatialCoordLocation.elevation` |
 
+::: {.callout-note}
+**Dropped from v0.2**: `resultTimeRange` (Solr `producedBy_resultTimeRange`,
+a `date_range` field) was named in v0.1 but no shipped parquet carries
+an interval type. It was a Solr-era remnant that never migrated. Query
+a `resultTime` range with `time BETWEEN t1 AND t2` (§3.1) instead.
+:::
+
 ### 2.4 Spatial {#sec-spatial}
 
 | Dimension | Type | Solr field | PQG path |
@@ -123,6 +156,21 @@ substrate doesn't carry the field.
 tier breakpoints for zoom-adaptive visualization. Other resolutions MAY
 be materialized but 4/6/8 are load-bearing.
 
+::: {.callout-important}
+**H3 column availability across shipped parquet files (v0.2)**:
+
+- `wide_h3` and the `h3_summary_res{4,6,8}` tier files carry
+  `h3_res4`, `h3_res6`, `h3_res8`.
+- `lite` carries `h3_res8` (and `h3_res8_hex`) only — not res4 / res6.
+- Plain `wide` and `narrow` do **not** carry H3 columns. To filter at
+  res 4 or res 6, query `wide_h3` or the appropriate `h3_summary`
+  tier file.
+
+See [PQG conformance matrix §3.4][cmatrix-3-4] for the full table.
+
+[cmatrix-3-4]: https://github.com/isamplesorg/pqg/blob/main/docs/conformance_matrix.md#34-spatial-query_spec-24
+:::
+
 ### 2.5 Curation
 
 | Dimension | Type | Solr field |
@@ -154,7 +202,6 @@ SpatialFilter:= bbox WITHIN  (min_lat, min_lon, max_lat, max_lon)
 
 TemporalFilter
              := time BETWEEN  t1  AND  t2
-              | time_range OVERLAPS  (t1, t2)
 ```
 
 ### 3.2 Full-text search semantics {#sec-text}
@@ -163,7 +210,7 @@ TemporalFilter
 Solr `searchText` copy-field target, canonical list):
 
 - `source`, `label`, `description`
-- `keywords`, `informalClassification`
+- `keywords`
 - `producedBy_label`, `producedBy_description`, `producedBy_hasFeatureOfInterest`,
   `producedBy_responsibility`
 - `producedBy_samplingSite_label`, `producedBy_samplingSite_description`,
@@ -171,7 +218,7 @@ Solr `searchText` copy-field target, canonical list):
 - `registrant`, `samplingPurpose`
 - `curation_label`, `curation_description`, `curation_location`
 
-Substrates that can't index all 16 fields MUST document which subset
+Substrates that can't index all 15 fields MUST document which subset
 they cover and surface the limitation in UI. (The current web Explorer
 covers `label` + `description` + `place_name` only — a known gap.)
 
@@ -211,12 +258,13 @@ Projection for a clicked / selected sample:
   pid, label, source,
   description,
   latitude, longitude, placeName, elevation,
-  material, context, specimen, keywords,
+  material, context, objectType, keywords,
   resultTime, samplingPurpose,
   registrant, responsibility,
   curationLocation, curationResponsibility,
   sourceRecordURL,
-  thumbnailURL            // via per-source sidecar; see issue #131
+  thumbnailURL            // see §2.1; ships in `wide` today (OpenContext
+                          // only), moving to per-source sidecars — issue #131
 }
 ```
 
@@ -239,8 +287,8 @@ missing values gracefully.
 | `material IN (…)` | `pid IN (SELECT pid FROM sample_facets WHERE material IN (…))` |
 | `text MATCHES "q"` | `(label ILIKE '%q%' OR description ILIKE '%q%' OR place_name ILIKE '%q%')` — currently a subset of §3.2 |
 | `bbox WITHIN (…)` | `latitude BETWEEN … AND … AND longitude BETWEEN … AND …` |
-| `h3 AT RES 6 IN (…)` | `h3_res6 IN (…)` on H3-annotated parquet |
-| `time BETWEEN …` | TBD — `producedBy_resultTime` not yet in lite parquet |
+| `h3 AT RES 6 IN (…)` | `h3_res6 IN (…)` on `wide_h3` or `h3_summary_res6` (see §2.4 note) |
+| `time BETWEEN …` | `TRY_CAST(result_time AS TIMESTAMP) BETWEEN t1 AND t2` — `result_time` ships as VARCHAR in `lite`, `wide`, and `narrow` |
 
 **Canonical data URL base**: `https://data.isamples.org/` (Cloudflare
 Worker in front of the R2 bucket). Two layers:
@@ -275,7 +323,6 @@ filter builder is planned.
 | `text MATCHES "q"` | `q=searchText:q` (relevance-ranked by default) |
 | `bbox WITHIN (…)` | `fq={!field f=producedBy_samplingSite_location_rpt}Intersects(ENVELOPE(...))` |
 | `time BETWEEN …` | `fq=producedBy_resultTime:[t1 TO t2]` |
-| `time_range OVERLAPS (…)` | `fq=producedBy_resultTimeRange:[t1 TO t2]` — date_range field |
 
 See `isamples_inabox/isb_web/isb_solr_query.py` for the full client.
 
@@ -292,27 +339,39 @@ This spec uses semantic-ish versioning:
 Breaking changes MUST be accompanied by a migration note and a sunset
 window for the prior spec version.
 
-## 7. Open questions (for v0.2) {#sec-open}
-
-1. **Specimen filter in the web Explorer.** Canonical vocabulary is
-   `hasSpecimenCategory`. Which display labels should the UI use?
-2. **Time filter in lite parquet.** `producedBy_resultTime` is not yet
-   in the lite parquet; decide whether to add it or query the wide
-   parquet on demand.
-3. **Text-search field coverage** in the web Explorer (currently 3 of
-   16). Which of the remaining 13 are worth indexing in a browser
-   FTS? See PR #95.
-4. **Cross-filter cache shape** for multi-dimension filter combinations
+## 7. Open questions (for v0.3) {#sec-open}
+
+1. **`objectType` filter in the web Explorer.** Canonical vocabulary is
+   now `hasSampleObjectType` (resolved in v0.2; see §2.2). The
+   `sample_facets_v2` parquet carries `object_type` as a denormalized
+   URI string, so binding is straightforward. Which display labels
+   should the UI surface, and should `object_type` be added to `lite`
+   so `objectType` filters don't require a second file fetch?
+2. **Text-search field coverage** in the web Explorer (currently 3 of
+   15 post-v0.2). Which of the remaining 12 are worth indexing in a
+   browser FTS? See PR #95.
+3. **Cross-filter cache shape** for multi-dimension filter combinations
    (current cache handles single-filter only).
-5. **Confidence thresholds** — should the spec define a default for
+4. **Confidence thresholds** — should the spec define a default for
    `*.Confidence` fields, or leave it per-client?
-6. **H3 tier breakpoints** — when filters are active, what zoom level
+5. **H3 tier breakpoints** — when filters are active, what zoom level
    triggers the switch from H3 clusters to individual points? The web
    Explorer currently uses ~120 km; the Python notebooks use viewport
    bounding box size.
-7. **Sample-card thumbnail provenance** — see issue #131 and the
+6. **Sample-card thumbnail provenance** — `thumbnail_url` is now named
+   in §2.1 (v0.2) but lives in `wide` and is populated only for
+   OpenContext. Move to per-source sidecars per issue #131 / the
    sidecar pattern memo.
 
+### Questions resolved in v0.2
+
+- ~~**Specimen vs. objectType naming**~~ — resolved: adopt data-side
+  name `objectType` (Solr `hasSampleObjectType`) as canonical. See
+  §2.2 and conformance matrix §3.2.
+- ~~**Time filter in lite parquet**~~ — resolved: `result_time` is
+  already present in `lite` (as VARCHAR). §5.1 binding now shows the
+  DuckDB cast.
+
 ## Appendix A. Metadata model at a glance
 
 iSamples treats these as the core entity types (domain-agnostic):
@@ -332,6 +391,12 @@ PQG (the parquet property-graph binding) is specified in
 
 ## Appendix B. Related documents
 
+- [`pqg/docs/conformance_matrix.md`](https://github.com/isamplesorg/pqg/blob/main/docs/conformance_matrix.md)
+  — which shipped parquet files cover which QUERY_SPEC dimensions
+  (companion to this spec; informed every v0.2 amendment)
+- [`pqg/docs/SERIALIZATIONS.md`](https://github.com/isamplesorg/pqg/pull/143)
+  — the three canonical parquet formats (export / narrow / wide) and
+  how they round-trip
 - `QUERY_COMPARISON.md` — PQG traversal query alignment (Eric's Python
   vs. the Observable JS, Oct 2025)
 - `test_cesium_queries.js`, `test_python_js_alignment.py` — alignment

From da2a71362fb5c2131272fe2babf909bc0b2963fc Mon Sep 17 00:00:00 2001
From: Raymond Yee <raymond.yee@gmail.com>
Date: Fri, 24 Apr 2026 08:31:47 -0700
Subject: [PATCH 3/4] =?UTF-8?q?fix(query-spec):=20Codex=20review=20?=
 =?UTF-8?q?=E2=80=94=20h3=5Fsummary=20column=20names,=20SERIALIZATIONS=20l?=
 =?UTF-8?q?ink?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two issues from Codex review:

1. **§2.4 callout wrong about h3_summary schema**: the previous text
   said the summary tier files carry `h3_res4`, `h3_res6`, `h3_res8`.
   They don't — they ship `h3_cell` (UBIGINT) + `resolution` (INTEGER)
   and filter by resolution. Corrected the callout and the §5.1
   DuckDB binding row to show the actual form
   (`h3_cell IN (...) AND resolution = 6`).

2. **Appendix B wrong link target**: the SERIALIZATIONS.md reference
   pointed at `isamplesorg/pqg/pull/143`, but the catalog PR is
   `isamplesorg/isamplesorg.github.io#143`. Fixed.
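
The corrected H3 binding from item 1 can be sketched as follows. The
helper name `h3_filter`, the flavor strings, and the choice to pass
cells as integers are assumptions for illustration only, not taken from
the codebase:

```python
def h3_filter(flavor: str, resolution: int, cells: list[int]) -> str:
    """Render `h3 AT RES n IN (...)` for one parquet flavor (hypothetical helper)."""
    cell_list = ", ".join(str(int(c)) for c in cells)
    if flavor == "wide_h3":
        # wide_h3 ships one direct column per resolution:
        # h3_res4, h3_res6, h3_res8.
        if resolution not in (4, 6, 8):
            raise ValueError(f"wide_h3 has no h3_res{resolution} column")
        return f"h3_res{resolution} IN ({cell_list})"
    if flavor == "h3_summary":
        # Summary tier files ship h3_cell plus a resolution column instead.
        return f"h3_cell IN ({cell_list}) AND resolution = {resolution}"
    raise ValueError(f"no H3 binding sketched for flavor {flavor!r}")


# The cell values 1 and 2 are placeholders, not real H3 indexes.
print(h3_filter("wide_h3", 6, [1, 2]))
print(h3_filter("h3_summary", 6, [1, 2]))
```

Keeping the two forms behind one function makes the schema difference
explicit at the call site instead of relying on column-name guessing.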
---
 query-spec.qmd | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/query-spec.qmd b/query-spec.qmd
index acd1126..92bc92c 100644
--- a/query-spec.qmd
+++ b/query-spec.qmd
@@ -159,8 +159,10 @@ be materialized but 4/6/8 are load-bearing.
 ::: {.callout-important}
 **H3 column availability across shipped parquet files (v0.2)**:
 
-- `wide_h3` and the `h3_summary_res{4,6,8}` tier files carry
-  `h3_res4`, `h3_res6`, `h3_res8`.
+- `wide_h3` ships three direct columns: `h3_res4`, `h3_res6`, `h3_res8`.
+- `h3_summary_res{4,6,8}` tier files do NOT ship `h3_res{N}` columns —
+  they ship a single `h3_cell` (UBIGINT) plus a `resolution` (INTEGER)
+  column. Query them as `WHERE h3_cell = X AND resolution = N`.
 - `lite` carries `h3_res8` (and `h3_res8_hex`) only — not res4 / res6.
 - Plain `wide` and `narrow` do **not** carry H3 columns. To filter at
   res 4 or res 6, query `wide_h3` or the appropriate `h3_summary`
@@ -287,7 +289,7 @@ missing values gracefully.
 | `material IN (…)` | `pid IN (SELECT pid FROM sample_facets WHERE material IN (…))` |
 | `text MATCHES "q"` | `(label ILIKE '%q%' OR description ILIKE '%q%' OR place_name ILIKE '%q%')` — currently a subset of §3.2 |
 | `bbox WITHIN (…)` | `latitude BETWEEN … AND … AND longitude BETWEEN … AND …` |
-| `h3 AT RES 6 IN (…)` | `h3_res6 IN (…)` on `wide_h3` or `h3_summary_res6` (see §2.4 note) |
+| `h3 AT RES 6 IN (…)` | `h3_res6 IN (…)` on `wide_h3`; or `h3_cell IN (…) AND resolution = 6` on `h3_summary_res6` (see §2.4 note) |
 | `time BETWEEN …` | `TRY_CAST(result_time AS TIMESTAMP) BETWEEN t1 AND t2` — `result_time` ships as VARCHAR in `lite`, `wide`, and `narrow` |
 
 **Canonical data URL base**: `https://data.isamples.org/` (Cloudflare
@@ -394,7 +396,7 @@ PQG (the parquet property-graph binding) is specified in
 - [`pqg/docs/conformance_matrix.md`](https://github.com/isamplesorg/pqg/blob/main/docs/conformance_matrix.md)
   — which shipped parquet files cover which QUERY_SPEC dimensions
   (companion to this spec; informed every v0.2 amendment)
-- [`pqg/docs/SERIALIZATIONS.md`](https://github.com/isamplesorg/pqg/pull/143)
+- [`SERIALIZATIONS.md`](https://github.com/isamplesorg/isamplesorg.github.io/pull/143) (catalog of shipped parquet files, in `isamplesorg.github.io`)
   — the three canonical parquet formats (export / narrow / wide) and
   how they round-trip
 - `QUERY_COMPARISON.md` — PQG traversal query alignment (Eric's Python

From c962f4a096d56d95281bf60256ce0674240d0ad7 Mon Sep 17 00:00:00 2001
From: Raymond Yee <raymond.yee@gmail.com>
Date: Fri, 24 Apr 2026 08:54:37 -0700
Subject: [PATCH 4/4] fix(query-spec): source dimension column is 'n' on
 wide/narrow
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Codex round-2: §5.1 DuckDB binding claimed `source IN (…)` binds to
`source IN (…) on wide / lite parquet`. Wrong for wide — it uses `n`
(PQG convention), not `source`. The query as written fails with
"Referenced column source not found".

Updated the binding row to distinguish:
  wide / narrow: WHERE n IN (…)
  lite / sample_facets_v2: WHERE source IN (…) — alias already exposed
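
A rough sketch of the distinction above, assuming a hypothetical helper
(`source_filter`) and flavor strings chosen for illustration; only the
column names `n` and `source` come from this fix:

```python
# Column that carries the source dimension in each parquet flavor,
# as listed in this commit message.
SOURCE_COLUMN = {
    "wide": "n",
    "narrow": "n",
    "lite": "source",
    "sample_facets_v2": "source",
}


def source_filter(flavor: str, sources: list[str]) -> str:
    """Render `source IN (...)` for one parquet flavor (hypothetical helper)."""
    column = SOURCE_COLUMN[flavor]
    # Double any single quotes so values stay valid SQL string literals.
    quoted = ", ".join("'" + s.replace("'", "''") + "'" for s in sources)
    return f"{column} IN ({quoted})"


print(source_filter("wide", ["SESAR"]))
print(source_filter("lite", ["SESAR", "GEOME"]))
```

An unknown flavor raises `KeyError`, which surfaces a missing mapping
early instead of producing a query that fails on a missing column.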
---
 query-spec.qmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/query-spec.qmd b/query-spec.qmd
index 92bc92c..675f248 100644
--- a/query-spec.qmd
+++ b/query-spec.qmd
@@ -285,7 +285,7 @@ missing values gracefully.
 
 | Spec | Binding |
 |---|---|
-| `source IN (…)` | `source IN (…)` on wide / lite parquet |
+| `source IN (…)` | `n IN (…)` on wide / narrow (column is `n` per PQG); `source IN (…)` on lite / sample_facets_v2 (alias exposed) |
 | `material IN (…)` | `pid IN (SELECT pid FROM sample_facets WHERE material IN (…))` |
 | `text MATCHES "q"` | `(label ILIKE '%q%' OR description ILIKE '%q%' OR place_name ILIKE '%q%')` — currently a subset of §3.2 |
 | `bbox WITHIN (…)` | `latitude BETWEEN … AND … AND longitude BETWEEN … AND …` |