fix: add W3C/OGC standard prefixes to linked_data curated context #81

Open
jdsika wants to merge 1 commit into linkml:main from ASCS-eV:fix/add-w3c-semweb-prefixes

jdsika commented Apr 29, 2026

Summary

Continuation of PR #71 / issue #70 — adds 5 more W3C/OGC standard semantic-web prefixes to linked_data.curated.yaml to prevent bioregistry.upper from overriding them with uppercase or non-standard names in the merged context.

Problem

When the merged context is built (merge order: obo → go → linked_data → bioregistry.upper → prefixcc), several widely-used W3C/OGC semantic-web prefixes get incorrect canonical names because bioregistry.upper either:

  1. Uppercases them (ODRL, TIME, WGS84) — problematic because rdflib prefix bindings are case-sensitive
  2. Renames them (dctypes instead of dcmitype)
  3. Omits them entirely (xml)

PR #71 fixed rdf and rdfs using the established pattern of adding them to linked_data.curated.yaml. This PR extends the same approach to the remaining 5 affected prefixes.
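
The effect of merge order can be illustrated with a small, simplified model. This is a sketch only: it assumes a "first context wins" rule with case-insensitive comparison of prefixes and namespaces, which may differ in detail from the actual prefixmaps merge logic. The function `merge_contexts` and the toy context contents are hypothetical and exist only for illustration.

```python
from typing import Dict, Iterable, List, Tuple

# A context is modelled as an ordered list of (prefix, namespace) pairs.
Context = List[Tuple[str, str]]


def merge_contexts(contexts: Iterable[Context]) -> Dict[str, str]:
    """Return canonical prefix -> namespace; earlier contexts take priority.

    Simplified assumption: an entry is skipped (treated as an alias) when its
    prefix or its namespace was already claimed, compared case-insensitively.
    """
    canonical: Dict[str, str] = {}
    seen_prefixes = set()
    seen_namespaces = set()
    for context in contexts:
        for prefix, namespace in context:
            if prefix.lower() in seen_prefixes or namespace.lower() in seen_namespaces:
                continue  # already claimed by a higher-priority context
            canonical[prefix] = namespace
            seen_prefixes.add(prefix.lower())
            seen_namespaces.add(namespace.lower())
    return canonical


ODRL_NS = "http://www.w3.org/ns/odrl/2/"
DCMITYPE_NS = "http://purl.org/dc/dcmitype/"

# Toy stand-ins for the real contexts (contents are illustrative only).
bioregistry_upper: Context = [("ODRL", ODRL_NS), ("dctypes", DCMITYPE_NS)]
linked_data_before: Context = []
linked_data_after: Context = [("odrl", ODRL_NS), ("dcmitype", DCMITYPE_NS)]

before = merge_contexts([linked_data_before, bioregistry_upper])
after = merge_contexts([linked_data_after, bioregistry_upper])
```

In this model, `before` contains `ODRL` and `dctypes`, while `after` contains `odrl` and `dcmitype`, because the curated entries are seen first and the later entries for the same namespaces are dropped as aliases.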

What this PR changes

New entries in linked_data.curated.yaml / linked_data.csv:

| Prefix   | Namespace                                | Without this fix   | With this fix |
|----------|------------------------------------------|--------------------|---------------|
| odrl     | http://www.w3.org/ns/odrl/2/             | ODRL (uppercase)   | odrl          |
| time     | http://www.w3.org/2006/time#             | TIME (uppercase)   | time          |
| dcmitype | http://purl.org/dc/dcmitype/             | dctypes (wrong name) | dcmitype    |
| wgs      | http://www.w3.org/2003/01/geo/wgs84_pos# | WGS84 (uppercase)  | wgs           |
| xml      | http://www.w3.org/XML/1998/namespace     | (missing entirely) | xml           |

New regression tests:

  • test_w3c_semweb_prefixes_in_linked_data — verifies the 5 prefixes resolve correctly in the linked_data context
  • test_w3c_semweb_prefixes_in_dyn_merged — verifies they survive dynamic merge (load_multi_context) with correct lowercase canonical forms
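
A check of this kind might look roughly like the sketch below. It is not the code from this PR: the helper `find_prefix_problems` and the constant name are hypothetical, and the check works on a plain prefix-to-namespace dictionary so that it stays self-contained. The expected values are taken from the table above.

```python
from typing import Dict, List

# Expected canonical prefixes and namespaces, as listed in this PR.
EXPECTED_W3C_PREFIXES: Dict[str, str] = {
    "odrl": "http://www.w3.org/ns/odrl/2/",
    "time": "http://www.w3.org/2006/time#",
    "dcmitype": "http://purl.org/dc/dcmitype/",
    "wgs": "http://www.w3.org/2003/01/geo/wgs84_pos#",
    "xml": "http://www.w3.org/XML/1998/namespace",
}


def find_prefix_problems(prefix_map: Dict[str, str]) -> List[str]:
    """List every expected prefix that is missing or maps to another namespace.

    The lookup is exact and case-sensitive, so a map that only contains
    "ODRL" still reports "odrl" as missing.
    """
    problems: List[str] = []
    for prefix, namespace in EXPECTED_W3C_PREFIXES.items():
        actual = prefix_map.get(prefix)
        if actual is None:
            problems.append(f"{prefix}: missing")
        elif actual != namespace:
            problems.append(f"{prefix}: expected {namespace}, got {actual}")
    return problems
```

In a real test, the dictionary would presumably be obtained from prefixmaps, for example from the context returned by `load_multi_context`; that step is an assumption and is not exercised here. The test would then assert that `find_prefix_problems(...)` returns an empty list.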

Evidence for canonical lowercase forms

Each prefix's canonical form is documented by its defining W3C/OGC specification:

  • odrl – W3C ODRL Vocabulary §2 uses @prefix odrl: <http://www.w3.org/ns/odrl/2/> throughout the spec. Bioregistry itself stores preferred_prefix: odrl (lowercase).
  • time – W3C OWL-Time §2 states: "The suggested prefix for the OWL-Time namespace is time."
  • dcmitype – rdflib 7.x ships dcmitype as its built-in prefix for http://purl.org/dc/dcmitype/. prefix.cc/dcmitype confirms this. Bioregistry uses the non-standard name dctypes.
  • wgs – The W3C Basic Geo Vocabulary uses xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" in its own examples. rdflib 7.x ships wgs as the built-in prefix.
  • xml – http://www.w3.org/XML/1998/namespace is one of only two namespaces reserved by the XML specification itself. rdflib 7.x binds it as xml by default. It was absent from all prefixmaps contexts.

Cross-reference with rdflib 7.x:

All 5 prefixes match the default bindings from rdflib.Graph().namespaces() (rdflib 7.6.0). Of the 29 rdflib built-in prefixes, 21 already matched the merged context. After PR #71 (rdf, rdfs) and this PR (+5), only 1 discrepancy remains: geo (see below).
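
A comparison like this could be scripted along the following lines. This is a minimal sketch, not the procedure actually used for the numbers above: the helper `binding_discrepancies` is hypothetical, and the sample data is a hand-written subset for illustration. It accepts any iterable of (prefix, namespace) pairs, which is the shape that `rdflib.Graph().namespaces()` yields.

```python
from typing import Dict, Iterable, Tuple


def binding_discrepancies(
    bindings: Iterable[Tuple[str, object]], context: Dict[str, str]
) -> Dict[str, str]:
    """Describe, per reference prefix, how `context` disagrees with it."""
    prefix_by_namespace = {ns: prefix for prefix, ns in context.items()}
    report: Dict[str, str] = {}
    for prefix, namespace in bindings:
        namespace = str(namespace)  # rdflib yields URIRef objects
        if context.get(prefix) == namespace:
            continue  # exact, case-sensitive match
        other = prefix_by_namespace.get(namespace)
        if other is not None:
            report[prefix] = f"bound to {other!r} instead"
        else:
            report[prefix] = "namespace missing"
    return report


# Hand-written sample data (illustrative subset, not the real contexts).
reference = [
    ("time", "http://www.w3.org/2006/time#"),
    ("xml", "http://www.w3.org/XML/1998/namespace"),
    ("owl", "http://www.w3.org/2002/07/owl#"),
]
merged_without_fix = {
    "TIME": "http://www.w3.org/2006/time#",
    "owl": "http://www.w3.org/2002/07/owl#",
}
report = binding_discrepancies(reference, merged_without_fix)
```

With rdflib installed, `reference` could be replaced by `rdflib.Graph().namespaces()`, and `merged_without_fix` by the prefix-to-namespace dictionary of the merged context.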

Known limitation: geo (GeoSPARQL)

The OGC GeoSPARQL 1.1 spec states "The suggested prefix for this namespace is geo" (for http://www.opengis.net/ont/geosparql#), and rdflib 7.x agrees. However, this cannot be fixed via linked_data because the obo context (which has higher merge priority) already claims GEO for Gene Expression Omnibus. This is a known architectural conflict documented in EXPECTED_OBO in the test suite.

Action needed from maintainers

merged.csv is not regenerated in this commit, because doing so would require running the full ETL pipeline, which depends on bioregistry. Please run make etl or trigger the refresh workflow to propagate the linked_data changes into the merged context CSV.

Closes #70

Continuation of PR linkml#71 and issue linkml#70.

When the `merged` context is built, `bioregistry.upper` promotes
several well-known W3C/OGC semantic-web prefix names to UPPERCASE
(ODRL, TIME, WGS84) or assigns non-standard names (`dctypes` instead
of `dcmitype`).  The `xml` prefix has no entry at all.

PR linkml#71 fixed this for `rdf` and `rdfs` by adding them to
`linked_data.curated.yaml` (which has higher merge priority than
`bioregistry.upper`).  This commit extends the same approach to the
remaining affected prefixes:

| Prefix    | Without fix       | Authoritative source            |
|-----------|-------------------|---------------------------------|
| odrl      | ODRL (uppercase)  | W3C ODRL Vocabulary §2          |
| time      | TIME (uppercase)  | W3C OWL-Time §2                 |
| dcmitype  | dctypes (renamed) | rdflib 7.x built-in namespace   |
| wgs       | WGS84 (uppercase) | W3C Basic Geo Vocabulary xmlns  |
| xml       | (missing)         | W3C XML Namespace               |

Evidence for the canonical lowercase forms:

* **ODRL** – W3C Recommendation (https://www.w3.org/TR/odrl-vocab/)
  uses `@prefix odrl: <http://www.w3.org/ns/odrl/2/>` throughout.
  Bioregistry itself stores `preferred_prefix: odrl` (lowercase).
* **OWL-Time** – W3C Recommendation (https://www.w3.org/TR/owl-time/)
  states "The suggested prefix for the OWL-Time namespace is `time`."
* **dcmitype** – rdflib 7.x ships `dcmitype` as the built-in prefix
  for `http://purl.org/dc/dcmitype/`.  prefix.cc also uses
  `dcmitype`.  Bioregistry uses the non-standard name `dctypes`.
* **wgs** – The W3C Basic Geo Vocabulary page
  (https://www.w3.org/2003/01/geo/) uses
  `xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"` in its own
  examples; rdflib 7.x ships `wgs` as the built-in prefix.
* **xml** – `http://www.w3.org/XML/1998/namespace` is one of only two
  namespaces reserved by the XML specification itself.  rdflib 7.x
  binds it as `xml` by default.  It was absent from all prefixmaps
  contexts.

Note: `geo` (GeoSPARQL, `http://www.opengis.net/ont/geosparql#`)
is also affected (no canonical entry in merged), but cannot be fixed
here because the `obo` context already claims the `GEO` prefix for
Gene Expression Omnibus and has higher merge priority.  This is tracked
as a known conflict.

Note: `merged.csv` is not regenerated in this commit.  Please run
`make etl` (or the `refresh` workflow) to propagate the linked_data
changes into the merged context CSV.

Closes linkml#70

Signed-off-by: Carlo van Driesten <carlo.van-driesten@bmw.de>

matentzn left a comment

I personally like all of these changes. I will assign someone from the core team to contemplate consequences.

BTW, no action needed but generally this is where we try to consolidate prefixes: https://semantic.farm/

Development

Successfully merging this pull request may close these issues.

bioregistry changes have broken the merged prefix map
