
feat(generators): add --default-language flag for language-tagged literals #9

Open
jdsika wants to merge 1 commit into develop from feat/generators-default-language

jdsika commented Apr 25, 2026

Summary

Adds a --default-language CLI option to both gen-owl and gen-shacl that emits BCP 47 language-tagged string literals (e.g. "Person"@en) for human-readable annotations.

This enables ontology producers to comply with RDF 1.1 §3.3 (language-tagged strings as rdf:langString) and OWL 2 §6.3 (annotation property values) without manual post-processing.

Problem

LinkML generators currently emit all string literals as plain xsd:string values, even for human-readable annotations like rdfs:label, rdfs:comment, sh:name, and sh:description. This prevents downstream consumers from:

  • Filtering labels by language in SPARQL (FILTER(lang(?label) = "en"))
  • Supporting multilingual ontologies
  • Complying with W3C best practices for language-tagged metadata
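To make the first point concrete, here is a minimal Python model (not rdflib and not a SPARQL engine) of how `FILTER(lang(?label) = "en")` behaves. SPARQL's `lang()` returns a literal's language tag, or the empty string when there is none, so untagged labels are silently dropped by such a filter. `Lit`, `sparql_lang`, and `labels_in` are hypothetical names used only for illustration, and the French label stands in for data from some multilingual ontology.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Lit:
    """Hypothetical stand-in for an RDF literal: lexical form plus optional language tag."""
    value: str
    lang: Optional[str] = None


def sparql_lang(lit: Lit) -> str:
    """Mimic SPARQL lang(): the language tag, or "" for an untagged literal."""
    return lit.lang or ""


def labels_in(labels: list[Lit], tag: str) -> list[str]:
    """Mimic FILTER(lang(?label) = tag) over a set of rdfs:label values."""
    return [lit.value for lit in labels if sparql_lang(lit) == tag]


# Untagged output: the language filter cannot match anything.
untagged = [Lit("Person")]
# Tagged output, plus a hypothetical French label from a multilingual ontology.
tagged = [Lit("Person", "en"), Lit("Personne", "fr")]

print(labels_in(untagged, "en"))  # []
print(labels_in(tagged, "en"))    # ['Person']
```

Plain `=` compares tag strings exactly; in real SPARQL, `langMatches(lang(?label), "en")` is the case-insensitive alternative.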

The LinkML metamodel already has an in_language metaslot, but no generator uses it.

Changes

gen-owl (owlgen.py)

  • New default_language field on OwlSchemaGenerator
  • _LANGUAGE_TAGGABLE_RANGES frozenset (string, ncname) guards tagging — technical types (URI, integer, boolean, datetime) are never tagged
  • _resolve_language() resolves the tag in precedence order: element-level in_language → generator-level default_language → None
  • _literal() helper creates properly tagged Literal objects
  • add_metadata() tags string-range and fallback-range annotation literals
  • add_enum() permissible-value (PV) labels respect language tags; the constraint values in owl:oneOf are deliberately left untagged
  • New --default-language Click CLI option
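The precedence chain and range guard above can be sketched in plain Python. This is an illustrative model, not the code in owlgen.py: `TaggedLiteral`, `resolve_language`, and `make_literal` are hypothetical names, and the actual helpers build rdflib `Literal` objects rather than this small dataclass. The frozenset mirrors the `string`/`ncname` set the PR describes.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Mirrors the PR's _LANGUAGE_TAGGABLE_RANGES: only these LinkML ranges may carry a tag.
LANGUAGE_TAGGABLE_RANGES = frozenset({"string", "ncname"})


@dataclass(frozen=True)
class TaggedLiteral:
    """Hypothetical stand-in for rdflib.Literal(value, lang=...)."""
    value: str
    lang: Optional[str] = None


def resolve_language(in_language: Optional[str], default_language: Optional[str]) -> Optional[str]:
    """Element-level in_language wins, then the generator-level default, else None."""
    return in_language or default_language or None


def make_literal(value: str, range_name: str, lang: Optional[str]) -> TaggedLiteral:
    """Tag text-like ranges only; URIs, numbers, booleans and dates stay untagged."""
    if lang and range_name in LANGUAGE_TAGGABLE_RANGES:
        return TaggedLiteral(value, lang)
    return TaggedLiteral(value)


lang = resolve_language(None, "en")  # no element-level override, so the default applies
print(make_literal("Person", "string", lang))
# TaggedLiteral(value='Person', lang='en')
print(make_literal("https://example.org/p", "uri", lang))
# TaggedLiteral(value='https://example.org/p', lang=None)
print(resolve_language("de", "en"))  # de
```

Keeping the guard as an allow-list of ranges means any new technical type stays untagged unless it is explicitly added.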

gen-shacl (shaclgen.py)

  • New default_language field with __post_init__ whitespace normalisation
  • NodeShape rdfs:label / rdfs:comment get language tags
  • PropertyShape sh:name / sh:description get language tags via prop_pv_text()
  • _add_annotations() tags string annotation values
  • Numeric literals (sh:order, sh:minCount, etc.) are never tagged
  • New --default-language Click CLI option
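A small sketch of the SHACL-side rule: human-readable predicates get the tag, while numeric constraint values never do. It renders Turtle object terms as strings so it runs without rdflib; `shacl_object` and `TEXT_PREDICATES` are hypothetical names chosen for illustration, not the generator's API. In this sketch, other string-valued constraints such as sh:pattern also stay untagged, which matches SHACL's requirement that sh:pattern values be xsd:string literals.

```python
from __future__ import annotations

from typing import Optional, Union

# Human-readable predicates that, per the PR, receive language tags.
TEXT_PREDICATES = frozenset({"rdfs:label", "rdfs:comment", "sh:name", "sh:description"})


def shacl_object(predicate: str, value: Union[str, int], lang: Optional[str]) -> str:
    """Render the object of a shape triple as a Turtle term."""
    if isinstance(value, int):
        # sh:order, sh:minCount, sh:maxCount, ...: bare xsd:integer literals, never tagged.
        return str(value)
    # Escape the characters that are not allowed raw inside a "..." Turtle string.
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    if lang and predicate in TEXT_PREDICATES:
        return f'"{escaped}"@{lang}'
    return f'"{escaped}"'


print(shacl_object("sh:name", "age", "en"))          # "age"@en
print(shacl_object("sh:minCount", 1, "en"))          # 1
print(shacl_object("sh:pattern", "^[0-9]+$", "en"))  # "^[0-9]+$"
```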

Tests

  • 7 new OWL tests: tagged labels, backward-compat plain literals, URI ranges, in_language override, annotations, empty string, whitespace-only
  • 7 new SHACL tests: NodeShape, PropertyShape, plain literals, numeric guards, annotations, empty string, whitespace-only

Backward compatibility

  • Default is None (no language tags) — existing behaviour is completely unchanged
  • Empty strings and whitespace-only values are normalised to None
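The normalisation rule can be pictured as a dataclass `__post_init__`, the mechanism the PR names for gen-shacl. `GeneratorOptions` is a hypothetical class used only to show the behaviour: an empty or whitespace-only value collapses to None, so passing an empty `--default-language` produces the same output as omitting the flag. Trimming surrounding whitespace from a non-empty tag (" en " becoming "en") is an assumption of this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class GeneratorOptions:
    """Hypothetical options holder; only the default_language handling matters here."""
    default_language: Optional[str] = None

    def __post_init__(self) -> None:
        if self.default_language is not None:
            # Assumption: surrounding whitespace is trimmed before the emptiness check.
            stripped = self.default_language.strip()
            self.default_language = stripped or None


print(GeneratorOptions().default_language)        # None
print(GeneratorOptions("   ").default_language)   # None
print(GeneratorOptions(" en ").default_language)  # en
```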

Standards compliance

| Standard | Requirement | Status |
|---|---|---|
| RDF 1.1 §3.3 | rdf:langString vs xsd:string distinction | |
| OWL 2 §6.3 | Annotation properties accept rdf:langString | |
| SHACL §2.3.2.1 | sh:name / sh:description range includes rdf:langString | |
| BCP 47 (RFC 5646) | Language tag format | ✅ (no pre-validation, consistent with rdflib) |
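The RDF 1.1 distinction in the first row is visible in N-Triples. A language-tagged literal has the implicit datatype rdf:langString, while a literal with neither tag nor explicit datatype is an xsd:string, so `"Person"@en` and `"Person"` are different RDF terms. The helper below is a hypothetical illustration, not part of the PR; it returns both the N-Triples form and the datatype IRI for each case.

```python
from __future__ import annotations

from typing import Optional, Tuple

RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def ntriples_literal(value: str, lang: Optional[str] = None) -> Tuple[str, str]:
    """Return (N-Triples term, datatype IRI) for a string literal."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    if lang:
        # A language tag implies the datatype rdf:langString.
        return f'"{escaped}"@{lang}', RDF_LANGSTRING
    # RDF 1.1: a literal with no tag and no explicit datatype is an xsd:string.
    return f'"{escaped}"', XSD_STRING


for term, datatype in (ntriples_literal("Person", "en"), ntriples_literal("Person")):
    print(term, datatype)
```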

Testing

  • 113 tests pass (91 owlgen + 22 shaclgen), 4 skipped
  • 3 rounds of adversarial review completed with 0 open bugs

jdsika force-pushed the feat/generators-default-language branch 3 times, most recently from 06ac9fc to 97b9619, on April 27, 2026 06:54
…erals

Add a `--default-language` CLI option to both gen-owl and gen-shacl that
emits BCP 47 language-tagged string literals for human-readable annotations.

gen-owl changes:
- New `default_language` field on OwlSchemaGenerator
- `_LANGUAGE_TAGGABLE_RANGES` frozenset (string, ncname) guards tagging
- `_resolve_language()` checks element-level in_language first, then default
- `_literal()` helper creates properly tagged Literal objects
- `add_metadata()` tags string-range and fallback-range literals
- `add_enum()` PV labels respect language tags
- New `--default-language` Click option

gen-shacl changes:
- New `default_language` field on ShaclGenerator
- NodeShape rdfs:label / rdfs:comment get language tags
- PropertyShape sh:name / sh:description get language tags via prop_pv_text()
- Numeric literals (sh:order, sh:minCount, etc.) are never tagged
- New `--default-language` Click option

Tests:
- 3 new OWL tests: tagged labels, backward-compat plain literals, URI ranges
- 4 new SHACL tests: NodeShape, PropertyShape, plain literals, numeric guard

Signed-off-by: Carlo van Driesten <carlo.van-driesten@bmw.de>
jdsika force-pushed the feat/generators-default-language branch from 97b9619 to f54897c on April 27, 2026