feat(generators): add --default-language flag for language-tagged literals #9
Open
…erals

Add a `--default-language` CLI option to both gen-owl and gen-shacl that emits BCP 47 language-tagged string literals for human-readable annotations.

gen-owl changes:
- New `default_language` field on OwlSchemaGenerator
- `_LANGUAGE_TAGGABLE_RANGES` frozenset (string, ncname) guards tagging
- `_resolve_language()` checks element-level in_language first, then default
- `_literal()` helper creates properly tagged Literal objects
- `add_metadata()` tags string-range and fallback-range literals
- `add_enum()` PV labels respect language tags
- New `--default-language` Click option

gen-shacl changes:
- New `default_language` field on ShaclGenerator
- NodeShape rdfs:label / rdfs:comment get language tags
- PropertyShape sh:name / sh:description get language tags via prop_pv_text()
- Numeric literals (sh:order, sh:minCount, etc.) are never tagged
- New `--default-language` Click option

Tests:
- 3 new OWL tests: tagged labels, backward-compat plain literals, URI ranges
- 4 new SHACL tests: NodeShape, PropertyShape, plain literals, numeric guard

Signed-off-by: Carlo van Driesten <carlo.van-driesten@bmw.de>
Summary

Adds a `--default-language` CLI option to both `gen-owl` and `gen-shacl` that emits BCP 47 language-tagged string literals (e.g. `"Person"@en`) for human-readable annotations.

This enables ontology producers to comply with RDF 1.1 §3.3 (language-tagged strings as `rdf:langString`) and OWL 2 §6.3 (annotation property values) without manual post-processing.

Problem
LinkML generators currently emit all string literals as plain `xsd:string` values, even for human-readable annotations like `rdfs:label`, `rdfs:comment`, `sh:name`, and `sh:description`. This prevents downstream consumers from, among other things, filtering labels by language in SPARQL (`FILTER(lang(?label) = "en")`).

The LinkML metamodel already has an `in_language` metaslot, but no generator uses it.

Changes
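The tagging rules described in this section can be sketched in a few lines of plain Python. This is a minimal illustration, not the code in `owlgen.py` or `shaclgen.py`: the names `TAGGABLE_RANGES`, `resolve_language`, and `render_literal` are hypothetical, and literals are rendered as Turtle-style strings rather than rdflib `Literal` objects (quote escaping is ignored for brevity).

```python
from typing import Optional

# Hypothetical stand-in for the (string, ncname) set named in the PR.
TAGGABLE_RANGES = frozenset({"string", "ncname"})


def resolve_language(in_language: Optional[str], default_language: Optional[str]) -> Optional[str]:
    """Element-level in_language wins, then the generator default, else None."""
    return in_language or default_language or None


def render_literal(value: object, range_name: str, lang: Optional[str]) -> str:
    """Render a Turtle-style literal, tagging only text-like ranges."""
    if range_name in TAGGABLE_RANGES:
        text = f'"{value}"'
        return f"{text}@{lang}" if lang else text
    # Technical ranges (integer, boolean, ...) are never language-tagged;
    # they are rendered bare here to keep the sketch short.
    return str(value)


print(render_literal("Person", "string", resolve_language(None, "en")))   # "Person"@en
print(render_literal("Person", "string", resolve_language("de", "en")))   # "Person"@de
print(render_literal("Person", "string", resolve_language(None, None)))   # "Person"
print(render_literal(3, "integer", resolve_language(None, "en")))         # 3
```

The third call shows the untagged case: with neither `in_language` nor a default set, the output is a plain literal.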
gen-owl (`owlgen.py`)
- New `default_language` field on `OwlSchemaGenerator`
- `_LANGUAGE_TAGGABLE_RANGES` frozenset (`string`, `ncname`) guards tagging; technical types (URI, integer, boolean, datetime) are never tagged
- `_resolve_language()` resolves element-level `in_language` → generator-level `default_language` → `None`
- `_literal()` helper creates properly tagged `Literal` objects
- `add_metadata()` tags string-range and fallback-range annotation literals
- `add_enum()` PV labels respect language tags (constraint values in `owl:oneOf` are correctly NOT tagged)
- New `--default-language` Click CLI option

gen-shacl (`shaclgen.py`)
- New `default_language` field with `__post_init__` whitespace normalisation
- NodeShape `rdfs:label` / `rdfs:comment` get language tags
- PropertyShape `sh:name` / `sh:description` get language tags via `prop_pv_text()`
- `_add_annotations()` tags string annotation values
- Numeric literals (`sh:order`, `sh:minCount`, etc.) are never tagged
- New `--default-language` Click CLI option

Tests
Covered cases include `in_language` override, annotations, empty string, and whitespace-only input.

Backward compatibility
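As a sketch of how a `None` default and the `__post_init__` whitespace normalisation listed under Changes could fit together, the hypothetical dataclass below strips the configured value and maps empty input to `None`. The class name `GeneratorOptions` and the exact normalisation rule are assumptions; the PR text does not spell them out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GeneratorOptions:
    """Hypothetical stand-in for a generator's option fields."""

    # None means "emit plain literals", i.e. the pre-existing behaviour.
    default_language: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumed rule: empty or whitespace-only input counts as "not set".
        if self.default_language is not None:
            self.default_language = self.default_language.strip() or None


print(GeneratorOptions().default_language)        # None
print(GeneratorOptions("   ").default_language)   # None
print(GeneratorOptions(" en ").default_language)  # en
```

Under this reading, a caller that never passes the option, or passes only whitespace, ends up on the same untagged code path as before.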
- The default is `None` (no language tags); existing behaviour is completely unchanged
- … `None`

Standards compliance
- `rdf:langString` vs `xsd:string` distinction
- … `rdf:langString`
- `sh:name` / `sh:description` range includes `rdf:langString`

Testing