Node-grounding pass: ground 39 causal-graph nodes via kg-microbe ontology index #104
Merged
…logy index

Matches residual causal-node labels against the kg-microbe merged-kg node index (CHEBI/GO/ENVO/PATO) by exact case-insensitive name/synonym, constrained to the prefix/category appropriate for each node_type, then runs ground_causal_nodes.py --apply.

Node grounding rises 946 -> 985 / 1643 (57.6% -> 60.0%).

31 new mappings added to mappings/node_grounding.tsv:
- CHEMICAL -> CHEBI: organic molecule, staphyloxanthin, bacteriochlorophyll, homogentisic acid, crystal violet, prodigiosin, dipicolinic acid, hydroxylamine, autoinducer, nitrogen oxides, adp, nadph, lipopolysaccharide, elemental sulfur, acetyl phosphate, coenzyme f430
- BIOLOGICAL_PROCESS/PATHWAY -> GO: cellular growth, melanin biosynthesis, tyrosine catabolism, cytokinesis, pigment biosynthesis, nutrient sensing, sulfur oxidation, formaldehyde assimilation, ribulose monophosphate cycle
- CELLULAR_LOCALIZATION -> GO-CC: flagellar filament, flagellar hook, forespore
- ENVIRONMENTAL_FACTOR -> PATO/ENVO: high temperature, high osmolarity, cold environment

Each row carries the skos predicate_id strength (exactMatch / closeMatch).

Rejected during review (not grounded):
- nutrient uptake (obsolete GO term)
- spore coat -> spore wall (distinct bacterial structures)
- plant tissue colonization and intracellular membrane (semantic mismatch)
- charge/role-ambiguous chemicals (electron acceptor, pyocyanin)

Residual: 658 node instances / 561 (label, type) keys remain. These are a long tail of idiosyncratic descriptive phrases (precursor metabolites, lateral cell-wall elongation, compatible-solute transport) and GENE_OR_PROTEIN labels needing UniProt lookups. Exact matching keeps precision high; raising recall would need label normalization or fuzzy matching (higher risk, deferred).

validate-strict: 477 files, 0 errors. Grounding is idempotent.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
What
Continues the grounding pipeline on the node side. Matches residual causal-node labels against the kg-microbe merged-kg node index (CHEBI/GO/ENVO/PATO) by exact case-insensitive name/synonym, constrained to the prefix/category appropriate for each node_type so types can't cross-map, then runs the write-safeguarded ground_causal_nodes.py --apply.

Node grounding: 946 → 985 / 1643 (57.6% → 60.0%).
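
The matching rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in ground_causal_nodes.py: the ALLOWED_PREFIXES table, the build_index and ground helpers, the (curie, name, synonyms) tuple layout, and the placeholder CURIEs are all assumptions made for the example, and it filters by prefix only (it does not model the GO cellular-component category check).

```python
from collections import defaultdict

# Hypothetical node_type -> allowed CURIE prefixes table (an assumption for this sketch).
ALLOWED_PREFIXES = {
    "CHEMICAL": ("CHEBI",),
    "BIOLOGICAL_PROCESS": ("GO",),
    "PATHWAY": ("GO",),
    "CELLULAR_LOCALIZATION": ("GO",),
    "ENVIRONMENTAL_FACTOR": ("PATO", "ENVO"),
}


def build_index(nodes):
    """Map each case-folded name or synonym to {curie: 'name' | 'synonym'}."""
    index = defaultdict(dict)
    for curie, name, synonyms in nodes:
        index[name.strip().casefold()][curie] = "name"
        for syn in synonyms:
            # A primary-name hit for the same CURIE takes precedence over a synonym hit.
            index[syn.strip().casefold()].setdefault(curie, "synonym")
    return index


def ground(label, node_type, index):
    """Return (curie, matched_on) for a unique type-compatible exact match, else None."""
    prefixes = ALLOWED_PREFIXES.get(node_type, ())
    hits = {
        curie: matched_on
        for curie, matched_on in index.get(label.strip().casefold(), {}).items()
        if curie.split(":", 1)[0] in prefixes
    }
    # Zero or several candidates: leave the node ungrounded rather than guess.
    if len(hits) != 1:
        return None
    return next(iter(hits.items()))


# Placeholder CURIEs, not real ontology identifiers.
TOY_NODES = [
    ("CHEBI:EX1", "ADP", ["adenosine diphosphate"]),
    ("GO:EX2", "cytokinesis", []),
    ("GO:EX3", "adp", []),  # contrived same-name node under another prefix
]

index = build_index(TOY_NODES)
print(ground("adp", "CHEMICAL", index))
print(ground("cytokinesis", "CHEMICAL", index))
```

In this sketch the CHEMICAL lookup for "adp" resolves to the CHEBI placeholder even though a GO-prefixed node shares the name, while "cytokinesis" typed as CHEMICAL stays ungrounded; that is the "types can't cross-map" behavior in miniature.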
31 new mappings (mappings/node_grounding.tsv)

Each row carries the skos predicate_id strength (exactMatch / closeMatch).

Rejected during review (not grounded)
- nutrient uptake → GO:0009935 (obsolete GO term)
- spore coat → GO:0031160 "spore wall" (distinct bacterial structures: coat ≠ wall)
- plant tissue colonization, intracellular membrane (semantic mismatch / too loose)
- electron acceptor, pyocyanin (charge/role-ambiguous chemicals)

Residual (deferred)
658 node instances / 561 (label, type) keys remain. These are a long tail of idiosyncratic descriptive phrases (precursor metabolites, lateral cell-wall elongation, compatible-solute transport) plus GENE_OR_PROTEIN labels needing UniProt lookups. Exact matching keeps precision high; raising recall would require label normalization or fuzzy matching (higher risk).

Verification
just validate-strict: 477 files, 0 errors.

🤖 Generated with Claude Code