Node-grounding pass: ground 39 causal-graph nodes via kg-microbe ontology index #104

Merged
realmarcin merged 1 commit into main from claude/node-grounding-pass on Jun 14, 2026

@realmarcin

What

Continues the grounding pipeline on the node side. Matches residual causal-node labels against the kg-microbe merged-kg node index (CHEBI/GO/ENVO/PATO) by exact case-insensitive name/synonym — constrained to the prefix/category appropriate for each node_type so types can't cross-map — then runs the write-safeguarded ground_causal_nodes.py --apply.

Node grounding: 946 → 985 / 1643 (57.6% → 60.0%).
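
The matching rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in ground_causal_nodes.py: the `ALLOWED_PREFIXES` table, the `build_index` and `ground` helpers, and the record layout are all assumptions, and the CURIEs are placeholders rather than real ontology IDs. It only enforces the prefix constraint; a category check (for example GO-CC versus other GO branches) would need an extra field on each record.

```python
from collections import defaultdict

# Hypothetical node_type -> allowed CURIE prefixes (assumed, not from the repo).
ALLOWED_PREFIXES = {
    "CHEMICAL": ("CHEBI:",),
    "BIOLOGICAL_PROCESS": ("GO:",),
    "PATHWAY": ("GO:",),
    "CELLULAR_LOCALIZATION": ("GO:",),
    "ENVIRONMENTAL_FACTOR": ("PATO:", "ENVO:"),
}


def build_index(records):
    """Map each case-folded name or synonym to the set of CURIEs carrying it."""
    index = defaultdict(set)
    for curie, name, synonyms in records:
        for text in (name, *synonyms):
            index[text.strip().casefold()].add(curie)
    return index


def ground(label, node_type, index):
    """Return the single allowed CURIE for label, or None if absent or ambiguous."""
    prefixes = ALLOWED_PREFIXES.get(node_type, ())
    hits = {
        curie
        for curie in index.get(label.strip().casefold(), ())
        if curie.startswith(prefixes)
    }
    return next(iter(hits)) if len(hits) == 1 else None


# Placeholder records: (curie, primary name, synonyms).
records = [
    ("CHEBI:TEST1", "prodigiosin", ["Prodigiosine"]),
    ("GO:TEST1", "cytokinesis", []),
]
index = build_index(records)
chemical_hit = ground("Prodigiosin", "CHEMICAL", index)
cross_type_hit = ground("cytokinesis", "CHEMICAL", index)
```

Returning `None` on more than one hit is a deliberate choice in this sketch: an ambiguous label is left ungrounded rather than resolved arbitrarily.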

31 new mappings (mappings/node_grounding.tsv), grounding 39 node instances

| node_type | targets |
| --- | --- |
| CHEMICAL → CHEBI | organic molecule, staphyloxanthin, bacteriochlorophyll, homogentisic acid, crystal violet, prodigiosin, dipicolinic acid, hydroxylamine, autoinducer, nitrogen oxides, adp, nadph, lipopolysaccharide, elemental sulfur, acetyl phosphate, coenzyme f430 |
| BIOLOGICAL_PROCESS / PATHWAY → GO | cellular growth, melanin biosynthesis, tyrosine catabolism, cytokinesis, pigment biosynthesis, nutrient sensing, sulfur oxidation, formaldehyde assimilation, ribulose monophosphate cycle |
| CELLULAR_LOCALIZATION → GO-CC | flagellar filament, flagellar hook, forespore |
| ENVIRONMENTAL_FACTOR → PATO/ENVO | high temperature, high osmolarity, cold environment |

Each row carries the skos predicate_id strength (exactMatch / closeMatch).
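
A check along these lines could confirm that every row uses one of the two predicate strengths. It is a sketch under assumptions: only the `predicate_id` column name comes from this PR, while the other column names, the `skos:`-prefixed value spelling, and the sample rows (with placeholder IDs) are hypothetical.

```python
import csv
import io

# Assumed spelling of the two allowed values (CURIE form).
ALLOWED_PREDICATES = {"skos:exactMatch", "skos:closeMatch"}


def invalid_predicate_rows(tsv_text):
    """Return (line_number, predicate_id) for rows outside the allowed set."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    bad = []
    for row in reader:
        if row["predicate_id"] not in ALLOWED_PREDICATES:
            bad.append((reader.line_num, row["predicate_id"]))
    return bad


# Hypothetical rows; the IDs are placeholders, not real ontology terms.
sample = (
    "label\tnode_type\tobject_id\tpredicate_id\n"
    "prodigiosin\tCHEMICAL\tCHEBI:TEST1\tskos:exactMatch\n"
    "high temperature\tENVIRONMENTAL_FACTOR\tPATO:TEST1\tskos:relatedMatch\n"
)
problems = invalid_predicate_rows(sample)
```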

Rejected during review (not grounded)

  • nutrient uptake → GO:0009935 (obsolete GO term)
  • spore coat → GO:0031160 "spore wall" (distinct bacterial structures — coat ≠ wall)
  • plant tissue colonization, intracellular membrane (semantic mismatch / too loose)
  • charge/role-ambiguous chemicals: electron acceptor, pyocyanin
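
Two of the rejection reasons above lend themselves to an automatic pre-screen before manual review. The sketch below is one possible shape for it, not what was run here: the `Candidate` class, its `deprecated` flag, and the `reject_reason` helper are assumptions, and the obsolete example uses a placeholder ID. Semantic mismatches and charge/role ambiguity would still need a human reviewer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical view of one ontology-index hit."""
    curie: str
    name: str
    deprecated: bool = False


def reject_reason(label, candidate):
    """Return a reason to hold the candidate for review, or None to accept it."""
    if candidate.deprecated:
        return "obsolete term"
    # A hit whose primary name differs from the label matched via a synonym only.
    if candidate.name.casefold() != label.casefold():
        return "synonym-only match, check manually"
    return None


obsolete = reject_reason("nutrient uptake", Candidate("GO:TEST2", "nutrient uptake", True))
synonym_only = reject_reason("spore coat", Candidate("GO:0031160", "spore wall"))
accepted = reject_reason("Forespore", Candidate("GO:TEST3", "forespore"))
```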

Residual (deferred)

658 node instances / 561 (label, type) keys remain — a long tail of idiosyncratic descriptive phrases (precursor metabolites, lateral cell-wall elongation, compatible-solute transport) plus GENE_OR_PROTEIN labels needing UniProt lookups. Exact matching keeps precision high; raising recall would require label normalization or fuzzy matching (higher risk).
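
For reference, the deferred normalization and fuzzy-matching step might look something like the sketch below. It is not part of this PR: the `normalize` and `fuzzy_candidates` helpers and the 0.9 cutoff are assumptions, and any hits it produces would be suggestions for review rather than mappings to apply automatically.

```python
import difflib
import re


def normalize(label):
    """Case-fold, turn hyphens and underscores into spaces, collapse whitespace."""
    text = re.sub(r"[-_]+", " ", label.casefold())
    return re.sub(r"\s+", " ", text).strip()


def fuzzy_candidates(label, known_names, cutoff=0.9):
    """Return up to three known names whose normalized form is close to label."""
    normalized = {normalize(name): name for name in known_names}
    close = difflib.get_close_matches(
        normalize(label), list(normalized), n=3, cutoff=cutoff
    )
    return [normalized[key] for key in close]


# Hypothetical index names, used only to exercise the helpers.
suggestions = fuzzy_candidates(
    "cell-wall elongation", ["cell wall elongation", "cytokinesis"]
)
```

A high cutoff keeps the suggestion list short, at the cost of missing looser paraphrases; lowering it raises recall and the review burden together.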

Verification

  • just validate-strict: 477 files, 0 errors.
  • Idempotent (re-run grounds 0).
  • Matching done against the real ontology index, not from memory.
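
The idempotency check can be illustrated with a toy apply step. This is a sketch, not the logic of ground_causal_nodes.py: the `apply_grounding` function, the node dictionary keys, and the placeholder CURIE are assumptions chosen to show why a second run grounds nothing.

```python
def apply_grounding(nodes, mappings):
    """Ground ungrounded nodes in place; return how many were newly grounded."""
    grounded = 0
    for node in nodes:
        if node.get("grounding"):
            continue  # already grounded, so a re-run leaves it untouched
        curie = mappings.get((node["label"].casefold(), node["node_type"]))
        if curie is not None:
            node["grounding"] = curie
            grounded += 1
    return grounded


# Hypothetical data; the CURIE is a placeholder.
mappings = {("forespore", "CELLULAR_LOCALIZATION"): "GO:TEST3"}
nodes = [
    {"label": "Forespore", "node_type": "CELLULAR_LOCALIZATION"},
    {"label": "precursor metabolites", "node_type": "CHEMICAL"},
]
first_run = apply_grounding(nodes, mappings)
second_run = apply_grounding(nodes, mappings)
```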

🤖 Generated with Claude Code

…logy index

Matches residual causal-node labels against the kg-microbe merged-kg node
index (CHEBI/GO/ENVO/PATO) by exact case-insensitive name/synonym, constrained
to the prefix/category appropriate for each node_type, then runs
ground_causal_nodes.py --apply. Node grounding rises 946 -> 985 / 1643 (57.6% -> 60.0%).

31 new mappings added to mappings/node_grounding.tsv:
- CHEMICAL -> CHEBI: organic molecule, staphyloxanthin, bacteriochlorophyll,
  homogentisic acid, crystal violet, prodigiosin, dipicolinic acid,
  hydroxylamine, autoinducer, nitrogen oxides, adp, nadph, lipopolysaccharide,
  elemental sulfur, acetyl phosphate, coenzyme f430
- BIOLOGICAL_PROCESS/PATHWAY -> GO: cellular growth, melanin biosynthesis,
  tyrosine catabolism, cytokinesis, pigment biosynthesis, nutrient sensing,
  sulfur oxidation, formaldehyde assimilation, ribulose monophosphate cycle
- CELLULAR_LOCALIZATION -> GO-CC: flagellar filament, flagellar hook, forespore
- ENVIRONMENTAL_FACTOR -> PATO/ENVO: high temperature, high osmolarity,
  cold environment

Each row carries the skos predicate_id strength (exactMatch / closeMatch).

Rejected during review (not grounded): nutrient uptake (GO obsolete),
spore coat->spore wall (distinct bacterial structures), plant tissue
colonization and intracellular membrane (semantic mismatch), and charge/role-
ambiguous chemicals (electron acceptor, pyocyanin).

Residual: 658 node instances / 561 (label,type) keys remain — a long tail of
idiosyncratic descriptive phrases (precursor metabolites, lateral cell-wall
elongation, compatible-solute transport) and GENE_OR_PROTEIN labels needing
UniProt lookups. Exact matching keeps precision high; raising recall would
need label normalization or fuzzy matching (higher risk, deferred).

validate-strict: 477 files, 0 errors. Grounding is idempotent.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@realmarcin realmarcin merged commit 137ff3e into main Jun 14, 2026
2 checks passed
@realmarcin realmarcin deleted the claude/node-grounding-pass branch June 14, 2026 05:53