Fix globular clusters mis-typed as galaxies in NGC/IC import#445
Merged
Conversation
Steinicke's TYPE for a globular is its Shapley-Sawyer concentration class (a Roman numeral, e.g. "I"), which is_galaxy_type() read as an Irregular galaxy. The existing GCL-cross-ID rescue in preprocess_steinicke_type() ran *after* the galaxy check, so it was dead code for these objects. Move the globular disambiguation ahead of the galaxy check. The is_trumpler_class() guard keeps objects that merely mention a globular in their remarks (e.g. IC 4802, a star group inside Pal 9) from being reclassified. Corrects NGC 7006, NGC 2808, NGC 5824, NGC 5834 and NGC 6864 (M 75), previously typed Gx. Rebuilds pifinder_objects.db and adds NGC 7006 / M 75 regression assertions. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The existing data-validation tests only spot-check a few well-known objects in the built DB, so the globular-as-galaxy mis-typing went unnoticed. Add direct coverage of the parsing layer where the bug lived: - Tier 1: parametrized unit tests of preprocess_steinicke_type(), one case per decision branch and known ambiguity (Roman-numeral galaxy/globular/ Trumpler overlap, combinations, suffix stripping, the IC 4802 guard). - Tier 2: an invariant audit over the real Steinicke source asserting that every object with a globular cross-ID and a concentration-class TYPE maps to Gb -- catching the whole class generically, not just named objects. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
A user reported NGC 7006 is listed as a galaxy but is a globular cluster. Investigation showed it's not alone — five globulars are mis-typed as
Gx:TYPEIGxGbIGxGbIGxGbIGxGbIGxGbRoot cause
For a globular, Steinicke's
TYPEis the Shapley–Sawyer concentration class — a Roman numeral such asI— whichis_galaxy_type()read as an Irregular galaxy.preprocess_steinicke_type()already had a rescue that reclassifies these asGbwhen aGCL/globular cross-ID is present in the remarks, but it ran after the galaxy check, so it was dead code for exactly these objects.Fix
Move the globular disambiguation ahead of the galaxy check. The existing
is_trumpler_class()guard ensures objects that merely mention a globular in their remarks are not reclassified — e.g. IC 4802, a star group inside Pal 9, correctly staysAst.Changes
steinicke_loader.py— reorder the globular rescue beforeis_galaxy_type, with an explanatory comment.test_catalog_data.py— add NGC 7006 and NGC 6864 (M 75) asGbregression assertions.pifinder_objects.db— regenerated via the full import pipeline.Verification
All 4 tests in
test_catalog_data.pypass; the rebuilt DB shows the 5 globulars asGband IC 4802 asAst.🤖 Generated with Claude Code
Test coverage (why this slipped through)
The previous tests only spot-checked a handful of famous objects in the built DB, and
test_object_countsis blind to type errors (GxvsGbis still one object). Addedpython/tests/test_steinicke_parsing.py:preprocess_steinicke_type()— one case per decision branch and known ambiguity (Roman-numeral galaxy/globular/Trumpler overlap, combinations, suffix stripping, the IC 4802 guard). No DB required.TYPEmust map toGb— catching the whole class generically rather than enumerating named objects.31 new tests; full suite (
test_steinicke_parsing.py+test_catalog_data.py) green.