refactor: structural improvements to distance, trie, and spellcheck modules#775
Open
Wolfvin wants to merge 1 commit into
Open
refactor: structural improvements to distance, trie, and spellcheck modules#775Wolfvin wants to merge 1 commit into
Wolfvin wants to merge 1 commit into
Conversation
…odules ## Changes ### lib/natural/distance/levenshtein_distance.js - **DECOMPOSITION**: Extracted the 113-line monolithic levenshteinDistance() function into 4 focused helper functions: - initMatrix() — initialize the DP matrix with base cases - computeStandard() — standard Levenshtein (insert/delete/substitute) - computeDamerau() — restricted Damerau-Levenshtein (adjacent transpositions) - computeUnrestrictedDamerau() — unrestricted Damerau variant - findMinCostParent() — extracted min-cost selection logic The core computeLevenshtein orchestrator is now just 24 lines. - **REMOVED UNDERSCORE**: Replaced _.extend with Object.assign, _.min with reduce-based findMinCostParent. Removed the underscore dependency entirely. - **NAMING**: Renamed internal 'distance' function to 'computeLevenshtein' for clarity. - All 4 exported functions produce IDENTICAL output for IDENTICAL input. ### lib/natural/trie/trie.js - **BUG FIX**: keysWithPrefix() referenced this.caseSensitive but the property was stored as this.cs. This meant case-insensitive prefix search never actually lowercased the input — a real bug. Unified property name to this.caseSensitive throughout. - **REPLACE for...in**: Changed for...in on arrays to for...of in addStrings() to avoid iterating prototype properties. - **NAMING**: Renamed cs → caseSensitive, get() → findNode(), recurse() → collectWords(), stringAgg → currentPrefix, resultsAgg → results. ### lib/natural/spellcheck/spellcheck.js - **REPLACE for...in**: Changed 3 instances of for...in on arrays to for...of loops (constructor, getCorrections, editsWithMaxDistanceHelper). - **REPLACE indexOf dedup with Set**: getCorrections() and edits() now use Set for O(n) deduplication instead of indexOf O(n^2). - **NAMING**: word2frequency → wordFrequencies, distance2edits → distanceToEdits, distanceCounter → remainingDistance, wordscore → wordScore. ## Verification All changes verified with Regrets regression testing: - 10 fingerprint clusters: all GREEN - 2 chain tests: all MATCH - 5-run drift detection: all PASS+STABLE - Direct output comparison against pre-refactor baseline: IDENTICAL - Fingerprint cross-check against pre-refactor baseline: IDENTICAL
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR contains structural refactoring of three modules in natural, verified safe using the Regrets regression testing tool with a dual-truth verification pattern.
What was refactored and why
lib/natural/distance/levenshtein_distance.js
levenshteinDistance()function was extracted into 4 focused helpers:initMatrix(),computeStandard(),computeDamerau(),computeUnrestrictedDamerau(), andfindMinCostParent(). The corecomputeLevenshteinorchestrator is now just 24 lines._.extendwithObject.assignand_.minwith areduce-based helper. Theunderscoreimport was removed entirely.distancefunction tocomputeLevenshteinfor clarity.lib/natural/trie/trie.js
keysWithPrefix()referencedthis.caseSensitivebut the property was stored asthis.cs. This meant case-insensitive prefix search never actually lowercased the input — a real bug. Unified the property name tothis.caseSensitivethroughout.for...inon arrays tofor...ofinaddStrings().cs→caseSensitive,get()→findNode(),recurse()→collectWords(),stringAgg→currentPrefix,resultsAgg→results.lib/natural/spellcheck/spellcheck.js
for...inon arrays tofor...of.getCorrections()andedits()now useSetfor O(n) deduplication instead ofindexOfO(n²).word2frequency→wordFrequencies,distance2edits→distanceToEdits,distanceCounter→remainingDistance,wordscore→wordScore.Verification: KEBENARAN 1 vs Final Output
Before refactoring, I captured KEBENARAN 1 (raw ground truth output from all entry functions) and KEBENARAN 2 (Regrets fingerprint snapshot). After refactoring, I verified:
Verification 1 — Regrets Cluster: All 10 clusters GREEN
Verification 2 — Direct Output: All raw outputs IDENTICAL to KEBENARAN 1
Every function returns exactly the same value for the same input.
Verification 3 — Fingerprint Cross-Check: All fingerprints MATCH KEBENARAN 2
Verification 4 — Chain Hashes: Both chains MATCH
All 4 verifications confirm the refactoring preserved behavioral identity.