Improve .strings handling: unquoted strings, inline comments, and merge prefixing#741
Improve .strings handling: unquoted strings, inline comments, and merge prefixing#741jkmassel wants to merge 6 commits into
.strings handling: unquoted strings, inline comments, and merge prefixing#741Conversation
Generated by 🚫 Danger |
`StringsFileValidationHelper.find_duplicated_keys` tokenized only quoted keys, so a valid ASCII-plist file using unquoted strings — `CFBundleName = WordPress;`, common in `InfoPlist.strings` — raised `Invalid character`. The state machine now also tokenizes unquoted keys and values, accepting the same character set `plutil` allows for unquoted strings (alphanumerics plus `_ . - $ : /`). The scanner only ever runs on input `plutil` has already accepted, so matching its grammar keeps a parseable file from tripping it.
`L10nHelper.strings_file_type` gains an opt-in `assume_valid:` flag that skips the `plutil -lint` validity check when the caller has already parsed the file, avoiding a redundant `plutil` invocation. Default behavior is unchanged for every existing caller.
`ios_lint_localizations` open-coded a `strings_file_type == :text ? scan : warn` gate. That logic now lives in `StringsFileValidationHelper.scan_for_duplicate_keys`, which detects the file format and returns a `[:scanned | :unsupported_format | :unscannable, payload]` tri-state; each caller maps it to its own policy. As a side effect, `ios_lint_localizations`' `check_duplicate_keys` no longer crashes the lane on a valid-but-not-flat-`.strings` file — it warns via `UI.important` and skips it.
`StringsFileValidationHelper`'s tokenizer only recognized comments in its `:root` context, so a comment placed between the tokens of a statement — `"key" /* note */ = "value";`, `"key" = /* note */ "value";`, or `"key" = "value" /* note */;` — raised `Invalid character` even though `plutil` accepts all three. The scanner now carries a `resume_context` so a comment returns to the state it interrupted instead of always dropping back to `:root`. The one ambiguous position — a `/` right after `=`, which could begin a comment or a `/usr/bin`-style unquoted value — is disambiguated one character later via a new `:maybe_comment_or_value` state, so `/`-leading unquoted values still parse.
`L10nHelper.merge_strings` rewrites each key to add the caller's prefix, but its line matcher only handled `[A-Z0-9_]` keys with a *quoted* value. An unquoted key containing `. - $ : /`, or any line with an *unquoted* value (`CFBundleName = WordPress;`), was written to the merged file without the prefix while still being bookkept *with* it — an inconsistency that can resurface the collisions the prefix avoids and break downstream key extraction. The matcher now accepts the full unquoted-string character set `plutil` allows. The unquoted-value case keys on the fact that an unquoted string can't contain spaces, so it won't mistake comment prose (`and = a red herring ...`) for a key/value pair — a case the fixtures deliberately guard.
.strings duplicate-key detection (unquoted values, shared scan helper).strings handling: unquoted strings, inline comments, and merge prefixing
A temporal localization guardrail: compares two versions of a base-language `.strings` file and fails when an existing key's value changes its format placeholders (count, position, or argument type), which silently breaks every translation filed under that key. - `StringPlaceholdersHelper` — a pure-Ruby, platform-agnostic primitive that reduces a value's printf/`NSString` specifiers to a canonical signature. - `ios_lint_localization_placeholder_changes` — the iOS action wrapping it, with fail-closed guards for duplicate keys and mixed positional/non-positional specifiers. Builds on the `.strings` parsing and `scan_for_duplicate_keys` helper from #741.
| duplicates = Fastlane::Helper::Ios::StringsFileValidationHelper.find_duplicated_keys(file: path) | ||
| duplicate_keys[language] = duplicates.map { |key, value| "`#{key}` was found at multiple lines: #{value.join(', ')}" } unless duplicates.empty? | ||
| else | ||
| status, payload = Fastlane::Helper::Ios::StringsFileValidationHelper.scan_for_duplicate_keys(file: path) |
| UI.important <<~WRONG_FORMAT | ||
| File `#{path}` is in #{file_type} format, while finding duplicate keys only make sense on files that are in ASCII-plist format. | ||
| Since your files are in #{file_type} format, you should probably disable the `check_duplicate_keys` option from this `#{action_name}` call. | ||
| File `#{path}` is in #{payload} format, while finding duplicate keys only make sense on files that are in ASCII-plist format. |
There was a problem hiding this comment.
Nitpick, even though it was in the previous version "only makes sense" is a bit condescending IMHO.
| File `#{path}` is in #{payload} format, while finding duplicate keys only make sense on files that are in ASCII-plist format. | |
| File `#{path}` is in #{payload} format, while finding duplicate keys can only occurr on files that are in ASCII-plist format. |
| # The value is required to be a single unquoted token ending in `;` — an unquoted string can't contain spaces, so | ||
| # this won't mistake comment prose like `and = a red herring for when…` (spaces in the "value") for a key/value pair. | ||
| line.gsub!(%r{^(\s*)([a-zA-Z0-9_.$:/-]+)(\s*=\s*)([a-zA-Z0-9_.$:/-]+\s*;)}u, "\\1\"#{prefix}\\2\"\\3\\4") | ||
| end |
There was a problem hiding this comment.
My AI noticed that merge_strings doesn't handle inter-token comments.
Could this logic handle the same inter-token comment placements that StringsFileValidationHelper now accepts? (https://github.com/wordpress-mobile/release-toolkit/pull/741/changes#diff-6e464dfe8ccd3974026649f8aaa02f9f8518261ef391f13bdb9535ef46d2b908R111-R116)
For example, CFBundleName /* c */ = WordPress; is accepted by plutil and by the duplicate-key scanner after this PR, but merge_strings still writes it unprefixed while bookkeeping it as prefixed.
That can reintroduce collisions in the merged file.
I added failing specs in #742 to pin the gap.
AI suggests: A tokenizer-backed rewrite would be ideal, but a comment-aware interim rewrite would also close the regression.
Summary
Related improvements to the existing
.stringshandling — the duplicate-key scanner (StringsFileValidationHelper) and themerge_stringshelper — extracted from #738 so the newios_lint_localization_placeholder_changesaction, which builds on these, can be reviewed on its own. This PR stands alone and is independently useful.The common thread: each change accepts more of the unquoted-string grammar
plutilitself allows, so legitimateInfoPlist.strings-style input — unquoted keys/values, inter-token comments — is handled instead of rejected.Changes
1. Parse unquoted
.stringsvalues and widen the unquoted grammarStringsFileValidationHelperalready tokenized unquoted keys; it now also tokenizes unquoted values and accepts the full character setplutilallows for unquoted strings (alphanumerics plus_ . - $ : /). So anInfoPlist.strings-style file likeCFBundleName = WordPress;parses instead of raisingInvalid character. The scanner only ever runs on inputplutilhas already accepted, so matching its grammar keeps a parseable file from tripping it; a genuinely un-tokenizable plist (e.g. a nested-dictionary value) still fails closed.2. Accept comments placed between a statement's tokens
StringsFileValidationHelperrecognized comments only in its top-level:rootcontext, so a comment between a statement's tokens —"key" /* note */ = "value";,"key" = /* note */ "value";, or"key" = "value" /* note */;— raisedInvalid character, even thoughplutilaccepts all three. The scanner now threads aresume_contextso a comment returns to the state it interrupted instead of always dropping back to:root. The lone ambiguous spot — a/right after=, which could begin a comment or a/usr/bin-style unquoted value — is disambiguated one character later via a new:maybe_comment_or_valuestate, so/-leading unquoted values still parse. ✅3. Centralize duplicate-key detection behind
scan_for_duplicate_keysios_lint_localizationsopen-coded astrings_file_type == :text ? scan : warngate. That logic now lives inStringsFileValidationHelper.scan_for_duplicate_keys, which detects the file format and returns a[:scanned | :unsupported_format | :unscannable, payload]tri-state; each caller maps it to its own policy. As a side effect,ios_lint_localizations'check_duplicate_keysno longer crashes the lane on a valid-but-not-flat-.stringsfile — it warns viaUI.importantand skips it.4.
assume_valid:to skip a redundantplutil -lintL10nHelper.strings_file_typegains an opt-inassume_valid:flag that skips theplutil -lintvalidity check when the caller has already parsed the file. Default behavior is unchanged for every existing caller. (Used by #738.)5. Apply the
merge_stringskey prefix to unquoted keys and valuesL10nHelper.merge_stringsrewrites each key to add the caller's prefix, but its line matcher only handled[A-Z0-9_]keys with a quoted value. An unquoted key containing. - $ : /(e.g. a reverse-DNS key), or any line with an unquoted value (CFBundleName = WordPress;), was written to the merged file without the prefix — while still being bookkept with it. That inconsistency can resurface the very collisions the prefix exists to avoid and break downstream key extraction. The matcher now accepts the full unquoted-string setplutilallows; the unquoted-value case keys on the fact that an unquoted string can't contain spaces, so it won't mistake comment prose likeand = a red herring …for a key/value pair (a case the fixtures deliberately guard). ✅Test Plan
bundle exec rubocop— no violations.bundle exec rspec— 914 examples, 0 failures; new coverage for unquoted values, the widened key set,scan_for_duplicate_keys, the graceful warn-and-skip,assume_valid, inter-token comments, andmerge_stringsunquoted-key/value prefixing.Related
ios_lint_localization_placeholder_changesaction #738 will be rebased to depend on this.