Skip to content

Improve .strings handling: unquoted strings, inline comments, and merge prefixing#741

Open
jkmassel wants to merge 6 commits into
trunkfrom
foundation
Open

Improve .strings handling: unquoted strings, inline comments, and merge prefixing#741
jkmassel wants to merge 6 commits into
trunkfrom
foundation

Conversation

@jkmassel

@jkmassel jkmassel commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Summary

Related improvements to the existing .strings handling — the duplicate-key scanner (StringsFileValidationHelper) and the merge_strings helper — extracted from #738 so the new ios_lint_localization_placeholder_changes action, which builds on these, can be reviewed on its own. This PR stands alone and is independently useful.

The common thread: each change accepts more of the unquoted-string grammar plutil itself allows, so legitimate InfoPlist.strings-style input — unquoted keys/values, inter-token comments — is handled instead of rejected.

Changes

1. Parse unquoted .strings values and widen the unquoted grammar

StringsFileValidationHelper already tokenized unquoted keys; it now also tokenizes unquoted values and accepts the full character set plutil allows for unquoted strings (alphanumerics plus _ . - $ : /). So an InfoPlist.strings-style file like CFBundleName = WordPress; parses instead of raising Invalid character. The scanner only ever runs on input plutil has already accepted, so matching its grammar keeps a parseable file from tripping it; a genuinely un-tokenizable plist (e.g. a nested-dictionary value) still fails closed.

2. Accept comments placed between a statement's tokens

StringsFileValidationHelper recognized comments only in its top-level :root context, so a comment between a statement's tokens — "key" /* note */ = "value";, "key" = /* note */ "value";, or "key" = "value" /* note */; — raised Invalid character, even though plutil accepts all three. The scanner now threads a resume_context so a comment returns to the state it interrupted instead of always dropping back to :root. The lone ambiguous spot — a / right after =, which could begin a comment or a /usr/bin-style unquoted value — is disambiguated one character later via a new :maybe_comment_or_value state, so /-leading unquoted values still parse. ✅

3. Centralize duplicate-key detection behind scan_for_duplicate_keys

ios_lint_localizations open-coded a strings_file_type == :text ? scan : warn gate. That logic now lives in StringsFileValidationHelper.scan_for_duplicate_keys, which detects the file format and returns a [:scanned | :unsupported_format | :unscannable, payload] tri-state; each caller maps it to its own policy. As a side effect, ios_lint_localizations' check_duplicate_keys no longer crashes the lane on a valid-but-not-flat-.strings file — it warns via UI.important and skips it.

4. assume_valid: to skip a redundant plutil -lint

L10nHelper.strings_file_type gains an opt-in assume_valid: flag that skips the plutil -lint validity check when the caller has already parsed the file. Default behavior is unchanged for every existing caller. (Used by #738.)

5. Apply the merge_strings key prefix to unquoted keys and values

L10nHelper.merge_strings rewrites each key to add the caller's prefix, but its line matcher only handled [A-Z0-9_] keys with a quoted value. An unquoted key containing . - $ : / (e.g. a reverse-DNS key), or any line with an unquoted value (CFBundleName = WordPress;), was written to the merged file without the prefix — while still being bookkept with it. That inconsistency can resurface the very collisions the prefix exists to avoid and break downstream key extraction. The matcher now accepts the full unquoted-string set plutil allows; the unquoted-value case keys on the fact that an unquoted string can't contain spaces, so it won't mistake comment prose like and = a red herring … for a key/value pair (a case the fixtures deliberately guard). ✅

Test Plan

  • bundle exec rubocop — no violations.
  • bundle exec rspec — 914 examples, 0 failures; new coverage for unquoted values, the widened key set, scan_for_duplicate_keys, the graceful warn-and-skip, assume_valid, inter-token comments, and merge_strings unquoted-key/value prefixing.
  • CHANGELOG updated.
  • MIGRATION.md — N/A (additive + backward-compatible).

Related

@jkmassel jkmassel requested a review from a team as a code owner June 23, 2026 20:08
@dangermattic

Copy link
Copy Markdown
Collaborator
1 Warning
⚠️ No reviewers have been set for this PR yet. Please request a review from @⁠wordpress-mobile/apps-infrastructure.

Generated by 🚫 Danger

jkmassel added 6 commits June 23, 2026 15:59
`StringsFileValidationHelper.find_duplicated_keys` tokenized only quoted keys, so a
valid ASCII-plist file using unquoted strings — `CFBundleName = WordPress;`, common in
`InfoPlist.strings` — raised `Invalid character`. The state machine now also tokenizes
unquoted keys and values, accepting the same character set `plutil` allows for unquoted
strings (alphanumerics plus `_ . - $ : /`). The scanner only ever runs on input `plutil`
has already accepted, so matching its grammar keeps a parseable file from tripping it.
`L10nHelper.strings_file_type` gains an opt-in `assume_valid:` flag that skips the
`plutil -lint` validity check when the caller has already parsed the file, avoiding a
redundant `plutil` invocation. Default behavior is unchanged for every existing caller.
`ios_lint_localizations` open-coded a `strings_file_type == :text ? scan : warn` gate.
That logic now lives in `StringsFileValidationHelper.scan_for_duplicate_keys`, which
detects the file format and returns a `[:scanned | :unsupported_format | :unscannable,
payload]` tri-state; each caller maps it to its own policy. As a side effect,
`ios_lint_localizations`' `check_duplicate_keys` no longer crashes the lane on a
valid-but-not-flat-`.strings` file — it warns via `UI.important` and skips it.
`StringsFileValidationHelper`'s tokenizer only recognized comments in its `:root`
context, so a comment placed between the tokens of a statement — `"key" /* note */ =
"value";`, `"key" = /* note */ "value";`, or `"key" = "value" /* note */;` — raised
`Invalid character` even though `plutil` accepts all three.

The scanner now carries a `resume_context` so a comment returns to the state it
interrupted instead of always dropping back to `:root`. The one ambiguous position —
a `/` right after `=`, which could begin a comment or a `/usr/bin`-style unquoted
value — is disambiguated one character later via a new `:maybe_comment_or_value`
state, so `/`-leading unquoted values still parse.
`L10nHelper.merge_strings` rewrites each key to add the caller's prefix, but its line
matcher only handled `[A-Z0-9_]` keys with a *quoted* value. An unquoted key
containing `. - $ : /`, or any line with an *unquoted* value (`CFBundleName = WordPress;`),
was written to the merged file without the prefix while still being bookkept *with* it
— an inconsistency that can resurface the collisions the prefix avoids and break
downstream key extraction.

The matcher now accepts the full unquoted-string character set `plutil` allows. The
unquoted-value case keys on the fact that an unquoted string can't contain spaces, so
it won't mistake comment prose (`and = a red herring ...`) for a key/value pair — a
case the fixtures deliberately guard.
@jkmassel jkmassel changed the title Improve .strings duplicate-key detection (unquoted values, shared scan helper) Improve .strings handling: unquoted strings, inline comments, and merge prefixing Jun 23, 2026
@jkmassel jkmassel added bug Something isn't working ruby Pull requests that update ruby code labels Jun 23, 2026
@jkmassel jkmassel self-assigned this Jun 23, 2026
@jkmassel jkmassel requested a review from a team June 23, 2026 23:10
jkmassel added a commit that referenced this pull request Jun 23, 2026
A temporal localization guardrail: compares two versions of a base-language
`.strings` file and fails when an existing key's value changes its format
placeholders (count, position, or argument type), which silently breaks every
translation filed under that key.

- `StringPlaceholdersHelper` — a pure-Ruby, platform-agnostic primitive that
  reduces a value's printf/`NSString` specifiers to a canonical signature.
- `ios_lint_localization_placeholder_changes` — the iOS action wrapping it, with
  fail-closed guards for duplicate keys and mixed positional/non-positional
  specifiers.

Builds on the `.strings` parsing and `scan_for_duplicate_keys` helper from #741.
duplicates = Fastlane::Helper::Ios::StringsFileValidationHelper.find_duplicated_keys(file: path)
duplicate_keys[language] = duplicates.map { |key, value| "`#{key}` was found at multiple lines: #{value.join(', ')}" } unless duplicates.empty?
else
status, payload = Fastlane::Helper::Ios::StringsFileValidationHelper.scan_for_duplicate_keys(file: path)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice appraoch.

UI.important <<~WRONG_FORMAT
File `#{path}` is in #{file_type} format, while finding duplicate keys only make sense on files that are in ASCII-plist format.
Since your files are in #{file_type} format, you should probably disable the `check_duplicate_keys` option from this `#{action_name}` call.
File `#{path}` is in #{payload} format, while finding duplicate keys only make sense on files that are in ASCII-plist format.

@mokagio mokagio Jun 26, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick, even though it was in the previous version "only makes sense" is a bit condescending IMHO.

Suggested change
File `#{path}` is in #{payload} format, while finding duplicate keys only make sense on files that are in ASCII-plist format.
File `#{path}` is in #{payload} format, while finding duplicate keys can only occurr on files that are in ASCII-plist format.

# The value is required to be a single unquoted token ending in `;` — an unquoted string can't contain spaces, so
# this won't mistake comment prose like `and = a red herring for when…` (spaces in the "value") for a key/value pair.
line.gsub!(%r{^(\s*)([a-zA-Z0-9_.$:/-]+)(\s*=\s*)([a-zA-Z0-9_.$:/-]+\s*;)}u, "\\1\"#{prefix}\\2\"\\3\\4")
end

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My AI noticed that merge_strings doesn't handle inter-token comments.

Could this logic handle the same inter-token comment placements that StringsFileValidationHelper now accepts? (https://github.com/wordpress-mobile/release-toolkit/pull/741/changes#diff-6e464dfe8ccd3974026649f8aaa02f9f8518261ef391f13bdb9535ef46d2b908R111-R116)

For example, CFBundleName /* c */ = WordPress; is accepted by plutil and by the duplicate-key scanner after this PR, but merge_strings still writes it unprefixed while bookkeeping it as prefixed.
That can reintroduce collisions in the merged file.

I added failing specs in #742 to pin the gap.

AI suggests: A tokenizer-backed rewrite would be ideal, but a comment-aware interim rewrite would also close the regression.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working ruby Pull requests that update ruby code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants