Skip to content

Fix lexer::scan() and utils::impl MyStrExt for &str #21

Closed
hardglitch wants to merge 5 commits into
d0rianb:masterfrom
hardglitch:master
Closed

Fix lexer::scan() and utils::impl MyStrExt for &str #21
hardglitch wants to merge 5 commits into
d0rianb:masterfrom
hardglitch:master

Conversation

@hardglitch

Copy link
Copy Markdown

Fix UTF-8 safety in string slicing

Problem:
The original code indexed &str using byte offsets (bytes[i] as char and slices like &s[a..b]), which breaks UTF-8 when multi-byte characters are present (e.g., Cyrillic). This caused panics like: byte index N is not a char boundary

Solution:

  • Loops rewritten to use char_indices() so all slice indices always land on valid UTF-8 character boundaries.
  • Slices computed safely with i and c.len_utf8().
  • Functions scan() and split_first_whitespace() now handle Unicode correctly.
  • Added is_only_whitespace() implemented via chars().all(|c| c.is_whitespace()).

Effect:
The library no longer panics on RTF containing Cyrillic or other multi-byte characters, while preserving correct behavior for ASCII.

@d0rianb d0rianb left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank for the work,
Just a few comments on linting belwow

Comment thread src/lexer.rs
Comment thread src/lexer.rs
Comment thread src/utils.rs
@d0rianb

d0rianb commented Oct 7, 2025

Copy link
Copy Markdown
Owner

This could fix #18.
However, a test on the parser is not passing. It seems that a whitespace is missing.
maybe you could try this test in the lexer :

#[test]
fn should_lex_unicode() {
  let rtf = r#"{\u21834  \u21834 }"#;
  let tokens = Lexer::scan(rtf).unwrap();
  assert_eq!(
      tokens,
      vec![OpeningBracket, ControlSymbol((Unicode, Value(21834))), PlainText(" "), ControlSymbol((Unicode, Value(21834))), ClosingBracket]
  );
}

d0rianb added a commit that referenced this pull request Jun 11, 2026
@d0rianb

d0rianb commented Jun 11, 2026

Copy link
Copy Markdown
Owner

I re-implemented this PR in 93486db. It should be released on next patch version

@d0rianb d0rianb closed this Jun 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants