decode("[]") returns the corrupted string '[' instead of parsing or erroring #61

Description

@antrixy

The TOON decoder fails to decode a root-level empty array []. Instead of
returning an empty list (or raising an error), decode("[]") returns the string
'[' — a single character, with the ] silently dropped. The canonical
empty-array forms ([0]:, items[0]:) decode correctly, so this is specific to
the bare [] form.

This is a silent-corruption bug: the input is neither parsed nor rejected, but
turned into wrong data that flows downstream with no signal.

Reproduction Steps

from toon_format import decode

print(repr(decode("[]")))        # Output: '['           ❌  (note: not even "[]")
print(repr(decode("[0]:")))      # Output: []            ✅
print(repr(decode("items[0]:"))) # Output: {'items': []} ✅

Expected Behavior

Per TOON spec v3.0, input "[]" has two acceptable outcomes:

  • Error — per §6, an array header's bracket segment "MUST parse as a
    non-negative integer length N"; [] has no integer, so it is not a valid
    header. Strict mode is the default (§13) and §14 is the authoritative
    MUST-error checklist. A raised error is conformant.
  • The string "[]" — per §5 (root-form discovery), a depth-0 line that is
    neither a valid array header nor a key-value line decodes as a single
    primitive, which for [] is the literal string "[]".

decode("[]")  # raises a decode error (strict, default)
# or
decode("[]")  # == "[]"   (primitive fallback)
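
The two conformant outcomes can be pinned down as a regression check. The following is a minimal sketch, assuming only that the decoder is a callable taking a string. `classify_empty_brackets` is a hypothetical helper, not part of toon_format, and since the concrete exception class the library raises is not stated here, any exception is counted as the error outcome.

```python
def classify_empty_brackets(decode_fn):
    """Classify how a decoder handles the bare "[]" input.

    Returns one of "error", "primitive", "empty-array", or "corrupted".
    """
    try:
        result = decode_fn("[]")
    except Exception:
        # Any raised exception counts as the strict-mode outcome here;
        # the concrete exception type is library-specific (assumption).
        return "error"
    if result == "[]":
        return "primitive"    # §5 primitive fallback
    if result == []:
        return "empty-array"  # what the TypeScript decoder returns
    return "corrupted"        # anything else, e.g. '['
```

Passing `toon_format.decode` to this helper should yield "error" or "primitive" once the bug is fixed; with the behavior reported above it yields "corrupted".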

Actual Behavior

decode("[]")  # == '['

The decoder returns '[' — the ] is dropped. This is neither a clean parse,
the correct primitive string "[]", nor an error. Even the most lenient reading
(treat [] as a primitive) yields "[]", not '[', so this is a parser defect
independent of any spec-version question. The dropped character points at an
off-by-one in the root-form / bracket-handling path.
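
To make the intended root-form behavior concrete, here is a rough sketch of how a single depth-0 line could be classified. It is not the toon_format implementation: `decode_root_line`, the simplified regular expression, and the mapping of the `strict` flag onto the two acceptable outcomes are all assumptions made for illustration. Only zero-length headers are handled; delimiters, field lists, and inline values are ignored.

```python
import re

# Simplified header pattern: optional key, then "[N]:" with N a non-negative
# integer. Real TOON headers allow more syntax than this sketch covers.
_EMPTY_HEADER = re.compile(r"^(?P<key>[^\[\]:\s]*)\[(?P<n>\d+)\]:$")


def decode_root_line(line, strict=True):
    """Hypothetical root-form handling for one depth-0 line."""
    text = line.strip()
    match = _EMPTY_HEADER.match(text)
    if match:
        if int(match.group("n")) != 0:
            raise NotImplementedError("sketch only covers zero-length arrays")
        key = match.group("key")
        return {key: []} if key else []
    if strict and text.startswith("["):
        # Bracket segment present but not an integer length: reject.
        raise ValueError(f"invalid array header: {text!r}")
    # Primitive fallback: return the whole token, never a slice of it.
    return text
```

The point of the sketch is the last line: whichever branch is taken, the fallback returns the complete token, so `"[]"` can never come back as `'['`.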

Environment

Additional Context

Found via differential round-trip testing across TOON implementations. The bare
[] form is what the TypeScript reference (@toon-format/toon 2.3.0) currently
emits for an empty root array, so this input arises when data crosses between
implementations. For contrast, the TS decoder reads all three forms:

Decoder input   @toon-format/toon 2.3.0   toon_format 0.9.0b1
"[]"            []                        '[' (corrupted)
"[0]:"          []                        []
"items[0]:"     {items: []}               {'items': []}
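
A comparison like the one in the table can be automated with a small differential harness. The sketch below assumes each implementation is wrapped as a Python callable that takes a string; `find_disagreements` is a hypothetical helper, and how the TypeScript decoder is invoked from Python is left out of scope.

```python
def find_disagreements(decoders, inputs):
    """Run every decoder on every input and report where outcomes differ.

    decoders: dict mapping a label to a callable(str) -> decoded value.
    Returns a dict mapping each disagreeing input to {label: outcome},
    where outcome is ("ok", value) or ("error", exception class name).
    """
    disagreements = {}
    for text in inputs:
        outcomes = {}
        for label, decode_fn in decoders.items():
            try:
                outcomes[label] = ("ok", decode_fn(text))
            except Exception as exc:
                outcomes[label] = ("error", type(exc).__name__)
        # Compare by equality so dict key order does not matter.
        values = list(outcomes.values())
        if any(v != values[0] for v in values[1:]):
            disagreements[text] = outcomes
    return disagreements
```

Fed the three inputs from the table, a harness of this shape would flag only `"[]"`, since the other two forms decode identically in both implementations.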

The spec is a fast-moving Working Draft (v1.0 → v3.0 in ~one month). If 0.9.0b1
targets a pre-v3.0 point where [] had defined semantics, the fix may be "parse
[] to []" rather than "error" — but returning '[' is incorrect under any
version.
