feat: add replacer option to encode() for pre-encoding value transformation #60

### Problem Statement

## Problem

TOON's tabular row compression requires all fields in an array of objects to be scalar primitives. When any field holds a nested object, `detect_tabular_header` returns `None` and the encoder falls back to verbose indented list output.
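To make the constraint concrete, here is a rough sketch of the kind of eligibility check described above. It is a simplified stand-in, not the library's `detect_tabular_header`; the helper name `tabular_header` and the `SCALARS` tuple are hypothetical, and the uniform-keys rule reflects my understanding of TOON's tabular arrays.

```python
from typing import Any, List, Optional

# Value types that can sit in a tabular cell (assumed set for this sketch).
SCALARS = (str, int, float, bool, type(None))


def tabular_header(rows: List[Any]) -> Optional[List[str]]:
    """Return the shared column names if `rows` could be tabular, else None.

    Hypothetical simplification of the check; not the library's code.
    """
    if not rows or not all(isinstance(row, dict) for row in rows):
        return None
    header = list(rows[0].keys())
    for row in rows:
        if set(row.keys()) != set(header):
            return None  # rows must share the same set of keys
        if not all(isinstance(v, SCALARS) for v in row.values()):
            return None  # one nested dict or list disqualifies the whole array
    return header


nested = [{"type": "header", "bbox": {"x": 0.05, "y": 0.05}}]
flat = [{"type": "header", "bbox_x": 0.05, "bbox_y": 0.05}]
```

With these inputs, `tabular_header(nested)` returns `None` because of the nested `bbox`, while `tabular_header(flat)` returns the column list.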

A real example: 

**Current output (verbose, no tabular compression):**

```
blocks[2]:
  - type: header
    content: Q2 Sales Report
    bbox:
      x: 0.05
      y: 0.05
      width: 0.9
      height: 0.04
    confidence: high
    confidence_score: 0.98
  - type: paragraph
    content: This report summarizes sales...
    bbox:
      ...
```

I tried to use [key folding](https://toons.readthedocs.io/en/stable/examples/#key-folding-flatten-nested-keys). If I understand correctly, it does not help here: [detect_tabular_header](https://github.com/toon-format/toon-python/blob/e475c82e9da03dfaf88c0b277dee6b5d17100b13/src/toon_format/encoders.py#L274) only sees the original Python dict, not the folded representation, so a nested `bbox` still blocks tabular output. The only workaround today is to manually pre-flatten the data at every call site.
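For reference, the manual workaround looks roughly like the sketch below. The helper `flatten_block` is hypothetical, and the sample data contains only the first block from the output above, since the second block's `bbox` is elided there. The `encode` call is left as a comment so the snippet runs without the package installed.

```python
from typing import Any, Dict


def flatten_block(block: Dict[str, Any]) -> Dict[str, Any]:
    """Replace a nested `bbox` dict with flat `bbox_*` scalar fields."""
    flat: Dict[str, Any] = {}
    for key, value in block.items():
        if key == "bbox" and isinstance(value, dict):
            # Expand in place so the field order is preserved.
            for sub_key, sub_value in value.items():
                flat[f"bbox_{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


blocks = [
    {
        "type": "header",
        "content": "Q2 Sales Report",
        "bbox": {"x": 0.05, "y": 0.05, "width": 0.9, "height": 0.04},
        "confidence": "high",
        "confidence_score": 0.98,
    },
]

flat_blocks = [flatten_block(block) for block in blocks]
# Every call site has to repeat this step before encoding, e.g.:
# encode({"blocks": flat_blocks})
```

This works, but the flattening logic lives outside the encoder and has to be repeated (or imported) wherever `encode()` is called.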

### Proposed Solution

## How TypeScript solves this — design reference

The TypeScript package exposes a `replacer` option on `encode()`. Its architecture is worth understanding before describing the Python proposal.

**Key design principle: the replacer is a separate pre-encoding pass, not inline logic.**

From the TypeScript source — [`packages/toon/src/encode/replacer.ts`](https://github.com/toon-format/toon/blob/a19a1179193451fad40f11ef88de5f363ea3684a/packages/toon/src/encode/replacer.ts):

```js
encodeJsonValue(
  options.replacer
    ? applyReplacer(normalizedValue, options.replacer)  // pre-pass
    : normalizedValue,                                  // no-op if absent
  options, 0
);
```

The replacer walks the full data tree and returns a new transformed value. The encoder then runs on that value exactly as if no replacer had been specified — **the encoder never sees the replacer**. This keeps encoding logic clean and makes the replacer composable with any future encoder change.

**TypeScript signature:**

```ts
replacer: (key: string, value: unknown, path: readonly (string | number)[]) => unknown
```

- `key`: current key name; `""` for the root call; the stringified index (`String(i)`) for array elements
- `value`: current value, already normalized
- `path`: path from root to the current node; object keys are strings and array indices are numbers (`readonly (string | number)[]`). In the Python port these become `str` and `int`
- Return the same value to leave it unchanged; children are still traversed
- Return `undefined` to drop the key or element from output

**TypeScript implementation — four functions, ~50 lines:**

```
applyReplacer(root, replacer)       // entry: call replacer on root, then recurse
transformChildren(value, r, path)   // route to object or array transformer
transformObject(obj, r, path)       // iterate keys → call replacer → normalize → recurse
transformArray(arr, r, path)        // same with str(i) as key
```
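A Python port of that four-function outline might look roughly like the sketch below. Several things are assumptions of this sketch rather than confirmed behavior: the `OMIT` object is a local placeholder for the "drop this entry" signal (TypeScript uses `undefined`), an `OMIT` returned for the root is treated as "leave the root unchanged", `path` includes the current key or index, and the normalization step after each replacer call is left out for brevity.

```python
from typing import Any, Callable, Dict, List, Union

Path = List[Union[str, int]]
Replacer = Callable[[str, Any, Path], Any]

OMIT = object()  # placeholder sentinel standing in for TypeScript's `undefined`


def apply_replacer(root: Any, replacer: Replacer) -> Any:
    """Entry point: call the replacer on the root, then recurse into the result."""
    replaced = replacer("", root, [])
    # Assumption: the root itself cannot be dropped, so OMIT means "unchanged".
    if replaced is OMIT:
        replaced = root
    return _transform_children(replaced, replacer, [])


def _transform_children(value: Any, replacer: Replacer, path: Path) -> Any:
    """Route to the object or array transformer; scalars pass through."""
    if isinstance(value, dict):
        return _transform_object(value, replacer, path)
    if isinstance(value, list):
        return _transform_array(value, replacer, path)
    return value


def _transform_object(obj: Dict[str, Any], replacer: Replacer, path: Path) -> Dict[str, Any]:
    result: Dict[str, Any] = {}
    for key, child in obj.items():
        child_path = path + [key]
        replaced = replacer(key, child, child_path)
        if replaced is OMIT:
            continue  # drop this key from the output
        result[key] = _transform_children(replaced, replacer, child_path)
    return result


def _transform_array(arr: List[Any], replacer: Replacer, path: Path) -> List[Any]:
    result: List[Any] = []
    for index, child in enumerate(arr):
        child_path = path + [index]
        # The key is the stringified index; the path keeps the integer.
        replaced = replacer(str(index), child, child_path)
        if replaced is OMIT:
            continue  # drop this element from the output
        result.append(_transform_children(replaced, replacer, child_path))
    return result
```

Because the pass builds new containers instead of mutating the input, the caller's data is left untouched and the encoder can run on the returned value as if no replacer existed.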

---

## Proposed Python API

Mirror TypeScript exactly, including the pre-pass architecture.

**Signature:**

```python
from typing import Any, Callable, List, Union

# (key, value, path) -> replacement value
Replacer = Callable[[str, Any, List[Union[str, int]]], Any]

encode(value, {"replacer": fn})  # fn: Replacer
```

The replacer supports two distinct operations:

**Option A — flatten a nested field into its parent (e.g. `bbox`)**

Return a _new parent object_ when the replacer is called on the containing dict.
The nested field is replaced with flat scalar siblings, which is what `detect_tabular_header` requires.
The data is fully preserved — just restructured.

```python
from toon_format import encode

def flatten_bbox(key, value, path):
    if isinstance(value, dict) and isinstance(value.get("bbox"), dict):
        flat = {}
        for k, v in value.items():
            if k == "bbox":
                # Expand in place so the columns keep the original field order
                flat.update(bbox_x=v["x"], bbox_y=v["y"], bbox_w=v["width"], bbox_h=v["height"])
            else:
                flat[k] = v
        return flat
    return value  # leave everything else unchanged

result = encode(blocks, {"replacer": flatten_bbox})
```

**Output — tabular compression now triggers:**

```
blocks[2]{type,content,bbox_x,bbox_y,bbox_w,bbox_h,confidence,confidence_score}:
  header,Q2 Sales Report,0.05,0.05,0.9,0.04,high,0.98
  paragraph,This report summarizes sales...,0.05,0.12,0.9,0.06,high,0.97
```

**Option B — drop fields entirely (e.g. LLM-irrelevant metadata)**

Return the `OMIT` sentinel (Python has no `undefined`) to remove a key or array element from the output.
Useful for fields like `embed`, `enriched`, and `enrichment_success` that are preprocessing artefacts and add no value when the output is consumed by an LLM.
`OMIT` is exported from the top-level package.

```python
from toon_format import encode, OMIT

_SKIP = {"embed", "enriched", "enrichment_success"}

def strip_metadata(key, value, path):
    if key in _SKIP:
        return OMIT
    return value

result = encode(response, {"replacer": strip_metadata})
```

Both options can be combined in a single replacer function.
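As an illustration, a combined replacer could look like the sketch below. `OMIT` is defined locally here as a stand-in for the proposed `toon_format.OMIT`, which does not exist yet, and the function name `clean_for_llm` is hypothetical.

```python
from typing import Any, List, Union

OMIT = object()  # local stand-in for the proposed `toon_format.OMIT` sentinel
_SKIP = {"embed", "enriched", "enrichment_success"}


def clean_for_llm(key: str, value: Any, path: List[Union[str, int]]) -> Any:
    # Option B: drop metadata fields wherever they appear.
    if key in _SKIP:
        return OMIT
    # Option A: expand a nested bbox into flat scalar siblings, in place.
    if isinstance(value, dict) and isinstance(value.get("bbox"), dict):
        flat = {}
        for k, v in value.items():
            if k == "bbox":
                flat.update(bbox_x=v["x"], bbox_y=v["y"], bbox_w=v["width"], bbox_h=v["height"])
            else:
                flat[k] = v
        return flat
    return value  # leave everything else unchanged
```

Since children of a returned value are still traversed, a metadata field inside a block is dropped on a later call even though the block itself was rewritten by the flattening branch first.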

### Alternatives Considered

_No response_

### SPEC Compliance

_No response_

### Additional Context

_No response_
