Skip to content

fix: boolean handler must not coerce None or empty-string to False#196

Open
afkerhousse wants to merge 1 commit intosinger-io:masterfrom
afkerhousse:fix/boolean-null-coercion-order
Open

fix: boolean handler must not coerce None or empty-string to False#196
afkerhousse wants to merge 1 commit intosinger-io:masterfrom
afkerhousse:fix/boolean-null-coercion-order

Conversation

@afkerhousse
Copy link
Copy Markdown

Summary

transform_recur always moves "null" to the end of the type list so it is tried last. For a schema typed ["null", "boolean"] this means boolean is attempted first.

The boolean handler fell through to bool(data) which:

  • bool(None)False — no exception raised → returns (True, False)
  • bool("")False — no exception raised → returns (True, False)

In both cases the null handler is never reached, so any field typed ["null", "boolean"] whose API value is None or "" is silently written as false instead of null.

This is a silent data corruption bug: null values are replaced with false in every destination without any schema mismatch error being raised.

Reproduction

from singer import Transformer, UNIX_MILLISECONDS_INTEGER_DATETIME_PARSING

schema = {"type": ["null", "boolean"]}

with Transformer(UNIX_MILLISECONDS_INTEGER_DATETIME_PARSING) as t:
    print(t.transform(None,  schema, {}))  # returns False — should be None
    print(t.transform("",    schema, {}))  # returns False — should be None
    print(t.transform(True,  schema, {}))  # returns True  ✓
    print(t.transform(False, schema, {}))  # returns False ✓

Fix

Return (False, None) early in the boolean handler when data is None or data == "", so transform_recur falls through to the null handler which correctly returns (True, None).

Impact

Any tap that syncs a nullable boolean field from an API that returns null or "" for unset values is affected. Confirmed reproduction via tap-hubspot contacts stream after its migration to the HubSpot CRM v3 API, where the API documentation explicitly states unset properties are returned as null.

Test plan

  • transform(None, {"type": ["null", "boolean"]}, {}) returns None
  • transform("", {"type": ["null", "boolean"]}, {}) returns None
  • transform(True, {"type": ["null", "boolean"]}, {}) returns True
  • transform(False, {"type": ["null", "boolean"]}, {}) returns False
  • transform("true", {"type": ["null", "boolean"]}, {}) returns True
  • transform("false", {"type": ["null", "boolean"]}, {}) returns False
  • transform(None, {"type": "boolean"}, {}) still raises SchemaMismatch (non-nullable field)

🤖 Generated with Claude Code

transform_recur always moves null to the end of the type list so that
null is tried last. For a schema type ["null", "boolean"] this means
boolean is attempted first.

The boolean handler previously reached bool(None) = False and returned
(True, False) without raising, so the null handler was never reached.
The same applied to empty strings: bool("") = False.

As a result, any field typed ["null", "boolean"] whose API value was
None or "" was silently written as false instead of null, corrupting
data in every downstream destination.

Fix: return (False, None) early in the boolean handler when data is
None or "", allowing transform_recur to fall through to the null
handler which correctly returns (True, None).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant