[refine](column) enforce nullable nested types for array and map#62491
[refine](column) enforce nullable nested types for array and map#62491Mryange wants to merge 1 commit intoapache:masterfrom
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
|
/review |
There was a problem hiding this comment.
I found one blocking issue.
be/src/core/data_type/data_type_array.cpp: forcing every array child throughmake_nullable()changes the FE-visible type metadata that BE exports.DataTypeArray::to_protobuf()now always reportscontains_null=true, and FE'sFoldConstantRuleOnBE.convertToNereidsType()reconstructs NereidsArrayTypefrom that flag. That means BE constant folding can no longer round-tripARRAY<... NOT NULL>precisely. A concrete case isCAST([1] AS ARRAY<INT NOT NULL>): after folding, FE will seeARRAY<INT>instead of the original non-null-child type. FE still treatsArrayType.containsNullas semantically significant in exact matching (fe/fe-type/.../ArrayType.java,Type.matchExactType()), so this is a real behavior regression, not just an internal refactor.
Critical checkpoint conclusions:
- Goal of the task: standardize BE array/map nested types as nullable. The code mostly does that, but it does not preserve the existing FE-visible array child nullability contract on the BE->FE constant-folding path, so the goal is not achieved safely end-to-end.
- Minimality/focus: the patch is small and focused, but it misses one required follow-through path (
PTypeDescexport / FE reconstruction). - Concurrency: no concurrency concerns in the touched code.
- Lifecycle/static init: no special lifecycle concerns.
- Config changes: none.
- Compatibility: this changes transmitted complex-type metadata behavior for arrays and is not compatibility-safe as written.
- Parallel code paths: I checked the related SerDe/cast/type-factory paths. The array protobuf path is the one that remains inconsistent with FE expectations.
- Special condition checks: the new
DORIS_CHECKassertions are reasonable for invariant enforcement. - Test coverage: insufficient. There is no regression test covering FE<->BE round-trip of
ARRAY<... NOT NULL>through BE constant folding or protobuf type export. - Test result changes: none.
- Observability: not needed for this refactor.
- Transaction/persistence/data-write concerns: none.
- FE/BE variable passing: no new fields, but existing transmitted type metadata is affected and not preserved correctly.
- Performance: no material issue identified.
- Other issues: none beyond the blocking metadata regression above.
Because of the array type-metadata regression, I am requesting changes rather than approving.
|
|
||
| DataTypeArray::DataTypeArray(const DataTypePtr& nested_) : nested {nested_} {} | ||
| DataTypeArray::DataTypeArray(const DataTypePtr& nested_) { | ||
| DataTypePtr nullable_nested = make_nullable(nested_); |
There was a problem hiding this comment.
Wrapping every array child in make_nullable() changes the type metadata you export later. DataTypeArray::to_protobuf() uses nested->is_nullable(), and FE's FoldConstantRuleOnBE.convertToNereidsType() reconstructs ArrayType from that flag. After this change, a folded expression like CAST([1] AS ARRAY<INT NOT NULL>) will round-trip back to FE as ARRAY<INT> because BE now always reports contains_null=true. FE still treats ArrayType.containsNull as part of exact type matching, so this is a real regression in the constant-folding path, not just an internal invariant cleanup. Please preserve the original child-nullability flag on the exported type metadata path or add a compatibility fix/test for it.
|
run buildall |
|
/review |
There was a problem hiding this comment.
Blocking findings:
-
fe/fe-type/src/main/java/org/apache/doris/catalog/ArrayType.java:80
This PR no longer preserves previously storedcontainsNullvalues in FE catalog types. After replay/deserialization, every runtime consumer now observestrue(getContainsNull(),matchesType(),toSql(),toThrift()), so older metadata or external schemas that carriedcontainsNull=falsesilently change shape when FE reuses or re-serializes them. -
fe/fe-type/src/main/java/org/apache/doris/catalog/MapType.java:98and:105
MapTypenow flattens both nested nullability bits totrueas well. That removes information still consumed by downstream FE integrations such as Iceberg/Paimon schema conversion and FE->BE thrift export, so required nested fields can no longer be distinguished from optional ones.
Critical checkpoint conclusions:
- Goal of current task: Make BE/FE complex nested types consistently nullable. The BE-side invariant change looks aligned with that goal, but the FE catalog changes go further and break compatibility for previously stored nested-nullability metadata. No test demonstrates replay/round-trip safety.
- Change scope: Not as small/focused as it should be. Besides internal type normalization, it changes FE catalog equality, SQL rendering, thrift serialization, and external integration behavior.
- Concurrency: No meaningful concurrency or lock-order issues in the touched paths.
- Lifecycle/static initialization: No special lifecycle or static-init issues found.
- Configuration: No config changes.
- Compatibility: Blocking issue. Persisted/catalog/thrift type semantics change without compatibility handling or version gating.
- Parallel code paths: External catalog conversion paths (notably Iceberg/Paimon) still depend on these FE nullability bits and are affected.
- Conditional logic/comments: New comments explain the intended invariant, but they do not justify removing compatibility-preserving behavior.
- Test coverage: Insufficient for a compatibility-sensitive type-system change. Only local unit expectations were updated; no replay, thrift round-trip, or external catalog coverage was added.
- Modified test results: Updated unit assertions are internally consistent, but they do not prove correctness for persisted metadata or downstream integrations.
- Observability: No extra observability required.
- Transaction/persistence: Persistence/serialization semantics are involved on the FE side and are not preserved.
- Data writes/modifications: I did not find a direct storage-write regression in the touched code.
- FE/BE variable passing: Existing type metadata sent through thrift changes meaning on the FE side without compatibility handling.
- Performance: No blocking performance concern found.
- Other issues: None beyond the two compatibility regressions above.
Requesting changes until the FE catalog types either preserve stored nested-nullability bits, or the PR adds a complete/versioned compatibility migration plus tests that cover replay and downstream integrations.
| * Always returns {@code true}. Array elements are always nullable in Doris. | ||
| */ | ||
| public boolean getContainsNull() { | ||
| return containsNull; |
There was a problem hiding this comment.
This breaks replay/round-trip compatibility for existing FE metadata that was serialized with containsNull=false. The field is still retained for deserialization, but after this change every runtime consumer sees true, so replayed types silently change shape and FE can no longer faithfully re-emit the original schema through matchesType(), toSql(), or thrift. If the goal is only to normalize BE internals, the FE catalog type still needs to preserve the stored bit or introduce an explicit compatibility migration.
| TTypeNode node = new TTypeNode(); | ||
| container.types.add(node); | ||
| Preconditions.checkNotNull(keyType); | ||
| Preconditions.checkNotNull(valueType); |
There was a problem hiding this comment.
This forces both map nested-nullability bits to true on every FE->BE thrift serialization. That is a compatibility regression for existing FE metadata / external schemas that still carry isKeyContainsNull=false or isValueContainsNull=false, and it also removes information still consumed by downstream FE integrations such as DorisTypeToIcebergType and DorisToPaimonTypeVisitor. Please keep the stored bits intact unless a versioned migration updates every persisted schema and its readers together.
c589160 to
453b7d4
Compare
|
run buildall |
What problem does this PR solve?
Problem Summary:
This PR makes the nested types inside
ArrayandMapexplicitly nullable in BE type implementations, instead of relying on implicit caller-side conventions.DataTypeArraynow always stores nullable nested element typeDataTypeMapnow always stores nullable key/value typesDataTypeArraySerDeandDataTypeMapSerDeare updated to follow the same invariantRelease note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)