Conversation
Signed-off-by: Evgeny <etsykunov@nvidia.com>
for more information, see https://pre-commit.ci
Greptile Summary

This PR introduces hybrid (per-direction) quantization.

Confidence Score: 4/5

Two P1 defects should be addressed before merging: (1) a TypeError in GroupedLinear's hybrid quantize path when None quantizers appear, and (2) FSDP2 gathering None on Hopper/L40 for columnwise-only Float8 sub-storages.
Class Diagram

```mermaid
%%{init: {'theme': 'neutral'}}%%
classDiagram
    class Quantizer {
        +quantize(tensor)
        +make_empty(shape)
        +update_quantized(src, dst)
    }
    class HybridQuantizer {
        +rowwise_quantizer: Quantizer
        +columnwise_quantizer: Quantizer
        +quantize_impl(tensor)
        +make_empty(shape)
        +supports_only_rowwise_all_gather()
    }
    class QuantizedTensorStorage {
        +fsdp_buffer_fields()
        +fsdp_extract_buffers()
        +fsdp_assign_gathered()
    }
    class HybridQuantizedTensorStorage {
        +_rowwise_storage
        +_columnwise_storage
        +dequantize()
        +prepare_for_saving()
        +view()
    }
    class HybridQuantizedTensor {
        +fsdp_pre_all_gather()
        +fsdp_post_all_gather()
        +detach()
        +__torch_dispatch__()
    }
    class Float8TensorStorage {
        +_data: Tensor
        +_transpose: Tensor
        +fsdp_buffer_fields()
    }
    note for Float8TensorStorage "fsdp_buffer_fields() returns _data unconditionally"
    class MXFP8TensorStorage {
        +_rowwise_data: Tensor
        +_columnwise_data: Tensor
        +fsdp_buffer_fields()
        +fsdp_extract_buffers()
        +fsdp_assign_gathered()
    }
    Quantizer <|-- HybridQuantizer
    QuantizedTensorStorage <|-- HybridQuantizedTensorStorage
    HybridQuantizedTensorStorage <|-- HybridQuantizedTensor
    QuantizedTensorStorage <|-- Float8TensorStorage
    QuantizedTensorStorage <|-- MXFP8TensorStorage
    HybridQuantizer --> Quantizer : rowwise_quantizer
    HybridQuantizer --> Quantizer : columnwise_quantizer
    HybridQuantizedTensorStorage --> QuantizedTensorStorage : _rowwise_storage
    HybridQuantizedTensorStorage --> QuantizedTensorStorage : _columnwise_storage
```
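The delegation structure shown in the class diagram can be sketched in a few lines: a HybridQuantizer owns two independent sub-quantizers and produces a tensor with one quantized storage per usage direction. The step-rounding "quantization" and everything except the names taken from the diagram are illustrative stand-ins, not Transformer Engine code.

```python
# Sketch of the delegation pattern in the class diagram. The quantization
# math (rounding to multiples of a step size) is a stand-in.
from typing import List


class Quantizer:
    """Hypothetical base quantizer: rounds to multiples of a step size."""

    def __init__(self, step: float):
        self.step = step

    def quantize(self, tensor: List[float]) -> List[float]:
        return [round(x / self.step) * self.step for x in tensor]


class HybridQuantizedTensor:
    """Holds one quantized copy per usage direction, as in the diagram."""

    def __init__(self, rowwise: List[float], columnwise: List[float]):
        self._rowwise_storage = rowwise
        self._columnwise_storage = columnwise


class HybridQuantizer(Quantizer):
    """Routes each usage direction to its own sub-quantizer."""

    def __init__(self, rowwise_quantizer: Quantizer, columnwise_quantizer: Quantizer):
        self.rowwise_quantizer = rowwise_quantizer
        self.columnwise_quantizer = columnwise_quantizer

    def quantize(self, tensor: List[float]) -> HybridQuantizedTensor:
        return HybridQuantizedTensor(
            rowwise=self.rowwise_quantizer.quantize(tensor),
            columnwise=self.columnwise_quantizer.quantize(tensor),
        )


hq = HybridQuantizer(Quantizer(step=0.5), Quantizer(step=0.25))
out = hq.quantize([0.3, 0.6])
print(out._rowwise_storage)     # [0.5, 0.5]
print(out._columnwise_storage)  # [0.25, 0.5]
```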
Reviews (4): Last reviewed commit: "[pre-commit.ci] auto fixes from pre-comm..."
timmoon10 left a comment:
Overall I think this moves us in a good direction. I see some minor bugs, as well as bugs reported by @greptile-apps.
```python
rowwise_result = self.rowwise_quantizer.quantize(tensor)
columnwise_result = self.columnwise_quantizer.quantize(tensor)
```
Do we handle the case where not all usages are needed? I'd expect something like:
```diff
-rowwise_result = self.rowwise_quantizer.quantize(tensor)
-columnwise_result = self.columnwise_quantizer.quantize(tensor)
+rowwise_result = self.rowwise_quantizer.quantize(tensor) if self.rowwise_usage else None
+columnwise_result = self.columnwise_quantizer.quantize(tensor) if self.columnwise_usage else None
```
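The suggested gating can be exercised with a self-contained sketch: each direction is quantized only when its usage flag is set, and the unused side comes back as None. The flag names follow the suggestion; the constructor signature and the stand-in callable quantizers are assumptions for illustration.

```python
# Sketch of usage-gated hybrid quantization. Quantizers are plain callables
# here, not Transformer Engine quantizer objects.
class HybridQuantizer:
    def __init__(self, rowwise_quantizer, columnwise_quantizer,
                 rowwise_usage=True, columnwise_usage=True):
        self.rowwise_quantizer = rowwise_quantizer
        self.columnwise_quantizer = columnwise_quantizer
        self.rowwise_usage = rowwise_usage
        self.columnwise_usage = columnwise_usage

    def quantize(self, tensor):
        # Skip the quantization work entirely for an unused direction.
        rowwise_result = self.rowwise_quantizer(tensor) if self.rowwise_usage else None
        columnwise_result = self.columnwise_quantizer(tensor) if self.columnwise_usage else None
        return rowwise_result, columnwise_result


# Columnwise usage disabled: no columnwise result is produced.
q = HybridQuantizer(
    rowwise_quantizer=lambda t: [round(x, 1) for x in t],
    columnwise_quantizer=lambda t: [round(x, 2) for x in t],
    columnwise_usage=False,
)
row, col = q.quantize([0.123, 0.456])
print(row)  # [0.1, 0.5]
print(col)  # None
```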
```python
    requires_grad: bool = False,
    pin_memory: bool = False,
) -> HybridQuantizedTensor:
    self.rowwise_quantizer.internal = True
```
Could we just set internal=True in the constructor? I don't think we ever need PyTorch tensor functionality in the per-usage data.
This would not work under FSDP2.
```python
def factory(role):
    if role == "linear_weight":
        return HybridQuantizer(
            rowwise_quantizer=_make_fp8_quantizer(),
            columnwise_quantizer=_make_mxfp8_quantizer(),
        )
    if role == "linear_input":
        return HybridQuantizer(
            rowwise_quantizer=_make_fp8_quantizer(),
            columnwise_quantizer=_make_nvfp4_quantizer(),
        )
    if role in ("linear_grad_output", "linear_grad_input"):
        return HybridQuantizer(
            rowwise_quantizer=_make_mxfp8_quantizer(),
            columnwise_quantizer=_make_nvfp4_quantizer(),
        )
    return None
```
This is horrifying. Good test.
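As a design note, the role dispatch in a factory like the quoted one can be flattened into a lookup table. In this sketch, HybridQuantizer is reduced to a NamedTuple and plain strings stand in for the real `_make_*_quantizer()` objects; only the role names come from the quoted test.

```python
# Table-driven variant of the quoted role factory (illustrative only).
from typing import NamedTuple, Optional


class HybridQuantizer(NamedTuple):
    rowwise_quantizer: str
    columnwise_quantizer: str


_ROLE_TABLE = {
    "linear_weight": HybridQuantizer("fp8", "mxfp8"),
    "linear_input": HybridQuantizer("fp8", "nvfp4"),
    "linear_grad_output": HybridQuantizer("mxfp8", "nvfp4"),
    "linear_grad_input": HybridQuantizer("mxfp8", "nvfp4"),
}


def factory(role: str) -> Optional[HybridQuantizer]:
    # Unknown roles fall through to None, matching the original `return None`.
    return _ROLE_TABLE.get(role)


print(factory("linear_weight"))  # HybridQuantizer(rowwise_quantizer='fp8', columnwise_quantizer='mxfp8')
print(factory("unknown_role"))   # None
```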
Signed-off-by: Evgeny <etsykunov@nvidia.com>
Signed-off-by: Evgeny <etsykunov@nvidia.com>
for more information, see https://pre-commit.ci
Signed-off-by: Evgeny <etsykunov@nvidia.com>
Signed-off-by: Evgeny <etsykunov@nvidia.com>
Signed-off-by: Evgeny <etsykunov@nvidia.com>
for more information, see https://pre-commit.ci
```python
raise ValueError(
    "GroupedLinear quantizer list mixes HybridQuantizer and non-hybrid"
    f" quantizers ({hybrid_count} hybrid, {len(non_none) - hybrid_count}"
    " non-hybrid). This combination is not supported: neither"
    " `tex.split_quantize` nor `_hybrid_split_quantize` can consume a"
    " heterogeneous list. Make the CustomRecipe `qfactory` return a"
    " consistent type (all HybridQuantizer or all non-hybrid) across"
    " every GEMM for the same role."
)
```
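The guard that produces this error message can be sketched in a runnable form, assuming the helper filters None entries and counts HybridQuantizer instances. All names here are illustrative and HybridQuantizer is an empty stub.

```python
# Minimal sketch of a mixed-list guard for GroupedLinear quantizer lists.
class HybridQuantizer:
    pass


def check_quantizer_list(quantizers):
    non_none = [q for q in quantizers if q is not None]
    hybrid_count = sum(isinstance(q, HybridQuantizer) for q in non_none)
    if 0 < hybrid_count < len(non_none):
        raise ValueError(
            "GroupedLinear quantizer list mixes HybridQuantizer and non-hybrid"
            f" quantizers ({hybrid_count} hybrid, {len(non_none) - hybrid_count}"
            " non-hybrid)."
        )
    return hybrid_count > 0  # True -> take the hybrid split path


print(check_quantizer_list([HybridQuantizer(), None]))  # True (all non-None are hybrid)
print(check_quantizer_list(["fp8_quantizer", None]))    # False
```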
_hybrid_split_quantize TypeError on None-containing quantizer lists

_is_hybrid_quantizer_list explicitly allows None entries in the quantizer list (it filters them with `non_none = [q for q in quantizers if q is not None]`) and returns True when all non-None entries are HybridQuantizer. However, _hybrid_split_quantize's type guard is:

```python
if not all(isinstance(q, HybridQuantizer) for q in quantizers):
    raise TypeError(...)
```

isinstance(None, HybridQuantizer) is False, so this raises TypeError when any None entry exists, before even reaching `[q.rowwise_quantizer for q in quantizers]`. Since tex.split_quantize in the non-hybrid path accepts None quantizers, None entries are a valid input to the GroupedLinear quantizer list, making this a real runtime error whenever a list like `[HybridQuantizer, None, HybridQuantizer]` is passed.

Fix option A, tighten _is_hybrid_quantizer_list to treat None+Hybrid as unsupported (simplest):

```python
if hybrid_count > 0 and len(non_none) < len(quantizers):
    raise ValueError("None quantizers are not supported alongside HybridQuantizer.")
```

Fix option B, filter None inside _hybrid_split_quantize and propagate None entries as pass-through:

```python
if not all(isinstance(q, HybridQuantizer) for q in quantizers if q is not None):
    raise TypeError(...)
row_quantizers = [q.rowwise_quantizer if q is not None else None for q in quantizers]
col_quantizers = [q.columnwise_quantizer if q is not None else None for q in quantizers]
```
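Fix option B can be checked end to end with a small stand-alone sketch. HybridQuantizer here is a stub exposing only the two attributes the helper reads, and the helper name is illustrative (the real one is _hybrid_split_quantize).

```python
# Stand-alone check of Fix option B: the type guard ignores None entries and
# the per-direction lists propagate None as pass-through.
from typing import List, Optional


class HybridQuantizer:
    def __init__(self, rowwise_quantizer, columnwise_quantizer):
        self.rowwise_quantizer = rowwise_quantizer
        self.columnwise_quantizer = columnwise_quantizer


def split_hybrid_quantizers(quantizers: List[Optional[HybridQuantizer]]):
    if not all(isinstance(q, HybridQuantizer) for q in quantizers if q is not None):
        raise TypeError("expected only HybridQuantizer or None entries")
    row_quantizers = [q.rowwise_quantizer if q is not None else None for q in quantizers]
    col_quantizers = [q.columnwise_quantizer if q is not None else None for q in quantizers]
    return row_quantizers, col_quantizers


rows, cols = split_hybrid_quantizers([HybridQuantizer("fp8", "mxfp8"), None])
print(rows)  # ['fp8', None]
print(cols)  # ['mxfp8', None]
```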
Description
Hybrid (per-direction) quantization. Functional.
C++ optimizations (fusions, etc.) will come in subsequent PRs.
Ecosystem integration (functional):