The offline_quantize tool in saxml/tool has a use_fp flag that quantizes the weights to fp8, but the result still has to be stored as int8 (the code comment reads "This is needed since fp8 cannot be saved."). If that is always the case, is there an example showing how to load a checkpoint stored in int8 and interpret it as fp8?
@jianlijianli @zhangqiaorjc
Referenced code: praxis/praxis/layers/quantization/operations.py, line 776 in 3f4cbb4
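To make the question concrete, here is a minimal sketch of what "load as int8, interpret as fp8" could look like in JAX. It assumes the int8 values in the checkpoint are a bit-for-bit reinterpretation of fp8 values and that the fp8 format is `float8_e4m3fn`; neither assumption is confirmed here. The helper names `to_int8_storage` and `from_int8_storage` are hypothetical and are not part of saxml or praxis.

```python
import jax
import jax.numpy as jnp


def to_int8_storage(w_fp8):
    """Reinterpret fp8 bits as int8 for saving (hypothetical helper).

    This is a bitcast, not a numeric conversion: the bytes are unchanged.
    """
    return jax.lax.bitcast_convert_type(w_fp8, jnp.int8)


def from_int8_storage(w_int8, fp8_dtype=jnp.float8_e4m3fn):
    """Reinterpret stored int8 bits as fp8 (hypothetical helper).

    Assumes the int8 array was produced by a bitcast from `fp8_dtype`.
    """
    return jax.lax.bitcast_convert_type(w_int8, fp8_dtype)


# Values chosen to be exactly representable in float8_e4m3fn.
w = jnp.array([0.0, 0.5, -1.0, 2.0], dtype=jnp.float32)
w_fp8 = w.astype(jnp.float8_e4m3fn)

stored = to_int8_storage(w_fp8)        # what would be written to the checkpoint
restored = from_int8_storage(stored)   # what would be used after loading
```

Note that `stored.astype(jnp.float8_e4m3fn)` would not be equivalent: `astype` converts the integer values numerically, whereas the bitcast keeps the raw bytes and only changes how they are interpreted.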