Add 2D Autoencoder and GMM Integration on Siracusa Target#190
Add 2D Autoencoder and GMM Integration on Siracusa Target#190Aldrago98 wants to merge 30 commits into
Conversation
diaconuccalin
left a comment
There was a problem hiding this comment.
Submitting the first third part of the review, I will finish going through the rest of the files as soon as possible.
There was a problem hiding this comment.
I think that these changes were made to adjust your local work env, and should not be pushed to main. Please revert them.
There was a problem hiding this comment.
The only changes to this file are blank lines. Please revert them completely.
| BatchNorm_fp32( | ||
| ${data_in}, ${scale}, ${bias}, ${mean}, ${variance}, | ||
| ${data_out}, ${batch_size}, ${channel_size}, ${window_size} | ||
| ${data_out}, ${batch_size}, ${channel_size}, ${window_size}, ${epsilon}, ${channels_first} |
There was a problem hiding this comment.
Let's add tests for this new feature (one test with channels_first 0, another with channels_first 1, and one more with non-defaul epsilon value)
There was a problem hiding this comment.
Let's revert all the changes in this file. It seems to me that it was a temporary fix to assign the n_cores to a new variable, which is not needed anymore, so no need for the new variable either.
| if engine is not None: | ||
| node.attrs["engine"] = engine.name | ||
| if hasattr(engine, "n_cores"): | ||
| node.attrs["n_cores"] = engine.n_cores |
There was a problem hiding this comment.
This is not a good approach IMO. The number of cores is not a node attribute (conceptually, the node attributes should follow the ones that exist in the real ONNX nodes). Plus, this issue of passing the information about the number of cores should already be solved, and the value should already exist in the operator representation, it's passed here. If this value doesn't get passed in your case, we should identify the root cause.
There was a problem hiding this comment.
Looking a little more into it, maybe you need to add NeurekaEngine in the list here.
| tilerModel.addTensorDimToModel(ctxt, tensorName) | ||
|
|
||
| for idx, shapeDim in enumerate(_buffer.shape): | ||
| shape = [_buffer.shape] if isinstance(_buffer.shape, int) else _buffer.shape |
There was a problem hiding this comment.
I think that we should identify the root cause for which this fix was needed and fix it in that location, rather than here (transforming there the shape in an enumerable, rather than here.
|
|
||
| schedule = TilingSchedule({}, {}, [], []) | ||
| repScheme = VariableReplacementScheme({}, {}) | ||
| inputLoadSchedule: List[Dict[str, HyperRectangle]] = [{}] |
There was a problem hiding this comment.
Why is this change needed?
| shape = cls._normalizedShape(buffer.shape) | ||
| outputLoadSchedule[0][addrName] = HyperRectangle((0,) * len(shape), shape) | ||
|
|
||
| schedule = TilingSchedule(inputBaseOffsets, outputBaseOffsets, inputLoadSchedule, outputLoadSchedule) |
There was a problem hiding this comment.
Same here, why is this needed? Why separate the input and output? And why change the schedule from an empty one?
| if inputShapes[1] == () or inputShapes[1] == []: | ||
| inputShapes[1] = (1,) | ||
|
|
||
| # Scalars and singletons should broadcast to the tensor operand, |
|
|
||
| def __init__(self, name: str = '', shape = [1], values = [0]): | ||
| super().__init__(name, shape, values) | ||
| # Some Neureka lowering paths inspect global constants before type inference |
There was a problem hiding this comment.
Which ones? This looks to me like an unstable patch that doesn't really solve the root cause (why they inspect before type inference?)
diaconuccalin
left a comment
There was a problem hiding this comment.
Second part of my review. Working on the third and hopefully final part :)
| node = tensor.inputs[0] | ||
| input_values = [] | ||
| for input_tensor in node.inputs: | ||
| value = self._evaluate_constant_tensor(input_tensor) |
There was a problem hiding this comment.
No good reason for having this kind of recursivity.
| def __init__(self): | ||
| super().__init__() | ||
|
|
||
| def _evaluate_constant_tensor(self, tensor: gs.Tensor): |
There was a problem hiding this comment.
What is the purpose of this function? Why do they need to be evaluated?
| return None | ||
| input_values.append(value) | ||
|
|
||
| if node.op == "Constant": |
There was a problem hiding this comment.
IMO it's bad practice to separate node-specific checks and operations like this from either points where we would previously do this association to nodes, or to node-specific functions (if an issue appears, or a future developer wants to add a new node, they will also need to modify here, and it's not easy to find this location).
|
|
||
| if ret: | ||
| self.operatorRepresentation['mode'] = node.attrs['mode'] | ||
| self.operatorRepresentation['value'] = 0 |
There was a problem hiding this comment.
Let's move this to the else of "if 'value' in node.attrs", so it's more visible
| return True | ||
|
|
||
| return ret | ||
| if len(node.inputs) in (2, 3): |
There was a problem hiding this comment.
Nitpick comment (: I think it would be cleaner and easier to understand with separate ifs (if len(input) >= 2, if len(input) >= 3, etc), like you did below
| for (uint32_t buf = 0; buf < DeeployNetwork_num_outputs; buf++) { | ||
| uint32_t count = DeeployNetwork_outputs_bytes[buf] / sizeof(OUTPUTTYPE); | ||
| printf("OUTPUT %u %u\r\n", buf, count); | ||
| for (uint32_t i = 0; i < count; i++) { |
There was a problem hiding this comment.
Is this part a debugging leftover?
| if (abs_actual > scale) { | ||
| scale = abs_actual; | ||
| } | ||
| float tolerance = FLOAT_ABS_TOL + FLOAT_REL_TOL * scale; |
There was a problem hiding this comment.
Interesting solution for having a relative error check. Let's wait for the opinion of the others as well (@runwangdl @Xeratec @Victor-Jung), if they think we should keep it like this.
| printf("Actual: %4d ", actual); | ||
| printf("Diff: %4d at Index %12u in Output %u\r\n", diff, i, buf); | ||
| } | ||
| #if ISOUTPUTFLOAT == 1 |
There was a problem hiding this comment.
why this change from if to #if?
| float tolerance = FLOAT_ABS_TOL + FLOAT_REL_TOL * scale; | ||
|
|
||
| if ((diff < -1e-4) || (diff > 1e-4) || isnan(diff)) { | ||
| if ((abs_diff > tolerance) || isnan(diff)) { |
There was a problem hiding this comment.
Same here, let's wait for other opinions.
| printf("Actual: %4d ", actual); | ||
| printf("Diff: %4d at Index %12u in Output %u\r\n", diff, i, buf); | ||
| } | ||
| #if ISOUTPUTFLOAT == 1 |
There was a problem hiding this comment.
Same here, why change from if to #if?
diaconuccalin
left a comment
There was a problem hiding this comment.
Review finished. Other than the comments I left across the files, it would be useful to have some more testing for all the modified elements. Namely:
- batchnorm with and without epsilon, with and without channel first
- larger ConvTranspose2D with tighter memory limits, to force tiling
- multiplication where B is not scalar, but has the same size as A
- ReduceLogSumExp with one and more reduction axis
- Pad and Slice, with thight mem limits for tiling
- Full 2D autoencoder and GMM models, with tighter mem limits for tiling
- MaxPool on non-square input, since you inverted the width and hight
There was a problem hiding this comment.
We already have a Conv2D test in DeeployTest/Tests/Kernels/FP32/Conv/Regular_2D, why the need for this one?
There was a problem hiding this comment.
We already have a Conv2D test in DeeployTest/Tests/Kernels/FP32/Conv/Regular_2D, why the need for this one?
There was a problem hiding this comment.
We already have a Conv2D test in DeeployTest/Tests/Kernels/FP32/Conv/Regular_2D, why the need for this one?
There was a problem hiding this comment.
Let's move this test in a new subdir inside DeeployTest/Tests/Kernels/FP32/Conv
There was a problem hiding this comment.
Let's move this test in a new subdir inside DeeployTest/Tests/Kernels/FP32/Conv
There was a problem hiding this comment.
Why do we have 2 different GMM models, one here, and one in DeeployTest/Tests/Models/Autoencoder2D/GMM?
| gen_args_list.extend(args.input_offset_map) | ||
|
|
||
| if tiling: | ||
| if hasattr(args, 'cores'): |
There was a problem hiding this comment.
Why is this extra pass of the cores argument needed? I already see it around line 413 of this file.
| #include "DeeployPULPMath.h" | ||
| #include "pmsis.h" | ||
|
|
||
| __attribute__((noinline, optnone)) void PULP_ConvTranspose2d_fp32_fp32_fp32_CHW( |
There was a problem hiding this comment.
optnone seems like a debugging leftover, is there a reason to keep it?
| PULPConv2DTilingReadyBindings = TilingReadyNodeBindings(nodeBindings = PULPFloatConv2DBindings, | ||
| tileConstraint = Conv2DTileConstraint()) | ||
|
|
||
| PULPConv2DUntiledTilingReadyBindings = TilingReadyNodeBindings(nodeBindings = PULPFloatConv2DBindings, |
There was a problem hiding this comment.
Non-sensical to have tiling ready bindings for untiled operation, please remove.
| PULPMaxPool2DTilingReadyBindings = TilingReadyNodeBindings(nodeBindings = PULPMaxPool2DBindings, | ||
| tileConstraint = MaxPoolCTileConstraint()) | ||
|
|
||
| PULPMaxPool2DUntiledTilingReadyBindings = TilingReadyNodeBindings(nodeBindings = PULPMaxPool2DBindings, |
There was a problem hiding this comment.
Non-sensical to have tiling ready bindings for untiled operation, please remove.
The goal of this branch is to implement a 2D autoencoder model integrated with a Gaussian Mixture Model (GMM) on the Siracusa target with Neureka support.
The implementation was developed incrementally through the following steps:
Added
Changed
Fixed
PR Merge Checklist
develcommit and pointing todevel.CHANGELOG.mdfile has been updated.