Add 2D Autoencoder and GMM Integration on Siracusa Target by Aldrago98 · Pull Request #190 · pulp-platform/Deeploy

Aldrago98 · 2026-05-06T15:17:15Z

The goal of this branch is to implement a 2D autoencoder model integrated with a Gaussian Mixture Model (GMM) on the Siracusa target with Neureka support.

The implementation was developed incrementally through the following steps:

initial support for a generic target;
porting to Siracusa without tiling;
introduction of tiled support on Siracusa;
final integration with Siracusa + Neureka.

Added

Changed

Fixed

PR Merge Checklist

The PR is rebased on the latest devel commit and pointing to devel.
Your PR is reviewed and approved.
All checks are passing.
The CHANGELOG.md file has been updated.
If the docker was modified, change back its link after review.

diaconuccalin

Submitting the first third of the review. I will finish going through the rest of the files as soon as possible.

diaconuccalin · 2026-05-13T15:09:49Z

I think that these changes were made to adjust your local work env, and should not be pushed to main. Please revert them.

diaconuccalin · 2026-05-13T15:10:14Z

The only changes to this file are blank lines. Please revert them completely.

diaconuccalin · 2026-05-13T15:23:36Z

    BatchNorm_fp32(
        ${data_in}, ${scale}, ${bias}, ${mean}, ${variance},
-        ${data_out}, ${batch_size}, ${channel_size}, ${window_size}
+        ${data_out}, ${batch_size}, ${channel_size}, ${window_size}, ${epsilon}, ${channels_first}


Let's add tests for this new feature (one test with channels_first 0, another with channels_first 1, and one more with a non-default epsilon value).
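To make the three requested cases concrete, here is a minimal reference sketch in plain Python. It is not the BatchNorm_fp32 kernel: the function name `batchnorm_ref`, the single-sample flat layout, and the meaning assigned to `channels_first` ([C, W] versus [W, C]) are assumptions for illustration only. The default epsilon of 1e-5 follows the ONNX BatchNormalization default.

```python
import math


def batchnorm_ref(x, scale, bias, mean, variance, channels,
                  epsilon=1e-5, channels_first=True):
    """Hypothetical golden model for one sample stored as a flat list.

    Assumed layouts: channels_first=True means [C, W],
    channels_first=False means [W, C].
    """
    window = len(x) // channels
    out = [0.0] * len(x)
    for c in range(channels):
        inv_std = 1.0 / math.sqrt(variance[c] + epsilon)
        for w in range(window):
            # Only the index computation depends on the layout.
            idx = c * window + w if channels_first else w * channels + c
            out[idx] = (x[idx] - mean[c]) * inv_std * scale[c] + bias[c]
    return out


x = [1.0, 2.0, 3.0, 4.0]
params = dict(scale=[1.0, 2.0], bias=[0.0, 0.0], mean=[1.0, 2.0],
              variance=[1.0, 4.0], channels=2, epsilon=0.0)
cf_out = batchnorm_ref(x, channels_first=True, **params)
cl_out = batchnorm_ref(x, channels_first=False, **params)
eps_out = batchnorm_ref([1.0, 2.0], [1.0], [0.0], [0.0], [0.0],
                        channels=1, epsilon=0.25)
```

The input is chosen so that the two layouts give different outputs, which is what a channels_first test needs in order to catch a swapped index.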

diaconuccalin · 2026-05-20T11:26:11Z

Let's revert all the changes in this file. It seems to me that it was a temporary fix to assign the n_cores to a new variable, which is not needed anymore, so no need for the new variable either.

diaconuccalin · 2026-05-20T11:34:36Z

            if engine is not None:
                node.attrs["engine"] = engine.name
+                if hasattr(engine, "n_cores"):
+                    node.attrs["n_cores"] = engine.n_cores


This is not a good approach IMO. The number of cores is not a node attribute (conceptually, the node attributes should follow the ones that exist in the real ONNX nodes). Plus, this issue of passing the information about the number of cores should already be solved, and the value should already exist in the operator representation, it's passed here. If this value doesn't get passed in your case, we should identify the root cause.

Looking a little more into it, maybe you need to add NeurekaEngine in the list here.

diaconuccalin · 2026-05-20T12:45:10Z

            tilerModel.addTensorDimToModel(ctxt, tensorName)

-            for idx, shapeDim in enumerate(_buffer.shape):
+            shape = [_buffer.shape] if isinstance(_buffer.shape, int) else _buffer.shape


I think that we should identify the root cause for which this fix was needed and fix it in that location, rather than here (converting the shape into an iterable there, rather than here).
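As a rough sketch of what fixing it at the source could look like: normalize the shape once, where the buffer is created, so every later consumer can iterate over it. The `Buffer` class and `normalize_shape` helper below are hypothetical stand-ins, not Deeploy's actual classes.

```python
from typing import Iterable, Tuple, Union


def normalize_shape(shape: Union[int, Iterable[int]]) -> Tuple[int, ...]:
    """Return the shape as a tuple, wrapping a bare int as a 1-D shape."""
    if isinstance(shape, int):
        return (shape,)
    return tuple(int(dim) for dim in shape)


class Buffer:
    """Hypothetical buffer that stores an already-normalized shape."""

    def __init__(self, name: str, shape: Union[int, Iterable[int]]):
        self.name = name
        self.shape = normalize_shape(shape)


buf = Buffer("scalar_like", 8)
dims = [(idx, dim) for idx, dim in enumerate(buf.shape)]
```

With this in place, the loop over `enumerate(_buffer.shape)` would not need a special case for integer shapes.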

diaconuccalin · 2026-05-20T12:45:40Z

-
-        schedule = TilingSchedule({}, {}, [], [])
        repScheme = VariableReplacementScheme({}, {})
+        inputLoadSchedule: List[Dict[str, HyperRectangle]] = [{}]


Why is this change needed?

diaconuccalin · 2026-05-20T12:49:39Z

+            shape = cls._normalizedShape(buffer.shape)
+            outputLoadSchedule[0][addrName] = HyperRectangle((0,) * len(shape), shape)
+
+        schedule = TilingSchedule(inputBaseOffsets, outputBaseOffsets, inputLoadSchedule, outputLoadSchedule)


Same here, why is this needed? Why separate the input and output? And why change the schedule from an empty one?

diaconuccalin · 2026-05-20T12:53:10Z

        if inputShapes[1] == () or inputShapes[1] == []:
            inputShapes[1] = (1,)

+        # Scalars and singletons should broadcast to the tensor operand,


diaconuccalin · 2026-05-20T12:55:37Z


+    def __init__(self, name: str = '', shape = [1], values = [0]):
+        super().__init__(name, shape, values)
+        # Some Neureka lowering paths inspect global constants before type inference


Which ones? This looks to me like an unstable patch that doesn't really solve the root cause (why do they inspect before type inference?).

diaconuccalin

Second part of my review. Working on the third and hopefully final part :)

diaconuccalin · 2026-05-20T13:40:18Z

+        node = tensor.inputs[0]
+        input_values = []
+        for input_tensor in node.inputs:
+            value = self._evaluate_constant_tensor(input_tensor)


I see no good reason for this kind of recursion.

diaconuccalin · 2026-05-20T13:41:11Z

    def __init__(self):
        super().__init__()

+    def _evaluate_constant_tensor(self, tensor: gs.Tensor):


What is the purpose of this function? Why do they need to be evaluated?

diaconuccalin · 2026-05-20T13:42:24Z

+                return None
+            input_values.append(value)
+
+        if node.op == "Constant":


IMO it's bad practice to separate node-specific checks and operations like this from the places where we already associate behavior with nodes, or from node-specific functions. If an issue appears, or a future developer wants to add a new node, they will also need to modify this code, and this location is not easy to find.

diaconuccalin · 2026-05-20T13:44:09Z


        if ret:
            self.operatorRepresentation['mode'] = node.attrs['mode']
+            self.operatorRepresentation['value'] = 0


Let's move this to the else of "if 'value' in node.attrs", so it's more visible
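A small sketch of the suggested structure, with the default sitting in an explicit else branch next to the attribute check. The function name `parse_value` and the plain dictionaries are illustrative assumptions, not the parser's real interface.

```python
def parse_value(attrs: dict) -> dict:
    """Hypothetical parser fragment: keep the default next to the check."""
    rep = {'mode': attrs['mode']}
    if 'value' in attrs:
        rep['value'] = attrs['value']
    else:
        # The fallback is visible right where the attribute is looked up.
        rep['value'] = 0
    return rep


with_value = parse_value({'mode': 'constant', 'value': 1.5})
without_value = parse_value({'mode': 'constant'})
```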

diaconuccalin · 2026-05-20T13:52:02Z

+                return True

-        return ret
+            if len(node.inputs) in (2, 3):


Nitpick comment (: I think it would be cleaner and easier to understand with separate ifs (if len(input) >= 2, if len(input) >= 3, etc), like you did below
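Something along these lines, as a sketch: one check per optional input, so each input is handled in exactly one place. The role names `data_in`, `pads`, and `constant_value` are assumptions chosen for illustration and may not match the operator being parsed here.

```python
def collect_inputs(input_names):
    """Hypothetical mapping of positional inputs to named roles."""
    roles = {'data_in': input_names[0]}
    if len(input_names) >= 2:
        roles['pads'] = input_names[1]
    if len(input_names) >= 3:
        roles['constant_value'] = input_names[2]
    return roles


one = collect_inputs(['x'])
two = collect_inputs(['x', 'p'])
three = collect_inputs(['x', 'p', 'v'])
```

Adding a fourth optional input later would only need one more `if`, without touching the existing ones.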

diaconuccalin · 2026-05-22T14:41:37Z

+    for (uint32_t buf = 0; buf < DeeployNetwork_num_outputs; buf++) {
+      uint32_t count = DeeployNetwork_outputs_bytes[buf] / sizeof(OUTPUTTYPE);
+      printf("OUTPUT %u %u\r\n", buf, count);
+      for (uint32_t i = 0; i < count; i++) {


Is this part a debugging leftover?

diaconuccalin · 2026-05-22T14:43:56Z

+      if (abs_actual > scale) {
+        scale = abs_actual;
+      }
+      float tolerance = FLOAT_ABS_TOL + FLOAT_REL_TOL * scale;


Interesting solution for having a relative error check. Let's wait for the opinion of the others as well (@runwangdl @Xeratec @Victor-Jung), if they think we should keep it like this.
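For the discussion, here is how I read the check, written as a Python sketch rather than the C test harness. The tolerance values are placeholders, since the definitions of FLOAT_ABS_TOL and FLOAT_REL_TOL are not visible in this snippet, and I am assuming that `scale` starts from the absolute expected value before being compared with `abs_actual`.

```python
import math

FLOAT_ABS_TOL = 1e-4  # placeholder value, assumed for illustration
FLOAT_REL_TOL = 1e-4  # placeholder value, assumed for illustration


def within_tolerance(expected, actual,
                     abs_tol=FLOAT_ABS_TOL, rel_tol=FLOAT_REL_TOL):
    """Pass if |diff| <= abs_tol + rel_tol * max(|expected|, |actual|)."""
    diff = actual - expected
    if math.isnan(diff):
        return False
    scale = max(abs(expected), abs(actual))
    tolerance = abs_tol + rel_tol * scale
    return abs(diff) <= tolerance


large_ok = within_tolerance(1000.0, 1000.05)  # fails a fixed 1e-4 bound
small_bad = within_tolerance(0.0, 0.001)      # still rejected near zero
nan_bad = within_tolerance(1.0, float('nan'))
```

The absolute term keeps the check meaningful near zero, while the relative term loosens it for large magnitudes, which is the main behavioral difference from the previous fixed 1e-4 threshold.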

diaconuccalin · 2026-05-22T14:44:11Z

-          printf("Actual: %4d  ", actual);
-          printf("Diff: %4d at Index %12u in Output %u\r\n", diff, i, buf);
-        }
+#if ISOUTPUTFLOAT == 1


Why this change from if to #if?

diaconuccalin · 2026-05-22T14:44:32Z

+      float tolerance = FLOAT_ABS_TOL + FLOAT_REL_TOL * scale;

-      if ((diff < -1e-4) || (diff > 1e-4) || isnan(diff)) {
+      if ((abs_diff > tolerance) || isnan(diff)) {


Same here, let's wait for other opinions.

diaconuccalin · 2026-05-22T14:44:42Z

-          printf("Actual: %4d  ", actual);
-          printf("Diff: %4d at Index %12u in Output %u\r\n", diff, i, buf);
-        }
+#if ISOUTPUTFLOAT == 1


Same here, why change from if to #if?

diaconuccalin

Review finished. Other than the comments I left across the files, it would be useful to have some more testing for all the modified elements. Namely:

BatchNorm with and without epsilon, with and without channels_first
larger ConvTranspose2D with tighter memory limits, to force tiling
multiplication where B is not scalar, but has the same size as A
ReduceLogSumExp with one and with multiple reduction axes
Pad and Slice, with tight memory limits for tiling
Full 2D autoencoder and GMM models, with tighter memory limits for tiling
MaxPool on non-square input, since you inverted the width and height
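To illustrate the multiplication item, a sketch of a golden model that covers both the scalar case and the same-size case. `mul_ref` is a hypothetical helper over flat lists, not an existing Deeploy function.

```python
def mul_ref(a, b):
    """Hypothetical reference: b is either a singleton or the same size as a."""
    if len(b) == 1:
        # Scalar or singleton operand is broadcast over every element of a.
        return [x * b[0] for x in a]
    if len(a) != len(b):
        raise ValueError("b must be a singleton or match the size of a")
    return [x * y for x, y in zip(a, b)]


scalar_case = mul_ref([1.0, 2.0, 3.0], [2.0])
same_size_case = mul_ref([1.0, 2.0, 3.0], [2.0, 0.5, -1.0])
```

A test using only the scalar case would not notice a kernel that always reads `b[0]`, which is why the same-size case is worth adding.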

diaconuccalin · 2026-05-22T15:02:39Z

We already have a Conv2D test in DeeployTest/Tests/Kernels/FP32/Conv/Regular_2D, why the need for this one?

diaconuccalin · 2026-05-22T15:02:43Z

We already have a Conv2D test in DeeployTest/Tests/Kernels/FP32/Conv/Regular_2D, why the need for this one?

diaconuccalin · 2026-05-22T15:02:47Z

We already have a Conv2D test in DeeployTest/Tests/Kernels/FP32/Conv/Regular_2D, why the need for this one?

diaconuccalin · 2026-05-22T15:03:21Z

Let's move this test into a new subdir inside DeeployTest/Tests/Kernels/FP32/Conv

diaconuccalin · 2026-05-22T15:03:37Z

Let's move this test into a new subdir inside DeeployTest/Tests/Kernels/FP32/Conv

diaconuccalin · 2026-05-22T15:07:42Z

Why do we have 2 different GMM models, one here, and one in DeeployTest/Tests/Models/Autoencoder2D/GMM?

diaconuccalin · 2026-05-22T15:17:40Z

        gen_args_list.extend(args.input_offset_map)

    if tiling:
+        if hasattr(args, 'cores'):


Why is this extra pass of the cores argument needed? I already see it around line 413 of this file.

diaconuccalin · 2026-05-22T15:44:49Z

+#include "DeeployPULPMath.h"
+#include "pmsis.h"
+
+__attribute__((noinline, optnone)) void PULP_ConvTranspose2d_fp32_fp32_fp32_CHW(


optnone seems like a debugging leftover, is there a reason to keep it?

diaconuccalin · 2026-05-22T15:58:15Z

 PULPConv2DTilingReadyBindings = TilingReadyNodeBindings(nodeBindings = PULPFloatConv2DBindings,
                                                        tileConstraint = Conv2DTileConstraint())

+PULPConv2DUntiledTilingReadyBindings = TilingReadyNodeBindings(nodeBindings = PULPFloatConv2DBindings,


It makes no sense to have tiling-ready bindings for an untiled operation. Please remove.

diaconuccalin · 2026-05-22T15:58:21Z

 PULPMaxPool2DTilingReadyBindings = TilingReadyNodeBindings(nodeBindings = PULPMaxPool2DBindings,
                                                           tileConstraint = MaxPoolCTileConstraint())

+PULPMaxPool2DUntiledTilingReadyBindings = TilingReadyNodeBindings(nodeBindings = PULPMaxPool2DBindings,


It makes no sense to have tiling-ready bindings for an untiled operation. Please remove.

francesco.aldrigo.e and others added 24 commits May 6, 2026 16:49

Fix Debugger

9760e89

track files

07f4f94

ConvTranspose2D generic

d721076

fix batch norm dimension

352a7eb

model 2d

3e0bbb5

Fix Batch norm Dimension

3f0aa2a

costant padding support

8f2341b

float notation fix

b1e7efb

ConvTranspose2D

3ced973

Memory Allocation

95f93ed

fix Tiling

1c790a9

Siracusa Workaround

c69112c

Warning Fix

4bc5086

Format Folder

63a8185

Fix ConvTranspose2D Kernel

7edd22c

Numeric Errors

1c5a406

GMM Model

13a3524

small error

98be957

last Update

85ad725

MicroBlocchi

633356f

Autoencoder2D + Neureka

5ff6727

GMM + Collapsed Padding

287fb76

formatting

07c1f47

New Model

3169a59

diaconuccalin assigned Aldrago98 May 6, 2026

diaconuccalin added the Feature (Addition of new features) label May 6, 2026

Aldrago98 changed the title ~~FIORIRE2~~ Add 2D Autoencoder and GMM Integration on Siracusa Target May 6, 2026

Aldrago98 added 3 commits May 13, 2026 11:09

folder refactor

58e93e4

Merge branch 'pulp-platform:devel' into FIORIRE2

50fed19

Launch.json and cmake fix

02efee2

Aldrago98 added 3 commits May 13, 2026 12:15

launch.json import fix

2e472cc

Engine neureka fix

612e0dd

N-Eureka operations

29c57b2

diaconuccalin requested changes May 20, 2026

diaconuccalin requested changes May 22, 2026
