[torchlib] Reimplement as_strided without an ONNX loop by Copilot · Pull Request #2928 · microsoft/onnxscript

Copilot · 2026-06-03T18:33:49Z

aten_as_strided was lowered to a private ONNX function (_aten_as_strided_onnx) that built gather indices via an unrolled loop of Expand/Range/SequenceInsert/ConcatFromSequence ops. This graph is hard for downstream passes to constant-fold.

Since aten_as_strided is already trace_only, when size, stride, and storage_offset are concrete at trace time the indices can be computed once with NumPy and emitted as a constant. The SymInt inputs can also be dynamic (runtime values), so a second path builds the indices with ONNX ops while still avoiding any Loop/Scan.

Changes

ops/core.py: Replace the loop implementation with two paths that share the same index math. For each output position (i_0, ..., i_{n-1}), the storage index is storage_offset + Σ_d i_d · stride[d], and the result is a Gather over Reshape(self, [-1]) with those indices:
- Static fast path (all of size/stride/storage_offset are concrete ints): fold the indices into a single constant index tensor.
- Dynamic path (any SymInt is a runtime value): build the indices with ONNX ops (Range/Mul/Unsqueeze/Add). The per-dimension contributions are unrolled at trace time since the rank is always static, so no Loop/Scan is emitted. Runtime SymInt values are assumed to be INT64 and reshaped to scalars directly, and mixed static/dynamic dimensions are supported.
- A default storage_offset=None is normalized to 0 so the dynamic path does not emit an invalid Reshape of a missing input.
ops/core.py: Remove the now-unused private _aten_as_strided_onnx function.
deduce_type_constraints_test.py: Drop _aten_as_strided_onnx from _SKIP_FUNCTIONS_WITH_LOOP_OR_SCAN, since no loop/scan remains.
tests/function_libs/torch_lib/e2e_ops_tests.py: Add end-to-end coverage for both paths — static (multi-dimensional with non-zero storage_offset, single dimension, overlapping strides, scalar/empty size) and dynamic (size derived from the input shape, with and without storage_offset).
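
As a rough illustration of the index math described above, the following pure-Python sketch enumerates output positions in row-major order and applies storage_offset + Σ_d i_d · stride[d] to each. The helper name `strided_indices` and the sample storage values are assumptions made for this example only; they are not part of the PR.

```python
import itertools


def strided_indices(size, stride, storage_offset=0):
    """Return flat storage indices for every output position, in row-major order.

    Hypothetical helper for illustration; it mirrors the formula
    storage_offset + sum_d(i_d * stride[d]) from the PR description.
    """
    indices = []
    # itertools.product yields a single empty tuple when size is empty,
    # which corresponds to the scalar-output case (one index: storage_offset).
    for position in itertools.product(*(range(n) for n in size)):
        indices.append(storage_offset + sum(i * s for i, s in zip(position, stride)))
    return indices


storage = list(range(10, 16))  # flat storage: [10, 11, 12, 13, 14, 15]
idx = strided_indices(size=(2, 2), stride=(1, 2), storage_offset=1)
values = [storage[i] for i in idx]
print(idx)     # [1, 3, 2, 4]
print(values)  # [11, 13, 12, 14]
```

Read as a 2x2 result, the gathered values are [[11, 13], [12, 14]]: moving along dimension 0 advances one storage element, and moving along dimension 1 advances two.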

Implementation

# Static fast path
indices = np.array(storage_offset, dtype=np.int64)
for dim, (dim_size, dim_stride) in enumerate(zip(size, stride)):
    add_value = np.arange(dim_size, dtype=np.int64) * dim_stride
    broadcast_shape = [1] * len(size)
    broadcast_shape[dim] = dim_size
    indices = indices + add_value.reshape(broadcast_shape)
self_flatten = op.Reshape(self, op.Constant(value_ints=[-1]))
result = op.Gather(self_flatten, op.Constant(value=ir.tensor(indices)))

# Dynamic path (any SymInt is a runtime value; SymInts are assumed INT64)
indices = op.Reshape(storage_offset, empty_shape)
for dim in range(rank):
    dim_size = op.Reshape(size[dim], empty_shape)
    dim_stride = op.Reshape(stride[dim], empty_shape)
    add_value = op.Mul(op.Range(zero, dim_size, one), dim_stride)
    unsqueeze_axes = [axis for axis in range(rank) if axis != dim]
    if unsqueeze_axes:
        add_value = op.Unsqueeze(add_value, op.Constant(value_ints=unsqueeze_axes))
    indices = op.Add(indices, add_value)
result = op.Gather(self_flatten, indices)

The empty-size case naturally yields a 0-d index tensor, producing a scalar output. Both paths were checked against torch.as_strided for multi-dimensional, non-zero storage_offset, single-dimension, scalar/empty-size, and mixed static/dynamic inputs.
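
The static index computation can be sanity-checked without ONNX or PyTorch. The sketch below reuses the loop from the static fast path and compares a flatten-then-gather result against NumPy's own `numpy.lib.stride_tricks.as_strided`, which takes strides in bytes rather than elements. The function names `static_indices` and `gather_as_strided` are hypothetical and exist only for this example; the PR itself validates against torch.as_strided.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def static_indices(size, stride, storage_offset=0):
    """Broadcast per-dimension offsets into one int64 index tensor."""
    indices = np.array(storage_offset, dtype=np.int64)  # 0-d starting index
    for dim, (dim_size, dim_stride) in enumerate(zip(size, stride)):
        add_value = np.arange(dim_size, dtype=np.int64) * dim_stride
        broadcast_shape = [1] * len(size)
        broadcast_shape[dim] = dim_size
        indices = indices + add_value.reshape(broadcast_shape)
    return indices


def gather_as_strided(x, size, stride, storage_offset=0):
    """Flatten the input, then gather with the precomputed indices."""
    return x.reshape(-1)[static_indices(size, stride, storage_offset)]


x = np.arange(12, dtype=np.float32).reshape(3, 4)
size, stride, offset = (2, 3), (4, 1), 1

result = gather_as_strided(x, size, stride, offset)
# NumPy's as_strided expects byte strides, so scale by the item size.
expected = as_strided(
    x.reshape(-1)[offset:],
    shape=size,
    strides=tuple(s * x.itemsize for s in stride),
)
print(np.array_equal(result, expected))  # True

# Empty size: the index tensor is 0-d, so the gathered result is a scalar.
scalar = gather_as_strided(x, (), (), 5)
print(np.ndim(scalar), float(scalar))  # 0 5.0
```

Overlapping strides need no special handling in this formulation, because Gather may read the same storage element more than once.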

justinchuby · 2026-06-03T18:45:58Z

@copilot the inputs marked as SymInt can be dynamic. Can you update the logic to account for this situation?

Copilot · 2026-06-03T18:57:32Z

Added a dynamic path in 93d3247: when any of size/stride/storage_offset is a runtime value, the indices are built with ONNX ops (Range/Mul/Unsqueeze/Add) instead of a folded constant. The per-dimension contributions are still unrolled at trace time (no loop/scan) since the rank is static, and SymInt values are cast to INT64 for consistent arithmetic. The static fast path (all concrete) still folds to a single constant Gather. Verified both paths against torch.as_strided for multi-dim, non-zero storage_offset, scalar/empty-size, mixed static/dynamic, and int32/int64 SymInt inputs.
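
To make the op sequence in the dynamic path easier to follow, here is a NumPy emulation in which each step stands in for one ONNX op: `reshape(())` for Reshape to a scalar, `np.arange` for Range, `*` for Mul, `np.expand_dims` for Unsqueeze, and `+` for a broadcasting Add. It is only an approximation of the graph's behavior, not the torchlib code; the name `dynamic_style_indices` is an assumption, and one-element arrays are used here to mimic runtime SymInt values.

```python
import numpy as np


def dynamic_style_indices(size, stride, storage_offset=0):
    """Emulate the Range/Mul/Unsqueeze/Add index construction with NumPy."""
    rank = len(size)  # the rank is known at trace time, so this loop is unrolled
    # Reshape(storage_offset, []) -> INT64 scalar
    indices = np.asarray(storage_offset, dtype=np.int64).reshape(())
    for dim in range(rank):
        # Reshape(size[dim], []) and Reshape(stride[dim], [])
        dim_size = np.asarray(size[dim], dtype=np.int64).reshape(())
        dim_stride = np.asarray(stride[dim], dtype=np.int64).reshape(())
        # Mul(Range(0, dim_size, 1), dim_stride)
        add_value = np.arange(int(dim_size), dtype=np.int64) * dim_stride
        # Unsqueeze on every axis except `dim`, so the 1-D contribution
        # lines up with its own output dimension.
        unsqueeze_axes = tuple(axis for axis in range(rank) if axis != dim)
        if unsqueeze_axes:
            add_value = np.expand_dims(add_value, unsqueeze_axes)
        # Add broadcasts the contribution into the running index tensor.
        indices = indices + add_value
    return indices


# Mixed "static" ints and "runtime" one-element arrays.
mixed = dynamic_style_indices(
    size=(np.array([2]), 3),
    stride=(4, np.array([1])),
    storage_offset=np.array([1]),
)
print(mixed.tolist())  # [[1, 2, 3], [5, 6, 7]]
```

For rank 1 the list of unsqueeze axes is empty, which is why the implementation guards the Unsqueeze call.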

justinchuby · 2026-06-03T19:33:59Z

@copilot can you add good coverage on the new logic in tests/function_libs/torch_lib/e2e_ops_tests.py?

justinchuby · 2026-06-03T19:38:13Z

@copilot also add more detailed comment to explain what each line of the implementation is doing in relation to the torch op, as it is a more complicated op, for readability

Copilot · 2026-06-03T19:40:29Z

Added e2e coverage in tests/function_libs/torch_lib/e2e_ops_tests.py exercising both paths: static (multi-dim with non-zero storage_offset, single dim, overlapping strides, scalar/empty size) and dynamic (size derived from the input shape, with and without storage_offset). Writing the dynamic tests surfaced a bug where a default storage_offset=None reached the dynamic path and produced an invalid Reshape; fixed by normalizing None to 0. All tests pass against torch.as_strided.

Copilot · 2026-06-03T19:44:02Z

Added detailed line-by-line comments to aten_as_strided, explaining the storage-index math in relation to torch.as_strided and annotating each step of both the static and dynamic paths.

justinchuby · 2026-06-03T20:14:25Z

+    empty_shape = op.Constant(value=ir.tensor(np.array([], dtype=np.int64)))
+    # Start the running index from storage_offset, cast to an INT64 scalar so all
+    # the arithmetic below has a consistent dtype regardless of how the SymInt
+    # runtime values are typed (e.g. int32 SymInts).


@copilot I think we can assume all symints are int64.

Removed the INT64 Cast ops from the dynamic path and now reshape the SymInt values directly, assuming they are already INT64.

Copilot

Pull request overview

This PR refactors the torchlib lowering for aten::as_strided to avoid generating an ONNX Loop/Scan when constructing gather indices, enabling simpler downstream constant-folding. It introduces a static fast path that precomputes the index tensor with NumPy when all size/stride/storage_offset values are known at trace time, and a dynamic path that builds the same index math using ONNX ops without loops.

Changes:

- Reimplemented aten_as_strided in ops/core.py as a Reshape([-1]) + Gather with (1) a NumPy-constant index fast path and (2) an ONNX-op dynamic index path (no Loop/Scan).
- Removed the now-unused private _aten_as_strided_onnx lowering and unblocked type-constraint deduction by removing it from the “skip loop/scan” list.
- Added new E2E tests covering several as_strided scenarios (static and dynamic shapes/offsets).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File	Description
`tests/function_libs/torch_lib/e2e_ops_tests.py`	Adds E2E export coverage for `torch.as_strided` across static and dynamic cases.
`onnxscript/function_libs/torch_lib/ops/core.py`	Replaces loop-based index construction with static NumPy-constant and dynamic ONNX-op paths.
`onnxscript/function_libs/tools/torch_lib/deduce_type_constraints_test.py`	Removes `_aten_as_strided_onnx` from the loop/scan skip list since it no longer exists.

+    # Start the running index from storage_offset as an INT64 scalar; SymInt
+    # runtime values are assumed to be INT64.
+    indices = op.Reshape(storage_offset, empty_shape)
+    for dim in range(rank):
+        # Reshape this dimension's size and stride to INT64 scalars.
+        dim_size = op.Reshape(size[dim], empty_shape)
+        dim_stride = op.Reshape(stride[dim], empty_shape)


Initial plan

cc9c69b

Copilot AI assigned Copilot and justinchuby Jun 3, 2026

github-project-automation Bot added this to ONNX Script Review Board Jun 3, 2026

github-project-automation Bot moved this to Todo in ONNX Script Review Board Jun 3, 2026

Copilot started work on behalf of justinchuby June 3, 2026 18:33 View session

Copilot AI linked an issue Jun 3, 2026 that may be closed by this pull request

[torchlib] Reimplement as_strided #1753

Open

[torchlib] Reimplement as_strided with static index computation

e42e168

Copilot AI changed the title ~~[WIP] Reimplement as_strided for efficiency~~ [torchlib] Reimplement as_strided without an ONNX loop Jun 3, 2026

Copilot finished work on behalf of justinchuby June 3, 2026 18:42

Copilot AI requested a review from justinchuby June 3, 2026 18:42

Copilot started work on behalf of justinchuby June 3, 2026 18:46 View session

[torchlib] Handle dynamic SymInt inputs in as_strided

93d3247

Copilot finished work on behalf of justinchuby June 3, 2026 18:58

justinchuby reviewed Jun 3, 2026

Comment thread onnxscript/function_libs/torch_lib/ops/core.py Outdated

Apply suggestion from @justinchuby

bd89878

Copilot started work on behalf of justinchuby June 3, 2026 19:34 View session

[torchlib] Add e2e coverage for as_strided; handle None storage_offset

9b32501

Copilot finished work on behalf of justinchuby June 3, 2026 19:41

Copilot AI requested a review from justinchuby June 3, 2026 19:41

Copilot started work on behalf of justinchuby June 3, 2026 19:41 View session

Add detailed comments to aten_as_strided implementation

6a15666

Copilot finished work on behalf of justinchuby June 3, 2026 19:44

justinchuby reviewed Jun 3, 2026

Copilot started work on behalf of justinchuby June 3, 2026 20:14 View session

justinchuby requested a review from Copilot June 3, 2026 20:16

justinchuby added the module: torchlib Related to the torch/aten function lib in development label Jun 3, 2026

Assume INT64 SymInts in as_strided dynamic path

72e1de5

Copilot started reviewing on behalf of justinchuby June 3, 2026 20:16 View session

Copilot finished work on behalf of justinchuby June 3, 2026 20:17

Copilot AI requested a review from justinchuby June 3, 2026 20:17

Copilot AI reviewed Jun 3, 2026

Copilot wants to merge 7 commits into main from copilot/torchlib-reimplement-as-strided
