Signed-off-by: Hao Wu <skyw@nvidia.com>
Greptile Summary
This PR adds a new
Confidence Score: 5/5
Safe to merge; all remaining findings are P2 style/quality improvements with no blocking correctness issues.
All four findings are P2: a truncated docstring, a potential int32 dtype concern (uncertain without running against the actual TE kernel), a test coverage gap, and a minor fallthrough comment. None are definitive runtime breakages in the model itself. The core model mapping logic is sound and the test structure is reasonable.
examples/pytorch/qwen3_moe/model.py: verify
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["input_ids (batch, seq_len)"] --> B["embed_tokens\n(nn.Embedding)"]
B --> C["RotaryPositionEmbedding\nfreqs"]
C --> D
subgraph LAYER["Qwen3MoeDecoderLayer (×N)"]
D["hidden_states"] --> E["te.MultiheadAttention\n(fused LN + QKV + QK-norm + RoPE + attn + O)"]
E --> F["+ residual"]
F --> G["te.RMSNorm\npost_attention_layernorm"]
G --> H
subgraph MOE["Qwen3MoeBlock"]
H["hidden_flat (tokens, hidden)"] --> I["Qwen3MoeRouter\n(softmax + top-k)"]
I --> J["moe_permute_with_probs"]
J --> K["te_ops.GroupedLinear\n(gate+up, int32 tokens_per_expert⚠)"]
K --> L["te_ops.SwiGLU"]
L --> M["te_ops.GroupedLinear\n(down)"]
M --> N["moe_unpermute\n(prob-weighted combine)"]
end
N --> O["+ residual"]
end
O --> P["te.RMSNorm\nfinal norm"]
P --> Q["te.Linear\nlm_head"]
Q --> R["logits (batch, seq_len, vocab_size)"]
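The Qwen3MoeBlock path in the flowchart (router, permute, grouped experts, prob-weighted unpermute) can be illustrated with a dependency-free sketch. This is plain Python, not the TE API: `route`, `permute`, `unpermute`, and `moe_forward` are hypothetical helper names, the experts are arbitrary callables standing in for the gate+up / SwiGLU / down stack, and the top-k probabilities are used without renormalization, which is an assumption rather than something the PR confirms.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one token's router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_logits, top_k):
    """For each token, pick the top_k experts as (expert_id, prob) pairs."""
    routes = []
    for logits in router_logits:
        probs = softmax(logits)
        top = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)[:top_k]
        routes.append([(e, probs[e]) for e in top])
    return routes


def permute(tokens, routes, num_experts):
    """Group token copies by expert so each expert sees a contiguous slice."""
    buckets = [[] for _ in range(num_experts)]
    for t, choices in enumerate(routes):
        for e, p in choices:
            buckets[e].append((t, p))
    tokens_per_expert = [len(b) for b in buckets]  # integer counts per expert
    order = [item for b in buckets for item in b]  # (token_idx, prob) per row
    permuted = [tokens[t] for t, _ in order]
    return permuted, order, tokens_per_expert


def unpermute(expert_out, order, num_tokens, hidden):
    """Scatter expert outputs back to token order, weighted by router prob."""
    out = [[0.0] * hidden for _ in range(num_tokens)]
    for row, (t, p) in zip(expert_out, order):
        for j, v in enumerate(row):
            out[t][j] += p * v
    return out


def moe_forward(tokens, router_logits, experts, top_k):
    """Run the route -> permute -> experts -> unpermute pipeline."""
    routes = route(router_logits, top_k)
    permuted, order, counts = permute(tokens, routes, len(experts))
    expert_out, start = [], 0
    for expert, n in zip(experts, counts):
        # Each expert processes only its contiguous slice of permuted rows.
        expert_out.extend(expert(row) for row in permuted[start:start + n])
        start += n
    return unpermute(expert_out, order, len(tokens), len(tokens[0])), counts


if __name__ == "__main__":
    tokens = [[1.0, 2.0], [3.0, 4.0]]
    router_logits = [[0.0, 0.0, 5.0], [5.0, 0.0, 0.0]]
    # Stand-in experts: expert e scales its input by (e + 1).
    experts = [lambda row, s=e + 1: [s * v for v in row] for e in range(3)]
    output, counts = moe_forward(tokens, router_logits, experts, top_k=1)
    print(counts)
    print(output)
```

The per-expert counts returned by `permute` play the role of the `tokens_per_expert` argument shown in the flowchart; they always sum to the number of tokens times `top_k`, which gives a cheap sanity check before handing them to a grouped kernel.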
Reviews (5): Last reviewed commit: "Merge branch 'main' into vibe_qwen3"
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Hao Wu <skyw@users.noreply.github.com>
Description
An almost pure Transformer Engine (TE) module implementation of the Qwen3 MoE model.
Type of change
Changes
Please list the changes introduced in this PR:
Checklist: