Fix FluxControlNet img2img/inpainting with num_images_per_prompt>1 by Ricardo-M-L · Pull Request #13519 · huggingface/diffusers

Ricardo-M-L · 2026-04-21T03:18:03Z

What does this PR do?

FluxControlNetImg2ImgPipeline and FluxControlNetInpaintPipeline already pre-resize control_image to batch_size * num_images_per_prompt, but only reshape control_mode to [1, 1]:

if control_mode is not None:
    control_mode = torch.tensor(control_mode).to(device, dtype=torch.long)
    control_mode = control_mode.reshape([-1, 1])

Inside FluxControlNetModel the union branch feeds control_mode to nn.Embedding and concatenates the result to encoder_hidden_states:

controlnet_mode_emb = self.controlnet_mode_embedder(controlnet_mode)
encoder_hidden_states = torch.cat([controlnet_mode_emb, encoder_hidden_states], dim=1)

So when num_images_per_prompt > 1 the embedded mode stays at [1, 1, inner_dim] while encoder_hidden_states is [B*num_images_per_prompt, seq, inner_dim], and the concat fails:

RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 2 but got size 1 for tensor number 1 in the list.
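
The shape mismatch can be reproduced without loading any pipeline. The snippet below is a minimal sketch, not the model's code: `mode_embedder` is a stand-in for `controlnet_mode_embedder`, and the sizes (`inner_dim = 8`, `seq_len = 4`, `num_modes = 10`) are toy values assumed for illustration. The wording of the error can differ from the one quoted above depending on tensor order and PyTorch version.

```python
import torch
from torch import nn

# Toy sizes, assumed for illustration only (not the real model config).
batch_size, num_images_per_prompt, seq_len, inner_dim = 1, 2, 4, 8
num_modes = 10  # hypothetical number of ControlNet-Union modes

# Stand-in for FluxControlNetModel.controlnet_mode_embedder.
mode_embedder = nn.Embedding(num_modes, inner_dim)
encoder_hidden_states = torch.randn(batch_size * num_images_per_prompt, seq_len, inner_dim)

# What the img2img/inpaint pipelines did: reshape only, no broadcast over the batch.
control_mode = torch.tensor(0, dtype=torch.long).reshape([-1, 1])  # shape [1, 1]
mode_emb = mode_embedder(control_mode)  # shape [1, 1, inner_dim]

try:
    torch.cat([mode_emb, encoder_hidden_states], dim=1)
    concat_error = None
except RuntimeError as err:
    # Batch dimensions differ (1 vs 2), so concatenation along dim=1 fails.
    concat_error = str(err)

print(concat_error)
```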

FluxControlNetPipeline (non-img2img) already handles this case: it validates that control_mode is a single int and broadcasts it to the full image batch:

if not isinstance(control_mode, int):
    raise ValueError("For `FluxControlNet`, `control_mode` should be an `int` or `None`")
control_mode = torch.tensor(control_mode).to(device, dtype=torch.long)
control_mode = control_mode.view(-1, 1).expand(control_image.shape[0], 1)
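
As a rough check that the validate-then-broadcast pattern resolves the mismatch, here is a small self-contained sketch. The helper name `broadcast_control_mode`, the stand-in `mode_embedder`, and all tensor sizes are assumptions made for illustration; they are not part of diffusers.

```python
import torch
from torch import nn

def broadcast_control_mode(control_mode, control_image, device="cpu"):
    """Hypothetical helper mirroring the validation + broadcast shown above."""
    if not isinstance(control_mode, int):
        raise ValueError("For `FluxControlNet`, `control_mode` should be an `int` or `None`")
    mode = torch.tensor(control_mode).to(device, dtype=torch.long)
    # view(-1, 1) gives [1, 1]; expand repeats it (without copying) per image in the batch.
    return mode.view(-1, 1).expand(control_image.shape[0], 1)

# Toy sizes, assumed for illustration only.
batch_size, num_images_per_prompt, seq_len, inner_dim = 1, 2, 4, 8
control_image = torch.zeros(batch_size * num_images_per_prompt, 3, 16, 16)
encoder_hidden_states = torch.randn(batch_size * num_images_per_prompt, seq_len, inner_dim)
mode_embedder = nn.Embedding(10, inner_dim)  # stand-in for controlnet_mode_embedder

mode = broadcast_control_mode(0, control_image)           # shape [2, 1]
mode_emb = mode_embedder(mode)                            # shape [2, 1, inner_dim]
out = torch.cat([mode_emb, encoder_hidden_states], dim=1) # shape [2, seq_len + 1, inner_dim]
print(out.shape)
```

`expand` returns a view rather than a copy, so the broadcast adds no memory proportional to the batch size.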

This PR brings the img2img and inpainting pipelines in line with the same validation + broadcast so num_images_per_prompt > 1 works end-to-end.

Fixes #10741

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline?
Did you read our philosophy doc (important for complex PRs)?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

@sayakpaul @yiyixuxu

`FluxControlNet[Img2Img|Inpaint]Pipeline` pre-resize `control_image` to `batch_size * num_images_per_prompt` but only reshape `control_mode` to `[1, 1]`, so the ControlNet-Union mode embedding, of shape `[1, 1, inner_dim]`, fails to concatenate with `encoder_hidden_states` of shape `[B*num_images_per_prompt, seq, inner_dim]`:

RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 2 but got size 1 for tensor number 1 in the list.

Mirror `FluxControlNetPipeline`: reject non-int `control_mode` for a single ControlNet, then `.view(-1, 1).expand(control_image.shape[0], 1)` so the mode broadcasts to every image in the batch.

Fixes huggingface#10741

github-actions bot added the pipelines and size/S (PR with diff < 50 LOC) labels on Apr 21, 2026

Ricardo-M-L wants to merge 1 commit into huggingface:main from Ricardo-M-L:fix-flux-controlnet-img2img-inpaint-multi-images
