Fix FluxControlNet img2img/inpainting with num_images_per_prompt>1 #13519

Open
Ricardo-M-L wants to merge 1 commit into huggingface:main from Ricardo-M-L:fix-flux-controlnet-img2img-inpaint-multi-images
Conversation

@Ricardo-M-L
Contributor

What does this PR do?

FluxControlNetImg2ImgPipeline and FluxControlNetInpaintPipeline already pre-resize control_image to batch_size * num_images_per_prompt, but only reshape control_mode to [1, 1]:

if control_mode is not None:
    control_mode = torch.tensor(control_mode).to(device, dtype=torch.long)
    control_mode = control_mode.reshape([-1, 1])

Inside FluxControlNetModel the union branch feeds control_mode to nn.Embedding and concatenates the result to encoder_hidden_states:

controlnet_mode_emb = self.controlnet_mode_embedder(controlnet_mode)
encoder_hidden_states = torch.cat([controlnet_mode_emb, encoder_hidden_states], dim=1)

So when num_images_per_prompt > 1 the embedded mode stays at [1, 1, inner_dim] while encoder_hidden_states is [B*num_images_per_prompt, seq, inner_dim], and the concat fails:

RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 2 but got size 1 for tensor number 1 in the list.
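
The mismatch can be reproduced outside the pipeline with a few lines of PyTorch. This is only a sketch: the sizes, the `nn.Embedding(10, inner_dim)` stand-in for `controlnet_mode_embedder`, and the variable names are assumptions for illustration, not values taken from the PR or the model.

```python
import torch
from torch import nn

# Illustrative sizes only; none of these values come from the PR.
batch_size, num_images_per_prompt, seq_len, inner_dim = 1, 2, 4, 8

# Stand-in for the union-mode embedder (10 modes is an assumption).
mode_embedder = nn.Embedding(10, inner_dim)

# What the img2img/inpaint pipelines build today: a single [1, 1] mode tensor.
control_mode = torch.tensor(0, dtype=torch.long).reshape([-1, 1])
mode_emb = mode_embedder(control_mode)  # shape [1, 1, inner_dim]

# Prompt embeddings are already repeated once per generated image.
encoder_hidden_states = torch.randn(
    batch_size * num_images_per_prompt, seq_len, inner_dim
)

# torch.cat does not broadcast, so the batch dimensions (1 vs 2) must match.
try:
    torch.cat([mode_emb, encoder_hidden_states], dim=1)
    concat_failed = False
except RuntimeError as err:
    concat_failed = True
    print(f"concat failed: {err}")
```

With `num_images_per_prompt = 1` the same code runs without error, which is consistent with the bug only appearing for multi-image generation.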

FluxControlNetPipeline (non-img2img) already handles this case: it validates that control_mode is a single int and broadcasts it to the full image batch:

if not isinstance(control_mode, int):
    raise ValueError("For `FluxControlNet`, `control_mode` should be an `int` or `None`")
control_mode = torch.tensor(control_mode).to(device, dtype=torch.long)
control_mode = control_mode.view(-1, 1).expand(control_image.shape[0], 1)

This PR brings the img2img and inpainting pipelines in line with the same validation + broadcast so num_images_per_prompt > 1 works end-to-end.
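
As a rough check of the validation + broadcast approach, the sketch below wraps the `FluxControlNetPipeline` snippet quoted above in a helper and feeds the result through an embedding and `torch.cat`. The helper name `broadcast_control_mode`, the `num_images` parameter, the sizes, and the `nn.Embedding` stand-in are all hypothetical; the actual diff may be structured differently.

```python
import torch
from torch import nn


def broadcast_control_mode(control_mode, num_images, device="cpu"):
    """Hypothetical helper: validate a single-int mode and expand it per image."""
    if control_mode is None:
        return None
    if not isinstance(control_mode, int):
        raise ValueError(
            "For `FluxControlNet`, `control_mode` should be an `int` or `None`"
        )
    mode = torch.tensor(control_mode).to(device, dtype=torch.long)
    # [1, 1] -> [num_images, 1]; expand creates a view and copies no data.
    return mode.view(-1, 1).expand(num_images, 1)


# Illustrative sizes only (assumptions, not taken from the model config).
inner_dim, seq_len, num_images = 8, 4, 2
mode_embedder = nn.Embedding(10, inner_dim)

control_mode = broadcast_control_mode(0, num_images)
mode_emb = mode_embedder(control_mode)  # shape [2, 1, inner_dim]
encoder_hidden_states = torch.randn(num_images, seq_len, inner_dim)

# Batch dimensions now agree, so the concat along the sequence axis succeeds.
out = torch.cat([mode_emb, encoder_hidden_states], dim=1)
print(out.shape)  # torch.Size([2, 5, 8])
```

In the pipelines, `num_images` would correspond to `control_image.shape[0]`, which is already `batch_size * num_images_per_prompt` at that point.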

Fixes #10741

Before submitting

Who can review?

@sayakpaul @yiyixuxu

`FluxControlNet[Img2Img|Inpaint]Pipeline` pre-resize `control_image` to
`batch_size * num_images_per_prompt` but only reshape `control_mode` to
`[1, 1]`, so the ControlNet-Union mode embedding of shape
`[1, 1, inner_dim]` fails to concat with `encoder_hidden_states` of
shape `[B*num_images_per_prompt, seq, inner_dim]`:

  RuntimeError: Sizes of tensors must match except in dimension 1.
  Expected size 2 but got size 1 for tensor number 1 in the list.

Mirror `FluxControlNetPipeline`: reject non-int `control_mode` for a
single ControlNet, then `.view(-1, 1).expand(control_image.shape[0], 1)`
so the mode broadcasts to every image in the batch.

Fixes huggingface#10741

github-actions bot added the pipelines and size/S (PR with diff < 50 LOC) labels on Apr 21, 2026
Development

Successfully merging this pull request may close these issues.

FluxControlNetImg2ImgPipeline doesn't support generating more than one image