Product and version
- Extension: Windows AI Studio / Foundry Toolkit 1.4.3 (VS Code extension)
- OS: Windows
- Hardware: Intel Core Ultra 7, NVIDIA RTX GPU, Intel Arc GPU, Intel NPU
Summary
When Foundry catalog refresh is throttled or fails, the model dropdown falls back to an older built-in list and excludes models that are already downloaded locally.
After that, requests fail with Model not found even though local runtime endpoints still expose at least one usable local model.
Impact
- Previously available local models disappear from the UI after catalog failures.
- Inference requests fail with Model not found errors.
- Users cannot reliably continue offline/local-only workflows during catalog outages or throttling.
- The experience appears broken even when local model assets are present and usable.
Observed behavior
- Catalog refresh repeatedly returns 429 TooManyRequests / QuotaExceeded from Azure Foundry catalog.
- Model picker appears to reset to fallback/built-in entries and/or stale selection.
- Requests fail with:
  Model qwen3.5-2b-cuda-gpu:2 not found
- Local service remains reachable and returns model data:
  - GET http://localhost:5272/v1/models returns qwen2.5-coder-1.5b-instruct-cuda-gpu:4
  - GET http://localhost:5272/openai/models returns qwen2.5-coder-1.5b-instruct-cuda-gpu:4 and qwen3.5-2b-cuda-gpu:2
  - GET http://localhost:5272/foundry/list returns []
- Additional parsing/runtime incompatibility appears for some newer models (for example qwen3.5 config parsing/model type), but the core issue is fallback not preserving known downloaded models.
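To illustrate why the three local endpoints together still identify usable models, here is a minimal TypeScript sketch that unions the ids they report. The response shapes are assumptions (a bare array of ids, or an OpenAI-style `{ data: [{ id }] }` envelope); the actual payloads returned by the local service may differ. `extractModelIds` and `unionLocalModelIds` are hypothetical names, not part of the extension.

```typescript
// Hypothetical helpers. The payload shapes handled here are assumptions,
// not documented behavior of the local Foundry service.
function extractModelIds(payload: unknown): string[] {
  let items: unknown[] = [];
  if (Array.isArray(payload)) {
    // Assumed shape 1: a bare array of ids or of { id } objects.
    items = payload;
  } else if (typeof payload === "object" && payload !== null) {
    // Assumed shape 2: an OpenAI-style envelope, { data: [...] }.
    const data = (payload as { data?: unknown }).data;
    if (Array.isArray(data)) {
      items = data;
    }
  }
  const ids: string[] = [];
  for (const item of items) {
    if (typeof item === "string") {
      ids.push(item);
    } else if (typeof item === "object" && item !== null) {
      const id = (item as { id?: unknown }).id;
      if (typeof id === "string") {
        ids.push(id);
      }
    }
  }
  return ids;
}

// Union of ids across all endpoint payloads, preserving first-seen order.
function unionLocalModelIds(payloads: unknown[]): string[] {
  const union: string[] = [];
  for (const payload of payloads) {
    for (const id of extractModelIds(payload)) {
      if (union.indexOf(id) === -1) {
        union.push(id);
      }
    }
  }
  return union;
}

// Payloads modeled on the observations above (shapes assumed).
const observedLocalIds = unionLocalModelIds([
  { data: [{ id: "qwen2.5-coder-1.5b-instruct-cuda-gpu:4" }] }, // /v1/models
  ["qwen2.5-coder-1.5b-instruct-cuda-gpu:4", "qwen3.5-2b-cuda-gpu:2"], // /openai/models
  [], // /foundry/list
]);
// observedLocalIds holds both model ids even though /foundry/list is empty.
```

Under these assumptions, an empty response from one endpoint does not hide models that another endpoint still reports.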
Expected behavior
- If catalog refresh fails, fallback should include all successfully downloaded local models discovered from local storage and/or local runtime endpoints.
- The model picker should remain functional with local-only models, without requiring successful catalog calls.
- Previously selected model should only be invalidated if local runtime confirms it is unavailable.
- User should receive a clear warning that catalog is unavailable and local fallback is being used.
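The selection rule above could be sketched as follows. This is only an illustration under stated assumptions: `decideSelection` is a hypothetical function, and `null` is used here to mean "the local runtime could not be queried", which is not treated as confirmation that the model is gone.

```typescript
type SelectionDecision = "keep" | "invalidate";

// Hypothetical rule: only a successful local probe that lacks the id
// counts as confirmation that the selected model is unavailable.
function decideSelection(
  selectedId: string,
  localRuntimeIds: string[] | null // null: local runtime probe failed (assumed convention)
): SelectionDecision {
  if (localRuntimeIds === null) {
    // No confirmation either way, so the selection is preserved.
    return "keep";
  }
  return localRuntimeIds.indexOf(selectedId) !== -1 ? "keep" : "invalidate";
}

// A catalog failure alone never reaches this function as a reason to
// invalidate; only the local runtime's answer is consulted.
const exampleDecision = decideSelection("qwen3.5-2b-cuda-gpu:2", [
  "qwen2.5-coder-1.5b-instruct-cuda-gpu:4",
  "qwen3.5-2b-cuda-gpu:2",
]);
```

The point of the design is that catalog state is deliberately not an input: a 429 from the catalog cannot by itself clear the user's selection.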
Requested fix
Please add an automatic fallback merge strategy:
- On catalog failure, build model list from local sources first:
  - downloaded model registry on disk
  - local runtime model endpoints
- Merge with cached catalog entries when available, instead of replacing local list.
- Never drop downloaded local models from picker due to remote catalog errors.
- If selected model id is missing from catalog but present locally, allow local resolution and execution.
- Add a visible status banner: "Catalog unavailable, showing local models only."
- Add telemetry for fallback path usage and count of locally recovered models.
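A minimal sketch of the requested merge strategy, in TypeScript. All names here (`ModelEntry`, `FallbackResult`, `buildFallbackList`) are hypothetical and do not reflect the extension's actual code; the inputs are assumed to be id lists already read from the on-disk registry, the local runtime endpoints, and any cached catalog response.

```typescript
interface ModelEntry {
  id: string;
  source: "local" | "catalog-cache";
}

interface FallbackResult {
  models: ModelEntry[];
  banner: string;
  recoveredLocalCount: number; // candidate value for fallback telemetry
}

// Hypothetical merge: local sources first, cached catalog entries appended,
// so a remote catalog error can never remove a downloaded model.
function buildFallbackList(
  diskRegistryIds: string[],
  localRuntimeIds: string[],
  cachedCatalogIds: string[]
): FallbackResult {
  const models: ModelEntry[] = [];
  const seenIds: string[] = [];

  const add = (id: string, source: ModelEntry["source"]): void => {
    if (seenIds.indexOf(id) === -1) {
      seenIds.push(id);
      models.push({ id, source });
    }
  };

  // Local sources take precedence and are always included.
  for (const id of diskRegistryIds) add(id, "local");
  for (const id of localRuntimeIds) add(id, "local");
  const recoveredLocalCount = models.length;

  // Cached catalog entries are merged in, never used as a replacement.
  for (const id of cachedCatalogIds) add(id, "catalog-cache");

  return {
    models,
    banner: "Catalog unavailable, showing local models only.",
    recoveredLocalCount,
  };
}
```

With this ordering, an id that appears both locally and in the cached catalog is labeled `local`, so a selected model that is missing from the catalog but present locally can still be resolved and executed.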
Representative errors seen
Received too many requests in a short amount of time. Retry again after 1 seconds.
Failed: Fetching model list from Foundry Catalog
Model qwen3.5-2b-cuda-gpu:2 not found
Repro steps
- Download one or more local models in Foundry Toolkit.
- Open model picker and trigger refresh while catalog is throttled (429).
- Observe model list reset/fallback behavior.
- Attempt inference with previously selected local model id.
- Observe Model not found despite local assets being present.
Workaround today
- Manually reselect a known-good local model id that still resolves in local runtime.
- Avoid repeated catalog refresh attempts while 429 is active.
- Workaround is not obvious and is unreliable for normal users.