Endpoint: https://planetarycomputer.microsoft.com/api/stac/v1/search
Symptom: ~70% of POST /search calls with realistic spatial+temporal parameters either time out at 30 s with no response, or eventually return HTTP 504 after urllib3 exhausts internal retries.
Status pages: Azure Service Health is green; nothing posted on microsoft/PlanetaryComputer/issues in the prior 48 h.
Reporter: steve@impactobservatory.com (Impact Observatory)
Repro (curl)
Picked a non-trivial AOI (Northern Alberta, S2-L2A summer 2021):
for i in $(seq 1 10); do
  code=$(curl -s -o /dev/null -w "%{http_code}" --max-time 30 \
    -H 'Content-Type: application/json' \
    -X POST -d '{"collections":["sentinel-2-l2a"],
      "intersects":{"type":"Point","coordinates":[-115.0,57.0]},
      "datetime":"2021-06-01/2021-08-31","limit":50}' \
    "https://planetarycomputer.microsoft.com/api/stac/v1/search")
  echo -n "$code "
done
Two consecutive runs from a US-East egress, ~02:30–02:55 UTC 2026-05-02:
000 000 200 000 000 200 000 200 000 000 # 3 ok / 7 timeouts
000 000 200 000 000 200 # 2 ok / 4 timeouts
000 = curl timed out at 30 s with no response received from the LB. Trivial requests (e.g. GET /api/stac/v1/collections and POST /search with {"collections":["sentinel-2-l2a"],"limit":1} and no spatial/temporal filter) succeed reliably and fast (~600 ms), so the degradation appears specific to non-trivial searches that scan multiple partitions.
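If it is easier to run, the same probe as a minimal Python sketch (using requests; the body mirrors the curl payload above — this is illustrative, not part of our pipeline):

# Minimal Python sketch of the same probe; distinguishes client-side
# timeouts (curl's 000) from explicit HTTP status codes such as 504.
import requests

URL = "https://planetarycomputer.microsoft.com/api/stac/v1/search"
BODY = {
    "collections": ["sentinel-2-l2a"],
    "intersects": {"type": "Point", "coordinates": [-115.0, 57.0]},
    "datetime": "2021-06-01/2021-08-31",
    "limit": 50,
}

for _ in range(10):
    try:
        r = requests.post(URL, json=BODY, timeout=30)
        print(r.status_code, end=" ")
    except requests.exceptions.Timeout:
        print("timeout", end=" ")  # analogous to curl's 000
print()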
Application-level traceback (one of many)
From pystac_client==0.7.x driving the same query shape:
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='planetarycomputer.microsoft.com', port=443):
Max retries exceeded with url: /api/stac/v1/search
(Caused by ResponseError('too many 504 error responses'))
…
pystac_client.exceptions.APIError: HTTPSConnectionPool(host='planetarycomputer.microsoft.com', port=443):
Max retries exceeded with url: /api/stac/v1/search (Caused by ResponseError('too many 504 error responses'))
The 504s are observed both before and after urllib3's internal retry budget is exhausted, which suggests origin-side timeouts rather than LB throttling (LB throttling typically returns 429 or 503).
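For reference, the query shape that drives this traceback is roughly the following (a minimal sketch with the same collection/AOI/datetime as the curl repro; our real code wraps this call, and any asset signing is omitted):

# Sketch of the pystac_client call shape behind the traceback above.
from pystac_client import Client

catalog = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")
search = catalog.search(
    collections=["sentinel-2-l2a"],
    intersects={"type": "Point", "coordinates": [-115.0, 57.0]},
    datetime="2021-06-01/2021-08-31",
    limit=50,
)
items = list(search.items())  # POST /search fires during iteration; this is where the 504s surface
print(len(items))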
Impact
We hit this on a multi-tile geospatial regression workflow (Sentinel-2 → land-cover predictions). With a 5× retry-with-exponential-backoff wrapper around every pystac_client.Client.search(...) call, per-call success probability rises from ~30% to ~99.8%, but tail latency grows by ~2.5 minutes per affected call, and at this failure rate roughly 1 in 600 calls still exhausts the wrapper's retries and fails the entire downstream pipeline.
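For context, the wrapper is conceptually along these lines (a simplified sketch; function and parameter names are illustrative, not our production code):

# Illustrative retry-with-exponential-backoff wrapper around a pystac_client search.
import time
from pystac_client.exceptions import APIError

def retrying_search(search, max_attempts=5, base_delay=10.0):
    # Materialize an ItemSearch, retrying on APIError (e.g. the 504 bursts above).
    # Waits of 10/20/40/80 s account for the ~2.5 min tail when all retries are needed.
    for attempt in range(1, max_attempts + 1):
        try:
            return list(search.items())  # re-issues POST /search on each attempt
        except APIError:
            if attempt == max_attempts:
                raise  # ~1 in 600 calls still ends up here and fails the pipeline
            time.sleep(base_delay * 2 ** (attempt - 1))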
Asks
- Is there an in-progress incident or capacity event we can track? (Couldn't find one anywhere public.)
- Would Microsoft consider publishing an incident-status page for the public PC STAC endpoint, similar to GitHub Status or Cloudflare Status? Right now consumers have to probe to determine health.
- If the 504s are coming from a specific backend (e.g. a pgstac instance pool), is there a known query shape we should avoid (e.g. intersects + datetime + multiple-orbit collections at once)?
Happy to share more curl/traceback samples if useful.