Skip to content

Fix conflicting attrs during concat#561

Draft
gtrevisan wants to merge 2 commits into
devfrom
glt/fix-concat-attrs
Draft

Fix conflicting attrs during concat#561
gtrevisan wants to merge 2 commits into
devfrom
glt/fix-concat-attrs

Conversation

@gtrevisan
Copy link
Copy Markdown
Member

@gtrevisan gtrevisan commented May 28, 2026

I realized that the upcoming py3.14 spawn/forkserver issues will affect my LRU-cached get_metadata method.
with the new python version a slightly different time will be returned for each process and therefore xarray will flag attribute conflicts.

the LRU cache was supposed to prevent this, as the metadata should be resolved (and its cache should be primed) as a first thing by the logger:

metadata = get_metadata()

as a dirty fix, we can drop conflicts while concatenating and reapply them on the final product, here:

ds = xr.concat(self.results.values(), dim="idx", combine_attrs="no_conflicts")

@samc24 feel free to dig around whether there is a better way to solve this!
or LMK if you cannot replicate it -- any multiproc multishot workflow should do, including the testing ones.

eg:

ERROR tests/test_output_setting.py::test_output_exists
xarray.structure.merge.MergeError: combine_attrs='no_conflicts',
but some values are not the same. Merging
{ [...] 'time': '2026-05-28T15:23:11.087869', [...] }
with
{ [...] 'time': '2026-05-28T15:23:11.430584', [...] }

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses xarray concatenation failures caused by conflicting dataset attributes (notably workflow metadata like a per-process timestamp) in multi-process workflows, which becomes more prominent under spawn/forkserver process start methods.

Changes:

  • Import get_metadata into disruption_py/settings/output_setting.py.
  • Change xr.concat(..., combine_attrs=...) from "no_conflicts" to "drop_conflicts" in DatasetOutputSetting.concat().
  • Re-apply workflow metadata to the final concatenated dataset via ds.attrs.update(get_metadata()).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +328 to +329
ds = xr.concat(self.results.values(), dim="idx", combine_attrs="drop_conflicts")
ds.attrs.update(get_metadata())
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comment on lines +328 to +329
ds = xr.concat(self.results.values(), dim="idx", combine_attrs="drop_conflicts")
ds.attrs.update(get_metadata())
Comment on lines 97 to 100
for ds_list in [dict_out.values(), [dt.to_dataset() for dt in dt_out.values()]]:
xr.testing.assert_identical(
xr.testing.assert_equal(
ds_out, xr.concat(ds_list, dim="idx").sortby(["shot", "time"])
)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants