FIX: Use subprocess isolation for macOS torch tests instead of skipping #4430

Closed
Heyy-Himanshuu wants to merge 1 commit into shap:master from Heyy-Himanshuu:fix/macos-torch-subprocess-isolation

@Heyy-Himanshuu

fixes #4075

On macOS, 17 torch-related tests across 5 files are completely skipped due to a known upstream PyTorch/OpenMP segfault (pytorch/pytorch#121101). When run without skipping, the tests either segfault or hang indefinitely, killing the entire CI suite.

This PR adds subprocess isolation to tests/conftest.py that:

  • Detects skipif markers referencing GH #4075 at collection time via pytest_collection_modifyitems
  • On macOS, replaces them with a subprocess_isolation marker
  • Runs those tests in a fresh subprocess (subprocess.run) with a 5-minute timeout
  • Reports segfaults/timeouts as FAILED with diagnostics instead of hanging the suite
  • Uses an environment variable (_SHAP_SUBPROCESS_CHILD) to prevent recursion in the child process

No test files are modified — the hook intercepts markers dynamically at runtime. On Linux/Windows, the hooks are no-ops and tests run normally as before.
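The marker rewrite described above could look roughly like the following. This is a minimal sketch, not the PR's actual conftest.py: the helper names (is_tracked_skip, rewrite_markers) are hypothetical, the skipif reason is assumed to contain the string "4075", and only markers applied directly to a test function are handled (module-level or class-level skipif marks would need extra work).

```python
import os
import sys

TRACKER = "4075"  # assumption: the skipif reasons mention this issue number
CHILD_ENV = "_SHAP_SUBPROCESS_CHILD"  # environment variable named in the PR


def is_tracked_skip(mark):
    """Return True for a skipif mark whose reason mentions the tracker issue."""
    return mark.name == "skipif" and TRACKER in str(mark.kwargs.get("reason", ""))


def rewrite_markers(items, on_macos, in_child):
    """Swap tracked skipif marks for subprocess_isolation (parent) or drop them (child)."""
    if not on_macos:
        return  # Linux/Windows: leave every item untouched
    for item in items:
        kept = [m for m in item.own_markers if not is_tracked_skip(m)]
        if len(kept) == len(item.own_markers):
            continue  # this test carries no tracked skipif
        item.own_markers[:] = kept
        if not in_child:
            # Parent process: flag the test so a later hook can spawn a subprocess.
            item.add_marker("subprocess_isolation")


def pytest_collection_modifyitems(config, items):
    """pytest hook: apply the rewrite using the real platform and environment."""
    rewrite_markers(
        items,
        on_macos=sys.platform == "darwin",
        in_child=os.environ.get(CHILD_ENV) == "1",
    )
```

If the suite runs with strict marker checking, the subprocess_isolation marker would presumably also need to be registered in the pytest configuration.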

Why not pytest-forked?

pytest-forked uses os.fork(), which is unsafe with multithreaded C libraries on macOS and can make the problem worse.

How it works

| Step | Parent process | Child subprocess |
| --- | --- | --- |
| Collection | _SHAP_SUBPROCESS_CHILD unset → replaces skipif with subprocess_isolation | _SHAP_SUBPROCESS_CHILD=1 → removes skipif so test runs in-process |
| Execution | Hook wrapper spawns subprocess.run(...) | Test runs directly with clean OpenMP state |
| Recursion? | No: child has no subprocess_isolation marker | No: runs in-process normally |
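
The execution step in the parent could be sketched as below. This is an illustration under assumptions rather than the PR's code: run_isolated and pytest_command are hypothetical names, the outcome labels are invented for the example, and the 300-second default mirrors the 5-minute timeout mentioned in the description.

```python
import os
import signal
import subprocess
import sys

CHILD_ENV = "_SHAP_SUBPROCESS_CHILD"  # environment variable named in the PR


def run_isolated(cmd, timeout=300.0):
    """Run cmd in a fresh process and return an (outcome, detail) pair."""
    env = {**os.environ, CHILD_ENV: "1"}  # tells the child not to isolate again
    try:
        proc = subprocess.run(
            cmd, env=env, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left hanging.
        return "timeout", f"no result after {timeout} s"
    if proc.returncode == 0:
        return "passed", proc.stdout
    if proc.returncode < 0:
        # On POSIX a negative return code means the child died from that signal.
        try:
            detail = signal.Signals(-proc.returncode).name  # e.g. SIGSEGV
        except ValueError:
            detail = f"signal {-proc.returncode}"
        return "crashed", detail
    return "failed", proc.stdout + proc.stderr


def pytest_command(nodeid):
    """Build a command that runs one test by node ID in a new interpreter."""
    return [sys.executable, "-m", "pytest", nodeid, "-q"]
```

A hook wrapper would then call something like run_isolated(pytest_command(item.nodeid)) and turn any outcome other than "passed" into a test failure that includes the detail string.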

Checklist

  • All pre-commit checks pass.
  • Unit tests added (if fixing a bug or adding a new feature)

On macOS, torch tests segfault due to an OpenMP/libomp conflict between
PyTorch and LightGBM (upstream: pytorch/pytorch#121101). Previously,
all 17 affected tests were skipped entirely on Darwin, resulting in zero
torch test coverage on macOS CI.

This commit adds subprocess isolation to conftest.py that:
- Detects skipif markers referencing GH #4075 at collection time
- Replaces them with a subprocess_isolation marker on macOS
- Runs those tests in a fresh subprocess (clean OpenMP state)
- Reports segfaults/timeouts as FAILED instead of hanging the suite
- Uses an environment variable to prevent recursion in the child

No test files are modified — the hook intercepts markers dynamically.

Closes #4075

@CloseChoice CloseChoice left a comment

Collaborator

Thanks for the PR. It would be great to get these tests working again, but I would prefer a decorator on the affected tests, something like:

@subprocess_isolated(reason="torch/OpenMP segfault on macOS, GH #4075")

The pytest magic is overly complex and error-prone.
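
One way such a decorator might be written is sketched below. No implementation appears in this thread, so everything here is an assumption: the platforms and timeout parameters are hypothetical, the child is launched by importing the test's module by name (which requires that module to be importable), and the wrapper only supports tests that take no fixtures or parameters.

```python
import functools
import os
import subprocess
import sys

CHILD_ENV = "_SHAP_SUBPROCESS_CHILD"  # reuses the variable name from the PR


def subprocess_isolated(reason, platforms=("darwin",), timeout=300.0):
    """Run the decorated zero-argument test in a fresh interpreter on the given platforms."""

    def decorate(func):
        @functools.wraps(func)
        def wrapper():
            in_child = os.environ.get(CHILD_ENV) == "1"
            if sys.platform not in platforms or in_child:
                return func()  # unaffected platform, or already inside the child
            # Parent: re-import the module in a new process and call the same test.
            # The child sees CHILD_ENV=1, so its wrapper runs the test directly.
            code = f"import {func.__module__} as m; m.{func.__name__}()"
            proc = subprocess.run(
                [sys.executable, "-c", code],
                env={**os.environ, CHILD_ENV: "1"},
                capture_output=True,
                text=True,
                timeout=timeout,
            )
            if proc.returncode != 0:
                raise AssertionError(
                    f"{reason}: child exited with code {proc.returncode}\n{proc.stderr}"
                )

        return wrapper

    return decorate
```

Compared with rewriting markers in conftest.py, this keeps the isolation visible at each affected test, at the cost of touching the test files.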

@daidahao

Contributor

> Thanks for the PR. It would be great to get these tests working again, but I would prefer a decorator on the affected tests, something like:
>
> @subprocess_isolated(reason="torch/OpenMP segfault on macOS, GH #4075")
>
> The pytest magic is overly complex and error-prone.

@CloseChoice Hi, I believe the failing macOS tests have been fixed by #4545.

@Heyy-Himanshuu Heyy-Himanshuu closed this by deleting the head repository Apr 30, 2026