std: refactor the TLS implementation#126523
Conversation
|
rustbot has assigned @Mark-Simulacrum. Use |
|
The Miri subtree was changed cc @rust-lang/miri |
This comment has been minimized.
This comment has been minimized.
As discovered by Mara in rust-lang#110897, our TLS implementation is a total mess. In the past months, I have simplified the actual macros and their expansions, but the majority of the complexity comes from the platform-specific support code needed to create keys and register destructors. In keeping with rust-lang#117276, I have therefore moved all of the `thread_local_key`/`thread_local_dtor` modules to the `thread_local` module in `sys` and merged them into a new structure, so that future porters of `std` can simply mix-and-match the existing code instead of having to copy the same (bad) implementation everywhere. The new structure should become obvious when looking at `sys/thread_local/mod.rs`. Unfortunately, the documentation changes associated with the refactoring have made this PR rather large. That said, this contains no functional changes except for two small ones: * the key-based destructor fallback now, by virtue of sharing the implementation used by macOS and others, stores its list in a `#[thread_local]` static instead of in the key, eliminating one indirection layer and drastically simplifying its code. * I've switched over ZKVM (tier 3) to use the same implementation as WebAssembly, as the implementation was just a way worse version of that Please let me know if I can make this easier to review! I know these large PRs aren't optimal, but I couldn't think of any good intermediate steps. @rustbot label +A-thread-locals
ff3e202 to
f3facf1
Compare
|
The Miri subtree was changed cc @rust-lang/miri |
Mark-Simulacrum
left a comment
There was a problem hiding this comment.
I didn't review the implementations in detail (I think they are just moved, not actually changed, right?) -- left some comments inline, but most are non-blocking, just curiosity/slight improvements.
In general I'm happy to r+ with some of the nits fixed/questions answered.
| pub(super) use linux::register; | ||
| pub(super) use list::run; | ||
| } else if #[cfg(all( | ||
| target_thread_local, |
There was a problem hiding this comment.
Since this is present in both lists, should we outline it and just #[cfg] the whole destructors module? Or are we glob-importing it or something?
| #[cfi_encoding = "i"] | ||
| #[repr(transparent)] | ||
| #[allow(non_camel_case_types)] | ||
| pub struct c_int(#[allow(dead_code)] pub libc::c_int); |
There was a problem hiding this comment.
Is there a reason this doesn't use the same c_int? (libc vs core::ffi)
| //! Instead, we use one TLS key to register a callback which will run | ||
| //! iterate through the destructor list. |
There was a problem hiding this comment.
| //! Instead, we use one TLS key to register a callback which will run | |
| //! iterate through the destructor list. | |
| //! Instead, we use one TLS key to register a callback which will | |
| //! iterate through the destructor list. |
| static DTORS: StaticKey = StaticKey::new(Some(run)); | ||
|
|
||
| // Setting the key value to something other than NULL will result in the | ||
| // destructor being run at thread exit. |
There was a problem hiding this comment.
Maybe StaticKey is doing something I don't really understand (yet), but the idea that we don't have a way to register TLS destructors and yet can register a callback on thread exit doesn't really connect for me. Maybe we want to say that we can't register more than one destructor or something? (But then, how does this work when there's non-Rust code in play)
|
I updated the documentation, it should be a bit clearer now. |
|
@bors r+ |
|
@bors rollup=never Big refactor of TLS, don't want to have to guess when bisecting to this. |
This comment has been minimized.
This comment has been minimized.
|
💔 Test failed - checks-actions |
a2f078d to
50a02ed
Compare
|
I forgot that Emscripten is also in the UNIX family... |
|
☀️ Test successful - checks-actions |
|
Finished benchmarking commit (5a3e2a4): comparison URL. Overall result: no relevant changes - no action needed@rustbot label: -perf-regression Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)This benchmark run did not return any relevant results for this metric. CyclesThis benchmark run did not return any relevant results for this metric. Binary sizeResults (primary -0.1%, secondary -0.1%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 692.546s -> 692.589s (0.01%) |
As discovered by Mara in #110897, our TLS implementation is a total mess. In the past months, I have simplified the actual macros and their expansions, but the majority of the complexity comes from the platform-specific support code needed to create keys and register destructors. In keeping with #117276, I have therefore moved all of the
thread_local_key/thread_local_dtormodules to thethread_localmodule insysand merged them into a new structure, so that future porters ofstdcan simply mix-and-match the existing code instead of having to copy the same (bad) implementation everywhere. The new structure should become obvious when looking atsys/thread_local/mod.rs.Unfortunately, the documentation changes associated with the refactoring have made this PR rather large. That said, this contains no functional changes except for two small ones:
#[thread_local]static instead of in the key, eliminating one indirection layer and drastically simplifying its code.Please let me know if I can make this easier to review! I know these large PRs aren't optimal, but I couldn't think of any good intermediate steps.
@rustbot label +A-thread-locals