Mark Waiter m_cv as guarded by m_mutex#295
Conversation
|
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please copy-paste |
ryanofsky
left a comment
There was a problem hiding this comment.
Code review ACK fa35501. Thanks for the fix!
re: #295 (comment)
This allows to catch possible deadlock bugs at compile time.
I think PR description should be clearer that this does fix a specific deadlock not just "possible deadlock bugs", so I might edit it before merging (hope you don't mind)
| [&running, &tc, i](auto&& results) { | ||
| assert(results.getResult() == static_cast<int32_t>(100 * (i+1))); | ||
| running -= 1; | ||
| Lock lock(tc.waiter->m_mutex); |
There was a problem hiding this comment.
In commit "Mark Waiter m_cv as guarded by m_mutex" (fa35501)
Note if we wanted to make this more optimal and avoid the "hurry up and wait" scenario described in https://en.cppreference.com/cpp/thread/condition_variable/notify_one, we would unlock before notifying as recommended there, which the MP_GUARDED_BY annotation prevents:
--- a/test/mp/test/test.cpp
+++ b/test/mp/test/test.cpp
@@ -511,6 +511,7 @@ KJ_TEST("Make simultaneous IPC calls on single remote thread")
assert(results.getResult() == static_cast<int32_t>(100 * (i+1)));
running -= 1;
Lock lock(tc.waiter->m_mutex);
+ lock.unlock();
tc.waiter->m_cv.notify_all();
}));
}Still worth having the annotation here, but this shows why the annotation could be undesirable in other cases.
This change fixes a lost wakeup bug in the "Make simultaneous IPC calls on single remote thread" test which is probably responsible for the hang reported bitcoin/bitcoin#35491. The bug has existed since the test was introduced in 1238170.
The change adds an
MP_GUARDED_BY(m_mutex)annotation toWaiter::m_cv, and locks the mutex before notifying the condition variable in the test to satisfy the guarded-by requirement.Locking the mutex prevents a hang in the test that could happen because if there is no lock between the
running -= 1state update and notifying the condition variable, there is no guarantee that the test is actively waiting on the condition variable when the notify call is made, so the notify might do nothing, and the test could wait forever.(edited by ryanofsky)