@ares1025 reports that one box ceased to return usable data when querying the operational DS (either from the CLI or via RESTCONF). The initial sign of trouble appears to be a callback in veliad-system which crashes with the following error:
sysrepo[9088]: sysrepo-cpp: User callback threw an exception: [org.freedesktop.DBus.Error.UnknownObject] Unknown object '/org/freedesktop/timedate1'.
veliad-system[9088]: terminate called after throwing an instance of 'sdbus::Error'
veliad-system[9088]: what(): [org.freedesktop.DBus.Error.UnknownObject] Unknown object '/org/freedesktop/timedate1'.
...and this BT:
#5 0xb6ae02f8 in std::terminate() () from /lib/libstdc++.so.6
#6 0xb6d14f18 in sysrepo::(anonymous namespace)::handleExceptionFromCb (ex=warning: could not convert 'std::exception' from the host encoding (ANSI_X3.4-1968) to UTF-32.
This normally should not happen, please file a bug report.
..., exceptionHandler=<optimized out>) at /home/ci/build/build/sysrepo-cpp-custom/src/Subscription.cpp:79
#7 0xb6d29a84 in sysrepo::(anonymous namespace)::operGetItemsCb (session=<optimized out>, subscriptionId=3032467628, moduleName=<optimized out>, subXPath=<optimized out>, requestXPath=0xcc1228 "/*", requestId=144137417, requestId@entry=3067800712, parent=parent@entry=0xb4bfc9a8, privateData=0xd009c8) at /home/ci/build/build/sysrepo-cpp-custom/src/Subscription.cpp:126
#8 0xb6d8f208 in sr_shmsub_oper_get_listen_process_module_events (oper_get_subs=0xcf3668, conn=0xd009c8) at /home/ci/build/build/sysrepo-custom/src/shm_sub.c:3905
#9 0xb6d4ae48 in sr_subscription_process_events (subscription=subscription@entry=0xce6fb8, session=session@entry=0x0, wake_up_in=0xb4bfca48, wake_up_in@entry=0xb4bfca40) at /home/ci/build/build/sysrepo-custom/src/sysrepo.c:5752
#10 0xb6d916a0 in sr_shmsub_listen_thread (arg=0xce6fb8) at /home/ci/build/build/sysrepo-custom/src/shm_sub.c:4994
#11 0xb68f4694 in ?? () from /lib/libc.so.6
This "should not happen" and we can fix this particular error, but this is only when it starts getting interesting. Since the failing thing is an "pull" operational callback, it triggers failures in other processes which try to push to the operational DS:
sysrepo[9106]: Locking a rwlock failed (sr_shmmod_lock: Connection timed out), read lock held by running process 9106 (CID 86), running process 2238 (CID 87).
cla-sysrepod[9080]: terminate called after throwing an instance of 'sysrepo::ErrorWithCode'
cla-sysrepod[9080]: what(): Session::applyChanges: Couldn't apply changes: SR_ERR_TIME_OUT
cla-sysrepod[9080]: Locking a rwlock failed (sr_shmmod_lock: Connection timed out), read lock held by running process 9106 (CID 86), running process 2238 (CID 87). (SR_ERR_TIME_OUT)
sysrepo[9080]: Locking a rwlock failed (sr_shmmod_lock: Connection timed out), read lock held by running process 9106 (CID 86), running process 2238 (CID 87).
veliad-hardware[9106]: terminate called after throwing an instance of 'sysrepo::ErrorWithCode'
veliad-hardware[9106]: what(): Session::applyChanges: Couldn't apply changes: SR_ERR_TIME_OUT
veliad-hardware[9106]: Locking a rwlock failed (sr_shmmod_lock: Connection timed out), read lock held by running process 9106 (CID 86), running process 2238 (CID 87). (SR_ERR_TIME_OUT)
...and also (as expected), there's quite some slowdown for other processes (such as this warning from velia-health.service which manages to recover just fine, though):
main[348]: [PERFORMANCE][TOO_SLOW] addResourcesToInventory/applyChanges 4300ms
main[348]: [PERFORMANCE][TOO_SLOW] void velia::alarms::addResourcesToInventory(sysrepo::Session, const std::map<std::__cxx11::basic_string<char>, std::vector<std::__cxx11::basic_string<char> > >&) 4312ms
Here's a bunch of backtraces, such as from cla-sysrepo:
#5 0xb6a5e2f8 in std::terminate() () from /lib/libstdc++.so.6
#6 0xb6a6a238 in __cxa_throw () from /lib/libstdc++.so.6
#7 0xb6ef79d8 in sysrepo::throwIfError (code=code@entry=13, msg=..., c_session=<optimized out>) at /home/ci/build/build/sysrepo-cpp-custom/src/utils/exception.cpp:41
#8 0xb6ee53a4 in sysrepo::Session::applyChanges (this=0x1346f00, timeout=...) at /home/ci/build/per-package/sysrepo-cpp/host/opt/ext-toolchain/arm-buildroot-linux-gnueabihf/include/c++/15.1.0/bits/shared_ptr_base.h:1672
#9 0x0045771c in operator() (__closure=<optimized out>, data=...) at /home/ci/build/build/cla-sysrepo-custom/src/sysrepo/Daemon.cpp:419
#10 0x00499180 in std::function<void(std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, double, long long, unsigned long long, int, unsigned int, short, unsigned short, signed char, unsigned char, cla::Binary, cla::Enum, cla::IdentityRef, cla::DeletedProperty, cla::LeafListAnyXml, cla::Decimal64, cla::Empty>, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::variant<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, double, long long, unsigned long long, int, unsigned int, short, unsigned short, signed char, unsigned char, cla::Binary, cla::Enum, cla::IdentityRef, cla::DeletedProperty, cla::LeafListAnyXml, cla::Decimal64, cla::Empty> > > > const&)>::operator() (this=0x13897ec, __args#0=...) at /home/ci/build/per-package/cla-sysrepo/host/opt/ext-toolchain/arm-buildroot-linux-gnueabihf/include/c++/15.1.0/bits/std_function.h:593
#11 cla::appliance::bidi::DualAmp::poll (this=this@entry=0x13897d0) at /home/ci/build/build/cla-sysrepo-custom/src/appliances/BidiAmp.cpp:97
#12 0x0049c25c in cla::appliance::bidi::DualCPlus1572::poll (this=0x13897d0) at /home/ci/build/build/cla-sysrepo-custom/src/appliances/BidiAmp.cpp:176
#13 0x005991a0 in std::__invoke_impl<void, void (cla::driver::TopDriver::* const&)(), std::shared_ptr<cla::driver::TopDriver>&> (__f=<optimized out>, __t=<synthetic pointer>...) at /home/ci/build/per-package/cla-sysrepo/host/opt/ext-toolchain/arm-buildroot-linux-gnueabihf/include/c++/15.1.0/bits/invoke.h:73
#14 std::__invoke<void (cla::driver::TopDriver::* const&)(), std::shared_ptr<cla::driver::TopDriver>&> (__fn=<optimized out>) at /home/ci/build/per-package/cla-sysrepo/host/opt/ext-toolchain/arm-buildroot-linux-gnueabihf/include/c++/15.1.0/bits/invoke.h:98
#15 std::invoke<void (cla::driver::TopDriver::* const&)(), std::shared_ptr<cla::driver::TopDriver>&> (__fn=<optimized out>) at /home/ci/build/per-package/cla-sysrepo/host/opt/ext-toolchain/arm-buildroot-linux-gnueabihf/include/c++/15.1.0/functional:122
#16 operator() (__closure=<optimized out>) at /home/ci/build/build/cla-sysrepo-custom/src/drivers/Events.cpp:44
#17 std::__invoke_impl<void, cla::driver::interface::Registrations::registerTimer(std::weak_ptr<cla::driver::TopDriver>, cla::driver::driver_callback_t, cla::driver::millisecs_t) const::<lambda()>&> (__f=...) at /home/ci/build/per-package/cla-sysrepo/host/opt/ext-toolchain/arm-buildroot-linux-gnueabihf/include/c++/15.1.0/bits/invoke.h:63
#18 std::__invoke_r<void, cla::driver::interface::Registrations::registerTimer(std::weak_ptr<cla::driver::TopDriver>, cla::driver::driver_callback_t, cla::driver::millisecs_t) const::<lambda()>&> (__fn=...) at /home/ci/build/per-package/cla-sysrepo/host/opt/ext-toolchain/arm-buildroot-linux-gnueabihf/include/c++/15.1.0/bits/invoke.h:113
#19 std::_Function_handler<void(), cla::driver::interface::Registrations::registerTimer(std::weak_ptr<cla::driver::TopDriver>, cla::driver::driver_callback_t, cla::driver::millisecs_t) const::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /home/ci/build/per-package/cla-sysrepo/host/opt/ext-toolchain/arm-buildroot-linux-gnueabihf/include/c++/15.1.0/bits/std_function.h:292
#20 0xb6e312f4 in ev_invoke_pending () from /lib/libev.so.4
#21 0xb6e34d10 in ev_run () from /lib/libev.so.4
#22 0xb6a97f3c in ?? () from /lib/libstdc++.so.6
#23 0xb6872694 in ?? () from /lib/libc.so.6
Another one in veliad-hardware:
#5 0xb6aaf2f8 in std::terminate() () from /lib/libstdc++.so.6
#6 0xb6abb238 in __cxa_throw () from /lib/libstdc++.so.6
#7 0xb6c509d8 in sysrepo::throwIfError (code=code@entry=13, msg=..., c_session=<optimized out>) at /home/ci/build/build/sysrepo-cpp-custom/src/utils/exception.cpp:41
#8 0xb6c3e3a4 in sysrepo::Session::applyChanges (this=0xb3ffe7a0, timeout=...) at /home/ci/build/per-package/sysrepo-cpp/host/opt/ext-toolchain/arm-buildroot-linux-gnueabihf/include/c++/15.1.0/bits/shared_ptr_base.h:1672
#9 0x004b0428 in velia::utils::valuesPush (session=..., values=..., foreignRemovals=..., ourRemovals=...) at /home/ci/build/build/velia-custom/src/utils/sysrepo.cpp:134
#10 0x00486af4 in operator() (__closure=<optimized out>) at /home/ci/build/build/velia-custom/src/ietf-hardware/sysrepo/Sysrepo.cpp:162
#11 0xb6ae8f3c in ?? () from /lib/libstdc++.so.6
#12 0xb68c5694 in ?? () from /lib/libc.so.6
We should investigate why that D-Bus service disappeared in the first place, but apart from that, let's also take a look at all the pull operational CBs, and make them more robust.
@ares1025 reports that one box ceased to return usable data when querying the
operationalDS (either from the CLI or via RESTCONF). The initial sign of trouble appears to be a callback inveliad-systemwhich crashes with the following error:...and this BT:
This "should not happen" and we can fix this particular error, but this is only when it starts getting interesting. Since the failing thing is an "pull" operational callback, it triggers failures in other processes which try to push to the
operationalDS:...and also (as expected), there's quite some slowdown for other processes (such as this warning from
velia-health.servicewhich manages to recover just fine, though):Here's a bunch of backtraces, such as from
cla-sysrepo:Another one in
veliad-hardware:We should investigate why that D-Bus service disappeared in the first place, but apart from that, let's also take a look at all the pull operational CBs, and make them more robust.