Throttled problem
-
Hi @toggledbits
I found this very old post describing a way to limit device readings to avoid the throttling problem. In my case it's not a question of logic: the device actually sends a lot of information. The device is the NUT UPS installed in HE.
https://smarthome.community/topic/687/flapping-device?_=1737652139854
It mentions setting update_rate_limit in the engine section of reactor.yaml, but I looked in the current MSR documentation and can't find this setting, so I don't know whether it's still valid or what its effect and parameters are.
My situation is simple: when I have a UPS problem, NUT sends dozens of reports per second, and then I get the throttling warning. The same rule applies when the power is normal.
This is the rule, and the parameter that fails is the Tripp Lite UPS status.
All of the errors follow the same pattern.
[latest-25016]2025-01-23T12:01:32.753Z <Rule:WARN> (13) NUT Disconected (rule-l4djr0p7 in Warning) update rate 121/min exceeds limit (120/min)! Logic loop? Throttl>
[latest-25016]2025-01-23T12:01:32.756Z <Rule:WARN> (27) Falta de Energia (rule-l4h9ceod in Warning) update rate 121/min exceeds limit (120/min)! Logic loop? Thrott>
[latest-25016]2025-01-23T12:01:32.769Z <Rule:WARN> (73) UPS Battery Low (rule-l4hj850o in Warning) update rate 121/min exceeds limit (120/min)! Logic loop? Throttl>
[latest-25016]2025-01-23T12:01:32.772Z <Rule:WARN> (74) UPS Comm Fail (rule-l4kbs5cp in Warning) update rate 121/min exceeds limit (120/min)! Logic loop? Throttlin>
[latest-25016]2025-01-23T12:01:32.776Z <Rule:WARN> (76) UPS Utility Back (rule-l4hjhs6m in Warning) update rate 121/min exceeds limit (120/min)! Logic loop? Thrott>
[latest-25016]2025-01-23T12:01:32.780Z <Rule:WARN> UPS On Battery (rule-l4hjuka5 in Datacenter) update rate 121/min exceeds limit (120/min)! Logic loop? Throttling>
[latest-25016]2025-01-23T12:01:32.781Z <Rule:WARN> UPS Info (rule-l4gheo63 in Datacenter) update rate 121/min exceeds limit (120/min)! Logic loop? Throttling...
[latest-25016]2025-01-23T12:01:40.757Z <Rule:WARN> (13) NUT Disconected (rule-l4djr0p7 in Warning) update rate 121/min exceeds limit (120/min)! Logic loop? Throttl>
[latest-25016]2025-01-23T12:01:40.759Z <Rule:WARN> (27) Falta de Energia (rule-l4h9ceod in Warning) update rate 121/min exceeds limit (120/min)! Logic loop? Thrott>
[latest-25016]2025-01-23T12:01:40.776Z <Rule:WARN> (73) UPS Battery Low (rule-l4hj850o in Warning) update rate 121/min exceeds limit (120/min)! Logic loop? Throttl>
[latest-25016]2025-01-23T12:01:40.777Z <Rule:WARN> (74) UPS Comm Fail (rule-l4kbs5cp in Warning) update rate 121/min exceeds limit (120/min)! Logic loop? Throttlin>
[latest-25016]2025-01-23T12:01:40.778Z <Rule:WARN> (76) UPS Utility Back (rule-l4hjhs6m in Warning) update rate 121/min exceeds limit (120/min)! Logic loop? Thrott>
Thanks.
-
@wmarcolin search the manual for "time series"
-
toggledbits wrote on Jan 24, 2025, 7:30 PM last edited by toggledbits Jan 24, 2025, 2:32 PM
when I have a UPS problem the NUT is sending dozens of reports per second
What's the UPS problem? Maybe that could be addressed in NUTController if it's some odd failure mode of NUT that I can detect?
What you likely don't want to do is allow the issue to place an even larger load on the system by increasing the throttling limit.
-
@tunnus hi!
I like what I've read, I think it could be a solution, but it's still not very clear how to configure it, since I want to save various attributes of the device.
I'll try it out and post the result here for you, but it could be a way forward.
Thank you.
-
@toggledbits hi!
This NUT UPS issue is something I've been unable to get working perfectly for months, if not a year. I even posted a message to you recently.
I tried to use an old solution from the HE community instead of the MSR solution. The information it returns is much simpler, and I had to add attributes to the driver, but the problem turns out to be the same: NUT stops working, and the communication between the UPS and the VM where NUT runs is not stable. The HE upsd driver result: Stale data.
dev:157 2025-01-24 08:21:16.586 PM info connected to upsd on 192.168.50.8:3493 - monitoring TrippLite every 30 seconds
dev:157 2025-01-24 08:21:01.183 PM info connected to upsd on 192.168.50.8:3493 - monitoring TrippLite every 30 seconds
dev:157 2025-01-24 08:20:31.149 PM error telnet connect error: java.net.ConnectException: Connection refused (Connection refused)
dev:157 2025-01-24 08:20:01.110 PM info disconnected from upsd
dev:157 2025-01-24 08:20:00.082 PM error telnet status: send error: Broken pipe (Write failed)
dev:157 2025-01-24 08:11:15.553 PM info connected to upsd on 192.168.50.8:3493 - monitoring TrippLite every 30 seconds
dev:157 2025-01-24 08:10:45.514 PM error telnet connect error: java.net.ConnectException: Connection refused (Connection refused)
dev:157 2025-01-24 08:10:30.421 PM error telnet connect error: java.net.ConnectException: Connection refused (Connection refused)
dev:157 2025-01-24 08:10:00.277 PM info disconnected from upsd
dev:157 2025-01-24 08:10:00.252 PM error telnet status: receive error: Stream is closed
dev:157 2025-01-24 08:10:00.228 PM warn upsd: Stale data
dev:157 2025-01-24 08:10:00.216 PM warn upsd: Stale data
dev:157 2025-01-24 08:10:00.203 PM warn upsd: Stale data
dev:157 2025-01-24 08:10:00.122 PM warn upsd: Stale data
dev:157 2025-01-24 08:10:00.111 PM warn upsd: Stale data
dev:157 2025-01-24 08:10:00.098 PM warn upsd: Stale data
dev:157 2025-01-24 08:10:00.069 PM warn upsd: Stale data
dev:157 2025-01-24 08:09:30.112 PM warn upsd: Stale data
dev:157 2025-01-24 08:09:30.101 PM warn upsd: Stale data
In other words, after switching from MSR to HE control, the problem continues, changing the error Throttled to ECONNREFUSED.
For a new test, I returned to MSR's NUTController configuration below, with the same parameters as the HE driver.
- id: nut
  enabled: true
  implementation: NUTController
  name: NUT UPS Controller
  config:
    server: 192.168.50.8  # modify the IP address as needed
    port: 3493  # optional, default shown
    username: "Reactor"  # optional, no user auth if not set
    password: "Mac1967"  # optional, must be specified if username is used
The error.log on MSR shows:
[latest-25016]2025-01-25T02:10:03.776Z <NUTController:ERR> NUTController#nut unable to establish communication with "192.168.50.8":3493: [Error] connect ECONNREFUSED 192.168.50.8:3493 [-]
[latest-25016]2025-01-25T02:10:05.800Z <NUTController:ERR> NUTController#nut unable to establish communication with "192.168.50.8":3493: [Error] connect ECONNREFUSED 192.168.50.8:3493 [-]
[latest-25016]2025-01-25T02:10:07.807Z <NUTController:ERR> NUTController#nut unable to establish communication with "192.168.50.8":3493: [Error] connect ECONNREFUSED 192.168.50.8:3493 [-]
[latest-25016]2025-01-25T02:10:07.815Z <Controller:ERR> Controller NUTController#nut is off-line!
[latest-25016]2025-01-25T02:10:09.907Z <NUTController:ERR> NUTController#nut unable to establish communication with "192.168.50.8":3493: [Error] connect ECONNREFUSED 192.168.50.8:3493 [-]
[latest-25016]2025-01-25T02:10:11.924Z <NUTController:ERR> NUTController#nut unable to establish communication with "192.168.50.8":3493: [Error] connect ECONNREFUSED 192.168.50.8:3493 [-]
[latest-25016]2025-01-25T02:10:13.927Z <NUTController:ERR> NUTController#nut unable to establish communication with "192.168.50.8":3493: [Error] connect ECONNREFUSED 192.168.50.8:3493 [-]
[latest-25016]2025-01-25T02:10:15.939Z <NUTController:ERR> NUTController#nut unable to establish communication with "192.168.50.8":3493: [Error] connect ECONNREFUSED 192.168.50.8:3493 [-]
[latest-25016]2025-01-25T02:10:18.124Z <NUTController:ERR> NUTController#nut unable to establish communication with "192.168.50.8":3493: [Error] connect ECONNREFUSED 192.168.50.8:3493 [-]
[latest-25016]2025-01-25T02:10:20.128Z <NUTController:ERR> NUTController#nut unable to establish communication with "192.168.50.8":3493: [Error] connect ECONNREFUSED 192.168.50.8:3493 [-]
[latest-25016]2025-01-25T02:10:22.150Z <NUTController:ERR> NUTController#nut unable to establish communication with "192.168.50.8":3493: [Error] connect ECONNREFUSED 192.168.50.8:3493 [-]
[latest-25016]2025-01-25T02:10:24.172Z <NUTController:ERR> NUTController#nut unable to establish communication with "192.168.50.8":3493: [Error] connect ECONNREFUSED 192.168.50.8:3493 [-]
[latest-25016]2025-01-25T02:10:26.182Z <NUTController:ERR> NUTController#nut unable to establish communication with "192.168.50.8":3493: [Error] connect ECONNREFUSED 192.168.50.8:3493 [-]
IMPORTANT, even with this error log, the NUT control in the MSR does not stop working, it persists until it manages to communicate and collect the data, but this error scenario overloads the MSR.
Perhaps the time series solution indicated by @tunnus could help.
In addition, here are parts of the reactor.log:
[latest-25016]2025-01-25T02:06:46.777Z <Structure:INFO> Structure#1 loading controller interface nut (NUTController)
[latest-25016]2025-01-25T02:06:46.779Z <NUTController:null> Module NUTController v24303
[latest-25016]2025-01-25T02:06:46.779Z <Controller:INFO> Loaded NUTController version "0.1.24303"; Patrick Rigney/Kedron Holdings LLC <patrick@toggledbits.com> https://reactor.toggledbits.com/docs/NUTC> [
. .
[latest-25016]2025-01-25T02:06:46.847Z <Structure:INFO> Starting controller NUTController#nut
. .
[latest-25016]2025-01-25T02:06:47.394Z <Engine:INFO> [Engine]Engine#1 master timer tick, local time "1/24/2025 9:06:47 PM" (TZ offset -300 mins from UTC)
[latest-25016]2025-01-25T02:06:47.395Z <NUTController:NOTICE> NUTController#nut starting NUT client with "192.168.50.8":3493; waiting for ready...
[latest-25016]2025-01-25T02:06:47.397Z <HubitatController:NOTICE> HubitatController#hubitatC7 performing initial connection and inventory
[latest-25016]2025-01-25T02:06:47.398Z <HubitatController:NOTICE> HubitatController#hubitatC8 performing initial connection and inventory
[latest-25016]2025-01-25T02:06:47.405Z <NUTController:INFO> NUTController#nut connected
[latest-25016]2025-01-25T02:06:47.455Z <NUTController:INFO> NUTController#nut setting client username (Reactor) and password
[latest-25016]2025-01-25T02:06:47.460Z <wsapi:INFO> wsapi: connection from ::ffff:192.168.50.9
[latest-25016]2025-01-25T02:06:47.511Z <Controller:NOTICE> Controller NUTController#nut is now online.
[latest-25016]2025-01-25T02:06:47.511Z <NUTController:INFO> NUTController#nut client ready
[latest-25016]2025-01-25T02:06:47.516Z <wsapi:INFO> wsapi: connection from ::ffff:192.168.50.133
. .
[latest-25016]2025-01-25T02:10:03.774Z <NUTController:NOTICE> NUTController#nut starting NUT client with "192.168.50.8":3493; waiting for ready...
[latest-25016]2025-01-25T02:10:03.776Z <NUTController:ERR> NUTController#nut unable to establish communication with "192.168.50.8":3493: [Error] connect ECONNREFUSED 192.168.50.8:3493 [-]
[latest-25016]2025-01-25T02:10:03.776Z <NUTController:INFO> NUTController#nut recycling/reconnecting in 2000ms (1 fails)
[latest-25016]2025-01-25T02:10:03.777Z <NUTController:NOTICE> NUTController#nut connection closed
[latest-25016]2025-01-25T02:10:03.777Z <NUTController:INFO> NUTController#nut recycling/reconnecting in 2000ms (1 fails)
. .
[latest-25016]2025-01-25T02:10:05.799Z <NUTController:NOTICE> NUTController#nut starting NUT client with "192.168.50.8":3493; waiting for ready...
[latest-25016]2025-01-25T02:10:05.800Z <NUTController:ERR> NUTController#nut unable to establish communication with "192.168.50.8":3493: [Error] connect ECONNREFUSED 192.168.50.8:3493 [-]
[latest-25016]2025-01-25T02:10:05.801Z <NUTController:INFO> NUTController#nut recycling/reconnecting in 2000ms (2 fails)
[latest-25016]2025-01-25T02:10:05.801Z <NUTController:NOTICE> NUTController#nut connection closed
[latest-25016]2025-01-25T02:10:05.802Z <NUTController:INFO> NUTController#nut recycling/reconnecting in 2000ms (2 fails)
[latest-25016]2025-01-25T02:10:07.803Z <NUTController:NOTICE> NUTController#nut starting NUT client with "192.168.50.8":3493; waiting for ready...
[latest-25016]2025-01-25T02:10:07.807Z <NUTController:ERR> NUTController#nut unable to establish communication with "192.168.50.8":3493: [Error] connect ECONNREFUSED 192.168.50.8:3493 [-]
[latest-25016]2025-01-25T02:10:07.815Z <Controller:ERR> Controller NUTController#nut is off-line!
[latest-25016]2025-01-25T02:10:07.816Z <NUTController:INFO> NUTController#nut recycling/reconnecting in 2000ms (3 fails)
[latest-25016]2025-01-25T02:10:07.829Z <NUTController:NOTICE> NUTController#nut connection closed
[latest-25016]2025-01-25T02:10:07.830Z <NUTController:INFO> NUTController#nut recycling/reconnecting in 2000ms (3 fails)
IN SUMMARY:
- The connection parameters (IP, login and password) are correct; the log shows that it connects.
- NUTController works until it starts a sequence of failures.
- To re-establish the service, I put a stop-and-start job in the crontab every 10 minutes. The problem is that NUT can sit failed and idle for that whole interval.
- I've tried changing all the synchronization and reading times in the NUT parameters, and I've even increased them to 90 seconds for data collection (i.e. POLLFREQALERT 30 // HOSTSYNC 30), but it doesn't change the scenario: at a certain point there's saturation and the UPS stops responding.
So I think the idea of making the MSR have a different collection time, longer than other devices, can at least reduce the error log, but it doesn't solve the problem.
Thanks.
-
@wmarcolin here's something I've used for a sensor that updates too frequently:
id: virtual4b
name: "Lay-Z-Spa temp"
capabilities:
  temperature_sensor:
    attributes:
      value:
        model: time series
        entity: "mqtt>layzspa_states"
        attribute: "temperature_sensor.value"
        interval: 1  # minutes
        retention: 1  # minutes
        aggregate: sma
        precision: 0
primary_attribute: temperature_sensor.value
type: ValueSensor
Now I can use this virtual entity instead of a real one in a rule without throttling or logging problems (log files rotating too often etc)
-
Somewhere between...
[latest-25016]2025-01-25T02:06:47.511Z <NUTController:INFO> NUTController#nut client ready
and
[latest-25016]2025-01-25T02:10:03.774Z <NUTController:NOTICE> NUTController#nut starting NUT client with "192.168.50.8":3493; waiting for ready...
is another message I need to see. NUTController will log the reason for a disconnect prior to reconnecting.
Using a time series is a band-aid that may mask the symptom but won't fix the problem. I don't recommend it, because it really doesn't reduce the system load -- it still needs to handle the change notifications, it's just doing it in a different subsystem.
-
@wmarcolin pull the 25026 release of NUTController. I made some changes to the recovery discipline and added more info messages. Let's see if it helps us get to the bottom of this.
-
@toggledbits download and test now.
-
Hi @toggledbits !!
I installed the new version, and at the same time created a log file specifically for NUTController, below is what I captured in the first 20 minutes, remembering that every 10 minutes I stop and restart the service to mitigate the problem.
[latest-25016]2025-01-27T18:42:31.243Z <NUTController:null> Module NUTController v25026
[latest-25016]2025-01-27T18:42:31.840Z <NUTController:NOTICE> NUTController#nut starting NUT client with "192.168.50.8":3493; waiting for ready...
[latest-25016]2025-01-27T18:42:31.848Z <NUTController:INFO> NUTController#nut connected to "192.168.50.8":3493
[latest-25016]2025-01-27T18:42:31.848Z <NUTController:INFO> NUTController#nut setting client username (Reactor) and password
[latest-25016]2025-01-27T18:42:31.887Z <NUTController:INFO> NUTController#nut client ready after authentication
[latest-25016]2025-01-27T18:42:31.960Z <NUTController:INFO> NUTController#nut initial query succeeded with 1 items returned
[latest-25016]2025-01-27T18:42:32.007Z <NUTController:INFO> NUTController#nut initializing TrippLite
[latest-25016]2025-01-27T18:50:01.750Z <NUTController:NOTICE> NUTController#nut connection closed; attempting reconnect...
[latest-25016]2025-01-27T18:50:01.808Z <NUTController:NOTICE> NUTController#nut starting NUT client with "192.168.50.8":3493; waiting for ready...
[latest-25016]2025-01-27T18:50:01.810Z <NUTController:ERR> NUTController#nut unable to establish communication with "192.168.50.8":3493: [Error] connect ECONNREFUSED 192.168.50.8:3493 [-]
[latest-25016]2025-01-27T18:50:01.811Z <NUTController:INFO> NUTController#nut recycling/reconnecting in 2000ms (1 fails)
[latest-25016]2025-01-27T18:50:03.811Z <NUTController:NOTICE> NUTController#nut starting NUT client with "192.168.50.8":3493; waiting for ready...
[latest-25016]2025-01-27T18:50:03.812Z <NUTController:ERR> NUTController#nut unable to establish communication with "192.168.50.8":3493: [Error] connect ECONNREFUSED 192.168.50.8:3493 [-]
[latest-25016]2025-01-27T18:50:03.812Z <NUTController:INFO> NUTController#nut recycling/reconnecting in 2000ms (2 fails)
[latest-25016]2025-01-27T18:50:05.897Z <NUTController:NOTICE> NUTController#nut starting NUT client with "192.168.50.8":3493; waiting for ready...
[latest-25016]2025-01-27T18:50:05.898Z <NUTController:ERR> NUTController#nut unable to establish communication with "192.168.50.8":3493: [Error] connect ECONNREFUSED 192.168.50.8:3493 [-]
[latest-25016]2025-01-27T18:50:05.911Z <NUTController:INFO> NUTController#nut recycling/reconnecting in 2000ms (3 fails)
[... the same failure repeats for fails 4 through 28 ...]
[latest-25016]2025-01-27T18:50:58.732Z <NUTController:NOTICE> NUTController#nut starting NUT client with "192.168.50.8":3493; waiting for ready...
[latest-25016]2025-01-27T18:50:58.733Z <NUTController:ERR> NUTController#nut unable to establish communication with "192.168.50.8":3493: [Error] connect ECONNREFUSED 192.168.50.8:3493 [-]
[latest-25016]2025-01-27T18:50:58.734Z <NUTController:INFO> NUTController#nut recycling/reconnecting in 2000ms (29 fails)
[latest-25016]2025-01-27T18:51:00.740Z <NUTController:NOTICE> NUTController#nut starting NUT client with "192.168.50.8":3493; waiting for ready...
[latest-25016]2025-01-27T18:51:00.741Z <NUTController:ERR> NUTController#nut unable to establish communication with "192.168.50.8":3493: [Error] connect ECONNREFUSED 192.168.50.8:3493 [-]
[latest-25016]2025-01-27T18:51:00.742Z <NUTController:INFO> NUTController#nut recycling/reconnecting in 2000ms (30 fails)
[latest-25016]2025-01-27T18:51:02.762Z <NUTController:NOTICE> NUTController#nut starting NUT client with "192.168.50.8":3493; waiting for ready...
[latest-25016]2025-01-27T18:51:02.764Z <NUTController:INFO> NUTController#nut connected to "192.168.50.8":3493
[latest-25016]2025-01-27T18:51:02.764Z <NUTController:INFO> NUTController#nut setting client username (Reactor) and password
[latest-25016]2025-01-27T18:51:02.764Z <NUTController:INFO> NUTController#nut client ready after authentication
[latest-25016]2025-01-27T18:51:02.826Z <NUTController:INFO> NUTController#nut initial query succeeded with 1 items returned
[latest-25016]2025-01-27T18:51:02.875Z <NUTController:INFO> NUTController#nut initializing TrippLite
[latest-25016]2025-01-27T19:00:01.389Z <NUTController:NOTICE> NUTController#nut connection closed; attempting reconnect...
[latest-25016]2025-01-27T19:00:01.440Z <NUTController:NOTICE> NUTController#nut starting NUT client with "192.168.50.8":3493; waiting for ready...
[latest-25016]2025-01-27T19:00:01.440Z <NUTController:ERR> NUTController#nut unable to establish communication with "192.168.50.8":3493: [Error] connect ECONNREFUSED 192.168.50.8:3493 [-]
[latest-25016]2025-01-27T19:00:01.441Z <NUTController:INFO> NUTController#nut recycling/reconnecting in 2000ms (1 fails)
[latest-25016]2025-01-27T19:00:03.441Z <NUTController:NOTICE> NUTController#nut starting NUT client with "192.168.50.8":3493; waiting for ready...
[latest-25016]2025-01-27T19:00:03.444Z <NUTController:ERR> NUTController#nut unable to establish communication with "192.168.50.8":3493: [Error] connect ECONNREFUSED 192.168.50.8:3493 [-]
[latest-25016]2025-01-27T19:00:03.445Z <NUTController:INFO> NUTController#nut recycling/reconnecting in 2000ms (2 fails)
[latest-25016]2025-01-27T19:00:05.454Z <NUTController:NOTICE> NUTController#nut starting NUT client with "192.168.50.8":3493; waiting for ready...
[latest-25016]2025-01-27T19:00:05.455Z <NUTController:ERR> NUTController#nut unable to establish communication with "192.168.50.8":3493: [Error] connect ECONNREFUSED 192.168.50.8:3493 [-]
[latest-25016]2025-01-27T19:00:05.462Z <NUTController:INFO> NUTController#nut recycling/reconnecting in 2000ms (3 fails)
[... the same failure repeats for fails 4 through 28 ...]
[latest-25016]2025-01-27T19:00:58.245Z <NUTController:NOTICE> NUTController#nut starting NUT client with "192.168.50.8":3493; waiting for ready...
[latest-25016]2025-01-27T19:00:58.246Z <NUTController:ERR> NUTController#nut unable to establish communication with "192.168.50.8":3493: [Error] connect ECONNREFUSED 192.168.50.8:3493 [-]
[latest-25016]2025-01-27T19:00:58.246Z <NUTController:INFO> NUTController#nut recycling/reconnecting in 2000ms (29 fails)
[latest-25016]2025-01-27T19:01:00.246Z <NUTController:NOTICE> NUTController#nut starting NUT client with "192.168.50.8":3493; waiting for ready...
[latest-25016]2025-01-27T19:01:00.248Z <NUTController:ERR> NUTController#nut unable to establish communication with "192.168.50.8":3493: [Error] connect ECONNREFUSED 192.168.50.8:3493 [-]
[latest-25016]2025-01-27T19:01:00.248Z <NUTController:INFO> NUTController#nut recycling/reconnecting in 2000ms (30 fails)
[latest-25016]2025-01-27T19:01:02.250Z <NUTController:NOTICE> NUTController#nut starting NUT client with "192.168.50.8":3493; waiting for ready...
[latest-25016]2025-01-27T19:01:02.250Z <NUTController:INFO> NUTController#nut connected to "192.168.50.8":3493
[latest-25016]2025-01-27T19:01:02.250Z <NUTController:INFO> NUTController#nut setting client username (Reactor) and password
[latest-25016]2025-01-27T19:01:02.251Z <NUTController:INFO> NUTController#nut client ready after authentication
[latest-25016]2025-01-27T19:01:02.310Z <NUTController:INFO> NUTController#nut initial query succeeded with 1 items returned
[latest-25016]2025-01-27T19:01:02.367Z <NUTController:INFO> NUTController#nut initializing TrippLite
-
toggledbits replied to wmarcolin on Jan 27, 2025, 8:47 PM last edited by toggledbits Jan 27, 2025, 3:48 PM
@wmarcolin said in Throttled problem:
remembering that every 10 minutes I stop and restart the service to mitigate the problem.
OK. So basically, the log is showing exactly that. It's showing that every 10 minutes, the NUT service is closing the connection, and the service takes 60 seconds, almost exactly, to restart before NUTController can connect to it.
Try turning off the 10 minute restart and see what it does.
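For anyone who wants to check the "60 seconds, almost exactly" figure against the log, here is a minimal sketch that subtracts two of the timestamps shown above. The function names `tod_seconds` and `outage_seconds` are made up for this illustration, and the arithmetic assumes both timestamps fall on the same UTC day and ignores fractional seconds.

```shell
#!/bin/bash
# Hypothetical helpers, not part of Reactor or NUT.

# Convert the time-of-day part of an ISO 8601 timestamp to whole seconds.
tod_seconds() {
  local t="${1#*T}"   # drop the date part, e.g. 18:50:01.750Z
  t="${t%Z}"          # drop a trailing Z if present
  t="${t%%.*}"        # drop fractional seconds, e.g. 18:50:01
  local h m s
  IFS=: read -r h m s <<< "$t"
  # 10# forces base 10 so values like 08 and 09 are not read as octal
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Seconds between two timestamps on the same day (assumption: no midnight rollover).
outage_seconds() {
  echo $(( $(tod_seconds "$2") - $(tod_seconds "$1") ))
}

# "connection closed" and the next "connected to" entries from the first restart cycle
outage_seconds "2025-01-27T18:50:01.750Z" "2025-01-27T18:51:02.764Z"   # prints 61
```

The second cycle in the log (19:00:01.389Z to 19:01:02.250Z) gives the same result, which fits a restart that takes about a minute each time.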
-
I removed the crontab stop and restart process with a set time, and then restarted the MSR to follow the new logs, below.
[latest-25016]2025-01-28T14:28:58.785Z <NUTController:null> Module NUTController v25026
[latest-25016]2025-01-28T14:28:59.438Z <NUTController:NOTICE> NUTController#nut starting NUT client with "192.16>
[latest-25016]2025-01-28T14:28:59.444Z <NUTController:INFO> NUTController#nut connected to "192.168.50.8":3493
[latest-25016]2025-01-28T14:28:59.444Z <NUTController:INFO> NUTController#nut setting client username (Reactor) >
[latest-25016]2025-01-28T14:28:59.544Z <NUTController:INFO> NUTController#nut client not yet ready; waiting
[latest-25016]2025-01-28T14:28:59.544Z <NUTController:INFO> NUTController#nut client ready after authentication
[latest-25016]2025-01-28T14:29:00.631Z <NUTController:INFO> NUTController#nut initial query succeeded with 1 ite>
[latest-25016]2025-01-28T14:29:00.688Z <NUTController:INFO> NUTController#nut initializing TrippLite
[latest-25016]2025-01-28T15:04:17.580Z <NUTController:ERR> NUTController#nut failed to get detail for TrippLite:>
[latest-25016]2025-01-28T15:04:17.580Z <NUTController:CRIT> !DATA-STALE
[latest-25016]2025-01-28T15:04:27.602Z <NUTController:ERR> NUTController#nut failed to get detail for TrippLite:>
[latest-25016]2025-01-28T15:04:27.603Z <NUTController:CRIT> !DATA-STALE
[latest-25016]2025-01-28T15:04:37.610Z <NUTController:ERR> NUTController#nut failed to get detail for TrippLite:>
[latest-25016]2025-01-28T15:04:37.610Z <NUTController:CRIT> !DATA-STALE
[latest-25016]2025-01-28T15:04:47.618Z <NUTController:ERR> NUTController#nut failed to get detail for TrippLite:>
[latest-25016]2025-01-28T15:04:47.619Z <NUTController:CRIT> !DATA-STALE
[latest-25016]2025-01-28T15:04:57.682Z <NUTController:ERR> NUTController#nut failed to get detail for TrippLite:>
[latest-25016]2025-01-28T15:04:57.682Z <NUTController:CRIT> !DATA-STALE
[latest-25016]2025-01-28T15:05:07.687Z <NUTController:ERR> NUTController#nut failed to get detail for TrippLite:>
[latest-25016]2025-01-28T15:05:07.687Z <NUTController:CRIT> !DATA-STALE
[latest-25016]2025-01-28T15:05:18.056Z <NUTController:ERR> NUTController#nut failed to get detail for TrippLite:>
[latest-25016]2025-01-28T15:05:18.056Z <NUTController:CRIT> !DATA-STALE
[latest-25016]2025-01-28T15:05:28.059Z <NUTController:ERR> NUTController#nut failed to get detail for TrippLite:>
[latest-25016]2025-01-28T15:05:28.059Z <NUTController:CRIT> !DATA-STALE
[latest-25016]2025-01-28T15:05:38.068Z <NUTController:ERR> NUTController#nut failed to get detail for TrippLite:>
[latest-25016]2025-01-28T15:05:38.068Z <NUTController:CRIT> !DATA-STALE
[... DATA-STALE from here on ...]
[latest-25016]2025-01-28T16:59:13.198Z <NUTController:CRIT> !DATA-STALE
[latest-25016]2025-01-28T16:59:23.199Z <NUTController:ERR> NUTController#nut failed to get detail for TrippLite:>
[latest-25016]2025-01-28T16:59:23.200Z <NUTController:CRIT> !DATA-STALE
[latest-25016]2025-01-28T16:59:33.202Z <NUTController:ERR> NUTController#nut failed to get detail for TrippLite:>
[latest-25016]2025-01-28T16:59:33.203Z <NUTController:CRIT> !DATA-STALE
[latest-25016]2025-01-28T16:59:43.213Z <NUTController:ERR> NUTController#nut failed to get detail for TrippLite:>
[latest-25016]2025-01-28T16:59:43.214Z <NUTController:CRIT> !DATA-STALE
[latest-25016]2025-01-28T16:59:53.227Z <NUTController:ERR> NUTController#nut failed to get detail for TrippLite:>
[latest-25016]2025-01-28T16:59:53.227Z <NUTController:CRIT> !DATA-STALE
[latest-25016]2025-01-28T17:00:03.330Z <NUTController:ERR> NUTController#nut failed to get detail for TrippLite:>
[latest-25016]2025-01-28T17:00:03.330Z <NUTController:CRIT> !DATA-STALE
[... nothing changed, so I restarted the NUT services ...]
#systemctl restart nut-driver.service
#systemctl restart nut-server.service
#systemctl restart nut-monitor.service
[latest-25016]2025-01-28T17:00:13.332Z <NUTController:ERR> NUTController#nut failed to get detail for TrippLite:>
[latest-25016]2025-01-28T17:00:13.332Z <NUTController:CRIT> !DATA-STALE
[latest-25016]2025-01-28T17:00:23.336Z <NUTController:ERR> NUTController#nut failed to get detail for TrippLite:>
[latest-25016]2025-01-28T17:00:23.336Z <NUTController:CRIT> !DATA-STALE
[latest-25016]2025-01-28T17:00:33.362Z <NUTController:ERR> NUTController#nut failed to get detail for TrippLite:>
[latest-25016]2025-01-28T17:00:33.363Z <NUTController:CRIT> !DATA-STALE
[latest-25016]2025-01-28T17:00:43.414Z <NUTController:ERR> NUTController#nut failed to get detail for TrippLite:>
[latest-25016]2025-01-28T17:00:43.414Z <NUTController:CRIT> !DRIVER-NOT-CONNECTED
[latest-25016]2025-01-28T17:00:44.544Z <NUTController:NOTICE> NUTController#nut connection closed; attempting re>
[latest-25016]2025-01-28T17:00:44.596Z <NUTController:NOTICE> NUTController#nut starting NUT client with "192.16>
[latest-25016]2025-01-28T17:00:44.597Z <NUTController:INFO> NUTController#nut connected to "192.168.50.8":3493
[latest-25016]2025-01-28T17:00:44.597Z <NUTController:INFO> NUTController#nut setting client username (Reactor) >
[latest-25016]2025-01-28T17:00:44.598Z <NUTController:INFO> NUTController#nut client ready after authentication
[latest-25016]2025-01-28T17:00:44.648Z <NUTController:INFO> NUTController#nut initial query succeeded with 1 ite>
[latest-25016]2025-01-28T17:00:54.705Z <NUTController:INFO> NUTController#nut initial query succeeded with 1 ite>
[latest-25016]2025-01-28T17:00:54.748Z <NUTController:INFO> NUTController#nut initializing TrippLite
[latest-25016]2025-01-28T17:28:37.934Z <NUTController:ERR> NUTController#nut failed to get detail for TrippLite:>
[latest-25016]2025-01-28T17:28:37.934Z <NUTController:CRIT> !DATA-STALE
[latest-25016]2025-01-28T17:28:47.942Z <NUTController:ERR> NUTController#nut failed to get detail for TrippLite:>
[latest-25016]2025-01-28T17:28:47.942Z <NUTController:CRIT> !DATA-STALE
[latest-25016]2025-01-28T17:28:57.945Z <NUTController:ERR> NUTController#nut failed to get detail for TrippLite:>
[latest-25016]2025-01-28T17:28:57.946Z <NUTController:CRIT> !DATA-STALE
[latest-25016]2025-01-28T17:29:07.959Z <NUTController:ERR> NUTController#nut failed to get detail for TrippLite:>
[latest-25016]2025-01-28T17:29:07.960Z <NUTController:CRIT> !DATA-STALE
[... but after 8min DATA-STALE again ...]
When it failed I consulted upsc directly and the answer I got was the one below.
root@main:/home/wilson/reactor/logs# upsc TrippLite@192.168.50.8
Init SSL without certificate database
Error: Data stale
root@main:/home/wilson/reactor/logs#
Importantly, in MSR the NUT service does not crash, but it stops updating.
See that the device goes into an uninformed/excluded status.
I don't think the problem lies with MSR/NUTController; it's a matter of how NUT itself manages the UPS. I've been in this situation for more than a year and I can't get out of the problem loop, so some time ago I adopted the stop-and-restart crontab job to alleviate the problem.
-
Looking for a solution, I implemented this script.
#!/bin/bash
# Name of the UPS as configured in /etc/nut/ups.conf
UPS_NAME="TrippLite"

# Check the UPS status. upsc reports the failure as "Error: Data stale"
# (see the upsc output above), so match that text case-insensitively;
# 2>&1 also captures stderr in case the error is printed there.
if upsc "$UPS_NAME@localhost" 2>&1 | grep -qi "data stale"; then
    echo "$(date): Stale data detected. Restarting the NUT services." >> /var/log/monitor_nut.log
    systemctl restart nut-driver.service
    systemctl restart nut-server.service
    systemctl restart nut-monitor.service
else
    echo "$(date): Service running normally." >> /var/log/monitor_nut.log
fi
We'll see if it eases the problem, but here's the situation with the NUT.
-
toggledbits wrote on Jan 29, 2025, 2:51 AM last edited by toggledbits Jan 28, 2025, 9:52 PM
OK. That's enough info to determine it's a NUT problem (or driver problem), at least.
You may want to look at this. You should be troubleshooting at the Network UPS Tools level, looking at your (operating) system log files. Restarting the service isn't going to fix it.
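As one possible starting point for that log review, here is a small sketch that narrows system log output down to lines hinting at a lost or stale UPS link. The function name `filter_nut_trouble` and its keyword list are assumptions made for this illustration, not anything defined by NUT; the unit names are the ones from the restart commands earlier in the thread.

```shell
#!/bin/bash
# Hypothetical helper: keep only lines that hint at a lost or stale UPS link.
# The keyword list is an assumption; extend it to match what your logs show.
filter_nut_trouble() {
  grep -Ei 'stale|lost|disconnect|refused'
}

# Usage on a systemd host, reading the last hour of the three NUT units:
#   journalctl --no-pager --since "1 hour ago" \
#     -u nut-driver.service -u nut-server.service -u nut-monitor.service \
#     | filter_nut_trouble
```

Comparing the timestamps of whatever this turns up against the first !DATA-STALE entry in the NUTController log may help show whether the driver lost the UPS before upsd started reporting stale data.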