MSR sometimes not getting Hubitat events - where to start?
-
I have MSR connected to two different Hubitat C7's, one at my house, and one at my parents house (at the other end of a VPN connection). I am sometimes finding that MSR doesn't have the current state of devices on the Hubitats. I'm not sure how to determine if the Hubitat never sent the status or MSR never received it.
For example, last night I activated a Google Assistant routine that, among other things, flips a virtual switch on my Hubitat. When that switch is turned on, Reactor make a http request that turns down the temperature on my thermostat.
This morning i noticed that the thermostat had not turned down overnight. Checking the Hubitat, I found that the virtual switch was on, but in the Reactor rule it was showing the state as 'false'.
I also have a rule that monitors whether the outdoor lights are on after sunset, because I had a few days where some of them didn't turn on when expected. I've had a few false alerts from the monitoring rule because it thought one of those lights was off when it was actually on. Again the Hubitat showed the switch in one state and MSR showed it in another (incorrect) state.
I have no idea how many status updates MSR might have missed (or Hubitat might have failed to send) for devices that wouldn't trip a monitoring rule or result in the wrong thermostat setting.
In each case, if I go to the device page in the Hubitat and click 'refresh' then Reactor gets the correct status, although the status in Hubitat doesn't change as it was already correct.
Any ideas for what I should check or where to begin? Looking in the Reactor logs wasn't helpful, as the thing I'm looking for a non-event when Reactor didn't get the switch state update. I do see the switch event in the Hubitat logs.
Is it possible to force MSR to do a full refresh, similar to what it would do on a Reactor restart, on a schedule? That should clear up any out-of-sync devices.
This has occurred across at least 3 or 4 of the most recent MSR builds, am I'm running MSR in docker on a RPi4.
-
toggledbitswrote on Dec 17, 2021, 4:23 PM last edited by toggledbits Dec 17, 2021, 12:03 PM
Can you look through your logs and find out if you have any of the following:
error on websocket to
closed during opening negotiation
closed; connection lost with
unparseable ws event
not getting server pings when expected
You may need to go back through rotated logs as well as the current.
Edit: And, you are not using
use_posturl
in your config, correct?Edit 2: And make sure all of your devices are published in MakerAPI
-
in reverse order:
Not all devices are published, but all the devices I'm having issues with are published. I have a few Google Mini speakers and one virtual motion sensor that aren't exposed to Maker API.I don't have use_posturl in any files in the config folder
I have log entries for two of the messages.. Two logs for 'not getting server pings when expected' and 7 for 'closed; connection lost with'. All refer to the second Hubitat at my parent's house. About half the logs' timestamps indicate they were caused by a daily reboot of my router, and the other half are likely times I was messing with the router and caused the VPN tunnel to restart.
So none of the logs appear to account for the issues with out-of-sync devices on my own hub.
-
Have you tried restarting MSR and specifically NOT refreshing the device on Hubitat? That would be useful to figure out if Maker API handing back correct info consistently, but for that check to be valid, nothing on the Hubitat side can change (i.e. don't do Refresh or anything else to the device).
-
I think I have... but I can't remember if that also fixed it. I'll wait for it to happen again and try the MSR restart and then report back.
-
toggledbitswrote on Dec 17, 2021, 10:10 PM last edited by toggledbits Dec 17, 2021, 5:13 PM
OK. The handling of the WebSocket (events) connection is pretty vanilla, but that may not be enough to detect every possible failure mode, so I've made some changes to make it pretty aggressive (and tunable), downright ruthless by default. I'll be curious to see if that picks up anything on your local hub. I'm sure it will give you notices on the VPN connection. Look for that later this evening.
Edit: by the way, not sure what the devices are in question here, but this popped up for me today: https://community.hubitat.com/t/2-3-0-119-c7-sending-off-to-z-wave-outlet-doing-nothing/85487
I have a ton of the affected device. I guess I have some extra work coming.
-
Build 21351 just posted. I'd like to see what happens for you on this release.
-
Since the device that got out of sync last night is a virtual switch, all the z-wave communication, polling, and refreshing isn't in play. Usually flipping the virtual device would register on MSR. Last night it didn't.
Possibly (or likely) a factor is that Google Assistant turns off every indoor light at the same time is turns on this virtual switch. While all the commands seemed to execute correctly (some of the lights are already off, so I can't tell if the off command for those failed), MSR didn't see the virtual switch change state.
This works most nights, so it may take a while to catch it failing again. Maybe I should move the 'all off' to a Hubutat or MSR rule instead of having the Google routine dump so many commands to Hubitat all at once.
I tried to update to the new version (using the recreate / pull latest image function in Portainer) and it still seems to be running 21349. I'll try again in the morning to see if it pulls the latest.
-
I got the new version. I'm getting the orange bell icon at the top with this alert:
Hubitat "hubitat_xxxxx" is reconnecting because the events feed seems to have become unresponsive.
Last 6:28:59 PM; this alert has repeated 56 times.I am only getting this from the Hubitat that is connected via VPN. I have not seen any alerts for my hub.
-
toggledbitswrote on Dec 19, 2021, 1:59 AM last edited by toggledbits Dec 18, 2021, 9:00 PM
Also read here: https://smarthome.community/post/10887
I'm not surprised that the VPN connection dies frequently. The more aggressive check will help it recover more quickly. For your own hub, let it run a while. It seems like your misses are sporadic, and so may be the zombification of the connection.
-
I'm running OpenVPN client and server on two DD-WRT routers. They are less than a mile apart and are both attached to 1 Mbps fiber internet. The other network is basically another subnet on my LAN. Ping times are ~5ms between the Pi4 running Reactor and the Hubitat, and I'm recording video on my Blue Iris server from a camera at the other end of the tunnel without any issues. The connection seems to be pretty solid.
The firewall should be allowing everything through between the two machines. Is there any particular traffic that might be getting blocked or dropped that would result in MSR detecting a loss of connection to the Hubitat? I can check the router logs to see if they show anything that might be relevant.
-
It's a WebSocket connection. The application layer protocol is report only, meaning that Hubitat sends data to Reactor, but there is nothing going from Reactor to Hubitat at the application layer. In the session layer, there is a protocol-required ping-pong. That will occur every 60 seconds. You might make sure there are no HTTP connection time lengths being enforced (WebSockets connections ride within HTTP). If that's controllable in your configuration, it should be indefinite/infinite. You might look at the
reactor.log
file (always) to see what it says about its unhappiness with the connection; that could give you a clue. -
So I think in my case it really was just quiet on the entity updates, resulting in a lot of warnings and restarts. I added the Hubitat Ping app to my hub and set it to ping it's gateway every 60 seconds. I know from running several ping jobs on my own hub that they show up as recently updated entities on MSR every time they fire. Sure enough, since starting the polling on their hub there have been no more websocket restarts or warnings.I also haven't had any out-of-sync devices yet, so I'll continue to wait until I see that again and try the MSR restart instead of the Hubitat refresh to see what the result is.
-
4/13