I've managed to use MSR UI on iOS devices to some degree*, so that although UI elements (e.g. rule sets) are not visible in portrait mode, you've seen them in landscape. Now with recents builds (24302) this does not work anymore, elements (rule sets, entities) are not anymore visible in landscape mode.
Does anyone have similar experiences? Using iOS 18 and Safari/Chrome browser.
( *Drag & drop of rule conditions have never worked on a mobile)
Hi @toggledbits,
I have lots of logs with this:
<Engine:ERR> Assignment to alarm ignored -- expression-driven global cannot be set by assignmentAny hints to where look at to avoid this? Thanks.
Hi @toggledbits
I'd like to update my controllers with these new features, but I'm struggling to find any guidance in the docs - and in general to understand the context.
Could you please elaborate more? Thanks.
I have the following ACL defined:
groups: admin: users: - admin applications: true api_acls: # This ACL allows users in the "admin" group to access the API - url: "/api" group: admin allow: true log: true # This ACL allows anyone/thing to access the /api/v1/alive API endpoint - url: "/api/v1/alive" allow: trueAnd I have authenticated to MSR as "admin" user. However, I'm getting "access denied" when trying to access http://*******:8111/api/v1/log
So what I'm missing, is my ACL incorrectly defined?
Using build 24302 on Docker.
Thanks to @toggledbits for adding a custom CSS. I've started doing a darker Reactor style.
Here's the file: https://gist.github.com/dbochicchio/825098ac13b7f8cac22012eae37ff7ce
A couple of things are still too bright and I'll eventually catch-up. Just place it under your /config directory, naming the file as customstyles.css. Hard refresh your browser.
Hi!
In Home Assistant I sometimes uses the TTS, either to my Sonos or Google speakers. With reactor in Vera I also use TTS.
But in MSR I can't select the TTS-service. It's simply not there. Am I missing something, or is this the case, so far?
Thanks!
/Fanan
Hi
I have just connected a bunch of EzloPi controllers to MSR to import some ESP based devices etc.
They all seemed to have worked and imported in to MSR apart from I have one missing device. It is a Digital Gas Sensor device.
This is how that device looks in the Ezlo API.
Devices Info:
_id: "10696001" deviceTypeId: "ezlopi" parentDeviceId: "10696000" category: "level_sensor" subcategory: "" gatewayId: "457a5069" batteryPowered: false name: "Gas Sensor Digital" type: "sensor" reachable: true persistent: true serviceNotification: false armed: false roomId: "" security: "no" ready: true status: "idle" parentRoom: true protectConfig: "default"Items Info:
_id: "20696001" deviceId: "10696001" hasGetter: true hasSetter: false name: "smoke_density" show: true valueType: "substance_amount" scale: "parts_per_million" value: 2.7472610473632812 valueFormatted: "2.75" status: "idle"There is also an Analog Gas sensor that one did import in to MSR OK.
68d63dab-b871-4f44-912b-cf6e0b9eb4c6-image.png
Devices Info:
_id: "10696000" deviceTypeId: "ezlopi" parentDeviceId: "10696000" category: "security_sensor" subcategory: "gas" gatewayId: "457a5069" batteryPowered: false name: "Gas Sensor Analog" type: "sensor" reachable: true persistent: true serviceNotification: false armed: false roomId: "" security: "no" ready: true status: "idle" parentRoom: true protectConfig: "default"Items Info:
_id: "20696000" deviceId: "10696000" hasGetter: true hasSetter: false name: "gas_alarm" show: true valueType: "token" enum: 0: "no_gas" 1: "combustible_gas_detected" 2: "toxic_gas_detected" 3: "unknown" valueFormatted: "no_gas" value: "no_gas" status: "idle"And this is how this MQ2 Gas Sensor looks like on their dashboard:
Digital
cb77dfa3-4af5-4d06-9635-89207a716a89-image.png
Analog
4fb4da1b-e946-4b89-876c-bcd9f5699b6c-image.png
They have an EzloPi website here you can create your own sensor projects using ESP boards, which is very interesting stuff!
And I just wrote on the Ezlo forum here, how to connect an EzloPi controller to MSR.
THANKS.
Build 21228 has been released. Docker images available from DockerHub as usual, and bare-metal packages here.
Home Assistant up to version 2021.8.6 supported; the online version of the manual will now state the current supported versions; Fix an error in OWMWeatherController that could cause it to stop updating; Unify the approach to entity filtering on all hub interface classes (controllers); this works for device entities only; it may be extended to other entities later; Improve error detail in messages for EzloController during auth phase; Add isRuleSet() and isRuleEnabled() functions to expressions extensions; Implement set action for lock and passage capabilities (makes them more easily scriptable in some cases); Fix a place in the UI where 24-hour time was not being displayed.A couple of things for you @toggledbits, since you mentioned that this release has new features and some tweaks are expected.
Local expressions cannot be deleted. Pushing the X button has no effect for me.
When cloning an entity action, the result is strange (first is cloned one, second is the original action):
a92ea094-9e2c-4aaa-bf47-2d07a6ffdbd0-image.png
When changing the action on the cloned element, the params are added to the original one. See screenshot:
92ac3011-83c8-466b-bd23-47d483ad7a52-image.png
Dark theme has a couple of strange contrasts. One is visible in the previous screenshots (white text on yellow background). Another one is in groups (blue text on blue background):
9b3c4988-53ef-44e6-9672-30e744cacb75-image.png
Overall, I found blue, yellow, red and green (in buttons and forms) to be too bright.
On the bright side:
I love the new script action: thank you! The dark theme is a great start to avoid getting blinded at night I promise I'll try very soon the new features around actions. Thanks!@toggledbits
I just upgraded to version MSR 24293, bare metal running on Fedora. Upon restart, I am getting a error banner:
I followed the new directions about npm
npm i --no-save --no-package-lock --omit dev
Any idea what the issue is?
Seems like switching the UI to the newly added dark mode (thank you for this) does nothing. The UI stays in light mode and only a few buttons turn into dark mode (see screenshot)
Things I have tried:
Hard refresh
Different browser
Different computer
Restarting Reactor
Failed troubleshooting attempts:
No errors in Chrome console
No relevant errors in Reactor log (can still PM the full log file)
Reactor version: latest-24293-ea42a81d
Hardware: Odroid N2+
Linux version: Ubuntu 24.04.1 LTS
3df2806f-9146-485b-9ec1-d056e91cefe5-image.png Dark mode enabled
ff823023-c079-4684-b01f-d6ac6527d31a-image.png Light mode enabled
Good morning,
I have a service MQTT service that needs a restart occasionally. The add-on (Smartbed MQTT) is for the smart bed base for my bed. It has a "safety light" that I can control from HAAS & MSR as a light entity, and also moves the head of the bed to a preset at bedtime, and then lies it back flat in the morning The problem is, from time to time, the light becomes "unavailable" Restarting from the Add-ons tab in HAAS always fixes it, but I should be able to detect when it happens when "light.tempur_pedic_safety_lights" is not true or false, i.e., unavailable.
What I don't know how to do is how to restart that service. Does anybody have experience in restarting add-ons from MSR?
Running:
Reactor (Multi-hub) latest-24212-3ce15e25 ZWaveJSController [0.1.24232]HAAS:
RPi5-64 (8GB) Core 2024.7.3 Supervisor 2024.08.0 Operating System 13.0 Frontend 20240710.0Hi!
Is it possible to generate two additional log files, the first being the replica of what is displayed on screen by the Rule History widgets and the other with Recently Changed Entities?
And could I configure the generation of one file per day, and delete the older ones? For example, store the last 5 days?
And being more ambitious, does Windget have an icon to open these TXT files in the navigated?
Well, we're approaching Christmas, so here's my request to Santa Claus @toggledbits 🙂
Hi @toggledbits
I'm working on a controller to generate llm response from a prompt in reactor. I have http response coming thru an http request action at the moment, capturing the response inside a local variable. So, it's practically sync.
I want to create a controller, so I don't have to rely on a proxy (and have a simpler architecture), and duplicate absurd http actions, but AFAIK in the current implementation, actions are async only. But if I have multiple requests going on, I cannot be sure what it's really inside an attribute. I also thought that something like a correlation id when sending the request could be used to identity multiple responses, but I wanted to double check with you before starting with something too complicated. I also noticed that some actions in home assistant (ie forecast) are sync and I'm wondering if you have any plan or hint to address this situation. Thanks.
Thanks.
@togglebits I am curious as to why the tilt_sensor.state (primary) = NULL. I believe it should show true or false. I have to use binary_sensor.state instead in my rules.
Again, not sure if this is related to Reactor/ZwaveJSController implementation or the actual Z-Wave JS UI docker version. I have copied, below, the attributes of the tilt sensor in hopes it can help.
Thanks in advance.
Reactor version 23302
ZWaveJSController version 23254
Z-Wave JS UI version 9.3.0.724519f
zwave-js version 12.2.3
@toggledbits I have noticed after upgrading both Reactor and ZWaveJSController to version 24257 that two of my devices/entities, TILT-ZWAVE2.5-ECO and Zooz ZSE18, had their entity re-named in an unusual way and also appears to be duplicated.
Reactor version 24257
ZWaveJSController version 24257
Z-Wave JS UI version 9.18.1
zwave-js version 13.2.0
Vestibule Motion Sensor State attributes/partial screenshot of entities it created. All entities have the same attributes.
motion_sensor.state=true x_zwave_values.Notification_Home_Security_Motion_sensor_status=8 zwave_device.capabilities=[113] zwave_device.endpoint=0 zwave_device.failed=null zwave_device.manufacturer_info=null zwave_device.node_id=23 zwave_device.valueId=[113,"Notification","Home Security","Home Security","Motion sensor status","Motion sensor status"] zwave_device.version_info=nullTilt Sensor Door State and Tilt Sensor Door State Simple attributes/partial screenshot of entities it created. All entities have similar attributes with exception of x_zwave_values.Notification_Access_Control_Door_State = 22 or 23.
tilt_sensor.state=true x_zwave_values.Notification_Access_Control_Door_state=22 zwave_device.capabilities=[113] zwave_device.endpoint=0 zwave_device.failed=null zwave_device.manufacturer_info=null zwave_device.node_id=24 zwave_device.valueId=[113,"Notification","Access Control","Access Control","Door state","Door state"] zwave_device.version_info=null tilt_sensor.state=true x_zwave_values.Notification_Access_Control_Door_state_simple=22 zwave_device.capabilities=[113] zwave_device.endpoint=0 zwave_device.failed=null zwave_device.manufacturer_info=null zwave_device.node_id=24 zwave_device.valueId=[113,"Notification","Access Control","Access Control","Door state (simple)","Door state (simple)"] zwave_device.version_info=null tilt_sensor.state=false x_zwave_values.Notification_Access_Control_Door_state=23 zwave_device.capabilities=[113] zwave_device.endpoint=0 zwave_device.failed=null zwave_device.manufacturer_info=null zwave_device.node_id=24 zwave_device.valueId=[113,"Notification","Access Control","Access Control","Door state","Door state"] zwave_device.version_info=null tilt_sensor.state=false x_zwave_values.Notification_Access_Control_Door_state_simple=23 zwave_device.capabilities=[113] zwave_device.endpoint=0 zwave_device.failed=null zwave_device.manufacturer_info=null zwave_device.node_id=24 zwave_device.valueId=[113,"Notification","Access Control","Access Control","Door state (simple)","Door state (simple)"] zwave_device.version_info=nullI'm slowly migrating all my stuff to MQTT under MSR, so I have a central place to integrate everything (and, in a not-so-distant future, to remove virtual devices from my Vera and leave it running zwave only).
Anyway, here's my reactor-mqtt-contrib package:
Contrib MQTT templates for Reactor. Contribute to dbochicchio/reactor-mqtt-contrib development by creating an account on GitHub.
Simply download yaml files (everything or just the ones you need) and you're good to go.
I have mapped my most useful devices, but I'll add others soon. Feel free to ask for specific templates, since I've worked a lot in the last weeks to understand and operate them.
The templates are supporting both init and query, so you have always up-to-date devices at startup, and the ability to poll them. Online status is supported as well, so you can get disconnected devices with a simple expression.
Many-many thanks to @toggledbits for its dedication, support, and patience with me and my requests 🙂
[Solved] Is there a cap or max number of devices a Global Reaction should not exceed?
-
@wmarcolin said in [Solved] Is there a cap or max number of devices a Global Reaction should not exceed?:
Looking at the log, I don't understand what sequence the MSR performed, but I see that all of the above actions were sent to Hubitat
Hmmm. I can't answer for the Hubitat part, but I explain the order. The first three actions in nGarden<SET> are Run Reaction, so these enqueue those reactions with the executive -- they are not run in-line. That's why you see the three "Enqueueing" lines, followed by a resume of nGarden<Set> from step 4. We see the action output for device 298, which is step 3 (numbered from 0), before the enqueue messages because enqueueing itself is an asynchronous operation, so the executive quickly started the three Run Reaction enqueue requests, then ran the device 298 Entity Action. Running an entity action is asynchronous, so the executive had to wait for that operation to finish. Since it went into a wait state, the tasks for the three Run Reaction enqueues could run, so they did. When they were done and the 298 device action was finished sending, nGarden<SET> could then resume from step 4 (numbered from 0, so 5 as we look at it). That's device 296 so we see that on the next line. Again, device actions have to wait for the send, so execution paused of nGarden<Set> paused there, which allowed nLight Garden ON, the first of the three enqueued reactions, to start and send its first command to device 97. That blocked that reaction, so nLight Security ON was next in the queue and it started and sent its first device action to 197. That blocked that reaction, so nLight Corredor Evening started and ran its first action against 419. It blocked, of course, so everything paused about 70ms until nGarden<Set> became the first ready task, so it resumed at step 5 (from 0, or 6 as we count from 1). And so on, until all were sent.
I'm not sure what your pacing configuration was at this point, but overall it appears about right for the number of tasks sent. It's hard to tell without more debug on, and maybe I'll add some "standard" messages about device queueing while we're looking at this (since debug on a Controller instance can be very large and a bit like sipping from a firehouse).
One thing to note also is that each Entity Action blocks while sending -- the reaction waits for the hub to acknowledge the request. For that to happen, the request must be sent, and the hub has to give an HTTP 200 (OK) response to the request (if it gives an error, that would be logged, and there are no errors logged in this snippet). So at the least, the hub has acknowledged the request, but that doesn't mean it has completed the request, let alone that the request was successful in its overall execution (e.g. manipulating the device). That's a different and much bigger problem.
I'm not done looking at this. I want to study the timing more carefully as well. There's something about it that doesn't seem right to me. As I said, I'm going to add some more standard (non-debug) diagnostic output to this while we're looking at it, and roll a new release later today, for you to try and send me new logs.
-
Ok, as you can see from the picture, my Hubitat is next to my Asus router, where the Vera Plus used to be. In theory radio waves from the router should interfere with the Zigbee because both frequencies are in 2.4, and should not in any way interfere with the Hubitat that works at a frequency of 900 Mhz. But I'm trying everything, and I've already moved the Hubitat 2 meters away from the router.
The vera was already disconnected and I have also removed it and put it away, who knows, maybe in the future I will find some use for it.
With respect to your comment of battery devices, I understand, but it is not the case, as I mentioned above I tried to make a sequence of lights, the 6 devices are 4 Aeontec Micro Switch G2 (DSC26103), 1 Zipato Bulb 2, and another Everspring AN145, that is all connected to the electricity all the time. Your battery point is very valid, but it is not what has caused me panic.
Finally, your comment about Hubitat being faster than Vera is what I read everywhere, but unfortunately, this is not what is happening to me at the moment. Slow reactions, even using Hubitat's own dashboard commands it takes a while for the action to happen, so I go back to the mesh network issue above. Maybe now moving it can help.
But still remains the question of the lack of response to MSR at the same speed.
I will now read the post of @toggledbits to see what he comments.
Thank you very much for your attention.
-
Build 21351 just posted. This has a fix for a problem in the setup of the task queue for HubitatController that is likely causing it to not pace as expected. Let's give this a try, and in addition, try a few different values for
action_pace
. I would start from 25 and work up. -
@toggledbits once again thank you for your kind attention.
What depends on me to send log, do tests I am at your disposal. Just tell me what I should do that I will be promptly attending.
@gwp1 in a previous message I had informed that I had moved Hubitat 2 meters away from my central office, which has my Asus router, no-break, the modems of the internet providers. I have just radicalized, and now the Hubitat is more than 20 meters away and open space, well away from all this magnetic field and radio frequency of a possible interference, let's see if this helps communication.
So now it remains to investigate why the very fast MSR running on a dedicated notebook (Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz 8Gb RAM, 300Gb SSD), maybe running over the Hubitat.
Thanks
-
@wmarcolin said in [Solved] Is there a cap or max number of devices a Global Reaction should not exceed?:
Just tell me what I should do that I will be promptly attending.
You are always helpful, and that's appreciated. For the moment, just run it as it comes. Let's see what happens. Start with a 25 for
action_pace
, and if you continue to have issues, move it up to 50. If 50 isn't enough, then I'll ask for logs. Let me know how it goes. -
OK, changed action_pace to 25 (it was 100), and upgraded to build 21351.
-
@toggledbits in this last upgrade process, I got these zombie processes. How is it possible to kill them? Thanks.
-
Stop Reactor. Then grab the
reactor.log
file and upload it to me. I'm going to DM you a link.After you've uploaded the log file, remove the
storage/states/reaction_queue.json
file and restart Reactor. -
@toggledbits done boss!!
-
I think that in another track you have also received this message, which is occurring exactly when running the above event of turning on several lights.
-
@wmarcolin Very good. That means the more aggressive checks are working. It appears that Hubitat's event socket is a good bit more fragile than its Hass equal. I will add an option to the next release to silence this warning (although the reconnect will still be logged in the log file). You should also be able to see the effect of the reconnects on the system entity's
x_hubitat_sys.reconnects
counter.For comparison, I don't get these errors unless I force them. The one restart shown below was because I upgraded the hub to 2.3.0.120.
-
TL;DR: Hubitat needs more aggressive WebSocket connection health tests and recovery, and that's been added as of 21351. When recovery is needed (which should be very rare), device states may be delayed up to 120 seconds. Don't use WiFi for your Reactor host or hub in production. If your Reactor host and hub aren't on the same network segment (LAN), you may see more reconnects. If you see reconnects when your Reactor host and hub are on the same network segment, you likely have a network quality issue.
I want to explain how I understand the problem reported, and how the fix (which is more of a workaround) works.
The events websocket is one of two channels used by HubitatController to get data from the hub. When HubitatController connects to the hub, it begins with a query to MakerAPI to fetch the bulk data for all devices -- "give me everything". Thereafter, the events socket provides (only) updates (changes). Being a WebSocket connection, it has a standard-required implementation of ping-pong that both serves to keep the connection alive and test the health of the connection. In Reactor, I use a standard library to provide the WebSocket implementation, and this library is in wide use, so while it's almost certainly not bug free (nothing is), it has sufficient exposure to be considered trustworthy. I imagine Hubitat does the same thing, but since it's a closed system, I don't know for sure; they may use something common, or they may have rolled their own, or they may have chosen some black sheep from among many choices for whatever reason. In any case, neither Hubitat nor Reactor implement the WebSocket protocol itself, we just use our respective WebSocket libraries to open and manage the connections and send/receive data.
Apparently there is a failure mode for the connection, and we don't know if it's on the Hubitat (Java) side or in the nodejs package, where the events can stop coming, but apparently the ping-pong mechanism continues to work for the connection, otherwise it would be torn down/flagged as closed/error by the libraries on both ends. There's no easy way to tell if Hubitat has stopped sending messages or the nodejs library has stopped receiving or passing them, and since the libraries/packages on both ends are black boxes as far as I'm concerned, I don't really care, I just want it to work better. So...
HubitatController versions prior to 21351 relied solely on the WebSocket's native ping-pong mechanism to describe connection health, as Reactor does for Hass and even its own UI-to-Engine connection (lending credence to the theory that the nodejs library is not the cause). But for Hubitat it appears the WebSocket ping-pong alone is not enough, so 21351 has introduced some additional tests at the application layer. If any of these fails, the connections are closed and re-opened. When reopened, a full device/state inventory is done again as usual, so the current state of all devices is reestablished. Any missed device updates during the "dead time" would be corrected by this inventory.
By the way, one of the things that exacerbates the problem with the Hubitat events WebSocket is that it's a one-way connection at its application later: Hubitat only transmits. There is no message I can send over the WebSocket for which I could expect a speedy reply as proof of health. I have to find other things to do through MakerAPI in an attempt to force Hubitat to send me data over the WebSocket, and this takes more time as well. If there was two-way communication, it would be a lot easier and faster to know if the connection was healthy.
So the question that remains, then, is what is that timing? By default, HubitatController will start its aggressive recovery at 60 seconds of channel silence. If the channel then remains silent for an additional 60 seconds, the close/re-open recovery occurs. So even if the connection fails, the maximum time to recovery and correct state of all devices will be just over 120 seconds. So even in worst-case conditions, entity states should not lag more than that. Given that these stalls are the exception rather than the rule for most users, these pauses should be rare.
There is one tuning parameter that may be useful to set on VPN connections or any other "distanced" connection (i.e. any connection where the Reactor host and the hub are not on the same network segment, and in particular may traverse connection-managing software or hardware like proxies, stateful packet inspection and intrusion detection systems, load balancers, etc.). That is
websocket_ping_interval
, which will be added to the next build. This will set the interval, in milliseconds, between pings (default 60000). This should be sufficiently narrow to prevent some VPNs from aborting the socket in some cases (see the WebSocket missive at the end), but if not, smaller values can be tried, at the expense of additional network traffic and a slight touch on CPU. If the reconnects don't improve significantly, a different VPN option should be chosen.And this brings me to two recommendations:
- You should not use a WiFi connection for either the Reactor host or hub in production use. These are fine for testing and experimentation, but are an inappropriate choice in production for both reliability and performance reasons.
- If you use a VPN between the Reactor host and the hub, "subscription VPNs" are probably best avoided, as these will be the most aggressive in connection management and cause the most disconnects and failures. That's because they are tuned for surfing web traffic and checking email, basically, where the connections are open-query-response-close — connections don't stay open very long, typically. There are optimizations of HTTP where connections are kept open after a response to allow for a follow-up query (e.g. request an embedded image after requesting a document), but these are generally much shorter than the expected infinite open of a WebSocket connection (more on this at the bottom). Point-to-point VPNs that you set up and manage yourself are likely to provide better stability and performance (e.g. PPTP, SSH tunnels, etc.).
I will also add this: in my network, I do not get Hubitat WebSocket stalls and reconnects. I have had to force them through various devious means to test the behavior I've just implemented. I owned a commercial data center in the San Francisco Bay Area, with a managed network offered to clients with 100% uptime service level agreements. I built and maintained that network. My home network is a reflection of that — good quality equipment, meticulous cabling, sensible architecture (scaled down appropriately for the lesser scope and demands), and active data collection and monitoring. My network runs clean, and when there are problems, I know it (and where). If your Reactor host and Hubitat hub are on the same network segment, hardwired and not WiFi, and you are getting reconnects, I think you should audit your network quality. Something isn't happy. It only takes one bad cable, or one bad connector on one end of one cable, to cause a lot of problems.
---
For anyone interested, one reason why WebSockets can be troublesome in network environments where connection management may be done between the endpoints is that a WebSocket typically begins its life as an HTTP request. The client makes an HTTP request to the server with specific headers that ask that the connection to be "converted" (they call it "upgraded") from HTTP to WebSocket. If the server agrees, the connection becomes persistent and a new session layer is introduced. But because the connection starts as HTTP, any interstitial proxy or device that is managing the connection as it passes through may mistake it for a plain HTTP (web page) request, and when the connection doesn't tear itself down after a short period the proxy/device thinks is reasonable for HTTP requests, it forces the issue and sends a disconnect to both ends. This is necessary because tracking open connections consumes memory and CPU on these devices, and in a commercial ISP environment this could mean tens of thousands or hundreds of thousands of open connections at a single interface/gateway. So to keep from being overwhelmed, these devices may just time out those connections (on a predictable schedule/timeout, or just due to load), but because it's not really an HTTP connection at that point (it's been upgraded to a WebSocket), the proxy/device is breaking a connection that both the server and client expect to be persistent, and that can then cause all kinds of problems, the most benign of which is forcing the two endpoints to reconnect frequently and waste a lot of time and bandwidth doing it. On a LAN, you typically don't have these problems, because the two endpoints have no stateful management between them (network switches, if present between, just pass traffic, not manage connections), so barring network problems, there's no reason for them to be disconnected until either asks to close.
-
@toggledbits master!
Another lesson in knowledge, and dedication to understanding and solving problems.
Well, I spent two days reading your post before trying to answer anything.
First starting from the end, in my case my network is all Giga, CAT7 cables with industrial connectors. All the cabling of the house came with cables ready to not run the risk of redoing connectors. Then I even hired a company to certify the network, which also uses management switches that I can validate the network. So ping inside my house between any equipment is < 1ms.
From the technical side what I see is that the connection between MSR and Hubitat is fragile, and it becomes even more so when MSR is much faster in its actions without Hubitat responses.
What you would be doing for a future version is strengthening the validation of this communication, in particular, to return the status of given orders. That is, if you tell the MSR to turn on a light, make sure that the return state says that the light is on.
I was in doubt, in case of saying that it was not turned on, would there be a reset? Could this be a summary?
-
Ping (and traceroute) are not network quality tools. They are path tools. To measure link quality, since you said you had managed switches, you'd need to look at the error counts on each port. That may be worth a squiz.
As @Alan_F has observed, you can get restarts from a quiet channel that is just naturally quiet. If you don't have a lot of devices, and things aren't changing often, it's very likely to see a quiet channel. HubitatController probes by picking a device and making it refresh, which usually causes some event activity on the WebSocket channel. But it's possible that the device it picks (randomly) may not do much when refreshed. You can set
probe_device
to the device ID (number) of a device that consistently causes events when asked to refresh to mitigate that random effect. That may take some experimentation, using the Hubitat UI to ask devices to refresh, and watching for green highlights in the Reactor Entities list. When you find a device that consistent "lights" when you refresh it on Hubitat, you've probably got a reliable probe device.I have noted in my own network that certain devices (that shall be unnamed) are sufficiently chatty that I'm at no risk of a quiet channel. Let's just say anything that monitors energy is a good canary in the mine.
-
The ping tests were to validate the connection, speed, below is the screen of the switch that the MSR and Hubitat are on, no errors. I have a weekly reset programmed, so this information is from last Friday until now.
Ok, so what I should set now is a device connection probe, something like the APP Device Watchdog. I'm going this way.
Anyway, I think I mentioned before, I ended up moving Hubitat far away from my wifi router on Saturday, and since last night after deactivating the devices again one by one, I started rebuilding my mesh. I already see good results with the move away, several devices that previously had trouble responding, now seem to work better. Hopefully, by Wednesday I will have finished the rebuild, and will be able to check how the network behavior and ping-pong between MSR and Hubitat is now with the latest version of MSR.
Thanks.
-
Very good. In my limited experience with this new heuristic, Z-Wave devices seem to be the most predictable for probes among the basic devices. In the current build, whatever you choose must support the
Refresh
(Hubitat native) capability (akax_hubitat_Refresh
in Reactor). In the next build, the config valueprobe_action
will be available to let you use a different action, if you need to, andprobe_parameters
(an object) containing any key/value parameter pairs that the command may need (optional).I've also found that the "Hub Information" app (or more correctly, the device that this app manages) makes a good probe target. It has some useful data, and also very reliably updates fields when commanded to do so, even at a high frequency, so the next build will have specific support for this app/device if it finds it among the hub's inventory.
-
@toggledbits hi!
I have been operating for two days with version 23360, and the chaos described in the messages above is no longer happening. The recurrent failures of leaving an action without executing as a whole are no longer oberved.
Thanks for all your efforts!
A question, in version 21353 we started seeing the message below, which you made possible after deactivating the alert.
Is it possible to make some kind of query, or trigger some action when this message happens? I would like for example to send a Telegram to notify me.
Thanks.
-
Yes, there are at least two ways to do that now. Any thoughts on where you might look?
Edit: Same question as this thread: https://smarthome.community/topic/832/identifying-when-msr-cannot-connect-to-home-assistant
And this thread is wandering, so maybe let's leave it alone.
-
-
Hi, I asked @toggledbits to reopen this very long thread, to give a testimony of a situation that I believe can help others.
I apologize for the long message, let's recapitulate the history.
The discussion was based on possible device limits on MSR actions, which Patrick explained there would not be, but the situation described was very similar to the scenario I posted in my message, of numerous failures on actions sent from MSR to Hubitat (December 16, 12:28am). That MSR was much faster than Hubitat could process.
I posted an example where I had to execute a series of Reactions that would turn on lights, and also turn on outlets, it always failed, I demonstrated that the MSR would activate everything, but Hubitat would not execute.
Then came the topic and orientation that my Hubitat should not be next to my WiFi router that could be interfering with the Hubitat signal, I even sent a picture on December 17 and provided the change %(#ff8000)[(TIP 1)].
Well obviously, as the change of position was big, I had to redo 3 times the entire mesh network, adding and removing devices. I followed the instruction of many masters, do the network from the center, i.e., from Hubitat to the outside, including first devices that use electric power because they are repeaters, and then those with exclusive battery power %(#ff8000)[(TIP 2)].
Well, our friend Patrick releases version 21351 and then 23360 where he adds much more aggressive management in the communication MSR x Hubitat, it improved a lot. But unfortunately, I kept having problems.
Then I asked for help again to go forward and see what to do to improve, we entered in the theme that several posts from the Hubitat community mentioned devices that use the S0 security, that this creates problems in the mesh network by high traffic of unnecessary information, new action remove and include again the devices that had S0 %(#ff8000)[(TIP 3)], another action that helped the network. What was not possible, we reactivated the Vera hub and put these devices back in, removing them from the Hubitat network.
We were evolving, but the situation persisted, actions that triggered many actions to Hubitat could still have failures, and the worst actions using Hubitat's own dashboard were also not being executed.
Well, I returned to the discussion of when I changed the Vera to Hubitat, which highlighted several points of change, but one very bothered me, the Hubitat Z-Wave signal, much weaker than the vera (https://smarthome.community/topic/776/switching-from-vera-to-hubitat/9?_=1644189698709).
Well 4 days ago (2/2), moved by the courage I opened my Hubitat and followed the post (https://community.hubitat.com/t/external-antenna/81396/28) %(#ff8000)[(TIP 4)], and installed an external antenna for z-wave, here I show that I bought and installed (https://community.hubitat.com/t/elevation-c7-possible-faulty-z-wave-radio/52977/91). In this post the discussion started with the theme S0 and S2, and went into the antenna theme.
MY TESTIMONY OF WHAT HAPPENED
A revolution, my Hubitat got a new life, it is another equipment:
- Before I had 15 direct devices in the hub, today after 4 days there are already 36 of 64, and I see that every day is increasing as the network is being restructured. There was the absurdity of equipment in the same environment as the HE, but behind a column, using two other devices to reach the HE that was less than 4 meters away, now communicates directly;
- There was almost no equipment that communicated at 100kbps, most were between 9.6 and 40, now most are already 100, and a small number, 5/64 are at 9.6 kbps;
- Remember the thing where I had to turn on several lights and power outlets all together? it didn't fail anymore, as the devices speak better and faster like the HE, I don't see this failure anymore;
- I also talked about actions commanded by the Dashboard that the device did not respond to, it is not happening anymore either.
In summary, in my case that 3/4 of the devices are not repeaters, they use batteries, my z-wave mesh network had a lack of repeaters, and this generated a generalized degradation. Now, with this better signal, if not eliminated the problem, I reduced it to almost zero.
Now pay attention, the operation of putting up the external antenna seems simple, but it is not. The antenna connector that is soldered to the board is very small and difficult to handle. So if you go this way, look for a cell phone repair shop, they will surely have the best technique for this change.
Thank you, and sorry again for the long message.
@gwp1 maybe this can help if you still have a problem. Your tip to move the HE away from the Wifi was also precious, thank you.
@SweetGenius your comment that there might be an overwhelming in the hub was correct, the action_pace action was a help, but the signal improvement I describe was the solution when I have a more fast response of the devices, reducing the overwhelm. Thanks.
@toggledbits our last messages, before I wanted to incinerate Hubitat, also helped a lot on the way. Thank you for all your dedication.