Discussion Forum to share and further the development of home control and automation, independent of platforms.
[Solved] Local expressions in Rule do not evaluate as they used to
CrilleC
Multi-System Reactor
Reactor (Multi-System/Multi-Hub) Announcements
toggledbitsT
Build 21228 has been released. Docker images available from DockerHub as usual, and bare-metal packages here. Home Assistant up to version 2021.8.6 supported; the online version of the manual will now state the current supported versions.
  • Fix an error in OWMWeatherController that could cause it to stop updating.
  • Unify the approach to entity filtering on all hub interface classes (controllers); this works for device entities only; it may be extended to other entities later.
  • Improve error detail in messages for EzloController during the auth phase.
  • Add isRuleSet() and isRuleEnabled() functions to the expression extensions.
  • Implement the set action for the lock and passage capabilities (makes them more easily scriptable in some cases).
  • Fix a place in the UI where 24-hour time was not being displayed.
Multi-System Reactor
Home Assistant 2025.11.2 and latest-25315
CrilleC
Multi-System Reactor
Notice to Docker + ARM Users (RPi 3/4/5 and others)
toggledbitsT
This post does not apply to users of Intel/AMD-based systems. If you are using a Reactor image tagged latest-amd64 or stable-amd64, then this post does not apply to you. It also does not apply to bare-metal installs; it's for users of docker images on ARM-based systems only (principally Raspberry Pi hosts, but could be others).

After January 15, 2026, I will no longer produce the aarch64-tagged docker image for Reactor. The ARM images will be arm64 for 64-bit operating systems, and armv7l for 32-bit operating systems.

For those of you running a container from the aarch64 image today, this will be a relatively simple change: you just need to switch the image used for your docker container to a differently-tagged image. If you are using docker-compose, then this is a relatively simple matter of changing the image line in your docker-compose.yaml file and then stopping (docker-compose down) and restarting (docker-compose up -d) your Reactor daemon.

But there's a catch... not all of you can safely just switch from the aarch64 image to the arm64 image. And, you can't just trust the output of uname -m, for example, because this exposes the CPU architecture, but not the word size of the OS running on that CPU.

For Raspberry Pi systems, the transition to 64-bit operating systems was long (starting in 2016) and not always obvious. Although there was a first "official" 64-bit OS for RPis in 2020, it did not become a default recommendation in the Raspberry Pi Imager until 2021, and then that was only the default for Pi 3/4 systems with >4GB RAM; it was 2022 before it was universally recommended for all 64-bit CPUs regardless of RAM size. Depending on when you first imaged your RPi system and what default you may have been offered/chosen, you could today easily have a 64-bit CPU Raspberry Pi running a 32-bit version of the operating system. Upgrades along the way would not change this; changing it to fully 64-bit requires a full reimage of the system.
To establish if your OS is 64- or 32-bit, log in to your Pi and run: sudo dpkg-architecture -q DEB_HOST_ARCH. If the response is arm64 or aarch64, then you are running a 64-bit OS and you should use the arm64-tagged image. If it's anything else, you are running a 32-bit OS, and you should use the armv7l-tagged image.

pi@rpi4-1:~ $ sudo dpkg-architecture -q DEB_HOST_ARCH
armhf
pi@rpi4-1:~ $ uname -m
aarch64
pi@rpi4-1:~ $

In the example above, the uname command reports that the CPU is 64-bit architecture (aarch64), which is true for the host on which I ran these commands, but the DEB_HOST_ARCH value is armhf, indicating a 32-bit operating system. This system has to use the armv7l-tagged image.

Other systems will have their own ways of determining the word size of the running OS. Since the majority of Reactor users running ARM systems are on Raspberry Pis, I am able to supply the above instructions, but if you happen to have a different ARM system, you'll need to do some web searching to figure out how to expose that information. Or, you can just try the arm64 image, and if it doesn't start up, try the armv7l image. Remember to always back up your system before making any changes.

For everyone, please make this change as soon as possible, and if you have any trouble finding a working image, please (1) go back to the current aarch64 image; and (2) let me know in this thread along with as much detail about your host system as you can offer (including the output of the dpkg-architecture command mentioned above).
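The tag-selection logic described in that post can be sketched as a small shell helper. This is illustrative only: pick_tag is a hypothetical function name, and the tag strings simply echo the arm64/armv7l/amd64 tags named above.

```shell
# Sketch: map the OS architecture to the Reactor image tag described above.
# pick_tag is a hypothetical helper; tag names match those in the post.
pick_tag() {
  case "$1" in
    arm64|aarch64)      echo "arm64"   ;;  # 64-bit OS -> arm64-tagged image
    armhf|armel|armv7l) echo "armv7l"  ;;  # 32-bit OS -> armv7l-tagged image
    amd64|x86_64)       echo "amd64"   ;;  # this post does not apply
    *)                  echo "unknown" ;;
  esac
}

# DEB_HOST_ARCH reflects the OS word size; uname -m only reports the CPU,
# so it is used here only as a last-resort fallback.
arch="$(dpkg-architecture -q DEB_HOST_ARCH 2>/dev/null || uname -m)"
echo "Suggested image tag: $(pick_tag "$arch")"
```

Note the armhf case: that is exactly the situation in the transcript above, a 64-bit CPU (uname -m says aarch64) running a 32-bit OS, which must use the armv7l image.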
Multi-System Reactor
Requesting a proper ARM64/aarch64 Docker image (Pi 5 support)
M
Hi, I'm in the process of migrating from a Raspberry Pi 4 (ARMv7) to a Raspberry Pi 5 (ARMv8/aarch64), but I've run into an issue: there is no proper ARMv8/aarch64 image available. None of the existing images run on the Pi 5; they all exit immediately with code 139 (segmentation fault), which typically indicates that the binaries inside the image are not compatible with the ARM64/aarch64 architecture used by the Pi 5.

Would it be possible to publish a correct ARMv8/aarch64 (linux/arm64) image? Building one should be relatively straightforward using docker buildx with multi-arch support. For example, my own Node.js images are built this way:

docker buildx build --push \
  -t <localrepo>/<project>:<tag> \
  --platform=linux/arm64,linux/amd64 \
  --file ./apps/<project>/Dockerfile .

This produces both the AMD64 and ARM64/v8 variants automatically.

Also, as a side note, it may be best to avoid using Alpine as the base image for the ARM64 build, since musl-based builds often cause compatibility issues and unnecessary headaches. A glibc-based base image (e.g., Debian or Ubuntu) tends to work far more reliably on ARM64, especially for Node.js applications.

@toggledbits - tagging you in case you missed this. Thanks, mgvra
Multi-System Reactor
Script action and custom timers
therealdbT
Sorry to write here without trying it first, but I'm flying today. Am I correct in saying that a script action with alarm() makes it possible to execute a reaction at a given interval, let's say every 15 seconds or 3.5 minutes? That sounds amazing, since I've used weird tricks, including a custom controller, just to do this.
Multi-System Reactor
Help resolve change in behaviour post update
CatmanV2C
Multi-System Reactor
Reactor w/HA 2025.11 error on set_datetime service call setting only time
CrilleC
@toggledbits Do you know if this is related to that PR or is it a change they made in 2025.11.1?

[latest-25310]2025-11-11T13:16:24.319Z <HassController:INFO> HassController#hass perform x_hass_input_datetime.set_datetime on Entity#hass>input_datetime_vvb_dag with { "time": "10:45" }
[latest-25310]2025-11-11T13:16:24.320Z <HassController:INFO> HassController#hass: sending payload for x_hass_input_datetime.set_datetime on Entity#hass>input_datetime_vvb_dag action: { "type": "call_service", "service_data": { "date": (null), "time": "10:45", "datetime": (null), "timestamp": (null) }, "domain": "input_datetime", "service": "set_datetime", "target": { "entity_id": "input_datetime.vvb_dag" } }
[latest-25310]2025-11-11T13:16:24.321Z <HassController:ERR> HassController#hass request 1762866984320<2025-11-11 14:16:24> (call_service) failed: [Error] Not a parseable type for dictionary value @ data['date'] [-]
[latest-25310]2025-11-11T13:16:24.321Z <HassController:WARN> HassController#hass action x_hass_input_datetime.set_datetime({ "time": "10:45" }) on Entity#hass>input_datetime_vvb_dag failed!
[latest-25310]2025-11-11T13:16:24.321Z <HassController:INFO> Service call payload: {"type":"call_service","service_data":{"date":null,"time":"10:45","datetime":null,"timestamp":null},"domain":"input_datetime","service":"set_datetime","target":{"entity_id":"input_datetime.vvb_dag"},"id":1762866984320}
[latest-25310]2025-11-11T13:16:24.322Z <HassController:INFO> Service data: {"fields":{"date":{"example":"\"2019-04-20\"","selector":{"text":{"multiline":false,"multiple":false}}},"time":{"example":"\"05:04:20\"","selector":{"time":{}}},"datetime":{"example":"\"2019-04-20 05:04:20\"","selector":{"text":{"multiline":false,"multiple":false}}},"timestamp":{"selector":{"number":{"min":0,"max":9223372036854776000,"mode":"box","step":1}}}},"target":{"entity":[{"domain":["input_datetime"]}]}}
[latest-25310]2025-11-11T13:16:24.322Z <Engine:ERR> Engine#1 reaction rule-mgb8pfhs:S step 0 perform x_hass_input_datetime.set_datetime failed: [Error] Not a parseable type for dictionary value @ data['date'] [-]
[latest-25310]2025-11-11T13:16:24.322Z <Engine:INFO> Engine#1 action args: { "time": "10:45" }
[latest-25310]2025-11-11T13:16:24.322Z <Engine:INFO> Resuming reaction Sätt Schema VVB i Home Assistant<AKTIV> (rule-mgb8pfhs:S) from step 1
[latest-25310]2025-11-11T13:16:24.323Z <HassController:INFO> HassController#hass perform x_hass_input_datetime.set_datetime on Entity#hass>input_datetime_vvb_natt with { "time": "03:00", "timestamp": 0 }
[latest-25310]2025-11-11T13:16:24.323Z <HassController:INFO> HassController#hass: sending payload for x_hass_input_datetime.set_datetime on Entity#hass>input_datetime_vvb_natt action: { "type": "call_service", "service_data": { "date": (null), "time": "03:00", "datetime": (null), "timestamp": 0 }, "domain": "input_datetime", "service": "set_datetime", "target": { "entity_id": "input_datetime.vvb_natt" } }
[latest-25310]2025-11-11T13:16:24.324Z <HassController:ERR> HassController#hass request 1762866984323<2025-11-11 14:16:24> (call_service) failed: [Error] Not a parseable type for dictionary value @ data['date'] [-]
[latest-25310]2025-11-11T13:16:24.324Z <HassController:WARN> HassController#hass action x_hass_input_datetime.set_datetime({ "time": "03:00", "timestamp": 0 }) on Entity#hass>input_datetime_vvb_natt failed!
[latest-25310]2025-11-11T13:16:24.324Z <HassController:INFO> Service call payload: {"type":"call_service","service_data":{"date":null,"time":"03:00","datetime":null,"timestamp":0},"domain":"input_datetime","service":"set_datetime","target":{"entity_id":"input_datetime.vvb_natt"},"id":1762866984323}
[latest-25310]2025-11-11T13:16:24.324Z <HassController:INFO> Service data: {"fields":{"date":{"example":"\"2019-04-20\"","selector":{"text":{"multiline":false,"multiple":false}}},"time":{"example":"\"05:04:20\"","selector":{"time":{}}},"datetime":{"example":"\"2019-04-20 05:04:20\"","selector":{"text":{"multiline":false,"multiple":false}}},"timestamp":{"selector":{"number":{"min":0,"max":9223372036854776000,"mode":"box","step":1}}}},"target":{"entity":[{"domain":["input_datetime"]}]}}
[latest-25310]2025-11-11T13:16:24.324Z <Engine:ERR> Engine#1 reaction rule-mgb8pfhs:S step 1 perform x_hass_input_datetime.set_datetime failed: [Error] Not a parseable type for dictionary value @ data['date'] [-]
[latest-25310]2025-11-11T13:16:24.324Z <Engine:INFO> Engine#1 action args: { "time": "03:00", "timestamp": 0 }
[latest-25310]2025-11-11T13:16:24.325Z <Engine:INFO> Resuming reaction Sätt Schema VVB i Home Assistant<AKTIV> (rule-mgb8pfhs:S) from step 2
[latest-25310]2025-11-11T13:16:24.325Z <Engine:INFO> Sätt Schema VVB i Home Assistant<AKTIV> all actions completed.
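Judging from the logged error, the service call fails because explicit nulls are sent for the unused set_datetime fields (date, datetime, timestamp), and HA now refuses to parse them. As a sketch of the workaround idea only (prune_nulls is a hypothetical helper, not Reactor's actual code), the null-valued fields could be dropped from service_data before the payload is built:

```python
def prune_nulls(service_data: dict) -> dict:
    """Return a copy of service_data without None-valued fields,
    so only the fields the caller actually set are sent to HA."""
    return {k: v for k, v in service_data.items() if v is not None}

# Mirrors the first logged call: only "time" was set; the rest were null.
payload = {
    "type": "call_service",
    "domain": "input_datetime",
    "service": "set_datetime",
    "service_data": prune_nulls(
        {"date": None, "time": "10:45", "datetime": None, "timestamp": None}
    ),
    "target": {"entity_id": "input_datetime.vvb_dag"},
}
# service_data is now just {"time": "10:45"}
```

Note that a legitimate zero (the second call's "timestamp": 0) must survive the pruning, which is why the filter tests `is not None` rather than truthiness.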
Multi-System Reactor
Reactor Version 25310: Office light control via rule in Reactor no longer working since last update.
P
Hello, I currently have an office light (connected via a Leviton Z-Wave dimmer switch) controlled from a Gen5 Aeotec Z-Wave stick installed on my Synology 720+ NAS. I run HA (2025.11.10) in a virtual machine on my NAS and Reactor in the container manager of the same NAS. Prior to updating to 25304, the rule I had set to turn the light on to a specific dimming value worked correctly. Now the rule appears to follow the decision tree, however the reaction does not trigger setting the dimming or turning on the office light. Strangely, I can still turn the light on and off, as well as dim it, directly from HASS. I have tried using the 'Try this action' button in the rule's reaction settings and it will not control the light and does not throw an error flag. Please help. P.S. Reactor has been rock steady for me over the last few years and I'm a big fan of this solution.
Multi-System Reactor
[Solved] alarm() in global expression throws error in log.
CrilleC
Multi-System Reactor
[Solved] Define function issue in latest-25304
CrilleC
Multi-System Reactor
No Upgrade Notification for Build 25308?
CatmanV2C
FWIW, I'm no longer getting a notification from MSR that there's an update. Just thought I'd mention it. C
Multi-System Reactor
Strange behavior in MSR latest-25304 with disabled groups in Reaction
therealdbT
Multi-System Reactor
[Reactor] Variables not updating correctly in latest-25201-2aa18550
therealdbT
Multi-System Reactor
The reaction stopped working (Google Nest max playing a video)
F
Multi-System Reactor
Handling Dead Entities and Renamed Entities
PablaP
Hello all.. been a minute! I recently rebuilt my Z-Wave network and migrated to a new Z-Wave stick. In order to prevent any downtime, I kept my original Z-Wave network up and ran a Docker version of Z-Wave JS UI with my new controller. This way I could add devices one by one without having any devices down. I finally moved all the devices over to my new stick today.

The final step was to migrate everything from my Docker instance of Z-Wave JS UI to the HA add-on of Z-Wave JS UI. However, during this migration some of the names didn't populate correctly, which I later managed to import back into Z-Wave JS UI. The issue is that Reactor is stuck on the default names and the entities are not updating. I removed the controller from Reactor, restarted, hard refreshed, and added the controller back, however the new entity names have not updated. Also, it seems like the old entities from my previous instance of Z-Wave JS UI are lingering and not being marked as dead (I believe a certain amount of time needs to elapse before they're marked as dead in Reactor).

My goal is to basically purge all the entities for the 'ZWaveJS' controller in Reactor so it can pull the updated entity names and only the entities that exist in Z-Wave JS UI. I cannot find a quick way to do this; I know entities can be deleted one by one, but with over 100 entities this would take too long. I am guessing that if I added the controller with a new name in the Reactor config, it would pull the updated entities and names, but I think that would break my rules since the entity IDs would change (I made sure to name all the entities exactly as they were previously to prevent this issue).
Multi-System Reactor
Strange behavior for MQTT templates using payload and attributes
therealdbT
Multi-System Reactor
[MSR] reactor-mqtt-contrib package for additional MQTT templates
therealdbT
I'm slowly migrating all my stuff to MQTT under MSR, so I have a central place to integrate everything (and, in a not-so-distant future, to remove virtual devices from my Vera and leave it running Z-Wave only). Anyway, here's my reactor-mqtt-contrib package: https://github.com/dbochicchio/reactor-mqtt-contrib Simply download the yaml files (everything or just the ones you need) and you're good to go. I have mapped my most useful devices, but I'll add others soon. Feel free to ask for specific templates, since I've worked a lot in the last weeks to understand and operate them. The templates support both init and query, so you always have up-to-date devices at startup, plus the ability to poll them. Online status is supported as well, so you can detect disconnected devices with a simple expression. Many, many thanks to @toggledbits for his dedication, support, and patience with me and my requests.
Multi-System Reactor
HA 2025.9.4 Supported Yet?
CatmanV2C
Tangentially, did I miss 2025.9.4 getting blessed in MSR? I've been holding off. Cheers, C
Multi-System Reactor
Rule Set UI bug - RESOLVED
3
Multi-System Reactor

[Solved] Is there a cap or max number of devices a Global Reaction should not exceed?

Multi-System Reactor
40 Posts 5 Posters 6.7k Views 5 Watching
wmarcolin

    @toggledbits hi master!

    I am going into a state of despair with Hubitat, and thinking that I have made a bad switch from VeraPlus to Hubitat.

Well, as you always advise, I looked at the log. It confirmed my suspicion that MSR sent all the commands to Hubitat, and that it is Hubitat that failed, as you mention in your message.

    Routines below.

    67373a04-9814-4a02-aac3-c52757a66307-image.png

    0f16e5e1-31ce-4948-af46-e6e80b01149b-image.png

    7491a245-8ba7-405c-b1ed-c0661d96bb75-image.png

    19708ea3-5adc-41b4-afb8-3dc3f8880c20-image.png

    Looking at the log, I don't understand what sequence the MSR performed, but I see that all of the above actions were sent to Hubitat without fail.

    [latest-21349]2021-12-17T01:46:57.319Z <Rule:NOTICE> Rule#rule-kx9oxcss configuration changed; reloading
    [latest-21349]2021-12-17T01:46:57.321Z <Rule:NOTICE> Rule#rule-kx9oxcss stopping rule
    [latest-21349]2021-12-17T01:46:57.324Z <Rule:NOTICE> Rule Rule#rule-kx9oxcss stopped
    [latest-21349]2021-12-17T01:46:57.325Z <Rule:INFO> Rule#rule-kx9oxcss (nGarden) started
    [latest-21349]2021-12-17T01:46:57.327Z <Rule:INFO> nGarden (Rule#rule-kx9oxcss) SET!
    [latest-21349]2021-12-17T01:46:57.331Z <Engine:INFO> Enqueueing "nGarden<SET>" (rule-kx9oxcss:S)
    [latest-21349]2021-12-17T01:46:57.345Z <Engine:NOTICE> Starting reaction nGarden<SET> (rule-kx9oxcss:S)
    [latest-21349]2021-12-17T01:46:57.346Z <HubitatController:null> HubitatController#hubitat final action path for power_switch.on on Entity#hubitat>298: http://192.168.33.22/apps/api/67/devices/298/on
    [latest-21349]2021-12-17T01:46:57.348Z <Engine:INFO> Enqueueing "nLight Garden ON" (re-kx65h5u7)
    [latest-21349]2021-12-17T01:46:57.350Z <Engine:INFO> Enqueueing "nLight Security ON" (re-kx67amal)
    [latest-21349]2021-12-17T01:46:57.352Z <Engine:INFO> Enqueueing "nLight Corredor Evening" (re-kx659j8a)
    [latest-21349]2021-12-17T01:46:57.377Z <Engine:NOTICE> Resuming reaction nGarden<SET> (rule-kx9oxcss:S) from step 4
    [latest-21349]2021-12-17T01:46:57.378Z <HubitatController:null> HubitatController#hubitat final action path for power_switch.on on Entity#hubitat>296: http://192.168.33.22/apps/api/67/devices/296/on
    [latest-21349]2021-12-17T01:46:57.379Z <Engine:NOTICE> Starting reaction nLight Garden ON (re-kx65h5u7)
    [latest-21349]2021-12-17T01:46:57.380Z <HubitatController:null> HubitatController#hubitat final action path for power_switch.on on Entity#hubitat>97: http://192.168.33.22/apps/api/67/devices/97/on
    [latest-21349]2021-12-17T01:46:57.381Z <Engine:NOTICE> Starting reaction nLight Security ON (re-kx67amal)
    [latest-21349]2021-12-17T01:46:57.382Z <HubitatController:null> HubitatController#hubitat final action path for power_switch.on on Entity#hubitat>197: http://192.168.33.22/apps/api/67/devices/197/on
    [latest-21349]2021-12-17T01:46:57.383Z <Engine:NOTICE> Starting reaction nLight Corredor Evening (re-kx659j8a)
    [latest-21349]2021-12-17T01:46:57.384Z <HubitatController:null> HubitatController#hubitat final action path for power_switch.on on Entity#hubitat>419: http://192.168.33.22/apps/api/67/devices/419/on
    [latest-21349]2021-12-17T01:46:57.456Z <Engine:NOTICE> Resuming reaction nGarden<SET> (rule-kx9oxcss:S) from step 5
    [latest-21349]2021-12-17T01:46:57.457Z <HubitatController:null> HubitatController#hubitat final action path for power_switch.on on Entity#hubitat>297: http://192.168.33.22/apps/api/67/devices/297/on
    [latest-21349]2021-12-17T01:46:57.563Z <Engine:NOTICE> Resuming reaction nLight Garden ON (re-kx65h5u7) from step 1
    [latest-21349]2021-12-17T01:46:57.564Z <HubitatController:null> HubitatController#hubitat final action path for power_switch.on on Entity#hubitat>162: http://192.168.33.22/apps/api/67/devices/162/on
    [latest-21349]2021-12-17T01:46:57.673Z <Engine:NOTICE> Resuming reaction nLight Security ON (re-kx67amal) from step 1
    [latest-21349]2021-12-17T01:46:57.674Z <HubitatController:null> HubitatController#hubitat final action path for power_switch.on on Entity#hubitat>229: http://192.168.33.22/apps/api/67/devices/229/on
    [latest-21349]2021-12-17T01:46:57.782Z <Engine:NOTICE> Resuming reaction nLight Corredor Evening (re-kx659j8a) from step 1
    [latest-21349]2021-12-17T01:46:57.784Z <Engine:NOTICE> nLight Corredor Evening delaying until 1639705619783<16/12/2021 20:46:59>
    [latest-21349]2021-12-17T01:46:57.889Z <Engine:NOTICE> Resuming reaction nGarden<SET> (rule-kx9oxcss:S) from step 6
    [latest-21349]2021-12-17T01:46:57.890Z <HubitatController:null> HubitatController#hubitat final action path for power_switch.on on Entity#hubitat>449: http://192.168.33.22/apps/api/67/devices/449/on
    [latest-21349]2021-12-17T01:46:57.995Z <Engine:NOTICE> Resuming reaction nLight Garden ON (re-kx65h5u7) from step 2
    [latest-21349]2021-12-17T01:46:57.996Z <HubitatController:null> HubitatController#hubitat final action path for power_switch.on on Entity#hubitat>385: http://192.168.33.22/apps/api/67/devices/385/on
    [latest-21349]2021-12-17T01:46:58.106Z <Engine:NOTICE> Resuming reaction nLight Security ON (re-kx67amal) from step 2
    [latest-21349]2021-12-17T01:46:58.106Z <Engine:INFO> nLight Security ON all actions completed.
    [latest-21349]2021-12-17T01:46:58.214Z <Engine:NOTICE> Resuming reaction nGarden<SET> (rule-kx9oxcss:S) from step 7
    [latest-21349]2021-12-17T01:46:58.215Z <Engine:INFO> nGarden<SET> all actions completed.
    [latest-21349]2021-12-17T01:46:58.321Z <Engine:NOTICE> Resuming reaction nLight Garden ON (re-kx65h5u7) from step 3
    [latest-21349]2021-12-17T01:46:58.322Z <Engine:INFO> nLight Garden ON all actions completed.
    [latest-21349]2021-12-17T01:46:59.795Z <Engine:NOTICE> Resuming reaction nLight Corredor Evening (re-kx659j8a) from step 2
    [latest-21349]2021-12-17T01:46:59.796Z <HubitatController:null> HubitatController#hubitat final action path for color_temperature.set on Entity#hubitat>419: http://192.168.33.22/apps/api/67/devices/419/setColorTemperature/2850
    [latest-21349]2021-12-17T01:46:59.800Z <Engine:NOTICE> Resuming reaction nLight Corredor Evening (re-kx659j8a) from step 3
    [latest-21349]2021-12-17T01:46:59.800Z <Engine:INFO> nLight Corredor Evening all actions completed.
    
    

    I have checked each of the devices, all are in the log. I am even using the action_pace: 100 setting and I see that the interval was obeyed.

But out of 10 lights that should be on, as you can see on Hubitat's panel, only 2 were.

    86c83747-f825-4f79-a871-ea952be25377-image.png

Sorry to ask, @toggledbits, and I will understand if you don't answer, because I can see that MSR is doing its part perfectly.

Any recommendations for Hubitat? Reset the whole Z-Wave radio and pair everything again? Any app that can check radio occupancy or Hubitat processing? Maybe you, with your experience, can give me some guidance. Again, sorry, I know it is not MSR, but I'm almost back to the Vera with all its problems of direction and evolution.

gwp1
#19
@wmarcolin Something you've not surfaced: wireless interference. How close is your Hubitat to your WiFi router (they should not be near each other, as they will interfere due to the frequencies in use)? How close are your Z-Wave hubs to each other? Etc.

    Are all devices the kind that are plugged into electricity? I've discovered with some battery-operated devices that they sleep a lot to conserve battery and that delays actions. My iblind controllers are a perfect example: sending a refresh first and then the command seems to make them much happier regarding responding to commands.

    As @toggledbits noted in a response to me previously in the thread (and you've supported), the mesh plays a huge role, too.

    I had a Veralite and then moved to a VeraSecure. I've done direct comparisons between my VeraSecure and Hubitat using MSR to trigger the rules and the Hubitat was much faster. That, amongst other reasons, is why my VeraSecure is now completely offline.

    *Hubitat C-7 2.4.3.149
    *Proxmox VE v8, Beelink MiniPC 12GBs, SSD

    *HASS 2025.11.1
    w/ ZST10-700 fw 7.18.3

    *Prod MSR in docker/portainer
    MSR: latest-25310-dc2bb580
    MQTTController: 25139
    ZWave Controller: 25139

    • wmarcolinW wmarcolin

      @toggledbits hi master!

      I am going into a state of despair with Hubitat, and thinking that I have made a bad switch from VeraPlus to Hubitat.

      Well, as you always comment, look at the log, as I already commented my suspicion that the MSR sent all commands to Hubitat, and this one that failed was confirmed, as you mention in your message.

      Routines below.

      67373a04-9814-4a02-aac3-c52757a66307-image.png

      0f16e5e1-31ce-4948-af46-e6e80b01149b-image.png

      7491a245-8ba7-405c-b1ed-c0661d96bb75-image.png

      19708ea3-5adc-41b4-afb8-3dc3f8880c20-image.png

      Looking at the log, I don't understand what sequence the MSR performed, but I see that all of the above actions were sent to Hubitat without fail.

      [latest-21349]2021-12-17T01:46:57.319Z <Rule:NOTICE> Rule#rule-kx9oxcss configuration changed; reloading
      [latest-21349]2021-12-17T01:46:57.321Z <Rule:NOTICE> Rule#rule-kx9oxcss stopping rule
      [latest-21349]2021-12-17T01:46:57.324Z <Rule:NOTICE> Rule Rule#rule-kx9oxcss stopped
      [latest-21349]2021-12-17T01:46:57.325Z <Rule:INFO> Rule#rule-kx9oxcss (nGarden) started
      [latest-21349]2021-12-17T01:46:57.327Z <Rule:INFO> nGarden (Rule#rule-kx9oxcss) SET!
      [latest-21349]2021-12-17T01:46:57.331Z <Engine:INFO> Enqueueing "nGarden<SET>" (rule-kx9oxcss:S)
      [latest-21349]2021-12-17T01:46:57.345Z <Engine:NOTICE> Starting reaction nGarden<SET> (rule-kx9oxcss:S)
      [latest-21349]2021-12-17T01:46:57.346Z <HubitatController:null> HubitatController#hubitat final action path for power_switch.on on Entity#hubitat>298: http://192.168.33.22/apps/api/67/devices/298/on
      [latest-21349]2021-12-17T01:46:57.348Z <Engine:INFO> Enqueueing "nLight Garden ON" (re-kx65h5u7)
      [latest-21349]2021-12-17T01:46:57.350Z <Engine:INFO> Enqueueing "nLight Security ON" (re-kx67amal)
      [latest-21349]2021-12-17T01:46:57.352Z <Engine:INFO> Enqueueing "nLight Corredor Evening" (re-kx659j8a)
      [latest-21349]2021-12-17T01:46:57.377Z <Engine:NOTICE> Resuming reaction nGarden<SET> (rule-kx9oxcss:S) from step 4
      [latest-21349]2021-12-17T01:46:57.378Z <HubitatController:null> HubitatController#hubitat final action path for power_switch.on on Entity#hubitat>296: http://192.168.33.22/apps/api/67/devices/296/on
      [latest-21349]2021-12-17T01:46:57.379Z <Engine:NOTICE> Starting reaction nLight Garden ON (re-kx65h5u7)
      [latest-21349]2021-12-17T01:46:57.380Z <HubitatController:null> HubitatController#hubitat final action path for power_switch.on on Entity#hubitat>97: http://192.168.33.22/apps/api/67/devices/97/on
      [latest-21349]2021-12-17T01:46:57.381Z <Engine:NOTICE> Starting reaction nLight Security ON (re-kx67amal)
      [latest-21349]2021-12-17T01:46:57.382Z <HubitatController:null> HubitatController#hubitat final action path for power_switch.on on Entity#hubitat>197: http://192.168.33.22/apps/api/67/devices/197/on
      [latest-21349]2021-12-17T01:46:57.383Z <Engine:NOTICE> Starting reaction nLight Corredor Evening (re-kx659j8a)
      [latest-21349]2021-12-17T01:46:57.384Z <HubitatController:null> HubitatController#hubitat final action path for power_switch.on on Entity#hubitat>419: http://192.168.33.22/apps/api/67/devices/419/on
      [latest-21349]2021-12-17T01:46:57.456Z <Engine:NOTICE> Resuming reaction nGarden<SET> (rule-kx9oxcss:S) from step 5
      [latest-21349]2021-12-17T01:46:57.457Z <HubitatController:null> HubitatController#hubitat final action path for power_switch.on on Entity#hubitat>297: http://192.168.33.22/apps/api/67/devices/297/on
      [latest-21349]2021-12-17T01:46:57.563Z <Engine:NOTICE> Resuming reaction nLight Garden ON (re-kx65h5u7) from step 1
      [latest-21349]2021-12-17T01:46:57.564Z <HubitatController:null> HubitatController#hubitat final action path for power_switch.on on Entity#hubitat>162: http://192.168.33.22/apps/api/67/devices/162/on
      [latest-21349]2021-12-17T01:46:57.673Z <Engine:NOTICE> Resuming reaction nLight Security ON (re-kx67amal) from step 1
      [latest-21349]2021-12-17T01:46:57.674Z <HubitatController:null> HubitatController#hubitat final action path for power_switch.on on Entity#hubitat>229: http://192.168.33.22/apps/api/67/devices/229/on
      [latest-21349]2021-12-17T01:46:57.782Z <Engine:NOTICE> Resuming reaction nLight Corredor Evening (re-kx659j8a) from step 1
      [latest-21349]2021-12-17T01:46:57.784Z <Engine:NOTICE> nLight Corredor Evening delaying until 1639705619783<16/12/2021 20:46:59>
      [latest-21349]2021-12-17T01:46:57.889Z <Engine:NOTICE> Resuming reaction nGarden<SET> (rule-kx9oxcss:S) from step 6
      [latest-21349]2021-12-17T01:46:57.890Z <HubitatController:null> HubitatController#hubitat final action path for power_switch.on on Entity#hubitat>449: http://192.168.33.22/apps/api/67/devices/449/on
      [latest-21349]2021-12-17T01:46:57.995Z <Engine:NOTICE> Resuming reaction nLight Garden ON (re-kx65h5u7) from step 2
      [latest-21349]2021-12-17T01:46:57.996Z <HubitatController:null> HubitatController#hubitat final action path for power_switch.on on Entity#hubitat>385: http://192.168.33.22/apps/api/67/devices/385/on
      [latest-21349]2021-12-17T01:46:58.106Z <Engine:NOTICE> Resuming reaction nLight Security ON (re-kx67amal) from step 2
      [latest-21349]2021-12-17T01:46:58.106Z <Engine:INFO> nLight Security ON all actions completed.
      [latest-21349]2021-12-17T01:46:58.214Z <Engine:NOTICE> Resuming reaction nGarden<SET> (rule-kx9oxcss:S) from step 7
      [latest-21349]2021-12-17T01:46:58.215Z <Engine:INFO> nGarden<SET> all actions completed.
      [latest-21349]2021-12-17T01:46:58.321Z <Engine:NOTICE> Resuming reaction nLight Garden ON (re-kx65h5u7) from step 3
      [latest-21349]2021-12-17T01:46:58.322Z <Engine:INFO> nLight Garden ON all actions completed.
      [latest-21349]2021-12-17T01:46:59.795Z <Engine:NOTICE> Resuming reaction nLight Corredor Evening (re-kx659j8a) from step 2
      [latest-21349]2021-12-17T01:46:59.796Z <HubitatController:null> HubitatController#hubitat final action path for color_temperature.set on Entity#hubitat>419: http://192.168.33.22/apps/api/67/devices/419/setColorTemperature/2850
      [latest-21349]2021-12-17T01:46:59.800Z <Engine:NOTICE> Resuming reaction nLight Corredor Evening (re-kx659j8a) from step 3
      [latest-21349]2021-12-17T01:46:59.800Z <Engine:INFO> nLight Corredor Evening all actions completed.
      
      

      I have checked each of the devices, all are in the log. I am even using the action_pace: 100 setting and I see that the interval was obeyed.

      But out of 10 lights that should be on, as you can see on the Hubitat's panel only 2 were.

      86c83747-f825-4f79-a871-ea952be25377-image.png

      Sorry to ask, @toggledbits, and I will understand if you don't answer, because I can see that MSR itself is working correctly.

      Any recommendations for Hubitat? Should I reset the whole Z-Wave radio and pair everything again? Is there any app that can check radio occupancy or Hubitat processing load? Maybe with your experience you can give me some guidance. Again, sorry: I know this is not an MSR issue, but I'm almost ready to go back to the Vera, with all its driver and development problems.

      toggledbitsT Offline
      toggledbits
      wrote on last edited by toggledbits
      #20

      @wmarcolin said in [Solved] Is there a cap or max number of devices a Global Reaction should not exceed?:

      Looking at the log, I don't understand what sequence the MSR performed, but I see that all of the above actions were sent to Hubitat

      Hmmm. I can't answer for the Hubitat part, but I can explain the order. The first three actions in nGarden<SET> are Run Reaction, so these enqueue those reactions with the executive -- they are not run in-line. That's why you see the three "Enqueueing" lines, followed by a resume of nGarden<Set> from step 4. We see the action output for device 298, which is step 3 (numbered from 0), before the enqueue messages because enqueueing itself is an asynchronous operation, so the executive quickly started the three Run Reaction enqueue requests, then ran the device 298 Entity Action. Running an entity action is asynchronous, so the executive had to wait for that operation to finish. Since it went into a wait state, the tasks for the three Run Reaction enqueues could run, so they did. When they were done and the 298 device action was finished sending, nGarden<SET> could then resume from step 4 (numbered from 0, so 5 as we look at it). That's device 296, so we see that on the next line. Again, device actions have to wait for the send, so execution of nGarden<Set> paused there, which allowed nLight Garden ON, the first of the three enqueued reactions, to start and send its first command to device 97. That blocked that reaction, so nLight Security ON was next in the queue and it started and sent its first device action to 197. That blocked that reaction, so nLight Corredor Evening started and ran its first action against 419. It blocked, of course, so everything paused about 70ms until nGarden<Set> became the first ready task, so it resumed at step 5 (from 0, or 6 as we count from 1). And so on, until all were sent.
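The blocking-and-interleaving behavior described above can be sketched with a toy cooperative scheduler. This is an illustration only, not Reactor's actual code; the reaction names and device numbers are borrowed from the log above, and the round-robin executive is a simplification:

```javascript
// Toy cooperative scheduler showing why the log lines interleave: each
// reaction "blocks" (yields) at every device send, which lets the next
// ready task run until the first one can resume.
const log = [];

// Each reaction is a generator that yields once per device send,
// simulating the wait for the hub's HTTP acknowledgment.
function* reaction(name, devices) {
  for (const dev of devices) {
    log.push(`${name}: send ${dev}`);
    yield; // "blocked" until the hub acknowledges
    log.push(`${name}: resumed`);
  }
}

// Round-robin executive: advance each unfinished task to its next yield.
function run(tasks) {
  const pending = tasks.map((t) => ({ it: t, done: false }));
  while (pending.some((t) => !t.done)) {
    for (const t of pending) {
      if (!t.done) t.done = t.it.next().done;
    }
  }
}

run([
  reaction("nGarden<SET>", [298, 296]),
  reaction("nLight Garden ON", [97]),
  reaction("nLight Security ON", [197]),
]);

console.log(log.join("\n"));
```

Running this produces the same pattern as the log: the first send from each task appears before any task's "resumed" line, because every send pauses its own reaction and hands control to the next one.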

      I'm not sure what your pacing configuration was at this point, but overall it appears about right for the number of tasks sent. It's hard to tell without more debug on, and maybe I'll add some "standard" messages about device queueing while we're looking at this (since debug on a Controller instance can be very large and a bit like sipping from a firehose).

      One thing to note also is that each Entity Action blocks while sending -- the reaction waits for the hub to acknowledge the request. For that to happen, the request must be sent, and the hub has to give an HTTP 200 (OK) response to the request (if it gives an error, that would be logged, and there are no errors logged in this snippet). So at the least, the hub has acknowledged the request, but that doesn't mean it has completed the request, let alone that the request was successful in its overall execution (e.g. manipulating the device). That's a different and much bigger problem.
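What "blocks while sending" means for a single Entity Action can be sketched as below. The MakerAPI URL shape is taken from the log lines above; the function itself, its name, and the access_token query parameter are illustrative assumptions, not Reactor's actual code (the fetch function is injectable so the sketch can be exercised without a real hub):

```javascript
// Hedged sketch of one Entity Action send. The reaction is paused at the
// await until the hub responds; HTTP 200 only means the hub ACCEPTED the
// request, not that the radio command reached the device or succeeded.
async function sendCommand(fetchFn, hub, appId, deviceId, command, token) {
  // URL shape as seen in the log, e.g. /apps/api/67/devices/449/on;
  // the access_token parameter is an assumption about MakerAPI usage.
  const url =
    `http://${hub}/apps/api/${appId}/devices/${deviceId}/${command}` +
    `?access_token=${token}`;
  const res = await fetchFn(url); // reaction blocks here
  if (!res.ok) {
    // A non-2xx response would be logged; no such errors appear in the
    // snippet above.
    throw new Error(`hub returned HTTP ${res.status}`);
  }
}
```

The key point the post makes is in the comments: a successful await here proves acknowledgment, nothing more.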

      I'm not done looking at this. I want to study the timing more carefully as well. There's something about it that doesn't seem right to me. As I said, I'm going to add some more standard (non-debug) diagnostic output to this while we're looking at it, and roll a new release later today, for you to try and send me new logs.

      Author of Multi-system Reactor and Reactor, DelayLight, Switchboard, and about a dozen other plugins that run on Vera and openLuup.

      • CrilleC Crille

        @wmarcolin I don't have a Hubitat myself, but have you looked at the logs on the hub (per https://docs.hubitat.com/index.php?title=Logs) for hints of what is happening when the SET reaction fires?

        wmarcolinW Offline
        wmarcolin
        wrote on last edited by
        #21

        @crille thanks for your comment, I will look at the log information. I think I even have to activate it, because so far nothing has shown up. Thanks for the information.

        • G gwp1

          @wmarcolin Something you've not surfaced: wireless interference. How close is your Hubitat to your WiFi router (they should not be near each other, as they will interfere due to the frequencies in use)? How close are your Z-Wave hubs to each other? Etc.

          Are all devices the kind that are plugged into electricity? I've discovered with some battery-operated devices that they sleep a lot to conserve battery and that delays actions. My iblind controllers are a perfect example: sending a refresh first and then the command seems to make them much happier regarding responding to commands.

          As @toggledbits noted in a response to me previously in the thread (and you've supported), the mesh plays a huge role, too.

          I had a Veralite and then moved to a VeraSecure. I've done direct comparisons between my VeraSecure and Hubitat using MSR to trigger the rules and the Hubitat was much faster. That, amongst other reasons, is why my VeraSecure is now completely offline.

          wmarcolinW Offline
          wmarcolin
          wrote on last edited by
          #22

          @gwp1

          Ok, as you can see from the picture, my Hubitat is next to my Asus router, where the Vera Plus used to be. In theory, radio waves from the router could interfere with Zigbee, because both operate in the 2.4 GHz band, but they should not in any way interfere with the Hubitat's Z-Wave radio, which works at around 900 MHz. But I'm trying everything, and I've already moved the Hubitat 2 meters away from the router.

          ce28b775-0bc0-4e79-a7c9-f6de8601b8da-image.png

          The Vera was already disconnected; I have now removed it and put it away. Who knows, maybe in the future I will find some use for it.

          With respect to your comment about battery devices, I understand, but that is not the case here. As I mentioned above, I tried to make a sequence of lights; the 6 devices are 4 Aeotec Micro Switch G2 (DSC26103), 1 Zipato Bulb 2, and an Everspring AN145, all connected to mains power all the time. Your battery point is very valid, but it is not what has caused my panic.

          Finally, your comment about Hubitat being faster than Vera is what I read everywhere, but unfortunately, this is not what is happening to me at the moment. Reactions are slow; even using Hubitat's own dashboard, commands take a while to happen, so I come back to the mesh network issue above. Maybe moving it now will help.

          But the question still remains of why the devices don't respond to MSR at the same speed.

          I will now read @toggledbits' post to see what he says.

          Thank you very much for your attention.

          • toggledbitsT Offline
            toggledbits
            wrote on last edited by toggledbits
            #23

            Build 21351 just posted. This has a fix for a problem in the setup of the task queue for HubitatController that is likely causing it to not pace as expected. Let's give this a try, and in addition, try a few different values for action_pace. I would start from 25 and work up.
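For reference, action_pace is a HubitatController configuration value in reactor.yaml. A sketch of where it goes, with the surrounding keys abbreviated and the exact layout assumed from a typical controller entry (check your own config for the real structure and values):

```yaml
controllers:
  - id: hubitat
    implementation: HubitatController
    enabled: true
    config:
      source: "http://192.168.33.22"   # hub address as seen in the logs above
      # ... access token and other existing settings ...
      action_pace: 25                   # ms between queued device actions; try 25, then 50
```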


              wmarcolinW Offline
              wmarcolin
              wrote on last edited by
              #24

              @toggledbits once again thank you for your kind attention.

              Whatever you need from me, whether sending logs or running tests, I am at your disposal. Just tell me what I should do and I will attend to it promptly.

              @gwp1 in a previous message I mentioned that I had moved the Hubitat 2 meters away from my central office, which holds my Asus router, no-break (UPS), and the internet providers' modems. I have now taken it further: the Hubitat is more than 20 meters away, in open space, well away from all these possible sources of electromagnetic and radio-frequency interference. Let's see if this helps communication.

              So now it remains to investigate whether the very fast MSR, running on a dedicated notebook (Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz, 8 GB RAM, 300 GB SSD), may be overrunning the Hubitat.

              Thanks


                toggledbitsT Offline
                toggledbits
                wrote on last edited by
                #25

                @wmarcolin said in [Solved] Is there a cap or max number of devices a Global Reaction should not exceed?:

                Just tell me what I should do and I will attend to it promptly.

                You are always helpful, and that's appreciated. For the moment, just run it as it comes. Let's see what happens. Start with a 25 for action_pace, and if you continue to have issues, move it up to 50. If 50 isn't enough, then I'll ask for logs. Let me know how it goes.


                  wmarcolinW Offline
                  wmarcolin
                  wrote on last edited by
                  #26

                  @toggledbits

                  OK, changed action_pace to 25 (it was 100), and upgraded to build 21351.


                    wmarcolinW Offline
                    wmarcolin
                    wrote on last edited by
                    #27

                    @toggledbits during this last upgrade process, I ended up with these zombie processes. How can I kill them? Thanks.

                    516e1036-ae68-4f4a-a146-9268527fe4a1-imagem.png

                    • toggledbitsT Offline
                      toggledbits
                      wrote on last edited by
                      #28

                      Stop Reactor. Then grab the reactor.log file and upload it to me. I'm going to DM you a link.

                      After you've uploaded the log file, remove the storage/states/reaction_queue.json file and restart Reactor.


                        wmarcolinW Offline
                        wmarcolin
                        wrote on last edited by
                        #29

                        @toggledbits done boss!!


                          wmarcolinW Offline
                          wmarcolin
                          wrote on last edited by
                          #30

                          @toggledbits

                          I think you have also received this message in another thread; it occurs exactly when running the event above that turns on several lights.

                          d2a231c7-0825-412c-bca0-c61db13fb384-image.png


                            toggledbitsT Offline
                            toggledbits
                            wrote on last edited by
                            #31

                            @wmarcolin Very good. That means the more aggressive checks are working. It appears that Hubitat's event socket is a good bit more fragile than its Hass equivalent. I will add an option to the next release to silence this warning (although the reconnect will still be logged in the log file). You should also be able to see the effect of the reconnects in the system entity's x_hubitat_sys.reconnects counter.

                            For comparison, I don't get these errors unless I force them. The one restart shown below was because I upgraded the hub to 2.3.0.120.

                            b66090a7-5f2d-42a7-81a6-6934f71f378f-image.png

                            • toggledbitsT Offline
                              toggledbits
                              wrote on last edited by
                              #32

                              TL;DR: Hubitat needs more aggressive WebSocket connection health tests and recovery, and that's been added as of 21351. When recovery is needed (which should be very rare), device states may be delayed up to 120 seconds. Don't use WiFi for your Reactor host or hub in production. If your Reactor host and hub aren't on the same network segment (LAN), you may see more reconnects. If you see reconnects when your Reactor host and hub are on the same network segment, you likely have a network quality issue.

                              I want to explain how I understand the problem reported, and how the fix (which is more of a workaround) works.

                              The events websocket is one of two channels used by HubitatController to get data from the hub. When HubitatController connects to the hub, it begins with a query to MakerAPI to fetch the bulk data for all devices -- "give me everything". Thereafter, the events socket provides (only) updates (changes). Being a WebSocket connection, it has a standard-required implementation of ping-pong that both serves to keep the connection alive and test the health of the connection. In Reactor, I use a standard library to provide the WebSocket implementation, and this library is in wide use, so while it's almost certainly not bug free (nothing is), it has sufficient exposure to be considered trustworthy. I imagine Hubitat does the same thing, but since it's a closed system, I don't know for sure; they may use something common, or they may have rolled their own, or they may have chosen some black sheep from among many choices for whatever reason. In any case, neither Hubitat nor Reactor implement the WebSocket protocol itself, we just use our respective WebSocket libraries to open and manage the connections and send/receive data.

                              Apparently there is a failure mode for the connection in which the events can stop coming, and we don't know if it's on the Hubitat (Java) side or in the nodejs package; but apparently the ping-pong mechanism continues to work for the connection, otherwise it would be torn down/flagged as closed/in error by the libraries on both ends. There's no easy way to tell whether Hubitat has stopped sending messages or the nodejs library has stopped receiving or passing them, and since the libraries/packages on both ends are black boxes as far as I'm concerned, I don't really care; I just want it to work better. So...

                              HubitatController versions prior to 21351 relied solely on the WebSocket's native ping-pong mechanism to assess connection health, as Reactor does for Hass and even its own UI-to-Engine connection (lending credence to the theory that the nodejs library is not the cause). But for Hubitat it appears the WebSocket ping-pong alone is not enough, so 21351 has introduced some additional tests at the application layer. If any of these fails, the connections are closed and re-opened. When reopened, a full device/state inventory is done again as usual, so the current state of all devices is reestablished. Any missed device updates during the "dead time" would be corrected by this inventory.

                              By the way, one of the things that exacerbates the problem with the Hubitat events WebSocket is that it's a one-way connection at its application layer: Hubitat only transmits. There is no message I can send over the WebSocket for which I could expect a speedy reply as proof of health. I have to find other things to do through MakerAPI in an attempt to force Hubitat to send me data over the WebSocket, and this takes more time as well. If there were two-way communication, it would be a lot easier and faster to know whether the connection was healthy.

                              So the question that remains, then, is: what is that timing? By default, HubitatController will start its aggressive recovery at 60 seconds of channel silence. If the channel then remains silent for an additional 60 seconds, the close/re-open recovery occurs. So even if the connection fails, the maximum time to recovery and correct state of all devices will be just over 120 seconds; even in worst-case conditions, entity states should not lag more than that. Given that these stalls are the exception rather than the rule for most users, these pauses should be rare.
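The two-stage timing described above can be sketched as a simple silence watchdog. This is a generic illustration using the default 60-second values, not Reactor's internal implementation; the class and method names are invented for the sketch:

```javascript
// Generic silence watchdog: after `probeAfter` ms without traffic, probe
// the hub to provoke a reply; after a further `reconnectAfter` ms of
// silence, force a close/re-open. Worst case with the 60000/60000 defaults
// is therefore just over 120 seconds, as described in the post above.
class SilenceWatchdog {
  constructor(probeAfter, reconnectAfter) {
    this.probeAfter = probeAfter;         // ms of silence before probing
    this.reconnectAfter = reconnectAfter; // additional ms before reconnecting
    this.lastMessage = 0;
    this.probedAt = null;
  }
  // Call whenever any message arrives on the events socket.
  messageReceived(now) {
    this.lastMessage = now;
    this.probedAt = null; // traffic resumed; cancel any pending recovery
  }
  // Call periodically; returns the recovery state for time `now`.
  check(now) {
    if (now - this.lastMessage < this.probeAfter) return "ok";
    if (this.probedAt === null) {
      this.probedAt = now;
      return "probe"; // e.g. poke MakerAPI to try to provoke traffic
    }
    if (now - this.probedAt >= this.reconnectAfter) return "reconnect";
    return "waiting";
  }
}
```

With `new SilenceWatchdog(60000, 60000)`, a channel that goes quiet at t=0 is probed at t=60s and reconnected at t=120s, after which the full device inventory restores correct state.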

                              There is one tuning parameter that may be useful to set on VPN connections or any other "distanced" connection (i.e. any connection where the Reactor host and the hub are not on the same network segment, and in particular may traverse connection-managing software or hardware like proxies, stateful packet inspection and intrusion detection systems, load balancers, etc.). That is websocket_ping_interval, which will be added to the next build. This will set the interval, in milliseconds, between pings (default 60000). This should be sufficiently narrow to prevent some VPNs from aborting the socket in some cases (see the WebSocket missive at the end), but if not, smaller values can be tried, at the expense of additional network traffic and a slight touch on CPU. If the reconnects don't improve significantly, a different VPN option should be chosen.
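A sketch of where the new parameter would go, assuming (as with action_pace) that it sits among the HubitatController config values in reactor.yaml:

```yaml
# In the HubitatController "config:" section of reactor.yaml (placement assumed):
websocket_ping_interval: 30000   # ms between WebSocket pings (default 60000);
                                 # lower it for VPN or other "distanced" links
```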

                              And this brings me to two recommendations:

                              1. You should not use a WiFi connection for either the Reactor host or hub in production use. These are fine for testing and experimentation, but are an inappropriate choice in production for both reliability and performance reasons.
                              2. If you use a VPN between the Reactor host and the hub, "subscription VPNs" are probably best avoided, as these will be the most aggressive in connection management and cause the most disconnects and failures. That's because they are tuned for surfing web traffic and checking email, basically, where the connections are open-query-response-close — connections don't stay open very long, typically. There are optimizations of HTTP where connections are kept open after a response to allow for a follow-up query (e.g. request an embedded image after requesting a document), but these are generally much shorter than the expected infinite open of a WebSocket connection (more on this at the bottom). Point-to-point VPNs that you set up and manage yourself are likely to provide better stability and performance (e.g. PPTP, SSH tunnels, etc.).

                              I will also add this: in my network, I do not get Hubitat WebSocket stalls and reconnects. I have had to force them through various devious means to test the behavior I've just implemented. I owned a commercial data center in the San Francisco Bay Area, with a managed network offered to clients with 100% uptime service level agreements. I built and maintained that network. My home network is a reflection of that — good quality equipment, meticulous cabling, sensible architecture (scaled down appropriately for the lesser scope and demands), and active data collection and monitoring. My network runs clean, and when there are problems, I know it (and where). If your Reactor host and Hubitat hub are on the same network segment, hardwired and not WiFi, and you are getting reconnects, I think you should audit your network quality. Something isn't happy. It only takes one bad cable, or one bad connector on one end of one cable, to cause a lot of problems.

                              ---

                              For anyone interested, one reason why WebSockets can be troublesome in network environments where connection management may be done between the endpoints is that a WebSocket typically begins its life as an HTTP request. The client makes an HTTP request to the server with specific headers that ask that the connection to be "converted" (they call it "upgraded") from HTTP to WebSocket. If the server agrees, the connection becomes persistent and a new session layer is introduced. But because the connection starts as HTTP, any interstitial proxy or device that is managing the connection as it passes through may mistake it for a plain HTTP (web page) request, and when the connection doesn't tear itself down after a short period the proxy/device thinks is reasonable for HTTP requests, it forces the issue and sends a disconnect to both ends. This is necessary because tracking open connections consumes memory and CPU on these devices, and in a commercial ISP environment this could mean tens of thousands or hundreds of thousands of open connections at a single interface/gateway. So to keep from being overwhelmed, these devices may just time out those connections (on a predictable schedule/timeout, or just due to load), but because it's not really an HTTP connection at that point (it's been upgraded to a WebSocket), the proxy/device is breaking a connection that both the server and client expect to be persistent, and that can then cause all kinds of problems, the most benign of which is forcing the two endpoints to reconnect frequently and waste a lot of time and bandwidth doing it. On a LAN, you typically don't have these problems, because the two endpoints have no stateful management between them (network switches, if present between, just pass traffic, not manage connections), so barring network problems, there's no reason for them to be disconnected until either asks to close.

                              Author of Multi-system Reactor and Reactor, DelayLight, Switchboard, and about a dozen other plugins that run on Vera and openLuup.

                              wmarcolinW 1 Reply Last reply
                              1
                              • toggledbitsT toggledbits

                                TL;DR: Hubitat needs more aggressive WebSocket connection health tests and recovery, and that's been added as of 21351. When recovery is needed (which should be very rare), device states may be delayed up to 120 seconds. Don't use WiFi for your Reactor host or hub in production. If your Reactor host and hub aren't on the same network segment (LAN), you may see more reconnects. If you see reconnects when your Reactor host and hub are on the same network segment, you likely have a network quality issue.

                                I want to explain how I understand the problem reported, and how the fix (which is more of a workaround) works.

                                The events websocket is one of two channels used by HubitatController to get data from the hub. When HubitatController connects to the hub, it begins with a query to MakerAPI to fetch the bulk data for all devices -- "give me everything". Thereafter, the events socket provides (only) updates (changes). Being a WebSocket connection, it has a standard-required implementation of ping-pong that both serves to keep the connection alive and test the health of the connection. In Reactor, I use a standard library to provide the WebSocket implementation, and this library is in wide use, so while it's almost certainly not bug free (nothing is), it has sufficient exposure to be considered trustworthy. I imagine Hubitat does the same thing, but since it's a closed system, I don't know for sure; they may use something common, or they may have rolled their own, or they may have chosen some black sheep from among many choices for whatever reason. In any case, neither Hubitat nor Reactor implement the WebSocket protocol itself, we just use our respective WebSocket libraries to open and manage the connections and send/receive data.

                                Apparently there is a failure mode for the connection, and we don't know if it's on the Hubitat (Java) side or in the nodejs package, where the events can stop coming, but apparently the ping-pong mechanism continues to work for the connection, otherwise it would be torn down/flagged as closed/error by the libraries on both ends. There's no easy way to tell if Hubitat has stopped sending messages or the nodejs library has stopped receiving or passing them, and since the libraries/packages on both ends are black boxes as far as I'm concerned, I don't really care, I just want it to work better. So...

HubitatController versions prior to 21351 relied solely on the WebSocket's native ping-pong mechanism to assess connection health, as Reactor does for Hass and even its own UI-to-Engine connection (lending credence to the theory that the nodejs library is not the cause). But for Hubitat it appears the WebSocket ping-pong alone is not enough, so 21351 introduces some additional tests at the application layer. If any of these fails, the connections are closed and re-opened. When reopened, a full device/state inventory is done again as usual, so the current state of all devices is reestablished. Any missed device updates during the "dead time" are corrected by this inventory.
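The two-stage silence check described above can be sketched as a small decision function. This is a sketch only; the function name, parameter names, and structure are hypothetical, not HubitatController's actual internals — only the 60/120-second thresholds come from the post:

```javascript
// Decide what to do based on how long the events channel has been silent.
// Thresholds mirror the defaults described above: probe at 60 seconds of
// silence, close/re-open after a further 60 seconds.
// Hypothetical sketch; not the actual HubitatController implementation.
function channelHealth(lastEventMs, nowMs, silenceLimitMs) {
  silenceLimitMs = silenceLimitMs || 60000;
  const silence = nowMs - lastEventMs;
  if (silence < silenceLimitMs) return "ok";         // channel looks alive
  if (silence < 2 * silenceLimitMs) return "probe";  // poke a device via MakerAPI
  return "reconnect";                                // close, re-open, full inventory
}
```

On "reconnect" the controller re-runs the full device inventory, which is why the worst-case state lag is bounded at just over two silence intervals.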

By the way, one of the things that exacerbates the problem with the Hubitat events WebSocket is that it's a one-way connection at its application layer: Hubitat only transmits. There is no message I can send over the WebSocket for which I could expect a speedy reply as proof of health. I have to find other things to do through MakerAPI in an attempt to force Hubitat to send me data over the WebSocket, and this takes more time as well. If there were two-way communication, it would be a lot easier and faster to know whether the connection was healthy.
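To illustrate the kind of out-of-band poke involved: a MakerAPI device command is just an HTTP GET with a well-known path shape. This helper only builds the URL; the function name is my own, and the path shape follows MakerAPI's documented form rather than anything specific to Reactor:

```javascript
// Build a Hubitat MakerAPI command URL, e.g. to ask a device to refresh so
// the hub emits traffic on the otherwise one-way events WebSocket.
// Helper name is my own; the path shape follows MakerAPI's documented form:
//   /apps/api/<appId>/devices/<deviceId>/<command>?access_token=<token>
function makerApiCommandUrl(hubBase, appId, deviceId, command, token) {
  return hubBase + "/apps/api/" + appId + "/devices/" + deviceId + "/" +
         command + "?access_token=" + encodeURIComponent(token);
}

// Example:
// makerApiCommandUrl("http://192.168.1.10", 42, 7, "refresh", "token123")
//   → "http://192.168.1.10/apps/api/42/devices/7/refresh?access_token=token123"
```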

So the question that remains, then, is: what is that timing? By default, HubitatController will start its aggressive recovery at 60 seconds of channel silence. If the channel then remains silent for an additional 60 seconds, the close/re-open recovery occurs. Even if the connection fails, the maximum time to recovery and correct state of all devices is just over 120 seconds, so even in worst-case conditions entity states should not lag more than that. Given that these stalls are the exception rather than the rule for most users, these pauses should be rare.

                                There is one tuning parameter that may be useful to set on VPN connections or any other "distanced" connection (i.e. any connection where the Reactor host and the hub are not on the same network segment, and in particular may traverse connection-managing software or hardware like proxies, stateful packet inspection and intrusion detection systems, load balancers, etc.). That is websocket_ping_interval, which will be added to the next build. This will set the interval, in milliseconds, between pings (default 60000). This should be sufficiently narrow to prevent some VPNs from aborting the socket in some cases (see the WebSocket missive at the end), but if not, smaller values can be tried, at the expense of additional network traffic and a slight touch on CPU. If the reconnects don't improve significantly, a different VPN option should be chosen.
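For illustration, a config fragment might look like this. This is a sketch: only the `websocket_ping_interval` option name is from the post; the surrounding structure is a guess at a typical controller entry in `reactor.yaml`, so check the documentation for your build.

```yaml
# Sketch of a reactor.yaml controller entry; only websocket_ping_interval
# is named in the post, the rest is illustrative.
controllers:
  - id: hubitat
    implementation: HubitatController
    config:
      websocket_ping_interval: 30000   # ms between pings (default 60000)
```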

                                And this brings me to two recommendations:

                                1. You should not use a WiFi connection for either the Reactor host or hub in production use. These are fine for testing and experimentation, but are an inappropriate choice in production for both reliability and performance reasons.
                                2. If you use a VPN between the Reactor host and the hub, "subscription VPNs" are probably best avoided, as these will be the most aggressive in connection management and cause the most disconnects and failures. That's because they are tuned for surfing web traffic and checking email, basically, where the connections are open-query-response-close — connections don't stay open very long, typically. There are optimizations of HTTP where connections are kept open after a response to allow for a follow-up query (e.g. request an embedded image after requesting a document), but these are generally much shorter than the expected infinite open of a WebSocket connection (more on this at the bottom). Point-to-point VPNs that you set up and manage yourself are likely to provide better stability and performance (e.g. PPTP, SSH tunnels, etc.).

                                I will also add this: in my network, I do not get Hubitat WebSocket stalls and reconnects. I have had to force them through various devious means to test the behavior I've just implemented. I owned a commercial data center in the San Francisco Bay Area, with a managed network offered to clients with 100% uptime service level agreements. I built and maintained that network. My home network is a reflection of that — good quality equipment, meticulous cabling, sensible architecture (scaled down appropriately for the lesser scope and demands), and active data collection and monitoring. My network runs clean, and when there are problems, I know it (and where). If your Reactor host and Hubitat hub are on the same network segment, hardwired and not WiFi, and you are getting reconnects, I think you should audit your network quality. Something isn't happy. It only takes one bad cable, or one bad connector on one end of one cable, to cause a lot of problems.

                                ---

For anyone interested, one reason why WebSockets can be troublesome in network environments where connection management may be done between the endpoints is that a WebSocket typically begins its life as an HTTP request. The client makes an HTTP request to the server with specific headers that ask that the connection be "converted" (they call it "upgraded") from HTTP to WebSocket. If the server agrees, the connection becomes persistent and a new session layer is introduced.

But because the connection starts as HTTP, any interstitial proxy or device that is managing the connection as it passes through may mistake it for a plain HTTP (web page) request, and when the connection doesn't tear itself down after a period the proxy/device thinks is reasonable for HTTP requests, it forces the issue and sends a disconnect to both ends. This is necessary because tracking open connections consumes memory and CPU on these devices, and in a commercial ISP environment this could mean tens of thousands or hundreds of thousands of open connections at a single interface/gateway. So to keep from being overwhelmed, these devices may just time out those connections (on a predictable schedule/timeout, or just due to load). But because it's not really an HTTP connection at that point (it's been upgraded to a WebSocket), the proxy/device is breaking a connection that both the server and client expect to be persistent, and that can then cause all kinds of problems, the most benign of which is forcing the two endpoints to reconnect frequently and waste a lot of time and bandwidth doing it.

On a LAN, you typically don't have these problems, because the two endpoints have no stateful management between them (network switches, if present in between, just pass traffic rather than managing connections), so barring network problems, there's no reason for them to be disconnected until either side asks to close.
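For reference, the upgrade handshake looks roughly like this on the wire (abridged; the key/accept values are the worked example from RFC 6455, and the request path is illustrative):

```
GET /eventsocket HTTP/1.1
Host: hub.example
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

An interstitial proxy that only inspects the initial request can easily file this under "HTTP" and apply an HTTP idle timeout to what is now a long-lived WebSocket.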

wmarcolinW Offline
                                wmarcolin
                                wrote on last edited by
                                #33

                                @toggledbits master!

                                Another lesson in knowledge, and dedication to understanding and solving problems.

                                Well, I spent two days reading your post before trying to answer anything.

First, starting from the end: in my case the network is all gigabit, with CAT7 cables and industrial connectors. All the cabling in the house came pre-terminated, so there was no risk of having to redo connectors. I then even hired a company to certify the network. The network also uses managed switches, so I can validate it. Ping inside my house between any two pieces of equipment is < 1 ms.

                                8531f96f-e697-48f6-9691-8e74ebacf9d1-image.png

From the technical side, what I see is that the connection between MSR and Hubitat is fragile, and it becomes even more so when MSR is much faster in its actions than Hubitat is in its responses.

What you would be doing for a future version is strengthening the validation of this communication, in particular by confirming the status of issued commands. That is, if you tell MSR to turn on a light, it should make sure that the returned state says the light is on.

One doubt: if the returned state says it was not turned on, would there be a reset? Is that a fair summary?

                                toggledbitsT 1 Reply Last reply
                                0
                                • wmarcolinW wmarcolin


toggledbitsT Offline
                                  toggledbits
                                  wrote on last edited by toggledbits
                                  #34

                                  Ping (and traceroute) are not network quality tools. They are path tools. To measure link quality, since you said you had managed switches, you'd need to look at the error counts on each port. That may be worth a squiz.

As @Alan_F has observed, you can get restarts from a channel that is just naturally quiet. If you don't have a lot of devices and things aren't changing often, a quiet channel is very likely. HubitatController probes by picking a device and making it refresh, which usually causes some event activity on the WebSocket channel. But it's possible that the (randomly picked) device may not do much when refreshed. You can set probe_device to the device ID (number) of a device that consistently causes events when asked to refresh to mitigate that random effect. That may take some experimentation, using the Hubitat UI to ask devices to refresh and watching for green highlights in the Reactor Entities list. When you find a device that consistently "lights up" when you refresh it on Hubitat, you've probably got a reliable probe device.
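Pinning the probe might look like this in the HubitatController config. This is a sketch: probe_device is the option named above, but the device ID and the surrounding structure are placeholders, so verify against your own reactor.yaml.

```yaml
# Sketch: pin the health probe to a device that reliably emits events when
# refreshed. 123 is a placeholder Hubitat device ID; structure illustrative.
controllers:
  - id: hubitat
    implementation: HubitatController
    config:
      probe_device: 123
```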

                                  I have noted in my own network that certain devices (that shall be unnamed) are sufficiently chatty that I'm at no risk of a quiet channel. Let's just say anything that monitors energy is a good canary in the mine.

                                  Author of Multi-system Reactor and Reactor, DelayLight, Switchboard, and about a dozen other plugins that run on Vera and openLuup.

                                  wmarcolinW 1 Reply Last reply
                                  0
                                  • toggledbitsT toggledbits


wmarcolinW Offline
                                    wmarcolin
                                    wrote on last edited by
                                    #35

                                    @toggledbits

The ping tests were just to validate the connection and speed. Below is a screenshot from the switch that MSR and Hubitat are on: no errors. I have a weekly reset scheduled, so this information covers from last Friday until now.

                                    ca085548-f8da-408f-b57c-9535eda260ab-image.png

OK, so what I should set now is a connection probe device, something like the Device Watchdog app. I'm going this way.

Anyway, I think I mentioned before that I moved the Hubitat far away from my WiFi router on Saturday, and since last night, after deactivating the devices one by one again, I have started rebuilding my mesh. I already see good results from the move: several devices that previously had trouble responding now seem to work better. Hopefully by Wednesday I will have finished the rebuild and will be able to check the network behavior and the ping-pong between MSR and Hubitat with the latest version of MSR.

                                    Thanks.

                                    1 Reply Last reply
                                    0
• toggledbitsT Offline
                                      toggledbits
                                      wrote on last edited by
                                      #36

                                      Very good. In my limited experience with this new heuristic, Z-Wave devices seem to be the most predictable for probes among the basic devices. In the current build, whatever you choose must support the Refresh (Hubitat native) capability (aka x_hubitat_Refresh in Reactor). In the next build, the config value probe_action will be available to let you use a different action, if you need to, and probe_parameters (an object) containing any key/value parameter pairs that the command may need (optional).
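Put together with the options described above, a next-build probe configuration might look like this. This is a sketch: probe_action and probe_parameters are the names given in the post, while the action name, the parameter pair, and the device ID are placeholder values.

```yaml
# Sketch: custom probe command instead of the default Refresh.
# "poll" and the parameter pair are placeholders, not documented values.
config:
  probe_device: 123
  probe_action: poll
  probe_parameters:
    force: true
```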

                                      I've also found that the "Hub Information" app (or more correctly, the device that this app manages) makes a good probe target. It has some useful data, and also very reliably updates fields when commanded to do so, even at a high frequency, so the next build will have specific support for this app/device if it finds it among the hub's inventory.


                                      wmarcolinW 1 Reply Last reply
                                      0
                                      • toggledbitsT toggledbits


wmarcolinW Offline
                                        wmarcolin
                                        wrote on last edited by
                                        #37

                                        @toggledbits hi!

I have been operating for two days with version 23360, and the chaos described in the messages above is no longer happening. The recurring failures, where an action would be left without executing as a whole, are no longer observed.

                                        Thanks for all your efforts!

A question: in version 21353 we started seeing the message below, which you made possible after deactivating the alert.

                                        b2ee2739-e903-4f3a-875a-b1158c2fa53d-image.png

Is it possible to make some kind of query, or to trigger some action, when this message happens? I would like, for example, to send a Telegram message to notify me.

                                        Thanks.

                                        1 Reply Last reply
                                        0
• toggledbitsT Offline
                                          toggledbits
                                          wrote on last edited by toggledbits
                                          #38

                                          Yes, there are at least two ways to do that now. Any thoughts on where you might look?

                                          Edit: Same question as this thread: https://smarthome.community/topic/832/identifying-when-msr-cannot-connect-to-home-assistant

                                          And this thread is wandering, so maybe let's leave it alone.


                                          wmarcolinW 2 Replies Last reply
                                          0
                                          • toggledbitsT toggledbits unlocked this topic on