OpenLuup unavailable
-
rafale77replied to akbooer on Mar 25, 2021, 5:29 PM last edited by rafale77 Mar 25, 2021, 1:32 PM
True, it generates more TCP traffic to the machine and overall but... it may reduce traffic within the openLuup/socket instance since there could be traffic not destined/subscribed to openLuup but maybe I misunderstood something...
edit: I am still seeing this packets sending into a closed connection error appearing occasionally on 25b so the keepalive may not help in my case given how short the time is.
-
Indeed, the keepalive shouldn't make much difference since MQTT has a ping/response transaction to ostensibly keep the channel open anyway. It was worh a try. On my Synology Docker system, I'm not seeing any dropouts like that at all. Once again, closed connections should be handled correctly and aren't, in themselves, necessarily sinister. It's not even totally clear that this is the reason to hang the system. Perplexing.
-
@akbooer I think the KeepAlive is 15 seconds when I read up on Tasmota KeepAlive.
The connection from the devices to the server seem stable, I have seldom seen any dropouts in the device console anyway.Updated to v21.2.25b and restarted, we will se what happens!
Also enabledluup.attr_set ("openLuup.MQTT.PublishVariableUpdates", true)
, I had turned it off to bring down the load on messages. -
I looked through the log files for errors, the red highlighting is very good to highlight the errors.
This one is Homewave (192.168.1.127) on my phone. I noticed it was slow to connect. No crash though. I had a few more similar errors in the log a bit later.
2021-03-25 20:39:31.205 openLuup.io.server:: HTTP:3480 connection from 192.168.1.127 tcp{client}: 0x557d7a41acf8 2021-03-25 20:39:31.207 openLuup.server:: GET /data_request?output_format=json&id=lu_status HTTP/1.1 tcp{client}: 0x557d7a41acf8 2021-03-25 20:39:31.531 openLuup.server:: error 'socket.select() not ready to send tcp{client}: 0x557d7a41acf8' sending 2 bytes to tcp{client}: 0x557d7a41acf8 2021-03-25 20:39:31.732 openLuup.server:: error 'socket.select() not ready to send tcp{client}: 0x557d7a41acf8' sending 6 bytes to tcp{client}: 0x557d7a41acf8 2021-03-25 20:39:31.925 openLuup.server:: error 'closed' sending 2 bytes to tcp{client}: 0x557d7a41acf8 2021-03-25 20:39:31.925 openLuup.server:: ...only 0 bytes sent 2021-03-25 20:39:31.925 openLuup.server:: error 'closed' sending 6 bytes to tcp{client}: 0x557d7a41acf8 2021-03-25 20:39:31.925 openLuup.server:: ...only 0 bytes sent 2021-03-25 20:39:31.925 openLuup.server:: error 'closed' sending 857193 bytes to tcp{client}: 0x557d7a41acf8 2021-03-25 20:39:31.925 openLuup.server:: ...only 128000 bytes sent 2021-03-25 20:39:31.925 openLuup.server:: error 'closed' sending 2 bytes to tcp{client}: 0x557d7a41acf8 2021-03-25 20:39:31.925 openLuup.server:: ...only 0 bytes sent 2021-03-25 20:39:31.925 openLuup.server:: error 'closed' sending 5 bytes to tcp{client}: 0x557d7a41acf8
A few Whisper errors, could be related to a Luup reload at that I made at that time:
2021-03-25 19:28:41.905 openLuup.context_switch:: ERROR: [dev #0] ./openLuup/whisper.lua:101: bad argument #1 to 'format' (number expected, got string) 2021-03-25 19:28:41.905 openLuup.scheduler:: 55.urn:upnp-org:serviceId:TemperatureSensor1.CurrentTemperature ERROR ./openLuup/whisper.lua:101: bad argument #1 to 'format' (number expected, got string) function: 0x557d78862090 2021-03-25 19:28:42.325 openLuup.context_switch:: ERROR: [dev #0] ./openLuup/whisper.lua:101: bad argument #1 to 'format' (number expected, got string) 2021-03-25 19:28:42.325 openLuup.scheduler:: 246.urn:upnp-org:serviceId:TemperatureSensor1.CurrentTemperature ERROR ./openLuup/whisper.lua:101: bad argument #1 to 'format' (number expected, got string) function: 0x557d78862090 2021-03-25 19:28:42.326 openLuup.context_switch:: ERROR: [dev #0] ./openLuup/whisper.lua:101: bad argument #1 to 'format' (number expected, got string) 2021-03-25 19:28:42.326 openLuup.scheduler:: 247.urn:upnp-org:serviceId:TemperatureSensor1.CurrentTemperature ERROR ./openLuup/whisper.lua:101: bad argument #1 to 'format' (number expected, got string) function: 0x557d78862090 2021-03-25 19:28:42.363 openLuup.context_switch:: ERROR: [dev #0] ./openLuup/whisper.lua:101: bad argument #1 to 'format' (number expected, got string) 2021-03-25 19:28:42.363 openLuup.scheduler:: 11669.urn:upnp-org:serviceId:TemperatureSensor1.CurrentTemperature ERROR ./openLuup/whisper.lua:101: bad argument #1 to 'format' (number expected, got string) function: 0x557d78862090 2021-03-25 19:28:42.758 luup_log:3: ALTUI: startupDeferred, called on behalf of device:3
-
The Homewave one is interesting. Might just be a slow connection. I'll make the timeout for socket.select() larger (probably user-configurable.) So HomeWave works OK anyway?
Ah, very interesting. I know what the second one is. Brand new variables are (in Vera, hence openLuup too) given the default value of EMPTY. These messages come from the HIstorian (expecting a numeric value) which you have enabled?? I suspect that they will go away after the first update (so after the system has been running a while, so long as the actual device is working.)
-
The homewave errors are the same as my occasional home assistant errors. My homewave is nowhere slow and was the initial source of my openLuup hangs. I just retested it with a transition between cellular+VPN to wifi and while I used to be able to cause openLuup to crash nearly 100% of the time doing that, I am no longer able to do it. Last year, the big factor was my upgrade of luasocket to the current github master branch which although it reports as 3.0rc1 is actually scm-2. This made a night and day difference and made even homewave feel significantly more responsive. I have since also moved to running openLuup to luajit with rapidjson which also is another performance improvement. It is possible that I accidentally downgraded my luasocket while troubleshooting something else at some point in the past few months which lead to the 2 crashes a couple of days ago. I have since been unable to crash openLuup again. Just a data point... I am still seeing these errors in the log but again, no crash.
This is the error I am seeing when I am switching network connection on homewave while it is polling openLuup:
2021-03-25 13:59:40.533 openLuup.server:: error 'socket.select() not ready to send tcp{client}: 0x7f3a6d48d9d8' sending 2 bytes to tcp{client}: 0x7f3a6d48d9d8 2021-03-25 13:59:46.651 openLuup.server:: error 'socket.select() not ready to send tcp{client}: 0x7f3a6d6c5520' sending 2 bytes to tcp{client}: 0x7f3a6d6c5520 2021-03-25 13:59:47.192 openLuup.server:: error 'socket.select() not ready to send tcp{client}: 0x7f3a6d6c5520' sending 2 bytes to tcp{client}: 0x7f3a6d6c5520 2021-03-25 13:59:47.392 openLuup.server:: error 'socket.select() not ready to send tcp{client}: 0x7f3a6d6c5520' sending 6 bytes to tcp{client}: 0x7f3a6d6c5520
-
ArcherSreplied to akbooer on Mar 25, 2021, 9:30 PM last edited by ArcherS Mar 25, 2021, 5:38 PM
@akbooer yes Homewave works as it should. At times it is a bit slow to connect, usually quite ok.
I have enabled Historian and have connected it to Grafana in which I have some graphs.
@rafale77 I tried to switch between cellular&vpn and wifi with Homewave and it seems to work without crashing OpenLuup. I am using Qvpn from Qnap.
-
rafale77replied to ArcherS on Mar 27, 2021, 5:56 PM last edited by rafale77 Mar 27, 2021, 2:03 PM
Any update?
I just inadvertently stress tested openLuup's socket after all the updates this morning:
I went working in the yard cutting wood this morning which trips my ipcam motion sensor which then trips HomeAssistant object detection which then scans my camera stream using a deep learning model and when it detects a person sends an http call to openLuup to trip a security sensor on which I run a scene to do other things ranging from notifications to announcement depending on housemode etc... This happens at a rate of 5Hz (one call every 200ms)... You can see that the high frequency of http calls on top of the regular polling (I poll z-way at rate of 2Hz for example) made openLuup's CPU utilization jump! yet no crash...
Edit: And I checked the log to discover a lot more errors of sending bytes to a closed connection but as I said, it handled like a champ.
-
5% peak load? Shameful!
I’ve been stress testing too, and haven’t been able to make anything crash recently. I’m crafting a new Docker to take over from my main, and ageing, BeagleBone Black system, and the I/O there just whistles along with cpu at under 0.2%. I also see that RapidJson is available as a standard install of Alpine, so I may include that.
My conclusion, following yours, is that this is low-level LuaSocket issues on out of date libraries, which only show a problem when moderately stressed. I don’t think there’s anything more I can do at the openLuup coding level.
-
@akbooer said in OpenLuup unavailable:
5% peak load? Shameful!
I know! right? lol! As you know I optimize things to the edge... running on luajit, having turned JIT back on and rapidjson which I highly recommend over cjson because it enables not having to failover to dkjson if you so desire. I also regularly look at library updates which admittedly for lua, are sadly rare. I even tested a number of forks of luajit and found nothing worth the trouble at this point.
-
ArcherSreplied to rafale77 on Mar 27, 2021, 7:47 PM last edited by ArcherS Mar 28, 2021, 4:03 AM
@rafale77 @akbooer two days ago the system was quite slow and non-responsive. I then decided to disable the Mqtt server in OpenLuup to see what would happen. Now after two days no crash, so it seems as if this is somehow casing the crashes. Small progress perhaps.
Edit:
What I still have noticed is that the system has been quite slow on v21.3.25b and v2.3.24b with at times very long response times. Much worse that what it was 3-4 days ago for some reason, before updating from v2.3.21 to v2.3.24b.
On 2.3.21 the system was quite snappy, but after some time crashed without warning.
Now I updated to 21.3.26 and so far it seems much more responsive that on v2.3.24b / v21.3.25b.I thought I would updated the Lua sockets libraries, LuaFileSystem etc to the latest versions, but now I cannot find the openLuup > System > Required list list.
This is what I had alst time I checked a week ago:
Hints on what to update and how to do it much appreciated.
Edit: The update to 21.3.26 initially seems to have made the system more repsonsive for whatevere reason. Good news, let's see if it stays like that.
I see that I am on LuaFileSystem 1.6.3 that seems to be from 2015 and @rafale77 you are on 1.8.0 that is from 2020. This seems like something that could need updating. -
Just in case you would want to know how to update these libraries:
Need to install luarocks. This depends on what machine you are using but assuming you are on a debian base,sudo apt-get install luarocks
will do it. then you can start upgrading things:
-
@rafale77 said in OpenLuup unavailable:
Yeah @akbooer has moved that section under tables/luup files
Sorry! Yes, in fact they were duplicates (almost) and Files seemed a better place than System.
-
@archers said in OpenLuup unavailable:
Much worse that what it was 3-4 days ago for some reason. Then the system was quite snappy, but after some time. This was on v21.3.25b, now I updated to 21.3.26, I will see if ot continues to be slow.
OK, I have a possible answer to that. The system wait for up to a specified period of time if it fins that a socket is not ready to send. Initially, I used a timeout of zero (v21.3.24) bad choice, since most things timed out. Changed it to 0.2 seconds (v21.3.24b) , and later, then to 1.0 seconds (v21.3.26).
I will rever to something smaller, perhaps 0.1s. But you can change this (for the HTTP server) with:
luup.attr_set ("openLuup.HTTP.SelectWait", 0.1)
Again, apologies, but your excellent reporting of the effect of these changes is a very efficient way for me to zero in on issues, for which, thanks.
-
@akbooer thanks! I see that my text above got a bit incomplete.
This is how it should be ( I think, many updates lately
) :
What I still have noticed is that the system has been quite slow on v21.3.25b and v2.3.24b with at times very long response times. Much worse that what it was 3-4 days ago for some reason, before updating from v2.3.21 to v2.3.24b.
On 2.3.21 the system was quite snappy, but after some time crashed without warning.
Now I updated to 21.3.26 and so far it seems much more responsive that on v2.3.24b / v21.3.25b.For clarity I edited the post above also.
-
@rafale77 LuaFileSystem 1.8.0 Installed:
What I did was:
sudo apt update sudo apt install luarocks sudo luarocks install luafilesystem
I will keep it running without eabling Mqtt for a while and see if it seems stable.
-
While you are at it, I would recommend you upgrade luasec (ssl) as well.
63/88