OpenLuup unavailable
-
For the last three mornings OpenLuup has for some reason been unavailable; in other words, it has not responded when I try to connect via AltUI or the console.
I have been trying to investigate what the cause could be, but I have not yet pinned it down.
I run OpenLuup with one Zway bridge and two Vera bridges. The Zway and Veras have all been available every time, so that should not be the problem.
When the problems started I had updated OpenLuup with Mqtt support and added a few Mqtt devices. At the same time I also updated AltUI to the latest version and removed some devices from Room 101.
The Mqtt devices are normally reporting data as they should. In Lua Startup I have added the following entries related to Mqtt:
luup.attr_set ("openLuup.MQTT.Port", 1883)
luup.attr_set ("openLuup.MQTT.Username", "luup")
luup.attr_set ("openLuup.MQTT.Password", "openluup")
luup.attr_set ("openLuup.MQTT.PublishDeviceStatus", "0")

-- ================================== Mqtt functions below:

local json = require "openLuup.json"

function MyMQTThandler (topic, message)
  local info = json.decode (message)
  local t = info.AM2301.Temperature
  local h = info.AM2301.Humidity
  luup.variable_set ("urn:upnp-org:serviceId:TemperatureSensor1", "CurrentTemperature", t, 216)
  luup.variable_set ("urn:micasaverde-com:serviceId:HumiditySensor1", "CurrentLevel", h, 228)
end

luup.register_handler ("MyMQTThandler", "mqtt:tele/tasmota_test/SENSOR")

function TasmotaCO2Axel (topic, message)
  local info = json.decode (message)
  local t = info.BME280.Temperature
  local h = info.BME280.Humidity
  local p = info.BME280.Pressure
  local c = info.MHZ19B.CarbonDioxide
  luup.variable_set ("urn:upnp-org:serviceId:TemperatureSensor1", "CurrentTemperature", t, 229)    -- set temp to device
  luup.variable_set ("urn:micasaverde-com:serviceId:HumiditySensor1", "CurrentLevel", h, 230)      -- set humidity to device
  luup.variable_set ("urn:micasaverde-com:serviceId:GenericSensor1", "CurrentLevel", p, 232)       -- set air pressure to device
  luup.variable_set ("urn:micasaverde-com:serviceId:GenericSensor1", "CurrentLevel", c, 231)       -- set CO2 to device
end

luup.register_handler ("TasmotaCO2Axel", "mqtt:tele/TasmotaCO2Ax/SENSOR")

function TasmotaCO2Anton (topic, message)
  local info = json.decode (message)
  local t = info.BME280.Temperature
  local h = info.BME280.Humidity
  local p = info.BME280.Pressure
  local c = info.MHZ19B.CarbonDioxide
  luup.variable_set ("urn:upnp-org:serviceId:TemperatureSensor1", "CurrentTemperature", t, 233)    -- set temp to device
  luup.variable_set ("urn:micasaverde-com:serviceId:HumiditySensor1", "CurrentLevel", h, 234)      -- set humidity to device
  luup.variable_set ("urn:micasaverde-com:serviceId:GenericSensor1", "CurrentLevel", p, 235)       -- set air pressure to device
  luup.variable_set ("urn:micasaverde-com:serviceId:GenericSensor1", "CurrentLevel", c, 236)       -- set CO2 to device
end

luup.register_handler ("TasmotaCO2Anton", "mqtt:tele/TasmotaCO2An/SENSOR")

-- =======================================================================
This morning it seems as if OpenLuup stopped working at 06:24. I kept MQTT Explorer up and running overnight, and the data reporting had stopped at that time. (The image shows the last reported data, and that reporting commences again after I restarted OpenLuup at 07:19.)
Before restarting I grabbed all the log files; the last few entries in the log are:
2021-03-18 06:24:09.712 luup.variable_set:: 149.urn:upnp-org:serviceId:TemperatureSensor1.CurrentTemperature was: -10.6 now: -10.5 #hooks:0
2021-03-18 06:24:09.712 openLuup.server:: request completed (2 bytes, 1 chunks, 4 ms) tcp{client}: 0x55ca8d503438
2021-03-18 06:24:09.721 luup.variable_set:: 229.urn:upnp-org:serviceId:TemperatureSensor1.CurrentTemperature was: 20.4 now: 20.4 #hooks:0
2021-03-18 06:24:09.721 luup.variable_set:: 230.urn:micasaverde-com:serviceId:HumiditySensor1.CurrentLevel was: 46.7 now: 46.7 #hooks:0
2021-03-18 06:24:09.721 luup.variable_set:: 232.urn:micasaverde-com:serviceId:GenericSensor1.CurrentLevel was: 1016.5 now: 1016.5 #hooks:0
2021-03-18 06:24:09.721 luup.variable_set:: 231.urn:micasaverde-com:serviceId:GenericSensor1.CurrentLevel was: 2010 now: 2020 #hooks:0
2021-03-18 06:24:09.754 openLuup.io.server:: HTTP:3480 connection closed openLuup.server.receive closed tcp{client}: 0x55ca8cb5cf68
2021-03-18 06:24:09.754 openLuup.io.server:: HTTP:3480 connection closed openLuup.server.receive closed tcp{client}: 0x55ca8c855e68
2021-03-18 06:24:09.754 openLuup.io.server:: HTTP:3480 connection closed openLuup.server.receive closed tcp{client}: 0x55ca8d2daa08
2021-03-18 06:24:09.754 openLuup.io.server:: HTTP:3480 connection closed openLuup.server.receive closed tcp{client}: 0x55ca8c7c3218
2021-03-18 06:24:09.754 openLuup.io.server:: HTTP:3480 connection closed openLuup.server.receive closed tcp{client}: 0x55ca8ca87948
2021-03-18 06:24:09.754 openLuup.io.server:: HTTP:3480 connection closed openLuup.server.receive closed tcp{client}: 0x55ca8d312d98
2021-03-18 06:24:09.754 openLuup.io.server:: HTTP:3480 connection closed openLuup.server.receive closed tcp{client}: 0x55ca8c7e95f8
2021-03-18 06:24:09.754 openLuup.io.server:: HTTP:3480 connection closed openLuup.server.receive closed tcp{client}: 0x55ca8c74a218
2021-03-18 06:24:09.754 openLuup.io.server:: HTTP:3480 connection closed openLuup.server.receive closed tcp{client}: 0x55ca8bfefea8
2021-03-18 06:24:09.755 openLuup.io.server:: HTTP:3480 connection closed openLuup.server.receive closed tcp{client}: 0x55ca8c6ebce8
2021-03-18 06:24:09.756 openLuup.io.server:: MQTT:1883 connection closed tcp{client}: 0x55ca8c4dfe48
2021-03-18 06:24:09.756 openLuup.mqtt:: closed tcp{client}: 0x55ca8c4dfe48
2021-03-18 06:24:09.756 openLuup.mqtt:: TasmotaCO2Ax UNSUBSCRIBE from cmnd/TasmotaCO2Ax/# tcp{client}: 0x55ca8c4dfe48
2021-03-18 06:24:09.756 openLuup.mqtt:: TasmotaCO2Ax UNSUBSCRIBE from cmnd/tasmotas/# tcp{client}: 0x55ca8c4dfe48
2021-03-18 06:24:09.756 openLuup.mqtt:: TasmotaCO2Ax UNSUBSCRIBE from cmnd/TasmotaCO2Ax_fb/# tcp{client}: 0x55ca8c4dfe48
2021-03-18 06:24:09.756 openLuup.mqtt:: ERROR publishing application message for mqtt:cmnd/TasmotaCO2Ax/POWER : closed
The last rows are a bit strange to me; they seem to indicate that something goes wrong with the Mqtt for that device?
@akbooer I assume there could be more of interest in the logs; I can of course email them if needed. Also let me know if any more information is missing.
-
Thanks for reporting that.
I see that you have two essentially identical handlers. This is not a problem, but I would choose to move that processing to a helper function, indexing the device number list with the received topic.
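Something along these lines, for example (just an untested sketch; I'm assuming here that the topic argument arrives without the "mqtt:" prefix, and the device numbers and service IDs are the ones from your startup code above):

local json = require "openLuup.json"

-- device numbers per topic, taken from the two existing handlers
local sensors = {
  ["tele/TasmotaCO2Ax/SENSOR"] = {temp = 229, humid = 230, pressure = 232, co2 = 231},
  ["tele/TasmotaCO2An/SENSOR"] = {temp = 233, humid = 234, pressure = 235, co2 = 236},
}

function TasmotaCO2handler (topic, message)
  local d = sensors[topic]                  -- look up the target devices for this topic
  local info = json.decode (message)
  if not (d and info) then return end
  luup.variable_set ("urn:upnp-org:serviceId:TemperatureSensor1",     "CurrentTemperature", info.BME280.Temperature,   d.temp)
  luup.variable_set ("urn:micasaverde-com:serviceId:HumiditySensor1", "CurrentLevel",       info.BME280.Humidity,      d.humid)
  luup.variable_set ("urn:micasaverde-com:serviceId:GenericSensor1",  "CurrentLevel",       info.BME280.Pressure,      d.pressure)
  luup.variable_set ("urn:micasaverde-com:serviceId:GenericSensor1",  "CurrentLevel",       info.MHZ19B.CarbonDioxide, d.co2)
end

luup.register_handler ("TasmotaCO2handler", "mqtt:tele/TasmotaCO2Ax/SENSOR")
luup.register_handler ("TasmotaCO2handler", "mqtt:tele/TasmotaCO2An/SENSOR")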
You say that you’ve seen this on several consecutive days. Does it always happen at the same time?
To interpret the log:
- the sending of a published message to TasmotaCO2Ax has failed, because it found that the socket was closed.
- this might indicate an issue with that device, or a timeout.
- as a result, the server has cancelled the subscriptions from that device.
The server is designed to cope with this situation, and I’ve tested clients going up and down all the time, without a problem. Nonetheless, the only mechanism that I know of which can totally freeze a system is an incomplete socket transaction due to network issues. AFAIK, there haven’t been any recent changes to other socket handling code (HTTP server, and the like.)
-
@akbooer I grabbed the logs from yesterday also and it is not exactly the same time, it was 04:22 if I read the logs correctly.
I do not have the logs from the day before, but it was OK in the evening and frozen in the morning at approx. 07:00 when I checked. This time it seems it was two devices (TasmotaCO2Ax and TasmotaCO2An) that got an error.
2021-03-17 04:22:26.339 luup.variable_set:: 236.urn:micasaverde-com:serviceId:GenericSensor1.CurrentLevel was: 587 now: 593 #hooks:0
2021-03-17 04:22:26.340 openLuup.io.server:: HTTP:3480 connection closed openLuup.server.receive closed tcp{client}: 0x5580d3e8cfd8
2021-03-17 04:22:26.340 openLuup.io.server:: HTTP:3480 connection closed openLuup.server.receive closed tcp{client}: 0x5580d43678a8
2021-03-17 04:22:26.340 openLuup.io.server:: HTTP:3480 connection closed openLuup.server.receive closed tcp{client}: 0x5580d3e375d8
2021-03-17 04:22:26.340 openLuup.io.server:: HTTP:3480 connection closed openLuup.server.receive closed tcp{client}: 0x5580d3cf1848
2021-03-17 04:22:26.340 openLuup.io.server:: HTTP:3480 connection closed openLuup.server.receive closed tcp{client}: 0x5580d3dc5a88
2021-03-17 04:22:26.340 openLuup.io.server:: HTTP:3480 connection closed openLuup.server.receive closed tcp{client}: 0x5580d46e2698
2021-03-17 04:22:26.340 openLuup.io.server:: HTTP:3480 connection closed openLuup.server.receive closed tcp{client}: 0x5580d40adfc8
2021-03-17 04:22:26.340 openLuup.io.server:: HTTP:3480 connection closed openLuup.server.receive closed tcp{client}: 0x5580d45a96a8
2021-03-17 04:22:26.340 openLuup.io.server:: HTTP:3480 connection closed openLuup.server.receive closed tcp{client}: 0x5580d3a55748
2021-03-17 04:22:26.340 openLuup.io.server:: HTTP:3480 connection closed openLuup.server.receive closed tcp{client}: 0x5580d3192208
2021-03-17 04:22:26.340 openLuup.io.server:: HTTP:3480 connection closed openLuup.server.receive closed tcp{client}: 0x5580d3f40118
2021-03-17 04:22:26.340 openLuup.io.server:: HTTP:3480 connection closed openLuup.server.receive closed tcp{client}: 0x5580d3b455a8
2021-03-17 04:22:26.340 openLuup.io.server:: HTTP:3480 connection closed openLuup.server.receive closed tcp{client}: 0x5580d3ea7548
2021-03-17 04:22:26.340 openLuup.io.server:: HTTP:3480 connection closed openLuup.server.receive closed tcp{client}: 0x5580d47471d8
2021-03-17 04:22:26.341 openLuup.io.server:: HTTP:3480 connection closed openLuup.server.receive closed tcp{client}: 0x5580d3b92588
2021-03-17 04:22:26.341 openLuup.io.server:: HTTP:3480 connection closed openLuup.server.receive closed tcp{client}: 0x5580d37022a8
2021-03-17 04:22:26.341 openLuup.io.server:: HTTP:3480 connection closed openLuup.server.receive closed tcp{client}: 0x5580d37bc3e8
2021-03-17 04:22:26.341 openLuup.io.server:: HTTP:3480 connection closed openLuup.server.receive closed tcp{client}: 0x5580d43b3068
2021-03-17 04:22:26.341 openLuup.io.server:: HTTP:3480 connection closed openLuup.server.receive closed tcp{client}: 0x5580d4d6c0f8
2021-03-17 04:22:26.341 openLuup.io.server:: HTTP:3480 connection closed openLuup.server.receive closed tcp{client}: 0x5580d40f7b68
2021-03-17 04:22:26.341 openLuup.io.server:: HTTP:3480 connection closed openLuup.server.receive closed tcp{client}: 0x5580d47249a8
2021-03-17 04:22:26.341 openLuup.io.server:: MQTT:1883 connection closed tcp{client}: 0x5580d3abfcb8
2021-03-17 04:22:26.341 openLuup.mqtt:: RECEIVE ERROR: closed tcp{client}: 0x5580d3abfcb8
2021-03-17 04:22:26.341 openLuup.io.server:: MQTT:1883 connection closed tcp{client}: 0x5580d4268eb8
2021-03-17 04:22:26.341 openLuup.mqtt:: RECEIVE ERROR: closed tcp{client}: 0x5580d4268eb8
2021-03-17 04:22:26.341 openLuup.io.server:: MQTT:1883 connection closed tcp{client}: 0x5580d47d0f18
2021-03-17 04:22:26.341 openLuup.mqtt:: RECEIVE ERROR: closed tcp{client}: 0x5580d47d0f18
2021-03-17 04:22:26.341 openLuup.server:: GET /data_request?id=lu_action&DeviceNum=175&id=variableset&serviceId=urn:micasaverde-com:serviceId:HumiditySensor1&Variable=CurrentLevel&Value=44.1 HTTP/1.1 tcp{client}: 0x5580d3ab86f8
2021-03-17 04:22:26.342 luup.variable_set:: 175.urn:micasaverde-com:serviceId:HumiditySensor1.CurrentLevel was: 44.1 now: 44.1 #hooks:0
2021-03-17 04:22:26.342 openLuup.server:: request completed (2 bytes, 1 chunks, 0 ms) tcp{client}: 0x5580d3ab86f8
2021-03-17 04:22:26.344 openLuup.io.server:: MQTT:1883 connection closed tcp{client}: 0x5580d3718798
2021-03-17 04:22:26.344 openLuup.mqtt:: closed tcp{client}: 0x5580d3718798
2021-03-17 04:22:26.344 openLuup.mqtt:: TasmotaCO2Ax UNSUBSCRIBE from cmnd/TasmotaCO2Ax/# tcp{client}: 0x5580d3718798
2021-03-17 04:22:26.344 openLuup.mqtt:: TasmotaCO2Ax UNSUBSCRIBE from cmnd/tasmotas/# tcp{client}: 0x5580d3718798
2021-03-17 04:22:26.344 openLuup.mqtt:: TasmotaCO2Ax UNSUBSCRIBE from cmnd/TasmotaCO2Ax_fb/# tcp{client}: 0x5580d3718798
2021-03-17 04:22:26.344 openLuup.mqtt:: ERROR publishing application message for mqtt:cmnd/TasmotaCO2Ax/POWER : closed
2021-03-17 04:22:26.344 openLuup.io.server:: MQTT:1883 connection closed tcp{client}: 0x5580d3faa808
2021-03-17 04:22:26.344 openLuup.mqtt:: closed tcp{client}: 0x5580d3faa808
2021-03-17 04:22:26.344 openLuup.mqtt:: TasmotaCO2An UNSUBSCRIBE from cmnd/TasmotaCO2An_fb/# tcp{client}: 0x5580d3faa808
2021-03-17 04:22:26.344 openLuup.mqtt:: TasmotaCO2An UNSUBSCRIBE from cmnd/TasmotaCO2An/# tcp{client}: 0x5580d3faa808
2021-03-17 04:22:26.344 openLuup.mqtt:: TasmotaCO2An UNSUBSCRIBE from cmnd/tasmotas/# tcp{client}: 0x5580d3faa808
2021-03-17 04:22:26.344 openLuup.mqtt:: ERROR publishing application message for mqtt:cmnd/TasmotaCO2An/POWER : closed
-
Ooh, that’s interesting. That’s an awful lot of HTTP sockets being closed. I assume you don’t actually have many active HTTP connections at night. I think your system might be running out of file descriptors (which are used by the socket library). This gives me somewhere to look...
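If you want to check that, something like this run as Test Lua code should show the current count (a rough sketch, assuming a Linux host with /proc; it relies on $PPID in the spawned shell resolving to the openLuup process that launched it):

-- rough count of the file descriptors held by the openLuup process (Linux only)
local p = io.popen "ls /proc/$PPID/fd | wc -l"
local n = tonumber (p:read "*a")
p:close ()
print ("open file descriptors: " .. (n or "unknown"))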
-
@akbooer I have some 10 external devices reporting temperature etc to some 35 devices in OpenLuup. The sockets in the log are for the updates of these devices.
Currently they are doing this via HTTP; the intention is to move most of these to Mqtt, which should hopefully offload the HTTP sockets. Now that you mention it, when I tested the Mqtt bridge from @therealdb I also had some problems with HTTP endpoint crashes (see the posts around January 30th in this thread).
So looking at the sockets sounds like a good idea. I will do some more tests to try and get some more info as well.
-
Also interesting!
You're able to play with the backlog queue length in the HTTP (and any other) server too. The default is 2000; I don't know why I chose that, and it is perhaps too big. But you can use this startup configuration line to change it:
luup.attr_set ("openLuup.HTTP.Backlog", 2000) -- queue length
see: http://w3.impa.br/~diego/software/luasocket/tcp.html#listen
Might be interesting to see if this makes any difference (e.g. make it a lot smaller).
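For instance (the value here is purely illustrative, not a recommendation):
luup.attr_set ("openLuup.HTTP.Backlog", 100) -- try a much smaller queue length as an experiment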
Your help is much appreciated, since this is something that I can't reproduce at random on my own systems.
-
The latest development v21.3.18 has added a little log analysis tool to the log console pages. It shows the maximum gap between log entries and the time at which that occurred. In normal operation this should be less than 120 seconds (since openLuup writes CPU and memory statistics at that rate).
Should make it easier to spot any large breaks in the logs (obviously not if the entire system is frozen!)
-
I had the same crash this night. This time it became unavailable at 03:51.
Last night I reinstalled the two Tasmota sensors I added before the crash so that they have the same Tasmota release as the older sensor that I had for the initial tests when it did not crash. This made no difference, so I can rule that one out.
When looking in the log file from before the crash it looks very similar:
2021-03-19 03:51:48.344 openLuup.io.server:: HTTP:3480 connection closed openLuup.server.receive closed tcp{client}: 0x55b6338f7778
2021-03-19 03:51:48.344 openLuup.io.server:: HTTP:3480 connection closed openLuup.server.receive closed tcp{client}: 0x55b6348fa9a8
2021-03-19 03:51:48.344 openLuup.io.server:: HTTP:3480 connection closed openLuup.server.receive closed tcp{client}: 0x55b63427b0d8
2021-03-19 03:51:48.344 openLuup.io.server:: HTTP:3480 connection closed openLuup.server.receive closed tcp{client}: 0x55b634b6da68
2021-03-19 03:51:48.344 openLuup.io.server:: HTTP:3480 connection closed openLuup.server.receive closed tcp{client}: 0x55b632a24e88
2021-03-19 03:51:48.344 openLuup.io.server:: HTTP:3480 connection closed openLuup.server.receive closed tcp{client}: 0x55b63443ef78
2021-03-19 03:51:48.348 openLuup.io.server:: MQTT:1883 connection closed tcp{client}: 0x55b633f49218
2021-03-19 03:51:48.348 openLuup.mqtt:: closed tcp{client}: 0x55b633f49218
2021-03-19 03:51:48.348 openLuup.mqtt:: tasmota_test UNSUBSCRIBE from cmnd/tasmota_test_fb/# tcp{client}: 0x55b633f49218
2021-03-19 03:51:48.348 openLuup.mqtt:: tasmota_test UNSUBSCRIBE from cmnd/tasmota_test/# tcp{client}: 0x55b633f49218
2021-03-19 03:51:48.348 openLuup.mqtt:: tasmota_test UNSUBSCRIBE from cmnd/tasmotas/# tcp{client}: 0x55b633f49218
2021-03-19 03:51:48.348 openLuup.mqtt:: ERROR publishing application message for mqtt:cmnd/tasmota_test/POWER : closed
At 08:38 it crashed again; this is the first time it has crashed again so soon:
2021-03-19 08:38:41.016 openLuup.server:: GET /data_request?id=lu_action&DeviceNum=192&id=variableset&serviceId=urn:micasaverde-com:serviceId:HumiditySensor1&Variable=CurrentLevel&Value=39.4 HTTP/1.1 tcp{client}: 0x55beb3bcdbd8
2021-03-19 08:38:41.016 openLuup.server:: GET /data_request?id=lu_action&DeviceNum=164&id=variableset&serviceId=urn:upnp-org:serviceId:TemperatureSensor1&Variable=CurrentTemperature&Value=-2.1 HTTP/1.1 tcp{client}: 0x55beb3da8618
2021-03-19 08:38:41.016 luup.variable_set:: 192.urn:micasaverde-com:serviceId:HumiditySensor1.CurrentLevel was: 38.6 now: 39.4 #hooks:0
2021-03-19 08:38:41.016 openLuup.server:: request completed (2 bytes, 1 chunks, 0 ms) tcp{client}: 0x55beb3bcdbd8
2021-03-19 08:38:41.016 luup.variable_set:: 164.urn:upnp-org:serviceId:TemperatureSensor1.CurrentTemperature was: -2.1 now: -2.1 #hooks:0
2021-03-19 08:38:41.017 openLuup.server:: request completed (2 bytes, 1 chunks, 0 ms) tcp{client}: 0x55beb3da8618
2021-03-19 08:38:41.022 openLuup.io.server:: MQTT:1883 connection closed tcp{client}: 0x55beb3c83ca8
2021-03-19 08:38:41.022 openLuup.mqtt:: closed tcp{client}: 0x55beb3c83ca8
2021-03-19 08:38:41.022 openLuup.mqtt:: TasmotaCO2An UNSUBSCRIBE from cmnd/TasmotaCO2An_fb/# tcp{client}: 0x55beb3c83ca8
2021-03-19 08:38:41.022 openLuup.mqtt:: TasmotaCO2An UNSUBSCRIBE from cmnd/tasmotas/# tcp{client}: 0x55beb3c83ca8
2021-03-19 08:38:41.022 openLuup.mqtt:: TasmotaCO2An UNSUBSCRIBE from cmnd/TasmotaCO2An/# tcp{client}: 0x55beb3c83ca8
2021-03-19 08:38:41.022 openLuup.mqtt:: ERROR publishing application message for mqtt:cmnd/TasmotaCO2An/POWER : closed
I have also checked the log analysis tool and could see one device that was often very slow. This is temperature data from a Shelly Uni brought in with SiteSensor. I noticed the Shelly also being very slow when accessing it via its webpage.
I could see the following in the log file just before the first crash:
# lines: 1214, maximum gap: 42.646s @ 2021-03-19 03:51:46.785
2021-03-19 03:51:04.137 luup.variable_set:: 206.urn:toggledbits-com:serviceId:SiteSensor1.LastRun was: 1616121064 now: 1616122264 #hooks:0
2021-03-19 03:51:04.137 luup.variable_set:: 206.urn:toggledbits-com:serviceId:SiteSensor1.LastQuery was: 1616121064 now: 1616122264 #hooks:0
2021-03-19 03:51:04.137 luup.variable_set:: 206.urn:toggledbits-com:serviceId:SiteSensor1.LastEval was: 1616121064 now: 1616122264 #hooks:0
2021-03-19 03:51:04.137 luup.variable_set:: 206.urn:toggledbits-com:serviceId:SiteSensor1.Message was: Last query OK now: Requesting JSON... #hooks:0
2021-03-19 03:51:04.137 luup_log:206: SiteSensor: Requesting JSON data
2021-03-19 03:51:04.137 luup.variable_set:: 206.urn:toggledbits-com:serviceId:SiteSensor1.LogCapture was: 03:31:04: SiteSensor: Requesting JSON data|03:31:04: SiteSensor: "GET" "http://192.168.1.31/status", headers={ }|0... now: 03:51:04: SiteSensor: Requesting JSON data #hooks:0
2021-03-19 03:51:04.138 luup_log:206: SiteSensor: "GET" "http://192.168.1.31/status", headers={ }
2021-03-19 03:51:04.138 luup.variable_set:: 206.urn:toggledbits-com:serviceId:SiteSensor1.LogCapture was: 03:51:04: SiteSensor: Requesting JSON data now: 03:51:04: SiteSensor: Requesting JSON data|03:51:04: SiteSensor: "GET" "http://192.168.1.31/status", headers={ } #hooks:0
2021-03-19 03:51:46.785 luup_log:206: SiteSensor: Response is 908 bytes in "/tmp/Q_SiteSensor_206.txt"
2021-03-19 03:51:46.785 luup.variable_set:: 206.urn:toggledbits-com:serviceId:SiteSensor1.LogCapture was: 03:51:04: SiteSensor: Requesting JSON data|03:51:04: SiteSensor: "GET" "http://192.168.1.31/status", headers={ } now: 03:51:04: SiteSensor: Requesting JSON data|03:51:04: SiteSensor: "GET" "http://192.168.1.31/status", headers={ }|0... #hooks:0
2021-03-19 03:51:46.786 luup.variable_set:: 206.urn:toggledbits-com:serviceId:SiteSensor1.Message was: Requesting JSON... now: Parsing response... #hooks:0
2021-03-19 03:51:46.787 luup.set_failure:: status = 0
2021-03-19 03:51:46.787 luup.variable_set:: 206.urn:micasaverde-com:serviceId:HaDevice1.CommFailure was: 0 now: 0 #hooks:0
2021-03-19 03:51:46.787 luup.variable_set:: 206.urn:micasaverde-com:serviceId:HaDevice1.CommFailureTime was: 0 now: 0 #hooks:0
2021-03-19 03:51:46.787 luup.variable_set:: 206.urn:toggledbits-com:serviceId:SiteSensor1.Message was: Parsing response... now: Retrieving last response... #hooks:0
2021-03-19 03:51:46.788 luup_log:206: SiteSensor: Eval #1: "response.ext_temperature[\"0\"].tC"=(number)24.44
2021-03-19 03:51:46.788 luup.variable_set:: 206.urn:toggledbits-com:serviceId:SiteSensor1.LogCapture was: 03:51:04: SiteSensor: Requesting JSON data|03:51:04: SiteSensor: "GET" "http://192.168.1.31/status", headers={ }|0... now: 03:51:04: SiteSensor: Requesting JSON data|03:51:04: SiteSensor: "GET" "http://192.168.1.31/status", headers={ }|0... #hooks:0
It was again very slow just before the second crash:
# lines: 1598, maximum gap: 46.871s @ 2021-03-19 08:38:34.554
2021-03-19 08:37:47.682 luup_log:206: SiteSensor: "GET" "http://192.168.1.31/status", headers={ }
2021-03-19 08:37:47.682 luup.variable_set:: 206.urn:toggledbits-com:serviceId:SiteSensor1.LogCapture was: 08:37:47: SiteSensor: Requesting JSON data now: 08:37:47: SiteSensor: Requesting JSON data|08:37:47: SiteSensor: "GET" "http://192.168.1.31/status", headers={ } #hooks:0
2021-03-19 08:38:34.554 luup_log:206: SiteSensor: Request returned no data, status 500
2021-03-19 08:38:34.554 luup.variable_set:: 206.urn:toggledbits-com:serviceId:SiteSensor1.LogCapture was: 08:37:47: SiteSensor: Requesting JSON data|08:37:47: SiteSensor: "GET" "http://192.168.1.31/status", headers={ } now: 08:37:47: SiteSensor: Requesting JSON data|08:37:47: SiteSensor: "GET" "http://192.168.1.31/status", headers={ }|0... #hooks:0
2021-03-19 08:38:34.554 luup.variable_set:: 206.urn:toggledbits-com:serviceId:SiteSensor1.Failed was: 0 now: 1 #hooks:0
2021-03-19 08:38:34.554 luup.variable_set:: 206.urn:toggledbits-com:serviceId:SiteSensor1.FailedSince was: now: 1616139514 #hooks:0
2021-03-19 08:38:34.554 luup.variable_set:: 206.urn:toggledbits-com:serviceId:SiteSensor1.LastFail was: 1616052193 now: 1616139514 #hooks:0
2021-03-19 08:38:34.555 luup.set_failure:: status = 1
2021-03-19 08:38:34.555 luup.variable_set:: 206.urn:micasaverde-com:serviceId:HaDevice1.CommFailure was: 0 now: 1 #hooks:0
2021-03-19 08:38:34.555 luup.variable_set:: 206.urn:micasaverde-com:serviceId:HaDevice1.CommFailureTime was: 0 now: 1616139514 #hooks:0
2021-03-19 08:38:34.555 luup.variable_set:: 206.urn:micasaverde-com:serviceId:SecuritySensor1.Tripped was: 0 now: 1 #hooks:0
2021-03-19 08:38:34.555 luup.variable_set:: 206.urn:micasaverde-com:serviceId:SecuritySensor1.LastTrip was: 1616052265 now: 1616139514 #hooks:0
2021-03-19 08:38:34.555 luup.variable_set:: 206.urn:micasaverde-com:serviceId:SecuritySensor1.ArmedTripped was: 0 now: 1 #hooks:0
2021-03-19 08:38:34.555 luup.variable_set:: 206.urn:toggledbits-com:serviceId:SiteSensor1.Message was: Requesting JSON... now: Retrieving last response... #hooks:0
2021-03-19 08:38:34.555 luup.variable_set:: 206.urn:toggledbits-com:serviceId:SiteSensor1.Message was: Retrieving last response... now: Last query failed, HTTP status 500 #hooks:0
2021-03-19 08:38:34.555 luup.set_failure:: status = 1
2021-03-19 08:38:34.555 luup.variable_set:: 206.urn:micasaverde-com:serviceId:HaDevice1.CommFailure was: 1 now: 1 #hooks:0
After the second crash I deactivated the SiteSensor fetching this data from the Shelly. At least I have not seen such long delays since then.
I have not yet changed the server backlog.
I will continue to do some testing, but I am not quite sure what I am chasing at the moment.
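If I end up keeping the HTTP polling of the Shelly for a while, I could perhaps also limit how long a single request is allowed to take. A rough sketch with plain luasocket (just to illustrate the idea, not how SiteSensor does it internally; the URL is the one from the log above):

local http = require "socket.http"
http.TIMEOUT = 5   -- seconds; luasocket's default is 60

-- poll the Shelly status page, giving up after the timeout
local body, status = http.request "http://192.168.1.31/status"
if body then
  -- decode and use the JSON response here
else
  print ("Shelly poll failed or timed out: " .. tostring (status))
end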
-
OK, well thanks again for all that. I will take a while to consider this. However, my first suggestion would anyway have been to disable SiteSensor, if at all possible, and see how it goes.
Once again, there is nothing amiss with a break in the log of up to 120s, and, indeed, in my latest edit (internal, as yet) I have disabled the gap warning message unless it exceeds 120 seconds.
-
@akbooer yes, that was my thought also: disable it and see how it goes.
Btw I typically have delays of 2-4 seconds. At times a bit longer, those often being SiteSensors or Netmon pinging PCs that are offline.
That Shelly was the one device that was often really slow. In fact it is often slow when accessing it via the browser as well. The strange thing is that all the devices have been up and running on my system for a really long time, but maybe adding the Mqtt on top was the last straw. Strange anyway.
-
You mention a slow Shelly. This should not be a problem, but, again, worth just taking it out of the loop sometime? Why is it 'slow'?
-
@akbooer I do not fully know. It is a Shelly Uni with two relays and one temp sensor.
I have read so-so reports on the Shelly Uni in fact.
It is situated close to the wifi access point, but I moved it a bit just now and it got better wifi quality numbers.
It is sitting in the midst of the electrical central, could be one reason perhaps.
The idea is to move it to Mqtt for the temp reporting and then it should work better. I turned off that SiteSensor for now to test if it helps anything.
-
That's great, thanks. Just to let you know that I am taking this very seriously and I now have a fairly continuous stream of UDP updates going from my VeraLite to three different openLuup systems (Rpi, BeagleBone Black, Synology Docker) via the UDP -> MQTT bridge, with MQTT Explorer running on an iMac subscribed to one or other of those. From time to time the iMac sleeps, so the Explorer disconnects and then later reconnects. All openLuup instances are running the latest v21.3.19 looking for large gaps in the logs (or, indeed, a system freeze.)
So I am stress-testing this as far as I can without having your exact configuration.
-
@akbooer I'm running on an old Fujitsu Esprimo i3 SFF PC (4 cores at 2.27 GHz) with 4 GB RAM, a 120 GB SSD, and Ubuntu 18.04.5.
My Z-way server is on a separate Raspberry Pi 3B+ with a Razberry board. Then one old UI5 Vera Lite and a UI7 Vera Plus, both almost completely offloaded, just a few virtual devices left on them.
-
This morning no crash. Not sure which of the changes I made yesterday made the difference, but it is still good progress and something to continue working from.
This is what I did yesterday:
- Updated to OpenLuup 21.3.19 from 21.3.18
- Removed the line
  luup.attr_set ("openLuup.MQTT.PublishDeviceStatus", "0")
  from Lua Startup. I tested reloading without the line and it seems it is no longer needed
- Deactivated a few SiteSensors: the slow Shelly Uni, one fetching weather data, and three more fetching some local data
- Deleted two devices from Netmon that no longer exist, i.e. never respond to ping
- Cut the power to my three test Mqtt devices, i.e. no active Mqtt devices reporting anything over Mqtt all night
-
Progress indeed. A slow build back should isolate the culprit.
For my part, I’ve spotted a difference in behaviour between my RPi and Docker instances when there’s a network outage, resulting in an error message "no route to host" on the RPi which blocks the system for longer than 2 minutes. Needs further investigation.
-
The latest development version v21.3.20 has a new console page openLuup > System > Required which shows key system modules, and also those required by each plugin.
I am seeing some small version differences between my systems, but I don't think this is the issue. I would, nevertheless, be interested in your version of this page:
Incidentally, this system (running in a Docker container on my Synology NAS) ran overnight with over 100,000 MQTT messages sent to MQTT Explorer, totalling over 18 MB of data, without any issues at all.