(Last Updated: July 1, 2020)

Profiling Lua Code



  • @a-lurker said in openLuup: Suggestion:

    is it possible to somehow profile openLuup [code]

    Due to the amazingly parsimonious and elegant design of Lua, it's possible to write a profiler for any Lua code (openLuup, or otherwise) in just a few tens of lines.

    Here's a profiler which counts function calls (but doesn't time them.) You use it like this:

    local profile = require "miniprofiler"
    profile "on"     -- start profiling
    
    -- run whatever code you want here
    
    profile "off"      -- turn off profiling
    print (profile)    -- print the result
    
    profile "clear"      -- clear the results to start again
    

    Note that the same object is used to configure the profile, hold the data, and format the output. I just love Lua.
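
    The trick is a metatable carrying both __call and __tostring on the same table. In miniature (just a sketch of the pattern, separate from the module below):

    -- one table, made callable and printable via its metatable
    local t = setmetatable ({}, {
        __call     = function (self, cmd) print ("command: " .. cmd) end,
        __tostring = function (self) return "formatted report here" end,
      })
    
    t "on"        -- invokes __call
    print (t)     -- invokes __tostring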

    ...and here's the code for the profiler module itself (miniprofiler.lua)...

    -- miniprofiler.lua
    
    -- simple profiler for function calls
    -- @akbooer  June 2020
    
    --Use:
    --  local profile = require "miniprofiler"
    --  profile "on"       -- start profiling
    --  profile "off"      -- stop  profiling
    --  profile "clear"    -- reset counts
    --  print (profile)    -- show  results
      
    local calls = 0
    local funcs = {}
    local hook = debug.sethook
    
    local function profile(event)
      if event == "call" then
        calls = calls + 1 
        local i = debug.getinfo (2, 'n')
        local n = table.concat {(i.name or '?'), '/', i.namewhat}
        funcs[n] = (funcs[n] or 0) + 1
      end
    end
    
    local function report()
      local fcts = {}
      for n,v in pairs (funcs) do fcts[#fcts+1] = {n,v} end       -- build array of {name, count} pairs
      table.sort (fcts, function (a,b) return b[2] < a[2] end)    -- sort by descending call count
      local result = "%12s   %-8s   %s"
      local out = {result: format (calls, "TOTAL","Function calls")}
      local split = "^([^/]+)/(.*)"
      for _, x in ipairs(fcts) do
        local f, w = x[1]: match (split)
        out[#out+1] = result: format (x[2], w, f)
      end
      out[#out+1] = ''
      return table.concat (out, '\n')
    end
    
    local function call (_, x)
      local f = {
        on    = function () hook (profile, 'c') end,
        off   = function () hook () end,
        clear = function () for x in pairs(funcs) do funcs[x] = nil end calls = 0 end}
      do (f[x] or f.clear) () end
    end
      
    return setmetatable (funcs, {__call = call, __tostring = report})
    
    -----
    


  • Just by way of example, here's the profile of a single line of code (albeit one that you can't run in Vera) which fetches the entire history of openLuup's memory usage (probably a daily average, assuming you have the Historian enabled – in my case, about two years' worth of data):

    
    local profile = require "miniprofiler"
    profile "clear"
    
    profile "on"
    local x, t = luup.variable_get ("openLuup", "Memory_Mb", 2, {0, os.time()})
    profile "off"
    
    print (profile)
    

    Resulting in the profiler output:

     ––– 31 lines ––– 35.2 ms –––
    
           18310   TOTAL      Function calls
           10960   local      (for generator)
            7300   global     tonumber
               7   upvalue    time
               6   method     oldest
               3   method     read
               2   field      unformatted_unpack
               2   global     type
               2              ?
               2   local      fct
               2   global     ipairs
               2   field      time
               2   upvalue    calc_offset
               2   field      concat
               2   method     gmatch
               2   method     seek
               1   local      profile
               1   upvalue    Wfetch
               1   field      timenow
               1   method     close
               1   field      variable_get
               1   method     variable_get
               1   upvalue    hook
               1   field      fetch
               1   field      clear
               1   field      floor
               1   field      dsv2filepath
               1   method     ipairs
               1   field      open
               1   global     pcall
    


  • That works. Here is some AES ECB 128 running. Note how often the 8-bit XOR is called! Clearly something you would want optimised (and it is).

      782620   TOTAL      Function calls
      588814   upvalue    xor8
       76050   field      insert
       33370   field      format
       12102   field      char
        7854   upvalue    shiftRowsInverse
        7293   upvalue    mixColumnsInverse
        6194   local      (for generator)
        5849   field      concat
        5328   upvalue    band
        4532   global     tostring
        3623   global     type
        2446   upvalue    rshift
        2375   global     ipairs
        2145   upvalue    p
        1680   upvalue    shiftRowsForward
        1560   upvalue    mixColumns
        1487   global     select
        1487   upvalue    print
        1223   global     unpack
        1156   field      gettime
        1107   method     match
         902   method     format
         898   field      send
         898   upvalue    write
         898   field      current_device
         898   upvalue    formatted_time
         898   field      date
         898   method     write
         898   field      floor
         894   upvalue    debug
         894   field      log
         818   upvalue    tableDump


  • I see that xor8 is referenced as an ‘upvalue’. You may be able to access this faster by making it a local reference. Something like...

    local xor8 = bit.xor8     -- or wherever it comes from
    

    This places the reference in one of the virtual machine's registers, so the interpreter doesn't have to follow the extra indirection of an upvalue reference to find it.

    I’ve had great success with using this to increase the performance of an often-called function (over a million calls) on another project I am working on, cutting overall run time to about 60% of the initial value.
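
    To make that concrete, here's a minimal sketch of caching an upvalue in a local inside the hot function; the bitop library and the loop count are just assumptions for illustration:

    local bxor = require("bit").bxor     -- an upvalue of both functions below
    
    local function f_upvalue (n)
      local s = 0
      for i = 1, n do s = bxor (s, i) end    -- bxor fetched as an upvalue
      return s
    end
    
    local function f_local (n)
      local bxor = bxor                      -- cache the upvalue in a VM register
      local s = 0
      for i = 1, n do s = bxor (s, i) end    -- bxor is now a plain local
      return s
    end
    
    local t0 = os.clock ()   f_upvalue (1e7)   print ("upvalue:", os.clock () - t0)
    local t1 = os.clock ()   f_local   (1e7)   print ("local:  ", os.clock () - t1)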

    YMMV.



  • The xor8 sits in a module, i.e. the AES ECB 128 module, and it is only called within that module - not from outside the module.



  • Further to this:

    @a-lurker said in openLuup: Suggestion:

    is it possible to somehow profile openLuup to see what is really holding it up.

    and this:

    @akbooer said in openLuup: Suggestion:

    the [debug] library can be slow, so timing is not really possible at the Lua level.

    ...it occurred to me that openLuup already has the tools you need, at least at the level of plugins (rather than individual functions.)

    Each plugin has a dynamic attribute called cpu(s) with cumulative CPU time to sub-millisecond resolution. Everything you need can be wrapped into a single short function:

    local function cpu ()
      local array                   -- forward declaration (defined below)
    
      -- subtract one snapshot from another, element-wise
      local function sub(a,b)
        local s, b = array {}, b or {}
        for n,v in pairs(a) do s[n] = b[n] and v - b[n] or nil end
        return s
      end
    
      -- format a snapshot as a printable list of times and device names
      local function str(x)
        local con = table.concat
        local time, info = "%8.3f", "%12s %8s %s"
        local b = {info: format ("(s.ms)", "[#]", "device name")}
        for n,v in pairs(x) do 
          local name = luup.devices[n].description: match "%s*(.*)"
          b[#b+1] = info: format (time: format(v), con{'[', n, ']'}, name)
        end
        return con (b, '\n')
      end
    
      -- give a snapshot table its arithmetic and formatting metamethods
      function array (x) setmetatable (x, {__sub = sub, __tostring = str}) return x end
    
      -- take the snapshot: one cpu(s) attribute per device
      local t = array {}
      for i, d in pairs (luup.devices) do t[i] = d.attributes["cpu(s)"] end
      return t
    end
    

    This returns a table with a snapshot of the current cpu(s) attribute of every plugin.

    local T = cpu()
    print (pretty(T))
    

    which shows (for my test system):

    {nil,0.029998,183.22298999998,0,
      [13] = 14753.573033043,
      [16] = 57.727522,
      [17] = 3020.1334680015,
      [18] = 3721.1430400028,
      [20] = 2054.268114,
      [42] = 11.429613,
      [48] = 13080.996162966,
      [70] = 7731.7173460024
    }
    

    ...but it's better than that, since you can simply print it and get a nice formatted output:

    print "TOTAL plugin time"
    print (T)
    

    which looks like this:

     TOTAL plugin time
          (s.ms)      [#] device name
           0.030      [2] openLuup
         183.173      [3] Alternate UI
           0.000      [4] Alternate App Store
       13077.346     [48] Philips hue
        7729.377     [70] Studio hue
          57.728     [16] BBB Sonos (Study)
        3019.473     [17] Vera 0
        3720.243     [18] Vera 1
        2053.798     [20] Netatmo
          11.430     [42] DarkSky Weather
       14749.333     [13] Vera Edge
    

    ...but it's even better than that, because you can do math with this object, so calling it again, you can then print the incremental change between the two CPU time snapshots:

    local T2 = cpu()
    print "Time since last call"
    print (T2 - T)
    

    giving:

     Time since last call
          (s.ms)      [#] device name
           0.000      [2] openLuup
           0.030      [3] Alternate UI
           0.000      [4] Alternate App Store
           0.000     [20] Netatmo
           0.640     [17] Vera 0
           2.290     [13] Vera Edge
           1.300     [70] Studio hue
           0.000     [42] DarkSky Weather
           2.370     [48] Philips hue
           0.000     [16] BBB Sonos (Study)
           0.790     [18] Vera 1
    

    At the moment you'll have to copy this function and use it directly, but I'll include it as part of the openLuup extensions in the next release:

    local T = luup.openLuup.cpu
    

    or something like that.

    HTH.



  • @akbooer said in Profiling Lua Code:

    I'll include it as part of the openLuup extensions in the next release:

    Done!



  • Works great - thanks for this. And the top contenders are:

    EKM meter plugin
    AltUI
    AltHue Philips Hue
    VeraBridge
    Paradox IP150 web page scraper
    SMA Inverter
    DataYours

    The EKM meter plugin, which I wrote, uses the 'incoming' stuff on a byte-by-byte basis - and it only runs every 30 secs - pathetic. Needs a rewrite: down another rabbit hole. The Paradox IP150 web page scraper is another of mine. It runs every second. I've been doing a rewrite in the hope of speeding it up, but I don't think it will make any difference. The SMA Inverter, also mine, runs every 5 mins and uses UDP with sockets - not too bad. The EKM meter takes up 16 times more time than the SMA inverter.
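
    A classic fix for byte-by-byte 'incoming' handlers is to buffer the fragments in a table and concatenate once per message, instead of appending to a string each time. A rough sketch (the end-of-message marker and handler name are hypothetical):

    local buffer = {}
    
    function incoming (lul_data)
      buffer[#buffer+1] = lul_data              -- cheap table append per fragment
      if lul_data == '\n' then                  -- assumed end-of-message marker
        local message = table.concat (buffer)   -- one concatenation per message
        buffer = {}
        process_message (message)               -- hypothetical message handler
      end
    end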

    I went through the openLuup code - I looked and looked - but couldn't find the bit of code where you alter the openLuup time to always be about 1/100,000 of the top contender. 🙂

    It might be easier to purchase a new RasPi 4 to replace the current RasPi 3 - whoops, I hope it's not listening.



  • These:

    • AltUI
    • AltHUE
    • VeraBridge

    ...all poll in one way or another, and may well spend time converting to/from JSON. It's usual to see larger numbers here. Installing Cjson will help VeraBridge.
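
    A common loading pattern (a sketch - not necessarily what VeraBridge actually does) is to prefer the C implementation when it's installed and fall back to a pure-Lua library otherwise:

    local ok, json = pcall (require, "cjson")   -- fast C implementation, if present
    if not ok then
      json = require "dkjson"                   -- pure-Lua fallback (assumed installed)
    end
    
    local s = json.encode {a = 1, b = {2, 3}}   -- both libraries share encode/decode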

    The openLuup plugin itself runs very rarely. What you’re not seeing is the openLuup system itself - scheduler, Historian, etc. - but that is, of course, lightning fast.



  • I have cjson installed. Crumpets again? Polling is a real killer in all this stuff.



  • No, just cereals and OJ on a weekday.

    I’m surprised that DataYours makes the list. Although retrieving large amounts of data and plotting them take time, normal incremental database updates should be fast. You never moved to using the Data Historian? Not a top priority, though.



  • Data Historian? So much stuff and so little time. I should have got a CBUS system and paid for the man to come in when it needed tweaking. Living off muesli with fruit here.

    Seriously though, I'm hoping people will be able to find ways to improve their plugins, based on the info provided in this new report. If it's representative of what's actually going on, it should be useful. However, it may not represent what's causing actual traffic jams, i.e. demands for resources all occurring simultaneously and holding up the scheduling.



  • I never got hooked on DM myself, even when I first tried it... Data Historian is a fantastic implementation!



  • @a-lurker said in Profiling Lua Code:

    based on the info provided in this new report. If it's representative of what's actually going on, it should be useful.

    I’ve improved upon it (hopefully) by adding a wall-clock time (per plugin) measurement in the latest development release (v20.6.29). The measurements are as good as I can possibly make them, given the underlying tools (os.clock(), socket.gettime()), and are wrapped directly into the context_switch() routine, which is the heart of the scheduler.
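
    For illustration, here's a sketch (not openLuup's actual code) of how the two clocks might be sampled around a context switch and accumulated per device:

    local socket = require "socket"
    
    local cpu_time, wall_time = {}, {}       -- accumulators, indexed by device number
    
    local function context_switch (devNo, f, ...)
      local c0, w0 = os.clock (), socket.gettime ()
      local ok, err = pcall (f, ...)         -- run the plugin code in a protected call
      if devNo then
        cpu_time [devNo] = (cpu_time [devNo] or 0) + (os.clock () - c0)
        wall_time[devNo] = (wall_time[devNo] or 0) + (socket.gettime () - w0)
      end
      return ok, err
    end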

    In the example below, for a machine running three VeraBridges, amongst other things, you can see that two of the bridges are running at a ratio of 1.1, whereas the other is at 1.5. There's a good reason for this: that bridge is not using asynchronous HTTP, so it wastes time waiting for each synchronous request's response. I think that's also true for the AltHue plugin.

    [Screenshot: openLuup console showing per-plugin CPU and wall-clock times, 2020-06-29]

    Indeed, the numbers are only long(ish)-term averages, and don’t necessarily represent transient hold-ups. I suspect any further diagnostic, like queue length statistics, would become too obscure to be useful.



  • I've tried out the new console Wall-clock to CPU-clock feature, etc. Very good addition.

    So I find that one plugin has a Wall-clock to CPU-clock ratio of 352.7 to 1.

    The plugin monitors the status of an AV device every 30 seconds by sending an HTTP request to the device, which returns an XML response containing all the status information. Now, this AV device is fully powered off most of the day, so the HTTP call times out every 30 seconds (it uses a one-second request timeout). The plugin uses this to decide whether the AV device is powered on or not.

    So the question is: do these ongoing and regular timeouts sap resources to any significant degree or not? Likewise, will they diminish overall I/O capability? If so, what is the best approach to improving this?

    I looked at http_async.lua, but I don't see how it could help in this scenario.



  • If the timeout is on the HTTP request when sent, then async isn’t going to help (as it currently stands.)

    Yes, the blocking request will steal one second in every thirty from the entire openLuup system.

    Try using PING to test whether the device is up?
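
    Since a raw ICMP ping isn't available from plain Lua without elevated privileges, a cheap TCP connect probe can serve the same purpose before attempting the full request. A sketch (the address, port, and timeout are assumptions):

    local socket = require "socket"
    
    -- true if a TCP connection to ip:port succeeds within the timeout
    local function is_up (ip, port, timeout)
      local c = socket.tcp ()
      c: settimeout (timeout or 0.2)       -- far shorter than the 1 s HTTP timeout
      local ok = c: connect (ip, port)
      c: close ()
      return ok ~= nil
    end
    
    if is_up ("192.168.1.50", 80) then     -- hypothetical device address
      -- ...only now issue the HTTP status request...
    end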

