Discussion Forum to share and further the development of home control and automation, independent of platforms.
Video Doorbells
rafale77R
With the boom of video doorbells from the likes of Ring, Skybell and Nest, I came to the realization that I did not want to be cloud dependent for this type of service, for long-term reliability, privacy and cost. Last year I finally found a wifi video doorbell which is cost effective and supports RTSP and now ONVIF streaming: the RCA HSDB2A, which is made by Hikvision and has many clones (EZViz, Nelly, Laview). It has an unusual vertical aspect ratio designed to watch packages delivered on the floor.... It also runs on 5GHz wifi, which is a huge advantage. I have tried running IPCams on 2.4GHz before and it is a complete disaster for your wifi bandwidth. Using a spectrum analyzer, you will see what I mean: it completely saturates the wifi channels because of the very high IO requirements, which is a horrible design. 2.4GHz gets range but is too limited in bandwidth to support any kind of video stream reliably... unless you have a dedicated SSID and channel available for it. The video is recorded locally on my NVR. I was able to process the stream from it on Home Assistant to do facial recognition and trigger automations on openLuup like any other IPCam. This requires quite a bit of CPU power... I also get snapshots on motion as push notifications through Pushover, like with all of my other IPCams. Movement detection is switched on and off by openLuup... based on house mode.
IPCam
Object Recognition
rafale77R
Sharing a few options for object recognition which then can be used as triggers for home automation. My two favorites so far: https://github.com/asmirnou/watsor#homeassistant-integration https://github.com/skvark/opencv-python
IPCam
Facial recognition triggering automation
rafale77R
I have optimized my facial recognition scheme and discovered a few things. My wifi doorbell, the RCA HSDB2, was overloaded by having to provide too many concurrent RTSP streams, which was making the streams themselves unreliable: the cloud stream, a stream to the QNAP NVR, a stream to Home Assistant (regular) and a stream to Home Assistant facial recognition. I decided to use the proxy function of the QNAP NVR to pull only 2 streams from the doorbell and have the NVR be the source for Home Assistant. This stabilized the system quite a bit. The second optimization came from finding out that, by default, Home Assistant processes images every 10s. It made me think that the processing was slow, but it turns out that it was just not being triggered frequently enough. I turned it up to every 2s and now I have a working automation to trigger an openLuup scene, which opens a door lock with conditionals on house mode and geofence. Now I am looking to offload this processing from the CPU to an Intel NCS2 stick, so I might test some components other than dlib to make things run even faster.
IPCam
CCTV on Openluup
CatmanV2C
Can we? Simply? Specifically Foscams which used to run 'fine' on Vera but are not exactly high importance to me. But since they are there.... Cheers C
IPCam
opensource NVR
DesTD
At the moment I'm using the Surveillance software on Synology, but I'm limited to 6 cameras (2 included, and I bought a 4-pack a while ago). But I have 8 cameras, so right now 2 of them are not in the NVR! I tried motionEye a while back, but the software was very slow and all my camera feeds were lagging... Any other solution?
IPCam
License Plate recognition
rafale77R
Something fun to do if you have a camera located on your driveway: This home assistant component enables recognition of a license plate which in turn could open the garage door... https://www.home-assistant.io/integrations/openalpr_local/
IPCam
Monocle on Alexa
rafale77R
Sharing an excellent skill I use to locally stream from my IPCams to echo shows: https://monoclecam.com Your video stream does not need to go to the cloud. This skill just forwards the local stream address to the echo device when the camera name is called. It does require them to host the address and camera information (credentials) on their server though. I personally block all my IP cameras from accessing the internet from the router.
IPCam

Facial recognition explained and Customized Home-Assistant components

7 Posts 2 Posters 2.7k Views 2 Watching
    rafale77
    wrote on last edited by rafale77
    #1

    Sharing what I have learned, along with some modifications to components and their benefits.
    On Home Assistant/Python 3, facial recognition involves the following steps:

    1. Establish and maintain a camera stream (over RTSP or HTTP).
    2. Extract a single frame from the open stream in order to process it.
    3. Preprocess a predetermined number of pictures of known people, using the same steps as for a video frame, and store the results in memory for later comparison. In reality, what is being compared are arrays of numbers generated by a model.
    4. Run face detection and localization on the frame using one model.
    5. Using the location from step 4, extract the face from the picture and encode it into an array of numbers.
    6. Run a classification or comparison between the preset faces and the face in the video, and output the "inference" or "prediction" to determine whether they are close enough to be the same person.

    Even though a few components to do this have existed on Home Assistant for many years, I ran into challenges which forced me to improve/optimize the process.

    1. Home Assistant's camera does not establish and keep open a stream in the background. It can open one on demand through its UI but doesn't keep it open. This forces the facial recognition component to re-establish a new stream every time it needs a single frame to process, causing delays of up to 2s, which is unacceptable for my application. I therefore rewrote the ffmpeg camera component to use OpenCV and maintain a stream within a Python thread and, since I have a GPU, I decided to decode the video on the GPU to relieve the CPU. This also required handling some subtleties to avoid uselessly decoding frames we won't process while still removing them from the thread buffer.
    2. Frame extraction was pretty challenging with ffmpeg, which is why I opted to use OpenCV instead, as it handles the frame synchronization and alignment from the byte stream for us.
    3. The preset pictures were not a problem and are a part of every face component.
    4. I started with the dlib component, which had two models for ease of use. It makes use of the dlib library and the "face_recognition" wrapper, which has a Python 3 API, but the CNN model requires a GPU and, while it works well for me, it turned out to be quite resource intensive and not the best option, as explained in this article: https://www.learnopencv.com/face-detection-opencv-dlib-and-deep-learning-c-python/
      So I opted to move to the OpenCV DNN algorithm instead. Home Assistant has an OpenCV component but it is a bit generic and I couldn't figure out how to make it work. In any case, it did not have the steps 5 and 6 I wanted.
    5. For the face encoding step, I struggled quite a bit as it is quite directly connected to the option I would choose for step 6. From my investigation, I came to this: https://www.pyimagesearch.com/2018/09/24/opencv-face-recognition/

    "*Use dlib’s embedding model (but not it’s k-NN for face recognition)

    In my experience using both OpenCV’s face recognition model along with dlib’s face recognition model, I’ve found that dlib’s face embeddings are more discriminative, especially for smaller datasets.

    Furthermore, I’ve found that dlib’s model is less dependent on:

    Preprocessing such as face alignment
    Using a more powerful machine learning model on top of extracted face embeddings
    If you take a look at my original face recognition tutorial, you’ll notice that we utilized a simple k-NN algorithm for face recognition (with a small modification to throw out nearest neighbor votes whose distance was above a threshold).

    The k-NN model worked extremely well, but as we know, more powerful machine learning models exist.

    To improve accuracy further, you may want to use dlib’s embedding model, and then instead of applying k-NN, follow Step #2 from today’s post and train a more powerful classifier on the face embeddings.*"

    The trouble is that, from my research, some people have tried, but I have not seen a solution posted anywhere for translating the location array output by the OpenCV DNN model into the dlib rect object format that dlib needs for encoding. Well, I did just that...
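To illustrate the kind of conversion described here, below is a minimal sketch, not the code from the modified components. It assumes the common SSD-style output layout of OpenCV DNN face detectors (shape `(1, 1, N, 7)`, each row `[image_id, class_id, confidence, x1, y1, x2, y2]` with coordinates normalized to 0..1). The function names and the `min_confidence` default are made up for the example.

```python
def detections_to_boxes(detections, frame_width, frame_height, min_confidence=0.5):
    """Convert OpenCV DNN face detections to (left, top, right, bottom) pixel boxes.

    Assumption: `detections` has the SSD layout (1, 1, N, 7) with normalized
    corner coordinates in columns 3..6. Works on numpy arrays or nested lists.
    """
    boxes = []
    for row in detections[0][0]:
        if float(row[2]) < min_confidence:
            continue  # skip weak detections
        # Scale to pixels, then clamp to the frame so dlib never sees
        # coordinates outside the image.
        left = min(max(int(row[3] * frame_width), 0), frame_width - 1)
        top = min(max(int(row[4] * frame_height), 0), frame_height - 1)
        right = min(max(int(row[5] * frame_width), 0), frame_width - 1)
        bottom = min(max(int(row[6] * frame_height), 0), frame_height - 1)
        if right > left and bottom > top:
            boxes.append((left, top, right, bottom))
    return boxes


def boxes_to_dlib_rects(boxes):
    """Wrap pixel boxes in dlib.rectangle objects for the dlib encoder."""
    import dlib  # imported lazily so the function above works without dlib

    return [dlib.rectangle(left, top, right, bottom)
            for (left, top, right, bottom) in boxes]
```

Note that the face_recognition wrapper uses a different tuple order, (top, right, bottom, left), so the boxes would need reordering before being passed to it.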

    6. For now I am sticking with a simple Euclidean distance calculation and a distance threshold to determine the face match, as it has been quite accurate for me, but the option of going for a much more complex classification algorithm is open... when I get to it.
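As a rough illustration of the distance-plus-threshold matching described in this step, here is a minimal sketch in plain Python. The function name is hypothetical, and the 0.6 threshold is an assumption borrowed from the default tolerance of the face_recognition wrapper, not a value stated in this post.

```python
import math


def best_match(known_encodings, candidate, max_distance=0.6):
    """Return (name, distance) of the closest known face, or (None, distance).

    `known_encodings` maps a person's name to their stored encoding;
    `candidate` is the encoding of the face found in the frame.
    """
    best_name, best_distance = None, float("inf")
    for name, encoding in known_encodings.items():
        distance = math.dist(encoding, candidate)  # Euclidean distance
        if distance < best_distance:
            best_name, best_distance = name, distance
    # Only accept the nearest face if it is close enough.
    if best_distance <= max_distance:
        return best_name, best_distance
    return None, best_distance
```

Real encodings from dlib have 128 values each; the function does not care about the length as long as both vectors match.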

    So in summary, the outcome is modifications to:
    A. the ffmpeg camera component to switch to opencv and enable background maintenance of a stream with one rewritten file:
    https://github.com/rafale77/home-assistant/blob/dev/homeassistant/components/ffmpeg/camera.py
    B. Changes to the dlib face recognition component to support the opencv face detection model:
    https://github.com/rafale77/home-assistant/blob/dev/homeassistant/components/dlib_face_identify/image_processing.py
    C. Modified face_recognition wrapper to do the same, enabling conversion between dlib and opencv

    https://github.com/rafale77/face_recognition/blob/master/face_recognition/api.py
    D. And additions of the new model to the face_recognition library involving adding a couple of files:
    https://github.com/rafale77/face_recognition_models/blob/master/face_recognition_models/__init__.py
    https://github.com/rafale77/face_recognition_models/tree/master/face_recognition_models/models

    Overall, these changes significantly improved speed and decreased CPU and GPU utilization compared with any of the original dlib components.
    At the moment, CUDA inference is broken in OpenCV with the latest CUDA release, so I have not even switched on the GPU for face detection yet (it worked fine with the dlib CNN model), but a fix may already have been posted, so I will recompile OpenCV shortly...

    Edit: Sure enough, OpenCV is fixed. I am running the face detection on the GPU now.
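For anyone wanting to experiment with the background-stream idea from point 1 of the challenges above, here is a minimal sketch, not the rewritten camera component. It relies on `grab()` and `retrieve()` as exposed by `cv2.VideoCapture`: `grab()` advances the stream without decoding, and `retrieve()` decodes only the most recently grabbed frame. The class name, the sleep intervals and the RTSP URL in the comment are placeholders.

```python
import threading
import time


class LatestFrameReader:
    """Keep a stream open in a background thread and decode frames on demand."""

    def __init__(self, capture):
        # `capture` is anything with grab(), retrieve() and release(),
        # e.g. cv2.VideoCapture("rtsp://camera.local/stream") (placeholder URL).
        self._capture = capture
        self._lock = threading.Lock()
        self._has_frame = False
        self._running = True
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def _drain(self):
        # Continuously pull frames off the stream WITHOUT decoding them,
        # so the buffer never fills up with stale frames.
        while self._running:
            with self._lock:
                ok = self._capture.grab()
                self._has_frame = ok
            # Yield briefly so read() can take the lock; back off if stalled.
            time.sleep(0.001 if ok else 0.05)

    def read(self):
        # Decode only when a consumer actually asks for a frame.
        with self._lock:
            if not self._has_frame:
                return None
            ok, frame = self._capture.retrieve()
        return frame if ok else None

    def stop(self):
        self._running = False
        self._thread.join(timeout=1.0)
        self._capture.release()
```

The lock is there because the capture object is shared between the draining thread and whoever calls `read()`. Whether GPU decoding is used depends on how OpenCV was built and is outside the scope of this sketch.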

      CatmanV2
      wrote on last edited by
      #2

      I understood all of the words, just not the order in which you used them....

      C

      The Ex-Vera abuser know as CatmanV2.....

        rafale77
        wrote on last edited by rafale77
        #3

        @CatmanV2

        Maybe this could help:

        The four computational steps:

        The model training (the bases to compare to) must be done beforehand.

        Screen Shot 2020-07-10 at 14.38.18.png

        Each step can be either done by the CPU or the GPU. I highlighted in teal my current selection for which I shared the code. 😉

        For the last column... I am just getting started...
        https://machinelearningmastery.com/hyperparameters-for-classification-machine-learning-algorithms/

          CatmanV2
          wrote on last edited by
          #4

          Well, thanks for trying 😄

          C

            rafale77
            wrote on last edited by
            #5

            So, I just modified the HASS plugin to use scikit-learn SVM classification... It requires a lot of data (i.e. pictures). Not sure it is more accurate yet...
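A minimal sketch of what SVM classification on face encodings could look like with scikit-learn, not the code from the modified plugin. The encodings here are synthetic stand-ins (random 128-value vectors clustered per person) because real ones come from the dlib encoder. The helper names, the linear kernel and the noise level are assumptions made for the example.

```python
import numpy as np
from sklearn.svm import SVC


def make_synthetic_encodings(names, per_person=30, dims=128, seed=0):
    """Generate fake encodings clustered around one random center per name."""
    rng = np.random.default_rng(seed)
    centers = {name: rng.normal(size=dims) for name in names}
    X, y = [], []
    for name, center in centers.items():
        # Each "picture" is the person's center plus a little noise.
        X.extend(center + 0.05 * rng.normal(size=(per_person, dims)))
        y.extend([name] * per_person)
    return np.array(X), np.array(y), centers


def train_face_classifier(encodings, labels):
    """Fit an SVM that maps an encoding to a person's name."""
    clf = SVC(kernel="linear")
    clf.fit(encodings, labels)
    return clf
```

Usage would be along the lines of `clf = train_face_classifier(X, y)` followed by `clf.predict([encoding])` for each face found in a frame. Including a class of random faces gives the classifier somewhere to put strangers.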

              rafale77
              wrote on last edited by
              #6

              Well, now I can confirm: it is drastically more accurate if trained with a large enough dataset. I loaded 30 random face pictures, 30 of my own and 30 of the wife, and the thing is a little more accurate than the previous approach. Pretty happy with the outcome.
              So the recognition (encoding and classification) uses machine learning, which is easy enough to run on the CPU, while the face detection is more complex and relies on deep learning. I have basically resolved to assemble my own system from various model sources. Pretty satisfactory outcome.
              When I have more time, I will investigate some newer award-winning ones like insightface, deepstack...

                rafale77
                wrote on last edited by rafale77
                #7

                I have been optimizing the code further to have fewer dependencies (relying directly on the dlib library instead of a wrapper for it) and learned some more about convolutional neural network inference... fascinating field. I also changed the jitter parameter from 1 to 10, which should help the accuracy.

                core/homeassistant/components/dlib_face_identify/image_processing.py at live · rafale77/core

                The original dlib model scored 99.17% on the LFW benchmark, and my mods sped up the face detection and probably improved accuracy by using a classifier rather than just a Euclidean distance, enlarging the face detection and training set, and increasing the jitter parameter. Looking around at what is available, the only thing that is potentially better is RetinaFace detection + ArcFace encoding, which could improve recognition of angled faces. I am almost done implementing it for testing but it is probably overkill for my doorbell... 🙂
