With the boom of video doorbells from the likes of Ring, SkyBell and Nest, I came to the realization that I did not want to depend on the cloud for this type of service, for reasons of long-term reliability, privacy and cost.
Last year I finally found a cost-effective wifi video doorbell which supports RTSP and now ONVIF streaming:
The RCA HSDB2A, which is made by Hikvision and has many clones (EZViz, Nelly, Laview). It has an unusual vertical aspect ratio designed to watch packages left on the ground.
It also runs on 5GHz wifi, which is a huge advantage. I have tried running IP cams on 2.4GHz before and it is a complete disaster for your wifi bandwidth; look at a spectrum analyzer and you will see what I mean. The constant high-bitrate traffic completely saturates the channels, which is a horrible design. 2.4GHz gets range but is too limited in bandwidth to reliably support any kind of video stream... unless you have a dedicated SSID and channel available for it.
The video is recorded locally on my NVR. I was able to process the stream in Home Assistant to run facial recognition and trigger automations on openLuup, like any of my other IP cams. This requires quite a bit of CPU power...
I also get snapshots on motion via Pushover push notifications, like all of my other IP cams. Motion detection is switched on and off by openLuup... based on house mode.
Sharing a few options for object recognition which then can be used as triggers for home automation.
My two favorites so far:
watsor (asmirnou/watsor): object detection for video surveillance.
opencv-python (opencv/opencv-python): an automated CI toolchain producing precompiled opencv-python, opencv-python-headless, opencv-contrib-python and opencv-contrib-python-headless packages.
I have optimized my facial recognition scheme and discovered a few things:
My wifi doorbell, the RCA HSDB2A, was overloaded by having to serve too many concurrent RTSP streams, which was making the streams themselves unreliable:
Cloud stream
stream to QNAP NVR
stream to home assistant (regular)
stream to home assistant facial recognition.
I decided to use the proxy function of the QNAP NVR so that only 2 streams are now pulled from the doorbell, with the NVR serving as the source for Home Assistant. This stabilized the system quite a bit.
The second optimization was finding out that by default Home Assistant processes images only every 10s. That made me think the processing was slow, but it turned out it was just not being triggered frequently enough. I reduced the interval to 2s, and now I have a working automation that triggers an openLuup scene to open a door lock, with conditionals on house mode and geofence. Next I am looking to offload this processing from the CPU to an Intel NCS2 stick, so I might test some components other than dlib to make things run even faster.
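For reference, that interval is just configuration; a hypothetical sketch of what it looks like in configuration.yaml (platform and entity names are placeholders, and I am assuming the standard scan_interval option applies):

```yaml
image_processing:
  - platform: dlib_face_identify   # placeholder platform name
    source:
      - entity_id: camera.doorbell # placeholder entity
    scan_interval: 2               # seconds between processed frames; default is 10
```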
Sharing what I have learned and some modifications to components with their benefits.
On home assistant/python3, facial recognition involves the following steps:
Even though components to do this have existed in home-assistant for many years, I ran into challenges which forced me to improve/optimize the process.
Home Assistant's camera component does not establish and keep open a stream in the background. It can open one on demand through its UI, but it doesn't keep it open. This forces the facial recognition component to re-establish a new stream every time it needs a single frame, causing up to 2s of delay, which is unacceptable for my application. I therefore rewrote the ffmpeg camera component to use opencv and maintain the stream within a python thread, and since I have a GPU, I decided to decode the video on the GPU to relieve the CPU. This also required handling some subtleties to avoid uselessly decoding frames we won't process while still removing them from the thread buffer. The frame extraction was pretty challenging using ffmpeg, which is why I opted for opencv instead, as it executes the frame synchronization and alignment from the byte stream for us.

The pre-set pictures were not a problem and are part of every face component. I started with the dlib component, which has two models, for ease of use. It makes use of the dlib library and the "face_recognition" wrapper, which has a python3 API, but the CNN model requires a GPU and, while it works well for me, turned out not to be the best choice, as explained in this article; it is also quite resource intensive:
https://www.learnopencv.com/face-detection-opencv-dlib-and-deep-learning-c-python/
So I opted to move to the opencv DNN algorithm instead. Home Assistant has an openCV component, but it is a bit generic and I couldn't figure out how to make it work. In any case, it did not have the steps 5 and 6 I wanted.

For the face encoding step I struggled quite a bit, as it is directly connected to the option I would choose for step 6. From my investigation, I came to this:
https://www.pyimagesearch.com/2018/09/24/opencv-face-recognition/
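The background-stream idea can be sketched without any camera library; read_frame below is a placeholder for whatever returns the next decoded frame (the opencv capture call in my case), and the single-slot buffer is what keeps the consumer from processing stale, queued-up frames:

```python
import threading

class LatestFrameBuffer:
    """Keep only the most recent frame from a continuously-read stream.

    A reader thread overwrites a single slot, so frames never pile up in
    a queue and the consumer always sees the newest one.
    """

    def __init__(self, read_frame):
        self._read_frame = read_frame  # placeholder for the capture call
        self._lock = threading.Lock()
        self._frame = None
        self._running = False
        self._thread = None

    def start(self):
        """Launch the background reader thread."""
        self._running = True
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        # Continuously drain the source, keeping only the latest frame.
        while self._running:
            frame = self._read_frame()
            if frame is not None:
                with self._lock:
                    self._frame = frame

    def latest(self):
        """Return the most recently read frame (or None before the first one)."""
        with self._lock:
            return self._frame

    def stop(self):
        self._running = False
        self._thread.join()
```

This is only a sketch of the pattern; the real component also has to handle reconnects and decode errors.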
"*Use dlib’s embedding model (but not it’s k-NN for face recognition)
In my experience using both OpenCV’s face recognition model along with dlib’s face recognition model, I’ve found that dlib’s face embeddings are more discriminative, especially for smaller datasets.
Furthermore, I’ve found that dlib’s model is less dependent on:
Preprocessing such as face alignment
Using a more powerful machine learning model on top of extracted face embeddings
If you take a look at my original face recognition tutorial, you’ll notice that we utilized a simple k-NN algorithm for face recognition (with a small modification to throw out nearest neighbor votes whose distance was above a threshold).
The k-NN model worked extremely well, but as we know, more powerful machine learning models exist.
To improve accuracy further, you may want to use dlib’s embedding model, and then instead of applying k-NN, follow Step #2 from today’s post and train a more powerful classifier on the face embeddings.*"
The trouble, from my research, is that while I can see some people have tried, I have not seen a solution posted anywhere for translating the location array output of the opencv DNN model into a dlib rect object for dlib to encode. Well, I did just that...
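The conversion itself is mostly coordinate arithmetic: the opencv DNN detector outputs boxes normalized to [0, 1], while dlib's encoder wants a pixel-space dlib.rectangle(left, top, right, bottom). A minimal sketch of that math (the function name is mine, and the dlib call itself is shown only as a comment):

```python
def dnn_box_to_rect(detection, frame_w, frame_h, conf_threshold=0.5):
    """Convert one opencv DNN SSD detection row to pixel corner coordinates.

    `detection` is one row of the net's output:
    [_, _, confidence, x1, y1, x2, y2], with coordinates normalized to [0, 1].
    Returns (left, top, right, bottom) clamped to the frame, or None if the
    confidence is too low. In the component, these four values are then
    wrapped as dlib.rectangle(left, top, right, bottom) for encoding.
    """
    confidence = detection[2]
    if confidence < conf_threshold:
        return None
    left = max(0, int(detection[3] * frame_w))
    top = max(0, int(detection[4] * frame_h))
    right = min(frame_w - 1, int(detection[5] * frame_w))
    bottom = min(frame_h - 1, int(detection[6] * frame_h))
    return (left, top, right, bottom)
```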
For now I am sticking with a simple Euclidean distance calculation and a distance threshold to determine a face match, as it has been quite accurate for me, but the option of moving to a much more complex classification algorithm remains open... when I get to it.

So in summary, the outcome is modifications to:
A. the ffmpeg camera component to switch to opencv and enable background maintenance of a stream with one rewritten file:
https://github.com/rafale77/home-assistant/blob/dev/homeassistant/components/ffmpeg/camera.py
B. Changes to the dlib face recognition component to support the opencv face detection model:
https://github.com/rafale77/home-assistant/blob/dev/homeassistant/components/dlib_face_identify/image_processing.py
C. Modified face_recognition wrapper to do the same, enabling conversion between dlib and opencv (rafale77/face_recognition).
D. Additions of the new model to the face_recognition_models library, involving adding a couple of files (including an init.py) under face_recognition_models/models (rafale77/face_recognition_models).
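The Euclidean distance matching mentioned above is simple enough to sketch in plain python (the 0.6 threshold is the commonly used default for dlib's 128-d encodings, an assumption to tune for your own data; names and shapes here are illustrative):

```python
import math

def match_face(known_encodings, candidate, threshold=0.6):
    """Match a face embedding against known embeddings by Euclidean distance.

    known_encodings: dict mapping person name -> embedding (list of floats).
    candidate: embedding of the face to identify.
    Returns (best_name, best_distance) if the closest known face is within
    `threshold`, otherwise (None, best_distance).
    """
    best_name, best_dist = None, float("inf")
    for name, enc in known_encodings.items():
        # Plain Euclidean distance between the two embedding vectors.
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(enc, candidate)))
        if dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist <= threshold:
        return best_name, best_dist
    return None, best_dist
```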
Overall these changes significantly improved speed and decreased cpu and gpu utilization rate over any of the original dlib components.
At the moment, CUDA inference is broken in openCV with the latest CUDA release, so I have not yet switched the GPU on for face detection (it worked fine using the dlib CNN model), but a fix may already have been posted, so I will recompile openCV shortly...
Edit: Sure enough openCV is fixed. I am running the face detection on the GPU now.
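For anyone recompiling: these are the typical CMake flags for a CUDA-enabled openCV build with DNN support (the opencv_contrib path is an assumption; adjust to your checkout, and run from a build directory inside the opencv source tree):

```shell
cmake -D CMAKE_BUILD_TYPE=Release \
      -D WITH_CUDA=ON \
      -D OPENCV_DNN_CUDA=ON \
      -D OPENCV_EXTRA_MODULES_PATH=../../opencv_contrib/modules \
      ..
make -j"$(nproc)"
```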
At the moment I'm using the Surveillance software on my Synology, but I'm limited to 6 cameras (2 licenses included, and I bought a 4-pack a while ago).
But I have 8 cameras, so right now, 2 of them are not in the NVR!
I tried motioneye a while back, but that software is very slow and all my camera feeds were lagging...
any other solution? 😉
Sharing an excellent skill I use to locally stream from my IPCams to echo shows:
Monocle. Your video stream does not need to go to the cloud: this skill just forwards the local stream address to the echo device when the camera name is called. It does require hosting the stream address and camera information (credentials) on their server, though. I personally block all my IP cameras from accessing the internet at the router.
Object Recognition
-
So I got a bit frustrated with the resource drain from Watsor's use of FFmpeg and have been investigating various models to implement on openCV. I ended up completely rewriting the Home Assistant openCV component to use an updated SSD512 model, and using my own FFmpeg component, which uses openCV, for decoding:
-
It is pretty straightforward if you know where your installation is. If you use a virtual environment, bare metal or a VM installation, it should be easy to find the location of your python packages. If you use containers, then I cannot help you.
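One generic way to find that location (not specific to these patches) is to ask python where the package lives; demonstrated below with a stdlib package, but the same call works with "homeassistant":

```python
import importlib
import os

def package_dir(name):
    """Return the directory of an installed python package by importing it."""
    mod = importlib.import_module(name)
    return os.path.dirname(mod.__file__)

# For Home Assistant you would call package_dir("homeassistant");
# the components folder sits inside that directory.
print(package_dir("json"))
```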
Once you know the location of your home assistant package, it is as easy as replacing the original file with my version and setting up the component the same way the original documentation says. For these, you will need to download the models, which are publicly available at the links I posted, and then put them into a "model" folder in your home assistant configuration folder: ".homeassistant/model".

Edit: pretty thrilled with the possibilities: counting cars in the driveway/garage, counting people at the door to adjust the greeting message... etc...
-
In my pursuit of efficiency, I have dug deeper into Home Assistant's image processing and camera code and found that image processing, not being the camera component's main intended purpose, is extremely inefficient and can generate 3x more CPU load than needed. Essentially it does way too much format encoding/decoding: from the camera's H264/H265 stream to raw, then to JPEG, then to bytes, then back to raw, then to an array for processing. I found that insane and decided to recode it.

These two core component files need to be replaced:
"https://github.com/rafale77/core/blob/live/homeassistant/components/camera/init.py"
"https://github.com/rafale77/core/blob/live/homeassistant/components/image_processing/init.py"
Then the ffmpeg integration component:
"https://github.com/rafale77/core/blob/live/homeassistant/components/ffmpeg/camera.py"
And finally the two image processing components, opencv (object detection) and dlib (face recognition), which I also streamlined:
"https://github.com/rafale77/core/blob/live/homeassistant/components/dlib_face_identify/image_processing.py"
"https://github.com/rafale77/core/blob/live/homeassistant/components/opencv/image_processing.py"
For the last 3 files, comment out all the lines containing Cuda if you do not have a GPU.
How to:
Replace all 5 files I have posted above into your home assistant installation.
Configure your cameras as ffmpeg components as you normally would. If a camera uses h265 encoding, add "h265" to the "extra_command" option; no other extra command is needed or works.

Get the model files and create a "model" folder in your homeassistant folder (it should be ~/.homeassistant).
Here are the model files:
"https://github.com/mmilovec/facedetectionOpenCV"
You only need the res10_300x300_ssd_iter_140000.caffemodel and the deploy.prototxt.txt files.
and here is why I picked this model for detection:
"https://www.learnopencv.com/face-detection-opencv-dlib-and-deep-learning-c-python/"
Next, the object detection model files:
"https://github.com/AlexeyAB/darknet/releases/tag/darknet_yolo_v4_pre"
You only need the first two files (.cfg and .weights).
And last, this file , which you will have to rename cococlasses.txt. It is the list of objects the detector can identify, which you can use for your configuration below. By default it will detect "person".
You should have a total of 5 files in the model folder if you want both face recognition and object detection. Now configure these components in yaml as per the official documentation and you are done. For object detection, the "classifiers" option is a comma-separated list, e.g. "person,car,dog". Home Assistant's image processing component lets you set the update frequency.
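As a starting point, the yaml could look something like this (entity names are placeholders, and the exact schema is whatever the official component documentation says; the classifiers line follows the option described above):

```yaml
image_processing:
  - platform: opencv               # rewritten object detection component
    source:
      - entity_id: camera.driveway # placeholder entity
    classifiers: "person,car,dog"  # comma separated; defaults to "person"
  - platform: dlib_face_identify   # modified face recognition component
    source:
      - entity_id: camera.doorbell # placeholder entity
```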
These should work fine for a couple of cameras on a CPU, but the higher the processing frequency and the more cameras you have, the higher the load. I have mine update at 5Hz, i.e. a 0.2s interval, and I run these on a GPU, now with 7 cameras. No more camera motion trips because the wind blew some leaves... etc... and yes, it's completely integrated into openLuup.
-
Unfortunately, I can't help much in this case. The supervised installation means you have everything containerized, and I am a bit allergic to this approach as it complicates the interaction between containers and makes access/changes to the code very difficult. It is also an inefficient installation mode. These changes are not addons; they are core code changes, which work with any installation but the supervised/containerized ones.
-
One idea I just had to simplify things is just to install home assistant from my branch by doing this from a venv:
pip3 install git+https://github.com/rafale77/core.git
But again it does not help the containerized installation, which has the disadvantage of its benefit: a contained, controlled, small environment that you cannot mess with also means you can't modify anything in it.
-
Hm that's a pity. I think most users have a "supervised" install due to simplicity.
Also interesting:
-
Yeah, I look at things from an overall standpoint... Yes, it may appear simple on the surface, but it is actually very complicated just one layer underneath, so if one wants to do anything more with it, it becomes a hot mess. I use some containers, but only for very small and limited applications which benefit from being self-contained. For anything more than that... like home-assistant, I just find it absurd.
And yup, privacy... again why I am not running anything like this through a cloud API...
Another example of something made to look simple by applying a very complex solution to a simple problem. I see the cloud as philosophically the same as docker containers... adding layers of complexity and liability for the sake of convenience.
I am a big advocate of KISS, but from an overall standpoint, not just the user installation, which is only a one-time extra effort vs. a lifetime of inefficiency and other risks.
-
@sender said in Object Recognition:
Ehm... noob here again... let's say:
I have homeassistant supervised (ability to add addons). I have the option to easily spin up a VM. Where do I start? As you know, I am not a Linux guru and I would need every step documented.
Ok, from another standpoint. You know my setup... can I do something with a vm on ubuntu and having it integrated in hass?
-
Yes you can. I don't know what this entire obsession with containers is about on the Home Assistant forum; I found it actually much easier to install and manage in a VM or a virtual environment. By the way, there has been an attempt to deprecate the supervised installation, causing a huge thread on their forum. A lot of people use it, but it is so complicated and takes so much work to maintain that the devs wanted to take it away... also because, IMHO, it really made no sense to begin with.
Disturbingly the simplest, fastest, most flexible and easiest installation is the one they recommend for developers:
In my case, in a ubuntu VM, I even skip the virtual environment which is an unnecessary added complication.
Make a copy of your home-assistant configuration folder. (If you don't want to start from scratch)
Set up the VM to the same address as your previous installation.
Just install a virgin ubuntu OS and install python3.7 if it isn't already there; this will depend on the version of ubuntu you installed, and I think any version before 20.04 will need it:
sudo apt-get update && sudo apt-get install python3.7
Copy your old .homeassistant folder into your user directory again only if you want to keep your config.
Install home assistant from my repo:
python3.7 -m pip install git+https://github.com/rafale77/core.git
and start it:
hass --open-ui
You can go through another tutorial to make it auto start upon VM start but this should suffice for now.
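For the autostart step, a minimal systemd unit is the usual approach (the user name and hass path below are assumptions; check yours with "which hass"):

```ini
# /etc/systemd/system/home-assistant.service (path and user are assumptions)
[Unit]
Description=Home Assistant
After=network-online.target

[Service]
Type=simple
User=homeassistant
ExecStart=/usr/local/bin/hass
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Enable it with "sudo systemctl enable --now home-assistant".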
I would argue that this isn't any more complicated than the supervised installation. To me it is actually much simpler... It is literally 3 commands in the worst case for a new install, without any funky docker software to download.

The docker container, I think, helped people set up the autostart and install the right version of python... a complicated solution to a simple problem. Adding another layer of management software, virtualization, restrictions, file system management, CPU and memory inefficiency etc... just to control 5 lines of startup code?
-
And... now that I have switched to pytorch for facial detection/recognition, I am looking at whether Yolov4 can be enhanced, and sure enough... a week-old update to this project could be what I implement in home assistant next:
It combines YoloV4 with some of the enhancements of YOLOv5 and seems to be better than both.
-
So it seems I am practically done with my setup; it is now running super efficiently and is very immune to false positives.
I have a set of 16 cameras running on an NVR, which records to my NAS upon motion.
I got the system to eliminate false motion detections from spider webs and wind-blown leaves using the object detection scheme. This is however a bit too heavy in terms of computation, and therefore power consumption, so I have made the whole thing smarter and less wasteful:

Camera motion detection -> trips a sensor on openLuup, which runs a scene -> turns on object detection on Home Assistant (I modified Home Assistant to have a switch to turn it on and off) -> if object detection sees a recognizable object -> trips another sensor on openLuup, which triggers other scenes depending on the object detected.
I did the same thing with face recognition:

Camera motion detected -> trips an openLuup sensor -> triggers a scene that turns on a virtual switch on openLuup -> which turns on face recognition on Home Assistant -> if a face is detected -> trips another sensor on openLuup with the name of the detected face -> triggers other scenes which parse the name and unlock doors, turn on lights, send notifications etc... After some delay, the virtual switch turns back off, which turns off face recognition on Home Assistant.

I have been running the whole setup for a month now and it's been solid, though I have been tweaking things here and there.
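The gating logic itself boils down to a small controller. Everything below is a placeholder sketch of the pattern: the callback stands in for the openLuup sensor trip, and the detection window length is arbitrary:

```python
import threading

class GatedDetector:
    """Run an expensive detector only for a window after motion is seen."""

    def __init__(self, on_detect, window_seconds=30.0):
        self._on_detect = on_detect      # placeholder: trip an openLuup sensor
        self._window = window_seconds
        self._enabled = False
        self._timer = None

    def motion(self):
        """Camera motion trip: switch detection on and (re)arm the off timer."""
        self._enabled = True
        if self._timer:
            self._timer.cancel()
        self._timer = threading.Timer(self._window, self._disable)
        self._timer.start()

    def _disable(self):
        self._enabled = False

    def frame(self, detected_object):
        """Called per processed frame; forwards hits only while enabled."""
        if self._enabled and detected_object is not None:
            self._on_detect(detected_object)
```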
A lot of work and research has gone into finding the right AI models and how to implement them in python, some with a lot of code optimization... By the way, this is an updated version of the one in my previous post, with better accuracy.