Face Detection and Recognition on the ESP32

Alasdair Allan
Dec 10, 2018


It’s generally acknowledged that the arrival of the Espressif Systems ESP8266 and ESP32 changed the maker community and how we looked at the availability and affordability of computing.

Additionally, the newer ESP32 chip supports security features like secure boot and flash encryption, which are needed to implement real ‘production ready’ Internet of Things smart devices. Interestingly, it now also has support for machine learning at the edge with the ESP-WHO framework.

The ESP32 chip. (📷: Alasdair Allan)

I’ve spent a lot of time over the last year working with small embedded systems doing machine learning for both voice and vision. However, most of the work I’ve been doing has been with Arm processors, or with custom ASIC processors like the Intel Movidius. Despite the popularity of the ESP8266 and ESP32 in the maker community, the Tensilica Xtensa cores they use mean that a lot of the popular off-the-shelf frameworks for machine learning don’t run on the chips.

Which is what makes the ESP-WHO framework rather interesting. Built on top of Espressif’s own IoT Development Framework (ESP-IDF), the official development framework for the chip, this new framework is intended for face detection and recognition.

Current ESP-WHO feature set. (📷: Espressif Systems)

The ESP-WHO framework takes QVGA (320×240) images as input. Face detection is implemented using MTCNN and MobileNet, and will return the position of any faces present in the image. Face recognition, that is, the identification of a particular individual’s face, is implemented with MobileFace.
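To get a feel for what a QVGA input means on a microcontroller, here is a minimal sketch of the arithmetic behind the size of a single uncompressed frame. The pixel formats listed are illustrative assumptions on my part, not settings taken from ESP-WHO, and the helper names are hypothetical.

```python
# Rough memory footprint of one uncompressed QVGA frame.
# NOTE: the pixel formats below are illustrative assumptions,
# not values taken from the ESP-WHO framework itself.

QVGA_WIDTH = 320
QVGA_HEIGHT = 240

# Bytes needed to store one pixel in each (assumed) format.
BYTES_PER_PIXEL = {
    "grayscale": 1,
    "rgb565": 2,
    "rgb888": 3,
}


def frame_bytes(width, height, bytes_per_pixel):
    """Return the size in bytes of one uncompressed frame."""
    return width * height * bytes_per_pixel


def frame_sizes(width=QVGA_WIDTH, height=QVGA_HEIGHT):
    """Return a dict mapping each assumed pixel format to its frame size."""
    return {
        name: frame_bytes(width, height, bpp)
        for name, bpp in BYTES_PER_PIXEL.items()
    }


if __name__ == "__main__":
    for name, size in frame_sizes().items():
        print(f"{name:>9}: {size:>7} bytes ({size / 1024:.0f} KiB)")
```

Under these assumptions a single frame comes to somewhere between 75 KiB and 225 KiB before any model weights or intermediate buffers are counted, which is a sizeable chunk of memory on a chip with around 520 KB of on-chip SRAM.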

If you want to have a play around with ESP-WHO, it is available on GitHub. You’ll need an ESP32-based development board with sufficient available GPIO pins and more than 4 MB of external SPI RAM and, at least at the moment, an OV2640-based camera module.