Deep Learning at the Edge on an Arm Cortex-Powered Camera Board

Alasdair Allan
4 min read · Jul 3, 2018

It’s no secret that I’m an advocate of edge-based computing, and after a number of years where cloud computing has definitely been in the ascendancy, the swing back towards the edge is now well underway. It is being driven not by the Internet of Things, as you might perhaps expect, but by the movement of machine learning out of the cloud.

Until recently, most of the examples we’ve seen, such as the Neural Compute Stick or Google’s AIY Projects kits, were based around custom silicon like Intel’s Movidius chip. However, Arm recently and quietly released its CMSIS-NN library, a neural network library optimised for Cortex-M-based microcontrollers.

Machine learning development is done in two stages. An algorithm is initially trained on a large set of sample data on a fast, powerful machine or cluster, then the trained network is deployed into an application that needs to interpret real data. This deployment stage, or “inference” stage, is where edge computing is really useful.

The CMSIS-NN library means it’s now much easier to deploy trained networks onto much cheaper microcontrollers. This is exactly what OpenMV have done with their OpenMV Cam, which is built around a Cortex-M7 processor.

Smile Detection Demo with OpenMV Camera. (📷: OpenMV)

The OpenMV Cam is a small, low-powered, microcontroller-based camera board. Programmed in MicroPython, it’s well suited for machine vision applications, and part of a generation of accessible boards now appearing on the market. Based around the STM32F765VI Arm Cortex-M7 processor running at 216 MHz, it has 512 KB of RAM and 2 MB of flash memory. It features a micro SD card socket for local storage, and both full-speed USB (12 Mb/s) and an SPI bus (54 Mb/s) for streaming images. The board also has a 12-bit ADC and a 12-bit DAC.

The OpenMV Cam. (📷: OpenMV)

The image sensor on the board is an OV7725, capable of taking 640×480 8-bit grayscale images or 640×480 16-bit RGB565 images, at 60 FPS when the resolution is above 320×240 and 120 FPS when it is below.
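To put those sensor modes next to the link speeds quoted earlier, here is a rough back-of-the-envelope sketch. It assumes the quoted “Mb/s” figures are megabits per second, and it ignores protocol overhead and any image compression, so the frame rates are upper bounds on raw, uncompressed streaming rather than measured numbers. The function and constant names are my own, for illustration only.

```python
# Raw frame sizes for the 640x480 modes, and the most raw frames per
# second each link could carry if it moved nothing but pixel data.
# Assumption: link speeds are in megabits per second, with no overhead.

WIDTH, HEIGHT = 640, 480
BITS_PER_PIXEL = {"grayscale": 8, "rgb565": 16}
LINK_BITS_PER_SECOND = {"usb_full_speed": 12_000_000, "spi": 54_000_000}


def frame_bytes(width, height, bits_per_pixel):
    """Size of one uncompressed frame in bytes."""
    return width * height * bits_per_pixel // 8


def max_raw_fps(width, height, bits_per_pixel, link_bps):
    """Upper bound on uncompressed frames per second over a link."""
    return link_bps / (width * height * bits_per_pixel)


for mode, bpp in BITS_PER_PIXEL.items():
    print(mode, frame_bytes(WIDTH, HEIGHT, bpp), "bytes per frame")
    for link, bps in LINK_BITS_PER_SECOND.items():
        fps = max_raw_fps(WIDTH, HEIGHT, bpp, bps)
        print("  ", link, "at most", round(fps, 2), "raw frames per second")
```

Under these assumptions a raw 640×480 grayscale frame is 307,200 bytes, which full-speed USB could move fewer than five times a second. That gap between what the sensor can capture and what the links can carry is one reason to interpret frames on the board rather than ship them elsewhere.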

The sensor is mounted behind a 2.8mm lens using a standard M12 lens mount, also sometimes known as an “S-mount,” commonly used for CCTV and cheap webcams. That means if you want to use a different or more specialised lens with the board, you can swap out the default lens and attach your own.

We’ve seen trained networks on edge devices before. I’ve talked about them extensively here on the Hackster blog, and out in meatspace at O’Reilly’s AI and Strata conferences, as well as at Crowd Supply’s Teardown conference.

Face recognition at Crowd Supply’s Teardown Conference. (📷: Drew Fustini)

However, the OpenMV walkthrough of how to train a model in Caffe, then quantize it for use with the CMSIS-NN library, and deploy it to low-powered hardware, is the first time I’ve seen the entire process wrapped up with a bow. If you’re interested in running trained networks nearer your data, this is a good place to start.
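If the quantization step sounds opaque, the general idea is to squeeze floating-point weights into small integers so they fit in a microcontroller’s memory and can be processed with integer arithmetic. What follows is a simplified sketch of 8-bit fixed-point quantization with a power-of-two scale. It is not the OpenMV or Arm tooling, and the function names and example weights are made up for illustration.

```python
import math

# Simplified 8-bit fixed-point quantization with a power-of-two scale.
# Illustrative only: the helper names and the weights are hypothetical.


def choose_frac_bits(values, total_bits=8):
    """Pick how many fractional bits to keep so the largest value fits."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return total_bits - 1
    int_bits = math.ceil(math.log2(max_abs))  # bits for the integer part
    return (total_bits - 1) - int_bits        # one bit is kept for the sign


def quantize(values, frac_bits):
    """Scale, round, and saturate each value into the int8 range."""
    scale = 2 ** frac_bits
    return [max(-128, min(127, round(v * scale))) for v in values]


def dequantize(quantized, frac_bits):
    """Map the integers back to approximate floating-point values."""
    scale = 2 ** frac_bits
    return [q / scale for q in quantized]


weights = [0.42, -0.9, 0.05, 0.73]
frac_bits = choose_frac_bits(weights)
q_weights = quantize(weights, frac_bits)
print(frac_bits, q_weights)
print(dequantize(q_weights, frac_bits))
```

Each weight now takes one byte instead of four, at the cost of a small rounding error. Whether that error matters depends on the network, which is why a real workflow checks accuracy again after quantizing.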

Re-loadable CNN on the OpenMV Cam M7/H7 with CMSIS-NN. (📷: OpenMV)

The ability to run trained networks nearer the data, without the cloud support that seems necessary to almost every task these days, and in some cases without even a network connection, could help reduce barriers to developing, tuning, and deploying machine learning applications. It could potentially help make “smart objects” actually smart, rather than just network-connected clients for machine learning algorithms running in remote data centres. It could, in fact, be the start of a sea change in how we think about machine learning and how the Internet of Things might be built. Because now there is at least the potential to put the smarts on the smart device, instead of in the cloud.

The recent scandals and hearings around the misuse of data harvested from social networks have surfaced long-standing problems around data privacy and misuse, while the GDPR in Europe has tightened restrictions around data sharing. Yet the new generation of embedded devices, and the arrival of the IoT, may cause the demise of large-scale data harvesting entirely. In its place, smart devices will allow us to process data at the edge, making use of machine learning to interpret the most flexible sensor we have, the camera.

Interpreting camera data in real time, and abstracting it to signal rather than imagery, will allow us to extract insights from the data without storing potentially privacy- and GDPR-infringing data. While social media data feeds provide ‘views,’ lots of signal, they provide few insights. Processing imagery using machine learning models at the edge, on embedded devices that potentially have no network connection at all, will allow us to feed back into the environment in real time, closing the loop without the large-scale data harvesting that has become so prevalent.
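As a toy illustration of what “signal rather than imagery” could look like in code, the sketch below reduces each frame to a single yes-or-no answer and then lets the frame go. The `detect_smile` function is a hypothetical stand-in for whatever model is running on the device. Here it just averages pixel brightness so that the example runs anywhere, and the frames are made-up lists of pixel values.

```python
# Sketch: turn frames into a signal and never keep the imagery.
# detect_smile() is a hypothetical placeholder, not a real classifier.


def detect_smile(frame):
    """Placeholder score in [0, 1]; here simply the mean pixel brightness."""
    return sum(frame) / (255 * len(frame))


def frames_to_signal(frames, threshold=0.5):
    """Yield one boolean per frame; the frame is not stored or sent on."""
    for frame in frames:
        yield detect_smile(frame) >= threshold


# Made-up 4-pixel "frames" standing in for camera data.
frames = [
    [0, 0, 0, 0],
    [255, 255, 255, 255],
    [200, 100, 250, 10],
]
signal = list(frames_to_signal(frames))
print(signal)
```

The only thing that leaves the loop is a stream of booleans. Anything downstream, whether that is a light, a counter, or a log, acts on the answer and never sees a face.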

In the end, we never wanted the data anyway. We wanted the actions that the data could generate. Insights into our environment are more useful than write-only data collected and stored for a rainy day.