The Big Benchmarking Roundup

Getting started with machine learning and edge computing

Alasdair Allan


Over the last six months I’ve been looking at machine learning on the edge, publishing a series of articles trying to answer some of the questions that people have been asking about inferencing on embedded hardware.

But, after a half year of posts, talks, and videos, it’s all a bit of a sprawling mess, and the overall picture of what’s really happening is rather confusing.

So here’s a great big benchmarking roundup!

Inferencing time in milliseconds for the MobileNet v1 SSD 0.75 depth model (left hand bars) and the MobileNet v2 SSD model (right hand bars), both trained using the Common Objects in Context (COCO) dataset with an input size of 300×300. Stand alone platforms are shown in green, while the (single) bars for the Xnor AI2GO platform are timings for their proprietary binary weight model and are shown in blue. All other measurements using accelerator hardware attached to the Raspberry Pi 3, Model B+, are in yellow, while measurements on the Raspberry Pi 4, Model B, are in red.

Although some people have dismissed the idea of benchmarks for inferencing as irrelevant because “…it’s training times that matter,” that doesn’t really seem justified. If you take an academic approach to machine learning you will often train thousands of different models to find one that is ‘paper worthy,’ but this does not seem to be how things work out in the world.

Instead, for embedded systems, training is a sunk cost, with the final model being used thousands, perhaps even millions, of times depending on how many systems make use of it. Those models will also tend to hang around, potentially for decades if you’re talking about hardware that’s going into factories, homes, or public spaces. So in the long term it’s how fast those models run on the embedded hardware that’s important, not how long they took to train.
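To put rough numbers on that argument, here is a minimal back-of-the-envelope sketch. The function name and every figure in it are assumptions made up purely for illustration; none of them are measurements from the benchmarks. It also treats an hour of training and an hour of inferencing as comparable, which is crude since they run on very different hardware.

```python
def inference_share(training_hours, inference_ms, runs_per_device, devices):
    """Fraction of total compute time spent on inference rather than training."""
    # Convert per-inference milliseconds into total hours across the whole fleet.
    inference_hours = inference_ms / 1000 / 3600 * runs_per_device * devices
    return inference_hours / (training_hours + inference_hours)


# Hypothetical figures, chosen only for illustration (not measured values):
# 100 hours of training, 50 ms per inference, and one million inferences
# on each of 1,000 deployed devices.
share = inference_share(
    training_hours=100,
    inference_ms=50,
    runs_per_device=1_000_000,
    devices=1_000,
)
print(f"Inference share of total compute time: {share:.1%}")
```

Under these assumed figures almost all of the time is spent inferencing, and the training time is paid only once. Changing the numbers shifts the balance, but the more devices and the longer the deployment, the more the per-inference time dominates.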

Discussion of the methodology behind the benchmarks can be found in the original post in the series. The latest results are shown below, and are also discussed in both the first and the final post in the series.

Final benchmarking results in milliseconds for the MobileNet v1 SSD 0.75 depth model and the MobileNet v2 SSD model, both trained using the Common Objects in Context (COCO) dataset with an input size of 300×300, alongside the Xnor AI2GO platform and their proprietary binary weight model.

While inferencing speed is probably our most important measure, these are devices intended to do machine learning at the edge. That means we also need to pay attention to environmental factors.