Benchmarking the Intel Neural Compute Stick on the New Raspberry Pi 4, Model B

The last in a series of articles on machine learning and edge computing comparing Google, Intel, and NVIDIA accelerator hardware along with the Raspberry Pi 3 and 4

When the Raspberry Pi 4 was launched, I sat down to update the benchmarks I’d been putting together for the new generation of accelerator hardware intended for machine learning at the edge. Unfortunately at the time, the Intel OpenVINO framework did not yet work under Raspbian Buster which meant I was unable to carry out benchmarking with the Intel hardware.

This changed a couple of weeks ago with the release of OpenVINO 2019.R2, so it was time to take another look at machine learning on the Raspberry Pi 4.

Headline Results From Benchmarking

Connecting the Intel Neural Compute Stick 2 to the USB 3 bus of the new Raspberry Pi 4, we do not see the dramatic ×3 increase in inferencing speed between our original results and the new results that we saw with the Coral USB Accelerator from Google.

Instead, for both the Intel Neural Compute Stick 2 and the older Movidius Neural Compute Stick, we see only a moderate increase in inferencing speed when the accelerator hardware is connected via the USB 3, rather than USB 2, bus of the Raspberry Pi: an increase of only 20 to 30 percent.

However, unlike the Coral USB Accelerator, where inferencing actually slowed by a factor of ×2 when connected via the USB 2 rather than the USB 3 bus, we saw no statistically significant difference in inferencing times when the Neural Compute Stick was connected to the USB 2 bus of the Raspberry Pi 4.

These results seem to suggest that, unlike the Coral USB Accelerator, the Intel Movidius-based hardware was not significantly throttled when used on the older Raspberry Pi hardware and restricted to USB 2.

The overall speed increase when using the hardware with the Raspberry Pi 4’s USB 3 bus was therefore disappointingly small, especially when compared with the Coral USB Accelerator from Google.

Part I — Benchmarking

A More Detailed Analysis of the Results

Our original benchmarks were done using both TensorFlow and TensorFlow Lite on a Raspberry Pi 3, Model B+, and these were rerun using the new Raspberry Pi 4, Model B, with 4GB of RAM. Inferencing was carried out with the MobileNet v2 SSD and MobileNet v1 0.75 depth SSD models, both models trained on the Common Objects in Context (COCO) dataset. The AI2GO platform was benchmarked using their ‘medium’ Kitchen Object Detector model. This model is a binary weight network, and while the nature of the training dataset is not known, some technical papers around the model are available.

A single 3888×2916 pixel test image was used containing two recognisable objects in the frame, a banana🍌 and an apple🍎. The image was resized down to 300×300 pixels before presenting it to each model, and the model was run 10,000 times before an average inferencing time was taken.

Overall, comparing our new timings with our previous results, little has changed. The Coral Edge TPU-based hardware keeps its place as ‘best in class’ while, in the absence of a large speed up from the Intel hardware, the Raspberry Pi 4 running TensorFlow Lite remains competitive with both the NVIDIA Jetson Nano and the Intel Movidius hardware we tested here.

However, probably the biggest takeaway for those wishing to use the new Raspberry Pi 4 for inferencing is the performance gain seen with the Coral USB Accelerator. The addition of USB 3.0 to the Raspberry Pi 4 means we see an approximate ×3 increase in inferencing speed over our original results.

We see no corresponding speed increase when using the Intel hardware.


Overall, this is a very disappointing result for the Intel Movidius-based hardware. Having expected speed ups similar to those seen with the Coral USB Accelerator, we saw only a 20 to 30 percent increase in inferencing speed when the hardware was attached to the Raspberry Pi 4’s USB 3 bus.

Part II — Methodology

Preparing the Intel Neural Compute Stick 2 and Raspberry Pi

We last looked at the Intel Neural Compute Stick 2 back in June, just after the launch of the new Raspberry Pi 4, Model B. At the time the OpenVINO framework did not yet work under Raspbian Buster and Python 3.7. However, that changed recently with the release of OpenVINO 2019.R2.

Installation of the OpenVINO framework has not changed significantly from our original hands-on with the hardware back in April, although the official installation instructions for Raspbian have been updated and can now be followed without modification.

Go ahead and grab the new release and install it,
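This boils down to downloading the Raspbian runtime archive and unpacking it into `/opt/intel/openvino`. The archive name and URL below are illustrative of the 2019.R2 release; check Intel’s download page for the current version string.

```shell
# Grab the OpenVINO 2019.R2 runtime package for Raspbian
# (archive name is illustrative; check the download page for the current release)
cd ~/Downloads
wget https://download.01.org/opencv/2019/openvinotoolkit/R2/l_openvino_toolkit_runtime_raspbian_p_2019.2.242.tgz

# Unpack the toolkit into /opt/intel/openvino
sudo mkdir -p /opt/intel/openvino
sudo tar -xf l_openvino_toolkit_runtime_raspbian_p_2019.2.242.tgz --strip 1 -C /opt/intel/openvino
```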

before appending the setup script to the end of your .bashrc file.
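Assuming the toolkit was unpacked into `/opt/intel/openvino`, appending the setup script amounts to:

```shell
# Append the OpenVINO environment setup to .bashrc so every new shell picks it up
echo "source /opt/intel/openvino/bin/setupvars.sh" >> ~/.bashrc

# Apply the environment to the current shell as well
source /opt/intel/openvino/bin/setupvars.sh
```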

Then run the rules script to install new udev rules so that your Raspberry Pi can recognise the Neural Compute Stick when you plug it in.
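The rules script ships with the toolkit itself; the path below assumes the install location used above.

```shell
# Make sure the current user is in the users group
sudo usermod -a -G users "$(whoami)"

# Install the udev rules bundled with the OpenVINO toolkit so the
# Neural Compute Stick is recognised when plugged in
sh /opt/intel/openvino/install_dependencies/install_NCS_udev_rules.sh
```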

You should go ahead and log out of the Raspberry Pi, and back in again, so that all these changes can take effect. Then plug in the Neural Compute Stick.

Checking dmesg, you should see something a lot like this at the bottom,
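The exact messages vary with your kernel version, but the USB enumeration lines should mention the Myriad X device. The output below is illustrative rather than a verbatim log.

```shell
dmesg | tail
# Illustrative output — expect the stick (USB ID 03e7:2485) to enumerate with
# lines along these lines:
#   usb 2-1: New USB device found, idVendor=03e7, idProduct=2485
#   usb 2-1: Product: Movidius MyriadX
```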

if you don’t see similar messages then the stick hasn’t been recognised. Try rebooting your Raspberry Pi and check again,

and you should see that the stick has been detected.

The Benchmarking Code

The code from our previous benchmarks was reused unchanged.
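For reference, the timing harness reduces to a loop like the one below. Here `infer` is a stand-in for the actual inference call (in our benchmarks, a model invocation through the relevant framework); it is a hypothetical placeholder so the sketch stays self-contained.

```python
import time

def benchmark(infer, runs=10_000):
    """Time the infer() callable over `runs` iterations and return the
    mean inferencing time in milliseconds."""
    infer()  # warm-up pass so one-off setup costs don't skew the average
    start = time.monotonic()
    for _ in range(runs):
        infer()
    elapsed = time.monotonic() - start
    return (elapsed / runs) * 1000.0

# Example with a dummy workload standing in for a real inference call
mean_ms = benchmark(lambda: sum(range(100)), runs=1000)
print(f"Mean inference time: {mean_ms:.3f} ms")
```

In the real benchmarks the same image is presented on every iteration, so the loop measures inferencing alone rather than image decoding or resizing.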

Further code can be found in the official Intel Movidius GitHub repo.

In Closing

Comparing these platforms on an even footing continues to be difficult. But despite the disappointing performance of the Intel hardware, it is clear that the new Raspberry Pi 4 is a solid platform for machine learning inferencing at the edge. The Coral USB Accelerator retains its ‘best in class’ result.

Links to Getting Started Guides

If you’re interested in getting started with any of the accelerator hardware I used during my benchmarks, I’ve put together getting started guides for the Google, Intel, and NVIDIA hardware I looked at during the analysis.

Links to Previous Benchmarks

This benchmarking article was the last in a series looking at accelerator hardware, and TensorFlow on the Raspberry Pi 3 and 4. If you’re interested in the previous benchmarks, details are below.

Scientist, Author, Hacker, Maker, and Journalist.