Benchmarking the Intel Neural Compute Stick on the New Raspberry Pi 4, Model B

The last in a series of articles on machine learning and edge computing comparing Google, Intel, and NVIDIA accelerator hardware along with the Raspberry Pi 3 and 4

Alasdair Allan
7 min read · Aug 15, 2019

When the Raspberry Pi 4 was launched, I sat down to update the benchmarks I’d been putting together for the new generation of accelerator hardware intended for machine learning at the edge. Unfortunately, at the time the Intel OpenVINO framework did not yet work under Raspbian Buster, which meant I was unable to carry out benchmarking with the Intel hardware.

The original Movidius Neural Compute Stick (top) and newer Intel Neural Compute Stick 2 (bottom).

This changed a couple of weeks ago with the release of OpenVINO 2019.R2, so it was time to take another look at machine learning on the Raspberry Pi 4.

Headline Results From Benchmarking

Connecting the Intel Neural Compute Stick 2 to the USB 3 bus of the new Raspberry Pi 4, we do not see the dramatic ×3 increase in inferencing speed between our original and new results that we saw with the Coral USB Accelerator from Google.

Instead, for both the Intel Neural Compute Stick 2 and the older Movidius Neural Compute Stick, we see only a moderate increase in inferencing speed when the accelerator hardware is connected via the USB 3, rather than USB 2, bus of the Raspberry Pi: an increase of between 20 and 30 percent.

Inferencing time in milli-seconds for the MobileNet v1 SSD 0.75 depth model trained using the Common Objects in Context (COCO) dataset with an input size of 300×300. Measurements on the Raspberry Pi 3, Model B+, are in yellow, measurements on the Raspberry Pi 4, Model B, in red. The left-hand red bar for the Raspberry Pi 4 is with the Intel Movidius hardware connected to the USB 2 bus, while the right-hand red bar is with it connected to the USB 3 bus.

However, unlike the Coral USB Accelerator, where we saw inferencing slow, with inferencing times actually increasing by a factor of ×2 when connected via the USB 2 rather than the USB 3 bus, we saw no statistically significant difference in the times recorded for inferencing when the Neural Compute Stick was connected to the USB 2 bus of the Raspberry Pi 4.

Benchmarking results in milli-seconds for MobileNet v1 SSD 0.75 depth model and the MobileNet v2 SSD model, both trained using the Common Objects in Context (COCO) dataset with an input size of 300×300, for the Raspberry Pi 3, Model B+ (left), and the new Raspberry Pi 4, Model B (right).

These results seem to suggest that, unlike the Coral USB Accelerator, the Intel Movidius-based hardware was not significantly throttled when used on the older Raspberry Pi hardware and restricted to USB 2.
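
One way to judge whether two sets of timings differ by more than their run-to-run scatter is Welch's t statistic. The article does not say how significance was assessed, so what follows is only an illustrative sketch; the function name `welch_t` is introduced here, and the sample values are hypothetical placeholders rather than measurements from these benchmarks.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic for two independent samples of timings."""
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    # Sample variances; each sample needs at least two values.
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    if standard_error == 0:
        raise ValueError("both samples have zero variance")
    return (mean_a - mean_b) / standard_error


if __name__ == "__main__":
    # Hypothetical placeholder timings in milli-seconds, NOT benchmark results.
    first_bus = [10.1, 9.9, 10.0, 10.2, 9.8]
    second_bus = [10.0, 10.1, 9.9, 10.1, 9.9]
    print(f"t = {welch_t(first_bus, second_bus):.3f}")
```

As a rough rule of thumb for reasonably large samples, a value of |t| well below 2 is consistent with no significant difference between the two means.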

The overall speed increase when using the hardware with the Raspberry Pi 4’s USB 3 bus was therefore disappointingly small, especially when compared with the Coral USB Accelerator from Google.

Part I — Benchmarking

A More Detailed Analysis of the Results

Our original benchmarks were done using both TensorFlow and TensorFlow Lite on a Raspberry Pi 3, Model B+, and these were rerun using the new Raspberry Pi 4, Model B, with 4GB of RAM. Inferencing was carried out with the MobileNet v2 SSD and MobileNet v1 0.75 depth SSD models, both models trained on the Common Objects in Context (COCO) dataset. The Xnor.ai AI2GO platform was benchmarked using their ‘medium’ Kitchen Object Detector model. This model is a binary weight network, and while the nature of the training dataset is not known, some technical papers around the model are available.

A single 3888×2916 pixel test image was used containing two recognisable objects in the frame, a banana🍌 and an apple🍎. The image was resized down to 300×300 pixels before presenting it to each model, and the model was run 10,000 times before an average inferencing time was taken.
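
The procedure described above amounts to a simple timing loop. This is a minimal sketch rather than the benchmarking code itself: `run_inference` is a hypothetical stand-in for whatever call runs the model on the already resized 300×300 image, and `mean_inference_ms` is a name introduced here for illustration.

```python
import time


def mean_inference_ms(run_inference, image, runs=10_000):
    """Call run_inference(image) `runs` times; return the mean time per call in ms."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    start = time.perf_counter()
    for _ in range(runs):
        run_inference(image)
    elapsed_seconds = time.perf_counter() - start
    return elapsed_seconds * 1000.0 / runs


def fake_inference(image):
    # Hypothetical placeholder: a real benchmark would invoke the model here.
    return len(image)


if __name__ == "__main__":
    dummy_image = bytes(300 * 300 * 3)  # stands in for a 300x300 RGB image
    average = mean_inference_ms(fake_inference, dummy_image, runs=1000)
    print(f"{average:.4f} ms per call")
```

Timing the whole loop and dividing by the number of runs keeps the overhead of the clock calls out of the individual measurements, at the cost of not recording the spread between runs.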

Benchmarking results in milli-seconds for MobileNet v1 SSD 0.75 depth model and the MobileNet v2 SSD model, both trained using the Common Objects in Context (COCO) dataset with an input size of 300×300.

Overall, comparing our new timings with our previous results, little has changed. The Coral Edge TPU-based hardware keeps its place as ‘best in class’ while, without any evidence of a large speed up from the Intel hardware, the Raspberry Pi 4 running TensorFlow Lite remains competitive with both the NVIDIA Jetson Nano and the Intel Movidius hardware we tested here.

Inferencing time in milli-seconds for the MobileNet v1 SSD 0.75 depth model (left hand bars) and the MobileNet v2 SSD model (right hand bars), trained using the Common Objects in Context (COCO) dataset with an input size of 300×300. Stand alone platforms are shown in green, while the (single) bars for the Xnor AI2GO platform are timings for their proprietary binary weight model and are shown in blue. All other measurements on the Raspberry Pi 3, Model B+, are in yellow, while measurements on the Raspberry Pi 4, Model B, are in red.

However, probably the biggest takeaway for those wishing to use the new Raspberry Pi 4 for inferencing is the performance gain seen with the Coral USB Accelerator. The addition of USB 3.0 to the Raspberry Pi 4 means we see an approximate ×3 increase in inferencing speed over our original results.

Benchmarking results in milli-seconds for the Coral USB Accelerator using the MobileNet v1 SSD 0.75 depth model and the MobileNet v2 SSD model, both trained using the Common Objects in Context (COCO) dataset for the Raspberry Pi 3, Model B+ (left), and the Raspberry Pi 4, Model B over USB 3.0 (middle) and USB 2 (right).

We see no corresponding speed increase when using the Intel hardware.

Summary

Overall, a very disappointing result for the Intel Movidius-based hardware. Expecting speed ups similar to those seen with the Coral USB Accelerator, we saw only an increase of between 20 and 30 percent in inferencing speed when the hardware was attached to the Raspberry Pi 4’s USB 3 bus.
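
For readers converting between inferencing times and the speed increases quoted here, the arithmetic can be sketched as follows. It assumes that "speed" means inferences per unit time, so that speed is the reciprocal of time per inference; the function name and the example timings are hypothetical placeholders, not values from the benchmarks.

```python
def speed_increase_percent(old_ms, new_ms):
    """Percentage increase in inferencing speed when the time per
    inference falls from old_ms to new_ms (speed taken as 1 / time)."""
    if old_ms <= 0 or new_ms <= 0:
        raise ValueError("timings must be positive")
    return (old_ms / new_ms - 1.0) * 100.0


if __name__ == "__main__":
    # Placeholder timings in milli-seconds, chosen only to show the arithmetic.
    print(speed_increase_percent(120.0, 100.0))
    print(speed_increase_percent(300.0, 100.0))
```

On this definition a ×3 increase in speed is a 200 percent increase, while a 20 percent increase in speed shortens the time per inference by only about 17 percent.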

Part II — Methodology

Preparing the Intel Neural Compute Stick 2 and Raspberry Pi

We last looked at the Intel Neural Compute Stick 2 back in June, just after the launch of the new Raspberry Pi 4, Model B. At the time the OpenVINO framework did not yet work under Raspbian Buster and Python 3.7. However, that changed recently with the release of OpenVINO 2019.R2.

Getting Started with the Intel Neural Compute Stick 2 on Raspbian. (📹: Intel Movidius)

Installation of the OpenVINO framework has not changed significantly from our original hands-on with the hardware back in April, although the official installation instructions for Raspbian have been updated and can now be followed without modification.

Go ahead and grab the new release and install it,

$ wget https://download.01.org/opencv/2019/openvinotoolkit/R2/l_openvino_toolkit_runtime_raspbian_p_2019.2.242.tgz
$ tar -zxvf l_openvino_toolkit_runtime_raspbian_p_2019.2.242.tgz
$ mv l_openvino_toolkit_runtime_raspbian_p_2019.2.242 openvino
$ source /home/pi/openvino/bin/setupvars.sh
[setupvars.sh] OpenVINO environment initialized

before appending the setup script to the end of your .bashrc file.

$ echo "source /home/pi/openvino/bin/setupvars.sh" >> ~/.bashrc

Then run the rules script to install new udev rules so that your Raspberry Pi can recognise the Neural Compute Stick when you plug it in.

$ sudo usermod -a -G users "$(whoami)"
$ sh openvino/install_dependencies/install_NCS_udev_rules.sh
Updating udev rules...
Udev rules have been successfully installed.
$

You should go ahead and log out of the Raspberry Pi, and back in again, so that all these changes can take effect. Then plug in the Neural Compute Stick.

Checking dmesg you should see something a lot like this at the bottom,

[ 1491.382860] usb 1-1.2: new high-speed USB device number 5 using dwc_otg
[ 1491.513491] usb 1-1.2: New USB device found, idVendor=03e7, idProduct=2485
[ 1491.513504] usb 1-1.2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 1491.513513] usb 1-1.2: Product: Movidius MyriadX
[ 1491.513522] usb 1-1.2: Manufacturer: Movidius Ltd.
[ 1491.513530] usb 1-1.2: SerialNumber: 03e72485

If you don’t see similar messages then the stick hasn’t been recognised. Try rebooting your Raspberry Pi and check again,

$ dmesg | grep Movidius
[ 2.062235] usb 1-1.2: Product: Movidius MyriadX
[ 2.062244] usb 1-1.2: Manufacturer: Movidius Ltd.
$

and you should see that the stick has been detected.
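
Since the bus matters so much for these benchmarks, it can be worth confirming which kind of link the stick actually enumerated on. In the dmesg output shown earlier the kernel reports a "high-speed" device, which is a USB 2 link; on a USB 3 port the kernel normally reports "SuperSpeed" instead. A minimal sketch for pulling this out of saved dmesg text follows. The function name `usb_link_speeds` is introduced here for illustration, and the USB 3 wording is an assumption about typical kernel messages rather than something shown in the logs above.

```python
import re

# Matches kernel messages such as
# "usb 1-1.2: new high-speed USB device number 5 using dwc_otg".
# The optional " Gen N" part is an assumption about how newer kernels
# word their USB 3 messages.
_NEW_DEVICE = re.compile(
    r"usb (\S+): new (\S+?)(?: Gen \d+)? USB device number \d+"
)


def usb_link_speeds(dmesg_text):
    """Map each USB port path to the most recent link speed reported for it."""
    speeds = {}
    for line in dmesg_text.splitlines():
        match = _NEW_DEVICE.search(line)
        if match:
            speeds[match.group(1)] = match.group(2)
    return speeds


if __name__ == "__main__":
    sample = (
        "[ 1491.382860] usb 1-1.2: new high-speed USB device number 5 using dwc_otg\n"
        "[ 1491.513491] usb 1-1.2: New USB device found, idVendor=03e7, idProduct=2485\n"
    )
    print(usb_link_speeds(sample))
```

Comparing the port path returned here with the one on the "Product: Movidius MyriadX" line shows whether the stick is on a USB 2 or USB 3 link.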

The Benchmarking Code

The code from our previous benchmarks was reused unchanged.

Further code can be found in the official Intel Movidius GitHub repo.

In Closing

Comparing these platforms on an even footing continues to be difficult. But despite the disappointing performance of the Intel hardware, it is clear that the new Raspberry Pi 4 is a solid platform for machine learning inferencing at the edge. The Coral USB Accelerator retains its ‘best in class’ result.

Links to Getting Started Guides

If you’re interested in getting started with any of the accelerator hardware I used during my benchmarks, I’ve put together getting started guides for the Google, Intel, and NVIDIA hardware I looked at during the analysis.

Links to Previous Benchmarks

This benchmarking article was the last in a series looking at accelerator hardware, and TensorFlow on the Raspberry Pi 3 and 4. If you’re interested in the previous benchmarks, details are below.

Written by Alasdair Allan

Scientist, Author, Hacker, Maker, and Journalist.
