Hands on with the Coral USB Accelerator
I also go “Hands on with the Coral Dev Board” in a companion article.
At last year’s Google Next conference in San Francisco Google announced two new upcoming hardware products both built around Google’s Edge TPU, their purpose-built ASIC designed to run machine learning inferencing at the edge.
Both a development board and a USB Accelerator, with a form factor along similar lines to Intel’s Neural Compute Stick, were announced, allowing users to run inferences of pre-trained TensorFlow Lite models locally on their own hardware.
The hardware has now launched into public beta under the name Coral and, ahead of the launch, I managed to get my hands on some early access hardware. I’ve already gotten hands on with the Coral Dev Board, but I also wanted to look at the new Coral USB Accelerator.
Opening the box
Much like the Coral Dev Board, the USB Accelerator comes in a small, rather unprepossessing box. Mine was pre-production hardware, and the box came stickered with a notice saying “Marking and packaging not final,” which may well explain the AIY Projects branding on my USB Accelerator. Possibly the rebrand to Coral was only a fairly recent decision?
Inside the box is a USB stick and a short USB-C to USB-A cable intended to connect it to your computer. At 65mm × 30mm the USB Accelerator has roughly the same footprint as the Intel Neural Compute Stick; however, with a depth of just 8mm the accelerator is much more slimline.
The size of the USB Accelerator stick doesn’t seem all that important until you realise that the Intel stick was so large it tended to block nearby ports, or with some computers, be hard to use at all.
Gathering the supplies
Unlike the Coral Dev Board, which needs a lot of setup work done before you can get started, the USB Accelerator is designed to be more or less plug and go. All you need is a Linux computer with a free USB port.
The computer needs to be running Debian 6.0 or higher, or any derivative such as Ubuntu 10.0+, and the accelerator runs fine when connected to a computer with either an x86_64 or an ARM64 architecture.
Fortunately, at least for those of us working in ‘all Apple’ shops, that means you can use the USB Accelerator with a Raspberry Pi board and everything should ‘just work.’ So in addition to what comes in the box, the absolute minimum you’re going to need is a Raspberry Pi board, a micro USB to USB-A cable, a 2.5A power supply, and a micro SD Card.
Setting up your computer
The first thing you’ll need to do is to set up your Linux computer. If you’re using a Raspberry Pi for this it’s probably a good idea to install a fresh version of the operating system and work from a clean slate. Go ahead and download the latest release of Raspbian Lite and set up your Raspberry Pi.
Unless you’re using wired networking, or have a display and keyboard attached to the Raspberry Pi, at a minimum you’ll need to put the Raspberry Pi on to your wireless network, and enable SSH.
Once you’ve set up your Raspberry Pi go ahead and power it on, and then open up a Terminal window on your laptop and SSH into the Raspberry Pi.
% ssh firstname.lastname@example.org
Once you’ve logged in you might want to change the hostname to something less generic, to let you tell it apart from all the other Raspberry Pi boards on your network, I chose
Powering your Raspberry Pi
If like me you’ve chosen to connect the USB Accelerator to a Raspberry Pi you’re going to need a good power supply. The more modern Raspberry Pi boards, especially the latest model, the Raspberry Pi 3 Model B+, need a USB power supply that will provide +5V consistently at 2 to 2.5A. Depending on what peripherals the board needs to support that can be a problem.
Typically the Raspberry Pi uses between 500 and 1,000mA depending on the current state of the board. However attaching a monitor to the HDMI port uses 50mA, adding a camera module requires 250mA, and keyboards and mice can take as little as 100mA or well over 1,000mA depending on the model. The USB Accelerator itself requires at least 500mA.
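To make that arithmetic concrete, here is a rough current budget using the figures quoted above. The particular mix of peripherals is an assumption for illustration, as is taking the Raspberry Pi at the top of its quoted range; substitute your own numbers.

```python
# Rough current budget in mA, using the figures quoted in the text.
# The choice of peripherals below is an illustrative assumption.
SUPPLY_MA = 2500  # a 2.5A supply

draw_ma = {
    "Raspberry Pi (top of quoted range)": 1000,
    "HDMI monitor": 50,
    "camera module": 250,
    "keyboard and mouse (low-power models)": 100,
    "USB Accelerator (quoted minimum)": 500,
}

total_ma = sum(draw_ma.values())
headroom_ma = SUPPLY_MA - total_ma

for name, ma in draw_ma.items():
    print(f"{name:40s} {ma:5d} mA")
print(f"{'total':40s} {total_ma:5d} mA")
print(f"{'headroom on a 2.5A supply':40s} {headroom_ma:5d} mA")
```

Swap the low-power keyboard and mouse for a set drawing 1,000mA and the same sum comes to 2,800mA, which is more than a 2.5A supply can deliver.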
However I’ve found that most USB chargers will tend to under supply the Raspberry Pi, and as a result the board will register a low power condition and start to throttle the CPU speed. If things get worse the board may suffer brown outs and start to randomly, or repeatedly, reboot.
If you have a monitor attached to your Raspberry Pi board this is the point where you’ll see a yellow lightning bolt in the top right hand corner of your screen. However if you’re running your Raspberry Pi headless, you can still check from the command line using
$ vcgencmd get_throttled
However the output is in the form of binary flags, and therefore more than somewhat impenetrable. Fortunately it’s not that hard to put together a script to parse the output of the command and get something that’s a bit more human readable.
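As a minimal sketch of such a script, the code below decodes the value reported by the command. It assumes the output has the form throttled=0x50005 and uses the bit meanings given in the Raspberry Pi documentation for get_throttled; the function names are my own, and the sample value is made up for illustration.

```python
import subprocess

# Bit positions and meanings for the value reported by
# `vcgencmd get_throttled`, per the Raspberry Pi documentation.
FLAGS = {
    0: "under-voltage detected",
    1: "ARM frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}


def decode_throttled(output):
    """Turn a string such as 'throttled=0x50005' into readable messages."""
    value = int(output.strip().split("=")[1], 16)
    return [text for bit, text in FLAGS.items() if value & (1 << bit)]


def read_throttled():
    """Run vcgencmd (only available on a Raspberry Pi) and decode its output."""
    result = subprocess.run(
        ["vcgencmd", "get_throttled"],
        capture_output=True, text=True, check=True,
    )
    return decode_throttled(result.stdout)


# Decode a made-up sample value, so this runs on any machine.
for message in decode_throttled("throttled=0x50005"):
    print(message)
```

On the Raspberry Pi itself you would call read_throttled() instead of passing in a sample string. An empty list means no power or thermal problems have been flagged since boot.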
While the current draw on boot with the USB Accelerator attached is the standard 500mA, when the USB Accelerator is running you can get brown outs if you have a poor power supply, or a cheap poor quality USB cable.
If the script reports that the board is under-volted you should replace your power supply with a more suitable one before proceeding.
Unless you’re really sure, I’d recommend you pick up the official Raspberry Pi USB power supply. It has been designed to consistently provide +5.1V despite rapid fluctuations in current draw. It also has an attached micro USB cable, which means that you don’t accidentally use a poor quality cable, something that can be an issue.
Those fluctuations in demand happen a lot when you’re using peripherals with the Raspberry Pi, and are something that other supplies, designed to provide consistent current for charging cellphones, usually don’t cope with all that well.
Installing the software
You’re ready to install the software needed to support the Edge TPU.
Right now the Coral hardware is still in a “beta” release phase, so it’s likely that software setup instructions will change. However at the moment you can go ahead and download the software development kit using
$ wget http://storage.googleapis.com/cloud-iot-edge-pretrained-models/edgetpu_api.tar.gz
$ tar -xvzf edgetpu_api.tar.gz
and then run the installation script.
$ cd python-tflite-source
$ bash ./install.sh
The script will kick off, but then pause with the following warning.
“During normal operation, the Edge TPU Accelerator may heat up, depending on the computation workloads and operating frequency. Touching the metal part of the device after it has been operating for an extended period of time may lead to discomfort and/or skin burns. As such, when running at the default operating frequency, the device is intended to safely operate at an ambient temperature of 35C or less. Or when running at the maximum operating frequency, it should be operated at an ambient temperature of 25C or less.”
In other words, the stick is going to get hot if it runs for extended periods of time. What Google is warning us about here is that the ambient temperature of the environment you’re in shouldn’t be above 35°C (95°F), or 25°C (77°F) if you choose ‘maximum operating frequency’ at the prompt.
That might seem easily achievable until you think that this is an edge computing device, and could easily be intended to be deployed strapped to a lamp post in downtown Tucson, where average summer temperatures regularly reach 37°C (100°F).
In any case, something to bear in mind if you’re thinking about deployment into the field. If you are expecting higher ambient temperatures you may have to think about additional passive cooling, or perhaps even active cooling, depending on your environment and available power budget. Which isn’t that surprising, especially since a lot of computing deployments already need cooling—and heating—when deployed into the wild.
Once the installation has completed, go ahead and plug in the USB Accelerator using the short USB-C to USB-A cable that accompanied the USB stick in the box. If you’ve already plugged it in, you’ll need to remove it and replug it, as the installation script adds some udev rules that allow software running on the Raspberry Pi to recognise that the Edge TPU hardware is present.
Running your first Machine Learning model
Unlike the Coral Dev Board, which comes with a pretty slick initial demo application that starts a web server with a video stream of freeway traffic with real time inferencing done on the board overlaid on top, your first model will be slightly more modest.
The software kit we downloaded includes the Edge TPU Python module that provides simple APIs that perform image classification, object detection, and weight imprinting — otherwise known as transfer learning — on the Edge TPU.
Let’s take a look at the object detection demonstration code. You can find the demo code in the
This script is designed to perform object recognition on an image. I’ve gone ahead and slightly modified the original version of the demonstration code distributed with the software kit for the USB Accelerator. I’ve added some code to make the boxes drawn around detected objects a bit thicker, so they’re more easily seen, and added labels to each detection box. I’ve also dropped any detected objects with a detection score of less than 0.45.
You can either grab my version of the code from GitHub, or use the version included with the board at
Here I’m running my version of the script, which resides in the
mendel user’s home directory, on an image of some fruit, also in the home directory.
$ python3 ./object_detection.py --model python-tflite-source/edgetpu/test_data/mobilenet_ssd_v2_coco_quant_postprocess_edgetpu.tflite --label python-tflite-source/edgetpu/test_data/coco_labels.txt --input fruit.jpg --output out.jpg
banana score = 0.839844
apple score = 0.5
Saved to out.jpg
You can copy the output image off the board to your laptop using the
scp command. On your laptop type,
$ scp email@example.com:out.jpg .
out.jpg 100% 1249KB 9.9MB/s 00:00
Replace coral with the name of your Raspberry Pi to transfer the file back across to your laptop.
It turns out that for the second day in a row I’m apparently eating a healthy lunch. Although if you’ve read the previous article, close inspection of the fruit will show both the apple and banana are left over from yesterday’s lunch. I promise I at least moved them purposefully around the office with the intention of eating them several times.
However if we now go ahead and turn our detection threshold all the way down we do get a lot more objects detected. However, most of these have very low certainty scores. They aren’t really credible detections.
banana score = 0.839844
apple score = 0.5
book score = 0.210938
apple score = 0.210938
book score = 0.210938
dining table score = 0.160156
dining table score = 0.121094
banana score = 0.121094
apple score = 0.121094
book score = 0.121094
We get multiple detections of both the apple and the banana. Interestingly, at least one of the lower confidence detections of the apple has a much better bounding box than the original, high confidence, detection. We also see low confidence detection of a dining table, which is sort of reasonable given the craft mat the fruit is sitting on. The network also thinks the USB Accelerator looks like a book, which is also sort of reasonable given the shape.
However, I’d run the same network on a similar image when I went hands on with the Coral Dev Board yesterday, and I was kind of surprised by the low confidence detection of both the banana and the apple this time around.
However running the network on the original image was reassuring, as this gave the same result as the previous run with the Dev Board.
banana score = 0.964844
apple score = 0.789062
Reproducibility in science is good.
My guess is that the detection confidence of both pieces of fruit was really harmed by the banana being tucked slightly behind the apple in today’s image. Which isn’t really that surprising, as the shape of the apple is affected.
You should also bear in mind that the demonstration models included with the Coral hardware aren’t tuned. They are, in other words, not production-quality models. Detection accuracy is dependent on model training, and Google is expecting that users will train their own models to their own needs.
Now let’s take a look at the code. Stripping away the extraneous bits around our model that handle command line parameters, load the image, and annotate the result, the code that does the inferencing is just two lines long.
First of all we need to instantiate a detection engine with our trained model, where here
args.model is the path to our chosen model passed on the command line.
engine = DetectionEngine(args.model)
Then we run the inference by pointing it at the input image, where here
img is a
ans = engine.DetectWithImage(img, threshold=0.05, keep_aspect_ratio=True, relative_coord=False, top_k=10)
You can see here that we can actually adjust our credibility threshold, and the maximum number of candidate objects the engine should report above that threshold. So I could have filtered things here in the original call, rather than throwing in that
if statement into the code, if I’d wanted to do that.
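To illustrate the point, here is a sketch of what that post-hoc filtering amounts to. It runs without the hardware by using a hypothetical Candidate tuple as a stand-in for the engine’s results, filled in with the labels and scores from the low-threshold run above. The keep_credible() helper is my own, and it assumes the cut-off is inclusive.

```python
from collections import namedtuple

# Hypothetical stand-in for the engine's results, so this runs without hardware.
Candidate = namedtuple("Candidate", ["label", "score"])

# Labels and scores from the low-threshold run shown earlier.
candidates = [
    Candidate("banana", 0.839844),
    Candidate("apple", 0.5),
    Candidate("book", 0.210938),
    Candidate("apple", 0.210938),
    Candidate("book", 0.210938),
    Candidate("dining table", 0.160156),
    Candidate("dining table", 0.121094),
    Candidate("banana", 0.121094),
    Candidate("apple", 0.121094),
    Candidate("book", 0.121094),
]


def keep_credible(results, threshold=0.45, top_k=10):
    """Keep at most top_k results scoring at or above threshold, best first."""
    ranked = sorted(results, key=lambda c: c.score, reverse=True)
    return [c for c in ranked if c.score >= threshold][:top_k]


for c in keep_credible(candidates):
    print(c.label, "score =", c.score)
```

With a cut-off of 0.45 only the banana and the apple survive, matching the output of the first run.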
That’s it. That’s how easy it is to do object detection.
The DetectWithImage() call returns a list of DetectionCandidate objects, a data structure describing each candidate detection. Every object detected will have a corresponding label number returned by the model, which is why we need a label file so that we can translate the label number to something a bit more human friendly.
We were using a MobileNet SSD v2 model trained with the Common Objects in Context (COCO) dataset which detects the location of 90 types of object. So the label file for our model has a corresponding 90 objects in it, including our banana and apple.
87 teddy bear
88 hair drier
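Translating label numbers into names only needs a small lookup table. The sketch below assumes each line of the label file holds a number followed by a name, as in the two lines above; parse_labels() is a name of my own choosing, not part of the Edge TPU module.

```python
def parse_labels(text):
    """Map label numbers to names from lines such as '87 teddy bear'."""
    labels = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        number, name = line.split(maxsplit=1)
        labels[int(number)] = name.strip()
    return labels


sample = "87 teddy bear\n88 hair drier\n"
labels = parse_labels(sample)
print(labels[87])
```

In practice you would read the text from the file passed with --label and look up each candidate’s label number in the resulting dictionary.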
Alongside the label number the candidate detection will have a certainty score, and a bounding box around the detected object which is passed as a
Adding a camera
While the Coral Dev Board has its own camera module to provide real time video, you can do the same thing with the USB Accelerator by using the official Raspberry Pi camera module. Although the Raspberry Pi module is built around the Sony IMX219 8 mega-pixel sensor, rather than the 5 mega-pixel Omnivision OV5645 sensor used by the Coral Dev Board camera, that extra resolution isn’t necessarily a bonus. Depending on what you’re doing with the video you may need to down-sample the data before passing it to your model.
To attach the camera module to your Raspberry Pi, turn the camera module over so it’s face down and pull the black latch outward. Then slide the ribbon cable under the latch with the blue strip facing towards you. The ribbon cable should slide smoothly beneath it, and you shouldn’t have to force it. Then push the black latch back in to secure the cable in place.
If your Raspberry Pi is powered on and running, you’ll need to power it down before attaching the camera module. In your SSH session you should go ahead and power down the board using the
shutdown command to bring it to a clean halt.
$ sudo shutdown -h now
Unplug the power cable and then pull the black latch of the board’s camera connector, located just to the right of the 3.5mm jack and the Ethernet socket, upwards. Follow the same procedure as for the camera module; this time the blue strip should face towards the Ethernet jack. Afterwards power the board back up and log back in via SSH.
Now you’ve got the camera physically connected, you’ll need to enable it. You can use the
raspi-config utility to do that.
$ sudo raspi-config
Scroll down and select “Interfacing Options,” and then select “Camera” from the next menu. Hit “Yes” when prompted, and then “Finish” to quit out of the configuration tool. Select “Yes” when asked whether you want to reboot.
You can check that the camera is working by using the
$ raspistill -o testshot.jpg
This will leave a file called testshot.jpg in the home directory. You can use scp to copy it from the Raspberry Pi back to your laptop.
While we can use the still images taken by the camera and feed them to our model by hand, if you have a monitor attached there is some code included in the software development kit that you can run that’ll demonstrate real-time inferencing on top of a video feed from the camera.
The script will need to access the camera from Python and, out of the box, the
picamera Python module may not be installed on your Raspberry Pi. So before running the demo code we should go ahead and do that.
$ sudo apt-get install python3-picamera
You’ll also need to download a new model. Google have provided a number of pre-compiled models with corresponding label files that aren’t shipped with the board. For this demo go ahead and download the MobileNet V2 object classification model and associated label file.
$ cd ~
$ wget https://storage.googleapis.com/cloud-iot-edge-pretrained-models/canned_models/mobilenet_v2_1.0_224_quant_edgetpu.tflite
$ wget http://storage.googleapis.com/cloud-iot-edge-pretrained-models/canned_models/imagenet_labels.txt
Then go to the source directory, and run the
$ cd ~/python-tflite-source/edgetpu
$ python3 demo/classify_capture.py \
--model test_data/mobilenet_v2_1.0_224_quant_edgetpu.tflite \
So long as you’ve got a monitor attached, an alpha overlay will be dropped on top of whatever is currently displayed, with a live video feed from our camera module and real time inferencing results overlaid on top. Regrettably, that is also why you can’t use this demo script over a VNC connection.
Update: So you can view the output using VNC if you enable “Experimental Direct Capture Mode” from the Options→Troubleshooting menu, making sure that you’re connecting to display
:0 rather than the default virtual desktop mode and display
:1. You should take a look at this thread if you’re still having problems. (Thanks go to Marky Mark for the tip!)
You can see that, even with all the clutter in the background, our model is correctly identifying the major feature on my work bench. That pesky banana.
This demo is also a good time to check out temperature. With the demo app running I was sort of curious as to what temperature the USB Accelerator was going to reach. So I grabbed my laser infrared thermometer and checked.
With the USB Accelerator idle the temperature of the casing was around 25°C (77°F), so only a couple of degrees above the ambient temperature in my office. After running the demo for a solid 30 minutes, it rose to 35°C (95°F).
A +10°C rise in temperature is a lot less than I initially assumed we were going to see given the lack of heat sink. So my guess at this point is that the demo application isn’t working the USB Accelerator all that hard. It’s just not running flat out. So while this moderate rise in temperature is good to see, I’d still keep those warnings about the maximum ambient operating temperature in mind when deploying in the wild.
A comparison with the Coral Dev Board
Sharing a similar form factor to the Intel Neural Compute Stick, the new Coral USB Accelerator packs the Edge TPU into a much smaller package than the Coral Dev Board.
However that means that the USB Accelerator also lacks the large heat sink, and active cooling from the fan, we have on the Dev Board. I haven’t observed really high temperatures while running models, so my assumption is that for short bursts that’s probably not important. However you should bear it in mind if you’re deploying things into the wild and you’re intending for the USB Accelerator to be in use continuously, especially if it is going to be deployed into a comparatively enclosed environment.
However this is probably not the reason behind the comparative slowness of inferencing on the Raspberry Pi with the USB Accelerator compared to what we saw with the Dev Board.
Running inferencing against our fruit image from yesterday is a lot slower on the Raspberry Pi than it was on the Dev Board, with inferencing taking just under 6.6 seconds.
$ time python3 ./object_detection.py --model python-tflite-source/edgetpu/test_data/mobilenet_ssd_v2_coco_quant_postprocess_edgetpu.tflite --label python-tflite-source/edgetpu/test_data/coco_labels.txt --input fruit.jpg --output out.jpg
banana score = 0.964844
apple score = 0.789062
Saved to out.jpg
That is significantly slower than the same network performing inferencing on the same image on the Dev Board, which returned in 1.5 seconds.
$ time python3 ./object_detection.py --model /usr/lib/python3/dist-packages/edgetpu/test_data/mobilenet_ssd_v2_coco_quant_postprocess_edgetpu.tflite --label /usr/lib/python3/dist-packages/edgetpu/test_data/coco_labels.txt --input fruit.jpg --output out.jpg
banana score = 0.964844
apple score = 0.789062
Saved to out.jpg
My presumption here is that the inferencing is bottlenecked by the time it takes to transfer the image from the Raspberry Pi, over the USB connection, to the USB Accelerator. While the USB Accelerator supports USB 3 speeds the Raspberry Pi does not, and is limited to USB 2 speeds. While USB 2 offers transfer rates of 480 Mbps, USB 3 offers transfer rates of 5 Gbps, roughly ten times faster. There’s also the storage bottleneck: while the Raspberry Pi is working from an SD Card, the Dev Board is working from onboard eMMC storage. The same flash chips, but not at the end of a data bus.
The image I’m using here is a JPG straight from my iPhone, and is 3888×2916 pixels, and around 3MB, in size. Scaling that all the way down to 640×480 pixels reduces the size of the image to just 147KB.
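As a rough sanity check on the transfer-time presumption, the sketch below works out the best-case time to move files of these sizes over each link. It uses the nominal signalling rates of 480 Mbps for USB 2 and 5 Gbps for USB 3.0, takes “3MB” and “147KB” as decimal units, and ignores protocol overhead entirely, so treat the results as lower bounds rather than measurements.

```python
# Nominal signalling rates in bits per second; real throughput is lower.
USB2_BPS = 480e6
USB3_BPS = 5e9


def transfer_seconds(size_bytes, bits_per_second):
    """Best-case time to move size_bytes over a link of the given rate."""
    return size_bytes * 8 / bits_per_second


# Sizes quoted in the text, taken as decimal megabytes and kilobytes.
sizes = {"full size (3MB)": 3_000_000, "scaled (147KB)": 147_000}

for name, size in sizes.items():
    usb2_ms = transfer_seconds(size, USB2_BPS) * 1000
    usb3_ms = transfer_seconds(size, USB3_BPS) * 1000
    print(f"{name}: USB 2 {usb2_ms:.2f} ms, USB 3 {usb3_ms:.2f} ms")
```

Even for the full-size file the best-case figures come out in milliseconds rather than seconds, and what actually crosses the USB link may not be the JPEG file itself, so this is only a back-of-envelope guide.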
Re-running our tests we get an inferencing time of 5.6 seconds with the Raspberry Pi and USB Accelerator, and 0.82 seconds with the Dev Board. That’s a significant speed up, and perhaps unsurprisingly it doesn’t much affect the confidence of the detections.
banana score = 0.953125
apple score = 0.839844
In fact, as you’ve probably noticed already, the model is more sure that the apple is an apple than it was before we scaled the image down.
Scaling the image down further, down to 240×180 and just 33KB, and re-running the inferencing doesn’t significantly affect the run time for the Raspberry Pi and USB Accelerator. We again get an inferencing time of 5.6 seconds. However it does slightly drop the inferencing time with the Dev Board, with the run time being 0.79 seconds with the smaller image.
The detection confidence for the apple does, however, decline somewhat.
banana score = 0.933594
apple score = 0.660156
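Both reduced sizes keep the 4:3 aspect ratio of the original image. The sketch below shows one way to work out such a target size before handing the image to a resizing routine; fit_within() is a hypothetical helper of my own, and the actual resampling, which an imaging library would do, is left out.

```python
def fit_within(width, height, max_width, max_height):
    """Scale (width, height) to fit inside the box, keeping aspect ratio.

    Images already smaller than the box are left at their original size.
    """
    scale = min(max_width / width, max_height / height, 1.0)
    return round(width * scale), round(height * scale)


original = (3888, 2916)  # the full-size image used in the text
print(fit_within(*original, 640, 480))
print(fit_within(*original, 240, 180))
```

Because the original is exactly 4:3, both target boxes are filled completely. An image with a different aspect ratio would come out narrower or shorter than the box.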
Now obviously, these times are rough, and inflated by overheads not directly related to the inferencing. As we know from the slick initial demo application that ships with the Dev Board the Edge TPU is quite capable of performing inferencing on full frame video at > 70 fps, all the while the main CPU on the board is handling annotating that video in real time with the results.
So beyond the size of the image the timing of the applications would also be affected by the amount of time it takes to set up and teardown the script itself. Considering that the camera demo showed that the Raspberry Pi and USB Accelerator is capable of handling inference
It would therefore be interesting to see whether there was still such a significant difference between the Coral Dev Board and the USB Accelerator if our Linux computer wasn’t a Raspberry Pi, or whether these timings really are all down to setup and teardown overhead.
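One way to separate the two would be to time only the inference call from inside the script, rather than timing the whole process from the shell. The sketch below shows the idea with a hypothetical fake_inference() placeholder, so that it runs without the hardware; it is not code from the demo script.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed seconds) for that call only."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def fake_inference(image):
    """Hypothetical placeholder for the real detection call."""
    time.sleep(0.01)
    return ["banana", "apple"]


result, elapsed = timed(fake_inference, "fruit.jpg")
print(f"inference took {elapsed:.3f} s and returned {result}")
```

In the demo script the call being wrapped would be engine.DetectWithImage(), which leaves interpreter start-up, model loading, and image annotation outside the measurement.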
Building your own models
You’ll then need to convert your TensorFlow model to the optimised FlatBuffer format to represent graphs used by TensorFlow Lite. From there you’ll need to compile your TensorFlow Lite model for compatibility with the Edge TPU with Google’s web compiler.
During the current beta period the Edge TPU compiler has some restrictions. But these restrictions should be lifted when Coral comes out of beta testing next month.
Using a web compiler is a neat move by Google to get around a problem you face when working with Intel Movidius based hardware with an ARM-based board, like the Raspberry Pi, where you needed an additional x86 based development machine to compile your models so you can deploy them on to the accelerator hardware.
Right now, during the beta phase, the Edge TPU web compiler is restricted to a few model architectures: either a MobileNet V1/V2 model with a 224×224 max input size and a 1.0 max depth multiplier, an Inception V1/V2 model with a 224×224 fixed input size, or finally, an Inception V3/V4 model with a 299×299 fixed input size. All of these models must be a quantised TensorFlow Lite model (.tflite file) less than 100MB.
These architecture restrictions are going to be removed in a future update, with any quantised TensorFlow Lite model being allowed, so long as the model uses 1-, 2-, or 3-dimensional tensors with the tensor sizes and model parameters fixed at compile time. Although Google does warn that there may be “…other operation-specific limitations” that apply, those aren’t yet clear.
The restrictions to INT8 models and small cache sizes for the Coral hardware are pretty understandable, as the board is designed for comparatively low power deployments. With required power consumption levels far less than some other hardware, for instance NVIDIA’s recently released Jetson Nano board, a direct comparison isn’t necessarily very fair.
Google have provided some excellent overview documentation online to get you started working with the Dev Board and camera. Alongside this, more detailed Python API documentation is available to download.
While both the Coral Dev Board and the USB Accelerator are very different from Google’s previous machine learning kits that launched under the AIY Projects brand, the USB Accelerator also feels like a very different product from the Dev Board. It’s pretty evident that the new Edge TPU-based hardware is aimed at a more professional audience than the previous Raspberry Pi kits.
The Dev Board is almost certainly intended as an evaluation board for the System-on-Module (SoM), which will be made available “in volume” later in the year, rather than a stand alone board intended for development. It’s aimed at small and medium sized companies, and professional hardware developers, looking to add machine learning into existing or new products.
However the USB Accelerator isn’t. While it’s not entirely clear what the underlying strategy is, I actually think the USB Accelerator is aimed at data scientists and makers, rather than embedded hardware developers. Although it’s probably a good fit as a prototyping tool for the PCI-e version of the Edge TPU that’s also coming later in the year.
Data scientists will be using the accelerator with their Linux laptop to crunch on their data, while makers will be using it with the Raspberry Pi to build robots and autonomous vehicles.
There are a lot of projects built around the Raspberry Pi and its camera module, and I think we can confidently predict that a lot of them will be adding an Edge TPU co-processor for real-time video inferencing.
This post is sponsored by Coral from Google.