Building Voice-Controlled Objects with Google’s AIY Projects Voice Kit

Do-It-Yourself Artificial Intelligence for the Internet of Things

Alasdair Allan
3 min read · Nov 14, 2017

I’ve been playing with Google’s new AIY Projects Voice Kit for a couple of months now. I was fortunate enough both to pick up one of the original kits that were distributed with issue 57 of the MagPi, and then later to get my hands on a few pre-production hardware samples as the second version of the kit was made available for pre-order at the end of August.

The Google AIY Projects Voice Kit (right), a retro rotary phone (left), and a magic mirror (back).

Over the last month or two I’ve put together a couple of different projects using the kit. My favourite by far has to be a retro-computing interface based around a GPO 746 Rotary Telephone. Watching people interact with it was intriguing. Adding simple sounds to imitate a ‘real’ phone, like a dial tone, a hang up noise, and a simple greeting by an operator, left enough room that people were no longer quite sure whether they were talking to a machine or a human. It lent a curious hesitancy to the interactions that I haven’t seen with other voice-controlled objects.

However, most of my time with the kit has been spent building a magic mirror based around the kit and the open source MagicMirror² platform.

Unlike a lot of magic mirror builds you might come across, my mirror was deliberately quite modest. Most builds use full-sized, stripped-down LCD monitors; this build used a comparatively small 30cm×30cm square frame, while the LCD was the (smaller still) official Raspberry Pi 7-inch touch screen. This was a deliberate decision: not only did it reduce the amount of woodworking necessary, but it also made the mirror portable and less intimidating.

As with my previous retro-computing build, I deliberately made use of sound to attempt to make the mirror more magical. While home hubs like Amazon Alexa, or the Google Home, are blatantly creations of technology, I’ve always been fascinated by David Rose’s enchanted objects. Technology is never really mature until it is invisible, and while voice interfaces are a step towards that, I still feel that these interfaces need to be further embedded, hidden, into the environment.

The mirror also served as a testbed for my exploration of deep learning at the edge, allowing me to test Google’s TensorFlow on the device for simple hotword recognition.

The ability to run these trained networks “at the edge”, nearer the data, without the cloud support that seems necessary for almost every task these days, or in some cases without even a network connection, could help reduce barriers to developing, tuning, and deploying machine learning applications. It could potentially help make “smart objects” actually smart, rather than just network-connected clients for machine learning algorithms running in remote data centres. It could, in fact, be the start of a sea change in how we think about machine learning and how the Internet of Things might be built, because now there is at least the potential to put the smarts on the smart device, rather than in the cloud.
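As a rough illustration of what keeping the decision on the device might look like, here is a minimal Python sketch. It is not the code that ran on the mirror. It assumes a locally running model (not shown) that emits a hotword probability for each audio frame, and the function name, threshold, and window size are all hypothetical choices made for illustration. The only point it makes is that the final "did I hear the hotword?" step needs nothing beyond a few lines of local logic.

```python
from collections import deque
from typing import Iterable, Iterator


def hotword_triggers(
    scores: Iterable[float],
    threshold: float = 0.8,  # assumed value, would need tuning per model
    window: int = 3,         # assumed number of frames to average over
) -> Iterator[int]:
    """Yield the frame index each time the smoothed score reaches the threshold.

    `scores` stands in for per-frame hotword probabilities produced by a
    model running on the device itself; the model is not part of this sketch.
    """
    recent = deque(maxlen=window)
    armed = True  # fire once per detection, not on every high frame
    for index, score in enumerate(scores):
        recent.append(score)
        average = sum(recent) / len(recent)
        if len(recent) == window and average >= threshold:
            if armed:
                yield index
                armed = False
        else:
            # Re-arm once the smoothed score has dropped back down.
            armed = True


if __name__ == "__main__":
    # Made-up scores: two bursts of high confidence separated by silence.
    example = [0.1, 0.2, 0.9, 0.95, 0.9, 0.85, 0.2, 0.1, 0.1, 0.9, 0.9, 0.9]
    print(list(hotword_triggers(example)))
```

Averaging over a short window is just one simple way to avoid triggering on a single noisy frame; a real build would likely need more care than this.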

“The positive reception to Voice Kit has encouraged us to keep the momentum going with more AIY Projects. We’ll soon bring makers the ‘eyes,’ ‘ears,’ ‘voice’ and sense of ‘balance’ to allow simple, powerful device interfaces.” (Google)

I think I’ve learned a lot more about the user experience behind voice interfaces by experimenting with the form and function of the objects offering the interface than I would have done purely by playing with code. I think kits like Google’s new Voice Kit are invaluable for makers and product designers alike. If Google’s promises of more kits to come prove to be prophetic, then integrating vision, and other sensors, with processing at the edge could move us all closer to an Internet of Things that makes sense and gives us more choice when it comes to privacy.