More On the Picovoice Toolkit
Offline speech-to-text on a Raspberry Pi Zero?
Back at the start of the year we heard some rumours about a startup called Picovoice that was working on wake word detection, along with both speech-to-intent and speech-to-text engines, all running offline without an Internet connection.
Last week we finally got some more details.
The three voice engines (Porcupine for wake word detection, Rhino for speech-to-intent, and Cheetah for speech-to-text transcription) all operate on-device without a network connection.
The idea here is that a company will approach Picovoice to build a domain-specific model for them. By keeping each model specific to a certain product, such as a coffee maker or a television, the model can maintain high accuracy while needing far fewer resources. They claim that their engines will run real-time speech-to-text on a Raspberry Pi Zero, or even locally within a web browser. No cloud needed.
That means that, unlike most current voice engines, your conversations don’t leave your home. Given the current problems around ‘quality auditing’ by humans, that’s rather interesting. Right now all of your voice assistants are listening to you, but so are the humans behind them. Suddenly that makes privacy a selling point again.
But while we have some more details, and a better look at the engines themselves (all three are now available on the company’s GitHub), it’s still early days. Availability on GitHub does mean that if you’re interested in using the engines for non-commercial or evaluation purposes, you can now go ahead and build a voice application using one of them yourself.
But if you’re thinking about building a product around the new engines, you’ll need to contact their enterprise team. While it’s not clear what licensing the engines for use in an actual product is going to cost, like a lot of enterprise tooling, the cost is likely to be whatever the market will bear.
However, the very existence of this sort of tooling, along with other moves to make machine learning easier for non-specialists to use, is a sign that the ecosystem around edge computing is starting to mature.