The benchmarks were done in as straightforward a manner as possible. As far as practical, the models were identical across the platforms, and the code was deliberately written to be as naive as possible. There are things I could have done to speed up all of the tested platforms, for instance using async inference with the NCS2. But, like the models, I wanted to keep the code as similar as possible across the platforms, and as close as possible to what a developer getting started with the platform would write, to keep things understandable.
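To give a flavour of what I mean by "naive," the timing loop boils down to something like the sketch below. This is illustrative only, not the actual benchmark code; the function names and run counts here are my own placeholders, and the real methodology is in the original post linked at the end.

```python
import time

def mean_inference_ms(infer, image, warmup=5, runs=100):
    """Time a synchronous inference call in the most naive way possible.

    infer  -- a callable that runs one inference on `image`
    warmup -- untimed runs first, since the initial inferences are
              typically slower (model load, lazy initialisation)
    runs   -- timed runs to average over
    """
    for _ in range(warmup):
        infer(image)

    start = time.monotonic()
    for _ in range(runs):
        infer(image)
    elapsed = time.monotonic() - start

    return (elapsed / runs) * 1000.0  # mean milliseconds per inference
```

On each platform `infer` would wrap whatever that platform's API calls synchronous inference, which is exactly why async pipelines (like the NCS2's) are left on the table.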
So, for instance, if you go back to the original post you'll see that measurements using the NVIDIA Jetson Nano were, like those for the other platforms, done using Python and TensorFlow. This gave slower results than NVIDIA's own C++ benchmarks. NVIDIA is now working with the TensorFlow team to resolve the bugs that led to this slowdown.
The Coral USB Accelerator saw a 3× increase in inferencing speed when used with USB 3, which brought it in line with the Coral Dev Board. The pipeline to the Edge TPU on the dongle was being throttled by the lack of available bandwidth on USB 2; with USB 3 we saw more or less identical results from the Accelerator and the Dev Board. So it's not at all unexpected. I don't see how it can have anything to do with caching, as I see the same results when real-time video is used. The small speed increase I saw with the NCS2 was, to me, a far more surprising result.
My methodology is outlined in the original post at https://medium.com/@aallan/benchmarking-edge-computing-ce3f13942245. See the roundup post, https://blog.hackster.io/the-big-benchmarking-roundup-a561fbfe8719, for the overall results in context.