The advent of AI-powered edge devices is creating a new market for specialized hardware: chips that must be not only powerful but also fast, energy-efficient, compact, and, well, cheap.
The excitement around AI is at its peak right now, with billions of dollars being poured into the development of new technologies that promise to make our lives easier.
The more we rely on AI, the bigger the need for powerful hardware becomes. But what kind of hardware? As it turns out, not all types are created equal. GPUs might be great for training models and running inference in data centers. But when we’re talking about “smartifying” things like security cameras, autonomous vehicles, and other edge devices that will be deployed in the wild, their intrinsic limitations become painfully obvious.
A round peg in a square hole
Although GPU manufacturers are increasingly focusing on AI applications, the chips they ship still retain a lot of legacy from the days when they were primarily used for gaming. If you look closely inside even their “edge-focused” chips, you will find things like shaders or high-precision arithmetic units, which simply make no sense for machine learning.
Such excess might not be an issue in the cloud, where you can simply throw in more GPUs to get more computing power. But when you have to squeeze every last drop of performance out of your hardware, the inefficiencies add up to real challenges for edge uses, such as:
- High power consumption: A typical “edge-grade” GPU-based chip consumes at least 15 W. That might not sound like much, but for a battery-powered IoT device it is a serious drain (see the quick estimate after this list).
- Low computational density: With all the redundant circuitry and an architecture that was never optimized for inference, a GPU-based chip takes up too much space for many edge applications.
- High latency: A super-performant GPU in a self-driving car is of little use if it takes a few hundred milliseconds to respond to an emergency.
- High price tag: By buying a GPU for your edge device, you’re essentially paying for decades of R&D invested in gaming, and all the technologies that you won’t be using.
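To put the power figure in perspective, here is a quick back-of-the-envelope estimate. The battery capacity and the competing accelerator’s power draw are illustrative assumptions, not measurements:

```python
# Rough battery-life estimate for continuous inference (illustrative numbers only).
battery_wh = 50.0    # assumed battery capacity of a portable device, in watt-hours
gpu_module_w = 15.0  # "edge-grade" GPU module drawing ~15 W under load
tpu_chip_w = 2.0     # hypothetical low-power TPU-style accelerator, ~2 W

print(f"GPU-based device: {battery_wh / gpu_module_w:.1f} h on one charge")
print(f"TPU-based device: {battery_wh / tpu_chip_w:.1f} h on one charge")
```

At 15 W, even a fairly generous 50 Wh battery lasts barely an afternoon of continuous inference, which is why power draw alone can rule GPUs out of many edge deployments.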
The rise of the Tensor
This is where tensor processing units (TPUs) come in. TPUs are dedicated hardware designed specifically for AI workloads. Although there is no strict technical definition of what a TPU is yet, they are generally understood to be compute-optimized ASICs that can run neural networks at high speed.
The central element of a TPU is a systolic array, a type of spatial computing structure. In a systolic array, data is pumped through a grid of processing elements in a way that resembles the flow of blood through our veins, hence the name. Thanks to this architecture, systolic arrays can perform operations such as matrix multiplication or convolution (which is what the bulk of neural network computation boils down to) in a single pass, without shuttling intermediate results back and forth to memory.
(For a more detailed explanation of how TPUs work, read this excellent article by the creators of Google’s TPU.)
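To make the dataflow concrete, here is a toy, cycle-by-cycle simulation of an output-stationary systolic array multiplying two matrices. It is a pedagogical sketch of the general idea, not a model of any particular TPU:

```python
import numpy as np

def systolic_matmul(A, B):
    """Toy simulation of an output-stationary systolic array computing A @ B.

    PE (i, j) owns the accumulator for C[i, j]; rows of A stream in from the
    left and columns of B from the top, each skewed by one cycle per row/column
    so that matching operands meet in the right PE at the right time.
    """
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"

    acc = np.zeros((n, m))    # one accumulator per processing element (PE)
    a_reg = np.zeros((n, m))  # A values currently held by each PE
    b_reg = np.zeros((n, m))  # B values currently held by each PE

    for t in range(n + m + k - 2):  # enough cycles to drain the pipeline
        # Data hops one PE per cycle: A values move right, B values move down.
        new_a, new_b = np.zeros_like(a_reg), np.zeros_like(b_reg)
        new_a[:, 1:] = a_reg[:, :-1]
        new_b[1:, :] = b_reg[:-1, :]
        a_reg, b_reg = new_a, new_b
        # Feed the skewed edges of the array.
        for i in range(n):
            if 0 <= t - i < k:
                a_reg[i, 0] = A[i, t - i]
        for j in range(m):
            if 0 <= t - j < k:
                b_reg[0, j] = B[t - j, j]
        # Every PE performs one multiply-accumulate per cycle, entirely in place.
        acc += a_reg * b_reg
    return acc

A, B = np.random.rand(3, 4), np.random.rand(4, 5)
assert np.allclose(systolic_matmul(A, B), A @ B)
```

Each operand is read from (simulated) memory exactly once at the edge of the array; after that it just flows from neighbor to neighbor, which is where the efficiency comes from.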
TPUs trade universality for specialization. They are not as flexible or as broadly powerful as GPUs, but they can perform matrix operations at a speed no GPU on the market today can match. As a result, they fit the same performance into a much smaller footprint while consuming a fraction of the power. Their circuitry is also more straightforward, which translates into shorter signal paths and thus lower latency. Finally, TPUs don’t carry the burden of legacy technologies, which means they can be manufactured at a lower cost.
Any chip can be an AI chip
An increasingly common approach among chip manufacturers is not to design a TPU from scratch, but to use existing IP cores: reusable circuit designs that implement a particular algorithm or application. As such, IP cores present an opportunity for “non-AI” chip manufacturers to get in on the TPU game. Provided the documentation is clear, comprehensive, and well-maintained, and with the right set of software tools for implementing specific neural network algorithms, IP cores can be used to create AI chips for particular applications relatively quickly.
Such “tensor cores” could be used to introduce AI features into a variety of existing devices, ranging from security cameras to drones and from infotainment systems to self-driving cars. In doing so, the chip manufacturer doesn’t have to reinvent the wheel; it only needs to implement a specific algorithm on top of the IP core. (This is done by “compiling” an existing neural network, say a TensorFlow model, into the internal instruction set the IP core executes.)
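Toolchains vary from vendor to vendor, but the workflow often looks like the TensorFlow Lite flow sketched below: a trained model is quantized and converted into a compact, accelerator-friendly format, and a vendor-specific compiler then maps that onto the core’s instruction set. The final compile command here is a hypothetical placeholder, not a real tool:

```python
import tensorflow as tf

# Convert a trained TensorFlow model into a compact, quantized format that an
# embedded accelerator toolchain can work with.
converter = tf.lite.TFLiteConverter.from_saved_model("path/to/saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # post-training quantization
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)

# A vendor-specific compiler then translates the .tflite file into the IP core's
# internal instruction stream, e.g. (hypothetical command name):
#   $ tensor_core_compiler model.tflite -o model.bin
```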
The price to license an IP core ranges from a fraction of a dollar to a few dollars per chip, but it heavily depends on the application, performance characteristics, and batch size. Some IP cores are even available for free, such as NVIDIA’s NVDLA or our own Edged Lite offering. This can be a viable option for relatively low-performance applications such as security cameras or IoT devices.
A cautionary note here is that not all IP cores available today are in a production-ready state. EDA tools from vendors such as Cadence or Mentor Graphics let pretty much anyone code and simulate an IP core, but model or lab performance doesn’t always translate directly into real-world performance. This risk is much lower for silicon-proven cores, i.e. ones that have actually been built into real chips. Research shows that fixing errors after the chip has been manufactured (“post-silicon”) can cost 40,000 times more than catching them at the design stage.
The future of AI chips?
As AI chips become more common in different kinds of devices, they will no longer be thought of as AI chips per se. They will simply be considered a new type of hardware that’s specialized for particular applications. We believe that, although GPUs will hold their ground in cloud-based AI applications, they will gradually lose their position in the IoT market, as they are simply too bulky, expensive and slow for edge devices.
As for the core technology, there’s still much ground to be covered. Some of the challenges are quantitative, such as pushing for smaller footprints, lower latencies, or higher power efficiency. Others are qualitative, such as the need for more robust software tools for implementing specific algorithms. One specific and as yet almost untapped area is training on the edge, which would let us build AI-powered devices that are field-ready from the start.
Finally, we feel like there’s a pressing need for the industry to become more open. As it stands, a lot of valuable research is proprietary and closed. This leads to fragmentation in the ecosystem, hampers progress, and leads to a “rich get richer” dynamic, where larger companies are able to leverage their existing IP and R&D efforts, while smaller companies are left to scramble for crumbs. We hope our Edged Lite initiative will be a small yet meaningful step in changing that.
In any case, these are exciting times to be in the chip industry, and we are looking forward to seeing what will happen next.