In a nutshell, an AI accelerator is a purpose-built hardware component that speeds up processing of AI workloads such as computer vision, speech recognition, natural language processing, and so on. Recent years have seen a proliferation of such accelerators, driven partly by the need to improve real-time response times for AI inference at the edge, and partly by the crush of data from IoT sensors.
So exactly which AI accelerators are out there, and how do they differ from one another? And, more importantly, which accelerators are most suitable for which use cases, and where might they be headed in the future? Let’s take a look at these questions in more detail.
Underlying principles
As we said above, AI accelerators are purpose-built components, aimed specifically at speeding up the processing of AI workloads. So how do these AI workloads differ from more traditional processing tasks? There are several key areas:
High parallelism
Most AI workloads are based on artificial neural networks: large numbers of simple processing elements (i.e. neurons) wired together into complex patterns of interconnections, like a giant web. Because each neuron's computation is largely independent of the others, AI workloads are ideally suited to parallel processing.
Matrix computations
The structure of a neural network can be represented by a matrix, where the “weight” of each interconnection is stored as a number in the matrix. Thus, matrix operations play a large role in many AI workloads. For example, in image recognition, a neural network must perform a matrix multiplication between an input image and the set of weights in the net’s first “layer”. It then consecutively performs matrix multiplications between each layer’s output and the next layer’s weights, until the final layer produces a vector representing the classification result.
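To make this concrete, here is a minimal NumPy sketch of that layer-by-layer flow. The layer sizes (a flattened 28×28 image mapped down to 10 classes), the random weights, and the ReLU activation are all illustrative assumptions, not a real trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative layer sizes: a flattened 28x28 image mapped to 10 class scores.
weights = [rng.standard_normal((784, 128)),
           rng.standard_normal((128, 64)),
           rng.standard_normal((64, 10))]

def forward(x, weights):
    """Propagate an input through each layer by matrix multiplication."""
    for w in weights:
        x = np.maximum(x @ w, 0.0)  # ReLU nonlinearity between layers
    return x

image = rng.standard_normal(784)   # stand-in for a flattened input image
scores = forward(image, weights)
print(scores.shape)                # one score per class: (10,)
```

Every step in the loop is a matrix multiplication, which is exactly the operation AI accelerators are built to speed up.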
Training vs inference
Neural networks do not work “out of the box”: they must first be trained through an extensive process of finding a set of weights that optimally maps the input data to the desired output. Once a network is trained, it can be used for “inference,” i.e. to classify new input data. These two tasks place similar but not identical demands on hardware, especially in terms of memory bandwidth and the number of computations required.
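The contrast can be sketched with a toy one-weight linear model. The data, learning rate, and iteration count below are invented for illustration; the point is that training is an iterative loop of weight updates, while inference is a single cheap forward pass with frozen weights:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: learn y = 2x from sampled points (invented for illustration).
x = rng.standard_normal((100, 1))
y = 2.0 * x

w = np.zeros((1, 1))

# Training: many passes that repeatedly adjust the weight.
for _ in range(200):
    pred = x @ w
    grad = x.T @ (pred - y) / len(x)   # gradient of mean squared error
    w -= 0.1 * grad                    # gradient-descent update

# Inference: a single forward pass with the frozen weight.
print((np.array([[3.0]]) @ w).item())  # close to 6.0, since w converged near 2
```

A real training run repeats this kind of update over millions of weights and examples, which is why training and inference stress hardware so differently.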
AI accelerator classification
So far, we have talked about the underlying principles that drive the development of AI accelerators and which distinguish them from other types of processors. Now let’s drill down into the specific types of AI accelerators, and see how these differ from one another.
Graphics Processing Units
Originally developed for rendering graphics on the screen (hence the name), GPUs turned out to be surprisingly good at accelerating deep learning workloads. Unlike CPUs, which process data serially and switch frequently between tasks, GPUs handle many pieces of data (e.g. the pixels of an image) at the same time. Since the data flowing through the layers of a neural network is essentially a large matrix (much like an image), GPUs are naturally suited to accelerating the matrix operations at the heart of most deep learning workloads.
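A small NumPy sketch shows why matrix multiplication parallelizes so well: every output element is an independent dot product, so nothing forces the CPU-style element-at-a-time loop below. The matrix sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
a = rng.standard_normal((64, 64))
b = rng.standard_normal((64, 64))

# CPU-style serial framing: compute each output element one at a time.
serial = np.empty((64, 64))
for i in range(64):
    for j in range(64):
        serial[i, j] = a[i, :] @ b[:, j]

# GPU-style framing: all 64*64 output elements are independent, so the
# whole product can be computed as one parallel (vectorized) operation.
parallel = a @ b

print(np.allclose(serial, parallel))  # True
```

A GPU takes the second framing literally, assigning the independent output elements to thousands of cores at once.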
Tensor Processing Units
A relatively recent development in the world of AI accelerators, TPUs were initially developed by Google for use in its data centers. The fundamental idea behind TPUs is to pack Arithmetic Logic Units (ALUs) physically close together on a single chip, interconnecting only the adjacent ones, thus creating a 2D matrix-like structure (a so-called systolic array). This allows TPUs to achieve very high performance with very low power consumption.
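The accumulation pattern of that 2D grid can be sketched in plain Python. This is only a conceptual simulation of an output-stationary systolic array, not how a TPU is programmed: each grid cell keeps its own running sum, and at each step it works only on values marching past from its neighbours:

```python
import numpy as np

def systolic_matmul(a, b):
    """Simulate an output-stationary 2D grid of multiply-accumulate units.

    Each grid cell (i, j) holds one accumulator; at every step it multiplies
    the values arriving from its row and column neighbours and adds the
    product to its running sum, mirroring how a TPU's ALUs are wired only
    to adjacent units.
    """
    n = a.shape[0]
    acc = np.zeros((n, n))            # one accumulator per grid cell
    for step in range(n):             # data marches through one step at a time
        for i in range(n):
            for j in range(n):
                acc[i, j] += a[i, step] * b[step, j]
    return acc

rng = np.random.default_rng(3)
a = rng.standard_normal((4, 4))
b = rng.standard_normal((4, 4))
print(np.allclose(systolic_matmul(a, b), a @ b))  # True
```

Because results never leave the grid until the multiplication is finished, the hardware avoids most of the memory traffic that dominates power consumption on general-purpose chips.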
Tensor cores within conventional chips
While a TPU is a separate piece of hardware, the same ideas can be incorporated in general-purpose computing chips as well. By integrating so-called “tensor cores” into a conventional chip (say, one that is used in a security camera or a smartphone), one can add AI functionality to almost any device. For example, a trained neural network can run inference on video data flowing in real time from a camera sensor, detecting objects among other tasks.
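The shape of such an on-device inference loop can be sketched as below. Everything here is a hypothetical stand-in: the `camera_frames` generator fakes a sensor feed, the weight matrix fakes a trained model, and the labels are invented; on a real device the per-frame matrix multiplication is what the tensor cores would accelerate:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical stand-ins: a "trained" weight matrix and a camera feed
# that yields flattened frames.
weights = rng.standard_normal((256, 3))        # 3 object classes, invented

def camera_frames(n):
    """Fake sensor feed: yields n flattened frames."""
    for _ in range(n):
        yield rng.standard_normal(256)

labels = ["person", "car", "bicycle"]          # illustrative labels
for frame in camera_frames(5):
    scores = frame @ weights                   # one inference per frame
    detected = labels[int(np.argmax(scores))]
    print(detected)
```

The key constraint at the edge is that this loop must keep up with the camera's frame rate within a tight power budget, which is exactly what embedded tensor cores are designed for.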
Choosing the right AI accelerator for your use case
As often happens in technology, there is no single right answer to this question. Each of the AI accelerators described above has characteristics that make it more suitable for certain tasks than others. For example, GPUs are a great choice for “cloud AI” tasks such as DNA sequencing or machine translation, whereas TPUs (or tensor cores, for that matter) are a better fit for “edge computing” applications, where the hardware must be small, power-efficient, and low-cost.
Another thing to keep in mind is that AI accelerators of different types tend to complement each other. For example, you can use a GPU to train a neural network and then run inference with it on a tailored TPU chip. (“Training on the edge” is a hot topic in AI acceleration, but it has not yet reached the level of maturity needed for market adoption.)
Finally, you should always consider the programmability and versatility of your AI acceleration approach. GPUs tend to be more universal (you can run virtually any TensorFlow/Keras code on them), while TPUs require a separate compilation and optimization step. At the same time, the more specialized TPU architecture can execute the same AI code far more efficiently, so it is always a question of trade-offs.
Concluding remarks
The AI acceleration revolution is still in its early stages. While there are many accelerators on the market and more are being developed all the time, there is still much to learn about this exciting new technology.
Like any other technological trend, AI accelerators attract a lot of hype, much of it misleading or only partially correct. So make sure to diligently check the accuracy of any claims, and ask your vendor not just about the technology behind an accelerator, but also whether it has actually been tested and validated or is just fashionable marketing talk.
If you want to learn more about AI accelerators and about AI technology in general, make sure to subscribe to our blog and follow us on Twitter!