AI chips: What they are, how they work, and which ones to choose

As artificial intelligence (AI) and machine learning become more and more prevalent, the technology is starting to outpace the traditional processors that power our computers. This has led to the development of a new class of processors, known as AI chips.

What is an AI chip?

An AI chip is a computer chip that has been designed to perform artificial intelligence tasks such as pattern recognition, natural language processing and so on. These chips are able to learn and process information in a way that is similar to the human brain.

How do AI chips work?

Unlike traditional processors, AI chips are massively parallel. This means that they can perform many tasks at the same time, just like the brain is able to process multiple streams of information simultaneously.

Most of today’s artificial intelligence and machine learning applications are powered by neural networks — algorithms that are inspired by the way the brain works. That’s why, when we talk about AI chips, we are usually talking about chips that are designed to be able to run such algorithms faster and more efficiently than traditional processors.
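
To make this concrete, here is a minimal sketch (using PyTorch purely for illustration; the layer sizes and batch size are made up) of the kind of computation an AI chip accelerates. A small neural network boils down to a couple of large matrix multiplications, and moving that work onto an accelerator is a one-line change:

    import torch

    # Use an accelerator if one is available, otherwise fall back to the CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # A tiny two-layer network: two matrix multiplications plus a non-linearity.
    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 512),
        torch.nn.ReLU(),
        torch.nn.Linear(512, 10),
    ).to(device)

    # A batch of 256 inputs is processed in one shot; every row of the batch
    # (and every multiply-accumulate inside the matrix products) can run in parallel.
    batch = torch.randn(256, 1024, device=device)
    predictions = model(batch)
    print(predictions.shape)  # torch.Size([256, 10])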

What types of AI chips are there?

There are three main types of AI chips: graphics processing units (GPUs), field-programmable gate arrays (FPGAs), and application-specific integrated circuits (ASICs).

GPUs: General-purpose chips that were repurposed for AI

GPUs are the oldest type of AI chip out there. Originally designed to perform graphics tasks such as rendering video or creating 3D images, they turned out to be really good at simulating the operation of large-scale neural networks.

The downside is that, coming from a different field, they retain a lot of legacy features that are not really necessary for AI tasks. This makes them larger, more expensive, and generally less efficient than AI-specific chips.

See also: Race to the Edge: Can GPUs keep up?

ASICs: Specialized chips for specialized AI tasks

ASICs, or application-specific integrated circuits, are computer chips designed to do one specific kind of calculation very quickly. They can be used for things like Bitcoin mining, video encoding, or, in our case, running specific artificial intelligence tasks.

The main advantage of ASICs is that they are very efficient. Because they are designed to do one thing and one thing only, they don’t have any legacy features or functionality that is not required for the task at hand. This makes them much smaller and cheaper than GPUs.

However, the downside is that ASICs are not flexible. Once they have been designed for a specific task, they cannot be easily repurposed for other tasks. So, if you want to use an ASIC for a different type of AI application, you’ll need to design and manufacture a new chip — which can be costly.

Just how specific a given ASIC is can be a matter of debate. Google’s Tensor Processing Unit (TPU), for example, can handle an impressively wide range of AI tasks, yet it is still an ASIC. Then again, not every company can afford to invest that much energy and that many resources into the R&D of something as specialized as a TPU. (Shameless plug: we do.)

FPGAs: The best of both worlds?

A field-programmable gate array (FPGA) is a type of computer chip that can be configured by a user after it has been manufactured. This means that it can be made to perform different tasks, depending on how it is programmed.

FPGAs have some advantages over both GPUs and ASICs. They are more flexible than ASICs, because they can be reconfigured to perform different tasks. But, unlike GPUs, they don’t have any legacy features that make them larger and more expensive.

The disadvantage is that, as often happens, a jack of all trades risks being a master of none: FPGAs are not as flexible as GPUs and not as efficient as ASICs. So, if you need maximum flexibility or maximum efficiency, you might want to choose a GPU or an ASIC instead.

Edge vs cloud AI

A side note to make here is that there are two very distinct fields in which AI chips are used: edge AI and cloud AI.

Cloud AI is a type of AI that is performed on powerful servers in remote data centers. This is the most common way in which AI is used today, as it allows organizations to pool resources and access a vast amount of computing power.

Edge AI, by contrast, describes artificial intelligence that is performed on devices at the edge of a network, rather than in the cloud. This can be done for a variety of reasons, such as reducing latency or saving bandwidth.

Naturally, the choice of AI chip will be different for each of these fields. For example, for edge AI applications you might want a chip that is smaller and more power-efficient. Then it can be used in devices with limited space and resources — or where there’s no Internet connection at all.

If, instead, you are looking for a chip to power your cloud AI applications, you might want something that is more powerful and can handle more data. In this case, size and power efficiency might not be as much of a concern, so a good old GPU might be the best choice.

Training vs inference

Another important distinction to make here is between training and inference — the two fundamental processes that are performed by machine learning algorithms. In a nutshell, training is when a model learns how to do something, while inference is when it applies what it has learned.

Say we were training a model to recognize different types of animals. We would feed it a dataset of animal pictures along with labels ("cat," "dog," and so on) so that it learns what each animal looks like. Then, when we show the model a new picture, it performs inference: it recognizes which animal is in the picture.
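
In code, the two phases look roughly like this (a hedged sketch in PyTorch; the tiny linear "classifier" and the random tensors stand in for a real model and a real labeled dataset):

    import torch

    model = torch.nn.Linear(64, 3)           # toy "animal classifier" with 3 classes
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = torch.nn.CrossEntropyLoss()

    # Training: show the model labeled examples and adjust its weights.
    images = torch.randn(32, 64)             # stand-in for a batch of pictures
    labels = torch.randint(0, 3, (32,))      # stand-in for "cat"/"dog"/"bird" labels
    for _ in range(100):
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()                      # compute gradients
        optimizer.step()                     # update the weights

    # Inference: the weights are frozen; new data simply flows through the model.
    with torch.no_grad():
        new_image = torch.randn(1, 64)
        prediction = model(new_image).argmax(dim=1)  # e.g. 0 -> "cat"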

Training is usually done on powerful machines in data centers, while inference is often performed on devices at the edge of a network. This is because training requires a lot of data and computing power, while inference can be done with fewer resources.

Inference, in turn, is much more sensitive to latency — the time it takes for a model to process an input and give an output. This is why edge AI is often used for applications where low latency is critical, such as autonomous vehicles or augmented reality.

That’s why you might want to choose a different type of AI chip for training than for inference. For training, you might want something powerful that can handle a lot of data, such as a GPU. For inference, you can then use a smaller, more power-efficient chip, such as an ASIC; before committing to an ASIC design, you can also prototype the same neural network on FPGAs for field-testing.
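
One common workflow (a sketch that assumes a PyTorch-based stack; the file name and the placeholder model are made up) is to train on a data-center GPU and then export the frozen network to a portable format such as ONNX, which many FPGA and ASIC vendor toolchains accept as input for their own compilers:

    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = torch.nn.Linear(64, 3).to(device)  # placeholder for a real, trained network

    # ... the training loop from the previous sketch would run here, on the GPU ...

    # Export the trained model for deployment on an edge accelerator.
    model.eval()
    example_input = torch.randn(1, 64, device=device)
    torch.onnx.export(model, example_input, "animal_classifier.onnx")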

Which AI chip is right for you?

Now that we know what an AI chip is, how it works, and what types of AI chips are out there, we can summarize the key takeaways in a few short bullet points:

  • If you are looking for a chip to power your cloud AI applications, you might want something that is more powerful and can handle more data. In this case, size and power efficiency might not be as much of a concern, so a good old GPU might be the best choice.
  • If you are looking for a chip to power your edge AI applications, you might want something that is smaller and more power-efficient. Then it can be used in devices with limited space and resources — or where there’s no Internet connection at all.
  • If you want a chip that is more flexible than an ASIC but not as large and expensive as a GPU, you might want to consider an FPGA. However, keep in mind that they are neither as efficient as ASICs nor as flexible as GPUs.
  • For training, you might want something that is more powerful and can handle more data, such as a GPU. Then, for inference, you can use a smaller and more power-efficient chip, such as an ASIC.

But of course, there is no one-size-fits-all answer here, nor is it strictly an either-or choice.

We hope this article helped you get your head around the basics of AI chips. If you want to learn more about AI chips, edge computing and artificial intelligence, check out our other articles on the topic.
