This approach greatly reduces instruction-decoding overhead, enabling the Tensor Processor Unit (TPU) to pack frequently used data in SRAM to minimise usage inefficient external memory while computing more than 16 000 multiplyaccumulate (MAC) operations per cycle. The result is performance of up to 50 billion operations per second (TOPS). The internal vector engine can also handle floating-point data, allowing it to perform RNN and mixed precision calculations.

EDGED TENSOR PROCESSING

EDGED
Normally NN acceleration requires a tradeoff between minimum latency and maximum performance. When even just a small percentage of the workload requires milliseconds responses, a traditional GPU's performance quickly degrades. This means deploying more racks, more power, and more maintenance costs, excessive power consumption. With EDGE TPU, there is no tradeoff. Responsiveness and performance go hand in hand, providing agility and enables high performance in restricted power supply applications.
EDGED has taken a new architectural approach to accelerating neural networks. Instead of creating a small programmable core and replicating it dozens or hundreds of times (such as e.g. Graphcore), we designed a single computation block with matrix and vector operations units.
LOW LATENCY FOR COMPUTE-INTENSIVE TASKS
Ref. E
UNIT IP CORE

EDGED TPU FAMILY FOR INFERENCE

EDGED TPU IP Core Customizable IP block of tensor accelerator, suitable for manufacturing in data systems
REFERENCE DESIGN CHIP H
Processor for high performance low latency processing systems
REFERENCE DESIGN CHIP E System on a chip for embedded systems with general purpose processor and EDGED TPU

OUR SOLUTIONS

Best in class chip performance @28nm, TPU of 40 TOPS, energy efficiency of 2 TOPS per Watt
Best in class throughput on given latency
Seamless compatibility with popular open source NN frameworks/libraries, such as TensorFlow, Keras, ONNX
Optional h/w support of recurrent DNN (LSTM, GRU)
28nm Silicon-proven architectures and design

APPLICATION SCOPE

Self-driving, unmanned vehicles and robotic control systems
Automatic guidance systems
Pattern recognition systems
Decision support systems (pattern recognition, forecasting, risk assessment, etc.)
Voice recognition and speech-to-text conversion, speech synthesis, keyword spotting
Image enhancement
Identification of abnormal situations in complex systems
Dark data based forecasting

EDGED TPU IP CORE

Fully parametric character IP block of EDGED TPU make possible deep optimization of performance per square nm and performance per watt characteristics based on customer's specific computational requirements, i.e. neural network topology and application specific requirements (bandwidth & latency)

Silicon-proven architectures and design @28nm HPC+.

Linley Microprocessor Report available.
Conv layers
Fully-connected, Convolution (2d/3d, Depth-wise separable, Dilated conv, Deconv)
RNNs
LSTM, GRU
ReLU-based functions
ReLU, Leaky ReLU (PReLU), ReLU6, HardTanh
Non-ReLU-based functions
Sigmoid, Tanh, SoftMax
Pooling modes
Average, Max
Other
Residual (Skip-connections), Attention, Concatenation
Tensor Transforms
Depth2Space, Space2Depth, Transpose

REFERENCE CHIP H

EDGED reference design chip H is a processor for low latency high intensity inference with PCIe host processor interface. Reference implementation is presented by a PCIe board. A typical solution for cloud is presented by a PCIe expansion board compatible with the weight and size specifications of server rack solutions. Sample board with EDGED TPU with PCIe-card form factor (half-length, half-height)
Number of cores
>16.000
Clock rate
1000 MHZ (variable)
Technology
28 nm HPC+
Power
Max. 50 W

REFERENCE CHIP E

EDGED TPU E is on reference SoC for embedded applications with a dual-core superscalar microprocessor, tensor accelerator EDGED TPU and a large set of interfaces for connecting external devices.

DISCOVER HOW THE EDGED TPU IP CORE CAN CREATE BETTER USER EXPERIENCE FOR YOUR PRODUCTS

GET MORE INFO

© EDGED. All rights reserved
GET MORE INFO