
2021 Deployment of Deep Learning Models on FPGAs

Update Time: 2021-06-08 11:02:36

Today we introduce methods and platforms for deploying deep learning models on FPGAs. Jotrin has recently been working on an open-source project and will publish several tutorials on FPGA deployment from scratch, covering low-level development, quantized inference of models, and more. Because there is so much material involved, we have split it into separate articles.

[Figure: Deployment of deep learning models on FPGAs]

FPGAs and the "Maze" Analogy

An FPGA is a field-programmable gate array; it is extremely flexible, and that "field-programmable" part is exactly what makes it so attractive. If that still sounds abstract, compare it with an ARM chip. Think of an ARM chip as a finished maze: it has many entrances, exits, and hidden doors along the way, and programming it merely selects one of the paths that already exist; the layout itself is fixed and cannot be changed. An FPGA, by contrast, hands you a big empty "box" full of movable "partitions": if you want a maze, you build it yourself, laying out whatever paths you like, much as in Minecraft, except that the "blocks" are logic gates of various kinds. This means an FPGA can implement not only peripheral circuits but even a CPU, which is very cool, although the flip side is that development is considerably harder. This ability to build application-specific circuits is exactly what suits FPGAs to accelerating AI algorithms. Because they can implement dedicated circuitry, FPGAs are often used for signal processing alongside a DSP or ARM core, and FPGAs and CPLDs were later also used to build cryptocurrency "mining machines".
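The "programmable logic gates" in the analogy are, concretely, look-up tables (LUTs). A toy Python model (purely illustrative, not any vendor's API) shows how the same cell becomes a different gate just by changing its configuration bits:

```python
# A 2-input LUT (look-up table) is the basic building block of FPGA fabric:
# its 4-bit truth table is what gets (re)programmed "in the field".
# Hypothetical minimal model for illustration only.

def make_lut2(truth_table):
    """truth_table[i] is the output for inputs (a, b) where i = a*2 + b."""
    def lut(a, b):
        return truth_table[a * 2 + b]
    return lut

# The same hardware cell becomes AND, OR, or XOR purely by changing its bits:
and_gate = make_lut2([0, 0, 0, 1])
or_gate  = make_lut2([0, 1, 1, 1])
xor_gate = make_lut2([0, 1, 1, 0])

print(and_gate(1, 1), or_gate(0, 1), xor_gate(1, 1))  # 1 1 0
```

Reconfiguring an FPGA amounts to rewriting millions of such truth tables plus the routing between them, which is why the same chip can be a signal processor one day and a neural-network accelerator the next.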

Getting Started A: PYNQ

PYNQ is Python + ZYNQ: using Python for FPGA development. First, a point worth emphasizing: Python has been very popular in recent years and is very powerful, but developing hardware with Python is not the same as actually designing hardware, so don't be misled.

The situation is analogous to the very popular MicroPython: using Python to "develop hardware" still relies on a specific underlying circuit design. Unless you are an expert who modifies the underlying firmware, you are building on top of someone else's design, and if you can modify the underlying layer anyway, you might as well develop directly. This approach is, of course, aimed at beginners; the corresponding development board is shown below.

[Figure: PYNQ development board]

This board is similar to the MicroPython boards we played with before, with all kinds of ready-made packages. Under the hood, ZYNQ combines a dual-core ARM Cortex-A9 processor with an FPGA, and with Python you can develop through Jupyter, which is very convenient, so the platform is well suited to beginners.

Running BNNs (binarized neural networks) on an FPGA works very well: "Tests of the PYNQ-Z1 on different machine-learning datasets show that on MNIST it can classify 168,000 images per second with a latency of 102 microseconds and 98.4% accuracy; on CIFAR-10, SVHN, and GTSRB it can classify 1,700 images per second with a latency of 2.2 ms and accuracies of 80.1%, 96.69%, and 97.66% respectively, all while keeping system power consumption at around 2.5 W."

[Figure: PYNQ-Z1 BNN benchmark results]
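Why are BNNs such a good fit for FPGA fabric? With weights and activations restricted to ±1, a multiply-accumulate collapses into XNOR plus popcount, operations that cost almost nothing in logic gates. A pure-Python sketch of that core trick (bit packing and encoding conventions are our own, for illustration):

```python
# Binarized dot product: encode +1 as bit 1 and -1 as bit 0. Then two values
# multiply to +1 exactly when their bits agree, so the dot product becomes
# XNOR + popcount - the operation FPGAs execute almost for free.

def bin_dot(x_bits, w_bits, n):
    """Dot product of two n-element {-1,+1} vectors packed as integers."""
    xnor = ~(x_bits ^ w_bits) & ((1 << n) - 1)  # 1 where signs agree
    matches = bin(xnor).count("1")              # popcount
    return 2 * matches - n                      # agreements minus disagreements

# x = [+1, -1, +1, +1] -> 0b1011, w = [+1, +1, -1, +1] -> 0b1101
print(bin_dot(0b1011, 0b1101, 4))  # 0
```

A single FPGA LUT plus a popcount adder tree replaces what would otherwise be a chain of multipliers, which is where the throughput and power numbers quoted above come from.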

It really is convenient. Let's look at some code; first we load the classifiers.

import bnn

# hardware (FPGA) and pure-software versions of the same binarized CNV network
hw_classifier = bnn.CnvClassifier(bnn.NETWORK_CNVW1A1, 'cifar10', bnn.RUNTIME_HW)
sw_classifier = bnn.CnvClassifier(bnn.NETWORK_CNVW1A1, 'cifar10', bnn.RUNTIME_SW)

Now run a classification test:

from IPython.display import display
from PIL import Image

im = Image.open('car.png')
im.thumbnail((64, 64), Image.ANTIALIAS)
display(im)

car_class = hw_classifier.classify_image_details(im)

print("{: >10}{: >13}".format("[CLASS]", "[RANKING]"))
for i in range(len(car_class)):
    print("{: >10}{: >10}".format(hw_classifier.classes[i], car_class[i]))

It also supports matplotlib for data visualization:

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

x_pos = np.arange(len(car_class))
fig, ax = plt.subplots()
ax.bar(x_pos - 0.25, car_class / 100.0, 0.25)
ax.set_xticks(x_pos)
ax.set_xticklabels(hw_classifier.classes, rotation='vertical')



This is just Python, which is really very convenient, and image handling is compatible with Pillow. The documentation gives several image-recognition examples you can look at. One day, Achai will give you a tutorial on building on PYNQ from scratch, including quantized inference of the model.

Getting Started B: DPU

The DPU is a programmable engine for convolutional neural networks. The unit contains a register configuration module, a data controller module, and a convolution computation module. The powerful PYNQ also supports the DPU: if you go this route, just look at the Python API, which can be used directly on a ZCU104 development board. Many experts work with ZYNQ directly at the hardware level, but that difficulty is really not suitable for beginners; once the current project is less busy, Achai will write a complete tutorial on this for you.

[Figure: DPU]
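What the DPU's convolution computation module actually accelerates can be written out as the naive reference loop below (pure Python, single channel, no padding or stride, purely illustrative; the real engine tiles and pipelines these loops in fabric):

```python
# Naive 2D convolution (really cross-correlation, as in most DNN frameworks):
# slide the kernel over the image and multiply-accumulate at each position.
# This is the loop nest a DPU-style engine parallelizes in hardware.

def conv2d(image, kernel):
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * (iw - kw + 1) for _ in range(ih - kh + 1)]
    for i in range(ih - kh + 1):
        for j in range(iw - kw + 1):
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    acc += image[i + u][j + v] * kernel[u][v]
            out[i][j] = acc
    return out

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
k = [[1, 0],
     [0, 1]]  # sums the main diagonal of each 2x2 window
print(conv2d(img, k))  # [[6.0, 8.0], [12.0, 14.0]]
```

The register configuration module sets up parameters like these shapes, the data controller streams the image and kernel tiles in, and the compute module runs the inner multiply-accumulate loops in parallel.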

First clone the project and compile it:

git clone

cd DPU-PYNQ/upgrade


Install pynq-dpu:

pip install pynq-dpu

Fetch the example notebooks:

pynq get-notebooks pynq-dpu -p .
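Inside those notebooks, inference runs through the pynq_dpu Python API. The commented calls in the sketch below assume a ZCU104-class board, the pynq_dpu package, and a compiled model.xmodel (all assumptions, not runnable on a PC); the post-processing of the DPU's raw outputs, however, is plain NumPy and runs anywhere:

```python
# Board-side flow (assumptions - requires a DPU-capable board):
#
#   from pynq_dpu import DpuOverlay
#   overlay = DpuOverlay("dpu.bit")      # load the DPU bitstream
#   overlay.load_model("model.xmodel")   # load the compiled network
#
# The DPU returns raw logits; the usual host-side post-processing is a
# softmax followed by argmax, shown here in plain NumPy.
import numpy as np

def softmax(logits):
    z = logits - np.max(logits)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])  # placeholder logits for three classes
probs = softmax(logits)
print(int(np.argmax(probs)))  # 0
```

The split reflects the platform's design: the heavy convolutions run on the DPU in fabric, while lightweight pre/post-processing stays in Python on the ARM cores.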

The DPU hardware design itself has to be done on your own computer: after adding the DPU module to the project, you compile it to produce the overlay.

Jotrin will cover the full ZYNQ + DPU development flow later in a separate article, because there is too much involved.

Supported Framework: Paddle-Lite

Since Python works, the Paddle-Lite inference framework is naturally feasible as well, and Baidu also offers a dedicated deployment kit, EdgeBoard. EdgeBoard is a compute card built around Xilinx's Zynq UltraScale+ MPSoC family, which integrates an ARM processor, a GPU, and an FPGA on one chip, so it combines multi-core processing power and hardware video-stream decoding with the programmability of an FPGA.
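Whatever framework runs on top, feeding an image into such an inference kit involves the same preprocessing step: converting an HWC uint8 image into a normalized float32 CHW tensor. A minimal NumPy sketch (the mean/scale values and shapes are made-up placeholders, not EdgeBoard's actual configuration):

```python
# Typical input preprocessing before handing an image to an embedded
# inference framework: HWC uint8 -> normalized float32 CHW.
# mean/scale here are placeholder values for illustration.
import numpy as np

def preprocess(img_hwc, mean=(127.5, 127.5, 127.5), scale=1 / 127.5):
    x = img_hwc.astype(np.float32)
    x = (x - np.array(mean, dtype=np.float32)) * scale  # roughly [-1, 1]
    return np.ascontiguousarray(x.transpose(2, 0, 1))   # HWC -> CHW

img = np.zeros((224, 224, 3), dtype=np.uint8)  # placeholder image
x = preprocess(img)
print(x.shape)  # (3, 224, 224)
```

Steps like this run on the ARM cores, while the FPGA fabric handles the convolutions, which is the division of labor all of these platforms share.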

Having introduced these options, you can see that getting started is not difficult; what is difficult is the underlying hardware design, algorithm acceleration, quantization, and so on. These platforms package all of that up, but for real development we still have to dig into the lower layers bit by bit.

