Renzym Machine Learning Accelerator


This repo contains a design that accelerates inference in machine learning workloads. More specifically, the accelerator offloads and accelerates convolution and max pooling operations. The top-level wrapper has eleven identical REN_CONV_TOP cores that are readable and writable from the Wishbone master interface. The main idea is that a processor that wants to offload work uses the Wishbone interface to write three rows of an image, together with the set of kernels to be applied to them, into the accelerator's internal memory; it then configures the accelerator and kickstarts it. The accelerator performs the configured operations (convolution and max pool), writes the results to the result RAM, and asserts a done signal that is readable over the Wishbone interface. The processor can poll the done signal and then read the results back from the result RAM. Because there are eleven cores, the processor can software-pipeline execution across them to further reduce the total computation time.

Block Diagram

REN_CONV_TOP Core

Each REN_CONV_TOP core consists of a register file for configuration writes and status reads, a REN_CONV engine, and the associated image, kernel and result RAMs. The REN_CONV engine computes a 3xN convolution, where N is configurable. The engine starts working when the start signal is asserted. The address generation unit (AGU) generates addresses for the image and kernel RAMs, and the data read from the RAMs is forwarded to the data path, which multiplies and accumulates to calculate the convolution result. The convolution result is forwarded to a bypassable max pool block. If enabled, the max pool block forwards the maximum of every pair of values from the convolution output. If disabled, data from the convolution is forwarded as is. The output of the max pool block is written to the result RAM. Configuration registers and RAMs are kept external to the REN_CONV engine so that this generic block can be multiplexed between different configurations and data sets. In future implementations, multi-pass operations could be designed to reuse the outputs of one pass as inputs to the next without involving the CPU. Keeping the config block external also means that different versions of it can be provided for different interfaces (e.g. AXI, Wishbone) without affecting the rest of the design.
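
The data flow described above (multiply-accumulate over a 3xN window, then an optional pairwise max) can be modelled in a few lines of Python. This is only a behavioural sketch, not the RTL: the function names `conv3xn` and `maxpool_pairs` are hypothetical, the kernel is applied without flipping, only fully overlapping windows are produced, and a trailing unpaired value is dropped by the max pool. None of these details is confirmed by the hardware description.

```python
from typing import List, Sequence


def conv3xn(image: Sequence[Sequence[int]],
            kernel: Sequence[Sequence[int]],
            stride: int = 1) -> List[int]:
    """Slide a 3xN kernel along three image rows; return one row of sums."""
    if len(image) != 3 or len(kernel) != 3:
        raise ValueError("image and kernel must each have exactly 3 rows")
    if stride < 1:
        raise ValueError("stride must be at least 1")
    cols, kern_cols = len(image[0]), len(kernel[0])
    results = []
    # One output per window position; each output is a multiply-accumulate
    # over a 3 x kern_cols window (assumption: no kernel flip).
    for start in range(0, cols - kern_cols + 1, stride):
        acc = 0
        for r in range(3):
            for c in range(kern_cols):
                acc += image[r][start + c] * kernel[r][c]
        results.append(acc)
    return results


def maxpool_pairs(values: Sequence[int], enabled: bool = True) -> List[int]:
    """Forward the max of every pair of values, or pass through if disabled."""
    if not enabled:
        return list(values)
    # Assumption: a trailing unpaired value is dropped.
    return [max(values[i], values[i + 1]) for i in range(0, len(values) - 1, 2)]


image = [[1, 2, 3, 4, 5, 6],
         [0, 1, 0, 1, 0, 1],
         [2, 2, 2, 2, 2, 2]]
kernel = [[1, 1, 1],
          [0, 1, 0],
          [1, 0, 0]]
conv = conv3xn(image, kernel)   # [9, 11, 15, 17]
pooled = maxpool_pairs(conv)    # [11, 17]
```

A model like this can serve as a golden reference when checking result RAM contents in simulation, provided its assumptions are first aligned with the actual engine.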

Detailed Diagram

Memory Map

The address map for the module is given in the following table.

| Name | Address | Access | Description |
| --- | --- | --- | --- |
| Core 0 | | | |
| Config/Status | 0x30000000-0x3000000C | RW | Control/Status Registers |
| Image RAM | 0x30000100-0x300001FC | W | Image data, 3 bytes per word |
| Kernel RAM | 0x30000200-0x300002FC | W | Kernel data, 3 bytes per word |
| Result RAM | 0x30000300-0x300003FC | R | Result data, 1 byte per word |
| Core 1 | | | |
| Config/Status | 0x31000000-0x3100000C | RW | Control/Status Registers |
| Image RAM | 0x31000100-0x310001FC | W | Image data, 3 bytes per word |
| Kernel RAM | 0x31000200-0x310002FC | W | Kernel data, 3 bytes per word |
| Result RAM | 0x31000300-0x310003FC | R | Result data, 1 byte per word |
| Core 2 | | | |
| Config/Status | 0x32000000-0x3200000C | RW | Control/Status Registers |
| Image RAM | 0x32000100-0x320001FC | W | Image data, 3 bytes per word |
| Kernel RAM | 0x32000200-0x320002FC | W | Kernel data, 3 bytes per word |
| Result RAM | 0x32000300-0x320003FC | R | Result data, 1 byte per word |
| Core 3 | | | |
| Config/Status | 0x33000000-0x3300000C | RW | Control/Status Registers |
| Image RAM | 0x33000100-0x330001FC | W | Image data, 3 bytes per word |
| Kernel RAM | 0x33000200-0x330002FC | W | Kernel data, 3 bytes per word |
| Result RAM | 0x33000300-0x330003FC | R | Result data, 1 byte per word |
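
The map follows a regular pattern: each core's base sits 0x01000000 above the previous one, and the regions within a core start at fixed offsets. A small helper can compute word addresses from that pattern. This is a sketch under assumptions: the names `word_address`, `REGION_OFFSETS` and `REGION_WORDS` are hypothetical, word counts are derived from the listed address ranges with 4-byte word spacing, and `num_cores` defaults to the four cores listed in the table.

```python
CORE0_BASE = 0x3000_0000    # base address of Core 0
CORE_STRIDE = 0x0100_0000   # distance between consecutive core bases

# Region start offsets and sizes in 32-bit words, derived from the ranges
# in the table (e.g. 0x100-0x1FC inclusive is 64 words).
REGION_OFFSETS = {"config": 0x000, "image": 0x100, "kernel": 0x200, "result": 0x300}
REGION_WORDS = {"config": 4, "image": 64, "kernel": 64, "result": 64}


def word_address(core: int, region: str, index: int = 0, num_cores: int = 4) -> int:
    """Return the bus address of word `index` in `region` of core `core`."""
    if not 0 <= core < num_cores:
        raise IndexError(f"core {core} out of range")
    if region not in REGION_OFFSETS:
        raise KeyError(f"unknown region {region!r}")
    if not 0 <= index < REGION_WORDS[region]:
        raise IndexError(f"word {index} out of range for {region}")
    return CORE0_BASE + core * CORE_STRIDE + REGION_OFFSETS[region] + 4 * index


print(hex(word_address(0, "image")))       # 0x30000100
print(hex(word_address(3, "result", 63)))  # 0x330003fc
```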

Bit-field descriptions for Control/Status Register 0

| 31 --- 8 | 3 | 2 | 1 | 0 |
| --- | --- | --- | --- | --- |
| Reserved | Soft Reset | Start | Overflow | Done |

Register 0 bit fields description

| Bit Field | Description |
| --- | --- |
| Done | Status bit showing completion of operation |
| Overflow | Sticky status bit set in case of accumulator overflow |
| Start | Control bit. Set to kickstart operation |
| Soft Reset | Resets AGU and data path registers |
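
As an illustration of how the Start and Done bits fit into the offload sequence (load RAMs, configure, start, poll, read back), here is a sketch for Core 0. Several things are assumptions rather than documented facts: Registers 0, 1 and 2 are taken to sit at offsets 0x0, 0x4 and 0x8 of the Config/Status range, `FakeBus` is a hypothetical stand-in for a Wishbone master that reports Done immediately, and `run_core` is a hypothetical name. Whether Start or Done must be cleared between runs is not covered here.

```python
# Register 0 bit masks: Done = bit 0, Overflow = bit 1, Start = bit 2,
# Soft Reset = bit 3.
REG0_DONE, REG0_OVERFLOW, REG0_START, REG0_SOFT_RESET = 1 << 0, 1 << 1, 1 << 2, 1 << 3

BASE = 0x3000_0000                                       # Core 0
REG0, REG1, REG2 = BASE + 0x0, BASE + 0x4, BASE + 0x8    # assumed register offsets
IMAGE_RAM, KERNEL_RAM, RESULT_RAM = BASE + 0x100, BASE + 0x200, BASE + 0x300


class FakeBus:
    """Hypothetical bus model: stores writes, reports Done right after Start."""

    def __init__(self):
        self.mem = {}

    def write(self, addr, value):
        self.mem[addr] = value & 0xFFFF_FFFF
        if addr == REG0 and value & REG0_START:
            self.mem[REG0] = REG0_DONE   # pretend the core finished instantly

    def read(self, addr):
        return self.mem.get(addr, 0)


def run_core(bus, image_words, kernel_words, reg1, reg2, n_results, max_polls=1000):
    """Load data, configure, start, poll Done, and return the result bytes."""
    for i, word in enumerate(image_words):
        bus.write(IMAGE_RAM + 4 * i, word)
    for i, word in enumerate(kernel_words):
        bus.write(KERNEL_RAM + 4 * i, word)
    bus.write(REG1, reg1)
    bus.write(REG2, reg2)
    bus.write(REG0, REG0_START)          # kickstart the operation
    for _ in range(max_polls):
        status = bus.read(REG0)
        if status & REG0_DONE:
            break
    else:
        raise TimeoutError("core did not assert Done")
    if status & REG0_OVERFLOW:
        raise OverflowError("accumulator overflow reported")
    # Result RAM holds one result byte per word.
    return [bus.read(RESULT_RAM + 4 * i) & 0xFF for i in range(n_results)]
```

With a real bus object exposing the same `read`/`write` methods, the same function could drive hardware or a simulation; running several cores in a staggered fashion is what enables the software pipelining mentioned in the overview.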

Bit-field descriptions for Control/Status Register 1

| 31 --- 24 | 18-16 | 15-8 | 2-0 |
| --- | --- | --- | --- |
| Stride | Kerns | Cols | Kern Cols |

Register 1 bit fields description

| Bit Field | Description |
| --- | --- |
| Kern Cols | No. of columns of kernel - 1 |
| Cols | No. of columns of image - 1 |
| Kerns | No. of kernels - 1 |
| Stride | Stride during convolution |
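
Because three of the four fields are stored as "value minus one", it is easy to get the encoding wrong by hand. The sketch below packs Register 1, reading the layout as Stride in bits 31-24, Kerns in bits 18-16, Cols in bits 15-8 and Kern Cols in bits 2-0. The function name `pack_reg1` is hypothetical, and storing Stride as-is (not minus one) is an assumption based on its description.

```python
def pack_reg1(stride: int, kernels: int, cols: int, kern_cols: int) -> int:
    """Build the 32-bit value for Control/Status Register 1."""
    # (name, encoded value, bit offset, field width in bits)
    fields = (
        ("stride", stride, 24, 8),               # stored as-is (assumption)
        ("kernels - 1", kernels - 1, 16, 3),
        ("cols - 1", cols - 1, 8, 8),
        ("kern_cols - 1", kern_cols - 1, 0, 3),
    )
    word = 0
    for name, value, offset, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} = {value} does not fit in {width} bits")
        word |= value << offset
    return word


# Four 3x3 kernels over a 32-column image with stride 1.
print(hex(pack_reg1(stride=1, kernels=4, cols=32, kern_cols=3)))  # 0x1031f02
```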

Bit-field descriptions for Control/Status Register 2

31 --- 21 | 20-18 | 17 | 16 | 11-8 | 7-0
Reserved | Mask | Kern Address Mode | Shift | Result Cols

Register 2 bit fields description

| Bit Field | Description |
| --- | --- |
| Result Cols | No. of columns of result - 1 |
| Shift | Accumulator's result can be shifted to support different Q-formats |
| Kern Address Mode | In mode 0 the next kernel starts at 4; in mode 1, at 8 |
| Mask | Convolution rows can be masked to support non-3xN convolutions |
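
To see why a shift is needed for Q-formats: multiplying two fixed-point values that each carry f fractional bits yields a product with 2f fractional bits, so shifting the accumulator right by f brings the result back to the input format. The sketch below illustrates the arithmetic only. The helper names `to_q` and `shift_accumulator` are hypothetical, and the shift direction (right, arithmetic) and the absence of rounding or saturation are assumptions, not documented hardware behaviour. The 0-15 range reflects the 4-bit Shift field in bits 11-8.

```python
def to_q(value: float, frac_bits: int) -> int:
    """Encode a real number as a fixed-point integer with `frac_bits` fractional bits."""
    return round(value * (1 << frac_bits))


def shift_accumulator(acc: int, shift: int) -> int:
    """Scale an accumulator value down by `shift` bits (assumed arithmetic right shift)."""
    if not 0 <= shift <= 15:   # Shift is a 4-bit field
        raise ValueError("shift must fit in 4 bits")
    return acc >> shift


FRAC = 4                              # 4 fractional bits for pixels and weights
pixel = to_q(1.5, FRAC)               # 24
weight = to_q(0.5, FRAC)              # 8
acc = pixel * weight                  # 192, now with 8 fractional bits
result = shift_accumulator(acc, FRAC) # 12, back to 4 fractional bits
print(result / (1 << FRAC))           # 0.75
```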