# Renzym Machine Learning Accelerator

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) [![UPRJ_CI](https://github.com/efabless/caravel_project_example/actions/workflows/user_project_ci.yml/badge.svg)](https://github.com/efabless/caravel_project_example/actions/workflows/user_project_ci.yml) [![Caravel Build](https://github.com/efabless/caravel_project_example/actions/workflows/caravel_build.yml/badge.svg)](https://github.com/efabless/caravel_project_example/actions/workflows/caravel_build.yml)

This repo contains a design to accelerate inference in machine learning problems.
More specifically, the accelerator offloads and accelerates convolution and max pooling operations.
The top-level wrapper has eleven identical REN_CONV_TOP cores that are R/W accessible from the Wishbone master interface.
The main idea is that a processor that wants to offload work writes three rows of an image, and a set of kernels to apply to them, into the accelerator's internal memory over the Wishbone interface, then configures the accelerator and kickstarts it.
The accelerator then performs the configured operations (convolution and max pooling), writes the results to the result RAM, and asserts a done signal that is readable over the Wishbone interface.
The processor can poll the done signal and read the results back from the result RAM.
Since there are eleven cores, the processor can software-pipeline the execution to further reduce the time spent in calculation.

![Block Diagram](./docs/source/_static/Top-level.PNG)
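The offload sequence described above can be sketched from the processor's side as follows. This is a minimal illustration, not the project's driver: the `bus` object with `read`/`write` methods, the `offload` function, the 4-byte word spacing, and the placement of Register 0 at offset 0x0 are assumptions. The base address, RAM offsets, and Start/Done bit positions are taken from the memory map in this README. `FakeBus` is a hypothetical stand-in that reports Done as soon as Start is written, so the flow can be exercised without hardware.

```python
CORE0_BASE = 0x30000000   # Core 0 base address (from the memory map)
REG0       = 0x000        # Control/Status Register 0 (offset is an assumption)
IMAGE_RAM  = 0x100
KERNEL_RAM = 0x200
RESULT_RAM = 0x300
DONE_BIT   = 1 << 0
START_BIT  = 1 << 2


def offload(bus, base, image_words, kernel_words, config, n_results,
            max_polls=10_000):
    """Write data, configure, start, poll Done, then read back the results."""
    for i, word in enumerate(image_words):
        bus.write(base + IMAGE_RAM + 4 * i, word)
    for i, word in enumerate(kernel_words):
        bus.write(base + KERNEL_RAM + 4 * i, word)
    for offset, value in config.items():        # e.g. {0x4: reg1, 0x8: reg2}
        bus.write(base + offset, value)
    bus.write(base + REG0, START_BIT)            # kickstart the core
    for _ in range(max_polls):
        if bus.read(base + REG0) & DONE_BIT:
            break
    else:
        raise TimeoutError("accelerator did not assert Done")
    # Result RAM holds one result byte per word.
    return [bus.read(base + RESULT_RAM + 4 * i) & 0xFF
            for i in range(n_results)]


class FakeBus:
    """Hypothetical bus model: sets Done immediately when Start is written."""

    def __init__(self):
        self.mem = {}

    def write(self, addr, value):
        self.mem[addr] = value
        if addr & 0xFFF == REG0 and value & START_BIT:
            self.mem[addr] = value | DONE_BIT

    def read(self, addr):
        return self.mem.get(addr, 0)
```

To software-pipeline across cores, the same steps could be issued per core with a different `base`, with the start and poll steps split so that one core computes while the next is being loaded.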

## REN_CONV_TOP Core
Each REN_CONV_TOP core consists of a register file for configuration and status writes/reads, a REN_CONV engine, and the associated image, kernel, and result RAMs.
The REN_CONV engine computes a 3xN convolution, where N is configurable. The block starts working when the start signal is asserted. The address generation unit (AGU) generates addresses for the image and kernel RAMs, and the data read from the RAMs is forwarded to the data path, which multiplies and accumulates to calculate the convolution result. The convolution result is forwarded to a bypassable max pool block. If enabled, the max pool block forwards the maximum of every pair of values from the convolution output. If disabled, data from the convolution is forwarded as is. The output of the max pool block is written to the result RAM.
Configurations and RAMs are kept external to the REN_CONV engine so that this generic block can multiplex between different configurations and data sets. In future implementations, multi-pass operations can be designed to support reuse of outputs as inputs to the next pass without disturbing the CPU. The external config block also ensures that there can be different such blocks for different interfaces (e.g. AXI, Wishbone) without affecting the rest of the design.

![Detailed Diagram](./docs/source/_static/top.PNG)
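As a behavioural illustration of the data path described above, here is a pure-software model of a 3xN convolution followed by optional pairwise max pooling. It is a sketch under assumptions, not a bit-accurate model of the RTL: the function names are hypothetical, values are plain Python integers with no accumulator width, shift, or overflow handling, and "every pair" is taken to mean consecutive, non-overlapping pairs, with an unpaired trailing value dropped.

```python
def conv3xn(image_rows, kernel, stride=1):
    """Slide a 3xN kernel across three image rows, multiply-accumulating
    every window into one output value."""
    n = len(kernel[0])
    cols = len(image_rows[0])
    out = []
    for c in range(0, cols - n + 1, stride):
        acc = 0
        for r in range(3):
            for k in range(n):
                acc += image_rows[r][c + k] * kernel[r][k]
        out.append(acc)
    return out


def maxpool_pairs(values):
    """Forward the maximum of every consecutive pair of values."""
    return [max(values[i], values[i + 1])
            for i in range(0, len(values) - 1, 2)]


def ren_conv_model(image_rows, kernel, stride=1, maxpool=True):
    """Convolution followed by the bypassable max pool stage."""
    conv = conv3xn(image_rows, kernel, stride)
    return maxpool_pairs(conv) if maxpool else conv


rows = [[1, 2, 3, 4, 5]] * 3
kern = [[1, 0, 1]] * 3
print(ren_conv_model(rows, kern, maxpool=False))  # [12, 18, 24]
print(ren_conv_model(rows, kern, maxpool=True))   # [18]
```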

## Memory Map

The following table gives the address map for cores 0 to 3.

| **Name** | **Address** | **Accessibility** | **Description** |
| --- | --- | --- | --- |
| **Core 0** |
| Config/Status | 0x30000000-0x3000000C | RW | Control/Status Registers |
| Image RAM | 0x30000100-0x300001FC | W | Image data, 3 bytes per word |
| Kernel RAM | 0x30000200-0x300002FC | W | Kernel data, 3 bytes per word |
| Result RAM | 0x30000300-0x300003FC | R | Result data, 1 byte per word |
| **Core 1** |
| Config/Status | 0x31000000-0x3100000C | RW | Control/Status Registers |
| Image RAM | 0x31000100-0x310001FC | W | Image data, 3 bytes per word |
| Kernel RAM | 0x31000200-0x310002FC | W | Kernel data, 3 bytes per word |
| Result RAM | 0x31000300-0x310003FC | R | Result data, 1 byte per word |
| **Core 2** |
| Config/Status | 0x32000000-0x3200000C | RW | Control/Status Registers |
| Image RAM | 0x32000100-0x320001FC | W | Image data, 3 bytes per word |
| Kernel RAM | 0x32000200-0x320002FC | W | Kernel data, 3 bytes per word |
| Result RAM | 0x32000300-0x320003FC | R | Result data, 1 byte per word |
| **Core 3** |
| Config/Status | 0x33000000-0x3300000C | RW | Control/Status Registers |
| Image RAM | 0x33000100-0x330001FC | W | Image data, 3 bytes per word |
| Kernel RAM | 0x33000200-0x330002FC | W | Kernel data, 3 bytes per word |
| Result RAM | 0x33000300-0x330003FC | R | Result data, 1 byte per word |
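The listed addresses follow a regular pattern: each core's base is 0x01000000 above the previous one, and the four regions sit at fixed offsets from that base. A small helper capturing this pattern might look like the sketch below. The names `region_address`, `REGIONS`, and `LISTED_CORES` are hypothetical, 4-byte word spacing is assumed, and the helper deliberately covers only cores 0 to 3, since no addresses are given here for the other cores.

```python
CORE0_BASE = 0x30000000      # base address of Core 0
CORE_STRIDE = 0x01000000     # spacing between the listed cores
LISTED_CORES = range(4)      # only cores 0 to 3 appear in the table

# (first, last) byte offsets of each region, relative to the core base
REGIONS = {
    "config": (0x000, 0x00C),
    "image":  (0x100, 0x1FC),
    "kernel": (0x200, 0x2FC),
    "result": (0x300, 0x3FC),
}


def region_address(core, region, word=0):
    """Return the byte address of 32-bit word `word` in `region` of `core`."""
    if core not in LISTED_CORES:
        raise ValueError(f"core {core} is not listed in the address table")
    first, last = REGIONS[region]
    offset = first + 4 * word
    if word < 0 or offset > last:
        raise IndexError(f"word {word} is outside the {region} region")
    return CORE0_BASE + core * CORE_STRIDE + offset


print(hex(region_address(2, "kernel", 1)))   # 0x32000204
```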

### Control/Status Register 0 bit fields

| 31-8 | 3 | 2 | 1 | 0 |
| --- | --- | --- | --- | --- |
| Reserved | Soft Reset | Start | Overflow | Done |

Register 0 bit-field descriptions:

| **Bit Field** | **Description** |
| --- | --- |
| Done | Status bit showing completion of the operation |
| Overflow | Sticky status bit, set on accumulator overflow |
| Start | Control bit. Set to kickstart the operation |
| Soft Reset | Resets the AGU and data path registers |
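The Register 0 bits can be given names in software so that control writes and status reads avoid magic numbers. The following is a minimal sketch using Python's standard `enum.IntFlag`; the class name `Reg0` and the helper `decode_status` are hypothetical, and only the four bit positions listed above are modelled.

```python
from enum import IntFlag


class Reg0(IntFlag):
    """Bit positions of Control/Status Register 0."""
    DONE = 1 << 0
    OVERFLOW = 1 << 1
    START = 1 << 2
    SOFT_RESET = 1 << 3


def decode_status(value):
    """Extract the two status bits from a raw Register 0 read."""
    flags = Reg0(value & 0xF)
    return {
        "done": Reg0.DONE in flags,
        "overflow": Reg0.OVERFLOW in flags,
    }


print(decode_status(0b0011))   # {'done': True, 'overflow': True}
print(int(Reg0.START))         # 4, the value to write to kickstart a core
```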

### Control/Status Register 1 bit fields

| 31-24 | 18-16 | 15-8 | 2-0 |
| --- | --- | --- | --- |
| Stride | Kerns | Cols | Kern Cols |

Register 1 bit-field descriptions:

| **Bit Field** | **Description** |
| --- | --- |
| Kern Cols | Number of kernel columns minus 1 |
| Cols | Number of image columns minus 1 |
| Kerns | Number of kernels minus 1 |
| Stride | Stride during convolution |
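Because three of the four fields are stored as "count minus 1", it is easy to be off by one when building the Register 1 value by hand. The sketch below packs the word from natural counts and checks that each field fits its width. The function name `pack_reg1` is hypothetical, and writing Stride as a raw value (without a minus-one encoding) is an assumption based on the description above.

```python
def pack_reg1(kern_cols, cols, kerns, stride):
    """Build the Register 1 word from natural counts.

    kern_cols, cols and kerns are given as actual counts; the minus-one
    encoding is applied here. stride is written as-is (assumption).
    """
    fields = (
        # (name, encoded value, lowest bit, width in bits)
        ("kern_cols", kern_cols - 1, 0, 3),
        ("cols", cols - 1, 8, 8),
        ("kerns", kerns - 1, 16, 3),
        ("stride", stride, 24, 8),
    )
    word = 0
    for name, value, shift, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << shift
    return word


# 3-column kernels, 28-column image rows, 4 kernels, stride 1
print(hex(pack_reg1(kern_cols=3, cols=28, kerns=4, stride=1)))  # 0x1031b02
```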

### Control/Status Register 2 bit fields

| 31-21 | 20-18 | 17 | 16 | 11-8 | 7-0 |
| --- | --- | --- | --- | --- | --- |
| Reserved | Mask | Kern Address Mode | Shift | Result Cols |

Register 2 bit-field descriptions:

| **Bit Field** | **Description** |
| --- | --- |
| Result Cols | Number of result columns minus 1 |
| Shift | The accumulator's result can be shifted to support different Q-formats |
| Kern Address Mode | In mode 0 the next kernel starts at 4; in mode 1, at 8 |
| Mask | Convolution rows can be masked to support non-3xN convolutions |
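A generic field-insert helper keeps Register 2 updates readable. The sketch below sets only the fields whose bit ranges can be read directly from the layout (Result Cols in bits 7-0, Shift in bits 11-8, Mask in bits 20-18). Kern Address Mode is left out because its exact bit position is not pinned down here. The names `set_field` and `pack_reg2` are hypothetical, and the polarity of the Mask bits is not assumed; the value is written as given.

```python
def set_field(word, hi, lo, value):
    """Return `word` with bits hi..lo replaced by `value`."""
    width = hi - lo + 1
    if not 0 <= value < (1 << width):
        raise ValueError(f"value {value} does not fit in bits {hi}-{lo}")
    mask = ((1 << width) - 1) << lo
    return (word & ~mask) | (value << lo)


def pack_reg2(result_cols, shift, mask):
    """Build a Register 2 word from the result column count, shift and mask."""
    word = 0
    word = set_field(word, 7, 0, result_cols - 1)   # stored as count minus 1
    word = set_field(word, 11, 8, shift)
    word = set_field(word, 20, 18, mask)
    return word


print(hex(pack_reg2(result_cols=13, shift=7, mask=0b111)))  # 0x1c070c
```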
93