HEHECORE

Overview

This is a small out-of-order RISC-V core written in synthesizable Verilog that supports the RV64IC unprivileged ISA and parts of the privileged ISA, namely M-mode.

Feature List

  • It currently supports the RISC-V I extension
  • It currently supports M-mode
  • It is a double-issue architecture
  • It supports scalar register renaming
  • It currently supports only in-order issue from the issue queue
  • It has a ROB for in-order commitment
  • When an exception or an interrupt occurs, the ROB is responsible for deciding when to trigger a flush
  • It supports dynamic branch prediction (gshare)
  • It supports out-of-order execution
  • It has a non-blocking cache

Block Diagram

Pipeline Stages

architecture.png

1. Fetch

Instructions are fetched from the Icache and pushed into a FIFO queue known as the instruction buffer (fetch buffer). The branch target buffer is also looked up in this stage, redirecting the fetched instructions as necessary.

F0 stage:

PC_gen generates the correct PC. The priority is: *back-end redirect target -> predicted PC -> PC+4*.

Meanwhile, the PC is hashed with the BHR to produce the index for Gshare.

F1 stage:

Send the PC to the Icache as an instruction request.

Compute *pc+4*.

Use the PC to index a BTB entry and get *pc_pred*.

Use the hash value to index Gshare and get *taken or not-taken*.

F2 stage:

The response from the Icache is put into a FIFO instruction buffer.

Send *pc+4* to PC_gen.

In the redirect logic, if Gshare predicts taken, pc_pred is sent to PC_gen. Otherwise, pc_pred is not used.
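The PC selection priority described for the F0 stage can be sketched as a small behavioral model. This is only an illustration in Python, not the project's Verilog; the function name `next_pc` and its parameter names are hypothetical, and `None` is used here to mean "no redirect" or "no BTB hit".

```python
from typing import Optional


def next_pc(pc: int,
            redirect_target: Optional[int],
            gshare_taken: bool,
            pc_pred: Optional[int]) -> int:
    """Pick the next fetch PC using the priority order from the text:
    back-end redirect target -> predicted PC -> PC+4."""
    if redirect_target is not None:
        # A redirect from the back end always wins.
        return redirect_target
    if gshare_taken and pc_pred is not None:
        # Gshare predicts taken and the BTB supplied a target.
        return pc_pred
    # Default: sequential fetch.
    return pc + 4
```

For example, `next_pc(0x100, None, True, 0x200)` selects the predicted target, while the same call with `gshare_taken=False` falls through to `0x104`.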

2. Decode

F3 stage

In this stage, the decoder pulls instructions out of the instruction buffer and generates the appropriate micro-op ($\mu$op) to place into the pipeline.
  • Decodes at most 2 instructions per cycle

3. Issue

F4 stage

The ISA, or “logical”, register specifiers (e.g. x0-x31) are then *renamed* into “physical” register specifiers.

Every instruction is checked for dependencies against the previous instruction to decide whether the two can be double issued.

A $\mu$op at the head of the ROB waits until all of its source registers are ready; it then reads its operands and is issued. This is the beginning of the out-of-order portion of the pipeline.
  • Double issue conditions:
    1. There is no dependency between the two instructions
    2. Both function units are ready
    3. Only arithmetic and load/store instructions can be double issued (excluding conditional, csr, wfi, etc.)
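The three double issue conditions can be expressed as a single predicate. The sketch below is a Python behavioral model under stated assumptions: the `Uop` record, the kind labels `"alu"`, `"load"`, `"store"`, and the function name `can_double_issue` are all hypothetical, and "dependency" is modeled only as a read-after-write hazard on the first instruction's destination register.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed labels for the instruction classes that may be double issued.
DOUBLE_ISSUE_KINDS = {"alu", "load", "store"}


@dataclass
class Uop:
    kind: str                   # e.g. "alu", "load", "store", "branch", "csr"
    rd: Optional[int] = None    # destination register, None if unused
    rs1: Optional[int] = None   # first source register, None if unused
    rs2: Optional[int] = None   # second source register, None if unused


def can_double_issue(first: Uop, second: Uop,
                     fu_ready_first: bool, fu_ready_second: bool) -> bool:
    # Condition 3: only arithmetic and load/store instructions qualify.
    if first.kind not in DOUBLE_ISSUE_KINDS or second.kind not in DOUBLE_ISSUE_KINDS:
        return False
    # Condition 2: both function units must be ready.
    if not (fu_ready_first and fu_ready_second):
        return False
    # Condition 1: the second instruction must not read the first one's
    # destination (x0 never creates a dependency).
    if first.rd not in (None, 0) and first.rd in (second.rs1, second.rs2):
        return False
    return True
```

A pair such as `add x5, x1, x2` followed by `add x6, x5, x3` fails the first condition in this model and would be issued one at a time.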

4. Execute

F5 stage

Instructions enter the Execute stage, where the functional units reside. Issued ALU operations perform their calculations and produce results.

For load and store instructions, the address is calculated first and placed into the FIFO load/store queue. The instruction at the head of the queue sends a request to the Dcache when the Dcache is ready.

5. Writeback

F6 stage

When an instruction completes, its result is written back to the physical registers and the ROB status is updated.

  • About branch prediction: if the instruction is a branch, its result updates Gshare and the GHR. If it was mispredicted, the instruction buffer is flushed and instructions are fetched from the other path.

6. Instruction commit

The Reorder Buffer (ROB) tracks the status of each instruction in the pipeline. When the head of the ROB is not busy, the ROB commits the instruction.

  • Instructions are committed from the ROB in program order; at most 2 instructions can be committed at the same time
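In-order commit with a width of two can be illustrated with a short Python model. This is a sketch, not the RTL: the ROB is represented as a `deque` of dictionaries, and the field names `id` and `finish` as well as the function name `commit` are assumptions made for the example.

```python
from collections import deque

COMMIT_WIDTH = 2  # at most 2 instructions commit per cycle, as stated above


def commit(rob: deque) -> list:
    """Pop finished entries from the ROB head, in program order.

    Stops at the first unfinished entry, so a younger finished
    instruction never commits ahead of an older unfinished one.
    """
    committed = []
    while rob and len(committed) < COMMIT_WIDTH and rob[0]["finish"]:
        committed.append(rob.popleft()["id"])
    return committed
```

Calling `commit` once models one cycle: with three finished entries at the head, the first call retires two of them and the next call retires the third.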

Block Description

Instruction Fetch Unit

F0: PC generation

Select the current PC address. It may come from a jump instruction, from a branch prediction made by the BTB and Gshare, or from an exception. The default value is pc+4.

F1: fetch instruction

Instructions from Icache are put into the instruction buffer.

Pc_gen

Compute pc+4, or use a PC supplied by another source.

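  • content
  • input/output

| signal | I/O | width | description | interaction |
| --- | --- | --- | --- | --- |
| is_req_pc | I | 1 | whether the BTB needs to be written | fu |
| btb_req_pc | I | 32 | PC needed by the BTB | fu |
| btb_predict_target | I | 32 | target PC needed by the BTB | fu |
| prev_pc | I | 32 | PC to Gshare | fu |
| prev_branch_in | I | 1 | whether this PC is a branch or jump | fu |
| prev_taken | I | 1 | whether this branch is taken | fu |
| rd_en | I | 1 | decode is ready and wants an instruction | decoder |
| pc_out | O | 32 | PC out | decoder |
| next_pc_out | O | 32 | PC out plus 4 | decoder |
| instruction_out | O | 32 | instruction at this PC | decoder |
| valid_real_branch | I | 1 | whether the fu gives a valid real branch | fu |
| real_branch | I | 32 | real branch or jump PC | fu |
| trap | I | 1 | trap indication | writeback |
| mret | I | 1 | mret indication | writeback |
| trap_vector | I | 32 | trap vector | csr |
| mret_vector | I | 32 | mret vector | csr |
| stall | I | 1 | whether a stall is needed | hazard |
| invalidate | I | 1 | whether to invalidate | hazard |
| fetch_address | O | 32 | PC address | busio |
| fetch_data | I | 32 | instruction | busio |
| exception_valid_o | O | 1 | exception valid | csr |
| ecause_o | O | 1 | exception cause | csr |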

BTB

LSU.drawio.png

Instructions first go through the BTB. If there is a hit, pc = pc_from_btb; if the match fails, PC = PC + 4. After the execute stage, the BTB is updated.

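Revision History

| Revision Number | Author | Date | Description |
| --- | --- | --- | --- |
| 0.1 | Xinze Wang | 2022.08.10 | init |
| 0.2 | Xinze Wang | 2022.08.18 | update self check |

  • content [x_btb] entry:
  • input/output

| signal | I/O | width | description | interaction |
| --- | --- | --- | --- | --- |
| pc_in | I | 32 | from PC_gen | fetch |
| next_pc_out | O | 32 | =pc_target or =pc_current | fetch |
| token | O | 1 | to instruction buffer | fetch |
| req_pc | I | 32 | update btb | execute |
| predict_target | I | 32 | update btb | execute |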
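The hit/miss behavior and the post-execute update can be modeled in a few lines of Python. This is a minimal sketch under assumptions not stated in the source: a direct-mapped organization, 16 entries, a full-PC tag, and the class and method names (`BTB`, `lookup`, `update`) are all hypothetical.

```python
class BTB:
    """Direct-mapped branch target buffer (behavioral model)."""

    def __init__(self, entries: int = 16):
        self.entries = entries
        # Each slot holds (tag, target) or None when empty.
        self.table = [None] * entries

    def _index(self, pc: int) -> int:
        # Drop the two low bits (4-byte stride assumed), then wrap.
        return (pc >> 2) % self.entries

    def lookup(self, pc: int) -> int:
        """Return the stored target on a hit, otherwise pc + 4."""
        slot = self.table[self._index(pc)]
        if slot is not None and slot[0] == pc:
            return slot[1]
        return pc + 4

    def update(self, req_pc: int, predict_target: int) -> None:
        """Record the resolved target after the execute stage."""
        self.table[self._index(req_pc)] = (req_pc, predict_target)
```

The `update` arguments mirror the `req_pc` and `predict_target` inputs in the table above; a lookup that maps to the same slot with a different tag is treated as a miss.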

Gshare

Instructions first go through Gshare, which gives a prediction. At the execute stage, the predictor is updated with the actual branch outcome.

content [gshare] entry:

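Revision History

| Revision Number | Author | Date | Description |
| --- | --- | --- | --- |
| 0.1 | Qiaowen Yang | 2022.08.10 | init |

  • input/output

| signal | I/O | width | description | interaction |
| --- | --- | --- | --- | --- |
| pc | I | 32 | from PC_gen | fetch |
| cur_pred | O | 1 | give a prediction | fetch |
| prev_branch_in | I | 1 | whether prev instr is branch | execute |
| prev_taken | I | 1 | whether prev instr taken | execute |
| prev_pred | I | 1 | prev instr pred result | execute |
| prev_mispred | O | 1 | whether prev instr mispred | execute |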
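A gshare predictor can be sketched as follows. The source only says that the PC is hashed with the history register; the XOR hash, the 8-bit index, the 2-bit saturating counters, and the names `Gshare`, `predict`, and `update` are assumptions based on the textbook form of the scheme, not details taken from the RTL.

```python
class Gshare:
    """Textbook gshare: PC XOR global history indexes 2-bit counters."""

    def __init__(self, index_bits: int = 8):
        self.mask = (1 << index_bits) - 1
        self.history = 0                          # global branch history
        self.counters = [1] * (1 << index_bits)   # start weakly not-taken

    def _index(self, pc: int) -> int:
        return ((pc >> 2) ^ self.history) & self.mask

    def predict(self, pc: int) -> bool:
        """True means predicted taken (counter value 2 or 3)."""
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        """Train with the resolved outcome, then shift it into history.

        This simple model assumes no other branch was resolved between
        predict() and update(), so the index is recomputed here.
        """
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask
```

After a branch at one PC has been taken repeatedly, the history saturates and the counter it selects moves to strongly taken, so the model starts predicting taken for that branch.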

instruction buffer

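Revision History

| Revision Number | Author | Date | Description |
| --- | --- | --- | --- |
| 0.1 | Xinze Wang | 2022.08.10 | init |

  • content [x_ib] entry:

  • input/output

| signal | I/O | width | description | interaction |
| --- | --- | --- | --- | --- |
| pc_in | I | 32 | from PC_gen | fetch |
| next_pc_in | I | 32 | from PC_gen | fetch |
| instruction_in | I | 32 | from PC_gen | fetch |
| pc_out | O | 32 | give to decode | decode |
| next_pc_out | O | 32 | give to decode | decode |
| instruction_out | O | 32 | give to decode | decode |
| wr_en | I | 1 | want write | fetch |
| ins_full | O | 1 | stall pc in | fetch |
| rd_en | I | 1 | judge decode is ready | decode |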
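The instruction buffer behaves like a bounded FIFO between fetch and decode. The Python sketch below models that behavior only; the depth of 8, the class name `InstructionBuffer`, and the method names are hypothetical, while `ins_full` follows the signal name in the table above.

```python
from collections import deque


class InstructionBuffer:
    """Bounded FIFO holding (pc, next_pc, instruction) tuples."""

    def __init__(self, depth: int = 8):
        self.depth = depth
        self.fifo = deque()

    @property
    def ins_full(self) -> bool:
        # When full, fetch must stall instead of writing.
        return len(self.fifo) >= self.depth

    def write(self, pc: int, next_pc: int, instruction: int) -> bool:
        """Fetch side (wr_en): returns False if the write was refused."""
        if self.ins_full:
            return False
        self.fifo.append((pc, next_pc, instruction))
        return True

    def read(self):
        """Decode side (rd_en): oldest entry, or None when empty."""
        return self.fifo.popleft() if self.fifo else None

    def flush(self) -> None:
        """Drop all entries, e.g. after a branch misprediction."""
        self.fifo.clear()
```

Entries leave in the order they arrived, which preserves program order on the way into decode.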

Instruction Decode Unit

Decoder

decoder.jpg

This decoder supports RV64I instructions. It receives instructions from the fetch unit and passes the decoded result to the ROB unit. For a branch instruction, it outputs a stall flag until everything is ready.

Revision History

| Revision Number | Author | Date | Description |
| --- | --- | --- | --- |
| 0.1 | Guohua Yin | 2022.08.16 | init |
| 0.2 | Guohua Yin | 2022.08.22 | update |

Items

| Item Name | Description |
| --- | --- |
| decode | decode the instruction in-order from the instruction buffer |
  • content
  • input/output

| signal | I/O | width | description | interaction |
| --- | --- | --- | --- | --- |
| clk | I | 1 | clock signal | |
| rstn | I | 1 | reset signal, active low, asynchronous reset | |
| pc_in | I | 32 | get the pc from fetch unit | instr buffer |
| next_pc_in | I | 32 | get the next pc from fetch unit | instr buffer |
| instruction_in | I | 32 | get the instr from fetch | fetch |
| valid_in | I | 1 | get the valid signal | fetch |
| ready_in | I | 1 | get the ready signal | rob |
| branch_back | I | 1 | handle the branch stall | fu |
| trapped | I | 1 | pipeline control | fu |
| wfi_in | I | 1 | pipeline control | fu |
| csr_data | I | 64 | get csr data | csr |
| csr_readable | I | 1 | flag about reading from csr | csr |
| csr_writeable | I | 1 | flag about writing to csr | csr |
| csr_address | O | 12 | give to csr | csr |
| uses_rs1 | O | 1 | use rs1 | rob |
| uses_rs2 | O | 1 | use rs2 | rob |
| uses_rd | O | 1 | use rd | rob |
| uses_csr | O | 1 | use csr | rob |
| pc_out | O | 32 | give the pc to rob | rob |
| next_pc_out | O | 32 | give the next pc to rob | rob |
| is_csr | O | 1 | flag about csr | rob |
| write_select_out | O | 2 | write select out signal | rob |
| rd_address_out | O | 5 | give to rob | rob |
| csr_address_out | O | 12 | give to rob | rob |
| mret_out | O | 1 | give to rob | rob |
| wfi_out | O | 1 | give to rob | rob |
| ecause_out | O | 4 | give to rob | rob |
| exception_out | O | 1 | exception | rob |
| half | O | 1 | give to rob | rob |
| valid_out | O | 1 | valid flag | rob |
| ready_out | O | 1 | tell fetch it can read | fetch |
| csr_read_out | O | 1 | read signal | rob |
| csr_write_out | O | 1 | csr write signal | rob |
| csr_readable_out | O | 1 | csr can be read | rob |
| csr_writeable_out | O | 1 | can write csr | rob |
| csr_data_out | O | 64 | to rob alu | rob |
| imm_data_out | O | 32 | to rob alu, immediate data | rob |
| alu_function_out | O | 3 | to rob alu | rob |
| alu_function_modifier_out | O | 1 | to rob alu | rob |
| alu_select_a_out | O | 2 | alu select signal: a | rob |
| alu_select_b_out | O | 2 | alu select signal: b | rob |
| cmp_function_out | O | 3 | compare function signal | rob |
| jump_out | O | 1 | to rob branch | rob |
| branch_out | O | 1 | to rob branch | rob |
| is_alu_out | O | 1 | to rob (lsu) | rob |
| load_out | O | 1 | to rob (lsu) | rob |
| store_out | O | 1 | to rob (lsu) | rob |
| load_store_size_out | O | 2 | to rob (lsu) | rob |
| load_signed_out | O | 1 | to rob (lsu) | rob |

RCU

Revision History

| Revision Number | Author | Date | Description |
| --- | --- | --- | --- |
| 0.1 | Yihai Zhang | 2022.08.12 | init |
| 2.0 | Yifei Zhu | 2022.08.24 | init |

Items

| Item Name | Description |
| --- | --- |
| rename table | renames an architectural register to a physical one |
| rename table backup | the rename table is recovered from this backup on a trap or branch |
| free list | records the free physical register addresses |
| reorder buffer | stores the corresponding data of each instruction |
| physical register | - |

Overview

rcu2.0.jpg

rename table

| Operation | port | Description |
| --- | --- | --- |
| write | one write port | write from free list when rob ready to be written |
| read | one read port | read when rob ready to be written |
| flush | one flush port | roll back to backup when trap comes |
| reset | - | when reset signal |

rename table backup

| Operation | port | Description |
| --- | --- | --- |
| - | - | used for rename table rolling back itself |
| reset | - | when reset signal |

physical register file

| Operation | port | Description |
| --- | --- | --- |
| write | two write ports | write when function unit finishes |
| read | two read ports | read when rob ready to issue |
| flush | two flush ports | set the finish bit in the regfile to indicate the register has been used |
| reset | - | when reset signal |

free list (a fifo)

| Operation | port | Description |
| --- | --- | --- |
| write | one write port | write when instr commit |
| read | one read port | read when rob ready to be written |
| reset | - | when reset signal |

reorder buffer

| Item Name | Width | Description |
| --- | --- | --- |
| rob op | - | buffers data from decode for use by pc, lsu, alu, csr and exception |
| use | 1 bit | indicates whether the rob entry is used |
| issue | 1 bit | indicates whether the rob entry is issued |
| commit | 1 bit | indicates whether the rob entry is committed |
| finish | 1 bit | indicates whether function unit writeback has finished |
| exception | 1 bit | indicates whether the instr raises an exception |
| prs1 | 6 bit | the mapped physical source reg1 after renaming |
| prs2 | 6 bit | the mapped physical source reg2 after renaming |
| prd | 6 bit | the mapped physical destination reg after renaming |
| rd | 5 bit | the architecture reg before renaming |
| lprd | 6 bit | the mapped reg which is replaced |
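How the rename table, free list, and the `prs1`/`prs2`/`prd`/`lprd` fields fit together can be shown with a small Python model. This is a sketch under assumptions: 64 physical registers are inferred from the 6-bit fields, the initial identity mapping and the rule that `lprd` returns to the free list at commit are conventional choices not confirmed by the source, and the class and method names are hypothetical.

```python
from collections import deque


class Renamer:
    """Rename table plus FIFO free list (behavioral model)."""

    def __init__(self, num_arch: int = 32, num_phys: int = 64):
        # Assumed reset state: arch register i maps to physical register i.
        self.table = list(range(num_arch))
        self.free_list = deque(range(num_arch, num_phys))

    def rename(self, rd: int, rs1: int, rs2: int):
        """Return the ROB rename fields, or None if no register is free."""
        prs1, prs2 = self.table[rs1], self.table[rs2]
        lprd = self.table[rd]          # mapping that is being replaced
        if rd == 0:
            prd = lprd                 # x0 is never given a new mapping
        elif not self.free_list:
            return None                # caller would stall in this case
        else:
            prd = self.free_list.popleft()
            self.table[rd] = prd
        return {"rd": rd, "prs1": prs1, "prs2": prs2, "prd": prd, "lprd": lprd}

    def commit(self, entry: dict) -> None:
        """At commit the replaced mapping is dead, so recycle it."""
        if entry["prd"] != entry["lprd"]:
            self.free_list.append(entry["lprd"])
```

Sources are read from the table before the destination is remapped, so an instruction such as `add x5, x5, x1` reads the old mapping of x5 and writes a new one.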

LSU

Revision History

| Revision Number | Author | Date | Description |
| --- | --- | --- | --- |
| 0.1 | Peichen Guo | 2022.08.10 | init |

Items

| Item Name | Description |
| --- | --- |
| AGU | Address Generation Unit |
| Address Checker | checks address validity |
| Control Unit | interacts with dcache and sends stall |

Overview

  • LSU top diagram

LSU.png

LSU(Load Store Unit)

Interface

decode interface:

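| Name | Group | Width | Direction | Description |
| --- | --- | --- | --- | --- |
| //global interface | | | | |
| clk | 1 | 1 | input | clock signal |
| rstn | 1 | 1 | input | reset signal, active low, asynchronous reset |
| stall | 1 | RCT_EXE_STG_NUM | input | high to indicate that the lsu pipeline should stall |
| //Interface with ROB | | | | |
| valid_i | 1 | 1 | input | input valid signal; set to 1 when the instruction is a store/load (SL) instruction, which starts the LSU |
| rob_index_i | 1 | ROB_INDEX_WIDTH | input | ROB slot id; needs further discussion with zyh |
| rd_addr_i | 1 | 5 | input | rd address; possibly unnecessary. Without a ROT, this address is input only because the rd != x0 check is needed, which the decoder could also do. With a ROB, however, the rd address is required |
| imm_i | 1 | XLEN | input | immediate; sign-extended XLEN-bit immediate, here the address offset |
| opcode_i | 1 | 1 | input | 1 is store, 0 is load |
| size_i | 1 | 2 | input | size of operation; 00 is byte, 01 is half word, 10 is word, 11 is double word |
| load_sign_i | 1 | 1 | input | sign of load; 1 is unsigned |
| ROB_index_o | 1 | ROB_INDEX_WIDTH | output | ROB index out; indicates which row will be written back |
| ls_done_o | 1 | 1 | output | load/store done |
| lsu_ready_o | 1 | 1 | output | lsu is ready or not |
| //Interface with dcache | | | | |
| TODO: not yet discussed | | | | |
| //Interface with PRF | | | | |
| rs2_data_i | 1 | XLEN | input | src2 data; for a store this is the source register, i.e. the data to be stored |
| rs1_data_i | 1 | XLEN | input | rs1 data; the base address register for store/load |
| rd_addr_o | 1 | 5 | output | register write-back address |
| load_data_valid_o | 1 | 1 | output | load data valid; the instruction being written back is a load |
| load_data_o | 1 | XLEN | output | load data |
| //Interface with Exception handler | | | | |
| exception_valid_o | 1 | 1 | output | just for unit test; will be changed when the exception handler is designed |
| ecause_o | 1 | 5 | output | just for unit test; will be changed when the exception handler is designed |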

Decode Interface:

| Name | Group | Width | Direction | Description |
| --- | --- | --- | --- | --- |
| //global interface | | | | |
| clk | 1 | 1 | input | clock signal |
| rstn | 1 | 1 | input | reset signal, active low, asynchronous reset |
| stall | 1 | RCT_EXE_STG_NUM | input | high to indicate that the lsu pipeline should stall |
| //Interface with ROB | | | | |
| valid_i | 1 | 1 | input | input valid signal; set to 1 when the instruction is a store/load (SL) instruction, which starts the LSU |
| rob_index_i | 1 | ROB_INDEX_WIDTH | input | ROB slot id; further discussion with zyh is required |
| rd_addr_i | 1 | 5 | input | rd address |
| imm_i | 1 | XLEN | input | immediate; sign-extended XLEN-bit immediate, here the address offset |
| opcode_i | 1 | 1 | input | 1 represents ‘store’, 0 represents ‘load’ |
| size_i | 1 | 2 | input | size of operation; ‘00’ represents ‘byte’, ‘01’ represents ‘half word’, ‘10’ represents ‘word’, ‘11’ represents ‘double word’ |
| load_sign_i | 1 | 1 | input | sign of load; 1 represents ‘unsigned’ |
| ROB_index_o | 1 | ROB_INDEX_WIDTH | output | ROB index out; indicates which row will be written back |
| ls_done_o | 1 | 1 | output | load/store done |
| lsu_ready_o | 1 | 1 | output | lsu is ready or not |
| exception_valid_o | 1 | 1 | output | just for unit test; will be changed when the exception handler is designed |
| ecause_o | 1 | 5 | output | just for unit test; will be changed when the exception handler is designed |
| //Interface with dcache | | | | |
| req_valid_o | 1 | 1 | output | request valid |
| req_opcode_o | 1 | 1 | output | 0 for load, 1 for store |
| req_size_o | 1 | 2 | output | opcode[2] stands for unsigned; opcode[1:0] stands for ls width |
| req_addr_o | 1 | ADDR_WIDTH | output | request address |
| req_data_o | 1 | XLEN | output | request data |
| req_ready_i | 1 | 1 | input | request ready |
| resp_valid_i | 1 | 1 | input | response valid |
| resp_data_i | 1 | XLEN | input | response data |
| resp_ready_o | 1 | 1 | output | response ready |
| //Interface with PRF | | | | |
| rs2_data_i | 1 | XLEN | input | src2 data; for a store this is the source register, i.e. the data to be stored |
| rs1_data_i | 1 | XLEN | input | rs1 data; the base address register for store/load |
| rd_addr_o | 1 | 5 | output | register write-back address |
| load_data_valid_o | 1 | 1 | output | load data is valid; the instruction being written back is a load |
| load_data_o | 1 | XLEN | output | load data |
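The `size_i` and `load_sign_i` encodings in the table determine how load data is widened to XLEN bits. The Python sketch below illustrates that step only; the function name `extend_load_data` is hypothetical, XLEN is taken as 64 because the core is RV64, and the raw response is assumed to be already aligned to bit 0.

```python
XLEN = 64  # RV64 register width


def extend_load_data(raw: int, size: int, load_sign: int) -> int:
    """Widen load data to XLEN bits.

    size:      0 = byte, 1 = half word, 2 = word, 3 = double word
    load_sign: 1 = unsigned (zero-extend), 0 = signed (sign-extend)
    """
    bits = 8 << size                       # 8, 16, 32 or 64
    value = raw & ((1 << bits) - 1)        # keep only the accessed bytes
    if load_sign == 0 and (value >> (bits - 1)) & 1:
        value -= 1 << bits                 # negative: apply two's complement
    return value & ((1 << XLEN) - 1)       # back to an XLEN-bit pattern
```

For a byte load of `0x80`, the signed form yields `0xFFFFFFFFFFFFFF80` and the unsigned form yields `0x80`.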

Contribution

Guohua Yin, Xinze Wang, Yihai Zhang, Qiaowen Yang, Zhenxuan Luan, Peichen Guo, Minzi Wang, Guangyuan Ma, Yucheng Wang, Shenwei Hu, Yifei Zhu

Verification

The verification suite includes unit tests and regression tests. We chose the open-source tool Verilator as the simulator. During the module RTL design stage, the verifiers and designers first clarify the top-level signals and functions of each module, then build an independent verification environment for the submodules (such as Cache, Decoder, Gshare, etc.) that need to be verified. We give the module specific inputs and check whether the outputs meet the design expectations of the module's functions, which accelerates the progress of RTL design.

After the preliminary completion of the core design, the function of the core is verified. Since Verilator compiles RTL code into C++ code and then runs the simulation, we wrote a simulated memory in C++ and load an ELF file into it, so that the core (including Icache and Dcache) interacts with memory, forming a basic computer system that verifies the correctness of the core's functions. The first step is to run the isa-test suite provided by RISC-V International to check whether each single instruction operates correctly. After passing isa-test, we use RISC-V torture as the stimulus for regression testing and iteratively fix our code during the testing process. The current system achieves 99.99% accuracy when running 10000 torture test samples.

HeHe's SoC is equipped with instruction RAM and data RAM, in addition to the ILA interface. Our SoC verification scheme is to load the ELF's instruction segments and data segments into the corresponding RAMs, and then reset the core to start running the test program. The results are checked for correctness through the ILA.

Work Load Division

Front-end group: Xinze Wang (BTB, Fetch), Qiaowen Yang (Gshare), Guohua Yin (Decoder)

Back-end group: Yihai Zhang (ROB, renaming), Peichen Guo (Hazard), Mingzi Wang (Cache)

Validation group: Xinlai Wan, Guangyuan Ma, Yucheng Wang, Shenwei Hu

SoC: Qiaowen Yang

Top module: Zhenxuan Luan, Yifei Zhu