\title[Chisel]{Chisel: Constructing Hardware In a Scala Embedded Language}
\author{Jonathan Bachrach, Huy Vo, Brian Richards,
Yunsup Lee, Andrew Waterman, Rimas Avizienis, Henry Cook,
John Wawrzynek, Krste Asanovic}
EECS UC Berkeley
21st Century Architecture Design
\item Harder to get hardware / software efficiency gains
\item Need massive design-space exploration
\item Hardware and software codesign and cotuning
\item Need meaningful results
\item Cycle counts
\item Cycle time, power and area
\item Real chips
\item Traditional architectural simulators, hardware-description
languages, and tools are inadequate
\item Slow
\item Inaccurate
\item Error prone
\item Difficult to modify and parameterize
Bottom Line -- Shorten Design Loop
Make it
\item Easier to make design changes
\item Fewer lines of design code ( \textbf{>> 3x} )
\item More reusable code
\item Parameterize designs
\item Faster to test results ( \textbf{>> 8x} )
\item Fast compilation
\item Fast simulation
\item Easy testing
\item Easy verification
\item Explore more design space
Chisel is ...
\item Best of hardware and software design ideas
\item Embedded within Scala language to leverage mindshare and language design
\item Algebraic construction and wiring
\item Hierarchical, object oriented, and functional construction
\item Abstract data types and interfaces
\item Bulk connections
\item Multiple targets
\item Simulation and synthesis
\item Memory IP is target-specific
single source
\includegraphics[width=0.99\textwidth]{../manual/figs/targets.pdf} \\
multiple targets
The Scala Programming Language
\item Compiled to JVM
\item Good performance
\item Great Java interoperability
\item Mature debugging, execution environments
\item Object Oriented
\item Factory Objects, Classes
\item Traits, overloading, etc.
\item Functional
\item Higher order functions
\item Anonymous functions
\item Currying, etc.
\item Extensible
\item Domain Specific Languages (DSLs)
\includegraphics[height=0.4\textheight]{figs/programming-scala.pdf} \\
Algebraic Graph Construction
Mux(x > y, x, y)
Creating Component
class Max2 extends Component {
  val io = new Bundle {
    val x = UFix(width = 8).asInput
    val y = UFix(width = 8).asInput
    val z = UFix(width = 8).asOutput }
  io.z := Mux(io.x > io.y, io.x, io.y)
}
\includegraphics[width=0.95\textwidth]{figs/Max2c.pdf} \\
Connecting Components
val m1 = new Max2()
m1.io.x := a
m1.io.y := b
val m2 = new Max2()
m2.io.x := c
m2.io.y := d
val m3 = new Max2()
m3.io.x := m1.io.z
m3.io.y := m2.io.z
\includegraphics[width=0.99\textwidth]{figs/Max4.pdf} \\
Defining Construction Functions
def Max2(x: UFix, y: UFix) = Mux(x > y, x, y)
Max2(x, y)
\includegraphics[width=0.95\textwidth]{figs/Max2.pdf} \\[1cm]
Functional Construction
Reduce(Array(a, b, c, d), Max2)
\includegraphics[width=0.99\textwidth]{figs/reduceMax.pdf} \\
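As a plain-Scala analogy (not Chisel; the object name \verb+ReduceSketch+, the function \verb+max2+, and the integer inputs are illustrative assumptions), reducing a collection with a two-input function combines all elements through repeated applications of that function, much as \verb+Reduce+ combines the inputs through \verb+Max2+ nodes:

```scala
object ReduceSketch {
  // Software stand-in for the two-input Max2 node (Ints, not hardware signals).
  def max2(x: Int, y: Int): Int = if (x > y) x else y

  // Combining the inputs pairwise mirrors wiring Max2 nodes across a, b, c, d.
  def maxOf(inputs: Seq[Int]): Int = inputs.reduce(max2)
}
```

For example, \verb+ReduceSketch.maxOf(Seq(3, 9, 4, 7))+ evaluates to 9.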
class GCD extends Component {
  val io = new Bundle {
    val a     = UFix(INPUT, 16)
    val b     = UFix(INPUT, 16)
    val z     = UFix(OUTPUT, 16)
    val valid = Bool(OUTPUT) }
  val x = Reg(resetVal = io.a)
  val y = Reg(resetVal = io.b)
  when (x > y) {
    x := x - y
  } .otherwise {
    y := y - x
  }
  io.z     := x
  io.valid := y === UFix(0)
}
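The behavior of the GCD circuit can be sketched as a cycle-by-cycle software model in plain Scala. This is an illustration under stated assumptions, not Chisel code: the names \verb+GcdModel+, \verb+step+, and \verb+run+ are hypothetical, reset is modeled as simply loading \verb+a+ and \verb+b+, and the model requires \verb+a+ to be positive so that the subtraction loop terminates.

```scala
object GcdModel {
  // One simulated clock edge: returns the next (x, y) register values.
  def step(x: Int, y: Int): (Int, Int) =
    if (x > y) (x - y, y) else (x, y - x)

  // Runs until y == 0 (the `valid` condition) and returns (z, cycles).
  def run(a: Int, b: Int): (Int, Int) = {
    require(a > 0 && b >= 0, "this sketch assumes a > 0 and b >= 0")
    var x = a
    var y = b
    var cycles = 0
    while (y != 0) {
      val (nx, ny) = step(x, y)
      x = nx
      y = ny
      cycles += 1
    }
    (x, cycles)
  }
}
```

With inputs 48 and 18, the model reaches \verb+y == 0+ after five steps with \verb+x+ equal to 6.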
Primitive Datatypes
\item Chisel has four primitive datatypes:
\item[Bits] -- raw collection of bits
\item[Fix] -- signed fixed-point number
\item[UFix] -- unsigned fixed-point number
\item[Bool] -- Boolean value
\item Can do arithmetic and logic with these datatypes
Example Literal Constructions
val sel = Bool(false)
val a = UFix(25)
val b = Fix(-35)
where \verb+val+ is the Scala keyword that declares a variable whose value cannot be reassigned
Aggregate Data Types
\item User-extensible collection of values with named fields
\item Similar to structs
class MyFloat extends Bundle {
  val sign        = Bool()
  val exponent    = UFix(width=8)
  val significand = UFix(width=23)
}
\item Create indexable collection of values
\item Similar to arrays
% \textbf{Vec Example}
val myVec = Vec(5){ Fix(width=23) }
Abstract Data Types
\item The user can construct new data types
\item Allows for compact, readable code
\item Example: Complex numbers
\item Useful for FFT, Correlator, other DSP
\item Define arithmetic on complex numbers
class Complex(val real: Fix, val imag: Fix)
    extends Bundle {
  def + (b: Complex): Complex =
    new Complex(real + b.real, imag + b.imag)
}
val a = new Complex(Fix(32), Fix(-16))
val b = new Complex(Fix(-15), Fix(21))
val c = a + b
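To make the result of the addition concrete, here is a plain-Scala stand-in that uses \verb+Int+ in place of \verb+Fix+ (an assumption made so the snippet runs without Chisel; the enclosing object name \verb+ComplexSketch+ is hypothetical):

```scala
object ComplexSketch {
  // Integer stand-in for the Fix-based Complex bundle.
  final case class Complex(real: Int, imag: Int) {
    // Component-wise addition, as in the Bundle version.
    def +(b: Complex): Complex = Complex(real + b.real, imag + b.imag)
  }

  val a = Complex(32, -16)
  val b = Complex(-15, 21)
  val c = a + b // Complex(17, 5)
}
```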
Polymorphism and Parameterization
\item Chisel users can define their own parameterized functions
\item Parameterization encourages reusability
\item Data types can be inferred and propagated
Example Shift Register:
def delay[T <: Data](x: T, n: Int): T =
if(n == 0) x else Reg(delay(x, n - 1))
\item The input \verb+x+ is delayed by \verb+n+ cycles
\item \verb+x+ can be of any type that extends \verb+Data+
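The effect of the shift register on a stream of samples can be sketched in plain Scala. This is a software model, not Chisel: the names \verb+DelaySketch+ and \verb+init+ are assumptions, with \verb+init+ standing in for whatever value the registers hold before the first input arrives.

```scala
object DelaySketch {
  // Software model of an n-stage shift register over a sequence of per-cycle
  // samples: each output is the input from n cycles earlier. `init` is a
  // hypothetical initial register value (the slide's Reg specifies none).
  def delay[T](inputs: Seq[T], n: Int, init: T): Seq[T] = {
    require(n >= 0, "delay must be non-negative")
    (Seq.fill(n)(init) ++ inputs).take(inputs.length)
  }
}
```

For example, \verb+DelaySketch.delay(Seq(1, 2, 3, 4), 2, 0)+ yields \verb+Seq(0, 0, 1, 2)+, and a delay of zero returns the input unchanged.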
Functional Composition
class Cache(cache_type:    Int = DIR_MAPPED,
            associativity: Int = 1,
            line_size:     Int = 128,
            cache_depth:   Int = 16,
            write_policy:  Int = WRITE_THRU
           ) extends Component {
  val io = new Bundle() {
    val cpu = new IoCacheToCPU()
    val mem = new IoCacheToMem().flip()
  }
  val addr_idx_width = log2(cache_depth).toInt
  val addr_off_width = log2(line_size/32).toInt
  val addr_tag_width = 32 - addr_idx_width - addr_off_width - 2
  val log2_assoc     = log2(associativity).toInt
  if (cache_type == DIR_MAPPED)
    ...
}
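The address-width arithmetic in the \verb+Cache+ generator can be checked with a small plain-Scala sketch. The object and function names are hypothetical, and the integer \verb+log2+ helper (valid only for powers of two) is an assumption standing in for the \verb+log2(...).toInt+ calls above.

```scala
object CacheWidths {
  // Integer log2 for exact powers of two (stand-in for log2(...).toInt).
  def log2(n: Int): Int = {
    require(n > 0 && (n & (n - 1)) == 0, "sketch assumes a power of two")
    Integer.numberOfTrailingZeros(n)
  }

  // Returns (index, offset, tag) widths for a 32-bit address,
  // using the same formulas as the Cache parameters.
  def widths(cacheDepth: Int = 16, lineSize: Int = 128): (Int, Int, Int) = {
    val idx = log2(cacheDepth)
    val off = log2(lineSize / 32)
    val tag = 32 - idx - off - 2
    (idx, off, tag)
  }
}
```

With the default parameters (depth 16, line size 128), this gives a 4-bit index, a 2-bit offset, and a 24-bit tag.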
State Elements
The simplest state element is a positive-edge-triggered register:
val prev_in = Reg(in)
The data input can also be assigned later by wiring:
val pc = Reg(){ UFix(width = 16) }
pc := pc + UFix(1, 16)
More useful circuits can then be defined concisely:
def risingEdge(x: Bool) = x && !Reg(x)
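A plain-Scala model of \verb+risingEdge+ over a sequence of per-cycle samples may help clarify what the circuit computes. This is a sketch, not Chisel; the names \verb+EdgeSketch+ and \verb+initial+ are assumptions, with \verb+initial+ modeling the register's value before the first sample.

```scala
object EdgeSketch {
  // For each cycle, the output is true when x is high now and was low in the
  // previous cycle. `initial` is the assumed register value before cycle 0.
  def risingEdge(xs: Seq[Boolean], initial: Boolean = false): Seq[Boolean] =
    (initial +: xs).zip(xs).map { case (prev, cur) => cur && !prev }
}
```

For the input \verb+Seq(false, true, true, false, true)+, the model outputs \verb+Seq(false, true, false, false, true)+: one pulse for each low-to-high transition.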
Conditional Updates
Updates to a register can conveniently be spread across several statements:
val r = Reg() { UFix(width = 16) }
when (c === UFix(0)) {
  r := r + UFix(1)
}
when (c1) { r := e1 }
when (c2) { r := e2 }
\includegraphics[width=0.95\textwidth]{figs/condupdates.pdf}
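When several \verb+when+ blocks assign the same register, the usual reading is that the last assignment whose condition holds determines the next value, and the register keeps its value when no condition holds. A minimal plain-Scala sketch of that priority rule, with hypothetical names and under that assumption about the semantics:

```scala
object CondUpdateSketch {
  // Next register value given ordered (condition, value) updates:
  // later true conditions override earlier ones; if none is true,
  // the register holds its current value.
  def nextValue[T](current: T, updates: Seq[(Boolean, T)]): T =
    updates.foldLeft(current) { case (acc, (cond, value)) =>
      if (cond) value else acc
    }
}
```

Here \verb+nextValue(0, Seq((true, 1), (true, 2)))+ evaluates to 2, and \verb+nextValue(7, Seq((false, 1), (false, 2)))+ leaves the value at 7.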
Composition of Conditional Updates
when (a) { when (b) { body } }
when (c1) { u1 }
.elsewhen (c2) { u2 }
.otherwise { ud }
Dynamic Scoping
def condUpdateR (c: Bool, d: Data) = when (c) { r := d }
when (a) { condUpdateR(b, x) }
// is equivalent to
when (a) { when (b) { r := x } }
Symmetry of Conditional Updates
Regs and Wires
x := init
when (isEnable) {
  x := data
}
Vecs and Mems
when (isEnable) {
  m(addr) := data
}
Object Oriented Conditional Updates
example:
val out = (new EnqIo()){ new Packet() }
when (in.valid && out.ready) {
val in = (new DeqIo()){ new Packet() }
val outs = Vec(4){ (new EnqIo()){ new Packet() } }
val tbl = Mem(4){ UFix(width = 2) }
when (in.valid) {
val k = tbl(
when (outs(k).ready) {
\includegraphics[width=0.99\textwidth]{figs/filter.pdf} \\[20mm]
Component Testing
class Mux2IO extends Bundle {
  val sel = Bits(width = 1).asInput
  val in0 = Bits(width = 1).asInput
  val in1 = Bits(width = 1).asInput
  val out = Bits(width = 1).asOutput
}
class Mux2Tests extends Iterator[Mux2IO] {
  var i = 0
  val n = pow(2, 3).toInt          // 2^3 input combinations
  def hasNext = i < n
  def next = {
    val io = new Mux2IO
    val k  = Bits(i, width = log2up(n))
    io.sel := k(0)
    io.in0 := k(1)
    io.in1 := k(2)
    io.out := Mux(k(0), k(1), k(2))  // expected output
    i += 1
    io                               // return the test vector
  }
}
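The same exhaustive enumeration can be sketched in plain Scala, independent of Chisel. The names \verb+Mux2Vectors+ and \verb+TestVector+ are hypothetical; the bit assignment and the expected output follow the \verb+Mux(k(0), k(1), k(2))+ expression in the test iterator, read as selecting the first data argument when the select bit is set.

```scala
object Mux2Vectors {
  // One expected test vector: inputs plus the output the mux should produce.
  final case class TestVector(sel: Boolean, in0: Boolean, in1: Boolean, out: Boolean)

  // Enumerates all 2^3 input combinations: bit 0 -> sel, bit 1 -> in0, bit 2 -> in1.
  // The expected output is in0 when sel is true, otherwise in1.
  def all: Seq[TestVector] = (0 until 8).map { i =>
    val sel = (i & 1) != 0
    val in0 = (i & 2) != 0
    val in1 = (i & 4) != 0
    TestVector(sel, in0, in1, if (sel) in0 else in1)
  }
}
```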
Chisel Line Count Breakdown
\item \verb+~+5200 lines total
\item Embeds into Scala well
Chisel versus Hand-Coded Verilog
\item 3-stage RISCV CPU hand-coded in Verilog
\item Translated to Chisel
\item Resulted in 3x reduction in lines of code
\item Most savings in wiring
\item Lots more savings to go ...
% \item Chisel-generated Verilog gives comparable synthesis quality of results
Process Language
Composable State Machines
Do{ ... }
Exec(c){ a } / Exec{ a }
Skip / Wait(n)
Seq(a, ...)
Par(a, ...)
Alt(c, a1, a2)
While(c){ a } / Loop{ a }
Each process block wraps its body in a \verb+when+,
when (io.start) { ... }
so that state is updated only while the process is executing.
\includegraphics[width=0.9\textwidth]{figs/process.pdf} \\
Process Language Example
class Multiply extends Component {
  val io = new Bundle{
    val start  = Bool(INPUT)
    val x      = UFix(dir = INPUT, width = 32)
    val y      = UFix(dir = INPUT, width = 32)
    val z      = UFix(dir = OUTPUT, width = 32)
    val finish = Bool(OUTPUT) }
  val a   = Reg(){ UFix(0, 32) }
  val b   = Reg(){ UFix(0, 32) }
  val acc = Reg(){ UFix(0, 32) }
  val finish =
    Exec(io.start) {
      Seq(Do{ a := io.x; b := io.y; acc := UFix(0, 32) },
          While(b != UFix(0, 32)) {
            Do{ a   := (a << UFix(1))
                b   := (b >> UFix(1))
                acc := Mux(b(0) === Bits(1), acc+a, acc) } })
    }
  io.finish := finish
  io.z      := acc
}
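The shift-and-add loop in \verb+Multiply+ can be modeled in plain Scala to check what it computes. This is a software sketch with hypothetical names; it assumes the three registers are 32 bits wide, so results wrap modulo $2^{32}$, and that all three updates in the \verb+Do+ block read the register values from before the clock edge.

```scala
object MultiplyModel {
  private val Mask = 0xFFFFFFFFL // keep values within 32 bits

  // Returns (product mod 2^32, number of While-loop iterations).
  def run(x: Long, y: Long): (Long, Int) = {
    var a = x & Mask
    var b = y & Mask
    var acc = 0L
    var cycles = 0
    while (b != 0) {
      // Compute the new acc from the old a and b, as registers would.
      val nextAcc = if ((b & 1L) == 1L) (acc + a) & Mask else acc
      a = (a << 1) & Mask
      b = b >> 1
      acc = nextAcc
      cycles += 1
    }
    (acc, cycles)
  }
}
```

Multiplying 6 by 5 takes three iterations (one per bit of the multiplier up to its highest set bit) and yields 30.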
Transactors and Beyond
class Router extends Transactor {
val n = 2
val io = new RouterIO(n)
val tbl = Mem(32){ UFix(width = sizeof(n)) }
defRule("rd") {
val cmd = io.reads.deq()
defRule("wr") {
val cmd = io.writes.deq()
defRule("rt") {
val pkt =
Rocket Microarchitecture
\item 6-stage RISC decoupled integer datapath + 5-stage IEEE FPU + MMU
and non-blocking caches
\item Completely written in Chisel
Single Source / Multiple Targets
single source
\includegraphics[width=0.95\textwidth]{../manual/figs/targets.pdf} \\
multiple targets
Fast Cycle-Accurate Simulation in C++
\item Compiles to single class
\item Keep state and top level io in class fields
\item \verb+clock_lo+ and \verb+clock_hi+ methods
\item Generates calls to fast multiword library using C++ templates
\item Specializes for small word sizes
\item Removes branching wherever possible to maximize instruction-level parallelism (ILP) on the host processor
Simulator Comparison
Comparison of simulation time when booting Tessellation OS
\begin{tabular}{lrrrrrr}
\textbf{Simulator} & \textbf{Compile} & \textbf{Compile} & \textbf{Run} & \textbf{Run} & \textbf{Total} & \textbf{Total} \\
 & \textbf{Time (s)} & \textbf{Speedup} & \textbf{Time (s)} & \textbf{Speedup} & \textbf{Time (s)} & \textbf{Speedup} \\
\hline
VCS & 22 & 1.000 & 5368 & 1.00 & 5390 & 1.00 \\
Chisel C++ & 119 & 0.184 & 575 & 9.33 & 694 & 7.77 \\
Virtex-6 & 3660 & 0.006 & 76 & 70.60 & 3736 & 1.44 \\
\end{tabular}
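The speedup columns follow from dividing the VCS time by each simulator's time in the same column. A short plain-Scala sketch of that arithmetic, using the times from the table (the names \verb+SpeedupTable+, \verb+Row+, and \verb+speedup+ are illustrative):

```scala
object SpeedupTable {
  // Compile and run times in seconds, taken from the comparison table.
  final case class Row(name: String, compileSec: Double, runSec: Double) {
    def totalSec: Double = compileSec + runSec
  }

  val vcs       = Row("VCS", 22, 5368)
  val chiselCpp = Row("Chisel C++", 119, 575)
  val virtex6   = Row("Virtex-6", 3660, 76)

  // Speedup of `r` relative to the VCS baseline for the chosen time column.
  def speedup(r: Row, column: Row => Double): Double = column(vcs) / column(r)
}
```

For instance, \verb+speedup(chiselCpp, _.totalSec)+ is 5390/694, or about 7.77, matching the table's total speedup for the C++ simulator.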
Simulation Crossover Points
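A rough estimate of where the crossover points fall can be derived from the Simulator Comparison numbers. The sketch below is a back-of-envelope model, not the authors' analysis: it assumes compile time is fixed and run time scales linearly with workload size, measured in units of one Tessellation OS boot. The names \verb+Crossover+ and \verb+point+ are hypothetical.

```scala
object Crossover {
  // Workload size (in Tessellation-boot units) at which simulators A and B
  // have equal total time, assuming total = compile + run * workload.
  // Solving compileA + runA * w == compileB + runB * w for w:
  def point(compileA: Double, runA: Double, compileB: Double, runB: Double): Double =
    (compileB - compileA) / (runA - runB)

  // VCS vs. Chisel C++: 97 / 4793
  val vcsVsCpp: Double = point(22, 5368, 119, 575)
  // Chisel C++ vs. Virtex-6: 3541 / 499
  val cppVsFpga: Double = point(119, 575, 3660, 76)
}
```

Under these assumptions, the C++ simulator overtakes VCS at roughly 0.02 boots of work, and the FPGA overtakes the C++ simulator at roughly 7.1 boots.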
Data Parallel Processor Tape Out Results
Completely written in Chisel
Layout results for the data-parallel processor in an IBM 45\,nm SOI 10-metal-layer process, using memory-compiler-generated 6T and 8T SRAM blocks.
\item Open source with BSD license
\item \
\item Complete set of documentation
\item Bootcamp / release June 8, 2012
\item Library of components
\item queues, decoders, encoders, popcount, scoreboards, integer ALUs, LFSR, Booth multiplier, iterative divider, ROMs, RAMs, CAMs, TLB, caches, prefetcher, fixed-priority arbiters, round-robin arbiters, IEEE-754/2008 floating-point units
\item Set of educational processors including:
\item microcoded processor, one-stage, two-stage, and five-stage pipelines, and an out-of-order processor, all with accompanying visualizations.
\item Automated design space exploration
\item Insertion of activity counters for power monitors
\item Automatic fault insertion
\item Faster and more scalable simulation
\item More generators
\item More little languages
\item Compilation to UCLID