Table of contents
Open Table of contents
Introduction
My masterβs graduate project is to build a RISC-V processor, Trimurti, with some special features to boost performance of the inference task. Iβve built a MIPS core in traditional HDL like system verilog, and boot a OS on the system successfully. But building a complex circuit with such a prehistoric language was a suffering process, you need to maintain port definitions in different modules, define wires and connect ports by yourself, all those tasks needs to do MANUALLY. It took me about two weeks to find a cache bug caused by a typo on module connection. In this project, I decided to build the processor with a Scala-based modern hardware construct language, SpinalHDL. Its Pipeline API allows the definition of flexible pipelines, but I canβt find document about how to use Pipeline, and whatβs happening under the hood. The goal of this article is to explain the design and usage of Pipeline and associate classes, with some backgrounds and example codes.
Traditional Processor Pipeline
Itβs helpful to know how pipeline is designed in traditional RISC-V processor before we talk about Pipeline in SpinalHDL. Here we choose the well-known RI5CY as a example, which is a 32-bit in-order RISC-V core with a 4-stage pipeline.

If you look at the rtl directory, youβll see the stages and their build blocks appeared in pipeline graph immediately. You can find cv32e40p_ex_stage, cv32e40p_id_stage, cv32e40p_if_stage and cv32e40p_load_store_unit, and you can expect which stage the alu, decoder or prefetch is located in. Finally all those units are connected together in cv32e40p_core.
rtl
βββ cv32e40p_aligner.sv
βββ cv32e40p_alu.sv
βββ cv32e40p_alu_div.sv
βββ cv32e40p_apu_disp.sv
βββ cv32e40p_compressed_decoder.sv
βββ cv32e40p_controller.sv
βββ cv32e40p_core.sv
βββ cv32e40p_cs_registers.sv
βββ cv32e40p_decoder.sv
βββ cv32e40p_ex_stage.sv
βββ cv32e40p_ff_one.sv
βββ cv32e40p_fifo.sv
βββ cv32e40p_fp_wrapper.sv
βββ cv32e40p_hwloop_regs.sv
βββ cv32e40p_id_stage.sv
βββ cv32e40p_if_stage.sv
βββ cv32e40p_int_controller.sv
βββ cv32e40p_load_store_unit.sv
βββ cv32e40p_mult.sv
βββ cv32e40p_obi_interface.sv
βββ cv32e40p_popcnt.sv
βββ cv32e40p_prefetch_buffer.sv
βββ cv32e40p_prefetch_controller.sv
βββ cv32e40p_register_file_ff.sv
βββ cv32e40p_register_file_latch.sv
βββ cv32e40p_sleep_unit.sv
βββ cv32e40p_wrapper.sv
βββ ...
A direct code organization, right? We may have been designing processor pipeline this way before verilog was invented. But such a pipeline is a typical high cohesion design. Codes to implement same functionality spreads everywhere. For example the M extension, we can find decode logic in ID stage, real execute logic in EX stage, and write back logic in WB stage. Lots of registers, ports and wires across pipeline is added to connect these logic. The more functionality we add, the more ports, registers and wires you need to maintain. When the pipeline has few stages, and the logic is clear, maintaining those connectors is not a big task. However, when you want to build a highly configurable pipeline. with lots functionality in the pipeline, maintenance would become a nightmare for everyone. Fortunately, SpinalHDL gives us an alternative way to build pipeline with less mind complexity.

VexRiscv Processor Pipeline
Pipeline in VexRiscv can considered as a previous version of the one in SpinalHDL lib, it gives us some basic ideas about how to group things for the same functionality together. VexRiscv is built with a generic pipeline contains some stages and plugins which implements some real decode/execute logic, etc.
Letβs start with the definition of Stage and Stagable.
class Stageable[T <: Data](_dataType: => T) extends HardType[T](_dataType) with Nameable {
def dataType = apply()
setWeakName(this.getClass.getSimpleName.replace("$",""))
}
class Stage() extends Area{
def outsideCondScope[T](that: => T): T = ???
def input[T <: Data](key: Stageable[T]): T = ???
def output[T <: Data](key: Stageable[T]): T = ???
def insert[T <: Data](key: Stageable[T]): T = ???
val inputs = mutable.LinkedHashMap[Stageable[Data],Data]()
val outputs = mutable.LinkedHashMap[Stageable[Data],Data]()
val inserts = mutable.LinkedHashMap[Stageable[Data],Data]()
...
}
Stageable is a wrapper around common HardType[Data] types, used as a key to index through hash maps. Satage defines a inputs, outputs, signals, inserts for a stage. No magic here, just some definitions of I/O ports and some helper functions to insert keys to hash maps and returns a wire for the key.
Then the Pipeline itself.
trait Pipeline {
type T <: Pipeline
val plugins = ArrayBuffer[Plugin[T]]()
var stages = ArrayBuffer[Stage]()
def build(): Unit ={
// connect logic...
}
// build pipeline before pop component
Component.current.addPrePopTask(() => build())
}
Pipeline is a trait, which defines a plugins and stages for a pipeline. build is a function to connect all the logic together. Itβs called before pop the component. build does the following things:
- build all the plugins to insert keys and logics to stages
- combine all the inserts to a single map, and in which stage the key is inserted
- check if there are any keys in
inputsandoutputsthat are not ininserts, and throw an erro if has - complete the
inputsandoutputsby adding keys ininsertsthat are not ininputsandoutputs - connect default output to input for all the keys in
inputsandoutputsin a stage, this makes the inputs will just flow through the stage if no logic changes it is inserted - connect previous stageβs output to current stageβs input for all the keys in
inputsandoutputsin a stage, with registers inserted.
This means, if you want to produce some data from stage A, and use it in stage B after A, you can simply derive a class from Stagable and insert it into stage A, then use it by query with input method in stage B. Pipeline will insert the necessary wires and registers to connect them together. A simple example is shown below, which adds 2 to the input and output the result after 4 stages.
class VexRiscVPipelineExample extends Component with Pipeline {
type T = VexRiscVPipelineExample
object RESULT extends Stageable(UInt(32 bits))
// build a new stage
def newStage(): Stage = { val s = new Stage; stages += s; s }
val stageA = newStage()
for (i <- 1 to 4) {
// set name to avoid conflict
newStage().setName(f"mid_$i")
}
val stageB = newStage()
val io = new Bundle {
val input = in UInt(32 bits)
val output = out UInt(32 bits)
}
// functional logic
stageA.insert(RESULT):= io.input + 1
io.output:= stageB.output(RESULT) + 1
}
Combining Pipeline and Stage, weβve successfully reduce the ports and wires we need to maintain in a pipeline. However, we still need to write code for different functionality in same Component. This is where Plugin comes in.
trait Plugin[T <: Pipeline] extends Nameable {
var pipeline: T = null.asInstanceOf[T]
// Used to setup things with other plugins
def setup(pipeline: T): Unit = ???
// Used to flush out the required hardware (called after setup)
def build(pipeline: T): Unit = ???
// convenience class to specify which stage the code species should be inserted
implicit class implicitsStage(stage: Stage){
def plug[T <: Area](area: T): T = ???
}
implicit class implicitsPipeline(stage: Pipeline){
def plug[T <: Area](area: T) = ???
}
}
Plugin gives us a way to group logic for same functionality together. It split the logic into two parts, setup and build. setup is used to interact with other plugins, such as adding how to decode a new instruction to decoder plugin, and build is used to flush the hardware. Plugin also provides a plug method to mark which stage the logic is inserted. When we use the Plugin style to implement the example above, we can do it like this:
class VexRiscVPluginExample extends Plugin[VexRiscVPipelineExample] {
override def build(pipeline: VexRiscVPipelineExample): Unit = {
import pipeline._
// we can even define io in plugin
// but remind to set name for it
val io = new Bundle {
val input = in UInt (32 bits)
val output = out UInt (32 bits)
}.setName("io")
object RESULT extends Stageable(UInt(32 bits))
// these logic lives in stageA
stageA plug new Area {
import stageA._
insert(RESULT):= io.input + 1
}
stageB plug new Area {
import stageB._
io.output:= output(RESULT) + 1
}
}
}
// no actual logic in Component, yay!
class VexRiscVPipelineExample extends Component with Pipeline {
type T = VexRiscVPipelineExample
plugins += new VexRiscVPluginExample
def newStage(): Stage = { val s = new Stage; stages += s; s }
val stageA = newStage().setName("A")
for (i <- 1 to 4) {
newStage().setName(f"mid_$i")
}
val stageB = newStage().setName("B")
}
Finally, we can build a processor with code grouped by functionality, and the pipeline is automatically connected together. For more examples, check out the VexRiscv.
SpinalHDL Pipeline
SpinalHDL pipeline is an experimental feature, first intodoced in SpinalHDL 1.7.1, originally built for NaxRiscv. Compared to the Pipeline in VexRiscv, SpinalHDL pipeline introduces the concept of ConnectionLogic, allowing users to define how the stages is connected. It also adds more implicit functions and classes, make it more convient to use. And removes the Plugin concept and focused on building pipeline only. Users should build their own Plugin style framework if they want to, as what NaxRiscv did.
ConnectionLogic
SpinalHDL Pipeline introduces ConnectionLogic to define how the stages is connected. The trait is like this:
// abstraction of ports need to be connected across stages
case class ConnectionPoint(valid: Bool, ready: Bool, payload: Seq[Data]) extends Nameable
trait ConnectionLogic extends Nameable with OverridedEqualsHashCode {
// how m is connected to s
def on(m: ConnectionPoint,
s: ConnectionPoint,
flush: Bool, flushNext: Bool, flushNextHit: Bool,
throwHead: Bool, throwHeadHit: Bool): Area
def latency: Int = ???
def tokenCapacity: Int = ???
def alwasContainsSlaveToken: Boolean = false
def withPayload: Boolean = true
}
It defines how the master stage is connected to the slave stage. The on method is called when the master stage is connected to the slave stage. SpinalHDL also gives us some default implementation:
Connection.DIRECT: connect last stageβs output to current stageβs input directly, without any buffer or registerConnection.M2S: connect last stageβs output to current stageβs input with a set of register, as what VexRiscv doesConnection.S2M: in fact, I cannot figure out what this is used for, but it is used in NaxRiscv
For common usage, Commection.M2S is enough. However, if you want to build a component like ring buffer used to connect frontend and backend of a OoO processor, you may derive a class from ConnectionLogic and implement your own on method.
Examples
With lots of implicit conversions, now we can build stages like common components. The following example shows how to add 2 to the input and output the result after 4 stages with new Pipeline api.
class SpinalHDLPipelineExample extends Module {
val input = in UInt(32 bits)
val output = out UInt(32 bits)
val pipeline = new Pipeline {
val RESULT = Stageable(UInt(32 bits))
val stageA = new Stage {
// no more insert method
RESULT := input + 1
}
for (i <- 1 until 4) {
// we need to specify the connection logic now
new Stage(connection = Connection.M2S()).setName(f"mid_$i")
}
val stageB = new Stage(connection = Connection.M2S()) {
// we can even override a key's value
// and no more output or input method
overloaded(RESULT) := RESULT + 1
// ...but need some tricks to get the output in the same stage
output := internals.outputOf(StageableKey(RESULT.asInstanceOf[Stageable[Data]], null)).asInstanceOf[UInt]
}
}
// build pipeline by hand
pipeline.build()
}
When you want to build the pipeline with Plugin style, you can visit what NaxRiscv did. Generally it follows the same two-step process as VexRiscv, but with more flexibility and abality to generate codes with multithreading.