RISC-V is a Base Instruction Set Plus Extensions

This is a brief, informal introduction to the RISC-V Instruction Set Architecture. For the official description, you'll want to see the RISC-V Instruction Set Manual [1]. This document is intended to give you just enough context to know where to look for definitive answers.

After reading this document you should know:

"RISC-V" is a series of specifications describing an Instruction Set Architecture.
The specifications are provided free-of-charge and there is no fee to implement them.
There's at least one open-source core generator that takes specifications and produces Verilog.
The specifications currently describe 32-bit and 64-bit variants.
The instruction set is divided into a base "Integer" specification and a number of extensions.
Extensions exist for Floating-Point, Compressed Instructions, Atomic Operations, Hardware Multiplication and more.
RISC-V is supported by GCC & LLVM and an increasing number of commercial tools.

A Bit of History and Business Perspective

Let's start with the most basic question, "What is RISC-V?"

RISC-V is a family of specifications describing an Instruction Set Architecture (ISA). While there are several chips which implement the RISC-V instruction set, the chips themselves are not "RISC-V"; they are implementations of the RISC-V specifications. This is similar to the ARM™ and MIPS™ processor architectures; the names "ARM" and "MIPS" refer to the description of the architecture and not to any one particular implementation. Like ARM and MIPS (and POWER and SPARC), there are several different implementations from multiple companies.

Unlike ARM, MIPS and POWER, there's no fee for using the RISC-V specifications to create (and sell) your own implementation. If you've heard one thing about RISC-V, it's probably that it's an "Open Source Instruction Set Architecture." This doesn't mean RISC-V chips are cost-free; it just means that you're free to download the free RISC-V specification(s) and use them. No one's going to come around trying to collect a fee for an architecture license.

But building chips is expensive. Most chips are designed using customized tools that took hundreds (if not thousands) of man-months to develop. For companies that design chips, these tools are the corporate "crown jewels." This is the software that makes it easy to rapidly iterate on new designs when adding features or fixing bugs so it's no surprise companies guard this software jealously.

But the RISC-V team went the other way. Instead of making their design software proprietary, they open sourced it. The "Rocket Chip Generator" [6] software takes a specification for a RISC-V processor and outputs Verilog describing a working RISC-V core. System developers can then take that Verilog and use it for simulation or synthesis.

And remember, the specifications and many tools are open-source. Google took the Rocket Chip Generator, forked it, added some enhancements and published it back as the BottleRocket RV32IMC Core [7]. And a slightly different team from UC Berkeley published tooling to generate BOOM: The Berkeley Out-of-Order Machine [8].

There's still a lot of expense associated with building chips. Even with the Instruction Set Architecture and the Chip Generator being free downloads, the cost of designing and manufacturing your own chip isn't insubstantial. But with an open specification and at least one open-sourced design package, it's relatively cheap to evaluate the RISC-V architecture and extend the existing software to better suit your needs.

And that's part of why people are talking about RISC-V. Apart from the technology benefits, the licensing model is pretty compelling from a business perspective.

So when people talk about RISC-V, they're probably talking about the freely available, license-free Instruction Set Architecture or the open-sourced tools (like the Rocket Chip Generator) that amplify an engineering team's efforts. There's also the RISC-V Foundation, which is a non-profit spun up to officially own the open-sourced design documents and coordinate work on new specifications.

And from a business perspective, this is all interesting because hardware engineering teams can inexpensively extend the capabilities of the base-level RISC-V cores. Larger companies will find their design costs go down and startups will find their ideas are finally possible.

What's In the Base Spec?

The specifications themselves describe an abstract computer architecture, complete with a description of registers, I/O and more than one mapping from instructions to op-codes. If you wanted to build a RISC-V processor or write software for one, your first stop should be the Unprivileged Instruction Set Architecture manual [1].

The RISC-V specification defines a processor architecture with 31 general-purpose registers and a small collection of Control and Status Registers (CSRs). It defines a "base level" instruction set for processors with 32-, 64- and 128-bit-wide registers (though the 128-bit specification is not complete). The base-level instruction set is fairly bare-bones, so a series of standard extensions have been specified. The idea here is that not every chip designer will want every extension, but those who do want one should implement it the way the spec defines it. An additional instruction set called RV32E (Embedded) is defined that uses only 15 general-purpose registers.

The base instruction sets are referred to as RV32I, RV64I and RV128I and each extension is identified with a letter: I for Integer, F for Single-Precision and so on. To date, the most popular extensions have been:

I - Integer (Base)
M - Hardware Multiply (and Divide)
A - Atomic Operations
C - Compressed Op Codes
F - Single-Precision Floating Point
D - Double-Precision Floating Point
V - Vector

To describe a core, it's common practice to add the extensions you support to the end of the base level instruction set. For instance, a 32-bit core with 31 registers that supports hardware multiply and compressed instructions would be "RV32IMC." The Integer, Multiply, Atomic, Single and Double Precision Floating Point extensions were considered so common, they have a meta-abbreviation: "G" for General-Purpose. So an RV64GC system is the same as an RV64IMAFDC system.
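The naming convention above is mechanical enough to sketch in a few lines of Python. This is only an illustration: the function names expand_isa_name and parse_isa_name are made up for this document, and the sketch only understands single-letter extensions and the "G" shorthand described above. Real ISA strings can also carry multi-letter extension names, which this does not attempt to handle.

```python
# Illustrative only: these helpers are hypothetical, not part of any RISC-V tool.

G_EXPANSION = "IMAFD"  # Integer, Multiply, Atomic, Single and Double float


def expand_isa_name(name: str) -> str:
    """Return the ISA name with the 'G' shorthand spelled out as 'IMAFD'."""
    name = name.upper()
    if not name.startswith("RV"):
        raise ValueError(f"not a RISC-V ISA name: {name!r}")
    rest = name[2:]
    # Peel the register width (32, 64 or 128) off the front.
    width = ""
    while rest and rest[0].isdigit():
        width += rest[0]
        rest = rest[1:]
    if width not in ("32", "64", "128"):
        raise ValueError(f"unsupported register width: {width!r}")
    return "RV" + width + rest.replace("G", G_EXPANSION)


def parse_isa_name(name: str):
    """Split an ISA name into (register width, list of extension letters)."""
    expanded = expand_isa_name(name)
    digits = "".join(ch for ch in expanded[2:] if ch.isdigit())
    letters = expanded[2 + len(digits):]
    return int(digits), list(letters)


if __name__ == "__main__":
    print(expand_isa_name("RV64GC"))
    print(parse_isa_name("RV32IMC"))
```

Calling expand_isa_name("RV64GC") gives "RV64IMAFDC", matching the equivalence described above.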

The Base Integer instruction set itself defines a standard (but small) set of instructions: load from / store to memory and CSRs, add, branch / jump, shift bits, logical operations, etc. Readers familiar with other RISC architectures will find nothing unusual except maybe how small the base level instruction set is.

When reading the specifications, it's worth noting there are at least two naming conventions for registers. The first simply names registers x0 through x31. Chapter 25 of the current spec describes the ABI naming convention. The ABI naming convention is useful because it gives semantic meaning to the registers, including registers 0 through 15, which are important for compressed instructions and the RV32E (Embedded) ISA.
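To make the two naming conventions concrete, here's a small Python lookup table. The ABI names are the ones from the standard calling convention as we understand it (check the spec chapter mentioned above for the definitive table); the ABI_NAMES list and the abi_name and x_name helpers are hypothetical names invented for this sketch.

```python
# Illustrative mapping between "x" register numbers and ABI names.
# Verify against the official calling-convention table before relying on it.

ABI_NAMES = (
    ["zero", "ra", "sp", "gp", "tp"]      # x0-x4: zero, return addr, pointers
    + ["t0", "t1", "t2"]                  # x5-x7: temporaries
    + ["s0", "s1"]                        # x8-x9: saved (s0 is also called fp)
    + [f"a{i}" for i in range(8)]         # x10-x17: arguments / return values
    + [f"s{i}" for i in range(2, 12)]     # x18-x27: saved registers
    + [f"t{i}" for i in range(3, 7)]      # x28-x31: temporaries
)


def abi_name(x_register: str) -> str:
    """Translate a name like 'x10' into its ABI name ('a0')."""
    if not x_register.startswith("x"):
        raise ValueError(f"expected a name like 'x5', got {x_register!r}")
    index = int(x_register[1:])
    if not 0 <= index <= 31:
        raise ValueError(f"register index out of range: {index}")
    return ABI_NAMES[index]


def x_name(abi_register: str) -> str:
    """Translate an ABI name like 'sp' back into its 'x' name ('x2')."""
    if abi_register == "fp":  # fp is an alias for s0
        abi_register = "s0"
    return f"x{ABI_NAMES.index(abi_register)}"


if __name__ == "__main__":
    for i in range(16):  # the registers available in RV32E
        print(f"x{i:<2} {ABI_NAMES[i]}")
```

Printing only the first 16 entries shows the subset that RV32E keeps.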

RISC-V instruction lengths are multiples of 16 bits, but both the 32-bit and 64-bit base instruction sets define only 32-bit instructions. The "C" (Compressed) extension maps "commonly used" 32-bit instructions to 16-bit aliases. The specification explains this was done to keep simpler 32-bit-only implementations possible while providing a standard way to increase code density through 16-bit compressed instructions.
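How does a core tell a 16-bit instruction from a 32-bit one? As we read the base spec's instruction-length encoding, the answer is in the lowest bits of the first 16-bit parcel: if the two lowest bits are anything other than 11, it's a 16-bit compressed instruction; if they're 11 and bits 4 through 2 aren't all ones, it's a 32-bit instruction. The Python below is a minimal sketch of just those two cases. The function name is hypothetical and longer encodings are deliberately left out.

```python
# Illustrative sketch: classify an instruction as 16-bit or 32-bit by looking
# at the low bits of its first 16-bit parcel. Longer encodings are not handled.


def instruction_length_bits(first_parcel: int) -> int:
    """Return 16 or 32 depending on the low bits of the first 16-bit parcel."""
    if not 0 <= first_parcel <= 0xFFFF:
        raise ValueError("expected a 16-bit value")
    if first_parcel & 0b11 != 0b11:
        return 16  # compressed ("C" extension) instruction
    if (first_parcel >> 2) & 0b111 != 0b111:
        return 32  # standard 32-bit instruction
    raise ValueError("encoding longer than 32 bits; not handled in this sketch")


if __name__ == "__main__":
    # 0x0013 is the low half of 0x00000013 (addi x0, x0, 0, the usual NOP).
    print(instruction_length_bits(0x0013))
    # 0x0001 is the compressed NOP (c.nop).
    print(instruction_length_bits(0x0001))
```

A 32-bit-only implementation can skip this check entirely, which is part of why the base instruction sets stick to 32-bit instructions.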

The RISC-V Foundation sponsors work to add RISC-V support to popular open source projects including GCC, LLVM, QEMU and Linux. At the time this document was written, RISC-V support had been "up-streamed" to each of these projects.

The Privileged Architecture

While RISC-V isn't attempting to be all things to all users, it is trying to be many things to many users. The two main types of systems RISC-V supports are small, embedded, single-application devices and large, beefy, high-performance processors that run "real" operating systems like Linux or BSD.

Modern, secure operating systems essentially require virtual memory support and the separation between User and Supervisor modes. User mode runs application code in a sandbox that doesn't give the app unfettered access to the hardware or other applications. Supervisor mode, which runs the operating system kernel and probably device drivers, does have that access. In short: User mode runs applications; Supervisor mode runs operating systems. If you've never heard of this before, Wikipedia has a decent page about protection rings and privilege modes [9].

Volume II of the RISC-V Instruction Set Manual [2] describes different operating modes and the Control and Status Registers (CSRs) that support features like virtual memory.

The spec defines three modes: Machine, Supervisor and User. We mentioned Supervisor and User above. Machine mode is for embedded devices with no mode-based security, or Hypervisors. Hypervisors are like "super-operating-systems" that let multiple instances of operating systems run on one physical CPU. Hypervisors are extremely important to the modern tech ecosystem. Most cloud-based applications run in a hypervisor-enabled virtual machine in someone's data center, somewhere. If you're unfamiliar with hypervisors, Wikipedia has a decent article about the concept [10].

With three defined modes, you may think there are seven supported combinations of modes. But modes build on each other. You'll never see a RISC-V CPU without the Machine mode and if a RISC-V CPU supports Supervisor mode, it will also support User mode. The spec defines three combinations of modes and gives hints as to what they would be used for:

M - Used for simple embedded systems.
M, U - Used for secure embedded systems.
M, S, U - For systems running modern Unix-like operating systems
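If you'd like to see why seven possible combinations collapse to three, the two rules above (Machine mode is always present, and Supervisor mode implies User mode) can be written as a quick Python check. The function names here are hypothetical and exist only for this illustration.

```python
# Illustrative sketch: enumerate every non-empty subset of the three modes and
# keep only the ones that satisfy the two rules described in the text.
from itertools import combinations

MODES = ("M", "S", "U")


def is_supported(modes) -> bool:
    """Apply the two rules: M is mandatory, and S requires U."""
    modes = set(modes)
    if "M" not in modes:
        return False
    if "S" in modes and "U" not in modes:
        return False
    return True


def all_combinations():
    """Every non-empty subset of MODES, smallest first."""
    return [
        combo
        for size in range(1, len(MODES) + 1)
        for combo in combinations(MODES, size)
    ]


def supported_combinations():
    return [combo for combo in all_combinations() if is_supported(combo)]


if __name__ == "__main__":
    print(len(all_combinations()), "possible combinations")
    for combo in supported_combinations():
        print(", ".join(combo))
```

The surviving combinations are exactly the three listed above: M alone; M and U; and M, S and U.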

Everything Else

But there's way more to RISC-V than a couple of spec documents. Even the list of extensions mentioned above isn't complete. There's an extension for Vector operations, which should significantly accelerate the complex math operations used in cryptography and machine learning [3].

A pair of specifications, the debug spec [4] and the trace spec [5], define standard interfaces that debug tools can use to understand what's happening in the CPU.

For a more complete picture of standardized extensions, check out the Technical Working Groups [11] page at the RISC-V Foundation Web Page [12].