
Building a custom SoC to run DOOM

This page will describe my experience building a custom RISC-V CPU for the sole purpose of running DOOM on a small FPGA (the Gowin Tang Nano 20K).
I will be creating a custom instruction set (RV32doom?) that augments the base RV32IM with instructions useful for running DOOM at full resolution and framerate on this FPGA. I will also be implementing all of the Uncore components necessary for the CPU to function.

View Project on GitHub

Overview

[TODO (it isn't finished yet)]

Implementation details

Full in-depth writeups will be added here as I complete components:


The Development process:

To start this project, the naive first step would be to compile DOOM for RV32IM, profile it, and see which other instructions I should include and which I should invent. However, since I'm short on resources and I actually want to make sure this will work, I'm instead going to create a minimum working example to prove that I can boot DOOM on this FPGA at all before I start LARPing as a computer architect.

It turns out that building real hardware requires a lot of other logic just to move/store data and interface with the real world. I'll be referring to all of this "stuff" as the Uncore.

Building the display driver

Perhaps the most interesting feature of the Tang Nano 20K, and the one I am least familiar with, is the HDMI port. Driving signals over HDMI (really DVI-D) isn't too complicated. The only potential issue is the size of the frame buffer. I don't want to put any more load on the DRAM than necessary, but I have limited on-chip memory available. Fortunately, "full resolution" for DOOM was only 320x200 in 8-bit color. This means I can store one full frame and a copy of the color palette on chip, but not much more than that.
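As a rough sanity check of that claim, here is a back-of-the-envelope calculation. The block count (46) comes from the utilisation report further down this page. The 18 Kbit block size and the 24-bit palette entries are assumptions on my part, so treat the result as an estimate rather than a measurement.

```python
# Rough on-chip memory budget for the display path.
# BSRAM_BLOCKS is taken from the utilisation report on this page.
# BSRAM_BITS_PER_BLOCK and PALETTE_BITS_PER_ENTRY are assumptions.

FRAME_WIDTH = 320
FRAME_HEIGHT = 200
BITS_PER_PIXEL = 8            # each pixel is an index into the palette

PALETTE_ENTRIES = 256         # one entry per possible 8-bit index
PALETTE_BITS_PER_ENTRY = 24   # assumed: 8 bits each for R, G and B

BSRAM_BLOCKS = 46
BSRAM_BITS_PER_BLOCK = 18 * 1024  # assumed: 18 Kbit per block

frame_bits = FRAME_WIDTH * FRAME_HEIGHT * BITS_PER_PIXEL
palette_bits = PALETTE_ENTRIES * PALETTE_BITS_PER_ENTRY
total_bits = BSRAM_BLOCKS * BSRAM_BITS_PER_BLOCK
remaining_bits = total_bits - frame_bits - palette_bits

# Would a second full frame (double buffering) also fit on chip?
fits_second_frame = remaining_bits >= frame_bits

print(f"frame buffer : {frame_bits // 8} bytes")
print(f"palette      : {palette_bits // 8} bytes")
print(f"left over    : {remaining_bits // 8} bytes")
print(f"second frame fits: {fits_second_frame}")
```

Under these assumptions one frame plus the palette fits, but a second frame does not, which is why the frames have to be copied in from DRAM.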
The frame buffer and color palette will be connected to the system through the memory bus (AXI interconnect) with the frames being copied from DRAM via the CPU or DMA.
The display driver uses the data in the frame buffer and palette to calculate pixel values and upscale the frame to 640x480 before passing the pixels through a TMDS encoder and serializer, then out to the display.
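To make the lookup and upscale step concrete, here is a small software model. It is only a sketch: it assumes nearest-neighbour scaling, with the 200 source rows stretched across all 480 output rows, and the names (`output_pixel`, `framebuffer`, `palette`) are made up for illustration. The real hardware may scale differently.

```python
SRC_W, SRC_H = 320, 200   # DOOM frame size
OUT_W, OUT_H = 640, 480   # display mode

def output_pixel(framebuffer, palette, x, y):
    """Return the (r, g, b) colour for output pixel (x, y).

    framebuffer: SRC_W * SRC_H palette indices in row-major order
    palette:     256 (r, g, b) tuples
    """
    # Nearest-neighbour mapping back into the source frame (assumed).
    src_x = x * SRC_W // OUT_W   # exactly 2 output columns per source column
    src_y = y * SRC_H // OUT_H   # 2 or 3 output rows per source row
    index = framebuffer[src_y * SRC_W + src_x]
    return palette[index]

# Synthetic data for a quick check: grayscale palette, diagonal test pattern.
palette = [(i, i, i) for i in range(256)]
framebuffer = bytes((x + y) % 256 for y in range(SRC_H) for x in range(SRC_W))

print(output_pixel(framebuffer, palette, 0, 0))
print(output_pixel(framebuffer, palette, OUT_W - 1, OUT_H - 1))
```

Because 480 / 200 = 2.4, some source rows are repeated twice and others three times. In hardware this would likely become a pair of small counters rather than a multiply and divide per pixel.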

Full Details

Building the memory controller

The Tang Nano 20K includes 8 MB of SDRAM in the FPGA package. A basic memory controller acts as the interface between the memory bus (AXI) and the SDRAM. The SDRAM is configured with a burst length of 8 for both reads and writes. However, variable-length bursts (up to 8) and single-byte accesses are supported via the AXI interface.
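One way to support short and single-byte accesses on top of a fixed burst length is to always run full bursts and mask the bytes that were not requested. The sketch below models that idea in software. It assumes a 32-bit SDRAM data bus and bursts aligned to 32-byte boundaries, and `split_into_bursts` is a hypothetical helper rather than a description of the actual controller.

```python
BEAT_BYTES = 4     # assumed: 32-bit SDRAM data bus
BURST_BEATS = 8    # SDRAM burst length from the text
BURST_BYTES = BEAT_BYTES * BURST_BEATS

def split_into_bursts(addr, length):
    """Map a byte range onto fixed-length, aligned SDRAM bursts.

    Returns a list of (burst_base_address, byte_enables). byte_enables holds
    one 4-bit mask per beat; bit i set means byte lane i of that beat is
    part of the request. Masked-off lanes would be ignored on reads and
    left untouched on writes.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    end = addr + length
    base = addr - (addr % BURST_BYTES)   # align down to a burst boundary
    bursts = []
    while base < end:
        enables = []
        for beat in range(BURST_BEATS):
            mask = 0
            for lane in range(BEAT_BYTES):
                byte_addr = base + beat * BEAT_BYTES + lane
                if addr <= byte_addr < end:
                    mask |= 1 << lane
            enables.append(mask)
        bursts.append((base, enables))
        base += BURST_BYTES
    return bursts

# A single-byte access at 0x21 still costs one full 8-beat burst.
for base, enables in split_into_bursts(0x21, 1):
    print(hex(base), [format(m, "04b") for m in enables])
```

The trade-off is simplicity against bandwidth: every access costs a whole burst, which is acceptable if most traffic is cache-line or frame-buffer sized.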

Full Details

Planning Next Steps

Although the memory controller isn't completely functional yet, I want to get an idea of whether this will all fit on the FPGA. I added a small RISC-V core that I will use as the base of the rest of my design and ran the build flow to get an idea of the resource utilization:

Info: Device utilisation:
Info:                    VCC:       1/      1   100%
Info:                    IOB:      35/    384     9%
Info:                   LUT4:    2290/  20736    11%
Info:               IOLOGICI:       0/    384     0%
Info:               IOLOGICO:       4/    384     1%
Info:              MUX2_LUT5:     673/  10368     6%
Info:              MUX2_LUT6:     263/   5184     5%
Info:              MUX2_LUT7:     101/   2592     3%
Info:              MUX2_LUT8:      29/   2592     1%
Info:                    ALU:     336/  15552     2%
Info:                    GND:       1/      1   100%
Info:                    DFF:     284/  15552     1%
Info:              RAM16SDP4:       4/    648     0%
Info:                  BSRAM:      29/     46    63%
Info:                 ALU54D:       0/     24     0%
Info:        MULTADDALU18X18:       0/     24     0%
Info:           MULTALU18X18:       0/     24     0%
Info:           MULTALU36X18:       0/     24     0%
Info:              MULT36X36:       0/     12     0%
Info:              MULT18X18:       1/     48     2%
Info:                MULT9X9:       0/     96     0%
Info:                 PADD18:       0/     48     0%
Info:                  PADD9:       0/     96     0%
Info:                    GSR:       1/      1   100%
Info:                    OSC:       0/      1     0%
Info:                   rPLL:       2/      2   100%
Info:                   BUFG:       1/     24     4%
Info:                   DQCE:       0/     24     0%
Info:                    DCS:       0/      8     0%
Info:                  DHCEN:       0/     24     0%
Info:                 CLKDIV:       2/      8    25%
Info:                CLKDIV2:       0/     16     0%

This seems deceptively low (especially the single multiplier utilized). It's possible some of the logic is getting optimized out despite my best efforts. Regardless, I think I have enough resources to proceed.
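To keep an eye on these numbers between builds, a small script can pull them out of the report. This is a sketch that assumes the report keeps the exact line format shown above. `parse_utilisation` is a hypothetical helper, and the sample lines are copied from the report.

```python
import re

# Matches lines such as "Info:    BSRAM:      29/     46    63%"
LINE_RE = re.compile(r"Info:\s+(\w+):\s+(\d+)/\s*(\d+)\s+(\d+)%")

def parse_utilisation(report):
    """Return {resource_name: (used, total)} for every matching line."""
    usage = {}
    for line in report.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            name, used, total, _percent = match.groups()
            usage[name] = (int(used), int(total))
    return usage

SAMPLE = """\
Info: Device utilisation:
Info:                   LUT4:    2290/  20736    11%
Info:                  BSRAM:      29/     46    63%
Info:              MULT18X18:       1/     48     2%
Info:                   rPLL:       2/      2   100%
"""

usage = parse_utilisation(SAMPLE)
for name, (used, total) in usage.items():
    print(f"{name:>10}: {used}/{total} ({100 * used / total:.0f}%)")
```

Comparing the parsed values from one build to the next makes it easier to spot logic that has been optimized away, for example a LUT4 count that drops sharply after a small change.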

Implementing The Base Core

Initially, I plan to base the core on the two-cycle RV32E CPU I previously built: tiny-riscv

I will undo some of the area optimizations, for example by moving the adders and shifters to more FPGA-native structures. I will also add multiply support.

ISA: RV32I + Zmmul (multiply only) initially
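For reference, Zmmul adds the four multiply instructions from the M extension (MUL, MULH, MULHSU, MULHU) without the divide and remainder instructions. Below is a small software model of their results on 32-bit registers. It is a sketch meant for checking a hardware implementation against, not part of the core itself, and the function names are just illustrative.

```python
MASK32 = 0xFFFFFFFF

def to_signed(value):
    """Interpret the low 32 bits of value as a two's complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value

def mul(a, b):
    """MUL: low 32 bits of the product (same for signed and unsigned)."""
    return (a * b) & MASK32

def mulh(a, b):
    """MULH: high 32 bits of signed x signed."""
    return ((to_signed(a) * to_signed(b)) >> 32) & MASK32

def mulhsu(a, b):
    """MULHSU: high 32 bits of signed x unsigned."""
    return ((to_signed(a) * (b & MASK32)) >> 32) & MASK32

def mulhu(a, b):
    """MULHU: high 32 bits of unsigned x unsigned."""
    return (((a & MASK32) * (b & MASK32)) >> 32) & MASK32

# The same operand bits give different high words depending on signedness.
x = 0xFFFFFFFF  # -1 when signed, 4294967295 when unsigned
print(hex(mul(x, x)), hex(mulh(x, x)), hex(mulhsu(x, x)), hex(mulhu(x, x)))
```

A model like this can generate test vectors for simulation, which is useful because the mixed-sign MULHSU case is easy to get wrong.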

TODOs:

  • Finish SDRAM Controller
  • Port CPU to FPGA
  • Decide how to build L1 caches
  • Connect caches to memory bus
  • Connect framebuffer to memory bus
  • Write SD card interface
  • Connect SD card to memory bus
  • Write Bootloader
  • Port DOOM
  • Attempt to run
  • Fix bugs
  • Optimize