Barcelona Abroad · Week 3 · Day 9  📊 64 Kbits free if you ask right

Memory: RAM, ROM & Block RAM

CRAFT cycle · 2.5 hours · Mon 6/8 · Week 3 open

HDL for Digital System Design · UCF ECE · Barcelona Summer 2026

CRAFT

Today at a Glance

Phase | Time | Activity
🌍 Contextualize | 10 min | Memory everywhere · the iCE40's 64 Kbit secret
⚠️ Reframe | 15 min | Memory ≠ array · the EBR inference pattern
🛠 Assemble | 70 min | ROM sequencer · inferred BRAM · RAW testbench · dual-port stretch
🛡 Fortify | 45 min | Confirm SB_RAM40_4K · read-after-write coverage · 🤖 AI BRAM critique
🔗 Transfer | 10 min | Re-ground in Weeks 1–2 · → D10 timing (tomorrow)
The habit to build: synthesis tools pattern-match your code. Code that merely looks like memory often ends up as flip-flops and LUTs; today you learn the coding pattern that produces real Block RAM.

▸ Phase 1 of 5  ·  ~10 min

🌍 Contextualize

Storage is in everything you've touched in Barcelona

Memory Runs the City

  • BCN airport departure board — a frame buffer redrawn every second
  • Metro line-map display on each train — a small ROM of station names
  • T-Casual card readers — a per-card ride history in a tiny RAM
  • Sagrada Família visitor counters — a saturating count, persisted

Our iCE40 has 16 Block RAMs (EBRs) × 4 Kbits = 64 Kbits of fast, dedicated memory. Free, if your code asks for it correctly.

The penalty for asking wrong

Same 256×8 memory without EBR: ~2 048 SB_LUT4s, plus 2 048 flip-flops to hold the bits (the iCE40 has no distributed LUT RAM). As EBR: 1 Block RAM. That's the difference between fitting on the chip and not.
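Where does "1 Block RAM" come from? A rough estimator, written as Verilog constant functions, is sketched below. It assumes the four SB_RAM40_4K aspect ratios (256×16, 512×8, 1024×4, 2048×2), picks the mode needing the fewest blocks, and ignores anything smarter Yosys might do. The module and function names are hypothetical; yosys stat remains the real answer.

```verilog
// Rough EBR estimator (hypothetical helper, not a Yosys feature).
// Assumes the four SB_RAM40_4K modes: 256x16, 512x8, 1024x4, 2048x2.
module ebr_estimate #(
    parameter DEPTH = 256,
    parameter WIDTH = 8
)(
    output wire [7:0] count              // estimated SB_RAM40_4K blocks
);
    function integer ceil_div(input integer a, input integer b);
        ceil_div = (a + b - 1) / b;
    endfunction

    // Blocks needed if every block runs in one fixed aspect ratio
    function integer tiles(input integer depth, input integer width,
                           input integer blk_depth, input integer blk_width);
        tiles = ceil_div(depth, blk_depth) * ceil_div(width, blk_width);
    endfunction

    // Smallest count over the four modes
    function integer ebr_count(input integer depth, input integer width);
        integer best;
        begin
            best = tiles(depth, width, 256, 16);
            if (tiles(depth, width,  512, 8) < best) best = tiles(depth, width,  512, 8);
            if (tiles(depth, width, 1024, 4) < best) best = tiles(depth, width, 1024, 4);
            if (tiles(depth, width, 2048, 2) < best) best = tiles(depth, width, 2048, 2);
            ebr_count = best;
        end
    endfunction

    localparam EBRS = ebr_count(DEPTH, WIDTH);
    assign count = EBRS;
endmodule
```

With the defaults, 256×8 gives 1 block; an 8192×8 memory gives 16, which is the entire 64 Kbits.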

The on-FPGA memory landscape: distributed LUT RAM (small, fast, expensive in LUTs; not available on the iCE40, where the fallback is flip-flops plus LUT muxes) vs dedicated Block RAM / EBR (larger, synchronous, free if inferred) vs ROM initialized from a hex file

▸ Phase 2 of 5  ·  ~15 min

⚠️ Reframe

Memory has physics. Code reflects it.

⚠️ Memory Is Not "Just an Array"

❌ Wrong Model

"reg [7:0] mem [0:255]; is just an array. I'll read and write it however I want."

✓ Right Model

Real memories have physical constraints: synchronous vs async read, port count, init mechanism. Coding patterns are recipes the synthesizer recognizes — match the recipe, get free EBR; miss it, burn LUTs.

Synchronous read (dout assigned inside @(posedge clk), or a registered read address) ⇒ Yosys can infer SB_RAM40_4K. Combinational read ⇒ flip-flops plus LUT muxes, because the iCE40 has no LUT RAM.
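The line above mentions a registered address. A minimal sketch of that variant follows: the address is captured on the clock edge and the array is read from the registered copy. The module name bram_raddr is hypothetical. After a write, dout shows the new value (write-first), which the EBR may not provide natively, so Yosys may add a little bypass logic; treat EBR inference here as something to confirm with yosys stat, not a guarantee.

```verilog
// Registered-address variant (hypothetical name bram_raddr).
// Register the ADDRESS on the clock edge, then read the array from it.
module bram_raddr #(
    parameter ADDR_WIDTH = 8,
    parameter DATA_WIDTH = 8
)(
    input  wire                  clk,
    input  wire                  we,
    input  wire [ADDR_WIDTH-1:0] addr,
    input  wire [DATA_WIDTH-1:0] din,
    output wire [DATA_WIDTH-1:0] dout
);
    reg [DATA_WIDTH-1:0] mem [0:(1<<ADDR_WIDTH)-1];
    reg [ADDR_WIDTH-1:0] addr_q;

    always @(posedge clk) begin
        if (we) mem[addr] <= din;
        addr_q <= addr;                  // register the address, not the data
    end

    assign dout = mem[addr_q];           // read from the registered address
endmodule
```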

Synchronous vs. Asynchronous Read

Asynchronous read returns data combinationally in the same cycle the address changes (on the iCE40, flip-flops plus LUT muxes); synchronous read registers the output and returns data one clock later, the pattern Yosys maps to Block RAM
The single decision that determines whether you get free EBR or burn ~2 048 LUTs plus 2 048 flip-flops: is dout updated inside @(posedge clk)? This is the pattern you match in Ex 2.

The EBR Inference Pattern

module bram #(
    parameter ADDR_WIDTH = 8,
    parameter DATA_WIDTH = 8
)(
    input  wire                       clk,
    input  wire                       we,
    input  wire [ADDR_WIDTH-1:0]      addr,
    input  wire [DATA_WIDTH-1:0]      din,
    output reg  [DATA_WIDTH-1:0]      dout      // ← reg, registered
);
    reg [DATA_WIDTH-1:0] mem [0:(1<<ADDR_WIDTH)-1];

    always @(posedge clk) begin
        if (we) mem[addr] <= din;
        dout <= mem[addr];                       // ← READ inside @(posedge clk)
    end
endmodule
  • dout is reg · the read is sequential · Yosys can map this to SB_RAM40_4K
  • One always block handles both the write and the read: a single-port, read-first RAM (write and read the same address in one cycle and dout returns the old value)
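The bram above is read-first. If a design needs the newly written value on dout at the same edge as the write (write-first), one common approach is to forward din explicitly. A sketch under that assumption; the name bram_wf is hypothetical, and the forwarding path may cost a few extra LUTs, so check yosys stat before relying on it.

```verilog
// Write-first variant (hypothetical name bram_wf): on a write, dout takes
// the NEW value at the same edge instead of the old contents.
module bram_wf #(
    parameter ADDR_WIDTH = 8,
    parameter DATA_WIDTH = 8
)(
    input  wire                  clk,
    input  wire                  we,
    input  wire [ADDR_WIDTH-1:0] addr,
    input  wire [DATA_WIDTH-1:0] din,
    output reg  [DATA_WIDTH-1:0] dout
);
    reg [DATA_WIDTH-1:0] mem [0:(1<<ADDR_WIDTH)-1];

    always @(posedge clk) begin
        if (we) begin
            mem[addr] <= din;
            dout      <= din;            // forward the write data
        end else begin
            dout      <= mem[addr];      // normal synchronous read
        end
    end
endmodule
```

In the same situation, the read-first bram would show the old contents (X on a first write) one edge after the write.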

⚠️ The Combinational-Read Trap

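// ❌ This LOOKS efficient. It is not.
reg  [7:0] mem [0:255];
wire [7:0] dout;
assign dout = mem[addr];           // combinational read: no EBR on iCE40

always @(posedge clk)
    if (we) mem[addr] <= din;

// ✓ Synchronous read (use instead of the block above): Yosys can map to EBR
reg [7:0] mem [0:255];
reg [7:0] dout;
always @(posedge clk) begin
    if (we) mem[addr] <= din;
    dout <= mem[addr];             // read registered on the clock edge
end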
A combinational read forfeits the 64 Kbits of free EBR: every stored bit costs a flip-flop in the logic fabric instead. Always check yosys stat after writing memory code.

▸ Phase 3 of 5  ·  ~70 min  ·  You build

🛠 Assemble

ROM · RAM · TB · stretch

Build Plan

  1. Ex 1 · 25 min  ROM-driven pattern sequencer — write a .hex, load with $readmemh, step a counter through addresses, drive LEDs & 7-seg.
  2. Ex 2 · 25 min  Inferred Block RAM — 256×8 BRAM. Synthesize and confirm SB_RAM40_4K appears in yosys stat.
  3. Ex 3 · 20 min  RAM testbench — read-after-write, overwrite, read from unwritten address.
Stretch (Ex 4): Dual-port RAM — independent read + write addresses, both synchronous. Confirm Yosys still infers EBR. Why does this matter for Week 3 UART buffering?
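One possible shape for the stretch is sketched below, assuming a single shared clock (the EBR itself also allows separate read and write clocks). The name sdp_ram and its port names are hypothetical. It is a simple dual-port RAM: one write port and one read port with independent addresses, each in its own clocked always block.

```verilog
// Simple dual-port RAM sketch (hypothetical name sdp_ram):
// one write port, one read port, independent addresses, both synchronous.
module sdp_ram #(
    parameter ADDR_WIDTH = 8,
    parameter DATA_WIDTH = 8
)(
    input  wire                  clk,
    // write port
    input  wire                  we,
    input  wire [ADDR_WIDTH-1:0] waddr,
    input  wire [DATA_WIDTH-1:0] din,
    // read port
    input  wire [ADDR_WIDTH-1:0] raddr,
    output reg  [DATA_WIDTH-1:0] dout
);
    reg [DATA_WIDTH-1:0] mem [0:(1<<ADDR_WIDTH)-1];

    always @(posedge clk)                // write port
        if (we) mem[waddr] <= din;

    always @(posedge clk)                // read port: registered read
        dout <= mem[raddr];
endmodule
```

Because the addresses are independent, one side can write at waddr while the other reads at raddr in the same cycle. Run yosys stat to confirm SB_RAM40_4K still appears.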

ROM From a Hex File

module rom_seq #(
    parameter INIT_FILE = "patterns.hex"
)(
    input  wire       clk,
    input  wire [3:0] addr,
    output reg  [7:0] data
);
    reg [7:0] mem [0:15];

    initial $readmemh(INIT_FILE, mem);     // synthesizable in Yosys/iCE40

    always @(posedge clk) data <= mem[addr];   // registered read
endmodule

patterns.hex (one byte per line):

01
03
07
0F
1F
3F
7F
FF
...

Drive addr from a slow counter — LEDs trace the pattern at human speed.
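A slow counter for addr can be as simple as a cycle divider plus an address register. This sketch assumes a 12 MHz board clock and four steps per second; both are parameters to adjust for your board, and the module name step_counter is hypothetical.

```verilog
// Slow address stepper (hypothetical name step_counter).
// Assumes a 12 MHz board clock; adjust CLK_HZ / STEPS_PER_SEC as needed.
module step_counter #(
    parameter CLK_HZ        = 12_000_000,
    parameter STEPS_PER_SEC = 4,
    parameter ADDR_WIDTH    = 4
)(
    input  wire                  clk,
    output reg  [ADDR_WIDTH-1:0] addr = 0
);
    localparam DIV = CLK_HZ / STEPS_PER_SEC;   // clock cycles per step

    reg [31:0] tick = 0;

    always @(posedge clk) begin
        if (tick == DIV - 1) begin
            tick <= 0;
            addr <= addr + 1'b1;               // 4-bit addr wraps 15 -> 0
        end else begin
            tick <= tick + 1;
        end
    end
endmodule
```

Usage: connect step_counter's addr output to rom_seq's addr input and rom_seq's data to the LEDs; with the defaults the pattern advances four times per second.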

RAM TB — Read-After-Write

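// Self-checking pattern: write N, read back, verify
module bram_tb;
    reg        clk = 0;
    reg        we  = 0;
    reg  [7:0] addr = 0, din = 0;
    wire [7:0] dout;
    integer    fails, tests;

    bram #(.ADDR_WIDTH(8), .DATA_WIDTH(8)) dut (
        .clk(clk), .we(we), .addr(addr), .din(din), .dout(dout));

    always #5 clk = ~clk;                          // simulation clock

    task automatic check (input [7:0] addr_i, input [7:0] data_i);
        begin
            @(posedge clk); we <= 1; addr <= addr_i; din <= data_i;  // drive after edge 1
            @(posedge clk); we <= 0;               // edge 2: RAM writes (dout still old)
            @(posedge clk); #1;                    // edge 3: dout gets new data; #1 lets it settle
            if (dout !== data_i) begin
                $display("FAIL: wrote %h at %h, read %h", data_i, addr_i, dout);
                fails = fails + 1;
            end
            tests = tests + 1;
        end
    endtask

    initial begin
        fails = 0; tests = 0;
        check(8'h00, 8'hA5);
        check(8'h01, 8'h5A);
        check(8'hFF, 8'hC3);
        check(8'h00, 8'h12);   // overwrite addr 0
        if (fails == 0) $display("PASS: %0d tests", tests);
        else            $display("FAIL: %0d/%0d", fails, tests);
        $finish;
    end
endmodule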

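Note the timing: the TB drives we/addr/din just after edge 1, the RAM writes on edge 2 (dout still shows the old contents, since this RAM is read-first), and dout holds the new value only after edge 3. One cycle to write plus one cycle of read latency: this is what "synchronous read" means. Sample dout after edge 3 has settled, not at the same instant as the edge.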
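Ex 3 also asks for a read from an unwritten address. In simulation, a reg array that was never written or initialized reads back as all X, and a plain == comparison involving X is never true. A minimal, self-contained illustration follows (the module name is hypothetical); on hardware there is no X, so decide explicitly what your testbench should expect.

```verilog
// Hypothetical demo: reading an address that was never written.
module x_read_demo;
    reg [7:0] mem [0:255];            // never written, never initialized
    reg [7:0] dout;

    initial begin
        dout = mem[8'h42];

        // == with an X operand evaluates to X, which 'if' treats as false
        if (dout == 8'hxx)  $display("== matched (never prints)");

        // === compares X bits literally, so this is the reliable check
        if (dout === 8'hxx) $display("=== matched: unwritten address reads X");

        // Reduction XOR is X if any bit is X: a compact "contains X" test
        if (^dout === 1'bx) $display("^dout is X: at least one bit unknown");
    end
endmodule
```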

▸ Phase 4 of 5  ·  ~45 min  ·  Confirm EBR · sweep · AI check

🛡 Fortify

Trust the synth report, not the source

Confirm Block RAM Inference

yosys -p "synth_ice40 -top bram; stat" bram.v 2>&1 \
    | grep -E "SB_RAM|SB_LUT4|SB_DFF"

✓ Wanted output

SB_RAM40_4K  1
SB_LUT4      ~30
SB_DFF       ~8

❌ Inference miss

SB_RAM40_4K  0
SB_LUT4      ~2048
SB_DFFE      ~2048    (one flip-flop per stored bit)
If you see the bad output: usually the read is combinational (assign dout = mem[addr]; or an always @* block). Declaring dout as reg is not enough on its own; the read must happen inside always @(posedge clk). Fix and re-synth.

🤖 Check the Machine — BRAM Prompt

Prompt: "Write a dual-port RAM in Verilog for the Lattice iCE40 that Yosys will infer as Block RAM."

  • Did the AI register both reads? Many answers fall back to assign dout = mem[addr]; (combinational), which kills EBR inference.
  • Does it use one always @(posedge clk) per port? The iCE40 EBR has one write port and one read port, each with its own clock; a true dual-port RAM (two read/write ports) does not map onto it directly.
  • Does it instantiate SB_RAM40_4K directly? If so, push back and ask for an inferred version. We want inference, not hard instantiation: inferred code is portable; the primitive isn't.
  • Run yosys stat on the AI's code and your own. Compare. Save to portfolio.

Hardware Verification

  • ROM sequencer: patterns from patterns.hex appear on LEDs / 7-seg at the expected cadence. Change the hex file, re-synth, see new pattern.
  • BRAM: write a value via buttons, change address, write another, read back the first — both values survive. Persistence is the point.
  • Wrong inference is invisible from the LED: the design still works with the memory built from flip-flops and LUTs (if it fits at all); it just costs thousands of logic cells instead of one EBR. yosys stat is the only honest check.
Add to your project notes: every memory in your Week 4 project will be checked for EBR inference at code review.

▸ Phase 5 of 5  ·  ~10 min  ·  Week 3 opens

🔗 Transfer

Where we left off, and where memory takes you

Where We Left Off — Weeks 1–2

Back from the catch-up day, free Friday, and weekend — here's the foundation memory builds on:

W1 Hardware thinking · data types · combinational logic · clocked logic & RTL.

D5+D6 Counters · shift registers · debouncing · self-checking + 🤖 AI testbenches. Semidynamics.

D7 FSMs · 3-block pattern · Moore & Mealy. HP Customer Center.

D8 Hierarchy · parameters · generate · first PPA width sweep. Cooking workshop.

D9 ROM, RAM & Block RAM inference. You can now describe any synchronous digital subsystem.

Sequential logic + state + memory = every functional unit of a real chip.

Tomorrow → D10 Timing & Numerical Architectures

Tue 6/9 brings setup, hold, Fmax, and PPA — the constraints that turn working code into shippable silicon. Plus numerical architectures: adders, shift-add multipliers, fixed-point.

Bring your D8 PPA table — we'll add a timing column. Eve: Flamenco.

Tonight

  • 📺 D10 video (~55 min — denser than usual)
  • 📝 D10 pre-class quiz
  • 🛠 Lock in your project selection — due tomorrow (Tue 6/9)
Reflection prompt: in one sentence — when does a piece of Verilog "look like memory" but synthesize as LUTs? Bring an answer tomorrow.

🔗 End of Day 9 · Week 3 is underway

Counters. States. Memory. Done.

You can now design any synchronous subsystem — and verify it both manually and with AI.
This week: timing, then UART — your first communication protocol.
