Barcelona Abroad · Week 2 · Day 8  📊 PPA thread reopens

Hierarchy, Parameters & generate

CRAFT cycle · 2.5 hours · Wed 6/3 · Eve: Cooking workshop

HDL for Digital System Design · UCF ECE · Barcelona Summer 2026

CRAFT

Today at a Glance

Phase | Time | Activity
🌍 Contextualize | 10 min | IP libraries · Semidynamics' parameterized RISC-V cores
⚠️ Reframe | 15 min | generate ≠ runtime loop · elaboration-time hardware
🛠 Assemble | 70 min | Param N-bit counter · AI-assisted param TB · generate driver · hierarchical top
🛡 Fortify | 45 min | Sweep WIDTH=4/8/16/32 · yosys stat PPA table · hardware verify
🔗 Transfer | 10 min | Cooking workshop metaphor · D9 preview
The PPA habit: every parameterized module gets a width sweep and a yosys stat table. The habit runs through Day 14.

▸ Phase 1 of 5  ·  ~10 min

🌍 Contextualize

Companies ship parameter families, not single modules

Why Companies Parameterize

  • Semidynamics doesn't redesign their RISC-V core per customer — they ship one core with a configuration vector (vector width, cache size, port count).
  • HP printers use the same firmware FSM across product tiers — features turn on with generate if.
  • The Metro signal system uses one debouncer module — instantiated thousands of times with different thresholds per sensor.

The economic point

Parameters turn a module into a product line. One verified DUT covers a thousand SKUs.

Today you build your first parameter family. By Week 4 your whole project is parameterized.

▸ Phase 2 of 5  ·  ~15 min

⚠️ Reframe

When the loop runs matters more than what it does

⚠️ generate Is Not a Runtime Loop

❌ Wrong Model

"generate for (i=0; i<N; i++) is like a software for — it processes 4 things one at a time, fast."

✓ Right Model

generate for unrolls at elaboration time, before synthesis maps any logic. N=4 creates four independent physical instances that all run in parallel on the same clock edge. The loop itself does not exist in the netlist.

Elaboration-time decisions cost zero gates. Runtime decisions cost muxes. Push as much as possible to elaboration.
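A minimal sketch of that cost difference, using a hypothetical `invert_sel` module (not part of the course files): `INVERT` is resolved during elaboration, so only one branch is ever built, while `inv` is a runtime input, so logic that depends on it must exist in silicon.

```verilog
// Hypothetical example: the same function, chosen two different ways.
module invert_sel #(
    parameter WIDTH  = 8,
    parameter INVERT = 0                // elaboration-time choice
)(
    input  wire [WIDTH-1:0] d,
    input  wire             inv,        // runtime choice
    output wire [WIDTH-1:0] y_param,
    output wire [WIDTH-1:0] y_runtime
);
    // Elaboration time: only the selected branch exists in the netlist.
    generate
        if (INVERT) begin : g_inv
            assign y_param = ~d;
        end else begin : g_pass
            assign y_param = d;         // INVERT=0: plain wires, zero gates
        end
    endgenerate

    // Runtime: a per-bit select on inv must be built
    // (synthesis may map it to an XOR per bit).
    assign y_runtime = inv ? ~d : d;
endmodule
```

Synthesizing with `INVERT=0` should leave `y_param` as plain wires, while `y_runtime` keeps logic for `inv` on every bit. Exact cell counts depend on the flow, so confirm them with `stat`.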

generate Unrolls Into Hardware

[Figure: a generate-for loop with N=4 unrolls at elaboration into four independent parallel hardware instances; the loop itself does not appear in the synthesized netlist]
The loop is a code-generation instruction, not a runtime control flow. N copies of the body exist simultaneously in silicon.
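To see the unrolling in code, here is a minimal sketch using a hypothetical `lane_regs` module: the loop elaborates into `LANES` separate `WIDTH`-bit registers, and all of them capture on the same clock edge. Nothing iterates at run time; each `g_lane[k]` is its own block in the hierarchy.

```verilog
// Hypothetical example: LANES parallel registers from one generate-for.
module lane_regs #(
    parameter LANES = 4,
    parameter WIDTH = 8
)(
    input  wire                   clk,
    input  wire [LANES*WIDTH-1:0] d,   // lane k occupies bits [k*WIDTH +: WIDTH]
    output wire [LANES*WIDTH-1:0] q
);
    genvar k;
    generate
        for (k = 0; k < LANES; k = k + 1) begin : g_lane
            // Each iteration becomes its own register: g_lane[0].r, g_lane[1].r, ...
            reg [WIDTH-1:0] r;
            always @(posedge clk) r <= d[k*WIDTH +: WIDTH];
            assign q[k*WIDTH +: WIDTH] = r;
        end
    endgenerate
endmodule
```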

Parameterized N-Bit Counter

module counter #(
    parameter WIDTH = 8
)(
    input  wire             clk, rst, en,
    output reg  [WIDTH-1:0] q,
    output wire             rollover
);
    assign rollover = (q == {WIDTH{1'b1}}) & en;

    always @(posedge clk) begin
        if (rst)     q <= 0;
        else if (en) q <= q + 1'b1;
    end
endmodule

// In a parent module with q4/q8/q16 and r4/r8/r16 declared as wires,
// instantiate three sizes from one module:
counter #(.WIDTH(4))  c4  (.clk(clk), .rst(rst), .en(en), .q(q4),  .rollover(r4));
counter #(.WIDTH(8))  c8  (.clk(clk), .rst(rst), .en(en), .q(q8),  .rollover(r8));
counter #(.WIDTH(16)) c16 (.clk(clk), .rst(rst), .en(en), .q(q16), .rollover(r16));

One module · three width-specific netlists. The synthesizer specializes each one.

generate for & generate if

// generate for: replicate hardware
genvar i;
generate
    for (i = 0; i < NUM_LEDS; i = i + 1) begin : g_blink
        blinker #(.PERIOD(1_000_000 * (i+1))) b
            (.clk(clk), .led(leds[i]));
    end
endgenerate

// generate if: include features conditionally
generate
    if (HAS_PARITY) begin : g_parity
        wire par = ^data;                 // XOR-reduce
        assign tx = {par, data};
    end else begin : g_no_parity
        assign tx = data;
    end
endgenerate
Named begin/end (: g_blink) keeps the hierarchy navigable in yosys show and GTKWave.
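The `generate if` fragment above leaves the width of `tx` to the enclosing module. A minimal self-contained sketch, assuming a hypothetical `parity_tx` wrapper with a `DATA_W` parameter, sizes `tx` from `HAS_PARITY` so each branch drives exactly the declared width:

```verilog
// Hypothetical wrapper around the generate-if pattern.
module parity_tx #(
    parameter DATA_W     = 8,
    parameter HAS_PARITY = 1                 // 1 = append an even-parity bit
)(
    input  wire [DATA_W-1:0]            data,
    output wire [DATA_W+HAS_PARITY-1:0] tx   // one bit wider only when parity is enabled
);
    generate
        if (HAS_PARITY) begin : g_parity
            wire par = ^data;                // XOR-reduce: 1 when data has an odd number of 1s
            assign tx = {par, data};         // tx always carries an even number of 1s
        end else begin : g_no_parity
            assign tx = data;
        end
    endgenerate
endmodule
```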

▸ Phase 3 of 5  ·  ~70 min  ·  You build · across widths

🛠 Assemble

One module · four widths · one top

Build Plan

  1. Ex 1 · 20 min  Parameterized counter — synth at WIDTH = 4, 8, 16, 32. Three instances in one top.
  2. Ex 2 · 🤖 25 min  AI-assisted param TB — prompt AI to write a TB that overrides WIDTH and tests rollover at 2^WIDTH - 1. Critique & fix.
  3. Ex 3 · 15 min  generate for LED driver — 4 blinkers, period scales with index.
  4. Ex 4 · 10 min  Hierarchical top — debouncer (D5) + counter (today) + 7-seg (D2) wired together.
Stretch (Ex 6): Parameterized LFSR — WIDTH determines the polynomial. Synthesize at 4, 8, 16. Predict the area scaling, then check.
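One possible sketch for the stretch exercise, assuming a Fibonacci LFSR (shift left, XOR feedback into bit 0) and the commonly published maximal-length polynomials for 4, 8 and 16 bits. `WIDTH` selects the taps through a `generate if` chain, so each width elaborates only its own XOR tree. The module name and tap choices are illustrative, not the course's reference solution.

```verilog
// Sketch: Fibonacci LFSR whose feedback taps are chosen by WIDTH.
// Supported widths (assumed maximal-length tap sets): 4, 8, 16.
module lfsr #(
    parameter WIDTH = 8
)(
    input  wire             clk, rst, en,
    output reg  [WIDTH-1:0] q
);
    wire fb;   // feedback bit shifted into q[0]

    generate
        if (WIDTH == 4) begin : g_w4             // x^4 + x^3 + 1
            assign fb = q[3] ^ q[2];
        end else if (WIDTH == 8) begin : g_w8    // x^8 + x^6 + x^5 + x^4 + 1
            assign fb = q[7] ^ q[5] ^ q[4] ^ q[3];
        end else if (WIDTH == 16) begin : g_w16  // x^16 + x^15 + x^13 + x^4 + 1
            assign fb = q[15] ^ q[14] ^ q[12] ^ q[3];
        end else begin : g_unsupported
            // No tap set for this width: q shifts in zeros. Add taps before use.
            assign fb = 1'b0;
        end
    endgenerate

    always @(posedge clk) begin
        if (rst)     q <= {{(WIDTH-1){1'b0}}, 1'b1};  // nonzero seed: all-zeros locks up an XOR LFSR
        else if (en) q <= {q[WIDTH-2:0], fb};
    end
endmodule
```

From a nonzero seed each supported width should cycle through 2^WIDTH - 1 states before repeating, so the area prediction is roughly WIDTH flip-flops plus one small XOR tree.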

🤖 AI-Assisted Param TB — The Prompt

Write a self-checking Icarus-compatible testbench for this counter module:

  module counter #(parameter WIDTH=8) (
    input clk, rst, en, output [WIDTH-1:0] q, output rollover);

Requirements:
  - Test WIDTH = 4, 8, and 16 in the SAME testbench using #(.WIDTH(N)) overrides
  - For each instance, verify rollover fires at q == (2**WIDTH - 1)
  - Verify reset clears q to 0 from any value
  - Verify en=0 freezes q
  - Use !== for comparisons, print PASS/FAIL summary, $finish at end
  - Target IEEE 1364-2005 Verilog (no SystemVerilog)
  • Does the AI use 2**WIDTH - 1 per instance, or hardcode 255? Hardcoding is the #1 failure mode.
  • Does it instantiate three independent DUTs, or reuse one DUT three times? The whole point is separate netlists.
  • Save prompt + correction to the AI portfolio.
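For the critique, a minimal sketch of the structure the checklist is looking for, trimmed to the count and rollover checks (the reset and en=0 checks are left to your version): three separately overridden DUTs, with every expected value computed from that instance's own width. The testbench name is hypothetical.

```verilog
// Sketch (count + rollover checks only): three separately elaborated DUTs.
module tb_counter_widths;
    reg clk = 1'b0, rst = 1'b1, en = 1'b0;
    always #5 clk = ~clk;

    wire [3:0]  q4;   wire r4;
    wire [7:0]  q8;   wire r8;
    wire [15:0] q16;  wire r16;

    // Three instances, three netlists: each gets its own WIDTH override.
    counter #(.WIDTH(4))  c4  (.clk(clk), .rst(rst), .en(en), .q(q4),  .rollover(r4));
    counter #(.WIDTH(8))  c8  (.clk(clk), .rst(rst), .en(en), .q(q8),  .rollover(r8));
    counter #(.WIDTH(16)) c16 (.clk(clk), .rst(rst), .en(en), .q(q16), .rollover(r16));

    integer n;              // enabled clock edges since reset was released
    integer errors = 0;

    initial begin
        @(posedge clk); #1;                 // reset is applied on this edge
        rst = 1'b0; en = 1'b1;
        for (n = 0; n < 2**16 + 2; n = n + 1) begin
            // Expected values come from each instance's width, never a hardcoded 255.
            if (q4  !== n % 2**4)                 errors = errors + 1;
            if (q8  !== n % 2**8)                 errors = errors + 1;
            if (q16 !== n % 2**16)                errors = errors + 1;
            if (r4  !== (n % 2**4  == 2**4  - 1)) errors = errors + 1;
            if (r8  !== (n % 2**8  == 2**8  - 1)) errors = errors + 1;
            if (r16 !== (n % 2**16 == 2**16 - 1)) errors = errors + 1;
            @(posedge clk); #1;
        end
        if (errors == 0) $display("PASS");
        else             $display("FAIL: %0d mismatches", errors);
        $finish;
    end
endmodule
```

Compile it together with the counter, for example `iverilog -o tb_widths counter.v tb_counter_widths.v && vvp tb_widths`. The loop runs 2^16 + 2 cycles so the 16-bit instance rolls over once and wraps back to 0.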

Hierarchical Top — Composition Pays Off

[Figure: module hierarchy tree. A top-level module instantiates sub-modules (debouncer, counter, seven-segment decoder), each a reusable block from earlier days]
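module lab_instrument (
    input  wire clk,
    input  wire btn_async,
    output wire [6:0] seg
);
    wire btn_clean, tick;
    wire [3:0] count;
    reg  btn_prev = 1'b0;

    debouncer #(.THRESHOLD(250_000))      u_db   (.clk(clk), .in_async(btn_async),
                                                  .out_clean(btn_clean));

    // Rising-edge detect: en must be a one-cycle pulse, otherwise the
    // counter increments on every clock while the button is held.
    always @(posedge clk) btn_prev <= btn_clean;
    assign tick = btn_clean & ~btn_prev;

    counter   #(.WIDTH(4))                u_cnt  (.clk(clk), .rst(1'b0), .en(tick),
                                                  .q(count), .rollover());
    hex_to_7seg                           u_disp (.hex(count), .seg(seg));
endmodule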

Three modules. Three days of work. Composed in one short top module. This is the payoff for hierarchy.

▸ Phase 4 of 5  ·  ~45 min  ·  Verify · measure PPA · harden

🛡 Fortify

Make the synthesizer show you the cost

Width Sweep — Capture the Table

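for W in 4 8 16 32; do
    echo "== WIDTH=$W =="
    # chparam overrides the module parameter; -DWIDTH=$W would only define
    # a WIDTH macro, which counter.v never uses (every run would be WIDTH=8)
    yosys -p "read_verilog counter.v; \
              chparam -set WIDTH $W counter; \
              synth_ice40 -top counter; stat" \
          2>&1 | grep -E "SB_LUT4|SB_DFF"
done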
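`read_verilog -DWIDTH=...` defines a preprocessor macro named `WIDTH`; it does not touch the `parameter WIDTH` inside `counter`, so a macro alone leaves every run at the default of 8. If you prefer to drive a sweep with `-D`, a thin wrapper can forward the macro into the parameter. A sketch, assuming a hypothetical `counter_top.v` file:

```verilog
// Hypothetical wrapper: forwards the WIDTH *macro* into the WIDTH *parameter*.
`ifndef WIDTH
  `define WIDTH 8              // default when no -DWIDTH=... is given
`endif

module counter_top (
    input  wire              clk, rst, en,
    output wire [`WIDTH-1:0] q,
    output wire              rollover
);
    counter #(.WIDTH(`WIDTH)) u_cnt (
        .clk(clk), .rst(rst), .en(en), .q(q), .rollover(rollover)
    );
endmodule
```

Then, for example, `yosys -p "read_verilog -DWIDTH=16 counter.v counter_top.v; synth_ice40 -top counter_top; stat"`. Because synth_ice40 flattens the design by default, the cell counts should match synthesizing `counter` directly at that width.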

Fill in the table together on the projector:

WIDTH | SB_LUT4 | SB_DFF | Predicted
4 | ? | ? | ~4 LUTs, 4 FFs
8 | ? | ? | ~8 LUTs, 8 FFs
16 | ? | ? | ~16 LUTs, 16 FFs
32 | ? | ? | ~32 LUTs, 32 FFs
Is scaling linear? If yes, why exactly? If not, where does the overhead come from?

The PPA Habit Starts Here

P — Power

More flip-flops ⇒ more switching ⇒ more dynamic power. (We'll quantify this on Day 10.)

P — Performance

Wider counter ⇒ longer carry chain ⇒ lower Fmax. (Day 10 measures this.)

A — Area

Today's table is your first Area data point. Keep it for the project report.

The thread

D3 seeded it · D8 formalizes the habit · D10 adds timing · D14 uses it to defend project choices.

Hardware Verification

  • Hierarchical top: press the debounced button; the 7-seg counts 0 → F → 0 cleanly. One press = one increment, which needs both a clean debounce and a one-cycle enable pulse (a held level would count on every clock).
  • generate for blinkers: all 4 LEDs blink at different rates from one module instantiation.
  • Wrong widths: deliberately connect the 4-bit count to a 7-seg decoder input declared 8 bits wide, then find the width warning in the Yosys log. Synthesis only warns and keeps going, so an unread warning becomes a hardware bug.
Week 3's memory module is your next parameterized DUT — depth, width, init file all become parameters.

▸ Phase 5 of 5  ·  ~10 min

🔗 Transfer

Take it to the kitchen tonight

Tonight: Cooking Workshop = HDL Metaphor

Software / single cook

  • One recipe, one cook, one step at a time
  • Step 2 starts when step 1 finishes
  • This is your software for loop

Hardware / multiple cooks

  • Same recipe, four cooks, in parallel
  • Veg chopper, pan watcher, sauce stirrer, plater — all at once
  • This is your generate for
Watch the chef tonight — count how many tasks happen in parallel. That's the mental model for next week's RAM (which reads and writes in the same cycle).

Next: Catch-up Day → then Memory in Week 3

Thu 6/4 is a catch-up day: finish this week's labs and start scoping a project. Project selection is due Tue 6/9 (Week 3, Day 10). Fri 6/5 is free.

Week 3 opens with Memory (D9, Mon 6/8): ROM, RAM, and Block RAM inference. The iCE40's 16 EBR blocks are free, fast memory — but only if your code matches the inference pattern. You'll build a ROM-driven LED sequencer and a RAM with read-after-write verification.

Looking ahead

  • 📝 Project selection due Tue 6/9 (6-option list)
  • 📺 D9 Memory video (~45 min) — watch over the Fri–Sun weekend
  • 🍷 Enjoy the cooking workshop · Friday is free

🔗 End of Day 8 · ¡A la cocina! (To the kitchen!)

One module. Every width. Zero copy-paste.

You can now build a parameter family and read its PPA cost from the synthesizer.
After the catch-up day and weekend, Week 3 opens with where the data lives.
