Topic 8 · Hierarchy & Reuse

Generate Blocks

Video 3 of 4 · ~9 minutes

Dr. Mike Borowczak · Electrical & Computer Engineering · CECS · UCF

Hierarchy · Parameters · Generate · Reuse

🌍 Where This Lives

Where it shows up

Every modern processor contains thousands of nearly-identical adders, registers, lanes, and ports. Nobody types those out. A 64-core chip and a 4-core chip are usually the same source code with one number changed. A 1024-bit memory bus is written once. A 32-lane SIMD unit is written once. The factory cuts the same stencil at every scale.

When it goes wrong

The Pentium FDIV bug came down to five missing entries out of 1,066 in a division lookup table. Multiple GPU vendors have shipped silicon where one of N “identical” lanes was subtly miswired because someone copy-pasted N times and missed a substitution. When you scale a design by hand, the cost isn't the typing; it's the lane that almost matches the others, and the bug that takes six months in the field to find.

⚠️ generate for Is Not a Software Loop

❌ Wrong Model

“generate for (i=0; i<N; i++) is a for-loop. It executes N times during simulation, one iteration per clock.”

✓ Right Model

A generate for is not a runtime loop. It is compile-time replication. The tool unrolls the loop during elaboration, before simulation or synthesis begins, producing N physical copies of the hardware it contains. Afterward no loop exists, only N parallel circuits running simultaneously.

The receipt: generate for i=0..7 with a full-adder inside = 8 physical full-adders, all active in parallel. Like having typed 8 instantiations. Yosys literally unrolls before synthesis.
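To make the unrolling concrete, here is a minimal sketch that writes the same hardware twice: once with generate for, and once the way the elaborator effectively expands it for N = 4. The module names inv_cell, inv_gen, and inv_manual are hypothetical and exist only for this illustration.

```verilog
// Hypothetical 1-bit cell used only for this illustration.
module inv_cell (
    input  wire i_a,
    output wire o_y
);
    assign o_y = ~i_a;
endmodule

// Version A: generate-for, written once for any N.
module inv_gen #(
    parameter N = 4
) (
    input  wire [N-1:0] i_a,
    output wire [N-1:0] o_y
);
    genvar gi;
    generate
        for (gi = 0; gi < N; gi = gi + 1) begin : g_inv
            inv_cell u_inv (.i_a(i_a[gi]), .o_y(o_y[gi]));
        end
    endgenerate
endmodule

// Version B: the hand-typed equivalent for N = 4.
// Four instances, no loop, all active in parallel.
module inv_manual (
    input  wire [3:0] i_a,
    output wire [3:0] o_y
);
    inv_cell u_inv0 (.i_a(i_a[0]), .o_y(o_y[0]));
    inv_cell u_inv1 (.i_a(i_a[1]), .o_y(o_y[1]));
    inv_cell u_inv2 (.i_a(i_a[2]), .o_y(o_y[2]));
    inv_cell u_inv3 (.i_a(i_a[3]), .o_y(o_y[3]));
endmodule
```

The two versions should yield the same logic for N = 4. They differ only in instance naming (g_inv[0].u_inv versus u_inv0) and in how many lines must change when N does.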

👁️ I Do — Array of Debouncers

module button_array #(
    parameter N = 4         // # buttons
) (
    input  wire         i_clk, i_reset,
    input  wire [N-1:0] i_buttons,
    output wire [N-1:0] o_clean
);
    genvar gi;        // compile-time index
    generate
        for (gi=0; gi<N; gi=gi+1)
        begin : g_btn      // <-- named scope
            debounce #(.CLKS_STABLE(500_000)) u_deb (
                .i_clk   (i_clk),
                .i_reset (i_reset),
                .i_noisy (i_buttons[gi]),
                .o_clean (o_clean[gi])
            );
        end
    endgenerate
endmodule
RTL block diagram of button_array shown at N=4. A gold dashed generate-scope box labeled g_btn wraps four debounce instances g_btn[0..3].u_deb. The i_buttons[N-1:0] bus (orange) fans out one bit per instance; outputs collect back into o_clean[N-1:0] (green). i_clk (blue) and i_reset (red) form shared buses across all instances. Source file size is constant — silicon scales with N.
Three idioms: (1) genvar gi = compile-time loop index. (2) begin : g_btn names the generate scope — shows up in waveforms as dut.g_btn[0].u_deb. (3) Inner code reads as one instantiation; elaborator produces N physical copies.
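A short usage sketch: scaling button_array from 4 to 8 buttons is a parameter override at the instantiation site, with no edit to button_array itself. The wrapper name top_buttons and the port o_leds are hypothetical; the sketch assumes button_array and the debounce module from the listing above are compiled alongside it.

```verilog
// Hypothetical top level: 8 buttons from the same button_array source.
module top_buttons (
    input  wire       i_clk,
    input  wire       i_reset,
    input  wire [7:0] i_buttons,
    output wire [7:0] o_leds
);
    // Overriding N here is the only change needed to go from 4 to 8
    // debouncers; the generate loop elaborates g_btn[0] through g_btn[7].
    button_array #(.N(8)) u_btns (
        .i_clk     (i_clk),
        .i_reset   (i_reset),
        .i_buttons (i_buttons),
        .o_clean   (o_leds)
    );
endmodule
```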

🤝 We Do — generate if for Conditional Hardware

module configurable_pipe #(
    parameter USE_SYNCHRONIZER = 1
) (
    input  wire i_clk, i_in,
    output wire o_out
);
    generate
    if (USE_SYNCHRONIZER) begin : g_sync
        reg r_meta, r_stable;
        always @(posedge i_clk) begin
            r_meta   <= i_in;
            r_stable <= r_meta;
        end
        assign o_out = r_stable;
    end else begin : g_nosync
        assign o_out = i_in;   // direct
    end
    endgenerate
endmodule
RTL diagram showing two compile-time configurations side-by-side. Left (green) USE_SYNCHRONIZER=1: i_in (orange) feeds r_meta FF (blue) feeding r_stable FF (blue), Q drives o_out (green). Both FFs share i_clk (blue dashed). Right (red) USE_SYNCHRONIZER=0: i_in (orange) routes directly through to o_out with no flip-flops; ghost dashed boxes indicate zero silicon. Only one branch elaborates.
Together: generate if includes hardware conditionally at compile time. USE_SYNCHRONIZER=0 means the tool builds no synchronizer, not even a wire-through-an-FF. Typical use: signals already synchronous to i_clk skip the synchronizer; asynchronous external inputs get one.
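As a usage sketch, the same configurable_pipe source can be instantiated twice with different settings in one design. The wrapper pipe_demo and its signal names are hypothetical; it assumes configurable_pipe from the listing above is compiled alongside it.

```verilog
// Hypothetical wrapper: one source module, two elaborated shapes.
module pipe_demo (
    input  wire i_clk,
    input  wire i_async_button,   // assumed asynchronous to i_clk
    input  wire i_internal_flag,  // assumed already synchronous to i_clk
    output wire o_button_sync,
    output wire o_flag
);
    // Elaborates the g_sync branch: two flip-flops in series.
    configurable_pipe #(.USE_SYNCHRONIZER(1)) u_ext (
        .i_clk (i_clk),
        .i_in  (i_async_button),
        .o_out (o_button_sync)
    );

    // Elaborates the g_nosync branch: a plain wire, no flip-flops.
    configurable_pipe #(.USE_SYNCHRONIZER(0)) u_int (
        .i_clk (i_clk),
        .i_in  (i_internal_flag),
        .o_out (o_flag)
    );
endmodule
```

In a waveform viewer the registers should appear only under u_ext.g_sync; u_int has no g_sync scope at all, because that branch was never elaborated.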

🧪 You Do — Generate an N-Bit Ripple Adder

Build an N-bit ripple-carry adder from N full-adder instances. Each full_adder has inputs a, b, cin and outputs sum, cout. The carry chains from bit i to bit i+1.

RTL block diagram of an N-bit ripple-carry adder. Three full_adder instances g_fa[0].u_fa, g_fa[1].u_fa, and g_fa[N-1].u_fa (with ellipsis between) sit inside a gold dashed generate-scope box. The carry chain (pink) flows left-to-right: cin → carry[1] → carry[2] → … → carry[N-1] → cout. Each cell takes a[i] (purple) and b[i] (brown) inputs from above and emits sum[i] (green) upward.
Sketch:
wire [N:0] carry;
assign carry[0] = cin;
genvar gi;
generate for (gi = 0; gi < N; gi = gi + 1) begin : g_fa
    full_adder u_fa (
        .a(a[gi]), .b(b[gi]), .cin(carry[gi]),
        .sum(sum[gi]), .cout(carry[gi+1])
    );
end endgenerate
assign cout = carry[N];
The carry wire of width N+1 chains the full-adders: instance gi drives carry[gi+1], which instance gi+1 reads as its cin. Classic ripple-carry chaining pattern.
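One way to check the sketch is to wrap it in a module and run an exhaustive testbench at a small N. The full_adder body below is an assumption (the exercise only fixes its port names), and the names ripple_adder and tb_ripple are hypothetical.

```verilog
// Assumed full_adder body; the exercise only specifies the ports.
module full_adder (
    input  wire a, b, cin,
    output wire sum, cout
);
    assign sum  = a ^ b ^ cin;
    assign cout = (a & b) | (cin & (a ^ b));
endmodule

// The sketch above, wrapped in a parameterized module.
module ripple_adder #(
    parameter N = 8
) (
    input  wire [N-1:0] a, b,
    input  wire         cin,
    output wire [N-1:0] sum,
    output wire         cout
);
    wire [N:0] carry;
    assign carry[0] = cin;
    genvar gi;
    generate
        for (gi = 0; gi < N; gi = gi + 1) begin : g_fa
            full_adder u_fa (
                .a(a[gi]), .b(b[gi]), .cin(carry[gi]),
                .sum(sum[gi]), .cout(carry[gi+1])
            );
        end
    endgenerate
    assign cout = carry[N];
endmodule

// Self-checking testbench: tries every input combination for N = 4.
module tb_ripple;
    localparam N = 4;
    reg  [N-1:0] a, b;
    reg          cin;
    wire [N-1:0] sum;
    wire         cout;
    integer      i, errors;

    ripple_adder #(.N(N)) dut (
        .a(a), .b(b), .cin(cin), .sum(sum), .cout(cout)
    );

    initial begin
        errors = 0;
        // 2N + 1 input bits give 512 combinations for N = 4.
        for (i = 0; i < (1 << (2*N + 1)); i = i + 1) begin
            {cin, b, a} = i;
            #1;  // let the combinational carry chain settle
            // Compare against the built-in + operator at N+1 bits.
            if ({cout, sum} !== a + b + cin)
                errors = errors + 1;
        end
        if (errors == 0) $display("PASS");
        else             $display("FAIL: %0d mismatches", errors);
        $finish;
    end
endmodule
```

Changing the localparam N in the testbench rescales both the adder and the test loop. Exhaustive checking stops being practical beyond small N, so larger widths would need random or directed vectors instead.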
▶ LIVE DEMO

Scaling with Generate: 4 → 16 Debouncers

~4 minutes

▸ COMMANDS

cd lecture_examples/week2_day08/d08_s3_ex3/
make stat N=4    # 4 debouncers
make stat N=8
make stat N=16   # 16 debouncers, about half an iCE40 HX1K
make sim N=4     # testbench scales too

▸ EXPECTED STDOUT

N=4:  160 cells (~12% HX1K)
N=8:  320 cells (~25%)
N=16: 640 cells (~50%)
# scaling is exactly linear
# — no fixed overhead

▸ KEY OBSERVATION

One Verilog file, one parameter change, three different-sized designs. The Verilog doesn't grow — only the gate count does. This is what makes generate a force multiplier.

🔧 What Did the Tool Build?

$ yosys -p "read_verilog ... button_array.v; chparam -set N 4 button_array; \
           synth_ice40; stat"

=== button_array ===   # N=4
   Number of cells:  160   (= 4 × ~40-cell debouncer)
   Contains sub-instances:
     g_btn[0].u_deb   (debounce)
     g_btn[1].u_deb   (debounce)
     g_btn[2].u_deb   (debounce)
     g_btn[3].u_deb   (debounce)
What to notice: The named generate scope (g_btn) appears in hierarchy output and in GTKWave signal paths. tb.dut.g_btn[2].u_deb.r_count is the counter of the 3rd debouncer. Without an explicit name, Verilog-2005 tools fall back to auto-generated labels such as genblk1, which makes generated hardware much harder to navigate.
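The same hierarchical paths work inside a testbench, not just in the waveform viewer. The sketch below is self-contained and uses a stand-in counter rather than the real debounce module; tick_counter, counter_bank, and g_cnt are hypothetical names chosen to mirror the g_btn[i].u_deb structure.

```verilog
// Stand-in for debounce: a free-running counter with an enable.
module tick_counter (
    input wire i_clk,
    input wire i_en
);
    reg [7:0] r_count = 8'd0;
    always @(posedge i_clk)
        if (i_en) r_count <= r_count + 8'd1;
endmodule

// N counters inside a named generate scope.
module counter_bank #(
    parameter N = 4
) (
    input wire         i_clk,
    input wire [N-1:0] i_en
);
    genvar gi;
    generate
        for (gi = 0; gi < N; gi = gi + 1) begin : g_cnt
            tick_counter u_cnt (.i_clk(i_clk), .i_en(i_en[gi]));
        end
    endgenerate
endmodule

module tb;
    reg       i_clk = 1'b0;
    reg [3:0] i_en  = 4'b0100;   // enable lane 2 only

    counter_bank #(.N(4)) dut (.i_clk(i_clk), .i_en(i_en));

    always #5 i_clk = ~i_clk;

    initial begin
        repeat (3) @(posedge i_clk);
        #1;
        // Hierarchical references through the named generate scope.
        // The index must be a constant, since each lane is a separate scope.
        $display("lane 1 = %0d, lane 2 = %0d",
                 tb.dut.g_cnt[1].u_cnt.r_count,
                 tb.dut.g_cnt[2].u_cnt.r_count);
        $finish;
    end
endmodule
```

Lane 2 should have advanced while lane 1 stays at zero, which confirms that each generated instance holds its own state.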
Checkpoint: Generate + parameter is the “infinite IP” pattern. One source file serves 1, 10, 1000 instances — never rewrite, just re-instantiate.

🤖 Check the Machine

Ask AI: “Write a parameterized N-bit parity generator using generate-for blocks. Include a generate-if to optionally pipeline the result.”

TASK

AI combines generate-for + generate-if.

BEFORE

Predict: XOR tree via generate-for, optional pipeline stage via generate-if.

AFTER

Strong AI uses genvar, named scopes. Weak AI forgets the named scope.

TAKEAWAY

Require named scopes in your prompt. AI often forgets them without an explicit ask.
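For comparison against whatever the AI returns, here is one possible answer, offered as a sketch rather than a reference solution. The module name parity_gen, the parameter PIPELINED, and the scope names are all assumptions. For brevity it builds a linear XOR chain rather than a balanced tree; a tree would shorten the critical path for large N.

```verilog
// One possible N-bit parity generator (even parity: XOR of all bits).
module parity_gen #(
    parameter N         = 8,
    parameter PIPELINED = 0    // 1 = register the result, 0 = combinational
) (
    input  wire         i_clk,
    input  wire [N-1:0] i_data,
    output wire         o_parity
);
    // chain[k] holds the XOR of i_data[k-1:0]; chain[0] is the seed.
    wire [N:0] chain;
    assign chain[0] = 1'b0;

    genvar gi;
    generate
        // generate-for: one XOR stage per data bit.
        for (gi = 0; gi < N; gi = gi + 1) begin : g_xor
            assign chain[gi+1] = chain[gi] ^ i_data[gi];
        end

        // generate-if: the pipeline register exists only when requested.
        if (PIPELINED) begin : g_pipe
            reg r_parity;
            always @(posedge i_clk)
                r_parity <= chain[N];
            assign o_parity = r_parity;
        end else begin : g_comb
            assign o_parity = chain[N];
        end
    endgenerate
endmodule
```

Things to look for in an AI answer: a genvar rather than an integer index, a name on every generate block, and a flip-flop that appears only in the pipelined branch.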

Key Takeaways

generate for = compile-time hardware replication, not a runtime loop.

generate if = conditional hardware; unused branch produces zero silicon.

Always use genvar and named scopes (begin : g_foo).

Combined with parameters, generate scales a design by changing one number.

If you're about to copy-paste a module N times, use generate instead.

🔗 Transfer

Design for Reuse

Video 4 of 4 · ~8 minutes

▸ WHY THIS MATTERS NEXT

You have all the tools: hierarchy, parameters, generate. Video 4 ties them together into a design philosophy: how to build modules someone else (or future-you) can drop into a project without reading the source. This is how your foundational-topic modules become a permanent toolbox — and it’s the skill the next theme relies on.