Barcelona Abroad · Week 3 · Day 10 📊 PPA becomes the rubric

Timing, Numerical Architectures & PPA

CRAFT cycle · 2.5 hours · Tue 6/9 · Eve: Flamenco

HDL for Digital System Design · UCF ECE · Barcelona Summer 2026

CRAFT

Today at a Glance

Phase	Time	Activity
🌍 Contextualize	10 min	Metro precision · Fmax = product spec, not a number
⚠️ Reframe	15 min	Faster clock ≠ better · the critical path owns you
🛠 Assemble	70 min	Adder arch comparison · shift-and-add multiplier · fixed-point Q4.4
🛡 Fortify	45 min	Read `nextpnr` timing report · fill the PPA table · 🤖 AI multiplier critique
🔗 Transfer	10 min	Metro visit teaser · tomorrow UART (everything converges)

The week's habit: no design ships without a PPA row. By Thursday your project module gets one too.

▸ Phase 1 of 5 · ~10 min

🌍 Contextualize

A missed deadline is a safety stop

Timing Runs Barcelona

TMB Metro signalling — train detection and interlocks have hard real-time deadlines. Miss one and the line throws an emergency brake. (We'll see the control room next Monday.)
Semidynamics' RISC-V cores — every vector ALU has a published Fmax per process node. That number sells the chip.
HP large-format printer head — nozzle firing windows are nanoseconds wide. Tomorrow's UART is the gentle introduction.

From last week → this week

D8 gave you a PPA habit (width sweep). D10 gives it teeth: every PPA row now has an Fmax column. Working code that misses Fmax is a product that doesn't ship.

"My design works in simulation" is a sentence that ends a lot of internships. After today, you'll never trust a sim without a timing report.

▸ Phase 2 of 5 · ~15 min

⚠️ Reframe

Faster ≠ better. Critical path = boss.

⚠️ "Just Crank the Clock"

❌ Wrong Model

"If 25 MHz works, 100 MHz must be faster. Same code, just a different PLL."

✓ Right Model

Every clocked design has an Fmax = 1 / (longest FF-to-FF delay), where that delay is clock-to-Q + combinational logic and routing + setup time. Clock faster than that and your FFs capture data before it has settled. Setup violation = wrong answer, not a slow answer.

Fmax is set by the worst path, not the average. One slow path drags the whole design down.

The Critical Path Owns Your Fmax

Clock period must be ≥ the longest FF-to-FF delay. The slowest path is the one nextpnr reports — and the one you'd pipeline to go faster.
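As a rough illustration of what "pipeline the critical path" means, here is a minimal sketch that splits a 16-bit add into two 8-bit stages. The module name `add16_pipe` and the 8/8 split are assumptions made for this example, not part of the lab files. Each FF-to-FF path now crosses only an 8-bit carry chain, at the cost of one extra cycle of latency.

```verilog
// Hypothetical example: 16-bit adder cut into two 8-bit pipeline stages.
module add16_pipe (
    input  wire        clk,
    input  wire [15:0] a, b,
    output reg  [16:0] sum          // valid two clock edges after a, b
);
    reg  [8:0] lo_r;                // low-byte sum, carry-out in bit 8
    reg  [7:0] a_hi_r, b_hi_r;      // high bytes delayed to line up with lo_r

    // Declared 9 bits wide so the high-byte carry-out is not truncated.
    wire [8:0] hi_w = a_hi_r + b_hi_r + lo_r[8];

    always @(posedge clk) begin
        // Stage 1: add the low bytes, hold the high bytes for one cycle.
        lo_r   <= a[7:0] + b[7:0];
        a_hi_r <= a[15:8];
        b_hi_r <= b[15:8];
        // Stage 2: add the high bytes plus the registered carry.
        sum    <= {hi_w, lo_r[7:0]};
    end
endmodule
```

Whether this actually raises Fmax on iCE40 is something to measure, not assume: the dedicated carry chain is fast, so the gain may be small, and the extra FFs show up in the area column.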

The PPA Triangle

[Figure: the PPA trade-off triangle. Power, Performance, and Area sit at the three corners; improving one generally costs one of the others, and the designer chooses which corner the product favors.]

Performance: Fmax, throughput, cycles per op

Power: switching activity × capacitance × V² × clock frequency

Area: LUTs & FFs on FPGA · gates & µm² on ASIC

On iCE40 we measure performance with nextpnr Fmax, area with yosys stat, and treat power as toggle-rate × C (conceptual).

The trade you'll see today

Combinational a * b — 1 cycle, big LUT pile, low Fmax.

Sequential shift-and-add — 8 cycles, tiny area, high Fmax.

Pick the corner of the triangle your product cares about.

Optimizing all three simultaneously is impossible. Choosing the right corner is the designer's job.

What `a + b` Actually Builds

// One line of Verilog…
assign sum = a + b;

…lets the synthesizer choose the architecture: ripple-carry, carry-lookahead, prefix-tree.
On iCE40 Yosys picks ripple-carry — area-efficient, slow-ish Fmax — because the chip has fast carry chains.
On a Xilinx 7-series, the same line might map to a DSP slice.
The line is the spec. The hardware is the synth choice. You need both reports to know what you really built.

assign product = a * b; on the iCE40 HX1K → LUT count grows roughly quadratically with operand width, because the HX1K has no DSP blocks to absorb the multiply. This is today's most expensive line of code.
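To see where the quadratic growth comes from, here is a sketch of the structure a combinational multiply implies: W rows, each with W AND gates and one adder. The module name `array_mult` and the packed `acc` vector are assumptions for illustration. Yosys is free to restructure the logic, so treat this as a mental model rather than the netlist you will get.

```verilog
// Illustrative only: hypothetical module showing W x W partial products.
module array_mult #(parameter W = 8)(
    input  wire [W-1:0]   a, b,
    output wire [2*W-1:0] product
);
    // Running sums packed into one vector: slot i is bits [i*2W +: 2W].
    wire [2*W*(W+1)-1:0] acc;
    assign acc[0 +: 2*W] = {2*W{1'b0}};

    genvar i;
    generate
        for (i = 0; i < W; i = i + 1) begin : row
            // Row i: W AND gates form a * b[i], then shift left by i.
            wire [2*W-1:0] pp = {{W{1'b0}}, a & {W{b[i]}}} << i;
            // One 2W-bit adder per row accumulates the partial product.
            assign acc[(i+1)*2*W +: 2*W] = acc[i*2*W +: 2*W] + pp;
        end
    endgenerate

    assign product = acc[W*2*W +: 2*W];
endmodule
```

Doubling W doubles both the number of rows and the width of each row, which is why the LUT count climbs so quickly.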

▸ Phase 3 of 5 · ~70 min · You build

🛠 Assemble

Two adders · two multipliers · one Q4.4

Build Plan

Ex 1 · 30 min Adder architecture comparison — ripple-carry (reuse D2 full_adder) vs behavioral +, at 8-bit and 16-bit. Record LUTs / FFs / Fmax for all 4 variants.
Ex 2 · 30 min Shift-and-add multiplier — IDLE → COMPUTE → DONE FSM, accumulator + shift register. Then a behavioral *. Compare both.
Ex 3 · 20 min Fixed-point Q4.4 — multiply two Q4.4 values, extract integer bits, drive 7-seg. Confirm 2.5 × 3.0 → 7 on display.

Ex 4 (10 min): Paper exercise — read a real nextpnr timing report. Ex 5 stretch: PLL / CDC demo if you finish early.

Adder Compare — The Setup

// Variant A — manual ripple-carry (reuse D2 full_adder)
module rca8 (input [7:0] a, b, input cin,
             output [7:0] sum, output cout);
    wire [7:0] c;
    full_adder fa0 (a[0], b[0], cin,  sum[0], c[0]);
    full_adder fa1 (a[1], b[1], c[0], sum[1], c[1]);
    // … chain c[1]→fa2, c[2]→fa3, …
    assign cout = c[7];
endmodule

// Variant B — behavioral
module add8 (input [7:0] a, b, output [8:0] sum);
    assign sum = a + b;          // tool chooses
endmodule

Synthesize both at 8 and 16 bits. The interesting question isn't which is smaller — it's whether they're the same circuit.
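Both variants above are purely combinational, so on their own they give nextpnr no FF-to-FF path to report an Fmax for. One way to get a number is to register the inputs and the output, as in this sketch. The wrapper name `add8_timed` and its port list are assumptions, and `add8` is repeated only so the block stands alone.

```verilog
// Variant B from the slide, repeated so this block is self-contained.
module add8 (input [7:0] a, b, output [8:0] sum);
    assign sum = a + b;
endmodule

// Hypothetical timing wrapper: FFs on both sides of the adder.
module add8_timed (
    input  wire       clk,
    input  wire [7:0] a, b,
    output reg  [8:0] sum
);
    reg  [7:0] a_r, b_r;   // launch registers
    wire [8:0] sum_w;      // combinational adder output

    add8 dut (.a(a_r), .b(b_r), .sum(sum_w));

    always @(posedge clk) begin
        a_r <= a;          // path under test starts here...
        b_r <= b;
        sum <= sum_w;      // ...and ends here
    end
endmodule
```

The wrapper's own FFs will appear in the yosys stat output, so note in your PPA row whether the FF count includes them.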

Shift-and-Add Multiplier (Skeleton)

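module shift_add_mult #(parameter W = 8)(
    input  wire           clk, rst, start,
    input  wire [W-1:0]   a, b,
    output reg  [2*W-1:0] product,
    output reg            done
);
    // Plain-Verilog state encoding (typedef enum would need SystemVerilog mode)
    localparam [1:0] IDLE = 2'd0, COMPUTE = 2'd1, DONE = 2'd2;
    reg [1:0]         state;
    reg [W-1:0]       mplier;    // multiplier, shifted right each cycle
    reg [2*W-1:0]     mcand;     // multiplicand, shifted left each cycle
    reg [$clog2(W):0] bit_idx;   // counts 0 .. W-1

    always @(posedge clk) begin
        if (rst) begin state <= IDLE; done <= 0; end
        else begin
            done <= 0;           // default: done is a one-cycle pulse
            case (state)
                IDLE:    if (start) begin
                            product <= 0; mplier <= b; mcand <= a;
                            bit_idx <= 0; state <= COMPUTE;
                        end
                COMPUTE: begin
                            // add the shifted multiplicand when the current bit is 1
                            if (mplier[0]) product <= product + mcand;
                            mplier  <= mplier >> 1;
                            mcand   <= mcand  << 1;
                            bit_idx <= bit_idx + 1;
                            if (bit_idx == W-1) state <= DONE;
                        end
                DONE:    begin done <= 1; state <= IDLE; end
                default: state <= IDLE;   // recover from the unused encoding
            endcase
        end
    end
endmodule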

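8 COMPUTE cycles per multiply, plus one cycle to accept start and one to signal done. Trade: latency up ~8×, area down ~5–10× (confirm against your own PPA rows).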
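Before trusting the area numbers, it helps to confirm the skeleton multiplies correctly. The self-checking testbench below is a sketch. The name `shift_add_mult_tb`, the test values, and the reset pulse before every case are choices made for this example. It assumes the `shift_add_mult` module above is compiled alongside it, for instance with Icarus Verilog.

```verilog
`timescale 1ns/1ps
// Hypothetical testbench; compile together with shift_add_mult.
module shift_add_mult_tb;
    reg         clk = 0, rst = 1, start = 0;
    reg  [7:0]  a, b;
    wire [15:0] product;
    wire        done;
    integer     errors = 0;

    shift_add_mult #(.W(8)) dut (
        .clk(clk), .rst(rst), .start(start),
        .a(a), .b(b), .product(product), .done(done)
    );

    always #20 clk = ~clk;              // 40 ns period = 25 MHz

    // Reset, start one multiply, wait for done, compare with x * y.
    task run_case(input [7:0] x, input [7:0] y);
        begin
            @(negedge clk); rst = 1;
            @(negedge clk); rst = 0; a = x; b = y; start = 1;
            @(negedge clk); start = 0;
            while (!done) @(negedge clk);
            if (product !== x * y) begin
                errors = errors + 1;
                $display("FAIL: %0d * %0d gave %0d", x, y, product);
            end
        end
    endtask

    initial begin
        run_case(8'd0,   8'd0);
        run_case(8'd3,   8'd5);
        run_case(8'd40,  8'd48);        // bit patterns of 2.5 and 3.0 in Q4.4
        run_case(8'd255, 8'd255);       // largest 8-bit operands
        if (errors == 0) $display("all cases passed");
        $finish;
    end
endmodule
```

Changing inputs on the falling edge keeps the stimulus well away from the rising edge the design samples on.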

Fixed-Point: Where the Bits Go

// Q4.4 × Q4.4 → Q8.8
wire [7:0]  a_q44, b_q44;       // 4 int . 4 frac
wire [15:0] prod_q88 = a_q44 * b_q44;

// Integer part for the 7-seg = bits [11:8] of Q8.8
wire [3:0]  int_part = prod_q88[11:8];

✓ Sanity check

2.5 → 8'b0010_1000
3.0 → 8'b0011_0000
product = 16'b0000_0111_1000_0000 = 7.5
[11:8] = 4'b0111 = 7 ✓

⚠ The classic mistake

Reading prod_q88[7:0] as the integer part. That's the fractional bits — you'll see junk on the 7-seg and wonder why.
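The bit positions are easy to check in simulation before wiring up the 7-seg. This small sketch (the module name `q44_check` and the `show` task are assumptions) prints the product, the integer nibble, and bit 12 for three inputs. The third case goes past what a single 4-bit digit can hold.

```verilog
// Hypothetical simulation-only check of the Q4.4 x Q4.4 bit layout.
module q44_check;
    reg  [7:0]  a_q44, b_q44;                 // 4 int . 4 frac
    wire [15:0] prod_q88 = a_q44 * b_q44;     // Q8.8 result
    wire [3:0]  int_part = prod_q88[11:8];    // nibble sent to the 7-seg

    task show;
        begin
            #1;   // let the combinational product settle
            $display("%b * %b = %b  int_part=%0d  bit12=%b",
                     a_q44, b_q44, prod_q88, int_part, prod_q88[12]);
        end
    endtask

    initial begin
        a_q44 = 8'b0010_1000; b_q44 = 8'b0011_0000; show;  // 2.5 * 3.0 = 7.5
        a_q44 = 8'b0001_1000; b_q44 = 8'b0010_0000; show;  // 1.5 * 2.0 = 3.0
        a_q44 = 8'b0100_0000; b_q44 = 8'b0100_0000; show;  // 4.0 * 4.0 = 16.0
        $finish;
    end
endmodule
```

The first two cases print int_part=7 and int_part=3. The third prints int_part=0 with bit12=1: the true integer part, 16, needs five bits, so taking only [11:8] wraps silently.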

▸ Phase 4 of 5 · ~45 min · Fill the table · believe the report

🛡 Fortify

No design ships without a PPA row

Read a `nextpnr` Timing Report

yosys -p "synth_ice40 -top add8; write_json add8.json" add8.v
nextpnr-ice40 --hx1k --package vq100 --freq 25 \
              --json add8.json --pcf go_board.pcf \
              --asc add8.asc 2>&1 | tail -30

Info: Max frequency for clock 'i_Clk_$glb_clk':  78.43 MHz (PASS at 25.00 MHz)
Info: Critical path report:
Info: curr total
Info:  0.4  0.4  Source rca8.fa3.sum_$_DFFE_PP_/Q
Info:  2.1  2.5  Net rca8.fa3.sum to rca8.fa4.a
Info:  0.7  3.2  Source rca8.fa4.A → cout chain
…

Fmax line is the headline — PASS / FAIL at your target.
Critical path tells you where the slowest signal lives. That's where you'd pipeline.
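The Fmax line converts directly into timing slack at any target clock. As a worked check using the 78.43 MHz figure from the sample report (the 100 MHz target is a hypothetical comparison, and the symbols are just labels for clock-to-Q, logic plus routing, and setup time):

```latex
\begin{aligned}
T_{\text{clk}} &\ge t_{\text{cq}} + t_{\text{logic}} + t_{\text{su}} = T_{\min},
  \qquad F_{\max} = \frac{1}{T_{\min}} \\
T_{\min} &= \frac{1}{78.43\ \text{MHz}} \approx 12.75\ \text{ns} \\
\text{slack at } 25\ \text{MHz} &= 40.00\ \text{ns} - 12.75\ \text{ns} = +27.25\ \text{ns} \quad (\text{PASS}) \\
\text{slack at } 100\ \text{MHz} &= 10.00\ \text{ns} - 12.75\ \text{ns} = -2.75\ \text{ns} \quad (\text{FAIL})
\end{aligned}
```

Positive slack is margin you could trade for a faster clock. Negative slack is the amount of delay you have to remove from the critical path, for example by pipelining it.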

Today's PPA Table

Module	Config	LUTs	FFs	Fmax (MHz)	Latency
Adder	RCA, 8-bit	?	0	?	1 cyc
Adder	RCA, 16-bit	?	0	?	1 cyc
Adder	Behavioral, 8-bit	?	0	?	1 cyc
Adder	Behavioral, 16-bit	?	0	?	1 cyc
Mult	Behavioral, 8-bit	?	0	?	1 cyc
Mult	Shift-add, 8-bit	?	~47	?	8 cyc

Analysis prompt: in two sentences — when do you pick sequential? When combinational? Write it in your portfolio. This template is the one you'll use in the final report.

🤖 Check the Machine — Multiplier Prompt

Prompt: "Write an 8-bit Verilog multiplier optimized for the iCE40 with as few LUTs as possible."

Did the AI write assign product = a * b;? Quick but wrong: that's the most LUT-hungry version on iCE40. Push back.
Did it write a sequential shift-and-add? Run yosys stat on both. Add the row to your PPA table.
Did it claim a DSP slice? iCE40 HX1K doesn't have any. AI sometimes hallucinates Xilinx primitives. Reject and request iCE40-specific code.
Save prompt + response + your correction to the AI portfolio. This is exactly the pattern Day 14 grades.

Hardware Verification

Adder: drive a from switches, b from a counter, sum to LEDs. With the counter on a slow clock, watch carries propagate up the LEDs as b increments; that carry chain is the critical path you measured.
Multiplier: press button → start. done LED pulses after 8 cycles. Result on 7-seg.
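Fixed-point: 2.5 × 3.0 → 7 on 7-seg. Try 1.5 × 2.0 → 3. Try 3.5 × 3.5 = 12.25 → 12, which still fits in bits [11:8] (shown as C if your decoder handles hex). Then try 4.0 × 4.0 = 16: the result spills into bit 12, [11:8] reads 0, and the display wraps silently.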

The honest check: a design that passes simulation can still fail on the board if you clock it at 100 MHz with a 78 MHz Fmax. RTL simulation has no gate delays, so it cannot show the violation. Always reconcile sim, synth, and pnr before claiming "done."

▸ Phase 5 of 5 · ~10 min · Toward UART · toward Metro

🔗 Transfer

Tomorrow your tools converge on one interface

Where This Goes

Tomorrow (D11): UART TX. You'll use every piece from Weeks 1–3 — FSM (D7), counter (D4), shift register (D5), timing math (today). It's the synthesis exam, framed as a lab.

Next Monday: Barcelona Metro Control Room visit. The trains run on hard-real-time digital control. You'll recognize the constraint: Fmax for the city.

Tonight

📺 D11 video — UART protocol + baud math (~50 min)
📝 D11 quiz
📐 Calculate CLKS_PER_BIT for 9600 / 19200 / 115200 baud at 25 MHz

Reflection prompt: in one sentence — name a real product where you'd pick the shift-and-add multiplier over *. Bring an answer tomorrow.

🔗 End of Day 10 · Hasta mañana

Working code is half the job.
Timing closure is the other half.

Every design you ship this week gets an Fmax / LUT / FF row.
Tomorrow: UART, where the math you just did decides whether your bytes arrive intact.
