Barcelona Abroad · Week 3 · Day 10  📊 PPA becomes the rubric

Timing, Numerical Architectures & PPA

CRAFT cycle · 2.5 hours · Tue 6/9 · Eve: Flamenco

HDL for Digital System Design · UCF ECE · Barcelona Summer 2026

CRAFT

Today at a Glance

Phase | Time | Activity
🌍 Contextualize | 10 min | Metro precision · Fmax = product spec, not a number
⚠️ Reframe | 15 min | Faster clock ≠ better · the critical path owns you
🛠 Assemble | 70 min | Adder arch comparison · shift-and-add multiplier · fixed-point Q4.4
🛡 Fortify | 45 min | Read nextpnr timing report · fill the PPA table · 🤖 AI multiplier critique
🔗 Transfer | 10 min | Metro visit teaser · tomorrow UART (everything converges)
The week's habit: no design ships without a PPA row. By Thursday your project module gets one too.

▸ Phase 1 of 5  ·  ~10 min

🌍 Contextualize

A missed deadline is a safety stop

Timing Runs Barcelona

  • TMB Metro signalling — train detection and interlocks have hard real-time deadlines. Miss one and the line throws an emergency brake. (We'll see the control room next Monday.)
  • Semidynamics' RISC-V cores — every vector ALU has a published Fmax per process node. That number sells the chip.
  • HP large-format printer head — nozzle firing windows are nanoseconds wide. Tomorrow's UART is the gentle introduction.

From last week → this week

D8 gave you a PPA habit (width sweep). D10 gives it teeth: every PPA row now has an Fmax column. Working code that misses Fmax is a product that doesn't ship.

"My design works in simulation" is a sentence that ends a lot of internships. After today, you'll never trust a sim without a timing report.

▸ Phase 2 of 5  ·  ~15 min

⚠️ Reframe

Faster ≠ better. Critical path = boss.

⚠️ "Just Crank the Clock"

❌ Wrong Model

"If 25 MHz works, 100 MHz must be faster. Same code, just a different PLL."

✓ Right Model

Every clocked design has an Fmax = 1 / (longest FF-to-FF delay: clock-to-Q + combinational logic + routing + setup). Clock it faster than that and your FFs latch garbage. A setup violation gives a wrong answer, not a slow answer.

Fmax is set by the worst path, not the average. One slow path drags the whole design down.

The Critical Path Owns Your Fmax

Combinational logic between two flip-flops forms a path with a propagation delay; the longest such path (the critical path) sets the minimum clock period and therefore the maximum frequency.
Clock period must be ≥ the longest FF-to-FF delay. The slowest path is the one nextpnr reports — and the one you'd pipeline to go faster.
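A worked instance of the rule above, using the 78.43 MHz figure that appears in the nextpnr report later in these slides. The delay breakdown is the usual textbook one with clock skew ignored, and the symbol names are chosen here for illustration only.

```latex
T_{\min} = t_{\mathrm{clk\to Q}} + t_{\mathrm{logic}} + t_{\mathrm{route}} + t_{\mathrm{setup}},
\qquad
F_{\max} = \frac{1}{T_{\min}}

F_{\max} = 78.43\ \mathrm{MHz}
\;\Rightarrow\;
T_{\min} = \frac{1}{78.43\ \mathrm{MHz}} \approx 12.75\ \mathrm{ns}

\mathrm{slack} = T_{\mathrm{clk}} - T_{\min}:
\quad
\begin{cases}
25\ \mathrm{MHz}: & 40.00 - 12.75 = +27.25\ \mathrm{ns} \quad (\text{PASS}) \\
100\ \mathrm{MHz}: & 10.00 - 12.75 = -2.75\ \mathrm{ns} \quad (\text{FAIL})
\end{cases}
```

Positive slack means the slowest path settles before the next clock edge. Negative slack is the setup violation described in the Right Model.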

The PPA Triangle

The PPA trade-off triangle: Power, Performance, and Area sit at three corners; improving one generally costs one of the others, and the designer chooses which corner the product favors.

Performance — Fmax, throughput, cycles per op

Power — switching activity × capacitance × V² × clock frequency

Area — LUTs & FFs on FPGA · gates & µm² on ASIC

On iCE40 we measure performance with nextpnr Fmax, area with yosys stat, and treat power as toggle-rate × C (conceptual).

The trade you'll see today

Combinational a * b — 1 cycle, big LUT pile, low Fmax.

Sequential shift-and-add — 8 cycles, tiny area, high Fmax.

Pick the corner of the triangle your product cares about.

Optimizing all three simultaneously is impossible. Choosing the right corner is the designer's job.

What a + b Actually Builds

// One line of Verilog…
assign sum = a + b;
  • …lets the synthesizer choose the architecture: ripple-carry, carry-lookahead, prefix-tree.
  • On iCE40 Yosys picks ripple-carry — area-efficient, slow-ish Fmax — because the chip has fast carry chains.
  • On a Xilinx 7-series, the same line usually maps to LUTs plus dedicated carry logic (CARRY4), and it can also be placed in a DSP slice.
  • The line is the spec. The hardware is the synth choice. You need both reports to know what you really built.
assign product = a * b; on the iCE40 HX1K → roughly quadratic LUT growth with operand width, because the HX1K has no DSP blocks. This is today's most expensive line of code.

▸ Phase 3 of 5  ·  ~70 min  ·  You build

🛠 Assemble

Two adders · two multipliers · one Q4.4

Build Plan

  1. Ex 1 · 30 min  Adder architecture comparison — ripple-carry (reuse D2 full_adder) vs behavioral +, at 8-bit and 16-bit. Record LUTs / FFs / Fmax for all 4 variants.
  2. Ex 2 · 30 min  Shift-and-add multiplier — IDLE → COMPUTE → DONE FSM, accumulator + shift register. Then a behavioral *. Compare both.
  3. Ex 3 · 20 min  Fixed-point Q4.4 — multiply two Q4.4 values, extract integer bits, drive 7-seg. Confirm 2.5 × 3.0 → 7 on display.
Ex 4 (10 min): Paper exercise — read a real nextpnr timing report. Ex 5 stretch: PLL / CDC demo if you finish early.

Adder Compare — The Setup

// Variant A — manual ripple-carry (reuse D2 full_adder)
module rca8 (input [7:0] a, b, input cin,
             output [7:0] sum, output cout);
    wire [7:0] c;
    full_adder fa0 (a[0], b[0], cin,  sum[0], c[0]);
    full_adder fa1 (a[1], b[1], c[0], sum[1], c[1]);
    // … chain c[1]→fa2, c[2]→fa3, …
    assign cout = c[7];
endmodule

// Variant B — behavioral
module add8 (input [7:0] a, b, output [8:0] sum);
    assign sum = a + b;          // tool chooses
endmodule

Synthesize both at 8 and 16 bits. The interesting question isn't which is smaller — it's whether they're the same circuit.
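To run Variant A at both 8 and 16 bits without hand-writing sixteen instances, the chain can be generated. This is a minimal sketch, not the course's reference solution: the module name rca and the parameter W are hypothetical, and the full_adder shown is only a stand-in that assumes the positional port order used on the slide (a, b, cin, sum, cout). Drop the stand-in and use your own D2 module if you already have one.

```verilog
// Stand-in for the D2 full_adder; port order (a, b, cin, sum, cout)
// is assumed from the positional instances on the slide.
module full_adder (input wire a, b, cin, output wire sum, cout);
    assign sum  = a ^ b ^ cin;
    assign cout = (a & b) | (cin & (a ^ b));
endmodule

// Hypothetical parameterized ripple-carry adder built from full_adder cells.
module rca #(parameter W = 8) (
    input  wire [W-1:0] a, b,
    input  wire         cin,
    output wire [W-1:0] sum,
    output wire         cout
);
    wire [W:0] c;            // c[i] is the carry into bit i
    assign c[0] = cin;

    genvar i;
    generate
        for (i = 0; i < W; i = i + 1) begin : g_fa
            // Carry out of bit i feeds the carry in of bit i+1.
            full_adder fa (a[i], b[i], c[i], sum[i], c[i+1]);
        end
    endgenerate

    assign cout = c[W];
endmodule
```

Instantiate it as rca #(.W(8)) or rca #(.W(16)) for the two RCA rows of the PPA table. Whether the synthesizer keeps this as a gate-level chain or maps it onto the carry hardware is exactly what the LUT and Fmax numbers will tell you.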

Shift-and-Add Multiplier (Skeleton)

module shift_add_mult #(parameter W = 8)(
    input  wire           clk, rst, start,
    input  wire [W-1:0]   a, b,
    output reg  [2*W-1:0] product,
    output reg            done
);
    // Plain-Verilog state encoding, so a .v file parses in Yosys without -sv
    localparam [1:0] IDLE = 2'd0, COMPUTE = 2'd1, DONE = 2'd2;
    reg [1:0]     state;
    reg [W-1:0]   mplier;   // multiplier, shifted right each cycle
    reg [2*W-1:0] mcand;    // multiplicand, shifted left each cycle
    reg [3:0]     bit_idx;  // counts 0..W-1 (4 bits covers W up to 16)

    always @(posedge clk) begin
        if (rst) begin state <= IDLE; done <= 0; end
        else case (state)
            IDLE:    if (start) begin
                        product <= 0; mplier <= b; mcand <= a;
                        bit_idx <= 0; done <= 0;  // clear done from the previous multiply
                        state <= COMPUTE;
                    end
            COMPUTE: begin
                        if (mplier[0]) product <= product + mcand;
                        mplier <= mplier >> 1;
                        mcand  <= mcand  << 1;
                        bit_idx <= bit_idx + 1;
                        if (bit_idx == W-1) state <= DONE;
                    end
            DONE:    begin done <= 1; state <= IDLE; end
            default: state <= IDLE;   // recover from the unused encoding
        endcase
    end
endmodule

8 COMPUTE cycles per multiply for W = 8, plus one cycle to load the operands and one to raise done. Trade: latency up roughly 8×, area down ~5–10×.
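Before synthesizing the skeleton, it is worth checking it against the * operator in simulation. The testbench below is a sketch under a few assumptions: the name shift_add_mult_tb and the test vectors are made up for illustration, it expects the shift_add_mult skeleton to be compiled alongside it, and it resets the DUT before every multiply so that done is known to be low when start is applied.

```verilog
`timescale 1ns/1ps
// Hypothetical self-checking testbench for the shift_add_mult skeleton.
module shift_add_mult_tb;
    reg         clk   = 1'b0;
    reg         rst   = 1'b1;
    reg         start = 1'b0;
    reg  [7:0]  a = 8'd0, b = 8'd0;
    wire [15:0] product;
    wire        done;
    reg  [15:0] expected;
    integer     errors = 0;

    shift_add_mult #(.W(8)) dut (
        .clk(clk), .rst(rst), .start(start),
        .a(a), .b(b), .product(product), .done(done)
    );

    always #20 clk = ~clk;          // 40 ns period = 25 MHz

    task check(input [7:0] x, input [7:0] y);
        begin
            expected = x * y;       // 16-bit reference from the * operator
            @(negedge clk); rst = 1'b1;              // reset clears done
            @(negedge clk); rst = 1'b0;
            a = x; b = y; start = 1'b1;              // sampled on the next rising edge
            @(negedge clk); start = 1'b0;
            wait (done);
            @(negedge clk);
            if (product !== expected) begin
                errors = errors + 1;
                $display("FAIL: %0d * %0d = %0d (expected %0d)",
                         x, y, product, expected);
            end
        end
    endtask

    initial begin
        check(8'd0,   8'd0);
        check(8'd3,   8'd5);
        check(8'd40,  8'd48);       // the Q4.4 encodings of 2.5 and 3.0
        check(8'd255, 8'd255);      // largest 8-bit operands
        if (errors == 0) $display("PASS: all products match");
        $finish;
    end
endmodule
```

Passing this check only says the logic is right. It says nothing about Fmax, which is why the skeleton still needs its own PPA row.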

Fixed-Point: Where the Bits Go

// Q4.4 × Q4.4 → Q8.8
wire [7:0]  a_q44, b_q44;       // 4 int . 4 frac
wire [15:0] prod_q88 = a_q44 * b_q44;

// Integer part for the 7-seg = bits [11:8] of Q8.8
wire [3:0]  int_part = prod_q88[11:8];

✓ Sanity check

2.5 → 8'b0010_1000
3.0 → 8'b0011_0000
product = 16'b0000_0111_1000_0000 = 7.5
[11:8] = 4'b0111 = 7

⚠ The classic mistake

Reading prod_q88[7:0] as the integer part. That's the fractional bits — you'll see junk on the 7-seg and wonder why.
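The bit slicing above can be exercised in simulation before it reaches the 7-seg. The sketch below wraps the same multiply in a small module and prints the fields for a few inputs. The names q44_mult, frac_hi, and overflow are hypothetical additions for illustration, and the overflow flag is an extra that the slide's version does not have.

```verilog
`timescale 1ns/1ps
// Hypothetical helper: Q4.4 x Q4.4 with an explicit overflow flag.
module q44_mult (
    input  wire [7:0] a_q44, b_q44,
    output wire [3:0] int_part,   // bits [11:8] of the Q8.8 product
    output wire [3:0] frac_hi,    // bits [7:4], the top four fractional bits
    output wire       overflow    // set when the integer part needs more than 4 bits
);
    wire [15:0] prod_q88 = a_q44 * b_q44;
    assign int_part = prod_q88[11:8];
    assign frac_hi  = prod_q88[7:4];
    assign overflow = |prod_q88[15:12];
endmodule

module q44_mult_tb;
    reg  [7:0] a, b;
    wire [3:0] int_part, frac_hi;
    wire       overflow;

    q44_mult dut (.a_q44(a), .b_q44(b), .int_part(int_part),
                  .frac_hi(frac_hi), .overflow(overflow));

    task show(input [7:0] x, input [7:0] y);
        begin
            a = x; b = y;
            #1;  // let the combinational outputs settle
            $display("%b * %b -> int=%0d frac_hi=%b overflow=%b",
                     x, y, int_part, frac_hi, overflow);
        end
    endtask

    initial begin
        show(8'b0010_1000, 8'b0011_0000);  // 2.5 * 3.0 = 7.5   -> int=7
        show(8'b0001_1000, 8'b0010_0000);  // 1.5 * 2.0 = 3.0   -> int=3
        show(8'b0011_1000, 8'b0011_1000);  // 3.5 * 3.5 = 12.25 -> int=12
        show(8'b0100_0000, 8'b0100_0000);  // 4.0 * 4.0 = 16.0  -> int=0, overflow=1
        $finish;
    end
endmodule
```

The last case is the one to watch: the product 16.0 sets bit 12, so the 4-bit integer slice reads 0 unless something checks the upper bits.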

▸ Phase 4 of 5  ·  ~45 min  ·  Fill the table · believe the report

🛡 Fortify

No design ships without a PPA row

Read a nextpnr Timing Report

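yosys -p "synth_ice40 -top add8; write_json add8.json" add8.v
# --package vq100 assumes the Go Board's HX1K part; --freq 25 sets the target behind "PASS at 25.00 MHz"
nextpnr-ice40 --hx1k --package vq100 --freq 25 \
              --json add8.json --pcf go_board.pcf \
              --asc add8.asc 2>&1 | tail -30

Info: Max frequency for clock 'i_Clk_$glb_clk':  78.43 MHz (PASS at 25.00 MHz)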
Info: Critical path report:
Info: curr total
Info:  0.4  0.4  Source rca8.fa3.sum_$_DFFE_PP_/Q
Info:  2.1  2.5  Net rca8.fa3.sum to rca8.fa4.a
Info:  0.7  3.2  Source rca8.fa4.A → cout chain
…
  • Fmax line is the headline — PASS / FAIL at your target.
  • Critical path tells you where the slowest signal lives. That's where you'd pipeline.
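nextpnr reports Fmax per clock, measured over register-to-register paths, so a purely combinational top such as add8 has no clocked path to report on its own. One common way to get a number is to put registers on both sides of the logic. The wrapper below is a sketch of that idea: the module name add8_timed and its port names are hypothetical, the clock name i_Clk is taken from the report above, and every port would need an entry in the .pcf.

```verilog
// Hypothetical timing harness: registers on both sides of add8 so that
// the adder sits on a flip-flop to flip-flop path.
module add8_timed (
    input  wire       i_Clk,
    input  wire [7:0] i_A,
    input  wire [7:0] i_B,
    output reg  [8:0] o_Sum
);
    reg  [7:0] a_q, b_q;     // input registers (launch flip-flops)
    wire [8:0] sum_w;

    add8 dut (.a(a_q), .b(b_q), .sum(sum_w));

    always @(posedge i_Clk) begin
        a_q   <= i_A;
        b_q   <= i_B;
        o_Sum <= sum_w;      // output register (capture flip-flops)
    end
endmodule
```

Synthesize with -top add8_timed. The harness registers will show up in the FF count, so decide whether your PPA row records the module alone or the module plus harness, and apply the same choice to every row.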

Today's PPA Table

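Module | Config | LUTs | FFs | Fmax (MHz) | Latency
Adder | RCA, 8-bit | ? | 0 | ? | 1 cyc
Adder | RCA, 16-bit | ? | 0 | ? | 1 cyc
Adder | Behavioral, 8-bit | ? | 0 | ? | 1 cyc
Adder | Behavioral, 16-bit | ? | 0 | ? | 1 cyc
Mult | Behavioral, 8-bit | ? | 0 | ? | 1 cyc
Mult | Shift-add, 8-bit | ? | ~24 | ? | 8 cyc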
Analysis prompt: in two sentences — when do you pick sequential? When combinational? Write it in your portfolio. This template is the one you'll use in the final report.

🤖 Check the Machine — Multiplier Prompt

Prompt: "Write an 8-bit Verilog multiplier optimized for the iCE40 with as few LUTs as possible."

  • Did the AI write assign product = a * b;? Quick, wrong: that's the LUT-maximum version on iCE40. Push back.
  • Did it write a sequential shift-and-add? Run yosys stat on both. Add the row to your PPA table.
  • Did it claim a DSP slice? iCE40 HX1K doesn't have any. AI sometimes hallucinates Xilinx primitives. Reject and request iCE40-specific code.
  • Save prompt + response + your correction to the AI portfolio. This is exactly the pattern Day 14 grades.

Hardware Verification

  • Adder: drive a from switches, b from a counter, sum to LEDs. Step the counter at a slow clock and check each sum. The carry ripple itself settles in nanoseconds, far too fast to see, but it is the critical path you measured.
  • Multiplier: press button → start. done LED pulses after 8 cycles. Result on 7-seg.
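  • Fixed-point: 2.5 × 3.0 → 7 on 7-seg. Try 1.5 × 2.0 → 3. Try 3.5 × 3.5 → 12 (12.25 truncated; a single digit needs a hex decoder to show it as C). Then try 4.0 × 4.0 = 16: the product overflows into bit 12, [11:8] reads 0, and the display wraps silently.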
The honest check: a design that passes simulation can still fail on the board if you target 100 MHz with a 78 MHz Fmax, because RTL simulation does not model gate or routing delays. Always reconcile sim, synth, and pnr before claiming "done."

▸ Phase 5 of 5  ·  ~10 min  ·  Toward UART · toward Metro

🔗 Transfer

Tomorrow your tools converge on one interface

Where This Goes

Tomorrow (D11): UART TX. You'll use every piece from Weeks 1–3 — FSM (D7), counter (D4), shift register (D5), timing math (today). It's the synthesis exam, framed as a lab.

Next Monday: Barcelona Metro Control Room visit. The trains run on hard-real-time digital control. You'll recognize the constraint: Fmax for the city.

Tonight

  • 📺 D11 video — UART protocol + baud math (~50 min)
  • 📝 D11 quiz
  • 📐 Calculate CLKS_PER_BIT for 9600 / 19200 / 115200 baud at 25 MHz
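Once you have the numbers by hand, the same division can live in the RTL as a constant, which is one way to cross-check your arithmetic. The sketch below is an assumption-laden preview, not tomorrow's lab code: the module name baud_tick, its ports, and the round-to-nearest choice are all hypothetical.

```verilog
// Hypothetical baud tick generator: one-cycle pulse every CLKS_PER_BIT clocks.
module baud_tick #(
    parameter CLK_HZ = 25_000_000,   // 25 MHz clock, as on the slide
    parameter BAUD   = 115_200
)(
    input  wire clk,
    input  wire rst,
    output reg  tick
);
    // Integer division rounded to the nearest whole clock count.
    localparam CLKS_PER_BIT = (CLK_HZ + BAUD / 2) / BAUD;
    // Counter width; assumes CLKS_PER_BIT is at least 2.
    localparam CNT_W = $clog2(CLKS_PER_BIT);

    reg [CNT_W-1:0] count;

    always @(posedge clk) begin
        if (rst) begin
            count <= 0;
            tick  <= 1'b0;
        end else if (count == CLKS_PER_BIT - 1) begin
            count <= 0;
            tick  <= 1'b1;           // one bit period has elapsed
        end else begin
            count <= count + 1'b1;
            tick  <= 1'b0;
        end
    end
endmodule
```

Because the division rarely comes out exact, each baud rate carries a small rate error. Working out that error is part of the same homework.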
Reflection prompt: in one sentence — name a real product where you'd pick the shift-and-add multiplier over *. Bring an answer tomorrow.

🔗 End of Day 10 · Hasta mañana

Working code is half the job.
Timing closure is the other half.

Every design you ship this week gets an Fmax / LUT / FF row.
Tomorrow: UART, where the math you just did decides whether your bytes arrive intact.
