CRAFT cycle · 2.5 hours · Tue 6/9 · Eve: Flamenco
HDL for Digital System Design · UCF ECE · Barcelona Summer 2026
| Phase | Time | Activity |
|---|---|---|
| 🌍 Contextualize | 10 min | Metro precision · Fmax = product spec, not a number |
| ⚠️ Reframe | 15 min | Faster clock ≠ better · the critical path owns you |
| 🛠 Assemble | 70 min | Adder arch comparison · shift-and-add multiplier · fixed-point Q4.4 |
| 🛡 Fortify | 45 min | Read nextpnr timing report · fill the PPA table · 🤖 AI multiplier critique |
| 🔗 Transfer | 10 min | Metro visit teaser · tomorrow UART (everything converges) |
▸ Phase 1 of 5 · ~10 min
A missed deadline is a safety stop
D8 gave you a PPA habit (width sweep). D10 gives it teeth: every PPA row now has an Fmax column. Working code that misses Fmax is a product that doesn't ship.
"My design works in simulation" is a sentence that ends a lot of internships. After today, you'll never trust a sim without a timing report.
▸ Phase 2 of 5 · ~15 min
Faster ≠ better. Critical path = boss.
"If 25 MHz works, 100 MHz must be faster. Same code, just a different PLL."
Every clocked design has an Fmax = 1 / (longest register-to-register delay: clock-to-Q + combinational logic + routing + setup). Cross that line and your FFs latch garbage. Setup violation = wrong answer, not a slow answer.
The critical path is the one nextpnr reports, and the one you'd pipeline to go faster.
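A rough sketch of the arithmetic behind that claim. The symbol names are illustrative, not taken from the course notes, and clock skew is ignored. The clock period has to cover the whole register-to-register path:

$$
T_{clk} \;\ge\; t_{clk\to Q} + t_{logic} + t_{route} + t_{setup},
\qquad
F_{max} = \frac{1}{t_{clk\to Q} + t_{logic} + t_{route} + t_{setup}}
$$

Take a design whose slowest path totals 12.75 ns, so Fmax ≈ 78.4 MHz. At 25 MHz the period is 40 ns, leaving 40 − 12.75 = 27.25 ns of slack. At 100 MHz the period is 10 ns, so the same path is 2.75 ns too slow and setup fails. Same code, different PLL, wrong answers.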
Performance — Fmax, throughput, cycles per op
Power — switching activity × capacitance × V²
Area — LUTs & FFs on FPGA · gates & µm² on ASIC
On iCE40 we measure performance with nextpnr Fmax, area with yosys stat, and treat power as toggle-rate × C (conceptual).
Combinational a * b — 1 cycle, big LUT pile, low Fmax.
Sequential shift-and-add — 8 cycles, tiny area, high Fmax.
Pick the corner of the triangle your product cares about.
What a + b Actually Builds
// One line of Verilog…
assign sum = a + b;
assign product = a * b; on iCE40 → quadratic LUT explosion. No DSP blocks. This is today's most expensive line of code.
▸ Phase 3 of 5 · ~70 min · You build
Two adders · two multipliers · one Q4.4
Adders: manual ripple-carry (reuse D2 full_adder) vs behavioral +, at 8-bit and 16-bit. Record LUTs / FFs / Fmax for all 4 variants.
Multipliers: shift-and-add vs behavioral *. Compare both.
Timing: read the nextpnr timing report.
Ex 5 stretch: PLL / CDC demo if you finish early.
// Variant A — manual ripple-carry (reuse D2 full_adder)
module rca8 (input [7:0] a, b, input cin,
output [7:0] sum, output cout);
wire [7:0] c;
full_adder fa0 (a[0], b[0], cin, sum[0], c[0]);
full_adder fa1 (a[1], b[1], c[0], sum[1], c[1]);
// … chain c[1]→fa2, c[2]→fa3, …
assign cout = c[7];
endmodule
// Variant B — behavioral
module add8 (input [7:0] a, b, output [8:0] sum);
assign sum = a + b; // tool chooses
endmodule
Synthesize both at 8 and 16 bits. The interesting question isn't which is smaller — it's whether they're the same circuit.
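Writing out sixteen full_adder instances by hand gets tedious. One way to get both widths from a single source is a generate loop. This is a minimal sketch, not the course's reference solution. It assumes the full_adder ports are ordered (a, b, cin, sum, cout), as the positional connections in rca8 suggest. The full_adder body and the name rca_n are stand-ins, not the D2 code.

```verilog
// Stand-in 1-bit full adder. Port order (a, b, cin, sum, cout) is an
// assumption chosen to match the positional connections used in rca8.
module full_adder (input a, b, cin, output sum, cout);
  assign sum  = a ^ b ^ cin;
  assign cout = (a & b) | (cin & (a ^ b));
endmodule

// Parameterized ripple-carry adder (hypothetical name rca_n):
// one full_adder per bit; carry[i] enters stage i, carry[i+1] leaves it.
module rca_n #(parameter N = 8) (
  input  [N-1:0] a, b,
  input          cin,
  output [N-1:0] sum,
  output         cout
);
  wire [N:0] carry;
  assign carry[0] = cin;

  genvar i;
  generate
    for (i = 0; i < N; i = i + 1) begin : stage
      full_adder fa (a[i], b[i], carry[i], sum[i], carry[i+1]);
    end
  endgenerate

  assign cout = carry[N];
endmodule
```

Instantiate it as `rca_n #(.N(16)) u_add16 (...)` for the 16-bit row, so the 8-bit and 16-bit variants differ only in the parameter.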
module shift_add_mult #(parameter W = 8)(
input wire clk, rst, start,
input wire [W-1:0] a, b,
output reg [2*W-1:0] product,
output reg done
);
typedef enum logic [1:0] {IDLE, COMPUTE, DONE} state_t;
state_t state;
reg [W-1:0] mplier;
reg [2*W-1:0] mcand;
reg [3:0] bit_idx;
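always @(posedge clk) begin
if (rst) begin state <= IDLE; done <= 0; end
else begin
done <= 0; // default: done is a one-cycle pulse, not a sticky flag
case (state)
IDLE: if (start) begin
product <= 0; mplier <= b; mcand <= a;
bit_idx <= 0; state <= COMPUTE;
end
COMPUTE: begin
// add the shifted multiplicand when the current multiplier bit is 1
if (mplier[0]) product <= product + mcand;
mplier <= mplier >> 1;
mcand <= mcand << 1;
bit_idx <= bit_idx + 1;
if (bit_idx == W-1) state <= DONE;
end
DONE: begin done <= 1; state <= IDLE; end
default: state <= IDLE; // recover from the unused 2'b11 encoding
endcase
end
end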
endmodule
8 cycles per multiply. Trade: latency up 8×, area down ~5–10×.
// Q4.4 × Q4.4 → Q8.8
wire [7:0] a_q44, b_q44; // 4 int . 4 frac
wire [15:0] prod_q88 = a_q44 * b_q44;
// Integer part of Q8.8 is bits [15:8]; one 7-seg digit shows its low nibble, bits [11:8]
wire [3:0] int_part = prod_q88[11:8];
2.5 → 8'b0010_1000
3.0 → 8'b0011_0000
product = 16'b0000_0111_1000_0000 = 7.5
bits [11:8] = 4'b0111 = 7 ✓
Common mistake: reading prod_q88[7:0] as the integer part. Those are the fractional bits, so you'll see junk on the 7-seg and wonder why.
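The 2.5 × 3.0 example can be checked in simulation before it goes near the board. A minimal self-checking sketch, reusing the signal names from the snippet above. The testbench name and the int_part / frac_part split are illustrative additions.

```verilog
// Self-checking simulation of the 2.5 x 3.0 example in Q4.4.
module q44_example_tb;
  reg  [7:0]  a_q44, b_q44;               // 4 integer bits . 4 fraction bits
  wire [15:0] prod_q88  = a_q44 * b_q44;  // 8 integer bits . 8 fraction bits
  wire [7:0]  int_part  = prod_q88[15:8]; // full integer part
  wire [7:0]  frac_part = prod_q88[7:0];  // fraction, in units of 1/256

  initial begin
    a_q44 = 8'b0010_1000;  // 2.5 = 40 / 16
    b_q44 = 8'b0011_0000;  // 3.0 = 48 / 16
    #1;                    // let the continuous assignments settle
    if (prod_q88 !== 16'b0000_0111_1000_0000)
      $display("FAIL: product = %b", prod_q88);
    else
      $display("PASS: integer = %0d, fraction = %0d/256", int_part, frac_part);
    $finish;
  end
endmodule
```

With these inputs the raw product is 40 × 48 = 1920, so the run should report integer 7 and fraction 128/256, which is 7.5.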
▸ Phase 4 of 5 · ~45 min · Fill the table · believe the report
No design ships without a PPA row
nextpnr Timing Report
yosys -p "synth_ice40 -top add8; write_json add8.json" add8.v
nextpnr-ice40 --hx1k --json add8.json --pcf go_board.pcf \
--asc add8.asc 2>&1 | tail -30
Info: Max frequency for clock 'i_Clk_$glb_clk': 78.43 MHz (PASS at 25.00 MHz)
Info: Critical path report:
Info: curr total
Info: 0.4 0.4 Source rca8.fa3.sum_$_DFFE_PP_/Q
Info: 2.1 2.5 Net rca8.fa3.sum to rca8.fa4.a
Info: 0.7 3.2 Source rca8.fa4.A → cout chain …
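The sample report names a clock and flip-flop cells, which suggests the adder was timed between registers. A purely combinational module has no register-to-register path, so there is typically no per-clock Fmax line to read. One common approach is a thin registered wrapper. This is a sketch under that assumption: the names add8_timed, i_A, i_B and o_Sum are hypothetical, and add8 is repeated here only to keep the block self-contained.

```verilog
// Variant B from earlier in the deck, repeated for self-containment.
module add8 (input [7:0] a, b, output [8:0] sum);
  assign sum = a + b;
endmodule

// Hypothetical timing wrapper: registers on both sides of the adder give
// the timing analysis a flop-to-flop path through the adder logic.
module add8_timed (
  input            i_Clk,
  input      [7:0] i_A, i_B,
  output reg [8:0] o_Sum
);
  reg  [7:0] a_r, b_r;
  wire [8:0] sum_w;

  add8 dut (.a(a_r), .b(b_r), .sum(sum_w));

  always @(posedge i_Clk) begin
    a_r   <= i_A;    // launch registers
    b_r   <= i_B;
    o_Sum <= sum_w;  // capture register
  end
endmodule
```

The wrapper's flip-flops belong to the measurement harness, not the adder. If you time the design this way, subtract them (or note them) when filling in the FFs column, and keep the wrapper identical across all four adder variants so the Fmax numbers stay comparable.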
| Module | Config | LUTs | FFs | Fmax (MHz) | Latency |
|---|---|---|---|---|---|
| Adder | RCA, 8-bit | ? | 0 | ? | 1 cyc |
| Adder | RCA, 16-bit | ? | 0 | ? | 1 cyc |
| Adder | Behavioral, 8-bit | ? | 0 | ? | 1 cyc |
| Adder | Behavioral, 16-bit | ? | 0 | ? | 1 cyc |
| Mult | Behavioral, 8-bit | ? | 0 | ? | 1 cyc |
| Mult | Shift-add, 8-bit | ? | ~24 | ? | 8 cyc |
Prompt: "Write an 8-bit Verilog multiplier optimized for the iCE40 with as few LUTs as possible."
If the AI answers with assign product = a * b;, that's quick and wrong: it is the LUT-maximum version on iCE40. Push back.
Run yosys stat on both. Add the row to your PPA table.
Adder demo: a from switches, b from a counter, sum to LEDs. Watch the carry ripple at slow clock; that's the critical path you measured.
Multiplier demo: the done LED pulses after 8 cycles. Result on 7-seg.
▸ Phase 5 of 5 · ~10 min · Toward UART · toward Metro
Tomorrow your tools converge on one interface
Tomorrow (D11): UART TX. You'll use every piece from Weeks 1–3 — FSM (D7), counter (D4), shift register (D5), timing math (today). It's the synthesis exam, framed as a lab.
Next Monday: Barcelona Metro Control Room visit. The trains run on hard-real-time digital control. You'll recognize the constraint: Fmax for the city.
Compute CLKS_PER_BIT for 9600 / 19200 / 115200 baud at 25 MHz. Bring an answer tomorrow.
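If you'd rather let the tools do the division, the calculation can live in a parameter. A small sketch, assuming CLKS_PER_BIT means clock frequency divided by baud rate, rounded to the nearest integer. The deck does not spell out the definition, and the module and parameter names here are hypothetical.

```verilog
// Elaboration-time baud divider. Override BAUD to try the other rates.
module baud_params #(
  parameter CLK_HZ = 25_000_000,  // board clock from the deck: 25 MHz
  parameter BAUD   = 115_200
);
  // Adding BAUD/2 before dividing rounds to nearest instead of truncating.
  localparam CLKS_PER_BIT = (CLK_HZ + BAUD/2) / BAUD;

  initial $display("CLKS_PER_BIT = %0d at %0d baud", CLKS_PER_BIT, BAUD);
endmodule
```

Check the result against your hand calculation. The leftover fraction after rounding is the per-bit timing error that accumulates over a UART frame.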
🔗 End of Day 10 · Hasta mañana
Every design you ship this week gets an Fmax / LUT / FF row.
Tomorrow: UART, where the math you just did decides whether your bytes arrive intact.