Video 4 of 4 · ~10 minutes
Dr. Mike Borowczak · Electrical & Computer Engineering · CECS · UCF
Every CPU has an ALU. RISC-V's RV32I ISA defines 10 ALU ops; ARM Cortex-M defines ~20; x86 defines hundreds. The pattern is identical at every scale: an always @(*) with a case on opcode, default assignments, and a careful operator selection. The ALU is the canonical combinational module — if you can read one, you can read every datapath in every processor.
Topic 3.2 introduced the 4-bit ALU as a case demo. Today we treat it as a composition of everything from this week: vectors and concatenation (2.1), operator costs (2.2), sized literals (2.3), always @(*) (3.1), case (3.2), default assignments to prevent latches (3.3). Topic 7 will wrap this same datapath in an FSM control path; Topic 11 will pipeline a similar block.
“An ALU is a case on opcode. Pick the operator for each branch and you're done. The cost of each branch is just ‘one operation.’”
An ALU is a composition of every combinational technique you've learned. Each case branch carries a hardware price tag (D2.2 cost table). The always @(*) wrapper enforces zero memory (D3.1). The defaults at the top of the block are the difference between “clean synthesis” and “inferred latch” (D3.3). Choose poorly anywhere and the whole block balloons or breaks.
Add a SHIFT op with a variable shift amount and your 33-cell ALU (25 LUTs + 8 carries) grows to about 51 cells (43 LUTs + 8 carries). Same one-line case branch, ~1.5× the hardware. The operator picks the regime, just like the operator-cost table told you.
alu_4bit

    module alu_4bit (
        input  wire [3:0] i_a, i_b,   // ← D2.1 vectors
        input  wire [1:0] i_op,
        output reg  [3:0] o_result,
        output wire       o_zero,
        output reg        o_carry
    );
        always @(*) begin                 // ← D3.1 combinational block
            o_carry  = 1'b0;              // ← D3.3 default (no latch)
            o_result = 4'b0000;           // ← D2.3 sized literal default
            case (i_op)                   // ← D3.2 parallel mux topology
                2'b00:   {o_carry, o_result} = i_a + i_b;  // D2.1 concat, D2.2 + cost
                2'b01:   {o_carry, o_result} = i_a - i_b;  // D2.2 carry-chain again
                2'b10:   o_result = i_a & i_b;             // D2.2 cheap (LUTs only)
                2'b11:   o_result = i_a | i_b;             // D2.2 cheap (LUTs only)
                default: o_result = 4'b0000;               // D3.2 case-default safety
            endcase
        end
        assign o_zero = (o_result == 4'b0000);  // D2.2 relational, 1-bit result
    endmodule

{carry, result} concat (D2.1) is how you steal the carry-out of an N-bit add without writing an (N+1)-bit operand. ADD & SUB each burn a 4-bit carry chain; AND & OR don't — that's why the next slide's gate count is asymmetric.
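
To see the width rule behind the concat trick in isolation, here is a minimal simulation-only sketch. The module name `tb_concat_carry` and the test values are illustrative and not taken from the course repo. It relies on standard Verilog expression sizing: because the left-hand side `{carry, sum}` is 5 bits wide, both 4-bit operands are zero-extended to 5 bits before the add or subtract is evaluated.

```verilog
// Simulation-only sketch; the module name and test values are hypothetical.
module tb_concat_carry;
    reg [3:0] a, b;
    reg [3:0] sum;
    reg       carry;
    reg [3:0] truncated;

    initial begin
        a = 4'd9; b = 4'd8;

        // 4-bit target: the fifth bit of 9 + 8 = 17 is silently discarded.
        truncated = a + b;

        // 5-bit target: the concat widens the expression, so bit 4 lands in carry.
        {carry, sum} = a + b;
        $display("ADD truncated=%b carry=%b sum=%b", truncated, carry, sum);

        // For SUB the extra bit behaves as a borrow flag: it is 1 when a < b.
        a = 4'd3; b = 4'd5;
        {carry, sum} = a - b;
        $display("SUB carry=%b sum=%b", carry, sum);

        $finish;
    end
endmodule
```

Run under a simulator such as Icarus Verilog, the ADD line should show `carry=1 sum=0001` (17 = 5'b10001) and the SUB line `carry=1 sum=1110` (3 - 5 wraps to 5'b11110). The second result is worth noting: in the SUB branch of alu_4bit, o_carry is effectively a borrow indicator rather than a carry-out in the ADD sense.
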
XOR and SHIFT
Promote i_op to 3 bits. Add two ops. Predict the cost before running yosys.

        input  wire [2:0] i_op,     // 2 → 3 bits, 8 codes available
        input  wire [2:0] i_shamt,  // shift amount (variable!)
        // ...
        case (i_op)
            3'b000:  {o_carry, o_result} = i_a + i_b;
            3'b001:  {o_carry, o_result} = i_a - i_b;
            3'b010:  o_result = i_a & i_b;
            3'b011:  o_result = i_a | i_b;
            3'b100:  o_result = i_a ^ i_b;        // ← XOR: cheap, like AND/OR
            3'b101:  o_result = i_a << i_shamt;   // ← variable shift: barrel shifter!
            default: o_result = 4'b0000;
        endcase

case mux now selects 1-of-6 instead of 1-of-4. Predicted total: ~50 LUTs, 8 carries (same two adders), up from 25/8 baseline.
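
For readers who want something to compile before opening the repo, the fragment above can be fleshed out roughly as follows. This is a sketch under assumptions: the names `alu_4bit_ext_sketch` and `tb_alu_4bit_ext_sketch` are hypothetical, the port list simply mirrors alu_4bit plus i_shamt, and the course's actual alu_4bit_ext.v may differ in ports, widths, or opcode map. A small self-checking testbench is included so the block runs on its own.

```verilog
// Hypothetical completion of the fragment above; not the course's alu_4bit_ext.v.
module alu_4bit_ext_sketch (
    input  wire [3:0] i_a, i_b,
    input  wire [2:0] i_op,
    input  wire [2:0] i_shamt,
    output reg  [3:0] o_result,
    output wire       o_zero,
    output reg        o_carry
);
    always @(*) begin
        o_carry  = 1'b0;                 // defaults first: every path assigns both outputs
        o_result = 4'b0000;
        case (i_op)
            3'b000:  {o_carry, o_result} = i_a + i_b;
            3'b001:  {o_carry, o_result} = i_a - i_b;
            3'b010:  o_result = i_a & i_b;
            3'b011:  o_result = i_a | i_b;
            3'b100:  o_result = i_a ^ i_b;
            3'b101:  o_result = i_a << i_shamt;  // amounts >= 4 shift every bit out
            default: o_result = 4'b0000;         // unused opcodes 110 and 111
        endcase
    end
    assign o_zero = (o_result == 4'b0000);
endmodule

module tb_alu_4bit_ext_sketch;
    reg  [3:0] a, b;
    reg  [2:0] op, shamt;
    wire [3:0] result;
    wire       zero, carry;
    integer    errors;

    alu_4bit_ext_sketch dut (
        .i_a(a), .i_b(b), .i_op(op), .i_shamt(shamt),
        .o_result(result), .o_zero(zero), .o_carry(carry)
    );

    // Wait one time unit for the combinational outputs to settle, then compare.
    task check(input [3:0] exp_result, input exp_carry);
        begin
            #1;
            if (result !== exp_result || carry !== exp_carry) begin
                errors = errors + 1;
                $display("FAIL: op=%b a=%b b=%b shamt=%0d -> result=%b carry=%b",
                         op, a, b, shamt, result, carry);
            end
        end
    endtask

    initial begin
        errors = 0;
        a = 4'b1010; b = 4'b0110; shamt = 3'd2;
        op = 3'b000; check(4'b0000, 1'b1);  // 10 + 6 = 16: result wraps, carry set
        op = 3'b100; check(4'b1100, 1'b0);  // 1010 ^ 0110
        op = 3'b101; check(4'b1000, 1'b0);  // 1010 << 2
        op = 3'b111; check(4'b0000, 1'b0);  // unused opcode falls to the defaults
        if (errors == 0) $display("PASS: all checks");
        else             $display("FAIL: %0d mismatches", errors);
        $finish;
    end
endmodule
```

One design choice to be aware of: a 3-bit i_shamt on a 4-bit operand allows shift amounts of 4 to 7, all of which produce zero. A 2-bit shift amount would cover every useful case and may synthesize slightly smaller.
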
This compiles, but yosys reports two inferred latches. Find both bugs:

    always @(*) begin
        case (i_op)
            3'b000: o_result = i_a + i_b;
            3'b001: o_result = i_a - i_b;
            3'b010: o_result = i_a & i_b;
            3'b011: o_result = i_a | i_b;
            3'b100: begin
                o_result = i_a ^ i_b;
                o_carry  = 1'b1;   // only set here
            end
        endcase
    end

Bug 1: Missing default branch. Opcodes 101–111 leave o_result unassigned → latch on o_result.
Bug 2: o_carry is only assigned in the 3'b100 branch → latch on o_carry for every other opcode.
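Fix: Default-assign both signals at the top of the block (D3.3 technique #1) and add a default branch (technique #3). The top-of-block defaults alone already remove both latches, because every path then assigns both signals. A default branch alone only covers the missing opcodes for o_result and leaves o_carry latched. Use both.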
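
A sketch of the repaired block with both techniques applied is shown here. The module wrapper and its name `alu_latch_fixed` are assumptions added so the block compiles on its own; the exercise only shows the always block. The opcode behavior, including the odd o_carry = 1'b1 on XOR, is kept exactly as in the buggy version so that only the latch fixes differ.

```verilog
// Hypothetical wrapper around the exercise's always block, with both fixes applied.
module alu_latch_fixed (
    input  wire [3:0] i_a, i_b,
    input  wire [2:0] i_op,
    output reg  [3:0] o_result,
    output reg        o_carry
);
    always @(*) begin
        // Technique #1: give every output a value before the case,
        // so no path through the block can leave one unassigned.
        o_result = 4'b0000;
        o_carry  = 1'b0;

        case (i_op)
            3'b000: o_result = i_a + i_b;
            3'b001: o_result = i_a - i_b;
            3'b010: o_result = i_a & i_b;
            3'b011: o_result = i_a | i_b;
            3'b100: begin
                o_result = i_a ^ i_b;
                o_carry  = 1'b1;         // unchanged from the exercise
            end
            // Technique #3: explicit default for opcodes 101 to 111.
            default: o_result = 4'b0000;
        endcase
    end
endmodule
```

After a change like this, re-running synthesis is the real check: the latch warnings for o_result and o_carry should disappear from the yosys log.
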
~4 minutes
▸ COMMANDS

    cd lecture_examples/week1_day03/d03_s4_ex5/
    cat alu_4bit_ext.v          # ALU + XOR + variable shift
    make sim                    # self-checking testbench
    make stat-baseline          # original alu_4bit
    make stat-ext               # extended alu_4bit_ext
    diff <(make stat-baseline) <(make stat-ext)

▸ EXPECTED STDOUT

    ==== alu_4bit (baseline) ====
      SB_LUT4    25
      SB_CARRY    8
    ==== alu_4bit_ext (+XOR +SHIFT) ====
      SB_LUT4    43   (~1.7× LUTs)
      SB_CARRY    8   (same 2 adders)
    PASS: ADD/SUB/AND/OR/XOR opcodes
    PASS: SHIFT 4'b1010 << 2 = 4'b1000
    PASS: zero flag, carry flag
    === 14 passed, 0 failed ===

▸ WHAT TO LOOK FOR
CARRY count is unchanged — the new ops don't carry-propagate. The LUT delta (25 → 43) is almost entirely the variable shifter plus the wider 1-of-6 output mux. Toggle i_shamt to constant-shift in the source to see the LUT count collapse — pure rewiring, free.
| Module | Ops | SB_LUT4 | SB_CARRY | Driver |
|---|---|---|---|---|
| alu_4bit | 4 (ADD/SUB/AND/OR) | 25 | 8 | Two adders + 4:1 mux |
| alu_4bit_ext | + XOR | ~29 | 8 | + 4 LUTs (per-bit) |
| alu_4bit_ext | + XOR + << i_shamt | 43 | 8 | + barrel shifter |
| alu_4bit_ext | + XOR + << 2 (const) | ~30 | 8 | Const shift = pure rewiring |
An N-bit barrel shifter costs about N · log₂(N) muxes. By 32-bit you've spent more chip on the shifter than on the adders.
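
The N · log₂(N) figure falls out of the usual logarithmic shifter structure: one row of N two-input muxes per bit of the shift amount. The sketch below spells that structure out for N = 4. It assumes a 2-bit shift amount (the slides use 3 bits), and the names `barrel_shl4` and `tb_barrel_shl4` are hypothetical. The testbench compares the staged version against the `<<` operator for all 64 input combinations.

```verilog
// Hypothetical 4-bit left barrel shifter built from explicit mux stages.
module barrel_shl4 (
    input  wire [3:0] i_a,
    input  wire [1:0] i_shamt,
    output wire [3:0] o_y
);
    // Stage 0: four 2:1 muxes choose between "as is" and "shifted by 1".
    wire [3:0] s0 = i_shamt[0] ? {i_a[2:0], 1'b0} : i_a;

    // Stage 1: four more 2:1 muxes choose between "as is" and "shifted by 2".
    assign o_y = i_shamt[1] ? {s0[1:0], 2'b00} : s0;
    // Total: 4 bits x 2 stages = 8 muxes = N * log2(N) for N = 4.
endmodule

module tb_barrel_shl4;
    reg  [3:0] a;
    reg  [1:0] n;
    wire [3:0] y;
    integer    i, errors;

    barrel_shl4 dut (.i_a(a), .i_shamt(n), .o_y(y));

    initial begin
        errors = 0;
        // Sweep every (shift amount, operand) pair: 4 x 16 = 64 cases.
        for (i = 0; i < 64; i = i + 1) begin
            {n, a} = i;            // low 6 bits of i: n = i[5:4], a = i[3:0]
            #1;
            if (y !== (a << n)) begin
                errors = errors + 1;
                $display("MISMATCH: a=%b n=%0d y=%b", a, n, y);
            end
        end
        if (errors == 0) $display("PASS: staged shifter matches <<");
        else             $display("FAIL: %0d mismatches", errors);
        $finish;
    end
endmodule
```

Scaling the same structure to 32 bits gives 32 × 5 = 160 muxes. The mapped LUT count will not equal the mux count exactly, since the synthesizer may pack or merge muxes into LUTs, so treat the formula as a growth rate rather than a cell count. Each stage with a constant select collapses to wiring, which is why the constant-shift row in the table costs almost nothing.
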
Ask AI: “Extend this 4-bit ALU with a multiply opcode. Estimate the LUT cost on iCE40 HX1K and warn me about any latch risks.”
TASK
Ask AI for a multiply-extended ALU + cost estimate + latch audit.
BEFORE
Predict: 4×4 multiply ≈ 30–40 LUTs (D2.2). Latch risk if it forgets o_carry default.
AFTER
Strong AI defaults all outputs & estimates within 2×. Weak AI omits defaults → latch.
TAKEAWAY
Verify with make stat. The AI's relative cost ordering is usually trustworthy; its absolute counts are often off.
① An ALU is a composition: vectors, sized literals, always @(*), case, defaults, operator costs.
② Every case branch carries a hardware price tag — the D2.2 cost table is the budget.
③ Default-assign every output at the top + always include default. Two latches prevented per case.
④ Predict gate cost from source, then verify with make stat. Read code, see gates.
Q1: Why must you use @(*) instead of a manual sensitivity list?
@(*) automatically includes all signals read inside the block.
Q2: What causes an unintentional latch?
… always @(*).
Q3: Name three techniques to prevent latch inference.
… default in case statements.
Q4: Adding a variable-shift op (a << n) to a 4-bit ALU. Why does the LUT count jump but the carry count stay the same?
🔗 End of Topic 3
Topic 4 · Clocks, flip-flops, and your first blinking LED
▸ WHY THIS MATTERS NEXT
Topics 1–3 covered combinational logic — logic with no memory. Today's ALU evaluates instantly: change the inputs, the output follows on the next propagation delay. Topic 4 adds time: flip-flops, clocks, sequential design. You'll meet a second assignment operator (<=) that exists specifically to model what real silicon does on a clock edge — and you'll see why mixing it with today's = is the most expensive bug you can write.