Topic 3 · Procedural Combinational Logic

Combinational Capstone — Extending the ALU

Video 4 of 4 · ~10 minutes

Dr. Mike Borowczak · Electrical & Computer Engineering · CECS · UCF

Video 4 of 4 for Topic 3, the capstone for combinational logic. Topic 3.2 introduced the 4-bit ALU as a case exemplar; today we treat it as the synthesis of everything from Topics 1–3 and extend it. Tomorrow, Topic 4 adds time.

🌍 Where This Lives

In Industry

Every CPU has an ALU. RISC-V's RV32I ISA defines 10 register-register ALU ops; ARM Cortex-M defines ~20; x86 defines hundreds. The RTL pattern is the same at every scale: an always @(*) with a case on the opcode, default assignments, and careful operator selection. The ALU is the canonical combinational module: if you can read one, you have the vocabulary to read the datapath of any processor.

In This Course

Topic 3.2 introduced the 4-bit ALU as a case demo. Today we treat it as a composition of everything from this week: vectors and concatenation (2.1), operator costs (2.2), sized literals (2.3), always @(*) (3.1), case (3.2), default assignments to prevent latches (3.3). Topic 7 will wrap this same datapath in an FSM control path; Topic 11 will pipeline a similar block.

⚠️ An ALU Isn't Just `case`

❌ Wrong Model

“An ALU is a case on opcode. Pick the operator for each branch and you're done. The cost of each branch is just ‘one operation.’”

✓ Right Model

An ALU is a composition of every combinational technique you've learned. Each case branch carries a hardware price tag (D2.2 cost table). The always @(*) wrapper enforces zero memory (D3.1). The defaults at the top of the block are the difference between “clean synthesis” and “inferred latch” (D3.3). Choose poorly anywhere and the whole block balloons or breaks.

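The receipt: Add XOR and a variable-amount SHIFT and your 33-cell ALU (25 LUTs + 8 carries) grows to about 51 cells, with most of the growth coming from the shift. Both are one-line case branches, yet the shift costs roughly 3× the LUTs of the XOR (~13–14 vs. ~4). The operator picks the regime, just like the operator-cost table told you.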

The ALU is the place where every D2–D3 lesson either pays off or punishes you. Every branch of the case is an operator from the cost table; every missing default is a latch waiting to happen. The whole-week story compresses into this one module.

👁️ I Do — Anatomy of `alu_4bit`

module alu_4bit (
    input  wire [3:0] i_a, i_b,         // ← D2.1 vectors
    input  wire [1:0] i_op,
    output reg  [3:0] o_result,
    output wire       o_zero,
    output reg        o_carry
);
    always @(*) begin                     // ← D3.1 combinational block
        o_carry  = 1'b0;                  // ← D3.3 default (no latch)
        o_result = 4'b0000;               // ← D2.3 sized literal default
        case (i_op)                       // ← D3.2 parallel mux topology
            2'b00: {o_carry, o_result} = i_a + i_b;  // D2.1 concat, D2.2 + cost
            2'b01: {o_carry, o_result} = i_a - i_b;  // D2.2 carry-chain again
            2'b10: o_result = i_a & i_b;             // D2.2 cheap (LUTs only)
            2'b11: o_result = i_a | i_b;             // D2.2 cheap (LUTs only)
            default: o_result = 4'b0000;             // D3.2 case-default safety
        endcase
    end
    assign o_zero = (o_result == 4'b0000); // D2.2 relational, 1-bit result
endmodule

My thinking: Six prior-day techniques in 20 lines. The defaults at the top are both latch insurance (D3.3) and a known-good output for unmatched opcodes. The {o_carry, o_result} concat (D2.1) is how you capture the carry-out of an N-bit add without declaring an (N+1)-bit temporary: the 5-bit left-hand side makes the addition evaluate at 5 bits. For SUB, the same bit reads as a borrow (1 when i_a < i_b). ADD and SUB each burn a 4-bit carry chain; AND and OR don't, which is why the gate count on the next slide is asymmetric.
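To make the carry, borrow, and zero-flag behavior concrete, here is a minimal self-checking testbench sketch. The name tb_alu_4bit_sketch and the six vectors are illustrative assumptions, not the course testbench; it expects the alu_4bit module above to be compiled alongside it.

```verilog
`timescale 1ns/1ps
// Sketch only: tb_alu_4bit_sketch is a hypothetical name, and the vectors are
// illustrative. Compile together with the alu_4bit module shown above.
module tb_alu_4bit_sketch;
    reg  [3:0] a, b;
    reg  [1:0] op;
    wire [3:0] result;
    wire       zero, carry;
    integer    fails;

    alu_4bit dut (
        .i_a(a), .i_b(b), .i_op(op),
        .o_result(result), .o_zero(zero), .o_carry(carry)
    );

    // Drive one vector, let the combinational logic settle, then compare.
    task check;
        input [3:0] in_a, in_b;
        input [1:0] in_op;
        input [3:0] exp_result;
        input       exp_carry;
        begin
            a = in_a; b = in_b; op = in_op;
            #1;
            if (result !== exp_result || carry !== exp_carry ||
                zero !== (exp_result == 4'b0000)) begin
                fails = fails + 1;
                $display("FAIL: op=%b a=%h b=%h -> result=%h carry=%b zero=%b",
                         in_op, in_a, in_b, result, carry, zero);
            end
        end
    endtask

    initial begin
        fails = 0;
        check(4'h9, 4'h8, 2'b00, 4'h1, 1'b1);  // 9 + 8 = 17: carry-out set, low nibble 1
        check(4'h3, 4'h4, 2'b00, 4'h7, 1'b0);  // 3 + 4 = 7: no carry
        check(4'h5, 4'h5, 2'b01, 4'h0, 1'b0);  // 5 - 5 = 0: zero flag expected
        check(4'h3, 4'h5, 2'b01, 4'hE, 1'b1);  // 3 - 5 wraps to 14: carry bit acts as borrow
        check(4'hC, 4'hA, 2'b10, 4'h8, 1'b0);  // 1100 & 1010 = 1000
        check(4'hC, 4'hA, 2'b11, 4'hE, 1'b0);  // 1100 | 1010 = 1110
        if (fails == 0) $display("all vectors passed");
        else            $display("%0d vector(s) failed", fails);
        $finish;
    end
endmodule
```

The AND/OR vectors also confirm that the top-of-block default keeps o_carry at 0 for opcodes that never touch it.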

🤝 We Do — Add `XOR` and `SHIFT`

Promote i_op to 3 bits. Add two ops. Predict the cost before running yosys.

input  wire [2:0] i_op,            // 2 → 3 bits, 8 codes available
input  wire [2:0] i_shamt,         // shift amount (variable!)
// ...
case (i_op)
    3'b000: {o_carry, o_result} = i_a + i_b;
    3'b001: {o_carry, o_result} = i_a - i_b;
    3'b010: o_result = i_a & i_b;
    3'b011: o_result = i_a | i_b;
    3'b100: o_result = i_a ^ i_b;        // ← XOR: cheap, like AND/OR
    3'b101: o_result = i_a << i_shamt;   // ← variable shift: barrel shifter!
    default: o_result = 4'b0000;
endcase

Together (predict before measuring): XOR adds ~4 LUTs (per-bit, no carry). Variable shift adds a 4-bit barrel shifter: ~3 layers of 2:1 muxes (one layer per bit of i_shamt) across 4 bits → ~12–16 LUTs. The case mux also now selects 1-of-6 instead of 1-of-4. Running total: 25 + 4 + 12–16 = 41–45 LUTs before the wider mux, so predict ~50 LUTs and 8 carries (same two adders), up from the 25/8 baseline.
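The fragments above can be assembled into one module, roughly as follows. This is a sketch under stated assumptions: the name alu_4bit_ext_sketch and the port order are hypothetical, and the course's alu_4bit_ext.v may differ in detail.

```verilog
// Sketch only: module name and port order are assumptions, not the course file.
module alu_4bit_ext_sketch (
    input  wire [3:0] i_a, i_b,
    input  wire [2:0] i_op,              // widened from 2 to 3 bits
    input  wire [2:0] i_shamt,           // run-time shift amount
    output reg  [3:0] o_result,
    output wire       o_zero,
    output reg        o_carry
);
    always @(*) begin
        o_carry  = 1'b0;                 // defaults first: no latch on any path
        o_result = 4'b0000;
        case (i_op)
            3'b000: {o_carry, o_result} = i_a + i_b;
            3'b001: {o_carry, o_result} = i_a - i_b;
            3'b010: o_result = i_a & i_b;
            3'b011: o_result = i_a | i_b;
            3'b100: o_result = i_a ^ i_b;        // per-bit, no carry chain
            3'b101: o_result = i_a << i_shamt;   // shift amounts of 4..7 yield 0
            default: o_result = 4'b0000;         // opcodes 110 and 111
        endcase
    end
    assign o_zero = (o_result == 4'b0000);
endmodule
```

Because i_shamt is 3 bits wide but the operand is only 4 bits, any shift amount of 4 or more pushes every bit out and returns zero.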

🧪 You Do — Find Two Latches

This compiles, but yosys reports two inferred latches. Find both bugs:

always @(*) begin
    case (i_op)
        3'b000: o_result = i_a + i_b;
        3'b001: o_result = i_a - i_b;
        3'b010: o_result = i_a & i_b;
        3'b011: o_result = i_a | i_b;
        3'b100: begin
            o_result = i_a ^ i_b;
            o_carry  = 1'b1;             // only set here
        end
    endcase
end

Bug 1: No default branch, so opcodes 101–111 leave o_result unassigned → latch on o_result. Bug 2: o_carry is assigned only in the 3'b100 branch → latch on o_carry for every other opcode. Fix: Default-assign both signals at the top of the block (D3.3 technique #1) and add a default branch (technique #3). The top-of-block defaults alone remove both latches; a default branch that assigns only o_result removes only the first. Use both anyway, so the block stays safe if someone later deletes a default.
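One possible repaired version is sketched below. The module wrapper and the name alu_latch_fixed are hypothetical, added only so the block compiles on its own; the branch logic is kept exactly as in the exercise, including ADD and SUB discarding their carry.

```verilog
// Sketch only: alu_latch_fixed is a hypothetical wrapper for the exercise's block.
module alu_latch_fixed (
    input  wire [3:0] i_a, i_b,
    input  wire [2:0] i_op,
    output reg  [3:0] o_result,
    output reg        o_carry
);
    always @(*) begin
        o_result = 4'b0000;              // technique #1: covers opcodes 101-111
        o_carry  = 1'b0;                 // technique #1: covers branches that skip o_carry
        case (i_op)
            3'b000: o_result = i_a + i_b;
            3'b001: o_result = i_a - i_b;
            3'b010: o_result = i_a & i_b;
            3'b011: o_result = i_a | i_b;
            3'b100: begin
                o_result = i_a ^ i_b;
                o_carry  = 1'b1;         // overrides the default in this branch only
            end
            default: o_result = 4'b0000; // technique #3: explicit fallback
        endcase
    end
endmodule
```

Every path through the block now assigns both outputs, so the synthesizer has no state to remember and no reason to infer a latch.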

▶ LIVE DEMO

Baseline ALU vs. Extended ALU — Yosys Receipt

~4 minutes

▸ COMMANDS

cd lecture_examples/week1_day03/d03_s4_ex5/
cat alu_4bit_ext.v       # ALU + XOR + variable shift
make sim                 # self-checking testbench
make stat-baseline       # original alu_4bit
make stat-ext            # extended alu_4bit_ext
diff <(make stat-baseline) <(make stat-ext)

▸ EXPECTED STDOUT

==== alu_4bit (baseline) ====
   SB_LUT4    25
   SB_CARRY    8

==== alu_4bit_ext (+XOR +SHIFT) ====
   SB_LUT4    43    (~1.7× LUTs)
   SB_CARRY    8    (same 2 adders)

PASS: ADD/SUB/AND/OR/XOR opcodes
PASS: SHIFT 4'b1010 << 2 = 4'b1000
PASS: zero flag, carry flag
=== 14 passed, 0 failed ===

▸ WHAT TO LOOK FOR

CARRY count is unchanged — the new ops don't carry-propagate. The LUT delta (25 → 43) is almost entirely the variable shifter plus the wider 1-of-6 output mux. Toggle i_shamt to constant-shift in the source to see the LUT count collapse — pure rewiring, free.

[Terminal] Both stats lined up. Pause on the LUT delta. The receipt: the prediction from the We Do (~50 LUTs) landed within ~16% of measured (43) — right ballpark, slightly conservative. That's the operator-cost table earning its keep. Optional flex: edit the source to make the shift constant (i_a << 2) and rerun stat — LUT count drops back near baseline. That's D2.2's “constant shift = free” row, live.
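The constant-versus-variable contrast can be isolated from the ALU in a few lines. This is a sketch for experimentation; the module and port names are assumptions and are not part of the demo directory.

```verilog
// Sketch only: shift_compare_sketch is a hypothetical module for side-by-side synthesis.
module shift_compare_sketch (
    input  wire [3:0] i_a,
    input  wire [2:0] i_shamt,
    output wire [3:0] o_var,
    output wire [3:0] o_const,
    output wire [3:0] o_wired
);
    assign o_var   = i_a << i_shamt;       // amount is a run-time input: needs a mux network
    assign o_const = i_a << 2;             // amount fixed at synthesis time
    assign o_wired = {i_a[1:0], 2'b00};    // the same function as o_const, written as wiring
endmodule
```

o_const and o_wired describe the same connections: two input bits routed to the upper outputs and two outputs tied to zero. Inside the ALU the constant-shift branch still feeds the output mux, which is one plausible reason the measured count lands near the baseline rather than exactly on it.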

🔧 What Did the Tool Build?

Module	Ops	SB_LUT4	SB_CARRY	Driver
`alu_4bit`	4 (ADD/SUB/AND/OR)	25	8	Two adders + 4:1 mux
`alu_4bit_ext`	+ XOR	~29	8	+ 4 LUTs (per-bit)
`alu_4bit_ext`	+ XOR + `<< i_shamt`	43	8	+ barrel shifter
`alu_4bit_ext`	+ XOR + `<< 2` (const)	~30	8	Const shift = pure rewiring

Reading the receipt: Adders dominate carries (unchanged at 8). The variable shifter dominates the LUT growth (+13 vs. constant shift). Swap variable shift for constant shift and that growth nearly disappears. The case branches you choose determine the chip area, line by line.

Scaling check: Bump operand width from 4 to 16 bits. Adders: linear (+12 carries each). Variable shift: faster than linear, because a barrel shifter needs about N · log₂(N) muxes (16 · 4 = 64 at 16 bits, 32 · 5 = 160 at 32 bits). By 32 bits the shifter likely costs more chip area than the adders.
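The N · log₂(N) growth is easier to see when the shifter is written as explicit layers. The following is a sketch, assuming a left-shift-only design and a hypothetical module name; a synthesizer given a plain `<<` may arrive at a different structure.

```verilog
// Sketch only: barrel_shl_sketch is a hypothetical, left-shift-only barrel shifter.
module barrel_shl_sketch #(
    parameter N      = 16,               // operand width
    parameter STAGES = 4                 // must equal log2(N): 4 for N = 16
) (
    input  wire [N-1:0]      i_a,
    input  wire [STAGES-1:0] i_shamt,
    output wire [N-1:0]      o_y
);
    // stage[s] holds the value after the first s mux layers.
    wire [N-1:0] stage [0:STAGES];
    assign stage[0] = i_a;

    genvar s;
    generate
        for (s = 0; s < STAGES; s = s + 1) begin : g_layer
            // Layer s: up to N two-input muxes that either pass the value
            // through or shift it left by 2**s, selected by one bit of i_shamt.
            assign stage[s+1] = i_shamt[s] ? (stage[s] << (1 << s)) : stage[s];
        end
    endgenerate

    assign o_y = stage[STAGES];
endmodule
```

Each bit of i_shamt adds one full-width layer, so the mux count is at most N per layer times log₂(N) layers. Doubling N doubles the width of every layer and adds one more layer, which is why the shifter outgrows a linearly scaling adder.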

🤖 Check the Machine

Ask AI: “Extend this 4-bit ALU with a multiply opcode. Estimate the LUT cost on iCE40 HX1K and warn me about any latch risks.”

TASK

Ask AI for a multiply-extended ALU + cost estimate + latch audit.

BEFORE

Predict: 4×4 multiply ≈ 30–40 LUTs (D2.2). Latch risk if it forgets o_carry default.

AFTER

A strong answer default-assigns every output and estimates LUTs within 2× of measured. A weak answer omits the defaults → latch.

TAKEAWAY

Verify with make stat. The AI's relative cost ordering is usually trustworthy; its absolute counts are often off.

Key Takeaways

① An ALU is a composition: vectors, sized literals, always @(*), case, defaults, operator costs.

② Every case branch carries a hardware price tag — the D2.2 cost table is the budget.

③ Default-assign every output at the top of the block and always include a default branch. Together they close both latch paths: outputs skipped by a branch, and opcodes with no branch.

④ Predict gate cost from source, then verify with make stat. Read code, see gates.

Combinational design = the vocabulary you've already learned, deliberately composed. Now you read RTL like a senior engineer.

Pre-Class Self-Check

Q1: Why must you use @(*) instead of a manual sensitivity list?

Manual lists risk sim/synth mismatch. @(*) automatically includes all signals read inside the block.
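A minimal sketch of the mismatch behind Q1, using a hypothetical two-input module: both blocks describe the same AND gate, but only one of them simulates like it.

```verilog
// Sketch only: sens_list_sketch is a hypothetical module for comparing the two styles.
module sens_list_sketch (
    input  wire i_a, i_b,
    output reg  o_bad,
    output reg  o_good
);
    // Incomplete list: the simulator re-evaluates only when i_a changes, so
    // o_bad keeps a stale value after i_b changes. Synthesis tools typically
    // ignore the list and build a plain AND gate, so simulation and hardware disagree.
    always @(i_a) begin
        o_bad = i_a & i_b;
    end

    // @(*) includes every signal the block reads, so simulation tracks the gate.
    always @(*) begin
        o_good = i_a & i_b;
    end
endmodule
```

Hold i_a at 1 and toggle i_b in simulation: o_good follows i_b, while o_bad stays where it was until i_a next changes.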

Q2: What causes an unintentional latch?

Not assigning a signal in every possible path through always @(*).

Pre-Class Self-Check (cont.)

Q3: Name three techniques to prevent latch inference.

1. Default assignment at top. 2. Complete if/else chains. 3. default in case statements.

Q4: Adding a variable-shift op (a << n) to a 4-bit ALU. Why does the LUT count jump but the carry count stay the same?

Variable shift synthesizes to a barrel shifter — a stack of 2:1 muxes (LUTs only, no carry propagation). Adders are the only ops that consume the SB_CARRY chain.

🔗 End of Topic 3

Tomorrow: Sequential Logic

Topic 4 · Clocks, flip-flops, and your first blinking LED

▸ WHY THIS MATTERS NEXT

Topics 1–3 covered combinational logic: logic with no memory. Today's ALU has no clock: change the inputs and the output follows after one propagation delay. Topic 4 adds time: flip-flops, clocks, sequential design. You'll meet a second assignment operator (<=) that exists specifically to model what real silicon does on a clock edge, and you'll see why mixing it with today's = is one of the most expensive bugs you can write.
