Topic 3 · Procedural Combinational Logic

Combinational Capstone — Extending the ALU

Video 4 of 4 · ~10 minutes

Dr. Mike Borowczak · Electrical & Computer Engineering · CECS · UCF

always @(*) · if/else & case · Latch Problem · ALU Capstone

🌍 Where This Lives

In Industry

Every CPU has an ALU. RISC-V's RV32I base ISA defines 10 register-register ALU operations; ARM Cortex-M defines ~20; x86 defines hundreds. The same pattern recurs at every scale: an always @(*) with a case on opcode, default assignments, and careful operator selection. The ALU is the canonical combinational module: once you can read one, you can read the datapath of almost any processor.

In This Course

Topic 3.2 introduced the 4-bit ALU as a case demo. Today we treat it as a composition of everything from this week: vectors and concatenation (2.1), operator costs (2.2), sized literals (2.3), always @(*) (3.1), case (3.2), default assignments to prevent latches (3.3). Topic 7 will wrap this same datapath in an FSM control path; Topic 11 will pipeline a similar block.

⚠️ An ALU Isn't Just case

❌ Wrong Model

“An ALU is a case on opcode. Pick the operator for each branch and you're done. The cost of each branch is just ‘one operation.’”

✓ Right Model

An ALU is a composition of every combinational technique you've learned. Each case branch carries a hardware price tag (D2.2 cost table). The always @(*) wrapper enforces zero memory (D3.1). The defaults at the top of the block are the difference between “clean synthesis” and “inferred latch” (D3.3). Choose poorly anywhere and the whole block balloons or breaks.

The receipt: Add XOR and a SHIFT op with a variable shift amount, and your 33-cell ALU (25 LUTs + 8 carries) grows to 51 cells. The one-line shift branch alone costs about 13 more LUTs than a constant shift would. The operator picks the regime, just as the operator-cost table told you.

👁️ I Do — Anatomy of alu_4bit

module alu_4bit (
    input  wire [3:0] i_a, i_b,         // ← D2.1 vectors
    input  wire [1:0] i_op,
    output reg  [3:0] o_result,
    output wire       o_zero,
    output reg        o_carry
);
    always @(*) begin                     // ← D3.1 combinational block
        o_carry  = 1'b0;                  // ← D3.3 default (no latch)
        o_result = 4'b0000;               // ← D2.3 sized literal default
        case (i_op)                       // ← D3.2 parallel mux topology
            2'b00: {o_carry, o_result} = i_a + i_b;  // D2.1 concat, D2.2 + cost
            2'b01: {o_carry, o_result} = i_a - i_b;  // D2.2 carry-chain again
            2'b10: o_result = i_a & i_b;             // D2.2 cheap (LUTs only)
            2'b11: o_result = i_a | i_b;             // D2.2 cheap (LUTs only)
            default: o_result = 4'b0000;             // D3.2 case-default safety
        endcase
    end
    assign o_zero = (o_result == 4'b0000); // D2.2 relational, 1-bit result
endmodule
My thinking: Six prior-day techniques in one 20-line module. The defaults at the top are both latch insurance (D3.3) and a known-good output for unmatched opcodes. The {o_carry, o_result} concat (D2.1) is how you capture the carry-out of an N-bit add without writing an (N+1)-bit operand. ADD and SUB each burn a 4-bit carry chain; AND and OR don't. That's why the synthesis receipt for this module shows exactly 8 SB_CARRY cells (2 × 4), a number that won't move when we add logic-only ops.
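To make the 5-bit {o_carry, o_result} trick concrete, here is a minimal bit-accurate Python reference model of alu_4bit: the kind of golden model a self-checking testbench can compare the DUT against. It is a sketch that assumes it mirrors the Verilog above; the function name and the (result, carry, zero) return order are illustrative, not the course's testbench API. Note what it exposes about SUB: in a 5-bit context, bit 4 of i_a - i_b is a borrow, 1 exactly when i_a < i_b.

```python
# Bit-accurate reference model of alu_4bit (teaching sketch, not the course testbench).
MASK4 = 0xF   # o_result is 4 bits
MASK5 = 0x1F  # {o_carry, o_result} is 5 bits


def alu_4bit(a: int, b: int, op: int) -> tuple[int, int, int]:
    """Return (o_result, o_carry, o_zero) for 4-bit a, b and a 2-bit opcode."""
    carry, result = 0, 0  # top-of-block defaults, as in the Verilog
    if op == 0b00:
        wide = (a + b) & MASK5   # 5-bit context: bit 4 is the carry-out
        carry, result = wide >> 4, wide & MASK4
    elif op == 0b01:
        wide = (a - b) & MASK5   # wraps like 5-bit two's-complement hardware
        carry, result = wide >> 4, wide & MASK4  # bit 4 = 1 exactly when a < b
    elif op == 0b10:
        result = a & b
    elif op == 0b11:
        result = a | b
    zero = int(result == 0)      # o_zero = (o_result == 4'b0000)
    return result, carry, zero


print(alu_4bit(15, 1, 0b00))  # (0, 1, 1): 15 + 1 overflows into o_carry
print(alu_4bit(3, 5, 0b01))   # (14, 1, 0): 3 - 5 borrows
```

With only 16 × 16 × 4 = 1024 input combinations, a model like this can be checked against the hardware exhaustively rather than by sampling.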

🤝 We Do — Add XOR and SHIFT

Promote i_op to 3 bits. Add two ops. Predict the cost before running yosys.

input  wire [2:0] i_op,            // 2 → 3 bits, 8 codes available
input  wire [2:0] i_shamt,         // shift amount (variable!), 0–7
// ... always @(*) begin, same o_carry/o_result defaults as alu_4bit
case (i_op)
    3'b000: {o_carry, o_result} = i_a + i_b;
    3'b001: {o_carry, o_result} = i_a - i_b;
    3'b010: o_result = i_a & i_b;
    3'b011: o_result = i_a | i_b;
    3'b100: o_result = i_a ^ i_b;        // ← XOR: cheap, like AND/OR
    3'b101: o_result = i_a << i_shamt;   // ← variable shift: barrel shifter!
    default: o_result = 4'b0000;
endcase
Together — predict before measuring: XOR adds ~4 LUTs (per-bit, no carry). Variable shift adds a 4-bit barrel shifter — ~3 layers of 2:1 muxes per bit → ~12–16 LUTs. Plus the case mux now selects 1-of-6 instead of 1-of-4. Predicted total: ~50 LUTs, 8 carries (same two adders), up from 25/8 baseline.
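The "~3 layers of 2:1 muxes per bit" estimate can be checked with a stage-by-stage model of a logarithmic barrel shifter. This is a Python sketch under the assumption that synthesis builds roughly this structure for i_a << i_shamt; yosys may restructure it, so the mux count is a first-order estimate, not a netlist. Stage k conditionally shifts by 2^k, and each stage is one row of four 2:1 muxes (each fits in one SB_LUT4, since a 2:1 mux has three inputs).

```python
# Stage-by-stage model of a 4-bit logarithmic (barrel) left shifter.
# Assumption: a first-order picture of what `i_a << i_shamt` synthesizes to.
WIDTH = 4
SHAMT_BITS = 3  # matches `input wire [2:0] i_shamt`


def barrel_shift_left(a: int, shamt: int) -> tuple[int, int]:
    """Return (result, mux_count); each stage is one row of WIDTH 2:1 muxes."""
    mask = (1 << WIDTH) - 1
    value, muxes = a & mask, 0
    for stage in range(SHAMT_BITS):            # stage k shifts by 2**k if selected
        shifted = (value << (1 << stage)) & mask
        sel = (shamt >> stage) & 1              # this row's 2:1 mux select
        value = shifted if sel else value
        muxes += WIDTH                          # the row exists whatever sel is
    return value, muxes


print(barrel_shift_left(0b1010, 2))  # (8, 12): 4'b1000, built from 12 muxes
```

Twelve muxes is the low end of the 12–16 LUT prediction; the rest of the estimate comes from XOR and the wider output mux.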

🧪 You Do — Find Two Latches

This compiles, but yosys reports two inferred latches. Find both bugs:

always @(*) begin
    case (i_op)
        3'b000: o_result = i_a + i_b;
        3'b001: o_result = i_a - i_b;
        3'b010: o_result = i_a & i_b;
        3'b011: o_result = i_a | i_b;
        3'b100: begin
            o_result = i_a ^ i_b;
            o_carry  = 1'b1;             // only set here
        end
    endcase
end
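Bug 1: No default branch. Opcodes 3'b101–3'b111 leave o_result unassigned → latch on o_result. Bug 2: o_carry is assigned only in the 3'b100 branch → latch on o_carry for every other opcode. Fix: Default-assign both signals at the top of the block (D3.3 technique #1) and add a default branch (technique #3). The top-of-block defaults alone remove both latches; a default branch that assigns only o_result removes just the o_result latch, because o_carry is still unassigned in branches 000–011. Use both: the defaults are the fix, and the default branch keeps unused opcodes explicit.
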
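The latch rule ("an output not assigned on every path needs memory") can be checked mechanically for a flat case statement. The helper below is a hypothetical Python toy, not a yosys feature, and it models only case branches, not nested if/else. It confirms the accounting above: no fix gives two latches, top-of-block defaults give none, and a default branch that assigns only o_result leaves the o_carry latch.

```python
# Toy latch audit for a case-based always @(*) block (hypothetical helper,
# not a yosys feature). An output needs a latch if any opcode path leaves it
# unassigned. Models flat case branches only, no nested if/else.
OUTPUTS = {"o_result", "o_carry"}


def latch_risks(branches: dict, defaults: set, n_codes: int) -> set:
    """branches: opcode -> outputs assigned there; missing opcodes assign nothing."""
    risky = set()
    for code in range(n_codes):
        assigned = defaults | branches.get(code, set())
        risky |= OUTPUTS - assigned
    return risky


# The buggy block from the exercise: no defaults, no default branch.
buggy = {code: {"o_result"} for code in range(4)}
buggy[0b100] = {"o_result", "o_carry"}

print(sorted(latch_risks(buggy, set(), 8)))         # ['o_carry', 'o_result']
print(sorted(latch_risks(buggy, OUTPUTS, 8)))       # [] (top-of-block defaults)
with_default = {**buggy, **{code: {"o_result"} for code in range(5, 8)}}
print(sorted(latch_risks(with_default, set(), 8)))  # ['o_carry'] (default branch only)
```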
▶ LIVE DEMO

Baseline ALU vs. Extended ALU — Yosys Receipt

~4 minutes

▸ COMMANDS

cd lecture_examples/week1_day03/d03_s4_ex5/
cat alu_4bit_ext.v       # ALU + XOR + variable shift
make sim                 # self-checking testbench
make stat-baseline       # original alu_4bit
make stat-ext            # extended alu_4bit_ext
diff <(make stat-baseline) <(make stat-ext)

▸ EXPECTED STDOUT

==== alu_4bit (baseline) ====
   SB_LUT4    25
   SB_CARRY    8

==== alu_4bit_ext (+XOR +SHIFT) ====
   SB_LUT4    43    (~1.7× LUTs)
   SB_CARRY    8    (same 2 adders)

PASS: ADD/SUB/AND/OR/XOR opcodes
PASS: SHIFT 4'b1010 << 2 = 4'b1000
PASS: zero flag, carry flag
=== 14 passed, 0 failed ===

▸ WHAT TO LOOK FOR

CARRY count is unchanged: the new ops don't propagate carries. The LUT delta (25 → 43, +18) is mostly the variable shifter, plus XOR and the wider 1-of-6 output mux. Replace i_shamt with a constant (e.g. i_a << 2) in the source and the LUT count drops to about 30: a constant shift is pure rewiring, so the shift itself is free.
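Why is a constant shift free? With the amount fixed, every output bit is wired to one specific input bit or tied to 0, so no select logic exists. This short Python sketch lists that wiring for a 4-bit i_a << k; the function name is illustrative.

```python
# A constant shift needs no select logic: each output bit is wired to a fixed
# input bit or tied to 0. Sketch that lists the wiring for a 4-bit `i_a << k`.
WIDTH = 4


def const_shift_wiring(k: int) -> list[str]:
    """Driver of o_result[i] for i = WIDTH-1 down to 0."""
    drivers = []
    for i in reversed(range(WIDTH)):
        src = i - k                      # o_result[i] is driven by i_a[i - k]
        drivers.append(f"i_a[{src}]" if src >= 0 else "1'b0")
    return drivers


print(const_shift_wiring(2))  # ['i_a[1]', 'i_a[0]', "1'b0", "1'b0"]
```

Contrast this with the variable shifter, where every output bit depends on i_shamt and therefore needs muxes.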

🔧 What Did the Tool Build?

Module         Ops                     SB_LUT4   SB_CARRY   Driver
alu_4bit       ADD/SUB/AND/OR (4)        25          8      Two adders + 4:1 mux
alu_4bit_ext   + XOR                    ~29          8      + 4 LUTs (per-bit XOR)
alu_4bit_ext   + XOR + << i_shamt        43          8      + barrel shifter
alu_4bit_ext   + XOR + << 2 (const)     ~30          8      Const shift = pure rewiring
Reading the receipt: Adders account for every carry (still 8). The variable shifter accounts for most of the LUT growth (+13 vs. a constant shift). Swap the variable shift for a constant one and the LUT growth shrinks from +18 to about +5 (XOR plus the wider mux). The case branches you choose determine the chip area, line by line.
Scaling check: Bump operand width from 4 to 16 bits. Adders grow linearly (+12 carries each). The variable shifter grows superlinearly: a barrel shifter needs about N · log₂(N) 2:1 muxes. By 32 bits you have likely spent more chip on the shifter than on both adders.
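The scaling claim is simple arithmetic. This Python sketch assumes each adder costs N carry cells and a logarithmic barrel shifter costs N · ⌈log₂N⌉ 2:1 muxes with a shift amount exactly log₂N bits wide (the 4-bit ALU's 3-bit i_shamt adds one extra stage, hence its 12 muxes). These are first-order counts, not yosys results, and carries and muxes are not the same cell cost, so read the output as a trend.

```python
# Back-of-envelope scaling: two adders vs. one variable barrel shifter.
# Assumptions: N carries per adder; N * ceil(log2 N) 2:1 muxes per shifter.
import math


def adder_carries(width: int, n_adders: int = 2) -> int:
    return n_adders * width


def shifter_muxes(width: int) -> int:
    return width * math.ceil(math.log2(width))


for w in (4, 8, 16, 32):
    print(f"{w:>2}-bit: adder carries = {adder_carries(w):>3}, "
          f"shifter 2:1 muxes = {shifter_muxes(w):>3}")
```

The counts start level (8 vs. 8 at 4 bits) and diverge (64 vs. 160 at 32 bits): the mux-to-carry ratio grows as log₂(N)/2.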

🤖 Check the Machine

Ask AI: “Extend this 4-bit ALU with a multiply opcode. Estimate the LUT cost on iCE40 HX1K and warn me about any latch risks.”

TASK

Ask AI for a multiply-extended ALU + cost estimate + latch audit.

BEFORE

Predict: 4×4 multiply ≈ 30–40 LUTs (D2.2). Latch risk if it forgets o_carry default.

AFTER

Strong AI defaults all outputs & estimates within 2×. Weak AI omits defaults → latch.

TAKEAWAY

Verify with make stat. AI tends to get the relative ordering right (which op costs most); its absolute counts are often off.

Key Takeaways

 An ALU is a composition: vectors, sized literals, always @(*), case, defaults, operator costs.

 Every case branch carries a hardware price tag — the D2.2 cost table is the budget.

 Default-assign every output at the top and always include a default branch. The top defaults close every latch path; the default branch keeps unused opcodes explicit.

 Predict gate cost from source, then verify with make stat. Read code, see gates.

Combinational design = the vocabulary you've already learned, deliberately composed. Now you read RTL like a senior engineer.

Pre-Class Self-Check

Q1: Why must you use @(*) instead of a manual sensitivity list?

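A manual list that omits a signal the block reads makes simulation ignore changes on that signal, while synthesis still builds combinational logic from it: a sim/synth mismatch. @(*) automatically includes every signal read inside the block.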
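The mismatch is easy to reproduce in a toy event-driven model. This Python sketch is hypothetical (real simulators schedule events very differently) and models a block whose body is y = a & b: it re-evaluates only when a listed signal changes, while the synthesized gates always compute a & b.

```python
# Toy event-driven model of `always @(a) y = a & b;` vs `always @(*)`.
# Hypothetical sketch: real simulators schedule events very differently.
class Block:
    def __init__(self, sensitivity: set):
        self.sensitivity = sensitivity
        self.y = 0

    def evaluate(self, sig: dict) -> None:
        self.y = sig["a"] & sig["b"]          # block body: y = a & b


def simulate(block: Block, changes: list) -> tuple:
    """Apply (signal, value) changes; return (simulated y, what the gates compute)."""
    sig = {"a": 0, "b": 0}
    block.evaluate(sig)                       # simplification: start consistent
    for name, value in changes:
        sig[name] = value
        if name in block.sensitivity:         # a manual list may omit signals
            block.evaluate(sig)
    return block.y, sig["a"] & sig["b"]


stimulus = [("a", 1), ("b", 1)]
print(simulate(Block({"a"}), stimulus))       # (0, 1): sim disagrees with gates
print(simulate(Block({"a", "b"}), stimulus))  # (1, 1): @(*)-style list agrees
```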

Q2: What causes an unintentional latch?

Not assigning a signal in every possible path through always @(*).

Pre-Class Self-Check (cont.)

Q3: Name three techniques to prevent latch inference.

1. Default assignment at top. 2. Complete if/else chains. 3. default in case statements.

Q4: Adding a variable-shift op (a << n) to a 4-bit ALU. Why does the LUT count jump but the carry count stay the same?

Variable shift synthesizes to a barrel shifter — a stack of 2:1 muxes (LUTs only, no carry propagation). Adders are the only ops that consume the SB_CARRY chain.

🔗 End of Topic 3

Tomorrow: Sequential Logic

Topic 4 · Clocks, flip-flops, and your first blinking LED

▸ WHY THIS MATTERS NEXT

Topics 1–3 covered combinational logic: logic with no memory. Today's ALU holds no state: change the inputs and the output follows after a propagation delay. Topic 4 adds time: flip-flops, clocks, sequential design. You'll meet a second assignment operator (<=) that exists specifically to model what real silicon does on a clock edge, and you'll see why mixing it with today's = is one of the most expensive bugs you can write.