Topic 2 · Combinational Building Blocks

Operators

Video 2 of 4 · ~14 minutes

Dr. Mike Borowczak · Electrical & Computer Engineering · CECS · UCF

Data Types · Operators · Sized Literals · 7-Seg Display

🌍 Where This Lives

In Industry

The first performance review on any RTL codebase is an operator audit: where are the multiplies? The dividers? The wide comparators? These drive area, timing, and power. Senior designers read code and see gates.

In This Course

Your Topic 3 ALU uses +, -, &, |. Topic 9 memory addressing uses comparators. Topic 11 UART uses reduction operators for parity. Every lab after today uses this vocabulary.

Industry alignment: “Write area-efficient RTL” appears on almost every FPGA/ASIC job posting. That literally means: know which operators are cheap and which are expensive. This video is step 1.

⚠️ Syntax Does Not Equal Cost

❌ Wrong Model

“One operator = one operation. a + b and a & b look the same, so they cost the same.”

✓ Right Model

Each operator has a hardware footprint. & on 32 bits = 32 LUTs. + on 32 bits = 32 LUTs + 31 carry cells (a ripple chain with real delay). * on 32 bits = thousands of LUTs, or a dedicated DSP block on FPGAs that have one (the HX1K does not).

The receipt: An iCE40 HX1K has 1280 LUTs total. A single 8×8 multiply eats about 11% of the chip (142 LUTs in the demo later in this video). Multiplier area grows roughly with N², so a 32×32 multiply needs more LUTs than the whole HX1K has.

🧱 Refresher — LUTs vs CARRYs

Topic 1 covered the LUT. Before the cost table, meet its neighbor in every iCE40 logic tile: the dedicated CARRY cell.

SB_LUT4 — the universal gate

[Diagram: inputs a, b, c, d → 16-bit truth table (any 4-input function) → output y]
  • 4 inputs → any 1-bit Boolean function
  • Configurable: 16 bits of memory pick the function
  • Used for: bitwise, logical, mux, reduction, equality (==) compares

SB_CARRY — the dedicated adder

[Diagram: cin → cell0 → cell1 → cell2 → cout, with outputs s0, s1, s2; a hard-wired carry chain: fast, free, fixed-purpose]
  • 1 cell per bit, chained directly to its neighbor
  • Not configurable — only does carry-propagate
  • Used for: +, -, <, >, magnitude compares
Why the split matters: One iCE40 logic cell (LC) = 1 LUT4 + 1 CARRY + 1 FF, and each logic tile packs eight LCs. The LUT4 generates the sum bit; the CARRY propagates the carry bit on a dedicated wire that bypasses the routing fabric. That's why a + b shows up as both SB_LUT4 and SB_CARRY in yosys stat, and why a bitwise a & b shows zero CARRYs: there is no carry to propagate.
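To make the LUT/CARRY split concrete, here is a minimal sketch of an N-bit ripple-carry adder written out bit by bit. The sum expression is the 3-input XOR a LUT4 computes; the carry expression is the majority function that SB_CARRY hard-wires. The module name ripple_add and the WIDTH parameter are hypothetical. This is for reading, not synthesis: Yosys maps a plain a + b onto SB_CARRY, but it may well map hand-written gate logic like this to LUTs only, so check stat rather than assuming.

```verilog
// Hypothetical ripple-carry adder, written out to mirror the LUT4 / CARRY split.
module ripple_add #(
    parameter WIDTH = 4
) (
    input  wire [WIDTH-1:0] a,
    input  wire [WIDTH-1:0] b,
    output wire [WIDTH-1:0] sum
);
    // c[i] is the carry INTO bit i; bit 0 has no carry-in.
    wire [WIDTH:0] c;
    assign c[0] = 1'b0;

    genvar i;
    generate
        for (i = 0; i < WIDTH; i = i + 1) begin : g_bit
            // Sum bit: 3-input XOR, the part a LUT4 computes.
            assign sum[i] = a[i] ^ b[i] ^ c[i];
            // Carry out: majority(a, b, cin), the function SB_CARRY hard-wires.
            assign c[i+1] = (a[i] & b[i]) | ((a[i] | b[i]) & c[i]);
        end
    endgenerate
    // c[WIDTH] (final carry-out) is unused: the result is WIDTH bits, like y = a + b.
endmodule
```

Because the result is WIDTH bits, the final carry-out is dropped, which is why the 8-bit demo needs only 7 carry cells rather than 8.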

The Logic Cell, Revisited

Topic 1's tile picture had a LUT and a flip-flop. Here it is again with the CARRY cell drawn in — so 1280 LCs really means 1280 LUTs and 1280 CARRYs.

[Figure: iCE40 fabric with logic cells split into LUT and CARRY halves; the zoom-in shows one logic cell containing a 4-input LUT, a CARRY cell with dedicated cin/cout, a D flip-flop, and a 2:1 output mux]
1:1 pairing, separate budgets. Each LC has one LUT4 and one CARRY hardwired together; the CARRYs in a column form a fixed vertical chain. They're counted separately because they're separate primitives: a & b burns LUTs only, a + b burns both, and a long adder can run out of contiguous CARRY chain before it runs out of LUTs.

The Operator Cost Table

Category           Operators   Shape & result width                                  Example       iCE40 cost
Bitwise            & | ^ ~     Per-bit (~ is unary); result = operand width          a & b         Cheap
Logical            && || !     Whole-operand boolean (! is unary); result = 1 bit    a && b        Cheap
Reduction (unary)  & | ^       Folds across all bits of one operand; result = 1 bit  &a            Cheap
Arithmetic +/-     + -         Binary, per-bit sum + carry chain                     a + b         Moderate
Arithmetic *       *           Binary, full multiplier                               a * b         Expensive
Relational         == < >      Binary compare; result = 1 bit                        a > b         Cheap (==) to moderate (< >)
Shift (constant)   << 3        Shift amount is a literal; pure rewiring              a << 3        Free
Shift (variable)   << n        Shift amount is a signal; needs a barrel shifter      a << n        Expensive
Conditional        ? :         Ternary; selects between two values                   sel ? a : b   Cheap
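Reduction operators take a single bus and fold it down to one bit. A minimal sketch showing all three on one 8-bit bus, including the ^ parity fold that the Topic 11 UART relies on; the module and port names are hypothetical.

```verilog
// Hypothetical module showing the three reduction operators on one bus.
module reduce_demo (
    input  wire [7:0] data,
    output wire       all_ones,  // &data: 1 only if every bit is 1
    output wire       any_set,   // |data: 1 if at least one bit is 1
    output wire       parity     // ^data: 1 if an odd number of bits are 1
);
    assign all_ones = &data;
    assign any_set  = |data;
    assign parity   = ^data;
endmodule
```

For data = 8'b1011_0000, &data is 0, |data is 1, and ^data is 1 (three bits set, an odd count). Each output is a 1-bit result, matching the "result = 1 bit" column above.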

👁️ I Do — Bitwise vs Logical

wire [3:0] a = 4'b1010;
wire [3:0] b = 4'b0101;

wire [3:0] w_bitwise = a & b;    // = 4'b0000 (per-bit AND)
wire       w_logical = a && b;   // = 1'b1 (both nonzero → true)
My thinking: Single & operates on every bit independently, result same width as operands. Double && treats each operand as a boolean (any bit set = true), result always 1 bit. Mixing them up is a classic bug — especially with C/Java muscle memory.
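Where the mix-up actually bites is inside a condition. A minimal sketch (hypothetical module and port names) feeding the same a and b into if (a & b) and if (a && b):

```verilog
// Hypothetical module: the same inputs give opposite answers with & vs &&.
module and_trap (
    input  wire [3:0] a,
    input  wire [3:0] b,
    output reg        hit_bitwise,  // driven by if (a & b)
    output reg        hit_logical   // driven by if (a && b)
);
    always @* begin
        // a & b is a 4-bit vector; if() is true when that vector is nonzero,
        // i.e., when a and b share at least one set bit.
        if (a & b) hit_bitwise = 1'b1;
        else       hit_bitwise = 1'b0;

        // a && b is true when a is nonzero AND b is nonzero.
        if (a && b) hit_logical = 1'b1;
        else        hit_logical = 1'b0;
    end
endmodule
```

With a = 4'b1010 and b = 4'b0101 (the values above), hit_bitwise is 0 but hit_logical is 1: the two branches disagree even though the source differs by a single character.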

🤝 We Do — The Conditional Mux

// 2:1 mux
assign y = sel ? a : b;

// 4:1 mux — fill in:
assign y = sel[1] ? ( sel[0] ? /* ? */ : /* ? */ )
                  : ( sel[0] ? /* ? */ : /* ? */ );
Answer: sel[1] ? (sel[0] ? d : c) : (sel[0] ? b : a). Selects: 00→a, 01→b, 10→c, 11→d. Yosys will build 3 muxes in a tree — 2 at the first level, 1 at the root.
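The filled-in answer, wrapped as a complete module so it can be simulated. The name mux4 and the WIDTH parameter are hypothetical; the body is exactly the nested conditional from the answer.

```verilog
// Hypothetical 4:1 mux built from nested conditionals (the We Do answer).
module mux4 #(
    parameter WIDTH = 8
) (
    input  wire [WIDTH-1:0] a, b, c, d,
    input  wire [1:0]       sel,
    output wire [WIDTH-1:0] y
);
    // sel[1] picks the pair, sel[0] picks within the pair:
    // 00 -> a, 01 -> b, 10 -> c, 11 -> d
    assign y = sel[1] ? (sel[0] ? d : c)
                      : (sel[0] ? b : a);
endmodule
```

The inner conditionals are the two first-level muxes and the outer one is the root, matching the 3-mux tree described above.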

🧪 You Do — Predict Operator Costs

For each expression on 8-bit buses, rank by iCE40 LUT cost (low/med/high):

  1. assign x = a & b;
  2. assign x = a + b;
  3. assign x = a * b;
  4. assign x = a << 3;
  5. assign x = a << n; (where n is a 3-bit wire)
  6. assign x = (a > 8'd100);
Ranking (cheap→expensive): (4) FREE · (6) ~4 LUTs · (1) ~8 LUTs · (2) 8 LUTs + 7 carry · (5) ~24 LUTs barrel shifter · (3) ~140 LUTs multiply.
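Where does ~24 LUTs for (5) come from? A variable shift is typically built as a logarithmic barrel shifter: one stage per bit of n, each stage a row of 2:1 muxes that either passes the bus through or shifts it by 1, 2, or 4. A sketch of the 8-bit, 3-bit-n case; the module name is hypothetical, and Yosys builds its own structure from a << n, so this only shows the shape.

```verilog
// Hypothetical 8-bit left barrel shifter, written as explicit stages.
module barrel_shl8 (
    input  wire [7:0] a,
    input  wire [2:0] n,
    output wire [7:0] x
);
    // Stage 0: shift by 1 if n[0] is set (a row of 8 two-input muxes).
    wire [7:0] s0 = n[0] ? {a[6:0],  1'b0}    : a;
    // Stage 1: shift by 2 if n[1] is set.
    wire [7:0] s1 = n[1] ? {s0[5:0], 2'b00}   : s0;
    // Stage 2: shift by 4 if n[2] is set.
    assign x      = n[2] ? {s1[3:0], 4'b0000} : s1;
endmodule
```

That is 3 stages × 8 muxes = 24 2:1 muxes, in line with the ~24-LUT estimate if each mux lands in its own LUT4. Compare (4): with a fixed shift of 3, the same idea collapses to the wiring {a[4:0], 3'b000} and no muxes at all.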
▶ LIVE DEMO

Operator Cost Showdown — &, +, *

~5 minutes · verify the You-Do ranking with Yosys

▸ COMMANDS

cd lecture_examples/week1_day02/d02_s2_ex2/
cat op_compare.v            # 3 modules, one assign each
make stat-ops               # synth all 3 → side-by-side
make sim                    # (warmup) mux from We Do still passes
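op_compare.v itself isn't shown on the slide. Here is a minimal sketch of what a file like it could contain, using the module names from the stat output and assuming a shared WIDTH parameter (the recap slide mentions bumping WIDTH). It also assumes every output y is WIDTH bits wide; the course file may differ, and the multiplier's LUT count depends heavily on whether y keeps all 2×WIDTH product bits.

```verilog
// Sketch of an op_compare.v-style file: three modules, one assign each.
module bitwise_and #(parameter WIDTH = 8) (
    input  wire [WIDTH-1:0] a, b,
    output wire [WIDTH-1:0] y
);
    assign y = a & b;   // independent per-bit lanes: LUTs only
endmodule

module adder #(parameter WIDTH = 8) (
    input  wire [WIDTH-1:0] a, b,
    output wire [WIDTH-1:0] y
);
    assign y = a + b;   // sum bits in LUTs, carries on the CARRY chain
endmodule

module multiplier #(parameter WIDTH = 8) (
    input  wire [WIDTH-1:0] a, b,
    output wire [WIDTH-1:0] y   // assumption: product truncated to WIDTH bits
);
    assign y = a * b;   // multiplier array: the expensive one
endmodule
```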

▸ EXPECTED STDOUT (8-bit)

==== bitwise_and ====
   SB_LUT4    8
   SB_CARRY   0
==== adder ====
   SB_LUT4    8
   SB_CARRY   7
==== multiplier ====
   SB_LUT4   142
   SB_CARRY   10

▸ WHAT TO LOOK FOR

Same operand width, same one-line body; only the operator changes. & uses zero carry cells (parallel lanes). + needs a carry between bits. * blows up: ~18× the LUTs of the adder (142 vs 8), because multiplier area grows with N² while the adder grows with N. That ratio is the lesson; the numbers match the recap table on the next slide.

🔧 Recap — What the Tool Just Built

Same three 8-bit modules from make stat-ops, lined up:

Module        Body          SB_LUT4   SB_CARRY   Verdict
bitwise_and   y = a & b;    8         0          Cheap
adder         y = a + b;    8         7          Moderate
multiplier    y = a * b;    142       10         Expensive
Try it yourself: edit op_compare.v, bump WIDTH to 16 or 32, rerun make stat-ops. The AND scales linearly; the multiplier scales quadratically.
Mental math: The iCE40 HX1K has 1280 LUTs. A single 8×8 multiply eats ~11% of the chip (142 / 1280). A 32×32 multiply, at roughly 16× the 8×8 cost by N² scaling, would need on the order of 2,300 LUTs: it does not fit on an HX1K without tricks.

🤖 Check the Machine

Ask AI: “Rank these on iCE40 LUT cost: a+b, a*b, a<<3, a<<n, a==b, all 16-bit.”

TASK

Ask for LUT cost ranking on 16-bit operands.

BEFORE

Predict: constant shift free → == cheap → + moderate → barrel shift expensive → * most expensive.

AFTER

AI ordering usually correct. Absolute counts often 2× off — verify with Yosys.

TAKEAWAY

AI gives good ordinal rankings. Trust the ranking. Verify absolute numbers with stat.

Rule: For area-sensitive designs, Yosys stat is ground truth. AI estimates are useful for early rough sizing.

Key Takeaways

Bitwise (&) = per-bit. Logical (&&) = 1-bit true/false.

? : is the mux. Nest for wider muxes.

Constant shifts are free. Variable shifts are expensive.

Multiply LUT cost grows roughly as N² (142 LUTs at 8 bits). Always check utilization.

Every operator has a hardware price tag. Read the receipts.

🔗 Transfer

Sized Literals & Width Matching

Video 3 of 4 · ~8 minutes

▸ WHY THIS MATTERS NEXT

You just saw that 8+8 can cost 8 LUTs + 7 carry cells. But what's the bit-width of the result? 8? 9? Here's a puzzle: 4'd15 + 4'd1 gives 0, not 16 — unless you size the result correctly. Video 3 shows you why, and how to stop silent overflow bugs.
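A preview of the puzzle, as a minimal sketch (hypothetical module and port names): the same constant sum assigned to a 4-bit and a 5-bit target.

```verilog
// Hypothetical module: the same 4-bit sum into a 4-bit and a 5-bit result.
module width_puzzle (
    output wire [3:0] sum4,  // 4-bit target: the carry-out is dropped
    output wire [4:0] sum5   // 5-bit target: room for the carry
);
    // The addition is evaluated at the larger of the operand widths and the
    // assignment target's width, so the 5-bit context keeps the carry.
    assign sum4 = 4'd15 + 4'd1;  // 16 mod 16 = 0
    assign sum5 = 4'd15 + 4'd1;  // 16
endmodule
```

Nothing warns you about sum4: it simply reads 0, which is exactly the kind of silent overflow Video 3 is about.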