Topic 2 · Combinational Building Blocks

Operators

Video 2 of 4 · ~14 minutes

Dr. Mike Borowczak · Electrical & Computer Engineering · CECS · UCF

🌍 Where This Lives

In Industry

The first performance review on any RTL codebase is an operator audit: where are the multiplies? The dividers? The wide comparators? These drive area, timing, and power. Senior designers read code and see gates.

In This Course

Your Topic 3 ALU uses +, -, &, |. Topic 9 memory addressing uses comparators. Topic 11 UART uses reduction operators for parity. Every lab after today uses this vocabulary.

Industry alignment: “Write area-efficient RTL” appears on almost every FPGA/ASIC job posting. That literally means: know which operators are cheap and which are expensive. This video is step 1.

⚠️ Syntax Does Not Equal Cost

❌ Wrong Model

“One operator = one operation. a + b and a & b look the same, so they cost the same.”

✓ Right Model

Each operator has a hardware footprint. & on 32 bits = 32 LUTs. + on 32 bits = 32 LUTs + about 32 carry cells (a ripple chain with real delay). * on 32 bits = thousands of LUTs or a dedicated DSP block.

The receipt: An iCE40 HX1K has 1280 LUTs total. One 8-bit multiply can consume about 11% of your chip. Two of them and over a fifth of the chip is gone before your actual design starts. A 32-bit multiply does not fit at all without tricks.

🧱 Refresher — LUTs vs CARRYs

Topic 1 covered the LUT. Before the cost table, meet its neighbor in every iCE40 logic cell: the dedicated CARRY cell.

SB_LUT4 — the universal gate

4 inputs → any 1-bit Boolean function
Configurable: 16 bits of memory pick the function
Used for: bitwise, logical, mux, reduction, comparators

SB_CARRY — the dedicated adder

1 cell per bit, chained directly to its neighbor
Not configurable — only does carry-propagate
Used for: +, -, and magnitude compares (<, >)

Why the split matters: One iCE40 logic cell = 1 LUT4 + 1 CARRY + 1 FF. The LUT4 generates the sum bit; the CARRY propagates the carry bit on a dedicated wire that bypasses the routing fabric. That's why a + b shows up as both SB_LUT4 and SB_CARRY in yosys stat, and why a bitwise a & b shows zero CARRYs: there is no carry to propagate.

Fast refresher before the cost table. Topic 1 introduced the LUT, the configurable 4-input gate. Today we add its silent partner, the CARRY cell. Every iCE40 logic cell pairs them: one LUT, one CARRY, one flip-flop. The key insight: bitwise ops (AND/OR/XOR) are pure parallel functions (bit i out depends only on bit i in), so they only burn LUTs. Arithmetic and magnitude comparisons need information to flow between bit positions (the carry), and the chip has dedicated silicon for that. The CARRY cell isn't a LUT; it's a tiny fixed adder helper wired directly to its neighbor, so the carry never pays a trip through the general routing fabric. That's why adders are nearly free in delay on small widths even though they cost N LUTs and about N CARRYs in area. Watch for this in the next slide's table: any row that mentions "carry chain" means SB_CARRY shows up alongside SB_LUT4. The live demo will confirm the split numerically: bitwise AND = 8/0, adder = 8/7, multiply = 142/10 (a multiplier is just lots of adders).

The Logic Cell, Revisited

Topic 1's tile picture had a LUT and a flip-flop. Here it is again with the CARRY cell drawn in — so 1280 LCs really means 1280 LUTs and 1280 CARRYs.

[Figure: iCE40 fabric with logic cells split into LUT and CARRY halves; zoom-in shows one logic cell containing a 4-input LUT, a CARRY cell with dedicated cin/cout, a D flip-flop, and a 2:1 output mux.]

1:1 pairing, separate budgets. Each LC has one LUT4 and one CARRY hardwired together; the CARRYs in a column form a fixed top-to-bottom chain. They're counted separately because they're separate primitives — a & b burns LUTs only, a + b burns both, and a long adder can run out of contiguous CARRY chain before it runs out of LUTs.

Same picture as Topic 1, with the missing half drawn in. Three quick facts to anchor before the cost table:

(1) Counts. iCE40 HX1K = 1280 logic cells. Each LC = 1 LUT4 + 1 CARRY + 1 FF. So 1280 LUTs and 1280 CARRYs, an exact 1:1.

(2) Pairing. LUT[i] and CARRY[i] are physically bonded inside LC[i] at fab time. CARRY[i]'s cout is hard-wired to CARRY[i+1]'s cin in the same column. That's the carry chain, and it bypasses the routing fabric, which is why adders are fast.

(3) Independence. The CARRY can sit idle while the LUT does its own thing; that's the a & b case (LUTs lit, CARRYs dark). When CARRY is active, the paired LUT becomes the sum-bit generator (a XOR b XOR cin), so you don't get to use that LUT for some other function. So “burning a CARRY” effectively burns its LUT too. That's why an 8-bit a + b shows up as 8 LUTs paired with 7 CARRYs in yosys, not 8/0 (one CARRY per inter-bit boundary).

Why count them separately? Two reasons. First, they're separate primitives in the bitstream: SB_LUT4 and SB_CARRY are distinct cells with distinct config. Second, they're separate budgets: a long ripple chain needs vertically contiguous CARRYs (placement-constrained), and a design can run out of those before it runs out of LUTs even though the totals are equal. The split tells you which limit you'll hit first.
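The LUT/CARRY division of labor described above can be sketched as a hand-written ripple adder. This is an illustration only: the module name `ripple_add` and its ports are hypothetical, and in real designs you would simply write `a + b`. Whether the tool maps this explicit form onto SB_CARRY cells is not guaranteed.

```verilog
// Hypothetical illustration module; not part of the course repo.
// Each loop iteration mirrors one logic cell: the XOR is the LUT's job,
// the majority function is the carry cell's job.
module ripple_add #(parameter WIDTH = 8) (
    input  wire [WIDTH-1:0] a,
    input  wire [WIDTH-1:0] b,
    output wire [WIDTH-1:0] sum
);
    // c[i] is the carry INTO bit i; c[0] is a constant zero carry-in.
    wire [WIDTH:0] c;
    assign c[0] = 1'b0;

    genvar i;
    generate
        for (i = 0; i < WIDTH; i = i + 1) begin : slice
            // Sum bit: 3-input XOR, small enough for one 4-input LUT.
            assign sum[i] = a[i] ^ b[i] ^ c[i];
            // Carry out: majority of (a, b, carry-in), passed to the next bit.
            assign c[i+1] = (a[i] & b[i]) | (a[i] & c[i]) | (b[i] & c[i]);
        end
    endgenerate
    // c[WIDTH] (the final carry-out) is deliberately unused, so the result
    // wraps exactly like an 8-bit a + b.
endmodule
```

Note how bit i cannot settle until c[i] arrives from bit i-1; that dependency is what the dedicated chain accelerates.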

The Operator Cost Table

Category	Operators	Shape & Result Width	Example	iCE40 Cost
Bitwise	`&` `|` `^` `~`	Binary (`~` is unary), per-bit; result = operand width	`a & b`	Cheap
Logical	`&&` `||` `!`	Binary (`!` is unary), whole-operand boolean; result = 1 bit	`a && b`	Cheap
Reduction (unary)	`&` `|` `^`	Unary, folds across all bits; result = 1 bit	`&a`	Cheap
Arithmetic +/-	`+` `-`	Binary, per-bit + carry chain; result = operand width unless the target is wider	`a + b`	Moderate
Arithmetic *	`*`	Binary, full multiplier; the full product needs 2N bits	`a * b`	Expensive
Relational	`==` `<` `>`	Binary, compare; result = 1 bit	`a > b`	Moderate
Shift (constant)	`<< 3`	Binary w/ literal shift amount; pure rewiring	`a << 3`	Free
Shift (variable)	`<< n`	Binary; needs full barrel shifter	`a << n`	Expensive
Conditional	`? :`	Ternary; selects between two values	`sel ? a : b`	Cheap

[Icon strip, one icon per table row: Bitwise · Logical · Reduction · Add / Sub · Multiply · Relational · Shift <<3 · Shift <<n · Conditional]

Print this table. Put it on your wall. “Cheap” means it fits easily. “Expensive” means you should check your utilization. Multiply is particularly nasty without DSP blocks — iCE40 HX1K has no hard multipliers, so * gets built out of LUTs. Spend extra time on the top three rows: bitwise, logical, and reduction all look like the same symbols but produce different result widths and different hardware. The shape (binary vs unary, single vs double symbol) is what disambiguates them — Verilog won't warn you if you pick the wrong one. The icons along the bottom give the topology at a glance: parallel lanes for bitwise, fan-in to one bit for logical and reduction, a chain for add/sub, an N×N grid for multiply, a comparator funnel for relational, pure rewiring for constant shift, a stack of mux layers for variable shift, and a 2:1 mux for the ternary. Border color = cost: green free, blue cheap, gold moderate, red expensive.
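To make the "pure rewiring" versus "stack of mux layers" distinction concrete, here is a minimal sketch with both shifts written two ways. The module name `shift_shapes` and all signal names are hypothetical; the staged version is one common way to picture a barrel shifter, not necessarily the structure Yosys produces.

```verilog
// Hypothetical illustration module; not part of the course repo.
module shift_shapes (
    input  wire [7:0] a,
    input  wire [2:0] n,
    output wire [7:0] y_const,   // a << 3 using the operator
    output wire [7:0] y_wired,   // the same result as plain wiring
    output wire [7:0] y_var,     // a << n using the operator
    output wire [7:0] y_staged   // the same result as three mux layers
);
    // Constant shift: every output bit is either a fixed input bit or 0.
    assign y_const = a << 3;
    assign y_wired = {a[4:0], 3'b000};

    // Variable shift: each bit of n enables one layer of 2:1 muxes.
    assign y_var = a << n;
    wire [7:0] s0 = n[0] ? {a[6:0],  1'b0}    : a;    // shift by 1 or 0
    wire [7:0] s1 = n[1] ? {s0[5:0], 2'b00}   : s0;   // shift by 2 or 0
    assign y_staged = n[2] ? {s1[3:0], 4'b0000} : s1; // shift by 4 or 0
endmodule
```

Three layers of eight 2:1 muxes is 24 muxes, which is where a ballpark of roughly 24 LUTs for an 8-bit variable shift comes from.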

👁️ I Do — Bitwise vs Logical

wire [3:0] a = 4'b1010;
wire [3:0] b = 4'b0101;

wire [3:0] w_bitwise = a & b;    // = 4'b0000 (per-bit AND)
wire       w_logical = a && b;   // = 1'b1 (both nonzero → true)

My thinking: Single & operates on every bit independently, result same width as operands. Double && treats each operand as a boolean (any bit set = true), result always 1 bit. Mixing them up is a classic bug — especially with C/Java muscle memory.
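The same two operands can also be run through the reduction and negation forms from the cost table. The testbench below is a small sketch for simulation only (the module name `op_shapes_tb` is hypothetical); it prints each result so the width difference is visible.

```verilog
// Hypothetical simulation-only testbench; not part of the course repo.
module op_shapes_tb;
    reg [3:0] a = 4'b1010;
    reg [3:0] b = 4'b0101;

    initial begin
        #1;
        $display("a & b  = %b", a & b);   // bitwise, 4 bits wide: 0000
        $display("a && b = %b", a && b);  // logical, 1 bit wide:  1
        $display("&a     = %b", &a);      // reduction AND: 0 (not all bits set)
        $display("|a     = %b", |a);      // reduction OR:  1 (at least one bit set)
        $display("^a     = %b", ^a);      // reduction XOR: 0 (two 1s, even parity)
        $display("~a     = %b", ~a);      // bitwise NOT, 4 bits wide: 0101
        $display("!a     = %b", !a);      // logical NOT, 1 bit wide:  0
        $finish;
    end
endmodule
```

The reduction XOR line is the parity computation that the Topic 11 UART relies on.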

🤝 We Do — The Conditional Mux

// 2:1 mux
assign y = sel ? a : b;

// 4:1 mux — fill in:
assign y = sel[1] ? ( sel[0] ? /* ? */ : /* ? */ )
                  : ( sel[0] ? /* ? */ : /* ? */ );

Answer: sel[1] ? (sel[0] ? d : c) : (sel[0] ? b : a). Selects: 00→a, 01→b, 10→c, 11→d. Yosys will build 3 muxes in a tree — 2 at the first level, 1 at the root.

[Leave blanks for 30 seconds] The 4:1 nested ternary is the pattern you'll use constantly. Each nested ?: is one 2:1 mux. Yosys sees the tree and maps it onto SB_LUT4s; the iCE40 logic cell has no dedicated mux primitive.
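For reference, the filled-in answer can be wrapped in a complete module with a self-checking testbench. This is a minimal sketch: the names `mux4` and `mux4_tb` and the 1-bit data inputs are assumptions, and the course's own `make sim` testbench may check something different.

```verilog
// Hypothetical illustration; the course's mux files may differ.
module mux4 (
    input  wire [1:0] sel,
    input  wire       a, b, c, d,
    output wire       y
);
    // Selects: 00 -> a, 01 -> b, 10 -> c, 11 -> d.
    assign y = sel[1] ? (sel[0] ? d : c)
                      : (sel[0] ? b : a);
endmodule

module mux4_tb;
    reg  [1:0] sel;
    reg        a, b, c, d;
    reg        expected;
    wire       y;
    integer    i, errors;

    mux4 dut (.sel(sel), .a(a), .b(b), .c(c), .d(d), .y(y));

    initial begin
        errors = 0;
        // Sweep every combination of select and data bits (2^6 = 64).
        for (i = 0; i < 64; i = i + 1) begin
            {sel, d, c, b, a} = i[5:0];
            #1;
            case (sel)
                2'b00:   expected = a;
                2'b01:   expected = b;
                2'b10:   expected = c;
                default: expected = d;
            endcase
            if (y !== expected) errors = errors + 1;
        end
        $display("%0d mismatches out of 64", errors);
        $finish;
    end
endmodule
```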

🧪 You Do — Predict Operator Costs

For each expression on 8-bit buses, rank by iCE40 LUT cost (low/med/high):

assign x = a & b;
assign x = a + b;
assign x = a * b;
assign x = a << 3;
assign x = a << n;          // n is a 3-bit wire
assign x = (a > 8'd100);

Ranking (cheap→expensive): (4) FREE · (6) ~4 LUTs · (1) 8 LUTs · (2) 8 LUTs + 7 carry · (5) ~24 LUTs barrel · (3) ~140 LUTs multiply.

▶ LIVE DEMO

Operator Cost Showdown — `&`, `+`, `*`

~5 minutes · verify the You-Do ranking with Yosys

▸ COMMANDS

cd lecture_examples/week1_day02/d02_s2_ex2/
cat op_compare.v            # 3 modules, one assign each
make stat-ops               # synth all 3 → side-by-side
make sim                    # (warmup) mux from We Do still passes

▸ EXPECTED STDOUT (8-bit)

==== bitwise_and ====
   SB_LUT4    8
   SB_CARRY   0
==== adder ====
   SB_LUT4    8
   SB_CARRY   7
==== multiplier ====
   SB_LUT4   142
   SB_CARRY   10

▸ WHAT TO LOOK FOR

Same operand width, same one-line body; only the operator changes. & uses zero carry cells (parallel lanes). + needs a carry between bits. * blows up: 142 LUTs is ~18× the adder's 8, because multiplier area grows roughly with N² while the adder grows with N. That ratio is the lesson; the numbers will match the recap table on the next slide.

[Terminal] cd to ex2. cat op_compare.v first — show that each module is one assign with one operator, so every LUT/carry is on the hook for that operator alone. Then make stat-ops — this is the showdown. Read the three blocks aloud: 8/0, 8/7, 142/10. Pause on the ~18× LUT ratio — that's the receipt. Optional: make sim as a warmup callback to We Do — the 4:1 mux still passes 16/16. Skip if running short. Tie back to the You-Do ranking: students predicted * would dominate — here's the proof in LUTs.
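If you don't have the repo open, the sketch below shows roughly what a file like op_compare.v could contain. The module names and the WIDTH parameter come from the demo output and recap; everything else (port names, the multiplier's output width) is an assumption, so LUT counts from this sketch may not match the numbers above.

```verilog
// ASSUMED shape of op_compare.v; the real course file may differ.
// One module per operator, one assign per module, so every LUT and
// carry cell in the stat report is attributable to that operator.
//
// One way to get a per-module report by hand (the Makefile target
// presumably wraps something similar):
//   yosys -p "read_verilog op_compare.v; synth_ice40 -top adder; stat"

module bitwise_and #(parameter WIDTH = 8) (
    input  wire [WIDTH-1:0] a,
    input  wire [WIDTH-1:0] b,
    output wire [WIDTH-1:0] y
);
    assign y = a & b;
endmodule

module adder #(parameter WIDTH = 8) (
    input  wire [WIDTH-1:0] a,
    input  wire [WIDTH-1:0] b,
    output wire [WIDTH-1:0] y
);
    assign y = a + b;
endmodule

module multiplier #(parameter WIDTH = 8) (
    input  wire [WIDTH-1:0]   a,
    input  wire [WIDTH-1:0]   b,
    output wire [2*WIDTH-1:0] y   // assumption: full-width product is kept
);
    assign y = a * b;
endmodule
```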

🔧 Recap — What the Tool Just Built

Same three 8-bit modules from make stat-ops, lined up:

Module	Body	SB_LUT4	SB_CARRY	Verdict
`bitwise_and`	`y = a & b;`	8	0	Cheap
`adder`	`y = a + b;`	8	7	Moderate
`multiplier`	`y = a * b;`	142	10	Expensive

Try it yourself: edit op_compare.v, bump WIDTH to 16 or 32, rerun make stat-ops. The AND scales linearly; the multiplier scales quadratically.

Mental math: The iCE40 HX1K has 1280 LUTs. A single 8×8 multiply eats ~11% of the chip. A 32×32 multiply eats… not possible on an HX1K without tricks.
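The mental math can be written out as follows. The first line uses the measured demo numbers; the second is only a rough extrapolation that assumes multiplier area scales quadratically with width, so treat it as an order-of-magnitude estimate rather than a synthesis result.

```latex
\frac{142\ \text{LUTs}}{1280\ \text{LUTs}} \approx 0.111 \approx 11\%
\qquad\text{(measured, } 8 \times 8\text{)}

142 \times \left(\frac{32}{8}\right)^{2} = 142 \times 16 = 2272\ \text{LUTs} > 1280
\qquad\text{(estimated, } 32 \times 32\text{)}
```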

🤖 Check the Machine

Ask AI: “Rank these on iCE40 LUT cost: a+b, a*b, a<<3, a<<n, a==b, all 16-bit.”

TASK

Ask for LUT cost ranking on 16-bit operands.

BEFORE

Predict: constant shift (free) → == (cheap) → + (moderate) → barrel shift (expensive) → * (most expensive).

AFTER

AI ordering usually correct. Absolute counts often 2× off — verify with Yosys.

TAKEAWAY

AI gives good ordinal rankings. Trust the ranking. Verify absolute numbers with stat.

Rule: For area-sensitive designs, Yosys stat is ground truth. AI estimates are useful for early rough sizing.

Key Takeaways

① Bitwise (&) = per-bit. Logical (&&) = 1-bit true/false.

② ? : is the mux. Nest for wider muxes.

③ Constant shifts are free. Variable shifts are expensive.

④ Multiply costs on the order of N² LUTs (142 for 8×8 in the demo). Always check utilization.

Every operator has a hardware price tag. Read the receipts.

🔗 Transfer

Sized Literals & Width Matching

Video 3 of 4 · ~8 minutes

▸ WHY THIS MATTERS NEXT

You just saw that an 8-bit a + b costs 8 LUTs + 7 carry cells. But what's the bit-width of the result? 8? 9? Here's a puzzle: 4'd15 + 4'd1 gives 0, not 16, unless you size the result correctly. Video 3 shows you why, and how to stop silent overflow bugs.
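The puzzle can be tried in simulation before Video 3. This is a minimal sketch (the testbench name `width_puzzle_tb` and the signal names are hypothetical) that assigns the same expression to a 4-bit and a 5-bit target.

```verilog
// Hypothetical simulation-only testbench; not part of the course repo.
module width_puzzle_tb;
    // Same right-hand side, different target widths.
    wire [3:0] sum4 = 4'd15 + 4'd1;  // 4-bit target: the carry-out is dropped
    wire [4:0] sum5 = 4'd15 + 4'd1;  // 5-bit target: the add is done at 5 bits

    initial begin
        #1;
        $display("sum4 = %0d", sum4);  // prints 0
        $display("sum5 = %0d", sum5);  // prints 16
        $finish;
    end
endmodule
```

The only difference between the two lines is the declared width of the target, which is exactly the kind of silent behavior the next video addresses.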
