Dr. Mike Borowczak · Electrical & Computer Engineering · CECS · UCF
🌍 Where This Lives
In Industry
The first performance review on any RTL codebase is an operator audit: where are the multiplies? The dividers? The wide comparators? These drive area, timing, and power. Senior designers read code and see gates.
In This Course
Your Topic 3 ALU uses +, -, &, |. Topic 9 memory addressing uses comparators. Topic 11 UART uses reduction operators for parity. Every lab after today uses this vocabulary.
Industry alignment: “Write area-efficient RTL” appears on almost every FPGA/ASIC job posting. That literally means: know which operators are cheap and which are expensive. This video is step 1.
⚠️ Syntax Does Not Equal Cost
❌ Wrong Model
“One operator = one operation. a + b and a & b look the same, so they cost the same.”
✓ Right Model
Each operator has a hardware footprint. & on 32 bits = 32 LUTs. + on 32 bits = 32 LUTs plus a chain of about 32 carry cells (a ripple chain with real delay). * on 32 bits = well over a thousand LUTs, or dedicated DSP blocks on parts that have them.
The receipt: An iCE40 HX1K has 1280 LUTs total. One 8-bit multiply can consume about 11% of your chip, and a 32-bit multiply does not fit at all without tricks. Add a few multipliers and you've spent more area on arithmetic than on the rest of your design.
🧱 Refresher — LUTs vs CARRYs
Topic 1 covered the LUT. Before the cost table, meet its neighbor in every iCE40 logic tile: the dedicated CARRY cell.
SB_LUT4 — the universal gate
4 inputs → any 1-bit Boolean function
Configurable: 16 bits of memory pick the function
Used for: bitwise, logical, mux, reduction, comparators
SB_CARRY — the dedicated adder
1 cell per bit, chained directly to its neighbor
Not configurable — only does carry-propagate
Used for: +, -, <, >, magnitude compares
Why the split matters: One iCE40 logic cell (LC) = 1 LUT4 + 1 CARRY + 1 FF. The LUT4 generates the sum bit; the CARRY propagates the carry bit on a dedicated wire that bypasses the routing fabric. That's why a + b shows up as both SB_LUT4 and SB_CARRY in the yosys stat report, and why a bitwise a & b shows zero CARRYs: there is no carry to propagate.
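To make the LUT/CARRY split concrete, here is a minimal sketch of one adder bit written out by hand, then chained four times. The module and port names are hypothetical, and whether yosys maps this hand-written form onto SB_CARRY cells is not guaranteed; in real designs you write a + b and let the tool infer the carry chain.

```verilog
// Sketch only: one bit of an adder, split into its two jobs.
// Module and port names are hypothetical, not from the course repo.
module adder_bit_sketch (
    input  wire a,
    input  wire b,
    input  wire cin,
    output wire sum,
    output wire cout
);
    // Sum bit: a 3-input Boolean function, so it fits in one LUT4.
    assign sum  = a ^ b ^ cin;
    // Carry out: the carry-propagate function that a dedicated
    // carry cell provides in hardware.
    assign cout = (a & b) | (cin & (a ^ b));
endmodule

// Four bits chained: the carry ripples from bit 0 up to bit 3.
module ripple4_sketch (
    input  wire [3:0] a,
    input  wire [3:0] b,
    output wire [3:0] y
);
    wire [4:0] c;          // c[i] is the carry into bit i
    assign c[0] = 1'b0;    // no carry into the lowest bit

    genvar i;
    generate
        for (i = 0; i < 4; i = i + 1) begin : g_bit
            adder_bit_sketch u_bit (
                .a(a[i]), .b(b[i]), .cin(c[i]),
                .sum(y[i]), .cout(c[i+1])
            );
        end
    endgenerate
endmodule
```

The sum logic is independent per bit, but every cout feeds the next bit's cin. That serial dependency is the delay the dedicated carry wire is there to shorten.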
The Logic Cell, Revisited
Topic 1's tile picture had a LUT and a flip-flop. Here it is again with the CARRY cell drawn in — so 1280 LCs really means 1280 LUTs and 1280 CARRYs.
1:1 pairing, separate budgets. Each LC has one LUT4 and one CARRY hardwired together; the CARRYs in a column form a fixed top-to-bottom chain. They're counted separately because they're separate primitives — a & b burns LUTs only, a + b burns both, and a long adder can run out of contiguous CARRY chain before it runs out of LUTs.
The Operator Cost Table
Category            Operators   Shape & Result Width                           Example       iCE40 Cost
Bitwise             & | ^ ~     Binary, per-bit; result = operand width        a & b         Cheap
Logical             && || !     Binary, whole-operand boolean; result = 1 bit  a && b        Cheap
Reduction (unary)   & | ^       Unary, folds across all bits; result = 1 bit   &a            Cheap
Arithmetic +/-      + -         Binary, per-bit + carry chain                  a + b         Moderate
Arithmetic *        *           Binary, full multiplier                        a * b         Expensive
Relational          == < >      Binary, compare; result = 1 bit                a > b         Moderate
Shift (constant)    << 3        Unary w/ literal; pure rewiring                a << 3        Free
Shift (variable)    << n        Binary; needs full barrel shifter              a << n        Expensive
Conditional         ? :         Ternary; selects between two values            sel ? a : b   Cheap
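As a quick reference, the sketch below puts one assign per operator category into a single module so the result widths can be compared side by side. The module name, port names, and the 8-bit operand width are assumptions chosen for illustration.

```verilog
// Sketch only: one assign per operator category.
// All names and the 8-bit operand width are hypothetical.
module op_categories_sketch (
    input  wire [7:0]  a,
    input  wire [7:0]  b,
    input  wire [2:0]  n,
    input  wire        sel,
    output wire [7:0]  y_bitwise,
    output wire        y_logical,
    output wire        y_reduce,
    output wire [7:0]  y_add,
    output wire [15:0] y_mul,
    output wire        y_rel,
    output wire [7:0]  y_shl_const,
    output wire [7:0]  y_shl_var,
    output wire [7:0]  y_cond
);
    assign y_bitwise   = a & b;        // per-bit AND, 8-bit result
    assign y_logical   = a && b;       // 1 when both operands are nonzero
    assign y_reduce    = ^a;           // XOR of all 8 bits: 1 when a has an odd number of 1s
    assign y_add       = a + b;        // sum truncated to 8 bits
    assign y_mul       = a * b;        // 16-bit target keeps the full product
    assign y_rel       = a > b;        // unsigned magnitude compare, 1-bit result
    assign y_shl_const = a << 3;       // same wiring as {a[4:0], 3'b000}
    assign y_shl_var   = a << n;       // shift amount chosen at run time
    assign y_cond      = sel ? a : b;  // 2:1 mux on each of the 8 bits
endmodule
```

The constant shift is the only line that needs no logic at all: the output bits are just the input bits wired to different positions, with zeros filled in.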
👁️ I Do — Bitwise vs Logical
wire [3:0] a = 4'b1010;
wire [3:0] b = 4'b0101;
wire [3:0] w_bitwise = a & b; // = 4'b0000 (per-bit AND)
wire w_logical = a && b; // = 1'b1 (both nonzero → true)
My thinking: Single & operates on every bit independently, result same width as operands. Double && treats each operand as a boolean (any bit set = true), result always 1 bit. Mixing them up is a classic bug — especially with C/Java muscle memory.
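The difference is easiest to see in simulation. The testbench below is a minimal sketch (the module name is hypothetical) that reuses the slide's values and then shows the classic bug: an if condition written with & where && was intended.

```verilog
// Sketch only: simulation check of & versus && using the slide's values.
module tb_bitwise_vs_logical;
    reg  [3:0] a, b;
    wire [3:0] w_bitwise = a & b;   // per-bit AND
    wire       w_logical = a && b;  // boolean AND of "a != 0" and "b != 0"

    initial begin
        a = 4'b1010;
        b = 4'b0101;
        #1 $display("a=%b b=%b  a&b=%b  a&&b=%b", a, b, w_bitwise, w_logical);
        // prints: a=1010 b=0101  a&b=0000  a&&b=1

        // Both operands are nonzero, yet the two conditions disagree.
        if (a & b)  $display("(a & b)  is true");
        else        $display("(a & b)  is false");   // this branch runs
        if (a && b) $display("(a && b) is true");    // this branch runs
        else        $display("(a && b) is false");
        $finish;
    end
endmodule
```

With no overlapping set bits, a & b is all zeros and therefore false as a condition, even though each operand on its own is true.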
🤝 We Do — The Conditional Mux
// 2:1 mux
assign y = sel ? a : b;
// 4:1 mux — fill in:
assign y = sel[1] ? ( sel[0] ? /* ? */ : /* ? */ )
: ( sel[0] ? /* ? */ : /* ? */ );
Answer: sel[1] ? (sel[0] ? d : c) : (sel[0] ? b : a). Selects: 00→a, 01→b, 10→c, 11→d. Yosys will build 3 muxes in a tree: 2 at the first level, 1 at the root.
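For reference, here is the completed 4:1 mux as a standalone module with a small self-checking testbench. The module names, the 1-bit data width, and the test strategy are assumptions for illustration, not the course's own files.

```verilog
// Sketch only: the completed 4:1 mux. Names and 1-bit data width are hypothetical.
module mux4_sketch (
    input  wire [1:0] sel,
    input  wire       a, b, c, d,
    output wire       y
);
    assign y = sel[1] ? (sel[0] ? d : c)
                      : (sel[0] ? b : a);
endmodule

// Drives a single 1 onto one data input at a time and checks that
// y is 1 only when sel points at that input.
module tb_mux4_sketch;
    reg  [1:0] sel;
    reg        a, b, c, d;
    wire       y;
    integer    k, s;

    mux4_sketch dut (.sel(sel), .a(a), .b(b), .c(c), .d(d), .y(y));

    initial begin
        for (k = 0; k < 4; k = k + 1) begin
            {d, c, b, a} = 4'b0001 << k;     // input number k is the only 1
            for (s = 0; s < 4; s = s + 1) begin
                sel = s;                      // low 2 bits of s
                #1;
                if (y !== (s == k))
                    $display("FAIL: sel=%b dcba=%b%b%b%b y=%b", sel, d, c, b, a, y);
            end
        end
        $display("mux4 check done");
        $finish;
    end
endmodule
```

If the nested ternary were wired wrong, for example with c and b swapped, the loop would report a FAIL line for sel=01 and sel=10.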
🧪 You Do — Predict Operator Costs
For each expression on 8-bit buses, rank by iCE40 LUT cost (low/med/high):
cd lecture_examples/week1_day02/d02_s2_ex2/
cat op_compare.v # 3 modules, one assign each
make stat-ops # synth all 3 → side-by-side
make sim # (warmup) mux from We Do still passes
Same operand width, same one-line body; only the operator changes. & uses zero carry cells (parallel lanes). + needs a carry between bits. * blows up: about 18× the LUTs of the adder at 8 bits, and the gap widens as the width N grows. That ratio is the lesson; the numbers will match the recap table on the next slide.
🔧 Recap — What the Tool Just Built
Same three 8-bit modules from make stat-ops, lined up:
Module
Body
SB_LUT4
SB_CARRY
Verdict
bitwise_and
y = a & b;
8
0
Cheap
adder
y = a + b;
8
7
Moderate
multiplier
y = a * b;
142
10
Expensive
Try it yourself: edit op_compare.v, bump WIDTH to 16 or 32, rerun make stat-ops. The AND scales linearly; the multiplier scales quadratically.
Mental math: The iCE40 HX1K has 1280 LUTs. A single 8×8 multiply eats ~11% of the chip. A 32×32 multiply eats… not possible on an HX1K without tricks.
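If you don't have the course repository at hand, the three modules can be approximated as below. The module names and one-line bodies come from the recap table; the port lists, the WIDTH parameter style, and the multiplier's output width are assumptions, so the real op_compare.v may differ and so may the resulting cell counts.

```verilog
// Sketch only: parameterized stand-ins for the three recap-table modules.
// Ports and output widths are assumptions, not copied from op_compare.v.
module bitwise_and #(parameter WIDTH = 8) (
    input  wire [WIDTH-1:0] a,
    input  wire [WIDTH-1:0] b,
    output wire [WIDTH-1:0] y
);
    assign y = a & b;   // one independent AND per bit
endmodule

module adder #(parameter WIDTH = 8) (
    input  wire [WIDTH-1:0] a,
    input  wire [WIDTH-1:0] b,
    output wire [WIDTH-1:0] y
);
    assign y = a + b;   // carry out of the top bit is dropped
endmodule

module multiplier #(parameter WIDTH = 8) (
    input  wire [WIDTH-1:0]   a,
    input  wire [WIDTH-1:0]   b,
    output wire [2*WIDTH-1:0] y   // assumed: full-width product
);
    assign y = a * b;
endmodule
```

Assuming yosys is installed and the sketch is saved as op_sketch.v (a hypothetical filename), something like yosys -p "synth_ice40 -top multiplier; stat" op_sketch.v should print the SB_LUT4 and SB_CARRY counts for one module; change the WIDTH default and rerun to watch the scaling.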
🤖 Check the Machine
Ask AI: “Rank these on iCE40 LUT cost: a+b, a*b, a<<3, a<<n, a==b, all 16-bit.”
Every operator has a hardware price tag. Read the receipts.
🔗 Transfer
Sized Literals & Width Matching
Video 3 of 4 · ~8 minutes
▸ WHY THIS MATTERS NEXT
You just saw that an 8-bit a + b can cost 8 LUTs + 7 carry cells. But what's the bit-width of the result? 8? 9? Here's a puzzle: stored in a 4-bit result, 4'd15 + 4'd1 gives 0, not 16, unless you size the result to hold the carry. Video 3 shows you why, and how to stop silent overflow bugs.
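The puzzle can be reproduced in a few lines of simulation. This is a minimal sketch with hypothetical names; the only difference between the two wires is the declared width of the result.

```verilog
// Sketch only: same expression, two result widths.
module tb_width_puzzle;
    wire [3:0] sum4 = 4'd15 + 4'd1;   // 4-bit result: the carry is lost
    wire [4:0] sum5 = 4'd15 + 4'd1;   // 5-bit result: the carry is kept

    initial begin
        #1 $display("sum4=%0d sum5=%0d", sum4, sum5);
        // prints: sum4=0 sum5=16
        $finish;
    end
endmodule
```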