Video 2 of 4 · ~10 minutes
Dr. Mike Borowczak · Electrical & Computer Engineering · CECS · UCF
Every byte that has ever crossed a USB cable arrived one bit at a time. Every pixel on your monitor over HDMI: one bit at a time. Every NFC tap, Bluetooth handshake, SD card read, keyboard keystroke over USB-C. Wires are narrow and numbers are wide — somewhere on both ends, something has to take a parade of bits and reassemble them into a value.
Miss the alignment by a single cycle and 0x55 turns into 0xAA. An SD card looks empty when it isn't. A bootloader stares forever at a magic number that will never arrive. A pressure sensor reports −2,147,483,648 PSI. Off-by-one alignment in a serial stream is one of the most common bring-up bugs in embedded products, and one of the most embarrassing, because the data is right there on the scope.
Common misconception: “Shifting is a software bit operation. data = data << 1 is the same in hardware.”
A shift register is an N-deep pipeline of flip-flops, each one's Q feeding the next one's D. On each clock edge, every flop loads its predecessor's value. The “shift” is physical — bits march through flops, one stage per clock cycle.
An 8-bit shift register costs 8× SB_DFF (~0.6% of the chip). The cost scales linearly with depth.
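To make the contrast with software's `data << 1` concrete, here is a minimal sketch of a 4-stage chain written one flop at a time. The module name `shift_chain4` and its signal names are hypothetical, chosen only for illustration. There is no reset, so in simulation the taps read X until four bits have been clocked in.

```verilog
// Hypothetical illustration module; names are not from the lecture repository.
module shift_chain4 (
    input  wire       i_clk,
    input  wire       i_serial_in,
    output wire [3:0] o_taps
);
    reg r_q0, r_q1, r_q2, r_q3;   // four flip-flops, one per stage

    // Nonblocking assignments sample every right-hand side before any
    // register updates, so all four flops move on the same clock edge.
    always @(posedge i_clk) begin
        r_q0 <= i_serial_in;  // stage 0 loads the new bit
        r_q1 <= r_q0;         // stage 1 loads stage 0's previous value
        r_q2 <= r_q1;         // stage 2 loads stage 1's previous value
        r_q3 <= r_q2;         // stage 3's previous value is discarded
    end

    assign o_taps = {r_q3, r_q2, r_q1, r_q0};
endmodule
```

Each assignment corresponds to one Q-to-D wire. Written with a vector and a concatenation instead, the same structure collapses to a single line.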
| Type | Input | Output | Use Case |
|---|---|---|---|
| SIPO | Serial (1 bit/cycle) | Parallel (N bits after N cycles) | UART RX, SPI MISO receive |
| PISO | Parallel (N bits loaded) | Serial (1 bit/cycle) | UART TX, SPI MOSI transmit |
| SISO | Serial | Serial (delayed) | Delay lines, FIFO stages |
| PIPO | Parallel | Parallel (shifted) | Barrel shifters, ALU shifts |
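The SISO row is the simplest of the four to sketch: a delay line whose only visible output is the last flop. The version below is an assumption-laden example rather than code from the lecture; `delay_line_siso` and `DEPTH` are hypothetical names, and `DEPTH` is assumed to be at least 2.

```verilog
// Hypothetical SISO delay line; assumes DEPTH >= 2.
module delay_line_siso #(parameter DEPTH = 4) (
    input  wire i_clk,
    input  wire i_reset,
    input  wire i_serial_in,
    output wire o_serial_out
);
    reg [DEPTH-1:0] r_pipe;

    always @(posedge i_clk) begin
        if (i_reset) r_pipe <= {DEPTH{1'b0}};
        else         r_pipe <= {r_pipe[DEPTH-2:0], i_serial_in};  // shift toward MSB
    end

    // A bit sampled on one clock edge reaches the last flop DEPTH edges later.
    assign o_serial_out = r_pipe[DEPTH-1];
endmodule
```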
module shift_reg_sipo #(parameter WIDTH = 8) (
    input  wire             i_clk, i_reset, i_shift_en, i_serial_in,
    output reg  [WIDTH-1:0] o_parallel_out
);
    always @(posedge i_clk) begin
        if (i_reset)
            o_parallel_out <= {WIDTH{1'b0}};
        else if (i_shift_en)
            // shift left, new bit at LSB
            o_parallel_out <= {o_parallel_out[WIDTH-2:0], i_serial_in};
    end
endmodule
{old_bits_shifted, new_bit} does the shift. On every cycle where the enable is high, the oldest bit falls off the MSB and the new serial bit enters at the LSB. After 8 enable-active cycles, the parallel output holds the received byte.
{o_parallel_out[WIDTH-2:0], i_serial_in} is literally wires: flop k's Q drives flop k+1's D, and the new serial bit drives flop 0. No combinational logic, no muxes; just a chain. That's why the synthesizer reports zero LUTs.
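The SIPO module itself does not say when the eighth bit has arrived; something has to count enable-active cycles. One possible way to do that is sketched below. The module `sipo_byte_rx`, the counter `r_count`, and the `o_byte_valid` pulse are hypothetical additions, not part of the lecture code, and the counter will cost a few LUTs on top of the flops.

```verilog
// Hypothetical wrapper: 8-bit SIPO plus a bit counter and a one-cycle "byte ready" pulse.
module sipo_byte_rx (
    input  wire       i_clk, i_reset, i_shift_en, i_serial_in,
    output reg  [7:0] o_byte,
    output reg        o_byte_valid
);
    reg [2:0] r_count;  // number of bits shifted in so far, modulo 8

    always @(posedge i_clk) begin
        if (i_reset) begin
            o_byte       <= 8'h00;
            r_count      <= 3'd0;
            o_byte_valid <= 1'b0;
        end else begin
            o_byte_valid <= 1'b0;                      // default: no pulse
            if (i_shift_en) begin
                o_byte  <= {o_byte[6:0], i_serial_in}; // same shift as shift_reg_sipo
                r_count <= r_count + 3'd1;             // wraps from 7 back to 0
                if (r_count == 3'd7)
                    o_byte_valid <= 1'b1;              // eighth bit is being shifted in now
            end
        end
    end
endmodule
```

`o_byte_valid` goes high in the same cycle that `o_byte` first holds all eight bits, so downstream logic can capture the byte on that pulse.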
module shift_reg_piso #(parameter WIDTH = 8) (
    input  wire             i_clk, i_reset, i_load, i_shift_en,
    input  wire [WIDTH-1:0] i_parallel_in,
    output wire             o_serial_out
);
    reg [WIDTH-1:0] r_shift;
    always @(posedge i_clk) begin
        if (i_reset)         r_shift <= {WIDTH{1'b0}};
        else if (i_load)     r_shift <= i_parallel_in;               // load wins over shift
        else if (i_shift_en) r_shift <= {1'b0, r_shift[WIDTH-1:1]};  // shift right
    end
    assign o_serial_out = r_shift[0];  // LSB first
endmodule
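r_shift[0] is exposed as the serial output, LSB first, matching UART convention. After WIDTH shifts the register holds all zeros; for UART you'd change both the reset value and the shifted-in fill bit to 1'b1 so the line rests in its idle (high) state.
Each flop's D input sits behind a 2:1 mux that selects between its i_parallel_in bit (on load cycles) and the Q of the next-higher flop (on shift cycles); the MSB flop takes the constant fill bit instead. The mux is the price of being loadable. The output is just the LSB flop's Q tied directly to o_serial_out.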
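The UART-style change mentioned above (idle-high reset value and fill bit) can be sketched as follows. The module name `shift_reg_piso_idle1` is hypothetical, and this is only the shift register: a real UART transmitter would also need start and stop bits and a baud-rate enable, none of which are shown here.

```verilog
// Hypothetical variant of shift_reg_piso that rests at 1 instead of 0.
module shift_reg_piso_idle1 #(parameter WIDTH = 8) (
    input  wire             i_clk, i_reset, i_load, i_shift_en,
    input  wire [WIDTH-1:0] i_parallel_in,
    output wire             o_serial_out
);
    reg [WIDTH-1:0] r_shift;
    always @(posedge i_clk) begin
        if (i_reset)         r_shift <= {WIDTH{1'b1}};               // idle high after reset
        else if (i_load)     r_shift <= i_parallel_in;               // load wins over shift
        else if (i_shift_en) r_shift <= {1'b1, r_shift[WIDTH-1:1]};  // shift right, fill with 1s
    end
    assign o_serial_out = r_shift[0];  // LSB first
endmodule
```

After WIDTH shifts the register holds all ones, so the serial output stays high until the next load.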
Starting o_parallel_out = 8'b00000000. Serial input stream: 1, 0, 1, 1, 0, 0, 1, 0 (one bit per cycle, enable=1 every cycle).
What is o_parallel_out after 8 cycles?
8'b10110010 = 8'hB2.
The first serial bit (1) ends up as MSB after 8 shifts. Last received bit (0) is LSB. Reading the result MSB-first gives the input stream in transmission order.
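The worked example can be checked with a small self-checking testbench. The sketch below is hypothetical and is not the testbench shipped with the lecture examples; it repeats the SIPO module so the file stands alone, drives the stream on falling clock edges, and prints one line per cycle in a format similar to the expected output.

```verilog
`timescale 1ns/1ps
// DUT copied from the SIPO listing in this section so the file stands alone.
module shift_reg_sipo #(parameter WIDTH = 8) (
    input  wire             i_clk, i_reset, i_shift_en, i_serial_in,
    output reg  [WIDTH-1:0] o_parallel_out
);
    always @(posedge i_clk) begin
        if (i_reset)         o_parallel_out <= {WIDTH{1'b0}};
        else if (i_shift_en) o_parallel_out <= {o_parallel_out[WIDTH-2:0], i_serial_in};
    end
endmodule

// Hypothetical testbench for the 1,0,1,1,0,0,1,0 example.
module tb_sipo_worked_example;
    reg        clk       = 1'b0;
    reg        reset     = 1'b1;
    reg        shift_en  = 1'b0;
    reg        serial_in = 1'b0;
    reg  [7:0] stream    = 8'b10110010;  // transmission order, bit 7 sent first
    wire [7:0] par;
    integer    i;

    shift_reg_sipo #(.WIDTH(8)) dut (
        .i_clk(clk), .i_reset(reset), .i_shift_en(shift_en),
        .i_serial_in(serial_in), .o_parallel_out(par)
    );

    always #5 clk = ~clk;

    initial begin
        @(posedge clk);                 // one rising edge with reset asserted
        @(negedge clk);
        reset    = 1'b0;
        shift_en = 1'b1;
        for (i = 7; i >= 0; i = i - 1) begin
            serial_in = stream[i];      // change input away from the sampling edge
            @(posedge clk);             // DUT samples here
            @(negedge clk);             // output has settled by now
            $display("Cycle %0d: in=%b, par=%b", 8 - i, serial_in, par);
        end
        shift_en = 1'b0;
        if (par === 8'hB2) $display("PASS: assembled 0x%h", par);
        else               $display("FAIL: got 0x%h", par);
        $finish;
    end
endmodule
```

Driving inputs on the falling edge keeps stimulus changes half a cycle away from the rising edge where the flops sample, which avoids simulation races in the testbench itself.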
~4 minutes
▸ COMMANDS
cd lecture_examples/week2_day05/d05_s2_ex2/
make sim
make wave
make stat # 8× SB_DFF
▸ EXPECTED STDOUT
Cycle 1: in=1, par=00000001
Cycle 2: in=0, par=00000010
Cycle 3: in=1, par=00000101
...
Cycle 8: par=10110010
PASS: assembled 0xB2
▸ WAVEFORM — THE SIGNATURE SHAPE
In your waveform viewer (GTKWave, Surfer, ModelSim; pick your tool), add all 8 bits of o_parallel_out as separate traces (not as a bus). Watch each value “walk” one position toward the MSB on every clock edge: new bits enter at the LSB, old bits climb toward the MSB and fall off. That marching pattern is what a shift register looks like.
$ yosys -p "read_verilog shift_reg_sipo.v; synth_ice40 -top shift_reg_sipo; stat" -q
=== shift_reg_sipo ===
Number of wires: 14
Number of cells: 8
SB_CARRY 0
SB_DFFESR 8 ← 8 D-flops with enable + sync reset
SB_LUT4 0 ← zero combinational logic!
o_parallel_out[6:0] feeds back into bits [7:1], and i_serial_in feeds bit [0]. The iCE40 SB_DFFESR has enable and synchronous reset built in, so even the shift-enable logic fits inside the flops. Peak efficiency for this pattern.
Ask AI: “Write a parameterized N-bit SIPO shift register that can also shift right, with a direction select input. Predict the LUT count on iCE40.”
TASK
Bidirectional parameterized SIPO.
BEFORE
Predict: N flops, ~N LUTs for direction mux (1 LUT per bit selecting neighbor).
AFTER
Good AI handles parameterization. Watch for wrong concatenation direction.
TAKEAWAY
Bidirectional adds a per-bit mux → roughly N LUTs where the unidirectional version needs none.
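For comparison with whatever the AI produces, here is one possible shape for the bidirectional register. It is a sketch under stated assumptions, not a reference solution: the module name `shift_reg_bidir`, the `i_dir` input, and its encoding (0 = left, 1 = right) are all hypothetical, WIDTH is assumed to be at least 2, and the LUT count has not been measured here.

```verilog
// Hypothetical bidirectional SIPO; assumes WIDTH >= 2.
module shift_reg_bidir #(parameter WIDTH = 8) (
    input  wire             i_clk, i_reset, i_shift_en,
    input  wire             i_dir,        // assumed encoding: 0 = shift left, 1 = shift right
    input  wire             i_serial_in,
    output reg  [WIDTH-1:0] o_parallel_out
);
    always @(posedge i_clk) begin
        if (i_reset)
            o_parallel_out <= {WIDTH{1'b0}};
        else if (i_shift_en) begin
            if (i_dir)  // right: new bit enters at MSB, old LSB falls off
                o_parallel_out <= {i_serial_in, o_parallel_out[WIDTH-1:1]};
            else        // left: new bit enters at LSB, old MSB falls off
                o_parallel_out <= {o_parallel_out[WIDTH-2:0], i_serial_in};
        end
    end
endmodule
```

The two concatenations are mirror images of each other, which is exactly where a wrong direction tends to slip in. Running `stat` on the synthesized result is the way to confirm or refute the predicted per-bit mux.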
① Shift registers are pipelines of flops, one per bit.
② SIPO = serial receive. PISO = serial transmit. Every protocol pairs them.
③ Concatenation {old[N-2:0], new} = shift left by one.
④ N-bit SIPO shift register = N flops, zero LUTs on iCE40.
🔗 Transfer
Video 3 of 4 · ~10 minutes
▸ WHY THIS MATTERS NEXT
Serial data arrives from outside your chip — from a keyboard, sensor, or another SoC — with no relationship to your clock. Video 3 covers metastability: what happens when asynchronous signals hit a flip-flop at the wrong moment, and the 2-flop synchronizer pattern that fixes it. This is the most-ignored, highest-stakes concept in synchronous design.