Topic 5 · Counters, Shifters & Sync

Shift Registers

Video 2 of 4 · ~10 minutes

Dr. Mike Borowczak · Electrical & Computer Engineering · CECS · UCF

Counter Variations · Shift Registers · Metastability · Debouncing

🌍 Where This Lives

Where it shows up

Every byte that has ever crossed a USB cable arrived one bit at a time. Every pixel on your monitor over HDMI: one bit at a time. Every NFC tap, Bluetooth handshake, SD card read, keyboard keystroke over USB-C. Wires are narrow and numbers are wide — somewhere on both ends, something has to take a parade of bits and reassemble them into a value.

When it goes wrong

Miss the alignment by a single cycle and 0x55 turns into 0xAA. An SD card looks empty when it isn't. A bootloader stares forever at a magic number that will never arrive. A pressure sensor reports −2,147,483,648 PSI. Off-by-one in a serial stream is one of the most common bring-up bugs in embedded products, and one of the most embarrassing, because the data is right there on the scope.

⚠️ A Shift Register Is a Pipeline of Flops

❌ Wrong Model

“Shifting is a software bit operation. data = data << 1 is the same in hardware.”

✓ Right Model

A shift register is an N-deep pipeline of flip-flops, each one's Q feeding the next one's D. On each clock edge, every flop loads its predecessor's value. The “shift” is physical — bits march through flops, one stage per clock cycle.

The receipt: An 8-bit shift register costs 8 flops. Not one shifter unit, but 8 independent flops. On iCE40 that is 8 SB_DFF-family cells (SB_DFFESR once enable and synchronous reset are used), ~0.6% of the chip. Cost scales linearly with depth.
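To make the "pipeline of flops" model concrete, here is a minimal sketch that writes the chain out one flop at a time instead of using a vector shift. The module name shift_chain_explicit and the port o_taps are hypothetical, and enable and reset are left out to keep the chain visible.

```verilog
// Hypothetical illustration: an N-deep chain written one flop at a time.
// No enable or reset, so the taps start as X in simulation until filled.
module shift_chain_explicit #(parameter WIDTH = 8) (
    input  wire             i_clk,
    input  wire             i_serial_in,
    output wire [WIDTH-1:0] o_taps
);
    reg [WIDTH-1:0] r_q;   // one flop per bit

    genvar k;
    generate
        for (k = 0; k < WIDTH; k = k + 1) begin : g_stage
            if (k == 0) begin : g_first
                // Flop 0: D comes from the serial input.
                always @(posedge i_clk) r_q[0] <= i_serial_in;
            end else begin : g_rest
                // Flop k: D comes from flop k-1's Q.
                always @(posedge i_clk) r_q[k] <= r_q[k-1];
            end
        end
    endgenerate

    assign o_taps = r_q;   // parallel taps off every flop
endmodule
```

Each always block describes exactly one flop, which is the same structure a one-line vector shift describes more compactly.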

The Four Shift Register Types

Type | Input                     | Output                           | Use Case
SIPO | Serial (1 bit/cycle)      | Parallel (N bits after N cycles) | UART RX, SPI MISO receive
PISO | Parallel (N bits loaded)  | Serial (1 bit/cycle)             | UART TX, SPI MOSI transmit
SISO | Serial                    | Serial (delayed)                 | Delay lines, FIFO stages
PIPO | Parallel                  | Parallel (shifted)               | Barrel shifters, ALU shifts
SIPO and PISO are the workhorses — every serial protocol pairs one of each at the two endpoints.
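The SISO row is the only one the slides do not build, so here is a minimal sketch of it as a delay line. The module name shift_reg_siso and the DEPTH parameter are hypothetical; the port style follows the other modules in this deck, and DEPTH is assumed to be at least 2.

```verilog
// Hypothetical SISO delay line: same chain as a SIPO, but only the
// last flop is visible. Assumes DEPTH >= 2.
module shift_reg_siso #(parameter DEPTH = 4) (
    input  wire i_clk, i_reset, i_shift_en, i_serial_in,
    output wire o_serial_out
);
    reg [DEPTH-1:0] r_shift;

    always @(posedge i_clk) begin
        if (i_reset)         r_shift <= {DEPTH{1'b0}};
        else if (i_shift_en) r_shift <= {r_shift[DEPTH-2:0], i_serial_in};
    end

    // A bit sampled on one enabled edge reaches this tap
    // on the DEPTH-th enabled edge, counting that first edge.
    assign o_serial_out = r_shift[DEPTH-1];
endmodule
```

The internal register is identical to a SIPO; the only difference is which flops are brought out as ports.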

👁️ I Do — SIPO (Serial-In, Parallel-Out)

module shift_reg_sipo #(parameter WIDTH = 8) (
    input  wire              i_clk, i_reset, i_shift_en, i_serial_in,
    output reg  [WIDTH-1:0]  o_parallel_out
);
    always @(posedge i_clk) begin
        if (i_reset)         o_parallel_out <= {WIDTH{1'b0}};
        else if (i_shift_en) o_parallel_out <= {o_parallel_out[WIDTH-2:0], i_serial_in};
        //                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        //                                      shift left, new bit at LSB
    end
endmodule
My thinking: Concatenation {old_bits_shifted, new_bit} does the shift. Every cycle the enable is high, the oldest bit falls off the MSB and the new serial bit enters at LSB. After 8 enable-active cycles, the parallel output holds the received byte.

🔧 SIPO — RTL View

SIPO RTL block diagram: chain of D flip-flops, each Q feeding next D, parallel taps off every flop, single serial input at bit 0
Read the picture: the concatenation {o_parallel_out[WIDTH-2:0], i_serial_in} is literally wires: flop k's Q drives flop k+1's D, and the new serial bit drives flop 0. No combinational logic, no muxes; just a chain. That's why the synthesizer reports zero LUTs.

🤝 We Do — PISO (Parallel-In, Serial-Out)

module shift_reg_piso #(parameter WIDTH = 8) (
    input  wire              i_clk, i_reset, i_load, i_shift_en,
    input  wire [WIDTH-1:0]  i_parallel_in,
    output wire              o_serial_out
);
    reg [WIDTH-1:0] r_shift;
    always @(posedge i_clk) begin
        if (i_reset)         r_shift <= {WIDTH{1'b0}};
        else if (i_load)     r_shift <= i_parallel_in;            // load wins over shift
        else if (i_shift_en) r_shift <= {1'b0, r_shift[WIDTH-1:1]}; // shift right
    end
    assign o_serial_out = r_shift[0];                              // LSB first
endmodule
Together: Load has priority over shift (if both are asserted in the same cycle, the load happens and the shift is skipped). r_shift[0] is exposed as the serial output, LSB first, matching UART convention. After WIDTH shifts the register holds all zeros; for UART you'd change the reset value and the fill bit to 1'b1 (the idle state).
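The UART tweak mentioned above can be sketched as follows. This is only the idle-high fill behavior, not a full UART transmitter (no start bit, stop bit, or baud timing), and the module name shift_reg_piso_idle1 is hypothetical.

```verilog
// Hypothetical variant of the PISO: reset value and fill bit are 1,
// so the serial output rests high once the data has been shifted out.
module shift_reg_piso_idle1 #(parameter WIDTH = 8) (
    input  wire             i_clk, i_reset, i_load, i_shift_en,
    input  wire [WIDTH-1:0] i_parallel_in,
    output wire             o_serial_out
);
    reg [WIDTH-1:0] r_shift;

    always @(posedge i_clk) begin
        if (i_reset)         r_shift <= {WIDTH{1'b1}};              // idle high after reset
        else if (i_load)     r_shift <= i_parallel_in;              // load wins over shift
        else if (i_shift_en) r_shift <= {1'b1, r_shift[WIDTH-1:1]}; // shift right, fill with 1
    end

    assign o_serial_out = r_shift[0];                               // LSB first
endmodule
```

After WIDTH shifts the register holds all ones, so the output stays at the idle level until the next load.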

🔧 PISO — RTL View

PISO RTL block diagram: each bit has a 2:1 mux selecting between parallel-in load value and neighbor's Q, output is LSB of the register
Read the picture: every flop now has a small 2:1 mux at its D input, choosing between the i_parallel_in bit (on load cycles) and the next-higher bit's Q (on shift cycles, since data moves toward the LSB). The MSB flop's shift input is the constant fill bit. The mux is the price of being loadable. Output is just the LSB flop's Q tied directly to o_serial_out.

🧪 You Do — Trace the SIPO

Start with o_parallel_out = 8'b00000000. Serial input stream: 1, 0, 1, 1, 0, 0, 1, 0 (one bit per cycle, enable=1 every cycle).

What is o_parallel_out after 8 cycles?

Answer: 8'b10110010 = 8'hB2. The first serial bit (1) ends up as MSB after 8 shifts. Last received bit (0) is LSB. Reading the result MSB-first gives the input stream in transmission order.
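To check the trace in simulation, a self-checking testbench along these lines should work. It is a sketch, not the course's own testbench: the name tb_sipo_trace, the timing, and the message format are assumptions, and the SIPO module is repeated only so the file stands alone.

```verilog
// Copy of the SIPO module from the slide, repeated so this file is self-contained.
module shift_reg_sipo #(parameter WIDTH = 8) (
    input  wire              i_clk, i_reset, i_shift_en, i_serial_in,
    output reg  [WIDTH-1:0]  o_parallel_out
);
    always @(posedge i_clk) begin
        if (i_reset)         o_parallel_out <= {WIDTH{1'b0}};
        else if (i_shift_en) o_parallel_out <= {o_parallel_out[WIDTH-2:0], i_serial_in};
    end
endmodule

// Hypothetical testbench: feeds 1,0,1,1,0,0,1,0 and checks for 8'hB2.
module tb_sipo_trace;
    reg        clk = 1'b0, reset = 1'b1, shift_en = 1'b0, serial_in = 1'b0;
    wire [7:0] par;
    reg  [7:0] stream = 8'b1011_0010;   // sent MSB-first
    integer    i;

    shift_reg_sipo #(.WIDTH(8)) dut (
        .i_clk(clk), .i_reset(reset), .i_shift_en(shift_en),
        .i_serial_in(serial_in), .o_parallel_out(par)
    );

    // One full clock period; the rising edge lands in the middle.
    task tick; begin #5 clk = 1'b1; #5 clk = 1'b0; end endtask

    initial begin
        tick;                            // one edge with reset high clears the register
        reset = 1'b0; shift_en = 1'b1;
        for (i = 7; i >= 0; i = i - 1) begin
            serial_in = stream[i];       // set the bit before the rising edge
            tick;
            $display("Cycle %0d: in=%b, par=%b", 8 - i, serial_in, par);
        end
        if (par === 8'hB2) $display("PASS: assembled 0x%h", par);
        else               $display("FAIL: got 0x%h", par);
        $finish;
    end
endmodule
```

The input bit is changed while the clock is low, so every value is stable well before the rising edge that samples it.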
▶ LIVE DEMO

Shift Register Waveform

~4 minutes

▸ COMMANDS

cd lecture_examples/week2_day05/d05_s2_ex2/
make sim
make wave
make stat   # 8× SB_DFF

▸ EXPECTED STDOUT

Cycle 1: in=1, par=00000001
Cycle 2: in=0, par=00000010
Cycle 3: in=1, par=00000101
...
Cycle 8: par=10110010
PASS: assembled 0xB2

▸ WAVEFORM — THE SIGNATURE SHAPE

In your waveform viewer (GTKWave, Surfer, ModelSim; pick your tool), add all 8 bits of o_parallel_out separately (not as a bus). Watch them “walk” to the left each clock edge: new bits enter at the right (LSB), old bits climb toward the MSB and fall off. That marching pattern is what a shift register looks like.

🔧 What Did the Tool Build?

$ yosys -p "read_verilog shift_reg_sipo.v; synth_ice40 -top shift_reg_sipo; stat" -q

=== shift_reg_sipo ===
   Number of wires:                 14
   Number of cells:                  8
     SB_CARRY                        0
     SB_DFFESR                       8    ← 8 D-flops with enable + sync reset
     SB_LUT4                         0    ← zero combinational logic!
Zero LUTs. The shift is pure wire routing: o_parallel_out[6:0] feeds back into bits [7:1], and i_serial_in feeds bit [0]. The iCE40 DFFESR has enable and sync reset built in, so even the shift enable logic fits inside the flops. Peak efficiency for this pattern.

🤖 Check the Machine

Ask AI: “Write a parameterized N-bit SIPO shift register that can also shift right, with a direction select input. Predict the LUT count on iCE40.”

TASK

Bidirectional parameterized SIPO.

BEFORE

Predict: N flops, ~N LUTs for direction mux (1 LUT per bit selecting neighbor).

AFTER

Good AI handles parameterization. Watch for wrong concatenation direction.

TAKEAWAY

Bidirectional adds a per-bit mux → roughly N LUTs, where the unidirectional version needed none.
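For comparison with whatever the AI produces, here is one possible shape for the bidirectional version. It is a sketch under assumptions: the module name shift_reg_sipo_bidir, the i_dir port, and its encoding (0 = left, 1 = right) are all hypothetical, and the LUT count should be confirmed with stat instead of taken from this code.

```verilog
// Hypothetical bidirectional SIPO. Assumed encoding: i_dir = 0 shifts left
// (new bit at LSB), i_dir = 1 shifts right (new bit at MSB).
module shift_reg_sipo_bidir #(parameter WIDTH = 8) (
    input  wire             i_clk, i_reset, i_shift_en,
    input  wire             i_dir,
    input  wire             i_serial_in,
    output reg  [WIDTH-1:0] o_parallel_out
);
    always @(posedge i_clk) begin
        if (i_reset)
            o_parallel_out <= {WIDTH{1'b0}};
        else if (i_shift_en) begin
            if (i_dir)
                // Shift right: each bit loads its higher neighbor.
                o_parallel_out <= {i_serial_in, o_parallel_out[WIDTH-1:1]};
            else
                // Shift left: each bit loads its lower neighbor.
                o_parallel_out <= {o_parallel_out[WIDTH-2:0], i_serial_in};
        end
    end
endmodule
```

Each bit now has two possible sources, which is where the per-bit mux in the prediction comes from. The two concatenations are mirror images, so a swapped slice range is the easiest mistake to spot in review.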

Key Takeaways

 Shift registers are pipelines of flops, one per bit.

 SIPO = serial receive. PISO = serial transmit. Every protocol pairs them.

 Concatenation {old[N-2:0], new} = shift left by one.

 N-bit SIPO = N flops, zero LUTs on iCE40. A loadable PISO adds a 2:1 mux per bit.

Serial protocols are SIPO at one end and PISO at the other. Build both well.

🔗 Transfer

Metastability & Synchronizers

Video 3 of 4 · ~10 minutes

▸ WHY THIS MATTERS NEXT

Serial data arrives from outside your chip — from a keyboard, sensor, or another SoC — with no relationship to your clock. Video 3 covers metastability: what happens when asynchronous signals hit a flip-flop at the wrong moment, and the 2-flop synchronizer pattern that fixes it. This is the most-ignored, highest-stakes concept in synchronous design.