Topic 12 · UART RX · SPI · Integration

RX Oversampling

Video 1 of 4 · ~10 minutes

Dr. Mike Borowczak · Electrical & Computer Engineering · CECS · UCF

🌍 Where This Lives

In Industry

16× oversampling has been the de facto standard for UART receivers since the original 8250 UART chip (1970s). National Semiconductor's 8250, the 16550A, and their descendants all use 16×. FTDI USB-UART chips use 16×. ARM's PL011 UART uses 16×. Your FPGA vendor's UART IP typically uses 16× as well. Once you learn this technique, you can reverse-engineer almost any UART RX in minutes. Also: the underlying idea, sampling faster than the data and picking a stable sample point, recurs in clock recovery for modems, SerDes, and high-speed links.

In This Course

Topic 11 TX was straightforward — you controlled the timing. Today's RX is the hard part, and the “16× oversampling” trick is what makes it work. Once you have RX working (Video 2), you have full-duplex UART on your Go Board. Topic 12 Video 3 introduces SPI; Video 4 integrates everything.

⚠️ Receive Timing Is Not Your Timing

❌ Wrong Model

“For RX, I'll just reverse the TX: count CLKS_PER_BIT cycles, sample the line, move to the next bit.”

✓ Right Model

You don't know when the byte arrives. The transmitter's clock and yours drift independently, by up to ±2%. If you sample at CLKS_PER_BIT intervals starting from the detected start edge, every sample lands on a bit boundary, where the line may be changing, and any drift pushes it into a neighboring bit. You need to sample in the middle of each bit, not at the edges, to tolerate drift. 16× oversampling gives you the machinery to do this.

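The receipt: A ±2% clock mismatch shifts your sample point by ~2% of a bit time per bit × 10 bits = 20% of a bit time by the end of the frame. Sample at the bit edges and that shift lands you in the neighboring bit. Byte decoded wrong. Sample mid-bit and 20% still fits inside the 50% margin on either side.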
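The arithmetic above can be checked in a few lines of Python. This is a minimal sketch under an assumed model: the mismatch is constant and the error accumulates linearly from the start edge. The names `cumulative_drift`, `EDGE_MARGIN`, and `MID_MARGIN` are illustrative, not part of the course code.

```python
def cumulative_drift(mismatch, n_bits=10):
    """Sample-point error, in fractions of a bit time, after each bit.

    Assumes a constant clock mismatch that accumulates linearly
    from the start edge (a simplification).
    """
    return [mismatch * k for k in range(1, n_bits + 1)]

drift = cumulative_drift(0.02)      # 2% mismatch, 10-bit frame
end_of_frame = drift[-1]            # error after 10 bit times

# Margin = how far the sample point can slide before it leaves the bit.
EDGE_MARGIN = 0.0   # sampling at the bit boundary: no room at all
MID_MARGIN = 0.5    # sampling mid-bit: half a bit either way

print(f"drift after 10 bits: {end_of_frame:.0%} of a bit time")
print("edge sampling survives:", end_of_frame < EDGE_MARGIN)
print("mid-bit sampling survives:", end_of_frame < MID_MARGIN)
```

The drift itself is the same in both cases. What changes is how much of it the sample point can absorb.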

👁️ I Do — The 16× Oversampling Scheme

   One bit time (CLKS_PER_BIT cycles)
   ├───────────────────────────────┤
   │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │
    0 1 2 3 4 5 6 7 8 9 A B C D E F      ← 16 oversample slots

   ▲               ▲
   │               │
  start           mid-bit sample point
  edge            (slot 8)   → this is the value

My thinking: Divide each bit time into 16 oversample slots. Trigger on the start-bit falling edge. From that edge, count 8 slots forward (middle of start bit) to verify it's still LOW. Then count 16 slots to reach the middle of bit 0. 16 more to middle of bit 1. And so on. Each bit is sampled at its midpoint, where drift is least likely to corrupt it.
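The counting scheme can be tried out as a small behavioral model before writing any RTL. The sketch below is Python, not the course's Verilog, and it assumes an idle-high 8N1 frame sent LSB-first with an ideal, drift-free line. `encode_frame` and `decode_frame` are hypothetical helper names.

```python
def encode_frame(byte, n=16):
    """Build an idle-high 8N1 frame as a list of line levels, one per slot."""
    bits = [0] + [(byte >> i) & 1 for i in range(8)] + [1]  # start, data LSB-first, stop
    line = [1] * n                      # one bit time of idle before the frame
    for b in bits:
        line += [b] * n                 # each bit lasts n oversample slots
    return line + [1] * n               # idle afterwards

def decode_frame(line, n=16):
    """Recover one byte by sampling at the middle slot of each bit."""
    # Find the start edge: first slot where the line goes 1 -> 0.
    edge = next(i for i in range(1, len(line))
                if line[i - 1] == 1 and line[i] == 0)
    mid_start = edge + n // 2           # 8 slots in: middle of the start bit
    if line[mid_start] != 0:
        return None                     # line went back high: treat as a glitch
    byte = 0
    for k in range(8):
        sample = line[mid_start + n * (k + 1)]  # 16 more slots: middle of bit k
        byte |= sample << k             # LSB first
    return byte

print(hex(decode_frame(encode_frame(0x55))))   # 0x55
```

The model does not check the stop bit or handle drift. It only shows that "8 slots, then 16, then 16..." lands on the middle of every bit.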

Math: CLKS_PER_OSX = CLKS_PER_BIT / 16. At 25 MHz / 115200 baud, CLKS_PER_BIT ≈ 217, so 217 / 16 ≈ 13.6 cycles per oversample slot. The FPGA counter can only count whole cycles, so each slot lasts 13 or 14 cycles, and the rounding error eats into the drift budget.
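It helps to see what the integer rounding costs. The sketch below uses the 25 MHz and 115200 baud figures from the slide and assumes every slot has the same fixed length. The variable names are illustrative.

```python
CLK_HZ = 25_000_000
BAUD = 115_200
OVERSAMPLE = 16

clks_per_bit = round(CLK_HZ / BAUD)         # 217
ideal_slot = clks_per_bit / OVERSAMPLE      # 13.5625, not an integer

errors = {}
for clks_per_slot in (13, 14):
    actual_bit = clks_per_slot * OVERSAMPLE  # clocks the receiver treats as one bit
    errors[clks_per_slot] = (actual_bit - clks_per_bit) / clks_per_bit
    print(f"{clks_per_slot} clks/slot -> {actual_bit} clks/bit, "
          f"error {errors[clks_per_slot]:+.1%}")
# 13 clks/slot -> 208 clks/bit, error -4.1%
# 14 clks/slot -> 224 clks/bit, error +3.2%
```

Under this assumption, neither integer choice stays inside the ±2% figure quoted earlier. A practical design therefore likely needs more than a fixed integer slot length, for example alternating 13- and 14-cycle slots, or counting the mid-bit distances directly in system clocks. These are possible approaches, not a description of the implementation built in Video 2.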

🤝 We Do — Start-Bit Detection

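// 1. Synchronize the async RX line
//    (2-FF sync: Topic 5 returns!)
//    A third flop keeps the previous synced value for edge detection.
reg r_rx_sync1, r_rx_sync2, r_rx_sync3;
always @(posedge i_clk) begin
    r_rx_sync1 <= i_rx;        // may be metastable
    r_rx_sync2 <= r_rx_sync1;  // stable, synchronized value
    r_rx_sync3 <= r_rx_sync2;  // r_rx_sync2 delayed by 1 cycle
end
// High for 1 cycle when the synced line goes 1 -> 0
wire falling_edge =
     r_rx_sync3 & ~r_rx_sync2;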

RTL diagram: i_rx (orange, async) enters a chain of three D flip-flops r_rx_sync1, r_rx_sync2, r_rx_sync3 (purple), all clocked by i_clk (blue). An AND gate combines r_rx_sync3 with the inverted r_rx_sync2 to produce falling_edge (teal), a 1-cycle pulse on a 1→0 transition. r_rx_sync2 is also exported as the clean r_rx_sync used by the data-sampling logic.

Together: Synchronize async i_rx through 2 flops, plus a third flop for edge detection. falling_edge pulses for 1 cycle when the line transitions 1→0. RX FSM uses this to leave IDLE; the oversampler then counts to slot 8 to verify it wasn't a glitch.
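To see the one-cycle pulse, the three-flop chain can be stepped by hand or with a tiny cycle-level model. The Python sketch below mirrors the non-blocking assignments. It assumes all three flops start high (line idle), which the Verilog above does not itself guarantee. `simulate_sync` is a hypothetical name.

```python
def simulate_sync(rx_samples):
    """Cycle-by-cycle model of the three-flop chain and falling_edge wire."""
    s1 = s2 = s3 = 1                    # assumption: line has been idle (high)
    pulses = []
    for rx in rx_samples:
        # Non-blocking semantics: all three flops update from the OLD values.
        s1, s2, s3 = rx, s1, s2
        pulses.append(s3 & (1 - s2))    # falling_edge = r_rx_sync3 & ~r_rx_sync2
    return pulses

rx = [1, 1, 0, 0, 0, 0, 0]              # line drops on cycle 2 and stays low
print(simulate_sync(rx))                # [0, 0, 0, 1, 0, 0, 0]
```

The pulse shows up after the low level has passed through the first two flops, and it lasts exactly one cycle because the third flop catches up on the next clock edge.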

🧪 You Do — Why 16 And Not 4 Or 64?

Why does UART RX use 16× oversampling? Why not 4× (cheaper) or 64× (more precise)?

Tradeoff analysis:

4×: Middle of bit = slot 2. Only 1 sample on each side of true middle, and the start edge can be noticed up to a quarter bit late → poor tolerance to clock drift. Starts dropping bits at only a few percent mismatch.
8×: Better, but still sensitive to start-bit detection jitter.
16×: Middle at slot 8. ±7 slots (~44% of bit time) of drift tolerance. Standard since 1970s.
64×: Excellent tolerance, but the oversample counter counts 4× as far (2 more bits) and needs 4× the minimum clock speed. Diminishing returns for UART.

Answer: 16× is the sweet spot for UART's ±2% tolerance requirement. Modems and SerDes make a different tradeoff, choosing other oversampling ratios or dedicated clock-recovery circuits, because their data rates leave less room for error.
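The tradeoff can be put into rough numbers with a simple worst-case model. The sketch below assumes the start edge is noticed up to one slot (1/N of a bit) late, each bit is then sampled at its nominal midpoint, and the clock mismatch stretches every interval by the same factor. `max_drift` is a hypothetical helper, and the model is not claimed to match `plot_sampling.py`.

```python
def max_drift(n, frame_bits=10):
    """Largest clock mismatch that still samples the last bit correctly.

    Worst case: 1/n bit of edge latency plus a stretched midpoint must
    stay inside the last bit:  1/n + (frame_bits - 0.5) * (1 + d) < frame_bits
    """
    last_mid = frame_bits - 0.5     # nominal midpoint of the last bit, in bit times
    margin = 0.5 - 1 / n            # half a bit, minus worst-case edge latency
    return margin / last_mid

limits = {n: max_drift(n) for n in (4, 8, 16, 64)}
for n, d in limits.items():
    print(f"{n:>2}x: tolerates about {d:.1%} mismatch")
```

Under these assumptions the limits come out near 2.6%, 3.9%, 4.6%, and 5.1%. Going from 4× to 16× buys about two percentage points, while going from 16× to 64× buys about half a point, which is the diminishing return the answer refers to.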

▶ LIVE DEMO

Oversampling Decision Visualization

~4 minutes

▸ COMMANDS

cd lecture_examples/week3_day12/d12_s2_ex1/
python3 plot_sampling.py
# walks +0%..+10% drift,
# 4× / 8× / 16× / 64× side-by-side

▸ EXPECTED OUTPUT

Drift   4×   8×  16×  64×
+2.0%    9    9    9    9
+3.0%    8    9    9    9
+5.0%    5    7    8    9
+7.0%    3    5    6    6

(value = last bit sampled
 correctly in a 10-bit frame)

🤖 Check the Machine

Ask AI: “Design a UART RX module with 16× oversampling. Describe the counter hierarchy, start-bit validation logic, and sample-point calculation.”

TASK

AI describes 16× oversampling RX.

BEFORE

Predict: 2 counters (oversample + bit), 2-FF sync, start-bit revalidation at slot 8.

AFTER

Strong AI mentions glitch rejection via mid-start resample. Weak AI skips this.

TAKEAWAY

The mid-start-bit resample is what distinguishes robust RX designs from fragile ones.

Key Takeaways

① RX is harder than TX — you don't control the timing.

② 16× oversampling is the universal UART RX trick.

③ Sample each bit at its midpoint (slot 8 of 16).

④ 2-FF sync + edge detect + mid-start-bit revalidation = robust design.

Oversampling makes asynchronous communication reliable. It's the trick that makes UART work.

🔗 Transfer

RX Implementation

Video 2 of 4 · ~12 minutes

▸ WHY THIS MATTERS NEXT

You have the theory. Video 2 is the build: FSM states, oversample counter, sample-and-shift logic. Full working Verilog. End of Video 2: your Go Board echoes characters you type in your terminal. Full-duplex. Your RTL has become a conversation partner.