talos-builder/patches/linux/0001-net-macb-flush-PCIe-posted-write-after-TSTART-doorbe.patch
lukaszraczylo 2ed45176d6 Add macb silent TX stall fix series to kernel build
Three-patch series targeting the BCM2712/RP1 (Raspberry Pi 5)
silent TX hang documented at:
  * https://github.com/cilium/cilium/issues/43198
  * https://bugs.launchpad.net/ubuntu/+source/linux-raspi/+bug/2133877

  0001: flush PCIe posted write after TSTART doorbell
  0002: re-check ISR after IER re-enable in macb_tx_poll
  0003: add TX stall watchdog fallback for lost TCOMP

New patches live in patches/linux/ and are copied into
checkouts/pkgs/kernel/build/patches/ via a new 'patches-linux'
Makefile target, wired into the existing 'patches' aggregate.

Verified to apply cleanly against raspberrypi/linux @ f2f68e79f16f
(the ref pinned by the preceding commit).

Author of the patches: Lukasz Raczylo <lukasz@raczylo.com>.
2026-04-24 18:47:51 +01:00

From 0000000000000000000000000000000000000001 Mon Sep 17 00:00:00 2001
From: Lukasz Raczylo <lukasz@raczylo.com>
Date: Fri, 24 Apr 2026 00:00:00 +0000
Subject: [PATCH 1/3] net: macb: flush PCIe posted write after TSTART doorbell

macb_start_xmit() and macb_tx_restart() both kick transmission by
OR-ing MACB_BIT(TSTART) into NCR. On PCIe-attached macb instances --
notably BCM2712 + RP1 PCIe south bridge on Raspberry Pi 5 -- the
doorbell write is a posted PCIe write that can sit in the fabric's
write queue until something drains it. A source-level comment at
the TSTART site already acknowledges the problem:

	/* TSTART write might get dropped, so make the IRQ retrigger
	 * a buffer read */

and arms a recovery handshake via queue->tx_pending /
queue->txubr_pending that is picked up on the next TCOMP interrupt.
That recovery path only runs if a TCOMP interrupt actually fires;
if the lost doorbell means no TX starts, there is no TCOMP, and the
ring stalls silently.

Add a read-back of NCR after the TSTART write. A PCIe read request
is not allowed to pass earlier posted writes, so the read completes
only after the doorbell has reached the MAC, and it does so before
macb_start_xmit() / macb_tx_restart() return. The existing
'TSTART might get dropped' handshake is preserved as a safety net
for cases where the fabric genuinely drops the write despite the
read-back, but with the flush in place it should rarely, if ever,
be needed on PCIe-attached parts.

The unflushed doorbell was observed to be the most common trigger
for the silent TX stall documented in the linked reports.

Link: https://github.com/cilium/cilium/issues/43198
Link: https://bugs.launchpad.net/ubuntu/+source/linux-raspi/+bug/2133877
Signed-off-by: Lukasz Raczylo <lukasz@raczylo.com>
---
 drivers/net/ethernet/cadence/macb_main.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)
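
The effect of the read-back can be illustrated with a small user-space
model. This is only a sketch under stated assumptions, not driver code:
struct sim_fabric, sim_writel(), sim_readl(), sim_kick_tx() and SIM_TSTART
are hypothetical names invented for this illustration, and the model
exaggerates by holding a posted write indefinitely until a read drains it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model constants; not taken from the driver headers. */
#define SIM_TSTART     (1u << 9)
#define SIM_MAX_POSTED 8

/* Toy fabric: one device register plus a FIFO of writes still in flight. */
struct sim_fabric {
	uint32_t dev_ncr;                 /* value the modelled MAC sees */
	uint32_t posted[SIM_MAX_POSTED];  /* accepted but not yet delivered */
	size_t   n_posted;
};

/* Posted write: returns at once; the value only sits in the queue. */
static void sim_writel(struct sim_fabric *f, uint32_t val)
{
	assert(f->n_posted < SIM_MAX_POSTED);
	f->posted[f->n_posted++] = val;
}

/* Non-posted read: earlier posted writes are delivered first, in order. */
static uint32_t sim_readl(struct sim_fabric *f)
{
	for (size_t i = 0; i < f->n_posted; i++)
		f->dev_ncr = f->posted[i];
	f->n_posted = 0;
	return f->dev_ncr;
}

/*
 * Ring the doorbell, optionally followed by a read-back.
 * Returns true if the modelled MAC has seen TSTART by the time we return.
 */
static bool sim_kick_tx(struct sim_fabric *f, bool read_back)
{
	sim_writel(f, sim_readl(f) | SIM_TSTART);
	if (read_back)
		(void)sim_readl(f);
	return (f->dev_ncr & SIM_TSTART) != 0;
}
```

In this model sim_kick_tx(&f, false) returns with the doorbell still
queued, while sim_kick_tx(&f, true) returns only after the modelled MAC
has seen TSTART.
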
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -1949,6 +1949,13 @@
 	spin_lock(&bp->lock);
 	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
+	/*
+	 * Flush the PCIe posted-write queue so the TSTART doorbell
+	 * reliably reaches the MAC. Without this, the write can sit
+	 * in the fabric and the MAC never advances, causing a silent
+	 * TX stall.
+	 */
+	(void)macb_readl(bp, NCR);
 	spin_unlock(&bp->lock);
 out_tx_ptr_unlock:
@@ -2630,6 +2637,11 @@
 	queue->tx_pending = 1;
 	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
+	/*
+	 * Flush the PCIe posted-write queue; see the comment in
+	 * macb_tx_restart() for the reasoning.
+	 */
+	(void)macb_readl(bp, NCR);
 	spin_unlock(&bp->lock);
 	if (CIRC_SPACE(queue->tx_head, queue->tx_tail, bp->tx_ring_size) < 1)
--
2.44.0