mirror of
https://github.com/lukaszraczylo/talos-builder.git
synced 2026-06-14 02:11:38 +00:00
2ed45176d6
Three-patch series targeting the BCM2712/RP1 (Raspberry Pi 5) silent TX
hang documented at:

  * https://github.com/cilium/cilium/issues/43198
  * https://bugs.launchpad.net/ubuntu/+source/linux-raspi/+bug/2133877

  0001: flush PCIe posted write after TSTART doorbell
  0002: re-check ISR after IER re-enable in macb_tx_poll
  0003: add TX stall watchdog fallback for lost TCOMP

New patches live in patches/linux/ and are copied into
checkouts/pkgs/kernel/build/patches/ via a new 'patches-linux' Makefile
target, wired into the existing 'patches' aggregate.

Verified to apply cleanly against raspberrypi/linux @ f2f68e79f16f (the
ref pinned by the preceding commit).

Author of the patches: Lukasz Raczylo <lukasz@raczylo.com>.
71 lines
2.7 KiB
Diff
From 0000000000000000000000000000000000000001 Mon Sep 17 00:00:00 2001
From: Lukasz Raczylo <lukasz@raczylo.com>
Date: Fri, 24 Apr 2026 00:00:00 +0000
Subject: [PATCH 1/3] net: macb: flush PCIe posted write after TSTART doorbell

macb_start_xmit() and macb_tx_restart() both kick transmission by
OR-ing MACB_BIT(TSTART) into NCR. On PCIe-attached macb instances --
notably BCM2712 + RP1 PCIe south bridge on Raspberry Pi 5 -- the
doorbell write is a posted PCIe write that can sit in the fabric's
write queue until something drains it. A source-level comment at
the TSTART site already acknowledges the problem:

    /* TSTART write might get dropped, so make the IRQ retrigger
     * a buffer read */

and arms a recovery handshake via queue->tx_pending /
queue->txubr_pending that is picked up on the next TCOMP interrupt.
That recovery path only runs if a TCOMP interrupt actually fires;
if the lost doorbell means no TX starts, there is no TCOMP, and the
ring stalls silently.

Add a read-back of NCR after the TSTART write. The read serialises
the PCIe posted-write queue and ensures the doorbell reaches the MAC
before macb_start_xmit() / macb_tx_restart() return. The existing
'TSTART might get dropped' handshake is preserved as a safety net
for cases where the fabric genuinely drops the write despite the
read barrier, but with this barrier it should rarely if ever be
needed on PCIe-attached parts.

Observed to be the most common trigger for the silent TX stall
documented in the linked reports.

Link: https://github.com/cilium/cilium/issues/43198
Link: https://bugs.launchpad.net/ubuntu/+source/linux-raspi/+bug/2133877
Signed-off-by: Lukasz Raczylo <lukasz@raczylo.com>
---
 drivers/net/ethernet/cadence/macb_main.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -1949,6 +1949,13 @@
 
 	spin_lock(&bp->lock);
 	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
+	/*
+	 * Flush the PCIe posted-write queue so the TSTART doorbell
+	 * reliably reaches the MAC. Without this, the write can sit
+	 * in the fabric and the MAC never advances, causing a silent
+	 * TX stall.
+	 */
+	(void)macb_readl(bp, NCR);
 	spin_unlock(&bp->lock);
 
 out_tx_ptr_unlock:
@@ -2630,6 +2637,11 @@
 		queue->tx_pending = 1;
 
 	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
+	/*
+	 * Flush the PCIe posted-write queue; see the comment in
+	 * macb_tx_restart() for the reasoning.
+	 */
+	(void)macb_readl(bp, NCR);
 	spin_unlock(&bp->lock);
 
 	if (CIRC_SPACE(queue->tx_head, queue->tx_tail, bp->tx_ring_size) < 1)
-- 
2.44.0
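The write-then-read-back pattern described in the commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not driver code: struct sim_fabric, sim_writel(), sim_readl(), sim_kick(), SIM_TSTART and SIM_QUEUE_DEPTH are hypothetical names invented for the illustration, and the model assumes the simplified rule that a read may not pass earlier posted writes, so a read forces the queue to drain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model only; none of these names exist in the macb driver. */
#define SIM_TSTART      (1u << 0)  /* arbitrary bit standing in for TSTART */
#define SIM_QUEUE_DEPTH 4          /* arbitrary depth of the modelled queue */

struct sim_fabric {
	uint32_t pending[SIM_QUEUE_DEPTH]; /* posted writes not yet delivered */
	size_t   count;                    /* number of queued writes */
	uint32_t dev_ncr;                  /* value the device actually sees */
};

/* Deliver every queued write to the device, oldest first. */
static void sim_drain(struct sim_fabric *f)
{
	for (size_t i = 0; i < f->count; i++)
		f->dev_ncr = f->pending[i];
	f->count = 0;
}

/* Posted write: returns at once, the value is only queued in the fabric. */
static void sim_writel(struct sim_fabric *f, uint32_t val)
{
	if (f->count == SIM_QUEUE_DEPTH)
		sim_drain(f); /* a full queue has to make room */
	assert(f->count < SIM_QUEUE_DEPTH);
	f->pending[f->count++] = val;
}

/* Read: modelled as draining all earlier posted writes before returning. */
static uint32_t sim_readl(struct sim_fabric *f)
{
	sim_drain(f);
	return f->dev_ncr;
}

/*
 * Kick transmission the way the patch describes: OR the doorbell bit into
 * the register, then optionally read the register back to flush the write.
 * Returns true if the device has seen the doorbell by the time we return.
 */
static bool sim_kick(struct sim_fabric *f, bool flush)
{
	sim_writel(f, sim_readl(f) | SIM_TSTART);
	if (flush)
		(void)sim_readl(f);
	return (f->dev_ncr & SIM_TSTART) != 0;
}
```

In this model, sim_kick() on a zero-initialised struct sim_fabric returns false with flush disabled, because the doorbell is still sitting in the queue, and true with flush enabled. Real hardware will eventually drain the queue on its own schedule; the model only shows why the read-back bounds when the doorbell arrives.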