minimal has quit [Quit: Leaving]
<jakllsch> djfe: hmm, i wonder if that'd solve the wifi bridge performance issue I've got on that unit
<owrt-images-builds> Build [#297](https://buildbot.openwrt.org/images/#/builders/62/builds/297) of `main_layerscape/armv8_64b` failed: failed Packages built (failure)
<owrt-images-builds> Build [#296](https://buildbot.openwrt.org/images/#/builders/24/builds/296) of `main_mvebu/cortexa53` completed successfully.
<owrt-images-builds> Build [#298](https://buildbot.openwrt.org/images/#/builders/156/builds/298) of `main_layerscape/armv7` failed: failed Packages built (failure)
lucascastro has quit []
schwicht has joined #openwrt-devel
schwicht has quit [Ping timeout: 480 seconds]
goliath has joined #openwrt-devel
valku has quit [Quit: valku]
<nbd> f00b4r0: i have a patch for testing: https://nbd.name/p/b5449c02
schwicht has joined #openwrt-devel
schwicht has quit [Ping timeout: 480 seconds]
warpme has joined #openwrt-devel
<f00b4r0> nbd: building kmod, will install and report
<f00b4r0> nbd: I'm seeing hostapd eating 25% cpu constantly now, is that expected?
<f00b4r0> nothing in logs
<f00b4r0> it's totally unresponsive to ubus calls (e.g. get_clients times out)
<f00b4r0> hmm not all aps are up: only 1 instead of 3 on 5GHz. Something is broken here.
<f00b4r0> nbd: so I've deployed on 3 devices: router + 2 dumb APs, all same hw. The two dumb APs are now unusable for wifi: hostapd pegged and unresponsive. 2 out of 3 APs missing on 5GHz.
<f00b4r0> iwinfo assoclist shows only one client connected on one of the 2.4G aps, on each device, and it's one of the infamous tuya systems
<f00b4r0> it seems to be the root cause because the router is not affected (all phy-aps present on both bands, everything normal), and it doesn't have any tuya device associated
<f00b4r0> I can't keep this running for very long (location is disturbed) so I'll revert if I don't hear from you shortly
<f00b4r0> wifi up/down also not working
<f00b4r0> killing and restarting hostapd seems to have done the trick
<f00b4r0> + wifi down / up
<f00b4r0> looks like some sort of deadlock
n3ph has joined #openwrt-devel
<f00b4r0> nbd: fwiw both dumb APs have 2 tuya devices "harassing" them. is it possible that there's some race in the code triggered by the second one? When hung, only one device was associated.
rua has quit [Quit: Leaving.]
warpme has quit []
rsalvaterra has quit []
n3ph has quit [Ping timeout: 480 seconds]
rsalvaterra has joined #openwrt-devel
danitool has joined #openwrt-devel
rmilecki has joined #openwrt-devel
schwicht has joined #openwrt-devel
schwicht has quit [Ping timeout: 480 seconds]
schwicht has joined #openwrt-devel
rua has joined #openwrt-devel
schwicht has quit [Ping timeout: 480 seconds]
warpme has joined #openwrt-devel
SwedeMike has quit [Quit: leaving]
warpme has quit []
KGB-2 has quit [Remote host closed the connection]
KGB-2 has joined #openwrt-devel
warpme has joined #openwrt-devel
valku has joined #openwrt-devel
rua has quit [Remote host closed the connection]
rua has joined #openwrt-devel
rua is now known as Guest24394
rua has joined #openwrt-devel
Guest24394 has quit [Remote host closed the connection]
<f00b4r0> nbd: patch introduces a different bug:
<f00b4r0> Mon Aug 18 12:18:26 2025 daemon.info hostapd: phy0-ap0: STA c4:82:e1:ad:ef:8e IEEE 802.11: deauthenticated due to local deauth request
<f00b4r0> Mon Aug 18 12:18:27 2025 kern.err kernel: [ 6680.240293] mt7915e 0000:02:00.0: Retry message 00005aed (seq 2)
<f00b4r0> Mon Aug 18 12:19:25 2025 kern.err kernel: [ 6738.179123] rcu: INFO: rcu_sched self-detected stall on CPU
<f00b4r0> Mon Aug 18 12:18:29 2025 kern.err kernel: [ 6682.320259] mt7915e 0000:02:00.0: Message 00005aed (seq 2) timeout
<f00b4r0> ha no, i didn't scroll up enough. The stack trace looks a bit different. Let me dump this
castiel652 has joined #openwrt-devel
rua1 has joined #openwrt-devel
rua is now known as Guest24395
rua1 is now known as rua
Guest24395 has quit [Ping timeout: 480 seconds]
<f00b4r0> nbd: decoded call trace: https://pastecode.io/s/98r96y9i
<f00b4r0> I'm definitely writing a script to automate that ;P
rua is now known as Guest24398
rua has joined #openwrt-devel
Guest24398 has quit [Quit: Leaving.]
warpme has quit []
castiel652 has quit [Quit: castiel652]
plappermaul has joined #openwrt-devel
nixuser has quit [Read error: Connection reset by peer]
nixuser has joined #openwrt-devel
n3ph has joined #openwrt-devel
n3ph has quit [Ping timeout: 480 seconds]
n3ph has joined #openwrt-devel
goliath has quit [Quit: SIGSEGV]
n3ph has quit [Ping timeout: 480 seconds]
csharper2005 has joined #openwrt-devel
csharper2005 has left #openwrt-devel [#openwrt-devel]
schwicht has joined #openwrt-devel
schwicht has quit [Ping timeout: 480 seconds]
goliath has joined #openwrt-devel
n3ph has joined #openwrt-devel
<nbd> f00b4r0: i think your call trace decoding is not accurate
<f00b4r0> i ran it from the build artifacts
<nbd> f00b4r0: when i checked the .ko files, the .text section had an offset of 0x60 or something like that
<f00b4r0> this has your patch applied
<nbd> regardless of the patch
<nbd> the 0x60 offset needs to be accounted for when dealing with offsets
<f00b4r0> oh you mean the module trace are off by 0x60?
<f00b4r0> kernel is good tho?
<nbd> yes
<nbd> kernel is ELF with fixed load address
<nbd> module is dynamic and starts with note sections instead of .text
<f00b4r0> I'm confused, I thought the value reported in /proc/modules was the load address for the module
<f00b4r0> oic
<nbd> and those add up to 0x60, iirc
<f00b4r0> let me redo the traces then
<f00b4r0> (definitely need to script that, *sigh*)
<dwfreed> Ideally there'd be a System.map in the build artifacts
<f00b4r0> so just to be perfectly clear: l *(0x83858aec-0x83850000) becomes l *(0x83858aec-0x83850000+0x60), right?
<nbd> f00b4r0: right
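The rebasing arithmetic confirmed here (trace address minus module base, plus the `.text` file offset) is easy to script. This is a minimal sketch, not the participants' actual tooling. It assumes the base address is taken from a `/proc/modules` line whose fields are name, size, refcount, dependencies, state, load address and optional taint flags, and that the `.text` offset was already read with readelf. The sample `/proc/modules` line is hypothetical; only the addresses and the 0x60 offset come from the discussion.

```python
def module_base(proc_modules_line: str) -> int:
    """Return the load address from one /proc/modules line.

    Assumed field order: name size refcount deps state address [taint].
    Read the file as root: unprivileged readers may see the address as 0.
    """
    fields = proc_modules_line.split()
    return int(fields[5], 16)  # int() with base 16 accepts the "0x" prefix


def gdb_list_expr(trace_addr: int, base: int, text_off: int) -> str:
    """Build the gdb `list` command for a trace address inside a module.

    The address is rebased to the module's load address, then shifted by
    the file offset of .text (the note sections that precede it).
    """
    return f"l *(0x{trace_addr - base + text_off:x})"


if __name__ == "__main__":
    line = "mt7915e 245760 0 - Live 0x83850000 (O)"  # hypothetical sample line
    print(gdb_list_expr(0x83858aec, module_base(line), 0x60))  # prints "l *(0x8b4c)"
```

Printing the already-evaluated offset rather than the raw `0x83858aec-0x83850000+0x60` expression makes it easier to compare results when the `.text` offset changes.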
<f00b4r0> dwfreed: ideally ;P
<nbd> dwfreed: System.map is useless here
<nbd> since we're dealing with kernel modules with dynamic load address
<f00b4r0> nbd: for my education, how did you get the 0x60 value? readelf?
<nbd> yes
<f00b4r0> (I'll need to script that part too)
<f00b4r0> ok
<f00b4r0> it seems to make a little more "sense" indeed
n3ph has quit [Ping timeout: 480 seconds]
<nbd> btw, one way to get even better information is to use the kernel faddr2line script: use gdb to resolve the nearest function, calculate the offset, and then run faddr2line mt7996e.ko func+offset
<nbd> since it shows the call sites of inline functions, whereas gdb only shows the inline functions themselves
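The "nearest function plus offset" step can also be done without gdb if a symbol list is available. This is a different technique from the gdb lookup nbd describes: a sorted search over function start addresses, for example collected from `nm` output on the `.ko`. The sketch assumes the offset and the symbol start addresses are expressed in the same frame of reference (both relative to the module's `.text`). The symbol names in the usage example are hypothetical.

```python
import bisect


def func_plus_offset(text_off: int, symbols: list[tuple[int, str]]) -> str:
    """Map a .text-relative offset to 'func+0xoff' for faddr2line.

    `symbols` holds (start_offset, name) pairs for the module's functions;
    they are sorted here before the binary search.
    """
    ordered = sorted(symbols)
    starts = [start for start, _ in ordered]
    # Index of the last function starting at or before text_off.
    i = bisect.bisect_right(starts, text_off) - 1
    if i < 0:
        raise ValueError(f"0x{text_off:x} lies before the first symbol")
    start, name = ordered[i]
    return f"{name}+0x{text_off - start:x}"


if __name__ == "__main__":
    syms = [(0x0, "func_a"), (0x8a00, "func_c"), (0x100, "func_b")]  # hypothetical
    print(func_plus_offset(0x8aec, syms))  # prints "func_c+0xec"
```

The result can then be passed to the kernel tree's `scripts/faddr2line`, e.g. `./scripts/faddr2line mt7915e.ko func_c+0xec`. Function sizes are not taken into account, so an offset past the end of the last function is still attributed to it.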
<f00b4r0> I'll take a look
<f00b4r0> btw readelf shows this for mt7915e: [ 3] .text PROGBITS 00000000 000090 0184c0 00 AX 0 0 16
<f00b4r0> Off is 0x90
<f00b4r0> so I suspect the decode is still wrong?
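Reading the offset out of readelf by hand is one of the steps flagged for scripting. The following sketch extracts the `Off` column of the `.text` row, which is the value being compared against 0x60 here. It assumes `readelf -S -W` output, because `--wide` keeps each section header on a single line. Rows such as `.rela.text` or `.text.unlikely` are deliberately not matched.

```python
import re

# Matches a section-header row such as:
#   [ 3] .text PROGBITS 00000000 000090 0184c0 00 AX 0 0 16
# Columns after the name: Type, Addr, Off, Size, ...
_TEXT_ROW = re.compile(r"\]\s+\.text\s+\S+\s+([0-9a-fA-F]+)\s+([0-9a-fA-F]+)\s")


def text_file_offset(readelf_sections: str) -> int:
    """Return the file offset (Off column) of .text from `readelf -S -W` output."""
    for line in readelf_sections.splitlines():
        m = _TEXT_ROW.search(line)
        if m:
            return int(m.group(2), 16)
    raise ValueError(".text section not found")


if __name__ == "__main__":
    row = "  [ 3] .text             PROGBITS        00000000 000090 0184c0 00  AX  0   0 16"
    print(hex(text_file_offset(row)))  # prints "0x90"
```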
<f00b4r0> dwfreed: marginally. It needs symbol names to work
<dwfreed> so all your script would need to do is to convert raw addresses to symbol names + addresses
<f00b4r0> no
<f00b4r0> because we also use out of tree modules (e.g. mt76)
<f00b4r0> which quite complicates the dance
<f00b4r0> if it were that simple, it wouldn't be any fun, would it? ;P
<f00b4r0> nbd: should I redo the trace again with 0x90 offset or did I misunderstand the output of readelf?
<dwfreed> why does it matter if the module is out of tree or not? as long as it has access to the module file, addr2line should work just fine?
<f00b4r0> dwfreed: the script wants all modules under a common path. That's not what we have.
<dwfreed> well, if you've got a full build tree you do, just not where you'd originally expect it :)
<dwfreed> s/build/built/
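dwfreed's point is that in-tree and out-of-tree modules both end up somewhere under the built tree. A decoding script can therefore build its own name-to-path index instead of expecting a single common module directory. This is a minimal sketch that assumes nothing about the directory layout beyond the `.ko` suffix. Hyphens are normalised to underscores because the kernel reports module names with underscores. A built tree may contain several copies of a module (for example stripped and unstripped ones), so every match is kept.

```python
from pathlib import Path


def index_modules(build_root: str) -> dict[str, list[Path]]:
    """Map module names to every matching .ko file under build_root."""
    index: dict[str, list[Path]] = {}
    for ko in sorted(Path(build_root).rglob("*.ko")):
        # Kernel-reported names (e.g. in /proc/modules) use underscores.
        name = ko.stem.replace("-", "_")
        index.setdefault(name, []).append(ko)
    return index
```

Looking up `index_modules(tree)["mt7915e"]` then yields the candidate object files to hand to gdb or faddr2line; choosing the unstripped copy is left to the caller.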
<nbd> f00b4r0: i guess 0x90 is the right offset in your case
<f00b4r0> *sigh*
<f00b4r0> 3rd time's the charm
Fijxu has quit [Quit: XD!!]
Fijxu has joined #openwrt-devel
killgufo- has joined #openwrt-devel
<f00b4r0> I see traces hitting end scope brackets, somehow that doesn't look right
<nbd> hm, weird
<nbd> i did however find some limitations in my previous patch and i have something new for testing: https://nbd.name/p/33b79944
<f00b4r0> is that on top or supersedes?
<nbd> supersedes
<f00b4r0> ok
killgufo has quit [Ping timeout: 480 seconds]
killgufo- is now known as killgufo
<f00b4r0> hmm this is awkward
<f00b4r0> I rebooted the hosed device but I can't log in anymore
<f00b4r0> I suspect it's hung worse than the first time
<f00b4r0> I'm in. Upgrading kmod and rebooting
<f00b4r0> nbd: seemingly no more hostapd hung at boot. Will report back.
n3ph has joined #openwrt-devel
plappermaul has quit [Remote host closed the connection]
swalker has quit [Ping timeout: 480 seconds]
swalker has joined #openwrt-devel
madwoota has quit [Remote host closed the connection]
madwoota has joined #openwrt-devel
madwoota has quit [Remote host closed the connection]
madwoota has joined #openwrt-devel
n3ph has quit [Ping timeout: 480 seconds]
<f00b4r0> and we have a crash
<f00b4r0> I'm exhausted, I'll do the trace dance tomorrow but if I had to guess, I'd say it's the exact same crash only shifted by your code changes
<f00b4r0> at least we have a relatively quick reproducer
<f00b4r0> nbd: https://pastecode.io/s/r4ukz67f - out of order but that shouldn't be a problem. AFAICT, exact same crash.
Edu4rdSHL has quit [Quit: Leaving]
* f00b4r0 beams off
AtomiclyCursed2 has quit [Quit: ZNC 1.10.1 - https://znc.in]
AtomiclyCursed2 has joined #openwrt-devel
<nbd> f00b4r0: i think i need to clear some more lists in my patch
<nbd> f00b4r0: new patch, full replacement as before: https://nbd.name/p/3b3f7682
ptudor_ has joined #openwrt-devel
ptudor has quit [Ping timeout: 480 seconds]
schwicht has joined #openwrt-devel
n3ph has joined #openwrt-devel
goliath has quit [Quit: SIGSEGV]
schwicht has quit [Ping timeout: 480 seconds]
schwicht has joined #openwrt-devel
Edu4rdSHL has joined #openwrt-devel
Edu4rdSHL has quit [Quit: ZNC 1.10.1 - https://znc.in]
Edu4rdSHL has joined #openwrt-devel
cmonroe has quit [Quit: Textual IRC Client: www.textualapp.com]
Edu4rdSHL is now known as Edu4rdSHL
Edu4rdSHL has left #openwrt-devel [WeeChat 4.7.0]
Edu4rdSHL has joined #openwrt-devel
cmonroe has joined #openwrt-devel
Edu4rdSHL has quit [Quit: ZNC 1.10.1 - https://znc.in]
Edu4rdSHL has joined #openwrt-devel
Tapper has quit [Quit: Tapper]
valku has quit [Quit: valku]
SlimeyX has joined #openwrt-devel
SlimeyX_ has joined #openwrt-devel
SlimeyX has quit [Ping timeout: 480 seconds]
SlimeyX_ is now known as SlimeyX
danitool has quit [Quit: Cubum autem in duos cubos, aut quadratoquadratum in duos quadratoquadratos]