I assembled my desktop PC about 2 years ago. It’s fairly beefy (AMD Ryzen 9 3950X 16-Core Processor, 128 GB RAM, NVIDIA RTX 3080 Ti). It’s running Debian stable.
Once in a while (not that often, about every 2 weeks or so), seemingly at random times and not especially under heavy load, the system crashes and freezes, unresponsive even to the Linux magic SysRq keys. I have never managed to find the cause. One interesting fact is that when it happens, for some reason it seems to “freeze my network” too, i.e., other (Ethernet) devices on my local network lose connectivity. They’re all connected to the same router, but not through this crashing PC. Connectivity comes back as soon as I force a shutdown of the crashing PC.
What can cause this and how could I fix these freezes?
Removed by mod
Oh, this would explain why it kills the connectivity of all Ethernet-connected devices. The Ethernet interface is the one on the mobo. Drivers are included in the Linux kernel AFAIK. The problem persisted across 2 Debian versions, so I am not sure reinstalling drivers would do anything here. But thanks for the plausible explanation about the network issue!
You won’t have much luck doing anything about the driver part of it, but you could try a custom kernel. There are two advantages to that: one is that it would be more recent than whatever kernel Debian is using, and the second is the optimized networking stack, which speeds up packet processing and improves the congestion handling algorithm. I’d recommend the Xanmod kernel for this: https://xanmod.org/
Alternatively, if we suspect your network is the culprit, then the solution could be as simple as buying a new card and disabling the built-in one.
I like my debian vanilla but thanks for the suggestion. The other network card would be interesting to try out. I don’t really suspect the network card, since I have no idea whether the network block is a consequence or a cause here.
I can only offer some additional troubleshooting steps.
- Your network connection is fairly simple, so I would suggest you take NM (NetworkManager) out of the equation and set up your network device manually to see if that eliminates your issue. This goes back to the comment (@despotic_machine) and the log listing the p2p and wireless interfaces. It seems NM may be trying to set up your wifi interfaces. Though looking at the log you provided, it seems NM sees the wireless interface, identifies that it is not connected, and sets it to inactive. So there may not be an issue. I had issues with NM many years ago on a laptop and preferred wicd; however, it seems that development has stalled on wicd. Regardless, I do not run NetworkManager at all on my desktop (just isc-dhcp-client and an entry in /etc/network/interfaces) since it is not roaming (plugged into a switch). It seems you don’t even need to uninstall anything: just set up the network manually and NM should leave the interface alone. If you want it to be clean, make sure NM is not running, or purge it from the system and set up your networking manually. The assumption about manual setup is based on the Debian wiki:
https://wiki.debian.org/NetworkManager#Wired_Networks_are_Unmanaged
NOTE: Unless you know networking, this is probably going to take you down a networking rabbit hole, so glhf.
Some Debian references regarding networking and different configurations:
https://www.debian.org/doc/manuals/debian-reference/ch05.en.html
https://www.debian.org/doc/manuals/debian-handbook/sect.network-config-
If you want to stick with NM, it seems you can change the logging level to see if you get more details. I would check the man page or documentation for NM for instructions for debugging. I would expect that you can disable interfaces in NM to reduce the likelihood of some fringe case that is plaguing your setup. Since I don’t run NM, I can’t provide any detailed suggestions.
- More of a question, but has the switch or router also been the same device for the last 2 years? Is it possible that the network device is misbehaving and causing the desktop to lock up? This would feed into @0v0’s request to run wireshark/tcpdump from a laptop or other device connected to the router/switch to see what’s going on traffic-wise.
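If you do stick with NM, here is a rough sketch of the two NM-side ideas from the first point: turning up the log level and telling NM to leave the unused wifi interface alone. It assumes nmcli is installed; wlp4s0 is the interface name from the posted log, while the function names and the DRY_RUN switch are made up for illustration. As far as I know, both changes only last until NM restarts.

```shell
# Assumption: NetworkManager's nmcli is installed; wlp4s0 is the Wi-Fi
# interface name that appears in the log posted in this thread.
# The helper names below are hypothetical.

# run: print the command when DRY_RUN=1, otherwise execute it via sudo.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        sudo "$@"
    fi
}

# Raise NetworkManager's log verbosity for all logging domains.
nm_debug_logging() {
    run nmcli general logging level DEBUG domains ALL
}

# Ask NetworkManager to stop managing the (unused) Wi-Fi interface.
# The optional first argument overrides the interface name.
nm_ignore_wifi() {
    run nmcli device set "${1:-wlp4s0}" managed no
}
```

Running `DRY_RUN=1 nm_ignore_wifi` only prints the command, so you can check it before letting it touch anything.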
Have you tried running tcpdump/wireshark on another device in the network when this happened?
Nope, I don’t know the first thing about these tools, but now I’m kind of impatient and hope that the next freeze happens soon so I can try. :-)
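Since the freeze is rare, one way to have a capture ready when it happens is to leave tcpdump running on another wired device, writing into a ring of files so the disk does not fill up. A minimal sketch, assuming tcpdump is installed on that device; the interface name eth0, the file name, the sizes, and the helper names are all placeholders to adjust.

```shell
# Assumptions: tcpdump is installed on the capturing device; "eth0" and
# "freeze.pcap" are placeholder defaults, and the helper names are hypothetical.

# run: print the command when DRY_RUN=1, otherwise execute it via sudo.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        sudo "$@"
    fi
}

# capture_ring [interface] [output file]
#   -n     do not resolve names (the capture should not depend on DNS)
#   -C 50  start a new file after roughly 50 million bytes
#   -W 10  keep at most 10 files, overwriting the oldest (ring buffer)
#   -w     write raw packets to a file that wireshark can open later
capture_ring() {
    iface="${1:-eth0}"
    out="${2:-freeze.pcap}"
    run tcpdump -n -i "$iface" -C 50 -W 10 -w "$out"
}
```

After the next freeze, stop the capture with Ctrl-C and open the most recent file in wireshark; `DRY_RUN=1 capture_ring enp3s0` just prints the command it would run.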
deleted by creator
No, good old SSID+passphrase. But this PC is connected via ethernet (although the mobo does have a wifi chip, that I don’t use).
I have no idea, but it seems like an interesting problem. Good luck finding a solution. (Just commenting to get notified if someone has a solution.)
Uninstall NetworkManager (I don’t know how on Debian) and reinstall it (better to get a .deb first).
Then
sudo systemctl enable NetworkManager.service
Reboot and hope for the best.
This has been happening for 2 years, with the previous debian version too, so I doubt this would do anything?
Have you been updating or reinstalling?
Because if it’s been upgrade on top of upgrade, it could come from that. In that case, maybe reinstall?
Updating. I’m willing to try your solution but I am a little bit worried about not being able to reinstall anything after I run
sudo apt remove network-manager
Why would a package reinstallation help? Wouldn’t resetting the config files be more efficient, by the way?
EDIT: It’s not upgrade on top of upgrade: there was just bullseye (first testing, then stable), and then I recently moved to bookworm. But the problem has been there from the start. It’s not too much of a pain because it’s rare, but it still bugs me.
Thing is, I really haven’t used debian based distros for the better part of the last two years so I’m not sure how to reinstall it if something goes south. With arch you just have to do a pacstrap with a liveUSB.
So… it seems kinda dangerous if you don’t have a backup .deb. I’m not sure I would advise you to go this way.
I looked at your journalctl. The error might come from your wireless card. If that is the case, and since you don’t use it at all there is a simple trick :
sudo systemctl disable wpa_supplicant
then reboot.
It won’t have any effect on the Ethernet connection, but it will more or less disable your wifi card. (Not exactly, but you get the gist of it.)
If I’m right it should make all of your problems go away. It might be worth a try. And if it doesn’t work a simple
sudo systemctl enable wpa_supplicant
will revert it to the way it was.
It’s still a pain, even if it’s not a daily thing.
If it’s been happening for multiple years and OSes, maybe your network card is dead/dying? Buy a new network card and see if that helps?
Everything is 2 yo, so this would mean the mobo (well, the onboard ethernet thing) was malfunctioning from the start. Maybe!
I might try disabling the onboard Ethernet and using the onboard wifi chip temporarily instead, just to see if I notice a new freeze. The issue is, I’ve never understood what triggers it, and it’s quite rare (less than once a week), so it’s really annoying to debug…
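Check your system logs, such as dmesg and journalctl, immediately after the freeze (if it’s still occurring). You could filter the journalctl log to show, say, the last 5 minutes of high-priority messages from the previous boot, like this:
journalctl --boot=-1 --since="5 min ago" --priority=0..3
Note that --since is relative to the current time, so this only returns something if you run it within 5 minutes of the freeze; otherwise drop --since or give it an explicit timestamp.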
It happened yesterday, and here are the latest log lines before the freeze:
Sep 14 23:30:30 licorne NetworkManager[1291]: [1694727030.1207] device (wlp4s0): set-hw-addr: set MAC address to CA:D0:86:5F:F9:85 (scanning)
Sep 14 23:30:30 licorne NetworkManager[1291]: [1694727030.1478] device (wlp4s0): supplicant interface state: inactive -> disconnected
Sep 14 23:30:30 licorne NetworkManager[1291]: [1694727030.1478] device (p2p-dev-wlp4s0): supplicant management interface state: inactive -> disconnected
Sep 14 23:30:30 licorne NetworkManager[1291]: [1694727030.1530] device (wlp4s0): supplicant interface state: disconnected -> inactive
Sep 14 23:30:30 licorne NetworkManager[1291]: [1694727030.1530] device (p2p-dev-wlp4s0): supplicant management interface state: disconnected -> inactive
Sep 14 23:30:58 licorne syncthing[3169286]: [VY2L4] INFO: Established secure connection to REDACTED1 at [::]:22000-192.168.0.14:22000/quic-client/TLS1.3-TLS_CHACHA20_POLY1305_SHA256/LAN-P20
Sep 14 23:30:58 licorne syncthing[3169286]: [VY2L4] INFO: Device REDACTED1 client is "syncthing v1.23.4" named "REDACTED2.lan" at [::]:22000-192.168.0.14:22000/quic-client/TLS1.3-TLS_CHACHA20_POLY1305_SHA256/LAN-P20
Sep 14 23:31:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
Sep 14 23:31:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
Sep 14 23:31:11 licorne syncthing[3169286]: [VY2L4] INFO: Established secure connection to REDACTED1 at 192.168.0.98:22000-192.168.0.14:22000/tcp-client/TLS1.3-TLS_AES_128_GCM_SHA256/LAN-P10
Sep 14 23:31:11 licorne syncthing[3169286]: [VY2L4] INFO: Replacing old connection [::]:22000-192.168.0.14:22000/quic-client/TLS1.3-TLS_CHACHA20_POLY1305_SHA256/LAN-P20 with 192.168.0.98:22000-192.168.0.14:22000/tcp-client/TLS1.3-TLS_AES_128_GCM_SHA256/LAN-P10 for REDACTED1
Sep 14 23:31:11 licorne syncthing[3169286]: [VY2L4] INFO: Connection to REDACTED1 at [::]:22000-192.168.0.14:22000/quic-client/TLS1.3-TLS_CHACHA20_POLY1305_SHA256/LAN-P20 closed: replacing connection
Sep 14 23:31:11 licorne syncthing[3169286]: [VY2L4] INFO: Device REDACTED1 client is "syncthing v1.23.4" named "REDACTED2.lan" at 192.168.0.98:22000-192.168.0.14:22000/tcp-client/TLS1.3-TLS_AES_128_GCM_SHA256/LAN-P10
Sep 14 23:32:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
Sep 14 23:32:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
Sep 14 23:33:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
Sep 14 23:33:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
Sep 14 23:33:28 licorne systemd[1]: Started anacron.service - Run anacron jobs.
Sep 14 23:33:28 licorne anacron[4171587]: Anacron 2.3 started on 2023-09-14
Sep 14 23:33:28 licorne anacron[4171587]: Normal exit (0 jobs run)
Sep 14 23:33:28 licorne systemd[1]: anacron.service: Deactivated successfully.
Sep 14 23:34:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
Sep 14 23:34:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
Sep 14 23:35:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
Sep 14 23:35:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
Sep 14 23:36:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
Sep 14 23:36:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
Sep 14 23:37:04 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
Sep 14 23:37:04 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
Sep 14 23:37:25 licorne NetworkManager[1291]: [1694727445.1045] device (wlp4s0): set-hw-addr: set MAC address to EE:65:E2:6E:73:D1 (scanning)
Sep 14 23:38:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
Sep 14 23:38:03 licorne rtkit-daemon[1541]: Supervising 4 threads of 4 processes of 1 users.
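Most of that excerpt is the once-a-minute rtkit-daemon status line, which makes the few other entries harder to spot. A small sketch of a filter that drops those lines; the function name drop_noise is made up, and the pattern only targets the exact message seen above.

```shell
# drop_noise: read journal lines on stdin and drop the periodic
# rtkit-daemon "Supervising ..." status messages; everything else passes through.
# (Hypothetical helper name; the pattern is based on the log excerpt above.)
drop_noise() {
    grep -v 'rtkit-daemon\[[0-9]*\]: Supervising'
}

# Example use after a freeze and reboot (previous boot, last 40 remaining lines):
#   journalctl --boot=-1 --no-pager | drop_noise | tail -n 40
```

On the excerpt above, this leaves the NetworkManager, syncthing, and anacron entries, with the wlp4s0 MAC-address scan at 23:37:25 as the last non-rtkit line before the freeze.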
Anything interesting in your logs?
See here: https://sh.itjust.works/comment/3302216
Do you by any chance use flameshot?
Nope. May I ask what would be the connection with my issues?
I had a similar issue which seemed to pop up anytime from an hour to a couple days after I used flameshot. It took me a long time to figure out what was triggering it. I stopped using flameshot and the freezes stopped. I’ve mentioned this a couple other times to people who ended up having the same problems and fix. But if you aren’t using it, I don’t have anything else to suggest.
Is it possible that the freeze you’re seeing on that machine is actually caused by a network failure, rather than the other way around?
I have many times encountered what appears to be a system freeze but is actually the result of background processes trying to access a network resource which no longer exists (e.g., a disk mounted via a VPN connection, where the VPN has dropped out).
I think this is unlikely because it’s only this specific device that crashes, and the others are fine?