• Kissaki@programming.dev
    2 months ago

    It’s a systemic, multi-layered problem.

    The simplest, least-effort measure that could have limited the scale of the damage is not to install updates automatically, but to wait four days and then trigger the installation if no issues have been reported.
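
    As a rough illustration of that policy, here is a minimal Python sketch of a deferral gate. The four-day soak period comes from the comment above; the function name should_install and the idea of counting reported issues are assumptions for illustration, not any vendor's actual update mechanism.

```python
from datetime import datetime, timedelta, timezone

# Assumed soak period: the four-day wait suggested above.
SOAK_PERIOD = timedelta(days=4)


def should_install(released_at: datetime, now: datetime, reported_issues: int) -> bool:
    """Return True once the soak period has passed with no reported issues."""
    if now - released_at < SOAK_PERIOD:
        return False  # still inside the waiting window: hold the update
    # Window has passed: install only if nobody reported problems meanwhile.
    return reported_issues == 0


release = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(should_install(release, release + timedelta(days=1), reported_issues=0))  # False
print(should_install(release, release + timedelta(days=5), reported_issues=0))  # True
print(should_install(release, release + timedelta(days=5), reported_issues=3))  # False
```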

    Automatically forwarding updates also forwards risk. The larger the potential impact, the more worthwhile safeguards become.

    Testing/staging environments or successive partial rollouts could also have mitigated much of the damage, but they require more investment.
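
    A successive partial (ring-based) rollout can be sketched as follows, assuming each ring reports an observed failure rate after receiving the update. The ring names, fleet fractions, crash rates, and the 1% halt threshold are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Ring:
    name: str
    fleet_fraction: float  # cumulative share of the fleet updated once this ring is done


def staged_rollout(
    rings: List[Ring],
    failure_rate: Callable[[Ring], float],
    max_failure_rate: float = 0.01,
) -> Tuple[Optional[str], float]:
    """Push the update ring by ring, halting at the first ring that fails too often.

    Returns (name of the ring where the rollout halted, or None; share of fleet exposed).
    """
    exposed = 0.0
    for ring in rings:
        exposed = ring.fleet_fraction  # this ring now has the update
        if failure_rate(ring) > max_failure_rate:
            return ring.name, exposed  # stop: later, larger rings never get it
    return None, exposed


rings = [
    Ring("internal", 0.01),
    Ring("canary", 0.05),
    Ring("early", 0.25),
    Ring("broad", 1.0),
]
# Hypothetical observed crash rates per ring; the canary ring is where trouble shows up.
observed = {"internal": 0.0, "canary": 0.4, "early": 0.0, "broad": 0.0}
print(staged_rollout(rings, lambda r: observed[r.name]))  # ('canary', 0.05)
```

    The point of the design is that a bad update stops at the first ring that shows problems, so only that ring's share of the fleet is affected instead of all of it.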

    • wizardbeard@lemmy.dbzer0.com
      2 months ago

      The update that crashed things was an anti-malware definitions update. CrowdStrike offers no way to delay or stage those (they are downloaded automatically as soon as they are available), and there’s good reason not to delay definition updates, since doing so leaves you vulnerable to known malware for longer.

    • Gestrid@lemmy.ca
      2 months ago

      Delaying malware definition updates by four days is how computers get infected with malware. But you’re right that they should at least run some sort of simple test: “Does the machine boot, and are its files safe from overzealous deletion?”
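
      The simple pre-release check described above could look something like this minimal Python sketch. It assumes each test machine reports whether it booted and how many files it had before and after the update; MachineReport, passes_smoke_test, and the file-count comparison are hypothetical, and a file count is only a crude proxy for detecting overzealous deletions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MachineReport:
    """Hypothetical report from one test machine after applying an update."""
    booted: bool
    files_before: int
    files_after: int


def passes_smoke_test(reports: List[MachineReport], max_files_lost: int = 0) -> bool:
    """Pass only if every test machine booted and lost at most max_files_lost files."""
    if not reports:
        return False  # no test results is not a pass
    for report in reports:
        if not report.booted:
            return False  # "Does the machine boot?"
        if report.files_before - report.files_after > max_files_lost:
            return False  # crude proxy for "overzealously deleted" files
    return True


healthy = MachineReport(booted=True, files_before=1000, files_after=1000)
crashed = MachineReport(booted=False, files_before=1000, files_after=1000)
print(passes_smoke_test([healthy, healthy]))  # True
print(passes_smoke_test([healthy, crashed]))  # False
```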

      • Kissaki@programming.dev
        2 months ago

        One of the fixes was deleting a driver file in System32. Is a Windows driver how they update definitions?

        • Gestrid@lemmy.ca
          2 months ago

          The driver was installed on the computer by the security company. It looks for and blocks threats coming in via the internet or intranet.

          The definitions update included a driver update, and most of the computers running the software were configured to restart automatically to install it. Unfortunately, the faulty driver update caused those computers to BSOD and enter a boot loop.

          Because of the boot loop, the faulty driver could only be removed manually by booting into Safe Mode. (That’s the file deletion you saw.) After exiting Safe Mode, the corrected driver, which they released once they discovered the bug, could then ideally be installed normally.