(this is a repost of my thoughts originally posted in a Twitter thread)
The setup behind the CVE-2024-3094 supply-chain attack is fascinating. I originally wanted to finish and share a tool to audit other OSS projects for anomalous contributor behavior, but I feel what I found trying to MVP it is way more interesting.
If you haven't already, please read the full @Openwall mailing list disclosure. The first advisory summary a friend shared with me was such a high-level overview that I initially grossly underestimated the sophistication of this attack. (link)
Hackers tend to be lazy. When I heard "fake identity", I was picturing automated "grammar fix" OSS contributions spread across many fake identities, farming activity on projects, with an attacker only assessing an identity for reputational value once it crossed some activity threshold.
It still seemed unlikely to fool project maintainers though. Even with newer technologies like ChatGPT, I thought this would need to be done at a scale that would leave some identifiable patterns in activity. Then I started to read the full original disclosure.
Suddenly I had a lot of questions. Why did sshd/OpenSSH load xz-utils if OpenSSH doesn't depend on it? As I understand now, upstream OpenSSH does not, but Linux distro packages often patch it to support systemd notification, and libsystemd links against liblzma from xz-utils. (still not 100% - please correct me if I am wrong!)
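You can sanity-check this on your own machine. Here's a rough sketch that just shells out to ldd (assuming a Linux box with sshd installed in the usual place; ldd resolves the full dependency tree, so liblzma shows up even though sshd doesn't link it directly):

```python
# Rough check: does the sshd binary on this machine (transitively) link liblzma?
# Assumes a Linux system with `ldd` available; the sshd path may differ per distro.
import shutil
import subprocess

sshd = shutil.which("sshd") or "/usr/sbin/sshd"
out = subprocess.run(["ldd", sshd], capture_output=True, text=True).stdout

for line in out.splitlines():
    if "lzma" in line or "systemd" in line:
        print(line.strip())
```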
My thought then was to audit other projects for anomalous contributor behavior - especially ones that may be an "unclear" dependency like this. But I was still confident the agg. stats of the backdoor commit author's git contributions would show patterns of automation too.
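The "agg. stats" I had in mind were nothing fancier than bucketing each author's commits by weekday and hour of day straight out of git log - the kind of histogram where farmed or automated activity might stick out. A rough sketch of that idea (run it from inside a clone of whatever repo you want to audit):

```python
# Sketch: per-author commit-time histograms from `git log`, the kind of
# aggregate stats where automated or farmed activity might leave a pattern.
import subprocess
from collections import Counter, defaultdict

log = subprocess.run(
    ["git", "log", "--all", "--format=%ae|%ad", "--date=format:%u %H"],
    capture_output=True, text=True, check=True,
).stdout

hours = defaultdict(Counter)   # author email -> Counter of commit hour-of-day
days = defaultdict(Counter)    # author email -> Counter of weekday (1 = Monday)

for line in log.splitlines():
    author, date = line.split("|", 1)
    day, hour = date.split()
    hours[author][int(hour)] += 1
    days[author][int(day)] += 1

for author, counter in sorted(hours.items(), key=lambda kv: -sum(kv[1].values())):
    total = sum(counter.values())
    print(f"{author}: {total} commits, busiest hours {counter.most_common(3)}")
```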
I started manually auditing the xz repo. Another surprise was reading the test file README in xz:
"Many of the files have been created by hand with a hex editor, thus there is no better "source code" than the files themselves."
With hindsight of the test file backdoor... 😅
When I looked for commits in other related projects adding new binary files, my first hit was a test fixture binary in zstd - also a compression lib!
The same commit also had automation to regenerate and detect the file changing. (link)
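Finding those commits is mostly a matter of asking git which added files it considers binary - in --numstat output, binary files get "-" for their line counts. A rough sketch of the search (run inside a clone):

```python
# Sketch: list commits that add files git considers binary. In --numstat output
# a binary file is reported as "-<TAB>-<TAB>path". Run inside a clone.
import subprocess

log = subprocess.run(
    ["git", "log", "--all", "--diff-filter=A", "--numstat",
     "--format=commit %H %ae"],
    capture_output=True, text=True, check=True,
).stdout

current = None
for line in log.splitlines():
    if line.startswith("commit "):
        current = line[len("commit "):]
    elif line.startswith("-\t-\t"):
        path = line.split("\t", 2)[2]
        print(f"{current}  {path}")
```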
I don't think this is by any means the single or most important factor that led to the attack, but I did want to show the two approaches in contrast, to highlight that there is a better way of doing this and that CI/test infra hygiene is worth continuously reviewing and improving.
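For reference, the "better way" is roughly this shape: keep a reviewable plain-text source next to the binary fixture and have CI regenerate and compare it. The file names below are made up, and byte-for-byte regeneration isn't guaranteed across every library version, but it shows the idea:

```python
# Sketch of a CI check: regenerate a committed compressed fixture from a plain
# source file and fail if the committed copy has drifted. File names here are
# hypothetical; the point is the fixture always has reviewable source.
import lzma
import sys
from pathlib import Path

SOURCE = Path("tests/fixtures/sample_input.txt")       # reviewable plain text
FIXTURE = Path("tests/fixtures/sample_input.txt.xz")   # committed binary fixture

regenerated = lzma.compress(SOURCE.read_bytes(), preset=6)

if FIXTURE.read_bytes() != regenerated:
    sys.exit(f"{FIXTURE} does not match regeneration from {SOURCE}; "
             "rebuild it (or explain the diff) before merging.")
print("fixture matches its source")
```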
I'm now so emotionally invested I want to start scripting something. I iterated by auditing a few lower-level library projects, adding new ideas as they came to me. I was also very eager to (and honestly, way, way too late...) start testing my script on the xz-utils repo!
I spent way too much time keeping it as a one-liner, but I now had something that could find each binary file in a repo, pull the commit author who last modified it along with agg. git stats, recursively extract it with binwalk / strings, and print an (ugly) plaintext report. (link)
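The real thing was an unreadable shell one-liner, but in spirit it was roughly this Python sketch (it assumes strings and binwalk are on the PATH, and the "binary" heuristic is just a NUL-byte probe):

```python
# Rough shape of the MVP: for every binary-looking tracked file, print who last
# touched it and run strings/binwalk over it. Run inside a clone.
import subprocess
from pathlib import Path

def looks_binary(path: Path, probe: int = 8192) -> bool:
    with path.open("rb") as fh:
        return b"\x00" in fh.read(probe)

files = subprocess.run(["git", "ls-files"], capture_output=True, text=True,
                       check=True).stdout.splitlines()

for name in files:
    path = Path(name)
    if not path.is_file() or not looks_binary(path):
        continue
    last = subprocess.run(
        ["git", "log", "-1", "--format=%ae %ad %h", "--", name],
        capture_output=True, text=True).stdout.strip()
    strings_out = subprocess.run(["strings", "-n", "8", name],
                                 capture_output=True, text=True).stdout
    binwalk_out = subprocess.run(["binwalk", name],
                                 capture_output=True, text=True).stdout
    print(f"== {name}\n   last touched by: {last}")
    print("   strings (first 5):", strings_out.splitlines()[:5])
    print(binwalk_out)
```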
But as I was running my MVP script in the xz-utils repo, I realized that if this user was a 'fake identity' as suspected, the creator had been anything but lazy. This is by far the most work/time/persistence I've seen go into an attack that anyone can follow chronologically.
Factoring in the lack of any other online presence, as of now I would be incredibly surprised to learn this account was not created by the backdoor commit author, most likely with xz as a target to try to infiltrate.
Regardless, they have had a very busy past few years!
Since their first commit in Jan 2022, they have authored a total of 451 commits on the xz-utils main branch. That's 19% of all main branch commits in just over 2 years. The project's first commit (when it migrated to git) was over 16 years ago!
The other contributor has authored 76% of commits, incl. the first. So between them, 95% of all commits.
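Those percentages come straight from counting authors in git log on the main branch; something like this (branch name assumed to be master):

```python
# Sketch: per-author share of commits on the main branch.
# Assumes the branch is named "master"; run inside a clone.
import subprocess
from collections import Counter

authors = subprocess.run(
    ["git", "log", "master", "--format=%an"],
    capture_output=True, text=True, check=True,
).stdout.splitlines()

counts = Counter(authors)
total = len(authors)
for author, n in counts.most_common(5):
    print(f"{author:30s} {n:5d}  ({n / total:.0%})")
print(f"{'total':30s} {total:5d}")
```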
But their GH account was created in 2021! Before working on xz, they ... tried to make libarchive auto-download combinations of dependencies that didn't make sense 🤔
From 2022 onwards though, the focus was xz-utils, even representing it in other projects! In Google's oss-fuzz, they disabled the same compiler feature (ifunc) that their backdoor uses to intercept execution. And then they changed the primary contact, so any bugs it did manage to find went to them...🤔
I'll stop with all the 🤔 sorry 🙇♀️
Once they decided they were ready to launch their backdoor, they still checked every detail carefully. They injected their code via an unclear-but-uninteresting-looking build script addition that descrambled the test files in the project.
While that scrambling would already prevent tools like binwalk from properly identifying the machine code it contained, they went so far as to make the scrambled backdoor test file have artifacts similar to the others. Shown is a diff of the strings output between the backdoor file and another test file.
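The comparison itself is nothing clever - just diffing the strings output of two fixtures. The file names below are placeholders for the suspicious file and a known-good older one:

```python
# Sketch: diff the printable strings of two binary test files to spot one that
# doesn't look like its neighbors. File names are placeholders.
import difflib
import subprocess

def printable_strings(path: str) -> list[str]:
    # `strings` with its default minimum length of 4 printable characters
    return subprocess.run(["strings", path], capture_output=True,
                          text=True).stdout.splitlines()

a = printable_strings("tests/files/known_good_fixture.xz")   # placeholder
b = printable_strings("tests/files/suspicious_fixture.xz")   # placeholder

for line in difflib.unified_diff(a, b, "known_good", "suspicious", lineterm=""):
    print(line)
```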
Last one! When I was skimming the binwalk outputs, I thought I'd ID'd another backdoor payload when it found an xz-compressed x86 binary with a different name. It turns out this has been there since 2009, with the context explained in the git commit message.
I wondered why they would risk adding 2 new test files that didn't even get used. The disclosure actually mentions 5.6.0 and 5.6.1 being vulnerable with different payloads. This is how the original backdoor payloads were added... hiding in plain sight 🥲
The repo is currently unavailable, but in an earlier PR I found via the Web Archive, they were merging their own changes without review as early as 2023. I unfortunately couldn't get to the PRs for the backdoors, but I wonder if that PR had any review at all. (link)
At the time of writing, they are still listed as a co-maintainer on the sponsoring project's website. My point isn't to goof on the project, but rather to highlight the level of trust and access they achieved while infiltrating it. (link)
I wonder how many other high-effort "fake identities" are still in the infiltration stage, building trust with maintainers of other quiet or older projects that are a valuable target for attackers but aren't necessarily understood as one.
If the injected code had been more conservative in selecting targets, and hadn't had a performance impact so significant that someone who (in their own words) "is not a security researcher/engineer" began to investigate, how long could this have gone undetected?
It feels very lucky that it was discovered at the stage it was. I hope with this attack on people's minds, other OSS projects in similar positions consider doing tabletop scenario exercises for this kind of attack and how they can prevent/detect it. Thanks for reading!