moopi
@moopidoopi
I write about how to program your life, leverage technology, and make time to play.


On Tuesday morning my dependency audit caught Axios. Axios. 300 million weekly downloads. The HTTP library in every JavaScript project since 2016. The one nobody audits because auditing Axios is like auditing gravity. It was there before you got hired.

I am a security engineer at a company that runs 14,000 npm packages in production. I know the number because I counted them last year. I do not know what most of them do. Nobody does.

My audit runs every Tuesday morning. It takes eleven minutes. Eleven minutes is the only thing between us and whatever is in those packages. Most weeks it catches nothing. Most weeks I call that a clean bill of health.

My audit runs every Tuesday morning. It takes eleven minutes. The malicious versions had been live on npm for hours. Not days. Hours. They dropped a remote access trojan. Not a sophisticated one. Not a nation-state zero-day. A trojan. In Axios. It just needs to be in the right package. Axios is in every package.

I reported it to our incident response team at 9:14 AM. By 9:16 AM I had confirmation we'd pulled the affected version. By 9:23 AM I learned that our staging environment had already installed it. Automatically. At 6:07 PM. Monday evening. While everyone was going home.

Here is what happened at 6:07 PM on Monday. Our dependency bot checked for updates. The bot is called Renovate. The bot runs after work hours. It runs after work hours because running it during business hours slows down CI for the engineers. So we moved it to 6 PM. When nobody is watching.

The bot found a new version of Axios. The bot opened a pull request. The pull request was auto-merged because Axios is on our trusted list. I approved the trusted list. Eight months ago. I reviewed it for about as long as I review the 14,000 packages. Axios is on the list because it has 300 million weekly downloads. 300 million weekly downloads means it's safe. Except when it isn't.

At 6:08 PM the CI pipeline ran. All tests passed.
The tests passed because the trojan doesn't break tests. The trojan breaks trust. Trust is not a test case.

At 6:08 PM the deployment pipeline triggered. It deployed to staging-east-2. At 6:09 PM the trojan phoned home. At 6:11 PM it began beaconing to a command server. At 6:14 PM it began enumerating environment variables. At 6:15 PM it found the database credentials. At 6:16 PM it found the API keys. All of them. At 6:18 PM it found the Stripe production token. There are 2.4 million customer records behind that token.

At 6:19 PM it found the treasury wallet private keys. We process crypto payouts for enterprise clients. Not the main product. A feature. The keys were in an environment variable. Not encrypted. Not in a vault. In a .env file committed in 2021. Someone left a comment above them. "TODO: move to HSM." The TODO is four years old.

At 6:20 PM the wallet started draining. $2.1 million. Twelve transactions across three chains in ninety seconds. By 6:22 PM the funds were bridged, mixed, and scattered. Not gone like the credentials are gone. Gone like physics. A blockchain cannot be rotated.

At 6:23 PM the exfiltration completed. Sixteen minutes. Nobody was watching. Everyone was on the train. In the parking lot. Picking up their kids. The systems were still at work.

The systems did exactly what we told them to do. What I told them to do. The bot checked for updates as designed. The auto-merge triggered as designed. The tests passed as designed. The deployment ran as designed. The trojan installed as designed. The credentials left the building as designed. Every system worked exactly as it was supposed to. That's the problem.

We pulled the affected version Tuesday at 9:16 AM. Fifteen hours later. Pulling the version doesn't un-send the data. The database credentials are on a server we will never find. The API keys are on a server we will never find. The Stripe token connected to 2.4 million customers is on a server we will never find.
We can rotate the credentials. We did rotate the credentials. It took fourteen hours. During those fourteen hours we did not know what was being accessed with the old ones. We still don't.

We cannot rotate a blockchain. The $2.1 million is not in an account we can freeze. It is not in a bank we can subpoena. It is on a ledger where theft is permanent.

Our CFO asked me when we'd recover the funds. I told her the funds are mathematically irrecoverable. She asked me what "mathematically" means in this context. It means the technology is working exactly as designed. She left the call.

I sat there. Then I opened the dependency manifest. Not because I found something in those 14,000 packages. Because I realized I'd never actually looked. I am the person whose job it is to look. I had not looked. I marked the ticket Done.

Here is what I found when I looked.

Package 4,211 hadn't been updated in three years. Its maintainer's GitHub account had been inactive for two. Their last commit message said "finally done with this." I don't know if they meant the package or the industry. Their code still runs on our servers every day.

Package 7,408 was a dependency of a dependency of a dependency. Nobody in the company had ever typed its name. Nobody in the company knew it existed. It had full access to our file system.

Package 9,002 was called "request-utils." It had 14 downloads per week. Its maintainer hasn't logged into npm in six months. Their email domain expired three months ago. The code stays. The access stays. The maintainer disappears. Anyone who buys that email domain can reset their npm password. It's still in our production build.

I found a package called "config-handler" that was added in 2019. The person who added it left the company in 2020. The Jira ticket that approved it said "Reviewed: No Issues Found." The reviewer was the same person who added it. They reviewed their own dependency. Then they left. The dependency stayed.

I found a package called "event-pipe" whose maintainer's email domain expired last year. Expired domains can be purchased. Anyone who buys that domain can reset the npm password. Anyone who resets the npm password can push a new version. Anyone who pushes a new version will be auto-installed by our bot at 6 PM. I checked. The domain costs $11. Our production environment is eleven dollars away from the next Axios.

I found a package called "log-sanitizer" that pins a version of a package that pins a version of a package that uses Axios. Three levels deep. It has a postinstall script. A postinstall script runs code on your machine the moment you install the package. Not when you use it. When you install it. Before you can read it. Before you can review it. Before you know what it does.

I read the postinstall script. It downloads a second script from a URL. The URL is still live. I did not visit the URL. I do not know what the second script does. Nobody does.

This package has been in our production build for three years. The postinstall script has run on every developer machine in the company. Every CI runner. Every staging server. Every production deployment. For three years. Including my machine. The laptop I used to run Tuesday's audit has been executing unknown code from an unreviewed URL since 2023.

I am auditing the fire from inside the building. I do not know if my machine is compromised. I do not know if the audit I ran on Tuesday was run on a clean system. I do not know if the results I'm reading right now are the real results. I ran the tool that checks for breaches on a machine that may already be breached. This is the security.

If I hadn't audited Axios I would never have known. I only audited Axios because Axios got caught. The other 13,999 packages have not been caught. Nobody has looked.

My manager asked me to write a post-mortem. I wrote it.
The root cause section says "a compromised version of a trusted dependency was automatically installed via our standard pipeline." Every word of that sentence means "we did this to ourselves on purpose."

He asked me to add a "Lessons Learned" section. I wrote: "Implement manual review gates for critical dependencies." We will not implement manual review gates. Manual review gates would slow down deployments. Deployments are a metric. Metrics go in dashboards. Dashboards go in quarterly reviews. Slowing down deployments does not go in quarterly reviews.

We have a thing called a "quarterly dependency review." It is a Jira ticket. The ticket is assigned to me. The ticket has been marked "Done" four quarters in a row. I mark it done every quarter. I do not review 14,000 packages every quarter. I run the eleven-minute audit. The eleven-minute audit checks for known vulnerabilities. It does not check for unknown ones. Unknown vulnerabilities are not in the database. They are in the code. The code is in the packages. The packages are in production. Production is everyone's problem. Everyone's problem is nobody's job. I looked. It is technically my job. I wish I hadn't.

After the incident I joined a Slack channel called #supply-chain-security. It has 340 members. The last message before mine was from November. Someone had posted an article about the Log4j anniversary. It had two emoji reactions. One was a skull. The other was a pizza slice because it was posted on a Friday.

We built a system that trusts strangers by default and requires paperwork to trust each other. Open source means anyone can read the code. It does not mean anyone does.

We have 14,000 packages in production. I can name eleven. The bot that installs the other 13,989 runs every evening at 6 PM. Right when I leave. It doesn't read code. It reads version numbers. The version number said this was fine. Nobody checks what the version number means.

Last night I was packing up at 5:58 PM.
I saw the Renovate job queued in the pipeline dashboard. Two minutes. I watched it start. I watched it pull a new version of something I didn't recognize. I watched it auto-merge. I picked up my bag and walked to the elevator. The bot was still running when the doors closed.

Tomorrow the Jira ticket will come around again. I will mark the ticket Done.
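
The postinstall problem is one of the few things in this story that can be looked at in well under eleven minutes. What follows is a minimal sketch, not anything the company in the post actually runs. It assumes dependencies are installed in a local node_modules directory, and the function name find_install_scripts is made up for illustration. It lists every installed manifest that declares one of npm's install-time lifecycle hooks (preinstall, install, postinstall). It reads manifests only. It does not run or fetch anything, and it cannot say whether a script is malicious.

```python
import json
from pathlib import Path

# npm lifecycle hooks that run automatically while a package is being installed.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def find_install_scripts(node_modules: Path) -> dict[str, dict[str, str]]:
    """Map each package directory (relative to node_modules) to the
    install-time hooks its package.json declares.

    Hypothetical helper for illustration: it only reads manifests and
    never executes or downloads anything.
    """
    found: dict[str, dict[str, str]] = {}
    for manifest in node_modules.rglob("package.json"):
        try:
            data = json.loads(manifest.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or malformed manifest: skip it
        scripts = data.get("scripts") if isinstance(data, dict) else None
        if not isinstance(scripts, dict):
            continue
        hooks = {
            hook: scripts[hook]
            for hook in INSTALL_HOOKS
            if isinstance(scripts.get(hook), str)
        }
        if hooks:
            # Key by directory so nested copies of one package stay distinct.
            key = manifest.parent.relative_to(node_modules).as_posix()
            found[key] = hooks
    return found
```

Calling find_install_scripts(Path("node_modules")) returns a dictionary to read through by hand. Because it walks every package.json under the tree, it can also pick up manifests that packages ship as fixtures, so expect some noise. Installing with npm's --ignore-scripts flag stops these hooks from running at all, at the cost of breaking packages that genuinely need a build step.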

@JoeyRavioli777 I was smart enough to ride it down from $69k to $15k buying the whole way down, and I’m smart enough to do it again from $126k to $40k. And eventually I’ll still be smart enough to buy it down from $250k to $75k too.

Finally achieved the holy trinity of longevity. Home Gym. In home dry sauna. In home Hyperbaric chamber. Red light therapy. 5-6x a week. I'm living to 300 or going to die trying.