Mark Lucovsky (@marklucovsky)
In David Crawshaw’s (@davidcrawshaw) recent post “The agent principal-agent problem” there’s a lot of insight beneath the headline “Code review is broken.” Worth reading carefully.
Toward the end, David reflects on what he calls the old “cowboy” development culture at Microsoft in the 80s/90s. Not much has been written about that era, mostly because there was no social media, no laptops everywhere, no phones recording daily engineering life.
A few thoughts from someone who lived it.
Back then, formal code review was not our primary line of defense. Our biggest daily problem wasn’t “is this algorithm theoretically perfect?” It was:
Will the full system compile?
Will it link?
Will it boot?
Will it survive stress?
Pre-Win2k we used an internal source control system called SLM (“slime”). No branching. Filesystem-based. Extremely brittle.
To build a bootable NT system you needed 100+ SLM projects welded into arbitrary places in the tree. Getting a machine synced could take 3+ hours. You literally ran sync in a loop until you got no new files and no errors.
Then came the build.
In the NT 3.1 timeframe, a full system build on a capable machine might take ~5 hours.
By the Win2k era, full builds had stretched into the 14+ hour range — and this was before modern build farms or large-scale distributed compilation.
Those build times fundamentally shaped developer behavior.
Most developers avoided full-system builds entirely. They worked in tiny enlistments and borrowed objs/binaries from known-good systems, because rebuilding the entire world cost too much in both time and lost productivity.
The longer builds became, the more pressure there was to take shortcuts — and those shortcuts created endless opportunities for integration failures and subtle mistakes.
A broken build could easily waste days of engineering time. In bad stretches, you could go multiple days without a clean master build.
That approach worked… until someone changed a widely shared struct, renamed a field, added a property, tweaked a macro, or silently altered alignment assumptions somewhere deep in the system.
Best case:
parts of the system no longer compiled.
Next best:
they compiled but failed to link.
Worst case:
everything built successfully, but incompatible assumptions between old objs and newly compiled code poisoned the running system in ways that were extremely difficult to diagnose.
THIS was our daily battle:
not bad style,
not missing comments,
not minor logic bugs —
it was preserving system-wide build and runtime integrity across a massive codebase when most developers could not practically build the entire system locally.
Once we had builds that compiled, linked, and booted, the real work started.
Stress.
Every dev had at least two machines:
one for coding,
one for testing/stress.
We hammered systems continuously with unrealistic randomized load. Deadlocks. Pool corruption. Loader hangs. Resource exhaustion. “Hung, No Ready Threads.”
In the early days, the stress build was literally my build. I’d walk office-to-office in the morning checking which machines had died overnight and assign debugging work.
No remote debugging yet.
If someone needed your machine, you lost your office for hours.
Eventually we got remote.exe and centralized build/stress systems, but debugging was still brutal:
raw assembly,
minimal symbols,
hand-reconstructed stacks,
careful avoidance of paged-out memory because one wrong move killed the session.
That was the real engineering culture:
integration,
stress,
performance,
resource correctness,
system behavior under extreme load.
Most of the failures we chased would never have been caught by lightweight pre-commit review from someone inside your immediate group.