Drew

8.1K posts

@js4drew

Applied AI/ML Engineer | Freelancer | prev: Accenture, Sandia Labs, Berkeley | Building Tim: https://t.co/rF54wp3Nk2

Joined June 2024
398 Following · 453 Followers
Pinned Tweet
Drew @js4drew
day 1: was overthinking what to post as my first video. here it is: quit my job, building in public
Chayenne Zhao @GenAI_is_real
Today I read a lengthy piece on Harness Engineering: tens of thousands of words, almost certainly AI-written. My first reaction wasn't "wow, what a powerful concept." It was "do these people have any ideas beyond coining new terms for old ones?"

I've always been annoyed by this pattern in the AI world: the constant reinvention of existing concepts. From prompt engineering to context engineering, now to harness engineering. Every few months someone coins a new term, writes a 10,000-word essay, sprinkles in a few big-company case studies, and the whole community starts buzzing. But if you actually look at the content, it's the same thing every time: Design the environment your model runs in (what information it receives, what tools it can use, how errors get intercepted, how memory is managed across sessions). This has existed since the day ChatGPT launched. It doesn't become a new discipline just because someone, for whatever reason, decided to give it a new name.

That said, complaints aside, the research and case studies cited in the article do have value, especially since they overlap heavily with what I've been building with how-to-sglang. So let me use this as an opportunity to talk about the mistakes I've actually made.

Some background first. The most common requests in the SGLang community are How-to Questions: how to deploy DeepSeek-V3 on 8 GPUs, what to do when the gateway can't reach the worker address, whether the gap between GLM-5 INT4 and official FP8 is significant. These questions span an extremely wide technical surface, and as the community grows faster and faster, we increasingly can't keep up with replies. So I started building a multi-agent system to answer them automatically.

The first idea was, of course, the most naive one: build a single omniscient Agent, stuff all of SGLang's docs, code, and cookbooks into it, and let it answer everything. That didn't work.

You don't need harness engineering theory to explain why: the context window isn't RAM. The more you stuff into it, the more the model's attention scatters and the worse the answers get. An Agent trying to simultaneously understand quantization, PD disaggregation, diffusion serving, and hardware compatibility ends up understanding none of them deeply.

The design we eventually landed on is a multi-layered sub-domain expert architecture. SGLang's documentation already has natural functional boundaries (advanced features, platforms, supported models), with cookbooks organized by model. We turned each sub-domain into an independent expert agent, with an Expert Debating Manager responsible for receiving questions, decomposing them into sub-questions, consulting the Expert Routing Table to activate the right agents, solving in parallel, then synthesizing answers.

Looking back, this design maps almost perfectly onto the patterns the harness engineering community advocates. But when I was building it, I had no idea these patterns had names. And I didn't need to.

1. Progressive disclosure: we didn't dump all documentation into any single agent. Each domain expert loads only its own domain knowledge, and the Manager decides who to activate based on the question type. My gut feeling is that this design yielded far more improvement than swapping in a stronger model ever did. You don't need to know this is called "progressive disclosure" to make this decision. You just need to have tried the "stuff everything in" approach once and watched it fail.

2. Repository as source of truth: the entire workflow lives in the how-to-sglang repo. All expert agents draw their knowledge from markdown files inside the repo, with no dependency on external documents or verbal agreements. Early on, we had the urge to write one massive sglang-maintain.md covering everything. We quickly learned that doesn't work. OpenAI's Codex team made the same mistake: they tried a single oversized AGENTS.md and watched it rot in predictable ways. You don't need to have read their blog to step on this landmine yourself. It's the classic software engineering problem of "monolithic docs always go stale," except in an agent context the consequences are worse: stale documentation doesn't just go unread, it actively misleads the agent.

3. Structured routing: the Expert Routing Table explicitly maps question types to agents. A question about GLM-5 INT4 activates both the Cookbook Domain Expert and the Quantization Domain Expert simultaneously. The Manager doesn't guess; it follows a structured index. The harness engineering crowd calls this "mechanized constraints." I call it normal engineering.

I'm not saying the ideas behind harness engineering are bad. The cited research is solid, the ACI concept from SWE-agent is genuinely worth knowing, and Anthropic's dual-agent architecture (initializer agent + coding agent) is valuable reference material for anyone doing long-horizon tasks. What I find tiresome is the constant coining of new terms: packaging established engineering common sense as a new discipline, then manufacturing anxiety around "you're behind if you don't know this word." Prompt engineering, context engineering, harness engineering: they're different facets of the same thing. Next month someone will probably coin scaffold engineering or orchestration engineering, write another lengthy essay citing the same SWE-agent paper, and the community will start another cycle of amplification.

What I actually learned from how-to-sglang can be stated without any new vocabulary:

- Information fed to agents should be minimal and precise, not maximal.
- Complex systems should be split into specialized sub-modules, not built as omniscient agents.
- All knowledge must live in the repo; verbal agreements don't exist.
- Routing and constraints must be structural, not left to the agent's judgment.
- Feedback loops should be as tight as possible: we currently use a logging system to record the full reasoning chain of every query, and we've started using Codex for LLM-as-a-judge verification, but we're still far from ideal.

None of this is new. In traditional software engineering, these are called separation of concerns, single responsibility principle, docs-as-code, and shift-left constraints. We're just applying them to LLM work environments now, and some people feel that warrants a new name.

I don't know how many more new terms this field will produce. But I do know that, at least today, we've never achieved a qualitative leap on how-to-sglang by swapping in a stronger model. What actually drove breakthroughs was always improvements at the environment level: more precise knowledge partitioning, better routing logic, tighter feedback loops. Whether you call it harness engineering, context engineering, or nothing at all, it's just good engineering practice. Nothing more, nothing less.

There is one question I genuinely haven't figured out: if model capabilities keep scaling exponentially, will there come a day when models are strong enough to build their own environments? I had this exact confusion when observing OpenClaw: it went from 400K lines to a million in a single month, driven entirely by AI itself. Who built that project's environment? A human, or the AI? And if it was the AI, how many of the design principles we're discussing today will be completely irrelevant in two years?

I don't know. But at least today, across every instance of real practice I can observe, this is still human work, and the most valuable kind.
[image]
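The sub-domain expert design described in the post (an Expert Routing Table, per-domain knowledge loading, parallel solving, then synthesis) can be sketched in a few lines of Python. This is a hypothetical illustration, not how-to-sglang's actual code: the doc paths, routing keywords, and the `Expert`, `route`, and `manage` names are assumptions, keyword matching stands in for real question-type classification, and every LLM call is stubbed out.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# Hypothetical in-memory stand-in for markdown files checked into the repo
# (the repo is the single source of truth; nothing is loaded from elsewhere).
REPO_DOCS = {
    "experts/quantization/int4.md": "Notes on INT4 checkpoints ...",
    "experts/quantization/fp8.md": "Notes on FP8 checkpoints ...",
    "experts/cookbook/glm5.md": "GLM-5 deployment recipe ...",
    "experts/platforms/multi_gpu.md": "Multi-GPU deployment notes ...",
}


@dataclass
class Expert:
    name: str
    doc_prefix: str  # this expert may only read docs under this prefix

    def load_context(self) -> str:
        # Progressive disclosure: load this expert's own domain docs, nothing else.
        return "\n".join(
            text for path, text in sorted(REPO_DOCS.items())
            if path.startswith(self.doc_prefix)
        )

    def answer(self, sub_question: str) -> str:
        context = self.load_context()
        # Stub for an LLM call that sees only `context` plus the sub-question.
        return f"[{self.name}] {sub_question} (context: {len(context)} chars)"


EXPERTS = {
    "quantization": Expert("Quantization Domain Expert", "experts/quantization/"),
    "cookbook": Expert("Cookbook Domain Expert", "experts/cookbook/"),
    "platforms": Expert("Platforms Domain Expert", "experts/platforms/"),
}

# Hypothetical Expert Routing Table: explicit keyword -> expert keys.
ROUTING_TABLE = {
    "int4": ["quantization"],
    "fp8": ["quantization"],
    "glm-5": ["cookbook"],
    "deepseek-v3": ["cookbook"],
    "gpu": ["platforms"],
}


def route(question: str) -> list:
    """Walk the table in a fixed order; the manager never guesses."""
    q = question.lower()
    keys = []
    for keyword, expert_keys in ROUTING_TABLE.items():
        if keyword in q:
            for key in expert_keys:
                if key not in keys:
                    keys.append(key)
    return keys


def manage(question: str) -> str:
    keys = route(question)
    if not keys:
        return "No expert matched; escalate to a human."
    # Simplification: each activated expert gets the question tagged with its
    # domain, instead of a real LLM-driven decomposition into sub-questions.
    sub_questions = [(k, f"{question} [focus: {k}]") for k in keys]
    with ThreadPoolExecutor() as pool:
        answers = list(pool.map(lambda kq: EXPERTS[kq[0]].answer(kq[1]), sub_questions))
    # Stub synthesis: a real manager would ask an LLM to merge these answers.
    return "\n".join(answers)
```

Calling `manage("Is GLM-5 INT4 much worse than official FP8?")` activates the Quantization and Cookbook experts, mirroring the GLM-5 INT4 example in the post, while each expert only ever sees its own slice of the docs. Keeping the table as plain data means a routing change is a reviewable diff in the repo rather than a prompt tweak.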
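The feedback loop Chayenne describes (record the full reasoning chain of every query, then verify answers with an LLM-as-a-judge) could look roughly like the following sketch. Everything here is an assumption for illustration: `QueryTrace`, the JSONL record layout, and `keyword_judge` are hypothetical, and the toy judge is a plain string check standing in for a real model call such as the Codex-based verification the post mentions.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Callable, Optional

# Hypothetical judge signature: (question, answer) -> verdict dict.
Judge = Callable[[str, str], dict]


@dataclass
class QueryTrace:
    """Full reasoning chain for one query, kept so failures can be replayed."""
    query_id: str
    question: str
    steps: list = field(default_factory=list)
    final_answer: str = ""
    verdict: Optional[dict] = None

    def log(self, stage: str, detail: str) -> None:
        # One entry per step: routing decision, expert output, synthesis, ...
        self.steps.append({"ts": time.time(), "stage": stage, "detail": detail})

    def finish(self, answer: str, judge: Judge) -> dict:
        # Close the loop: store the answer together with an automated verdict.
        self.final_answer = answer
        self.verdict = judge(self.question, answer)
        return self.verdict

    def to_jsonl(self) -> str:
        # One JSON object per line, ready to append to a log file.
        return json.dumps(asdict(self), ensure_ascii=False)


def keyword_judge(required: list) -> Judge:
    """Toy stand-in for an LLM judge: passes if every required term appears."""
    def judge(question: str, answer: str) -> dict:
        missing = [term for term in required if term not in answer]
        return {"pass": not missing, "missing": missing}
    return judge
```

In use, each stage calls `trace.log(...)`, `trace.finish(answer, judge)` runs once per query, and appending `trace.to_jsonl()` to a file gives a greppable record of every reasoning chain next to its verdict. Keeping the judge behind a plain callable means the toy check can later be swapped for a model-based judge without touching the logging code.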
Drew @js4drew
@yacineMTB my kids will be lil reggae heads
kache @yacineMTB
lmao my kid loves metal so much ahahahahahaha
Drew @js4drew
@joshpuckett @Duderichy wife and i have been inseparable since we met 8 years ago. we went sledding this day :)
[image]
joshpuckett @joshpuckett
@Duderichy Told my wife I was gonna marry her within a week of meeting. Did the opposite of whatever garbage advice that child is spewing. That was about 16 happy years ago (pic from a wedding this weekend).
[image]
Drew @js4drew
@_iamtpo yes dude, content creation is mentally exhausting. these days everyone can build, but not everyone can make good content about what they’re building
Temi @_iamtpo
It’s the hardest thing I’ve done and it’s the single thing that exposes if you actually have the knowledge of your subject or if it’s just passive understanding. But don’t give up! You’re reinforcing the muscles needed and it gets easier with time.
Quoting Aunty Teda @imoteda

In case you’ve wondered, this is why I don’t make content. This is me trying to film a short educative video and you can literally see how excited I started out and how stressed I got when I gave up. Packed the hair, fake smile. Content creators are truly doing the lord’s work!

Drew @js4drew
@teodorio i told my wife that i wanted to do this on our last trip (portugal + spain) and she couldn’t fathom not having a return flight. now she is fully bought in to just vibing it out
teo @teodorio
I never buy return tickets, I just leave for Bali usually and then see where the road takes me. Can be 2 months, 6 months or more!
Quoting peach @33b345

Drew @js4drew
@benhylak i built a cute otter to monitor my spending, think this hits both.. his name’s Tim!
[image]
ben @benhylak
if you're not building any silly things right now, you are ngmi. if you're not building any serious things right now, i'm sorry, but you are also ngmi
Drew @js4drew
@teodorio this is exactly what i do: 2 tmux panes passing commit hashes back and forth
Drew @js4drew
@samswoora congrats! beautiful that the team means so much to you
Samswara @samswoora
Told my manager I’m leaving, started sobbing unfortunately, he told me congrats and he’s super happy for me which just made me start crying harder lol
Drew @js4drew
@yacinelearning did the same but with a geophysics class; ended up being my favorite class in college
Drew @js4drew
@kuberdenis real g’s move in silence like lasagna
Denislav Gavrilov @kuberdenis
Another great life hack is to stop sharing: whatever opportunities you have, you just don't share them, not even with your closest allies. This way the universe knows you are capable of handling success and it brings more your way; it's like you "earn" it in a way
Drew @js4drew
@fjzeit one shotters are in for a rude awakening
fj @fjzeit
there are two directions ahead:
* they make us irrelevant and our work ceases - in which case all that effort you put in nailing your processes and technology to 2025/26 agent orchestration chaos is useless and you’re flipping burgers at Maccas.
* they are full of shit and you’re still responsible for all the things and nobody is going to care you wrote stuff 100x faster when you don’t understand any of it.
only one of these outcomes is worth preparing for.
Drew @js4drew
@amritwt especially santa claude
amrit @amritwt
I dislike frontend engineering with a passion but sometimes LLMs do make it bearable
Drew @js4drew
@dejavucoder it might be the best learning/exploration tool ever
sankalp @dejavucoder
+1 on this. i have a suggestion: get a claude subscription and ask as many questions as you want. let your inner child out. yap out everything you have in your mind. opus 4.6 thinking.
Quoting Icona @iconawrites

Please cultivate an interest in art, history, anything. If you don’t stay curious, you risk becoming one of those adults who think of nothing beyond coveting expensive cars/bags/whatever, and gossiping about who does or does not have money. An unfathomably boring existence.

Drew @js4drew
@yacineMTB i found out that spiders are cool from Charlotte’s Web
kache @yacineMTB
Have we figured out anything cool from the James Webb yet?
Drew @js4drew
@RoyShilkrot the future belongs to those who vaguely understand git
Roy Shilkrot @RoyShilkrot
Unpopular opinion: The greatest innovation in coding of the last decade isn’t Claude Code, It’s GitHub Actions. (Can’t guarantee the next decade tho)