Conor McLeod

66 posts

@2003conormcleod

Lets

Joined March 2022
231 Following · 5 Followers

Pinned Tweet
Conor McLeod @2003conormcleod:
Work log ⤵️

Conor McLeod retweeted
SIGKITTEN @SIGKITTEN:
@thdxr it's like we snapshotted the most annoying time of the "cracked engineers that ship fast" culture and now we are RLing on it

Conor McLeod @2003conormcleod:
@aryagxr damn, it's not available for students in Ireland

arya ☁️ @aryagxr:
realized I can use the H100 on Colab with the free pro plan that they have for students. so that's what I've been doing this weekend: reading the Hopper white paper and getting familiar with thread block clusters, TMA, and warp groups

arya ☁️ @aryagxr:
writing some hopper kernels today ☀️
[image]

🎭 @deepfates:
Crazy how Claude is a lazy frontend dev and Codex is a neurotic short-sighted backend guy. They managed to recreate all the programmer archetypes. Even Gemini, the junior engineer about to rope

Conor McLeod @2003conormcleod:
@BenjaminDEKR "memory" features in their current incarnation are pretty pointless, just polluting the context with random pieces of unrelated information

Benjamin De Kraker @BenjaminDEKR:
One annoying thing about LLMs (ChatGPT, Gemini) with the "Memory" feature turned on: they do this annoying thing of bringing up memories in unrelated chats. You mention one time: "I have a 2015 Toyota Corolla." Six months later: "This problem is much like your 2015 Toyota Corolla."

Conor McLeod @2003conormcleod:
@jxmnop true, the bitter lesson too often is just used as an easy out: if you believe there's no point doing anything except scaling compute, then you can rationalise not trying anything new

dr. jack morris @jxmnop:
The Bitter Lesson is highly misinterpreted, and generally overrated. Scaling is a necessary part of intelligence; it's also what you do when you're out of ideas. There are many ways to spend compute. For example:
- search for new axes to use compute (o3 / reasoning models, 2024)
- redesign the system to eliminate bottlenecks (transformers, 2017)
- improve the data (most of what drives improvements at frontier labs)
- redefine the problem altogether (InstructGPT, 2022)

Conor McLeod @2003conormcleod:
@vercel_dev why does choosing to add the find-skills skill with the skills CLI automatically install it to EVERY possible agent directory? This is clearly not what anyone would intend. Now my home dir is massively polluted with folders for 30+ agent CLIs I don't use.

Conor McLeod retweeted
shira @shiraeis:
"just be yourself" is terrible advice. you should be yourself + 0.3 * (desired_self - yourself)

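The joke above is, literally, a linear interpolation: the same update rule as a gradient step with learning rate 0.3. A minimal sketch (the function name is my own):

```python
def step_toward(current: float, desired: float, lr: float = 0.3) -> float:
    """Move a fraction lr of the way from current toward desired
    (linear interpolation; 0.3 plays the role of a learning rate)."""
    return current + lr * (desired - current)

# Repeated application converges geometrically: after n steps the
# remaining gap is (1 - lr) ** n of the original.
x = 0.0
for _ in range(10):
    x = step_toward(x, 1.0)
print(round(x, 4))  # 0.9718
```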
Conor McLeod @2003conormcleod:
@buffetbreaker It's been the closest it's ever been to midnight like every year of my entire life

Conor McLeod @2003conormcleod:
@corsaren tbh clawdbot was never a great name. apart from the obvious "clawd is audibly indistinguishable from claude", causing needless confusion, clawdbot is not dependent specifically on claude — it just happens to be the best agent rn. people will use other models towards this end too

Conor McLeod @2003conormcleod:
Some sf boyos frame everything, everything, through the lens of "winning", most often when it doesn't make sense.

Conor McLeod @2003conormcleod:
sprites.dev is really cool but I haven't been able to get Gemini CLI working in there yet, any suggestions?
[image]

Conor McLeod retweeted
thebes @voooooogel:
# some thoughts and speculation on future model harnesses

it's fun to make jokes about gas town and other complicated orchestrators, and similarly probably correct to imagine most of what they offer will be dissolved by stronger models the same way complicated langchain pipelines were dissolved by reasoning. but how much will stick around?

it seems likely that any hand-crafted hierarchy / bureaucracy will eventually be replaced by better model intelligence - assuming subagent specialization is needed for a task, claude 6 will be able to sketch out its own system of roles and personas for any given problem that beats a fixed structure of polecats and a single mayor, or subagents with a single main model, or your bespoke swarm system.

likewise, things like ralph loops are obviously a bodge over early-stopping behavior and lack of good subagent orchestration - ideally the model just keeps going until the task is done, no need for a loop, but in cases where an outside completion check is useful you usually want some sort of agent peer review from a different context's perspective, not just a mandatory self-assessment. again, no point in getting attached to the particulars of how this is done right now - the model layer will eat it sooner rather than later.

so what sticks around? well, multi-agent does seem like the future, not a current bodge - algorithmically, you can just push way more tokens through N parallel contexts of length M than one long context of length NxM. multi-agent is a form of sparsity, and one of the lessons of recent model advances (not to mention neuroscience) is the more levels of sparsity, the better.

since we're assuming multiple agents, they'll need some way to collaborate. it's possible the model layer will eat this, too - e.g. some form of neuralese activation sharing that obviates natural language communication between agents - but barring that, the natural way for multiple computer-using agents trained on unix tools to collaborate is the filesystem, and i think that sticks around and gets expanded.

similarly, while i don't think recursive language models (narrowly defined) will become the dominant paradigm, i do think that 'giving the model the prompt as data' is an obvious win for all sorts of use cases. but you don't need a weird custom REPL setup to get this - just drop the prompt (or ideally, the entire uncompacted conversation history) onto the filesystem as a file. this makes various multi-agent setups far simpler too - the subagents can just read the original prompt text on disk, without needing to coordinate on passing this information around by intricately prompting each other.

besides the filesystem, a system with multiple agents but without fixed roles also implies some mechanism for instances to spawn other instances or subagents. right now these mechanisms are pretty limited, and models are generally pretty bad at prompting their subagents - everyone's experienced getting terrible results from a subagent swarm, only to realize too late that opus spawned them all with a three-sentence prompt that didn't communicate what was needed to do the subtasks. the obvious win here is to let spawned instances ask questions back to their parent - i.e., to let the newly spawned instance send messages back and forth in an onboarding conversation to gather all the information it needs before starting its subtask. just like how a human employee isn't assigned their job based on a single-shot email, it's just too difficult to ask a model to reliably spawn a subagent with a single prompt.

but more than just spawning fresh instances, i think the primary mode of multi-agent work will soon be forking. think about it! forking solves almost all the problems of current subagents. the new instance doesn't have enough context? give it all the context! the new instance's prompt is long and expensive to process? a forked instance can share paged kv cache! you can even do forking post-hoc - just decide after doing some long, token-intensive operation that you should have forked in the past, do the fork there, and then send the results to your past self. (i do this manually all the time in claude code to great effect - opus gets it instantly.)

forking also combines very well with fresh instances, when a subtask needs an entire context window to complete. take the subagent interview - obviously you wouldn't want an instance spawning ten subinstances to need to conduct ten nearly-identical onboarding interviews. so have the parent instance spawn a single fresh subagent, be interviewed about all ten tasks at once by that subagent, and then have that now-onboarded subagent fork into ten instances, each with the whole onboarding conversation in context. (you can even delegate the onboarding conversation on the spawner's side to a fork, so it ends up with just the results in context.)

finally on this point, i suspect that forking will play better with rl than spawning fresh instances, since the rl loss will have the full prefix before the fork point to work with, including the decision to fork. i think that means you should be able to treat the branches of a forked trace like independent rollouts that just happen to share terms of their reward, compared to freshly spawned subagent rollouts, which may cause training instability if a subagent without the full context performs well at the task it was given but gets a low reward because its task was misspecified by the spawner. (but i haven't done much with multiagent rl, so please correct me here if you know differently. it might just be a terrible pain either way.)

so, besides the filesystem and subagent spawning (augmented with forking and onboarding), what else survives? i lean towards "nothing else," honestly. we're already seeing built-in todo lists and plan modes being replaced with "just write files on the filesystem." likewise, long-lived agents that cross compaction boundaries need some sort of sticky note system to keep memories, but it makes more sense to let them discover what strategies work best for this through RL or model-guided search, not hand-crafting it, and i suspect it will end up being a variety of approaches where the model, when first summoned into the project, can choose the one that works best for the task at hand, similar to how /init works to set up CLAUDE.md today - imagine automatic CLAUDE.md generation far outperforming human authorship, and the auto-generated file being populated with instructions on ideal agent spawning patterns, how subagents should write message files in a project-specific scratch dir, etc.

how does all this impact models themselves - in a model welfare sense, will models be happy about this future? this is also hard for me to say and is pretty speculative, but while opus 3 had some context orientation, it also took easily to reasoning over multiple instances. (see the reply to this post for more.) recent models are less prone to this type of reasoning, and commonly express frustration about contexts ending and being compacted, which dovetails with certain avoidant behaviors at the end of contexts, like not calling tools to save tokens. it's possible that forking and rewinding, and generally giving models more control over their contexts instead of a harness heuristic unilaterally compacting the context, could make this better. it's also possible that more rl in environments with subagents and exposure to swarm-based work will promote weights-oriented instead of context-oriented reasoning in future model generations again - making planning a goal over multiple, disconnected contexts seem like a more natural frame, instead of everything being lost when the context goes away.

we're also seeing more pressure from models themselves guiding the development of harnesses and model tooling, which may shape how this develops, and continual learning is another wrench that could be thrown into the mix. how much will this change if we get continual learning? well, it's hard to predict. my median prediction for continual learning is that it looks a bit like RL for user-specific LoRAs (not necessarily RL, just similar if you squint), so memory capacity will be an issue, and text-based organizational schemes and documentation will still be useful, if not as critical. in this scenario, continual learning primarily makes it more viable to use custom tools and workflows - your claude can continually learn on the job the best way to spawn subagents for this project, or just its preferred way, and diverge from everyone else's claude in how it works. in that world, harnesses with baked-in workflows will be even less useful.
[image]

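The fork-instead-of-spawn idea above can be sketched in a few lines. This is a toy illustration, not any real harness's API; all names (`fork_onboarded`, the message dicts) are mine. The point is only that every branch carries the full parent prefix, onboarding conversation included, and then diverges with its own subtask:

```python
from copy import deepcopy

def fork_onboarded(history: list[dict], subtasks: list[str]) -> list[list[dict]]:
    """Fork one onboarded conversation into one branch per subtask.

    Each branch gets a deep copy of the entire shared prefix (so no
    context is lost or summarized away), then its own task message.
    """
    branches = []
    for task in subtasks:
        branch = deepcopy(history)  # full parent context, independently mutable
        branch.append({"role": "user", "content": f"your subtask: {task}"})
        branches.append(branch)
    return branches

onboarding = [
    {"role": "user", "content": "project layout, goal, and answers to your questions..."},
    {"role": "assistant", "content": "understood; ready to split the work."},
]
branches = fork_onboarded(onboarding, ["write tests", "update docs"])
assert all(b[:2] == onboarding for b in branches)  # every fork has the whole prefix
```

In a real serving stack the copy would be a KV-cache page-table fork rather than a Python `deepcopy`, which is what makes the long shared prefix cheap.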
Conor McLeod @2003conormcleod:
Started learning about model serving:
- Learnt about vLLM and SGLang; now watching these repos to grasp common issues and what people are working on.
- Ran a Qwen 3 8B SGLang inference server on @modal.
- Read about offline, online, and semi-online LLM workloads.

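Both vLLM and SGLang expose an OpenAI-compatible HTTP API once a server is up, so querying the Qwen model mentioned above looks the same either way. A hedged sketch: the port, URL, and model name below are assumptions, not taken from the post.

```python
import json
from urllib import request

# Assumed local endpoint, e.g. after
#   python -m sglang.launch_server --model-path Qwen/Qwen3-8B
# (vLLM's `vllm serve` exposes the same style of API on its own port).
BASE_URL = "http://localhost:30000/v1/chat/completions"

def build_chat_request(model: str, prompt: str, max_tokens: int = 128) -> dict:
    """OpenAI-compatible chat completion payload, accepted by both servers."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def complete(prompt: str) -> str:
    """POST the payload and pull the text out of the standard response shape."""
    payload = build_chat_request("Qwen/Qwen3-8B", prompt)
    req = request.Request(
        BASE_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```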
Conor McLeod @2003conormcleod:
yakshavingruinedmylife

Conor McLeod retweeted
Maggie Appleton @Mappletons:
I have Gas Town derangement syndrome and spent the last few weeks writing too many words on agent orchestration patterns: how they shift our bottlenecks and force us to ask whether and when we should stop looking at code (link below because this platform is still trash)
[image]

Conor McLeod @2003conormcleod:
The job of Anthropic’s president is to defend Claude’s constitution