Pinned Tweet
danialhasan
10.8K posts

danialhasan
@dhasandev
building agent armies @trysquadhq | manifest your destiny @_buildspace
Toronto, Canada
Joined October 2020
938 Following
2.5K Followers

@dhasandev Great presentation!
The diagram below captures exactly what the current issue is. I'll try your skills ✌️

danialhasan @dhasandev

I’m in Toronto for Toronto tech week and in NYC for New York tech week
Will report my findings
Kelindi @_kelindi
I'm in NYC for Toronto tech week and back in Toronto during NY tech week. fml

@0xSero i think the ideas you guys discussed line up with this article on why software factories don't work properly yet
x.com/dhasandev/stat…
danialhasan @dhasandev

@cairox100v aussie people be like y'all never had spider kangaroo coffee and it shows

@gfodor @yacineMTB i think you two might enjoy this article, it extends what both of y'all have been talking about x.com/dhasandev/stat…
danialhasan @dhasandev

@yacineMTB it's interesting that this stuff feels like the first genuinely new skillset for software eng since this whole thing started. iteration times are long so it's not easy to get good at quickly. imo this is gonna be the main skill gap that splits software engineering employability soon

Random list of tips (ymmv) from trying to get gpt 5.5 to build a big thing over a day or two:
- Use chunkier milestones with acceptance criteria rooted in end-to-end integration testing, not bullshit fake harnesses
- Have the root agent do the work with /goal and delegate to subagents to block on code review before every commit
- Make sure it's clear the plan is *immutable*, keep plan status updates in a separate file, don't let the agent cheat by updating the plan
- Let the agent add features and logging to the harness on demand so it can test for acceptance, but make sure those changes also go through subagent review to confirm they're necessary and not cheating
- Each milestone should have a dedicated folder with captured artifacts and distinct runner scripts so it can be fully audited for agent fuckery
- Part of the acceptance criteria should be that the end-to-end system can still be run interactively and used by a user
- Have a cadence of "cleanup milestones" which force the agent to delete logging, debug code, kill dead code, remove all feature flags that are no longer needed, remove any 'fallback' or 'legacy' handlers, kill pointless error handling, and break up files and DRY up common code. imo don't do this at every milestone, do it every couple of milestones
- Make sure the plan spells out in detail, up front, what the state should be at the last milestone
- Include 'anti-patterns' in the plan - things the agent should never do, such as update the plan, or build a new harness with mock objects, or anything else that you discover in past runs as a loophole to not get shit done
- It's been helpful to have an interactive Opus session I can use as it runs to check the plan status and front run future milestones if things veer off track - instead of interrupting the primary agent, I have 4.7 tweak or insert milestones to course correct
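One way to enforce the "immutable plan" tip mechanically, rather than trusting the agent, is to record a hash of the plan file at kickoff, check it before every commit, and send all progress notes to a separate status file. This is a minimal Python sketch under illustrative assumptions: the file names `PLAN.md` and `STATUS.md` and the helpers `file_digest`, `assert_plan_unchanged`, and `append_status` are hypothetical and not part of any tool mentioned in the thread.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 hex digest of the file's current bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def assert_plan_unchanged(plan_path: Path, recorded_digest: str) -> None:
    """Fail loudly if the plan differs from the digest captured at kickoff."""
    if file_digest(plan_path) != recorded_digest:
        raise RuntimeError(f"{plan_path.name} was edited; the plan is immutable")


def append_status(status_path: Path, milestone: str, note: str) -> None:
    """Progress goes to a separate status file, never into the plan itself."""
    with status_path.open("a", encoding="utf-8") as f:
        f.write(f"{milestone}: {note}\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        plan = Path(tmp) / "PLAN.md"      # hypothetical file names
        status = Path(tmp) / "STATUS.md"
        plan.write_text("M1: end-to-end login flow passes integration tests\n")
        digest = file_digest(plan)        # record once, before the agent starts

        append_status(status, "M1", "in progress")
        assert_plan_unchanged(plan, digest)  # e.g. a pre-commit check: passes

        plan.write_text("M1: done (trust me)\n")  # the agent "updates" the plan
        try:
            assert_plan_unchanged(plan, digest)
        except RuntimeError as err:
            print(err)
```

Wiring the check into a pre-commit hook or the review subagent's checklist is a design choice; the point is that the plan's integrity is verified by something other than the agent being graded against it.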

@kenwuuuu langsmith is neither of those things but it's great for the evals/observability you need to make your agents great

@dhasandev @OpenAI Thanks for sharing this link, made it easy for me to dive deeper. Very cool moment, even if this isn't perfect - so much progress in the right direction.

Today, we share a breakthrough on the planar unit distance problem, a famous open question first posed by Paul Erdős in 1946.
For nearly 80 years, mathematicians believed the best possible solutions looked roughly like square grids.
An OpenAI model has now disproved that belief, discovering an entirely new family of constructions that performs better.
This marks the first time AI has autonomously solved a prominent open problem central to a field of mathematics.

@NotOnKetamine love how you make dashboards for every story that pops up lol

omg what a helpful tool ty for making this! ttwchat.com
also @dhasandev for sharing
Simon @SimiStern
@_tenZdhon_ @TOtechweek Try ttwchat.com ;)
danialhasan retweeted

harness optimizers generate candidates for how a harness should work. these are tested against positive/negative datasets and evaluated; winners proceed, losers are eliminated.
i had the idea to generate candidates using llms that receive the traces of past candidates, winners, failures, etc. which is exactly what GEPA is in a nutshell
this last candidate did well here and badly over here, so let's regenerate it with contextually relevant changes. much better than blind generation
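The loop described above (score candidates on positive/negative examples, keep the winners, and generate new candidates from the traces of what failed) can be sketched in a few lines. This is a toy illustration under stated assumptions, not GEPA itself: the "harness" is reduced to a single hypothetical integer acceptance threshold, `reflect_and_mutate` is a hand-written stand-in for the LLM call that would read the failure traces, and selection is plain top-k, which is simpler than the Pareto-based selection GEPA uses.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    candidate: int  # hypothetical harness config: a single acceptance threshold
    score: float    # fraction of labeled examples the candidate handles correctly
    failures: list = field(default_factory=list)  # trace of (example, expected_label) misses


def evaluate(threshold, positives, negatives):
    """Run one candidate against the positive and negative datasets, keeping a failure trace."""
    failures = [(x, True) for x in positives if x < threshold]     # should pass, was rejected
    failures += [(x, False) for x in negatives if x >= threshold]  # should fail, was accepted
    total = len(positives) + len(negatives)
    return Result(threshold, 1 - len(failures) / total, failures)


def reflect_and_mutate(result):
    """Stand-in for the LLM step: read the parent's failures and propose a targeted fix."""
    rejected = [x for x, label in result.failures if label]
    accepted = [x for x, label in result.failures if not label]
    if rejected and not accepted:
        return min(rejected)        # loosen just enough to admit the missed positives
    if accepted and not rejected:
        return max(accepted) + 1    # tighten just past the leaked negatives (integer data)
    return result.candidate         # no failures, or no single-direction fix: keep as-is


def optimize(seeds, positives, negatives, rounds=3, keep=2):
    """Winners proceed, losers are eliminated; children come from their parents' traces."""
    pool = [evaluate(c, positives, negatives) for c in seeds]
    for _ in range(rounds):
        children = [evaluate(reflect_and_mutate(r), positives, negatives) for r in pool]
        pool = sorted(pool + children, key=lambda r: r.score, reverse=True)[:keep]
    return pool[0]


if __name__ == "__main__":
    best = optimize([0, 10], positives=[5, 6, 7, 8], negatives=[1, 2, 3])
    print(best.candidate, best.score)  # 4 1.0
```

Seeded with thresholds 0 and 10, each first-round child is proposed directly from its parent's failures (the leaked negatives push 0 up to 4, the rejected positives pull 10 down to 5), which is the "regenerate with contextually relevant changes" idea rather than blind random mutation.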
danialhasan retweeted

i'm restarting my blog! i want to kickstart productive conversations around: what should AI agents look like for hard, subjective knowledge work?
a lot of agent setups work well when tasks are objective and easy to verify. but many workflows (e.g., qualitative analysis, strategy, sensemaking) are messy and interpretive.
as a first post, i explore different ways of doing agent-assisted qualitative analysis on tweets, with varying levels of human feedback/intervention.
tldr: they all kinda sucked. turns out it’s hard to:
(a) stop agents from converging too quickly on shallow interpretations
(b) get agents to adapt to preferences that emerge gradually across many turns (i.e., evolving context)
(c) capture human judgment without making humans fatigued


@cairox100v it’s like when someone drops their ice cream and you hear them go “oh naurrrrr” so you just know they’re from the land down under