Ari Dyckovsky

2.5K posts

Ari Dyckovsky

@adyckovsky

Building systems for human-AI teams

Brooklyn, NY Katılım Aralık 2013

951 Takip Edilen1.8K Takipçiler

Ari Dyckovsky retweetledi

Xuhui Zhou@nlpxuhui·4d

Creating user simulators is a key to evaluating and training models for user-facing agentic applications. But are stronger LLMs better user simulators? TL;DR: not really. We ran the largest sim2real study for AI agents to date: 31 LLM simulators vs. 451 real humans across 165 tasks. Here's what we found (co-lead with @sunweiwei12).

English

270

28.2K

Ari Dyckovsky retweetledi

Morgan@morganlinton·10 Mar

I can't stop thinking about this article Rhys wrote yesterday, I think he's really onto something. While there are a lot of really good points he makes, the one that really stands out to me, that I think is a real (growing) problem today is: "the less tools you give them, the better they perform" As more and more companies, rush to adopt AI Agents, I think one of the key mistakes they'll make/are making, is to stuff those agents full of tools, and then wonder why they're not performing as well. The companies that end up really getting value from agents, are going to be those that are incredibly detailed, and disciplined about tool selection and optimization.

Rhys@RhysSullivan

x.com/i/article/2030…

English

116

26.9K

Ari Dyckovsky retweetledi

Omar Khattab@lateinteraction·11 Mar

"Models will write all code" can only be said by someone who fails to recognize the power of directly manipulating objects with your own hands at the right level of abstraction. For a skilled expert, "ask some other extremely smart dude in English and hope they get you" is only OK for the lower-order bits. And there's a TON of these, waiting to be optimized away. But my ideal future has *me* writing 20 extremely powerful lines of very well-considered "code" that LLM-driven compilers turn into computer programs, not writing 200 lines of fluffy back-and-forth prompts and hoping the Einstein-level LLM kinda sorta gets my point.

Gergely Orosz@GergelyOrosz

English

289

39.6K

Ari Dyckovsky retweetledi

dax@thdxr·10 Mar

sent this to the team today everything great comes from being able to delay gratification for as long as possible and it feels like we're collectively losing our ability to do that

English

254

708

6.9K

964.1K

Ari Dyckovsky@adyckovsky·9 Şub

@sqs @waghnakh_21 Get me on the waitlist for that cataract surgery game please

English

Quinn Slack@sqs·9 Şub

@waghnakh_21 well they just type stuff into my Amp CLI, it's just like this, lol

English

1.6K

Mayank Gupta🧑🏻‍💻@waghnakh_21·9 Şub

6yo and 7yo vibe coding a game is crazy!!!

Quinn Slack@sqs

SF AI ad sentiment from my Super Bowl house party tonight: - OpenAI: didn't see it - Anthropic: same - Gemini: same - Alexa: scary Our main takeaways: my 2yo shouldn't draw on couches or her face, my 6yo and my 7yo nephew vibecoded a game, and the halftime show was awesome.

English

2.7K

Ari Dyckovsky@adyckovsky·8 Şub

@alexolegimas *and a coordinated way to plan, run, share, and compare results for all these experiments

English

Alex Imas@alexolegimas·8 Şub

“we definitely need a lot more experiments with organizing agents done by people who understand real coordination issues.” 🫡🫡🫡🫡

Ethan Mollick@emollick

I think agentic AI would work much better if people took lessons from organizational theory, which has actually spent a lot of time understanding how to deal with complex hierarchies, information limits, and spans of control. Right now most agentic AI systems seem to pretend that models have basically unlimited ability to manage subagents when that is clearly not true. We need measures of spans of control for AI. A human tops out at less than 10 direct reports. I am pretty sure that 100 subagents is too much for an orchestrator agent - suspect we need middle management agents (yes, I get it, insert middle management joke here). Similarly, we need more attention to boundary objects. These are what is handed between groups (marketing to IT to sales) in organizations to convey meaning as a project crosses group boundaries, like a prototype or a user story. Right now agents pass raw text & maybe code back and forth. Structured boundary objects that multiple agents of different ability levels can read and write to would solve a huge number of coordination failures & reduce token use. I also think aboht coupling, which is how tightly units inside organizations are bound. Most agentic systems are either too tightly coupled (every step needs approval) or too loose (Moltbook). This tradeoff is well-studied in organizations, I bet a lot would apply to agents. Other known issues like bounded rationality also apply, I suspect. Everyone is rushing towards the (terribly named) agent swarm, but the issue won’t just be how good the model is, it will be org design choices. I am not sure the labs see this, but we definitely need a lot more experiments with organizing agents done by people who understand real coordination issues.

English

6.6K

Ari Dyckovsky@adyckovsky·30 Oca

@alexolegimas @danielrock @TheStalwart @stevehou @deanwball @lugaricano @Afinetheorem @DAcemogluMIT @tszzl @bcherny @abhishekn @ahall_research Living research artifacts are the way!

English

Alex Imas@alexolegimas·29 Oca

@danielrock @TheStalwart @stevehou @deanwball @lugaricano @Afinetheorem @DAcemogluMIT @tszzl @bcherny @abhishekn And of course @ahall_research whose post on "living research" inspired me to continuously update it.

English

2.5K

Alex Imas@alexolegimas·29 Oca

New post: What is the impact of AI on productivity? I review all of the studies and data that I can find and try to provide a synthesis. There’s a lot of disagreement on what we know about the productivity impact. Part of the reason for this is the disconnect between the micro and macro evidence. The micro studies overwhelmingly find positive productivity benefits (except for one notable exception), but these productivity benefits are yet to show up in the macro data. There is also a disconnect on who benefits most: micro (mostly) finds low-skill/less-experienced workers see higher returns, the (limited) macro evidence is more mixed but leans toward higher wage/higher ed people seeing more of the benefits. I discuss potential reasons for the micro-macro gap in the post and 🧵 below. Importantly, this is a living post. I will update it continuously as new data comes in. If you see something I'm missing, please let me know and I will add it. For regular updates, please consider subscribing to the substack. Here is the link: aleximas.substack.com/p/what-is-the-…

English

142

542

195.8K

Ari Dyckovsky@adyckovsky·19 Oca

@alz_zyd_ What humanity needs is up against what many tenure committees want

English

404

alz@alz_zyd_·19 Oca

Write papers that - if not for your intervention - it would have taken humanity 10 or 20 years to figure out, or better yet, humanity would never have figured out. Change the course of history, if only in some small way, with each paper you write

English

120

30.2K

alz@alz_zyd_·19 Oca

As an academic, move slowly. Don't rush your papers. If someone scoops your paper, that means humanity didn't need you to write the paper - we would have figured it out quickly, whether or not you did it, so why are you wasting your time on it?

English

220

18.1K

Ari Dyckovsky@adyckovsky·19 Oca

@alz_zyd_ The robots can scavenge food for us while we prompt them 🧠

English

179

alz@alz_zyd_·19 Oca

@adyckovsky That's exactly why we should force them to spend it on AI instead of food

English

532

alz@alz_zyd_·18 Oca

In an ideal world all funded PhD students should get $300/month extra funding specifically for access to frontier AI models

English

211

31.9K

Ari Dyckovsky retweetledi

Séb Krier@sebkrier·13 Oca

The very long tail of tasks that require some human judgement or taste is often a bottleneck, and many aren't easily specifiable and amenable to automation. You can automate someone's taste in particular, but that remains a snapshot in time whose appeal depletes as preferences change, evolve, contrafict themselves over time, and the desire for individuality overtakes consumers. The problem with the long tail is that it's not a static set: not only do preferences change but historically at least, automation has generated new problem spaces rather than depleting a fixed set. People expect that at some point, "it's solved" - well the world is not a finite set of tasks and problems to solve. Almost everything people ever did in the ancient times is automated - and yet the world today now has more preferences to satiate and problems to solve than ever. The world hasn't yet shown signs of coalescing to a great unification or a fixed state! Of course it's conceivable that at sufficient capability levels, the generative process exhausts itself and preferences stabilize - but I'd be surprised.

English

180

23.7K

Ari Dyckovsky@adyckovsky·19 Oca

@SabrinaHalper Machines mean more when they reflect the lives we want to live

English

Sabrina Halper@SabrinaHalper·19 Oca

These tech ads make you feel something

English

5.8K

Ari Dyckovsky@adyckovsky·18 Oca

@MichaelArnaldi Chose to do something similar and probably wouldn’t have taken this path without Effect and agents on my side

English

111

Michael Arnaldi@MichaelArnaldi·18 Oca

Specifically Auth domain models: github.com/mikearnaldi/ac…

English

Michael Arnaldi@MichaelArnaldi·18 Oca

Differently from what people think for me automated software development enables more correct design decisions and avoids lazy behaviour, for example for Accountability I am now implementing authorization, I would have never implemented full ABAC myself but I am doing it now

English

3.1K

Ari Dyckovsky@adyckovsky·16 Oca

@kitlangton This feels like the start of a manifesto

English

Kit Langton@kitlangton·16 Oca

The middlemen shall not inherit the earth. We can all milk our own drivel from ChatGPT, so what worth is a proxy? Won't you aspire to something greater? Won't you take this newfound productivity and go deeper, recursing one thousand times further than you'd ever have before?

English

2.1K

Kit Langton@kitlangton·16 Oca

All of these AI workflow micro-optimizations are just that, & any utility will be absorbed into your favorite TUI in short order. If you focus on what makes code understandable to humans (single sources of truth, types, feedback loops), you'll get more effective agents for free.

English

3.9K

Ari Dyckovsky@adyckovsky·16 Oca

Been thinking about this a lot lately, thanks for sharing. One of the underlying tensions with "world class" scientists is that they often get that status by excelling within institutional norms. Those norms tend to reward being right and punish being wrong. So the experts we default to are, almost by construction, systematically biased toward smaller leaps that are easier to defend (it's harder to secure grants, publish, and get tenure when taking big leaps). When something is nascent and messy, that bias often collapses uncertainty into 'never'. Meanwhile the renegades who push boundaries are optimizing for a different game, which is partly why they're less likely to show up on the list of experts in the first place. Curious about the framing of responses they gave: When they told you it would never work, was it a genuine hard constraint (e.g. speed of light is a strict upper bound)? Or more of an "I can't see a path to proof from here" given the norms and tools they're used to?

English

villi@villi·16 Oca

Experts are a double-edged sword. They are super helpful in helping you assess tech bc of their expertise in their field. But, they can mislead you bc of their inability to imagine how something nascent can mature, or extrapolate the progress a small team can make over time.

English

1.6K

villi@villi·16 Oca

The biggest mistake I made in the past year is not backing a team wanting to do something extremely hard. Every expert we spoke with in the field (world class scientist) told us it will never work. We could not find any external validation. I regret it. Back the dreamers.

English

5.4K

Ari Dyckovsky@adyckovsky·15 Oca

I’m imagining the value increases for this as repos scale, especially because maintainers likely have the optimal setup for agents working on their repos. Would much rather pay to generally improve the core project vs pay to make a one-off local improvement that ends up out of sync/a PR that goes unused. Plus you could pool multiple user payments toward resolving bigger issues, so there’s a chance for collective buy-in on any given solution.

English

sam@samgoodwin89·15 Oca

@adyckovsky My thinking was that users are already willing to pay for tokens. Throwing some of those at a repo isn’t different than cloning and using an agent locally. Developer cut is to pay for the cost to review and maintenance. Ai slop PRs are a problem, esp. for big repos.

English

120

sam@samgoodwin89·15 Oca

Could AI help solve open source revenue? What if a repo earned money by charging for an AI to solve issues? A user would be paying for tokens anyway. Why not send some revenue to the maintainer as a margin?

English

Ari Dyckovsky@adyckovsky·14 Oca

Fascinating to watch software engineering culture split into identity-based camps as agentic technologies evolve. It’s less about methods now and more about who feels threatened, who overestimates themselves in comparison, and who leans into collaboration. I know which camp I’d bet on.

English

130

Ari Dyckovsky@adyckovsky·14 Oca

Absolutely, on the same page. The value gained from data integrations and unique context goes much further than whatever interface they overlay it with. Makes Bloomberg’s a great example case for the point you’re making, and there are many other businesses that we could probably say the same for

English

Michael Arnaldi@MichaelArnaldi·13 Oca

@adyckovsky If I was trading today I would pay for a Bloomberg subscription and I would not use the terminal, I am not saying Bloomberg doesn't have a business (they have a fantastic one)

English

215

Michael Arnaldi@MichaelArnaldi·13 Oca

You can clearly see that by the amount of people calling BS on the article focusing on the details of the Bloomberg Terminal clone, completely missing the point.

Michael Arnaldi@MichaelArnaldi

95% of developers will resist this transition for one reason or another, mostly because they cannot believe it is true and they fear the world if this is true. That's arbitrage opportunity for companies and individuals.

English

19.1K

Keşfet

@sunweiwei12 @sqs @waghnakh_21 @alexolegimas @danielrock @TheStalwart @stevehou @deanwball