Ari Dyckovsky

2.5K posts

@adyckovsky

Building systems for human-AI teams

Brooklyn, NY · Joined December 2013
951 Following · 1.8K Followers
Ari Dyckovsky reposted
Xuhui Zhou @nlpxuhui
Creating user simulators is a key to evaluating and training models for user-facing agentic applications. But are stronger LLMs better user simulators? TL;DR: not really. We ran the largest sim2real study for AI agents to date: 31 LLM simulators vs. 451 real humans across 165 tasks. Here's what we found (co-lead with @sunweiwei12).
[image]
Ari Dyckovsky reposted
Morgan @morganlinton
I can't stop thinking about this article Rhys wrote yesterday; I think he's really onto something. He makes a lot of good points, but the one that stands out to me, and that I think is a real (and growing) problem today, is: "the less tools you give them, the better they perform." As more and more companies rush to adopt AI agents, one of the key mistakes they'll make (or are already making) is to stuff those agents full of tools and then wonder why they're not performing well. The companies that end up really getting value from agents are going to be those that are incredibly detailed and disciplined about tool selection and optimization.
Rhys@RhysSullivan

x.com/i/article/2030…

Ari Dyckovsky reposted
Omar Khattab @lateinteraction
"Models will write all code" can only be said by someone who fails to recognize the power of directly manipulating objects with your own hands at the right level of abstraction. For a skilled expert, "ask some other extremely smart dude in English and hope they get you" is only OK for the lower-order bits. And there's a TON of these, waiting to be optimized away. But my ideal future has *me* writing 20 extremely powerful lines of very well-considered "code" that LLM-driven compilers turn into computer programs, not writing 200 lines of fluffy back-and-forth prompts and hoping the Einstein-level LLM kinda sorta gets my point.
Gergely Orosz@GergelyOrosz

OK, related to this: fresh data from Uber, from Feb 2026: 31% of code is AI-authored, and 11% of PRs are opened by agents. And Uber is investing heavily in AI. So outside Anthropic + AI labs, we are a long way out, probably? Source: newsletter.pragmaticengineer.com/p/how-uber-use…

Ari Dyckovsky reposted
dax @thdxr
sent this to the team today: everything great comes from being able to delay gratification for as long as possible, and it feels like we're collectively losing our ability to do that
[image]
Quinn Slack @sqs
@waghnakh_21 well they just type stuff into my Amp CLI, it's just like this, lol
[two images]
Ari Dyckovsky @adyckovsky
@alexolegimas *and a coordinated way to plan, run, share, and compare results for all these experiments
Alex Imas @alexolegimas
“we definitely need a lot more experiments with organizing agents done by people who understand real coordination issues.” 🫡🫡🫡🫡
Ethan Mollick@emollick

I think agentic AI would work much better if people took lessons from organizational theory, which has actually spent a lot of time understanding how to deal with complex hierarchies, information limits, and spans of control.

Right now most agentic AI systems seem to pretend that models have basically unlimited ability to manage subagents, when that is clearly not true. We need measures of spans of control for AI. A human tops out at less than 10 direct reports. I am pretty sure that 100 subagents is too much for an orchestrator agent - suspect we need middle management agents (yes, I get it, insert middle management joke here).

Similarly, we need more attention to boundary objects. These are what is handed between groups (marketing to IT to sales) in organizations to convey meaning as a project crosses group boundaries, like a prototype or a user story. Right now agents pass raw text & maybe code back and forth. Structured boundary objects that multiple agents of different ability levels can read and write to would solve a huge number of coordination failures & reduce token use.

I also think about coupling, which is how tightly units inside organizations are bound. Most agentic systems are either too tightly coupled (every step needs approval) or too loose (Moltbook). This tradeoff is well-studied in organizations; I bet a lot would apply to agents. Other known issues like bounded rationality also apply, I suspect.

Everyone is rushing towards the (terribly named) agent swarm, but the issue won't just be how good the model is, it will be org design choices. I am not sure the labs see this, but we definitely need a lot more experiments with organizing agents done by people who understand real coordination issues.
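The "structured boundary object" idea above could be sketched as a small typed handoff record that an orchestrator and its subagents both read and write, instead of passing raw text. This is a hypothetical illustration: the class name, field names, and status values are mine, not from the thread.

```python
from dataclasses import dataclass, field

@dataclass
class Handoff:
    """A structured boundary object passed between agents.

    Hypothetical sketch: field names and statuses are illustrative only.
    """
    task_id: str                  # stable identifier both sides can reference
    summary: str                  # plain-language intent any agent can read
    artifacts: dict[str, str] = field(default_factory=dict)   # name -> content or URI
    open_questions: list[str] = field(default_factory=list)   # unresolved ambiguities
    status: str = "in_progress"   # e.g. "in_progress", "blocked", "done"

# An orchestrator hands a task to a coder subagent:
h = Handoff(
    task_id="T-17",
    summary="Implement retry logic for the fetcher",
    artifacts={"spec": "retries.md"},
    open_questions=["What backoff ceiling?"],
)

# The subagent signals a coordination failure explicitly instead of
# burying it in free text the orchestrator has to parse:
h.status = "blocked" if h.open_questions else "done"
```

The point of the structure is that a weaker agent can still fill in `status` and `open_questions` correctly even if its prose summary is poor, which is exactly the cross-ability-level legibility the tweet argues for.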

Alex Imas @alexolegimas
New post: What is the impact of AI on productivity? I review all of the studies and data that I can find and try to provide a synthesis.

There's a lot of disagreement on what we know about the productivity impact. Part of the reason for this is the disconnect between the micro and macro evidence. The micro studies overwhelmingly find positive productivity benefits (except for one notable exception), but these productivity benefits are yet to show up in the macro data. There is also a disconnect on who benefits most: micro (mostly) finds low-skill/less-experienced workers see higher returns, while the (limited) macro evidence is more mixed but leans toward higher-wage/higher-ed people seeing more of the benefits. I discuss potential reasons for the micro-macro gap in the post and 🧵 below.

Importantly, this is a living post. I will update it continuously as new data comes in. If you see something I'm missing, please let me know and I will add it. For regular updates, please consider subscribing to the substack. Here is the link: aleximas.substack.com/p/what-is-the-…
[image]
Ari Dyckovsky @adyckovsky
@alz_zyd_ What humanity needs is up against what many tenure committees want
alz @alz_zyd_
Write papers that - if not for your intervention - it would have taken humanity 10 or 20 years to figure out, or better yet, humanity would never have figured out. Change the course of history, if only in some small way, with each paper you write
alz @alz_zyd_
As an academic, move slowly. Don't rush your papers. If someone scoops your paper, that means humanity didn't need you to write the paper - we would have figured it out quickly, whether or not you did it, so why are you wasting your time on it?
Ari Dyckovsky @adyckovsky
@alz_zyd_ The robots can scavenge food for us while we prompt them 🧠
alz @alz_zyd_
@adyckovsky That's exactly why we should force them to spend it on AI instead of food
alz @alz_zyd_
In an ideal world all funded PhD students should get $300/month extra funding specifically for access to frontier AI models
Ari Dyckovsky reposted
Séb Krier @sebkrier
The very long tail of tasks that require some human judgement or taste is often a bottleneck, and many aren't easily specifiable and amenable to automation. You can automate a particular person's taste, but that remains a snapshot in time whose appeal depletes as preferences change, evolve, contradict themselves over time, and the desire for individuality overtakes consumers.

The problem with the long tail is that it's not a static set: not only do preferences change but, historically at least, automation has generated new problem spaces rather than depleting a fixed set. People expect that at some point "it's solved" - well, the world is not a finite set of tasks and problems to solve. Almost everything people did in ancient times is automated - and yet the world today has more preferences to satiate and problems to solve than ever. The world hasn't yet shown signs of coalescing to a great unification or a fixed state!

Of course it's conceivable that at sufficient capability levels the generative process exhausts itself and preferences stabilize - but I'd be surprised.
Sabrina Halper @SabrinaHalper
These tech ads make you feel something
[two images]
Ari Dyckovsky @adyckovsky
@MichaelArnaldi Chose to do something similar and probably wouldn’t have taken this path without Effect and agents on my side
Michael Arnaldi @MichaelArnaldi
Contrary to what people think, for me automated software development enables more correct design decisions and avoids lazy behaviour. For example, for Accountability I am now implementing authorization: I would never have implemented full ABAC myself, but I am doing it now.
Kit Langton @kitlangton
The middlemen shall not inherit the earth. We can all milk our own drivel from ChatGPT, so what worth is a proxy? Won't you aspire to something greater? Won't you take this newfound productivity and go deeper, recursing one thousand times further than you'd ever have before?
Kit Langton @kitlangton
All of these AI workflow micro-optimizations are just that, & any utility will be absorbed into your favorite TUI in short order. If you focus on what makes code understandable to humans (single sources of truth, types, feedback loops), you'll get more effective agents for free.
Ari Dyckovsky @adyckovsky
Been thinking about this a lot lately, thanks for sharing. One of the underlying tensions with "world class" scientists is that they often get that status by excelling within institutional norms. Those norms tend to reward being right and punish being wrong. So the experts we default to are, almost by construction, systematically biased toward smaller leaps that are easier to defend (it's harder to secure grants, publish, and get tenure when taking big leaps). When something is nascent and messy, that bias often collapses uncertainty into 'never'. Meanwhile the renegades who push boundaries are optimizing for a different game, which is partly why they're less likely to show up on the list of experts in the first place. Curious about the framing of responses they gave: When they told you it would never work, was it a genuine hard constraint (e.g. speed of light is a strict upper bound)? Or more of an "I can't see a path to proof from here" given the norms and tools they're used to?
villi @villi
Experts are a double-edged sword. They are super helpful in assessing tech because of their expertise in their field. But they can mislead you because of their inability to imagine how something nascent can mature, or to extrapolate the progress a small team can make over time.
villi @villi
The biggest mistake I made in the past year was not backing a team that wanted to do something extremely hard. Every expert we spoke with in the field (world-class scientists) told us it would never work. We could not find any external validation. I regret it. Back the dreamers.
Ari Dyckovsky @adyckovsky
I’m imagining the value increases for this as repos scale, especially because maintainers likely have the optimal setup for agents working on their repos. Would much rather pay to generally improve the core project vs pay to make a one-off local improvement that ends up out of sync/a PR that goes unused. Plus you could pool multiple user payments toward resolving bigger issues, so there’s a chance for collective buy-in on any given solution.
sam @samgoodwin89
@adyckovsky My thinking was that users are already willing to pay for tokens. Throwing some of those at a repo isn't different from cloning and using an agent locally. The developer's cut pays for the cost of review and maintenance. AI slop PRs are a problem, especially for big repos.
sam @samgoodwin89
Could AI help solve open source revenue? What if a repo earned money by charging for an AI to solve issues? A user would be paying for tokens anyway. Why not send some revenue to the maintainer as a margin?
Ari Dyckovsky @adyckovsky
Fascinating to watch software engineering culture split into identity-based camps as agentic technologies evolve. It’s less about methods now and more about who feels threatened, who overestimates themselves in comparison, and who leans into collaboration. I know which camp I’d bet on.
Ari Dyckovsky @adyckovsky
Absolutely, on the same page. The value gained from data integrations and unique context goes much further than whatever interface they overlay it with. Makes Bloomberg’s a great example case for the point you’re making, and there are many other businesses that we could probably say the same for
Michael Arnaldi @MichaelArnaldi
@adyckovsky If I were trading today I would pay for a Bloomberg subscription and I would not use the terminal. I am not saying Bloomberg doesn't have a business (they have a fantastic one).