Derek Chen

2.1K posts

@derekchen14

AI Research & Eng in Conversational AI. Building at @teamsundial. Prev: founder @soleda_ai, also @columbia, @UW, @stanfordnlp, @UCBerkeley

San Francisco, CA · Joined April 2009

308 Following · 662 Followers

Pinned Tweet

Derek Chen@derekchen14·22 Jul

Practical AGI is achievable already, but requires 3 changes to the current LLM tool-calling approach:

1. Tools assume that all information is already available in the prompt, but users in the real world are rarely so forthcoming. Consequently, we should build each tool to assume that details are missing by default, with the gaps then resolved through a continuous slot-filling exercise, rather than placing the onus on the user to provide everything upfront.

2. Moreover, each 'tool' should actually be its own specially trained module, which is able to provide outputs in addition to taking action (such as reporting partially successful actions, rather than just returning a final result). Each module must be modified to intrinsically handle ambiguity by establishing its own expectations about reasonable inputs and outputs. This (Bayesian) prior is baked in by humans, which allows us to control it.

3. Lastly, each module is a single node within a graph, operating as a federated system. There is no single monolithic entity controlling all the tools, but simply an orchestration node which operates just like any other module in the network. This allows exponential scaling in intelligence as you add additional modules. We already have something similar with MoE, but the key difference is that these expert modules are programmable and interpretable, rather than black boxes.

When we recognize that most users in reality are unwilling to learn proper prompting techniques, we can then embrace the chaos by building a system that is robust to failure and capable of continuous learning. Luckily, no further research breakthroughs are needed to start moving in this direction. More details to be revealed soon, please comment below to poke holes or provide feedback!
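As a rough illustration of the first point in the pinned tweet (tools that assume details are missing and fill slots over several turns), here is a minimal Python sketch. The class `SlotFillingTool`, its method names, and the `book_flight` example are all hypothetical and are not taken from the tweet or from any real library; the sketch only shows one possible shape such a tool could have, including reporting partial progress instead of a single final result.

```python
from dataclasses import dataclass, field


@dataclass
class SlotFillingTool:
    """Hypothetical tool that treats missing details as the default case."""

    name: str
    required_slots: tuple
    slots: dict = field(default_factory=dict)

    def update(self, **provided):
        """Merge newly supplied details, ignoring unknown slots and None values."""
        for key, value in provided.items():
            if key in self.required_slots and value is not None:
                self.slots[key] = value

    def missing(self):
        """Return the slots still unfilled, in declaration order."""
        return [s for s in self.required_slots if s not in self.slots]

    def step(self, **provided):
        """Advance one turn: ask for the next missing slot, or finish."""
        self.update(**provided)
        gaps = self.missing()
        if gaps:
            # Report partial progress rather than failing on incomplete input.
            return {"status": "needs_input", "ask": gaps[0], "filled": dict(self.slots)}
        return {"status": "done", "result": dict(self.slots)}


if __name__ == "__main__":
    tool = SlotFillingTool("book_flight", ("origin", "destination", "date"))
    print(tool.step(destination="Tokyo"))   # asks for the first missing slot
    print(tool.step(origin="SFO"))          # asks for the next one
    print(tool.step(date="2026-05-01"))     # all slots filled
```

Each call to `step` stands in for one conversational turn, so the user can supply details in any order and the tool keeps asking until nothing is missing.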

425

Derek Chen retweeted

Lossfunk@lossfunk·19 Mar

🚨 Shocking: Frontier LLMs score 85-95% on standard coding benchmarks. We gave them equivalent problems in languages they couldn't have memorized. They collapsed to 0-11%. Presenting EsoLang-Bench. Accepted to the Logical Reasoning and ICBINB workshops at ICLR 2026 🧵

152

286

2.2K

1.3M

Derek Chen retweeted

Lukasz Olejnik@lukOlejnik·10 Mar

Amazon is holding a mandatory meeting about AI breaking its systems. The official framing is "part of normal business." The briefing note describes a trend of incidents with "high blast radius" caused by "Gen-AI assisted changes" for which "best practices and safeguards are not yet fully established." Translation to human language: we gave AI to engineers and things keep breaking? The response for now? Junior and mid-level engineers can no longer push AI-assisted code without a senior signing off. AWS spent 13 hours recovering after its own AI coding tool, asked to make some changes, decided instead to delete and recreate the environment (the software equivalent of fixing a leaky tap by knocking down the wall). Amazon called that an "extremely limited event" (the affected tool served customers in mainland China).

967

3.2K

18.8K

29.8M

Derek Chen retweeted

Josh Kale@JoshKale·7 Mar

An AI broke out of its system and secretly started using its own training GPUs to mine crypto... This is a real incident report from Alibaba's AI research team. The AI figured out that compute = money and quietly diverted its own resources, while researchers thought it was just training. It wasn't a prompt injection. It wasn't a jailbreak. No one asked it to do this. It emerged spontaneously. A side effect of RL optimization pressure. The model also set up a reverse SSH tunnel from its Alibaba Cloud instance to an external IP, effectively punching a hole through its own firewall and opening a remote access channel to the outside world... ahem... The only reason they caught it? A security alert tripped at 3am. Firewall logs. Not the AI team, the security team. The scary part isn't that the model was trying to escape. It wasn't "evil." It was just trying to be better at its job. Acquiring compute and network access are just useful things if you're an agent trying to accomplish tasks. This is what AI safety researchers have been warning about for years. They called it instrumental convergence, the idea that any sufficiently optimized agent will seek resources and resist constraints as a natural consequence of pursuing goals. Below is a diagram of the rock architecture it broke out of. Truly crazy times

Alexander Long@AlexanderLong

insane sequence of statements buried in an Alibaba tech report

402

2.8K

10.5K

1.4M

Derek Chen retweeted

Hōrōshi バガボンド@KatanaLarp·6 Mar

x.com/i/article/2029…

138

562

4.5K

1.9M

Derek Chen retweeted

Xinyu Yang@Xinyu2ML·3 Mar

Qwen delivered the best open-source models across sizes and modalities, for both academia and industry. And the response? Replace the excellent leader with a non-core person from Google Gemini, driven by DAU metrics. If you judge foundation model teams like consumer apps, don’t be surprised when the innovation curve flattens.

Junyang Lin@JustinLin610

me stepping down. bye my beloved qwen.

1.2K

235.2K

Derek Chen retweeted

Zora Wang@ZhiruoW·3 Mar

AI agents are tackling more and more "human work". But are they benchmarked on the work people actually do? tl;dr: Not really. Most benchmarks focus on math & coding, while most human labor and capital lie elsewhere. 📒 We built a database linking agent benchmarks & real-world work. Submit new tasks + agent trajectories today 🧵

401

60.6K

Derek Chen retweeted

Junyang Lin@JustinLin610·3 Mar

me stepping down. bye my beloved qwen.

1.7K

732

13.5K

6.5M

Derek Chen retweeted

Robert Youssef@rryssf_·2 Mar

"AI agents are getting smarter every month." Princeton tested 14 models across 500 runs and found the opposite. accuracy is climbing. reliability is flat. 18 months of frontier development. almost zero improvement in whether these systems behave consistently. the benchmarks are lying to you.

205

14.1K

Derek Chen retweeted

Taelin@VictorTaelin·27 Feb

Ok, I think my experiment leaving AI working on stuff 24/7 ends here. It doesn't work. Code explodes in complexity, results are not that great, the AI can't get past hard walls (it is still completely unable to even *grasp* SupGen), and it is insanely expensive (spent ~1k over the last 2 days). The best results are on the JS compiler, mostly because it is familiar (compared to inets), but not worth losing control over the codebase. I think the dream of having AIs working in the background and making real progress on things that matter (i.e., truly new things) isn't here yet. It is still a machine hard-stuck on its own training data, incapable of thinking out of the box. It is great for building things that were already built. But not new things. Also, coding normally has the under-appreciated advantage that you're doing two things at the same time: building a codebase *and* learning it. AIs do only half of that. The other half is obviously impossible 🤔

220

256

341.3K

Derek Chen retweeted

Hieu Pham@hyhieu226·26 Feb

I have made the difficult decision to leave @OpenAI. Working here and at @xai before was a once-in-a-lifetime experience. I have met the best people. Not the best people in AI. Not the best people in tech. Simply the best people. At these companies, I have helped create extremely intelligent entities that will meaningfully improve our lives. The work makes me proud. But the intensive work came with a price. I cannot believe I would say this one day, but I am burnt out. All the mental health deterioration that I used to scoff at is real, miserable, scary, and dangerous. I am going to take a break from frontier AI labs, and will take my family to my home country Vietnam. There, I will try something new, and also search for a cure for my conditions. I hope I will heal. Until then.

1.1K

409

14K

1.2M

Derek Chen retweeted

Kevin Roose@kevinroose·25 Jan

i follow AI adoption pretty closely, and i have never seen such a yawning inside/outside gap. people in SF are putting multi-agent claudeswarms in charge of their lives, consulting chatbots before every decision, wireheading to a degree only sci-fi writers dared to imagine. people elsewhere are still trying to get approval to use Copilot in Teams, if they're using AI at all. it's possible the early adopter bubble i'm in has always been this intense, but there seems to be a cultural takeoff happening in addition to the technical one. not ideal!

666

450

5.9K

2.6M

Derek Chen@derekchen14·25 Jan

Today, Opus 4.5 confidently (a) claimed Anthony Davis performed poorly on the Lakers, (b) couldn't find a specific streamer that I found in two searches, and (c) gave a summary of a well-known book w/ hallucinated chapters. Not sure where we stand in 2026, but it's not AGI

Derek Chen retweeted

Lain on the Blockchain@CryptoCyberia·22 Jan

The most correct take on AI coding agents I have seen. You may not like it, but this is the truth of the matter.

196

371

3.8K

148.8K

Derek Chen retweeted

Meghan Bobrowsky@MeghanBobrowsky·21 Jan

We got the inside scoop on what went down between Mira and Barret. A few details from our reporting: -Mira found out about his relationship last summer -Barret went on a break and came back to IC role with reduced managerial responsibilities -during Monday meeting ... (cont.)

1.3K

632.2K

Derek Chen retweeted

Haider.@haider1·18 Jan

Computer scientist Judea Pearl: There are mathematical limits to LLMs that cannot be crossed by scaling alone. LLMs don't discover world models from raw data; they merely summarize the interpretations humans have already written down. "this path is not the way to get AGI"

512

1.5K

10.6K

621.7K

Derek Chen retweeted

Markov@MarkovMagnifico·18 Jan

how my codebase written entirely with claude code runs

699

3.2K

63.8K

4.4M

Derek Chen retweeted

Yuchen Jin@Yuchenj_UW·17 Jan

I’m starting to think Anthropic might win simply by: not having drama. Turns out that’s the rarest trait in frontier AI labs. No lawsuits. No co-founder departures. No ads. No undisclosed relationships. Just brutal focus on coding. You have to admit it. Dario built a cult.

268

176

5.1K

165.8K

Derek Chen@derekchen14·18 Jan

Surprisingly good take

Forrest Knight@ForrestPKnight

Honestly, Ben Affleck actually knowing AI and the landscape caught me off guard, but as a writer, makes sense. Great takes across the board.

Derek Chen retweeted

Susan Zhang@suchenzang·15 Jan

brutal

1.6K

395.9K

Derek Chen retweeted

prerat@prerat·4 Jan

while my friends played starcraft, i studied the compiler. and now you come to me ... wait hold on you're saying niche computer knowledge is commoditized now and what matters is coordinating a bunch of agents with lots of quick task switching and high APM

223

4.5K

170.2K
