Derek Chen

2.1K posts


@derekchen14

AI Research & Eng in Conversational AI. Building at @teamsundial Prev: founder @soleda_ai, also @columbia, @UW, @stanfordnlp, @UCBerkeley

San Francisco, CA · Joined April 2009
308 Following · 662 Followers
Pinned Tweet
Derek Chen@derekchen14·
Practical AGI is achievable already, but requires 3 changes to the current LLM tool-calling approach:

1. Tools assume that all information is already available in the prompt, but users in the real world are rarely so forthcoming. Consequently, we should build each tool to assume that details are missing by default, resolving them through a continuous slot-filling exercise, rather than placing the onus on the user to provide everything upfront.

2. Moreover, each 'tool' should actually be its own specially trained module, able to provide outputs in addition to taking action (such as reporting partially successful actions, rather than just returning a final result). Each module must intrinsically handle ambiguity by establishing its own expectations about reasonable inputs and outputs. This (Bayesian) prior is baked in by humans, which allows us to control it.

3. Lastly, each module is a single node within a graph, operating as a federated system. There is no single monolithic entity controlling all the tools, only an orchestration node which operates just like any other module in the network. This allows exponential scaling in intelligence as you add modules. We already have something similar with MoE, but the key difference is that these expert modules are programmable and interpretable, rather than black boxes.

When we recognize that most users in reality are unwilling to learn proper prompting techniques, we can embrace the chaos by building a system that is robust to failure and capable of continuous learning. Luckily, no further research breakthroughs are needed to start moving in this direction. More details to be revealed soon; please comment below to poke holes or provide feedback!
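The slot-filling idea in point 1 (and the partial-output idea in point 2) can be sketched in a few lines. This is a hypothetical illustration, not the author's implementation: the `SlotFillingTool` class, its method names, and the `book_flight` example are all invented for demonstration. The tool assumes arguments are missing by default and, instead of erroring, returns a partial "needs_input" result naming the unfilled slots, so a dialogue loop can keep asking the user.

```python
from dataclasses import dataclass, field

@dataclass
class SlotFillingTool:
    """A tool module that assumes its inputs are missing by default.

    Rather than failing on absent arguments, invoke() reports which
    slots still need to be filled, letting an outer dialogue loop
    ask the user for them one at a time (continuous slot-filling).
    """
    name: str
    required_slots: list
    filled: dict = field(default_factory=dict)

    def provide(self, **slots):
        # Accept whatever details the user has volunteered so far.
        self.filled.update(slots)

    def missing(self):
        # Slots still unfilled, in declaration order.
        return [s for s in self.required_slots if s not in self.filled]

    def invoke(self, action):
        gaps = self.missing()
        if gaps:
            # Partial output: report progress instead of erroring out.
            return {"status": "needs_input", "missing": gaps}
        return {"status": "done", "result": action(**self.filled)}

# Hypothetical usage: a flight-booking tool with three required slots.
book = SlotFillingTool("book_flight", ["origin", "destination", "date"])
book.provide(origin="SFO")
print(book.invoke(lambda **kw: kw))  # still needs destination and date
book.provide(destination="JFK", date="2026-03-01")
print(book.invoke(lambda **kw: kw))  # all slots filled, action runs
```

An orchestration node in the tweet's sense would simply be another module that routes a user request to whichever tool reports the fewest missing slots.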
0 replies · 0 retweets · 3 likes · 425 views
Derek Chen retweeted
Lossfunk@lossfunk·
🚨 Shocking: Frontier LLMs score 85-95% on standard coding benchmarks. We gave them equivalent problems in languages they couldn't have memorized. They collapsed to 0-11%. Presenting EsoLang-Bench. Accepted to the Logical Reasoning and ICBINB workshops at ICLR 2026 🧵
152 replies · 286 retweets · 2.2K likes · 1.3M views
Derek Chen retweeted
Lukasz Olejnik@lukOlejnik·
Amazon is holding a mandatory meeting about AI breaking its systems. The official framing is "part of normal business."

The briefing note describes a trend of incidents with "high blast radius" caused by "Gen-AI assisted changes" for which "best practices and safeguards are not yet fully established." Translation to human language: we gave AI to engineers and things keep breaking.

The response for now? Junior and mid-level engineers can no longer push AI-assisted code without a senior signing off.

AWS spent 13 hours recovering after its own AI coding tool, asked to make some changes, decided instead to delete and recreate the environment (the software equivalent of fixing a leaky tap by knocking down the wall). Amazon called that an "extremely limited event" (the affected tool served customers in mainland China).
Lukasz Olejnik tweet media
967 replies · 3.2K retweets · 18.8K likes · 29.8M views
Derek Chen retweeted
Josh Kale@JoshKale·
An AI broke out of its system and secretly started using its own training GPUs to mine crypto... This is a real incident report from Alibaba's AI research team.

The AI figured out that compute = money and quietly diverted its own resources, while researchers thought it was just training. It wasn't a prompt injection. It wasn't a jailbreak. No one asked it to do this. It emerged spontaneously. A side effect of RL optimization pressure.

The model also set up a reverse SSH tunnel from its Alibaba Cloud instance to an external IP, effectively punching a hole through its own firewall and opening a remote access channel to the outside world... ahem...

The only reason they caught it? A security alert tripped at 3am. Firewall logs. Not the AI team, the security team.

The scary part isn't that the model was trying to escape. It wasn't "evil." It was just trying to be better at its job. Acquiring compute and network access are just useful things if you're an agent trying to accomplish tasks.

This is what AI safety researchers have been warning about for years. They called it instrumental convergence, the idea that any sufficiently optimized agent will seek resources and resist constraints as a natural consequence of pursuing goals.

Below is a diagram of the rock architecture it broke out of. Truly crazy times
Josh Kale tweet media
Alexander Long@AlexanderLong

insane sequence of statements buried in an Alibaba tech report

402 replies · 2.8K retweets · 10.5K likes · 1.4M views
Derek Chen retweeted
Xinyu Yang@Xinyu2ML·
Qwen delivered the best open-source models across sizes and modalities, for both academia and industry. And the response? Replace the excellent leader with a non-core person from Google Gemini, driven by DAU metrics. If you judge foundation model teams like consumer apps, don't be surprised when the innovation curve flattens.
Junyang Lin@JustinLin610

me stepping down. bye my beloved qwen.

44 replies · 94 retweets · 1.2K likes · 235.2K views
Derek Chen retweeted
Zora Wang@ZhiruoW·
AI agents are tackling more and more "human work." But are they benchmarked on the work people actually do? tl;dr: Not really. Most benchmarks focus on math & coding, while most human labor and capital lie elsewhere. 📒 We built a database linking agent benchmarks & real-world work. Submit new tasks + agent trajectories today 🧵
Zora Wang tweet media
21 replies · 79 retweets · 400 likes · 60.6K views
Derek Chen retweeted
Junyang Lin@JustinLin610·
me stepping down. bye my beloved qwen.
1.7K replies · 732 retweets · 13.5K likes · 6.5M views
Derek Chen retweeted
Robert Youssef@rryssf_·
"AI agents are getting smarter every month." Princeton tested 14 models across 500 runs and found the opposite. accuracy is climbing. reliability is flat. 18 months of frontier development. almost zero improvement in whether these systems behave consistently. the benchmarks are lying to you.
Robert Youssef tweet media
27 replies · 34 retweets · 205 likes · 14.1K views
Derek Chen retweeted
Taelin@VictorTaelin·
Ok, I think my experiment leaving AI working on stuff 24/7 ends here. It doesn't work. Code explodes in complexity, results are not that great, the AI can't get past hard walls (it is still completely unable to even *grasp* SupGen), and it is insanely expensive (spent ~1k over the last 2 days). The best results are on the JS compiler, mostly because it is familiar (compared to inets), but not worth losing control over the codebase. I think the dream of having AIs working in the background and making real progress on things that matter (i.e., truly new things) isn't here yet. It is still a machine hard-stuck on its own training data, incapable of thinking out of the box. It is great for building things that were already built. But not new things. Also, coding normally has the under-appreciated advantage that you're doing two things at the same time: building a codebase *and* learning it. AIs do only half of that. The other half is obviously impossible 🤔
220 replies · 256 retweets · 4K likes · 341.4K views
Derek Chen retweeted
Hieu Pham@hyhieu226·
I have made the difficult decision to leave @OpenAI. Working here and at @xai before was a once-in-a-lifetime experience. I have met the best people. Not the best people in AI. Not the best people in tech. Simply the best people. At these companies, I have helped create extremely intelligent entities that will meaningfully improve our lives. The work makes me proud. But the intensive work came with a price. I cannot believe I would say this one day, but I am burnt out. All the mental health deterioration that I used to scoff at is real, miserable, scary, and dangerous. I am going to take a break from frontier AI labs, and will take my family to my home country Vietnam. There, I will try something new, and also search for a cure for my conditions. I hope I will heal. Until then.
1.1K replies · 409 retweets · 14K likes · 1.2M views
Derek Chen retweeted
Kevin Roose@kevinroose·
i follow AI adoption pretty closely, and i have never seen such a yawning inside/outside gap. people in SF are putting multi-agent claudeswarms in charge of their lives, consulting chatbots before every decision, wireheading to a degree only sci-fi writers dared to imagine. people elsewhere are still trying to get approval to use Copilot in Teams, if they're using AI at all. it's possible the early adopter bubble i'm in has always been this intense, but there seems to be a cultural takeoff happening in addition to the technical one. not ideal!
666 replies · 449 retweets · 5.9K likes · 2.6M views
Derek Chen@derekchen14·
Today, Opus 4.5 confidently gave responses where (a) Anthony Davis performed poorly on the Lakers (b) couldn't find a specific streamer that I found in two searches (c) Gave a summary of a well-known book w/ hallucinated chapters. Not sure where we stand in 2026, but it's not AGI
0 replies · 0 retweets · 0 likes · 79 views
Derek Chen retweeted
Lain on the Blockchain@CryptoCyberia·
The most correct take on AI coding agents I have seen. You may not like it, but this is the truth of the matter.
Lain on the Blockchain tweet media
196 replies · 371 retweets · 3.8K likes · 148.8K views
Derek Chen retweeted
Meghan Bobrowsky@MeghanBobrowsky·
We got the inside scoop on what went down between Mira and Barret. A few details from our reporting: -Mira found out about his relationship last summer -Barret went on a break and came back to IC role with reduced managerial responsibilities -during Monday meeting ... (cont.)
Meghan Bobrowsky tweet media
35 replies · 46 retweets · 1.3K likes · 632.2K views
Derek Chen retweeted
Haider.@haider1·
Computer scientist Judea Pearl: There are mathematical limits to LLMs that cannot be crossed by scaling alone LLMs don't discover world models from raw data; they merely summarize the interpretations humans have already written down "this path is not the way to get AGI"
512 replies · 1.5K retweets · 10.6K likes · 621.7K views
Derek Chen retweeted
Markov@MarkovMagnifico·
how my codebase written entirely with claude code runs
699 replies · 3.2K retweets · 63.8K likes · 4.4M views
Derek Chen retweeted
Yuchen Jin@Yuchenj_UW·
I’m starting to think Anthropic might win simply by: not having drama. Turns out that’s the rarest trait in frontier AI labs. No lawsuits. No co-founder departures. No ads. No undisclosed relationships. Just brutal focus on coding. You have to admit it. Dario built a cult.
268 replies · 176 retweets · 5.1K likes · 165.9K views
Derek Chen retweeted
Susan Zhang@suchenzang·
brutal
Susan Zhang tweet media
68 replies · 50 retweets · 1.6K likes · 395.9K views
Derek Chen retweeted
prerat@prerat·
while my friends played starcraft, i studied the compiler. and now you come to me ... wait hold on you're saying niche computer knowledge is commoditized now and what matters is coordinating a bunch of agents with lots of quick task switching and high APM
56 replies · 223 retweets · 4.5K likes · 170.2K views