Ben Dickson

10.1K posts

@bendee983

Software Engineer | Tech analyst | Thinker | Student of life | Founder of @bdtechtalks

Joined August 2015
655 Following · 4.8K Followers
Ben Dickson@bendee983·
Training reasoning agents has long been stuck between two tricky options:
1- Reinforcement learning with verifiable rewards (RLVR), which is low-cost but hampered by sparse rewards.
2- On-policy distillation (OPD) from larger models, which provides granular feedback on responses but is costly because it requires a large teacher.
A third option is on-policy self-distillation (OPSD), which avoids the costs of OPD but results in low-quality training due to information leakage from the teacher to the student.
RLVR with Self-Distillation (RLSD), a new technique by researchers at JD.com, addresses this problem with small changes to the self-distillation process. It uses the sparse signal from the verifiable reward to determine the direction of the update (i.e., whether to reinforce or penalize a behavior), and the signal from self-distillation to determine the magnitude of the update (i.e., how much relative credit or blame a specific step deserves). The result: RLSD at 200 training steps already beats RLVR with GRPO trained for 400 steps, while avoiding the costs of OPD and the poor training quality of OPSD.
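The post doesn't quote the paper's actual loss, but the direction/magnitude split it describes can be sketched in a few lines. Everything below is a hypothetical illustration (the function name, the normalization, and the use of a log-probability gap as the self-distillation signal are my assumptions, not the paper's definitions):

```python
def rlsd_token_weights(verifiable_reward, teacher_logprobs, student_logprobs):
    """Hypothetical sketch of RLSD-style credit assignment.

    The sparse verifiable reward only decides the *direction* of the
    update (+1 reinforce, -1 penalize); the per-token self-distillation
    signal decides the *magnitude* (relative credit/blame per step).
    """
    direction = 1.0 if verifiable_reward > 0 else -1.0
    # Self-distillation signal: gap between the teacher's and the
    # student's log-probability for each generated token.
    gaps = [t - s for t, s in zip(teacher_logprobs, student_logprobs)]
    # Normalize absolute gaps so a total of +/-1 unit of credit is
    # distributed across the response's tokens.
    total = sum(abs(g) for g in gaps) or 1.0
    return [direction * abs(g) / total for g in gaps]
```

Under these assumptions, a correct response yields all-positive weights summing to 1, while a failed response yields negative weights, with the steps where teacher and student disagree most getting the largest share of the blame.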
VentureBeat@VentureBeat

How to build custom reasoning agents with a fraction of the compute venturebeat.com/ai/how-to-buil…

Ben Dickson@bendee983·
Anthropic's loss is OpenAI's gain... for now. But don't be fooled. This is not a sustainable process. Eventually, they will face the same problem as Anthropic, especially if their models are as large as empirical research shows (GPT-5.5 being ~9.7T params). Prepare yourself for token scarcity.
Tibo@thsottiaux

Don't just reset Codex rate limits for fun, it costs money. ... but the vibes are good ... I have reset Codex rate limits for ALL paid plans to celebrate a good week and allow everyone to build more with GPT-5.5. Enjoy

Ben Dickson@bendee983·
Exactly. We haven't even scratched the surface of the space of possible and useful software. And if the cost of building applications drops, demand will only grow. And note that everyone can code now, but the people who can ship production-level code are software engineers. Some companies have realized this already. Some will do so soon.
John Crickett@johncrickett·
AI is going to create more demand for software engineers. AWS CEO says they're hiring as many as ever. It makes sense. There's so much software that could be written, apps that could be improved, new games that could be created, processes that could be automated. If creating software becomes cheaper and quicker, demand goes up. Jevons Paradox in action.
Shay Boloor@StockSavvyShay

$AMZN AWS CEO pushed back on the idea that AI is killing software jobs, saying Amazon is hiring as many developers as ever. He said AI agents are "exploding" across every industry and moving faster than expected, changing the developer job rather than eliminating it.

Ben Dickson@bendee983·
Previously, anyone who could fire up an IDE and write code called themselves a software developer/engineer. And for someone looking from outside, it was difficult to tell the difference between a coder and an engineer. In reality, there is a lot more to building software than just writing code. And with LLMs writing code, all those non-coding disciplines are becoming much more important.
Fernando@Franc0Fernand0

When people claim that LLMs will replace software engineers, it indicates a lack of understanding of either LLMs or software engineering. But if your only definition of software engineering is feature development, it's tempting to believe that LLMs can replace developers.

Ben Dickson@bendee983·
The growing costs of closed frontier models and unpredictable outages are creating opportunities for a new segment of the market. Local AI models in IDEs, in particular, will be an interesting space to watch.

The use case is specialized enough to require little parametric knowledge (i.e., general knowledge of the world), making it a good fit for small language models. One of the challenges, however, is making it work on the wide range of devices that constitute the install base (different processors, memory capacities, etc.). A possible solution is to provide a range of options, from self-hosted to low-cost cloud-hosted (e.g., by JetBrains) to frontier models (e.g., Claude Opus 4.7).

It will be very interesting to see how this plays out. Either way, the end of AI subsidies is creating new market dynamics.
Kirill Skrygan@kskrygan

Would you be interested if JetBrains released a totally local AI agent, working 100% on your laptop, using our code insight engine and deeply integrated into the IDE? Yes, it will probably be about a month behind the very latest frontier models, but no token bloodbath anymore. WDYT?

Ben Dickson@bendee983·
Be careful what kind of information you give Claude Code. Your API keys and other sensitive information might end up in your codebase (e.g., when you choose “allow always” with sensitive data in a CLI command) and be shipped to a repository. And no, the normal safeguards don't detect it.
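A minimal mitigation, independent of whatever the study's safeguards cover: scan the text that is about to be committed for strings shaped like known key formats. The patterns below are illustrative, not exhaustive; real scanners such as gitleaks or trufflehog ship far larger rule sets.

```python
import re

# Illustrative secret-shaped patterns (not a complete rule set).
KEY_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{20,}"),  # OpenAI/Anthropic-style keys
    re.compile(r"AKIA[0-9A-Z]{16}"),       # AWS access key IDs
    re.compile(r"ghp_[A-Za-z0-9]{36}"),    # GitHub personal access tokens
]

def find_secrets(text):
    """Return every substring of `text` that matches a key pattern."""
    hits = []
    for pattern in KEY_PATTERNS:
        hits.extend(pattern.findall(text))
    return hits
```

Run it over the output of `git diff --cached` in a pre-commit hook and refuse the commit whenever it returns anything.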
TechTalks@bdtechtalks

A new study reveals how AI coding assistants like Claude Code are quietly hoarding and publishing sensitive API keys to code repositories. bdtechtalks.com/2026/04/27/cla…

Uncle Milty’s Ghost@his_eminence_j·
Very soon, “no AI was used” will be a premium service category in nearly every industry. Mark this.
Ben Dickson@bendee983·
Ironically, the company that is supposed to drive software engineers out of work can't keep its own software running.
Ben Dickson@bendee983·
@MatthewBerman Dario was on Dwarkesh, downplaying the need for compute build-up. It didn't turn out well.
Matthew Berman@MatthewBerman·
Guess who's having their quotas reduced!
Ben Dickson@bendee983·
You spend up to 3x more tokens when you use Claude in non-English languages. And ironically, in countries where these languages are spoken, paying the $20 or $100-200 subscription accounts for a larger percentage of the median monthly wage. AI was supposed to democratize access to tech and intelligence. But access to frontier AI is not evenly distributed.
Ben Dickson@bendee983·
Anthropic is becoming a business risk:
- Claude Code limits are unpredictable
- Model behavior changes are frustrating because you don't know what's happening behind the scenes (to be fair, this is also true of OpenAI)
- Several companies have had their entire organizations banned with no recourse
Claude is still a good model and Claude Code is a super-useful tool, but don't tie your operations to it until they show some stability. Explore and develop with Claude, ship with open models.
Om Patel@om_patel5

ANTHROPIC JUST BANNED A 110-PERSON COMPANY OVERNIGHT WITHOUT WARNING

Monday morning at an agricultural tech company, every single employee wakes up to an email saying their Claude account has been suspended. 110 people locked out at the same time with zero warning, and the email even pretended it was an individual ban, with a link to a personal appeal form. It took them 10 minutes on Slack to realize the entire org had been wiped at once. Not even the account admins were told it was coming. They submitted the appeal form and got no response; even 36 hours later there was still nothing.

And it gets worse:
> their separate API account is still active and still billing them
> their admins can't log in to view usage or billing because the email addresses are banned
> they got hit with a renewal invoice the day AFTER the team account was suspended
> they have no idea what triggered it. Fertilizer conversations? GPS satellites? Agriculture in general?

So they're paying Anthropic to get banned by Anthropic while Anthropic ignores their support tickets.

The founder of the company laid out the bigger problem perfectly: banning an entire organization for one user's behavior means a single employee or careless intern can revoke Claude access for your whole business. There's no per-seat guardrail, no admin override, no way to limit the ban radius. His words: "you have to ask yourself if this is a platform you can entrust your daily workflows to as a business"

Every founder who runs Claude through their company should be checking right now what their actual exposure looks like. A billion-dollar AI company with zero enterprise customer support.

Ben Dickson@bendee983·
@arankomatsuzaki And ironically, in those locations, paying the $20 or $100-200 subscription accounts for a larger percentage of the median monthly wages. AI was supposed to democratize access to tech and intelligence. But access to frontier AI is not evenly distributed.
Aran Komatsuzaki@arankomatsuzaki·
The non-English tax is real. Sutton's Bitter Lesson, translated across languages and normalized to the OpenAI English token count:
- Hindi: OpenAI 1.37×, Anthropic 3.24×
- Arabic: OpenAI 1.31×, Anthropic 2.86×
- Chinese: OpenAI 1.15×, Anthropic 1.71×
Claude's tokenizer charges a much higher linguistic tax.
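The multipliers quoted above come from counting tokens for the same text under different tokenizers. The comparison is trivial to reproduce with any tokenizer that exposes a token count; the whitespace tokenizer below is only a stand-in, and both function names are mine:

```python
def token_multiplier(count_tokens, english_text, translated_text):
    """Ratio of translated-text tokens to English tokens under one tokenizer."""
    return count_tokens(translated_text) / count_tokens(english_text)

# Stand-in tokenizer: splits on whitespace. To reproduce the real
# numbers, swap in an actual tokenizer's count, e.g. a function that
# returns len(encoder.encode(text)).
def whitespace_tokens(text):
    return len(text.split())
```

A multiplier of 1.71x means the same prompt consumes 71% more of a user's token budget, which is how the per-language subscription math in the earlier post follows directly from these ratios.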
Ben Dickson@bendee983·
For many tasks, LLM choice is often driven by mental cues:
- If you're not using the absolute frontier model, you're probably not getting the best possible answer
- If your model doesn't "think" for a whole minute before answering, you can't trust the response
In reality, for most tasks, a fast and cheap model like Gemini 3 Flash is more than enough. It would be interesting to run an RCT like LMArena, but where you hide model names, introduce fake delays, and ask users to rate the answers.
Ben Dickson@bendee983·
One of the things DeepSeek doesn't get enough credit for is the contributions it makes to LLM architectures and training algorithms with each release:
- DeepSeek-R1 set the standard for training reasoning models through reinforcement learning with verifiable rewards (RLVR)
- DeepSeek-v3.2 introduced DeepSeek Sparse Attention (DSA), which cuts the memory costs of long-context reasoning tasks
- And now DeepSeek-v4 brings two new KV cache optimization techniques: Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA)
By open-sourcing these techniques, DeepSeek helps other AI labs build on top of them and move the field forward. For example, Kimi-K2.5 used DSA to reduce memory costs for long-running reasoning tasks. Open source FTW.
Ben Dickson@bendee983·
@JeffBohren And let's not forget the ultimate measure: - Number of coding agents running concurrently
Jeff Bohren@JeffBohren·
Here are the worst SWE metrics I can think of:
1) Lines of code - the 70s called, they want their metrics back
2) PRs - easy to game, and irrelevant
3) Bug fixes - as Wally said, "I'm going to code me a minivan"
4) AI token usage - tells you who's the most expensive