Ben Cohen

408 posts

@blc_16

spends too much time watching football and coding. Prev @Meta, @Microsoft engineer

Joined June 2014

2.2K Following, 146 Followers

Ben Cohen@blc_16·13h

@bcherny Time to cancel my claude subscription!

2.3K

Boris Cherny@bcherny·13h

Starting tomorrow at 12pm PT, Claude subscriptions will no longer cover usage on third-party tools like OpenClaw. You can still use these tools with your Claude login via extra usage bundles (now available at a discount), or with a Claude API key.

1.4K

599

7.2K

3.4M

Ben Cohen@blc_16·1d

AGI is six months away but you need to switch to extra-lower-thinking mode before asking a tough question or your daily rations will run out immediately

Lydia Hallie ✨@lydiahallie

Digging into reports, most of the fastest burn came down to a few token-heavy patterns. Some tips:
• Sonnet 4.6 is the better default on Pro. Opus burns roughly twice as fast. Switch at session start.
• Lower the effort level or turn off extended thinking when you don't need deep reasoning. Switch at session start.
• Start fresh instead of resuming large sessions that have been idle ~1h
• Cap your context window, long sessions cost more: CLAUDE_CODE_AUTO_COMPACT_WINDOW=200000
We're rolling out more efficiency improvements, make sure you're on the latest version. If a small session is still eating a huge chunk of your limit in a way that seems unreasonable, run /feedback and we'll investigate

110

Ben Cohen@blc_16·2d

@mil000 This is actually crazy😭

1.8K

Milo Smith@mil000·3d

Why does a physical lamp, like a source of light, need a logo wall

1.3K

87.6K

Ben Cohen@blc_16·3d

@Manderljung working on this Markus, would love to chat

Markus Anderljung@Manderljung·3d

Would be keen on folks' takes. I'm sure I've got some stuff wrong!

202

Markus Anderljung@Manderljung·3d

Frontier AI companies already outsource lots of evaluation work to METR, Apollo, and others. I think a similar logic extends to safeguards development – classifiers, data generation, KYC. And that such outsourcing could be both in companies' interest and good for the world.

106

Ben Cohen@blc_16·3d

@levie Just use gstack /review Aaron

Aaron Levie@levie·3d

The ultimate rate limiter on productivity gains from agents will be on critical stuff like security, compliance, governance, the ability to review the work of the agent, ensure that it’s compatible with regulations, and so on. We’ve been living in a little bit of la-la land around how much software enterprises are going to ultimately want to vibe code themselves. The last 48 hours represent a good example of why you won’t take on every risk of every piece of technology in your enterprise. There’s no free lunch with AI productivity. Companies will have to build up the systems, processes, and controls for ensuring that agents can’t run around and do anything they want on any data at any time.

sarah guo@saranormous

x.com/i/article/2039…

383

108.6K

Ben Cohen@blc_16·5d

@adkravetz @part_harry_ @siddarthv66 it's interesting how social media just uses user engagement but AI companies seem to be avoiding it. Would guess there are concerns with sycophancy

Alex Kravetz@adkravetz·5d

@blc_16 @part_harry_ @siddarthv66 Agreed. To me that's the actual interesting part. Online learning isn't solved, so much as cursor seems to have modeled a decent (but not perfect) reward signal

Harry Partridge@part_harry_·5d

Human-in-the-loop RL is necessarily done at group size 1; you cannot do a group of rollouts with only one human. i.e. there is no baseline for you to subtract for each input prompt. This is by far the most interesting and under-discussed part of this announcement. The same was true for their tab-completions model. From the wording in their posts, it sounds like they are using plain REINFORCE (no mention of value functions) with a large batch size + re-evaluating each checkpoint to guard against high variance. Cursor is implicitly revealing an important empirical result: with a large enough batch size, simple REINFORCE just works, no baseline needed. In other words, large scale continual learning is solved.
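A minimal sketch of the batch-size point above, assuming a toy 3-action softmax policy with fixed rewards (the setup, function names, and numbers are hypothetical and are not Cursor's training code). It computes the plain REINFORCE estimate with one rollout per sample and no baseline, then measures how far that estimate sits from the exact gradient at different batch sizes.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_gradient(logits, rewards, batch_size, rng):
    """Plain REINFORCE estimate of d E[reward] / d logits.

    One rollout per sample (group size 1) and no baseline subtracted.
    """
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for _ in range(batch_size):
        action = rng.choices(range(len(logits)), weights=probs)[0]
        reward = rewards[action]
        for k in range(len(logits)):
            # d log pi(action) / d logit_k = 1[k == action] - pi_k
            indicator = 1.0 if k == action else 0.0
            grad[k] += reward * (indicator - probs[k])
    return [g / batch_size for g in grad]


def exact_gradient(logits, rewards):
    """Closed-form gradient for this toy problem: pi_k * (r_k - E[r])."""
    probs = softmax(logits)
    expected = sum(p * r for p, r in zip(probs, rewards))
    return [p * (r - expected) for p, r in zip(probs, rewards)]


def mean_squared_error(batch_size, trials, seed=0):
    """Average squared distance between the estimate and the exact gradient."""
    rng = random.Random(seed)
    logits = [0.0, 0.0, 0.0]   # hypothetical uniform starting policy
    rewards = [1.0, 2.0, 3.0]  # hypothetical fixed reward per action
    truth = exact_gradient(logits, rewards)
    total = 0.0
    for _ in range(trials):
        est = reinforce_gradient(logits, rewards, batch_size, rng)
        total += sum((e - t) ** 2 for e, t in zip(est, truth))
    return total / trials


if __name__ == "__main__":
    for batch_size in (1, 16, 256):
        print(batch_size, mean_squared_error(batch_size, trials=200))
```

The estimator is unbiased at any batch size; what shrinks as the batch grows is its variance, roughly in proportion to 1/batch_size. This toy says nothing about whether that is sufficient at production scale.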

Cursor@cursor_ai

Earlier this week, we published our technical report on Composer 2. We're sharing additional research on how we train new checkpoints. With real-time RL, we can ship improved versions of the model every five hours.

255

40.3K

Ben Cohen@blc_16·6d

@t_blom Does this include token spend for customers or just internal token spend?

Tom Blomfield@t_blom·28 Mar

The responses to this are split: 70%: You are stupid, this will never happen, and 30%: This already happened at my startup

Tom Blomfield@t_blom

By the end of 2026, I predict token spend will be greater than engineering salaries at early stage startups.

530

75.6K

Ben Cohen@blc_16·6d

@jxnlco unc got bars

jason liu@jxnlco·6d

The happiest person I knew killed themselves four years ago. The more I tell people that I’m anxious or neurotic the more I realize deep down I’m doing pretty good for myself There’s so much friction in trying to mask everything

358

56.3K

Ben Cohen@blc_16·6d

@samhogan The urge to update my linkedin to work there so all my normie friends who have never heard of it get so confused

1.3K

Sam Hogan 🇺🇸@samhogan·6d

I can’t stop thinking about the fact that they named it Post Hog

406

46.5K

Ben Cohen@blc_16·6d

@darrenangle @willccbb Lovable for domain specific models

163

darren@darrenangle·6d

it's easier and easier to imagine a world where eng teams use a frontier harness + agent-friendly libraries like primeintellect to vibetrain 4B-20B domain-specific agents with RL and slap them into private, composable workflows

Ben Cohen@blc_16·6d

@illscience But it's harder to even get in front of the customer in the theoretical end state. Not saying that's true yet, however people are starting to notice it on X and other platforms

Anish Acharya@illscience·6d

@blc_16 yes agree you can’t build stuff faster than the customer can digest it, but that’s surely more stuff than we were shipping previously

143

Anish Acharya@illscience·6d

End of Prioritization. I’ve been thinking about the tension between exploitation and exploration lately - mathematically best described by the multi-armed bandit problem. You can’t do everything because trying something has a cost. Just as so many other laws of physics are changing with AI, I think this one is about to change too. For any intelligence+execution bound work you can imagine the cost of exploitation (trying something) is rapidly approaching zero (modulo inference). In that world, the value of exploration goes up dramatically - you can simply try more things. This is a broad, important concept that applies to thousands of trade-offs in companies and society that we previously took as immutable. It also tells you something about where value accrues in the future. People who can identify compelling new paths to explore will have far more value to add than people who are experts at specialized exploitation of known paths. I have a feeling this might even have implications for the multi-armed bandit problem in the formal mathematical sense, but that’s a bit beyond my expertise. Think about a growth team that A/B tests two landing pages a week because each variant costs real design and eng time - now they test fifty. Or a product team that agonizes over which feature to build next because they can only ship one - now they build all of them and let users decide. It’s like Monte Carlo simulation for everything, except you’re not simulating - you’re actually doing it. Every path gets run. Prioritization as we know it is obsolete. You don’t pick what to do - you do all of it. The only art left is knowing which bandits are worth arming.
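For readers who haven't met the multi-armed bandit problem referenced above, here is a minimal sketch of one textbook strategy, epsilon-greedy, assuming three hypothetical options with fixed success rates. The function name, rates, and parameters are illustrative assumptions and do not come from the post.

```python
import random


def epsilon_greedy(true_means, pulls, epsilon, seed=0):
    """Run an epsilon-greedy bandit against Bernoulli arms.

    true_means: hypothetical success probability of each arm.
    epsilon: fraction of pulls spent exploring a random arm.
    Returns (pull counts per arm, estimated mean reward per arm).
    """
    rng = random.Random(seed)
    counts = [0] * len(true_means)
    estimates = [0.0] * len(true_means)
    for _ in range(pulls):
        if rng.random() < epsilon:
            # Explore: try an arm at random, regardless of what we know.
            arm = rng.randrange(len(true_means))
        else:
            # Exploit: pull the arm with the best estimate so far.
            arm = max(range(len(true_means)), key=lambda i: estimates[i])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental running average of observed rewards for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts, estimates


if __name__ == "__main__":
    counts, estimates = epsilon_greedy([0.2, 0.5, 0.8], pulls=5000, epsilon=0.1)
    print(counts, estimates)
```

Here epsilon is the share of the budget spent on exploration. The post's scenario of trying things at near-zero cost corresponds to being able to afford many more pulls, or a much larger epsilon, for the same spend.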

210

17.9K

Ben Cohen@blc_16·6d

@futurenomics @santa_kaus @ThriveCapital The jestermaxxing mascot

Sam@futurenomics·28 Mar

@santa_kaus @ThriveCapital What’s it even supposed to be?

1.1K

Kaustubh Prabhakar@santa_kaus·27 Mar

promote the designer at @ThriveCapital new flag looks so good

455

44.8K

Ben Cohen@blc_16·6d

@guohao_li maybe we shouldn't train our models on that

Guohao Li 🐫@guohao_li·6d

You never know how wildly creative people get with data

Chroma@trychroma

Introducing Chroma Context-1, a 20B parameter search agent. > pushes the pareto frontier of agentic search > order of magnitude faster > order of magnitude cheaper > Apache 2.0, open-source

3.7K

Ben Cohen@blc_16·27 Mar

The final micromanagement boss

Tanay Kothari@tankots

My team's productivity 2x’d when I started spending one day a month doing what no other CEO does. I can tell you what any of my 45 employees do from 7am to 7pm. Pick anyone on my team. CMO. Support agent. Engineer who started last week. I'll tell you how they start their morning. What gives them energy. What's blocking their best work. What drives them in life. Not from a performance review. From actually sitting next to them and watching them work. Here's why: Most CEOs see their team struggling and bring in consultants. I bring a notebook. I sit with each person. Shadow them for hours. See what their actual day looks like. Not what I think it looks like - what it really is. Then we map the blockers together. And fix them. This isn't about managing them. It's about removing their friction. When I do this, something changes. They don't feel criticized; they feel seen and empowered. The 2x productivity boost is nice. But that's not why I do it. I do it because they stop being employees. They become teammates I actually understand and trust. Here's the truth: Your product will change. Your market will change. Your strategy will change. The only thing that stays with you is your people. So if you're going to prioritize anything, prioritize understanding them. The best CEOs aren't always the smartest people in the room. They're the ones who know exactly what's preventing everyone else from being brilliant. That's the job. — Written with Wispr Flow

232

Ben Cohen@blc_16·27 Mar

@lateinteraction Best source of alpha on X

127

Omar Khattab@lateinteraction·27 Mar

I find it disappointing to see how much progress comes daily from late interaction, DSPy/GEPA, and RLMs but our slow industry only catches up after a lab person slaps a name like “autoresearch” or “deep research” on it lol. The future is already here, just not equally distributed

361

23.1K

Omar Khattab@lateinteraction·27 Mar

recursive language models at work!

Agentica@agenticasdk

We scored 36.08% on ARC-AGI-3 in one day using the Agentica SDK.

319

44.9K

Ben Cohen@blc_16·27 Mar

@Kaz_Khadem Maybe because every incubator application asks them what they are top 1% in the world at

138

5.3K

Kasra Khadem@Kaz_Khadem·27 Mar

Young gen z founders in SF today are blatantly embellishing their accomplishments / straight up lying about themselves to VCs to an extent I’ve never seen before… “I spearheaded the AI roll-out of—“ bro you were a freshman summer intern at Meta what are you talking about

2.4K

162.1K

Ben Cohen@blc_16·23 Mar

@danshipper This sounds awful for the architect lol

Dan Shipper 📧@danshipper·23 Mar

new model for engineering team structure in 2026: 2 people only, one pirate and one architect. the pirate's job is to move as fast as possible to develop valuable, shipped product features by vibe coding. the architect's job is to turn the product surface discovered by the pirate into a reliable, structured machine, also by vibe coding, but at a slower, more well-reasoned pace. every product needs a pirate but most products only need an architect once they have some form of PMF, and in that case they usually don't need one full-time. architects can work across many codebases and solve interesting technical challenges. pirates go hard on a product that they own end-to-end.

340

297

4.5K

619.4K

Ben Cohen@blc_16·22 Mar

@jamesonhaslam I assumed it was because every other commercial I see is now either a Cursor or a Lovable commercial

223

jameson (big deck energy)@jamesonhaslam·22 Mar

I have a question about Twitter When you see 5+ voices (gurley, a16z etc) come out on the same day saying the same thing (paid customer acquisition is for losers?) is someone paying them to say it? Was there some event I missed? Help me understand

Olivia Moore@omooretweets

In consumer, paid ads generally = lack of true product market fit I have yet to see a generational startup with largely paid ad-driven growth…

241

71.7K
