Ben Cohen

408 posts

@blc_16

spends too much time watching football and coding. Prev @Meta, @Microsoft engineer

Joined June 2014
2.2K Following 146 Followers
Ben Cohen
Ben Cohen@blc_16·
@bcherny Time to cancel my claude subscription!
2
0
50
2.3K
Boris Cherny
Boris Cherny@bcherny·
Starting tomorrow at 12pm PT, Claude subscriptions will no longer cover usage on third-party tools like OpenClaw. You can still use these tools with your Claude login via extra usage bundles (now available at a discount), or with a Claude API key.
1.4K
599
7.2K
3.4M
Milo Smith
Milo Smith@mil000·
Why does a physical lamp, like a source of light, need a logo wall
Milo Smith tweet media
43
11
1.3K
87.6K
Markus Anderljung
Markus Anderljung@Manderljung·
Would be keen on folks' takes. I'm sure I've got some stuff wrong!
4
0
2
202
Markus Anderljung
Markus Anderljung@Manderljung·
Frontier AI companies already outsource lots of evaluation work to METR, Apollo, and others. I think a similar logic extends to safeguards development – classifiers, data generation, KYC. And that such outsourcing could be both in companies' interest and good for the world.
Markus Anderljung tweet media
3
19
106
7K
Aaron Levie
Aaron Levie@levie·
The ultimate rate limiter on productivity gains from agents will be on critical stuff like security, compliance, governance, the ability to review the work of the agent, ensure that it’s compatible with regulations, and so on. We’ve been living in a little bit of la-la land around how much software enterprises are going to ultimately want to vibe code themselves. The last 48 hours represent a good example of why you won’t take on every risk of every piece of technology in your enterprise. There’s no free lunch with AI productivity. Companies will have to build up the systems, processes, and controls for ensuring that agents can’t run around and do anything they want on any data at any time.
sarah guo@saranormous

x.com/i/article/2039…

71
54
383
108.6K
Ben Cohen
Ben Cohen@blc_16·
@adkravetz @part_harry_ @siddarthv66 it's interesting how social media just uses user engagement but AI companies seem to be avoiding it. Would guess there are concerns with sycophancy
1
0
1
72
Alex Kravetz
Alex Kravetz@adkravetz·
@blc_16 @part_harry_ @siddarthv66 Agreed. To me that's the actual interesting part. Online learning isn't solved, so much as cursor seems to have modeled a decent (but not perfect) reward signal
1
0
0
55
Harry Partridge
Harry Partridge@part_harry_·
Human-in-the-loop RL is necessarily done at group size 1; you cannot do a group of rollouts with only one human. i.e. there is no baseline for you to subtract for each input prompt. This is by far the most interesting and under-discussed part of this announcement. The same was true for their tab-completions model. From the wording in their posts, it sounds like they are using plain REINFORCE (no mention of value functions) with a large batch size + re-evaluating each checkpoint to guard against high variance. Cursor is implicitly revealing an important empirical result: with a large enough batch size, simple REINFORCE just works, no baseline needed. In other words, large scale continual learning is solved.
Cursor@cursor_ai

Earlier this week, we published our technical report on Composer 2. We're sharing additional research on how we train new checkpoints. With real-time RL, we can ship improved versions of the model every five hours.

12
23
255
40.3K
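Harry Partridge's point about group size 1 can be illustrated with a toy example. The sketch below is not Cursor's training setup; it is a minimal three-armed softmax policy with made-up rewards, and all names (`reinforce_estimate`, `estimator_variance`, the reward values) are hypothetical. It shows plain REINFORCE with one rollout per sample and no baseline, and lets you measure how the variance of the gradient estimate shrinks as the batch grows.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def exact_gradient(logits, mean_rewards):
    """Exact gradient of the expected reward with respect to the logits."""
    probs = softmax(logits)
    expected = sum(p * r for p, r in zip(probs, mean_rewards))
    return [p * (r - expected) for p, r in zip(probs, mean_rewards)]


def reinforce_estimate(logits, mean_rewards, batch_size, rng):
    """Plain REINFORCE: batch average of reward * grad log pi(a), no baseline.

    Each sample stands in for one prompt with a single rollout (group size 1),
    so there is no per-prompt group mean to subtract.
    """
    probs = softmax(logits)
    k = len(logits)
    grad = [0.0] * k
    for _ in range(batch_size):
        action = rng.choices(range(k), weights=probs)[0]
        # Assumption: feedback is the arm's mean reward plus unit Gaussian noise.
        reward = mean_rewards[action] + rng.gauss(0.0, 1.0)
        for i in range(k):
            # d log pi(action) / d logit_i = 1[i == action] - pi_i
            indicator = 1.0 if i == action else 0.0
            grad[i] += reward * (indicator - probs[i])
    return [g / batch_size for g in grad]


def estimator_variance(logits, mean_rewards, batch_size, trials, rng):
    """Empirical variance of the first gradient component across trials."""
    samples = [
        reinforce_estimate(logits, mean_rewards, batch_size, rng)[0]
        for _ in range(trials)
    ]
    mean = sum(samples) / trials
    return sum((s - mean) ** 2 for s in samples) / trials
```

Calling `estimator_variance` with `batch_size=1` and then `batch_size=64` on the same policy shows the variance falling roughly in proportion to the batch size, which is the mechanism the post credits: the estimator stays unbiased without a baseline, and a large batch is what keeps it usable.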
Ben Cohen
Ben Cohen@blc_16·
@t_blom Does this include token spend for customers or just internal token spend?
0
0
0
21
jason liu
jason liu@jxnlco·
The happiest person I knew killed themselves four years ago. The more I tell people that I’m anxious or neurotic the more I realize deep down I’m doing pretty good for myself There’s so much friction in trying to mask everything
jason liu tweet media
13
7
358
56.3K
Ben Cohen
Ben Cohen@blc_16·
@samhogan The urge to update my linkedin to work there so all my normie friends who have never heard of it get so confused
1
0
1
1.3K
Sam Hogan 🇺🇸
Sam Hogan 🇺🇸@samhogan·
I can’t stop thinking about the fact that they named it Post Hog
23
5
406
46.5K
darren
darren@darrenangle·
it's easier and easier to imagine a world where eng teams use a frontier harness + agent-friendly libraries like primeintellect to vibetrain 4B-20B domain-specific agents with RL and slap them into private, composable workflows
4
5
80
5K
Ben Cohen
Ben Cohen@blc_16·
@illscience But it's harder to even get in front of the customer in the theoretical end state. Not saying that's true yet; however, people are starting to notice it on X and other platforms
0
0
0
24
Anish Acharya
Anish Acharya@illscience·
@blc_16 yes agree you can’t build stuff faster than the customer can digest it, but that’s surely more stuff than we were shipping previously
1
0
0
143
Anish Acharya
Anish Acharya@illscience·
End of Prioritization: I’ve been thinking about the tension between exploitation and exploration lately, mathematically best described by the multi-armed bandit problem. You can’t do everything because trying something has a cost. Just as so many other laws of physics are changing with AI, I think this one is about to change too. For any intelligence+execution bound work you can imagine, the cost of exploration (trying something) is rapidly approaching zero (modulo inference). In that world, the value of exploration goes up dramatically: you can simply try more things. This is a broad, important concept that applies to thousands of trade-offs in companies and society that we previously took as immutable. It also tells you something about where value accrues in the future. People who can identify compelling new paths to explore will have far more value to add than people who are experts at specialized exploitation of known paths. I have a feeling this might even have implications for the multi-armed bandit problem in the formal mathematical sense, but that’s a bit beyond my expertise. Think about a growth team that A/B tests two landing pages a week because each variant costs real design and eng time; now they test fifty. Or a product team that agonizes over which feature to build next because they can only ship one; now they build all of them and let users decide. It’s like Monte Carlo simulation for everything, except you’re not simulating, you’re actually doing it. Every path gets run. Prioritization as we know it is obsolete. You don’t pick what to do; you do all of it. The only art left is knowing which bandits are worth arming.
34
22
210
17.9K
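For readers unfamiliar with the multi-armed bandit framing in the post above, here is a minimal epsilon-greedy sketch. It is only an illustration: the function name `run_epsilon_greedy`, the Bernoulli reward model, and the parameter values are assumptions, and the sketch does not model the "cost of trying" that the post argues is falling. `epsilon` is the fraction of steps spent exploring a random arm instead of exploiting the best-looking one.

```python
import random


def run_epsilon_greedy(true_means, epsilon, steps, rng):
    """Simulate an epsilon-greedy agent on Bernoulli arms.

    Returns (total_reward, pull_counts, estimated_means).
    """
    k = len(true_means)
    counts = [0] * k
    estimates = [0.0] * k
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try an arm uniformly at random.
            arm = rng.randrange(k)
        else:
            # Exploit: pick the arm with the highest estimate so far
            # (ties resolve to the lowest index).
            arm = max(range(k), key=lambda i: estimates[i])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental running-mean update for the pulled arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total, counts, estimates
```

With `epsilon=0.0` the agent never leaves the first arm it tries, however poor that arm is; raising `epsilon` buys information about the other arms at the price of some wasted pulls. The post's claim amounts to saying that this price is shrinking, so a higher exploration rate becomes affordable.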
Ben Cohen
Ben Cohen@blc_16·
The final micromanagement boss
Tanay Kothari@tankots

My team's productivity 2x’d when I started spending one day a month doing what no other CEO does. I can tell you what any of my 45 employees do from 7am to 7pm. Pick anyone on my team. CMO. Support agent. Engineer who started last week. I'll tell you how they start their morning. What gives them energy. What's blocking their best work. What drives them in life. Not from a performance review. From actually sitting next to them and watching them work. Here's why: Most CEOs see their team struggling and bring in consultants. I bring a notebook. I sit with each person. Shadow them for hours. See what their actual day looks like. Not what I think it looks like - what it really is. Then we map the blockers together. And fix them. This isn't about managing them. It's about removing their friction. When I do this, something changes. They don't feel criticized; they feel seen and empowered. The 2x productivity boost is nice. But that's not why I do it. I do it because they stop being employees. They become teammates I actually understand and trust. Here's the truth: Your product will change. Your market will change. Your strategy will change. The only thing that stays with you is your people. So if you're going to prioritize anything, prioritize understanding them. The best CEOs aren't always the smartest people in the room. They're the ones who know exactly what's preventing everyone else from being brilliant. That's the job. — Written with Wispr Flow

1
0
2
232
Omar Khattab
Omar Khattab@lateinteraction·
I find it disappointing to see how much progress comes daily from late interaction, DSPy/GEPA, and RLMs but our slow industry only catches up after a lab person slaps a name like “autoresearch” or “deep research” on it lol. The future is already here, just not equally distributed
24
34
361
23.1K
Ben Cohen
Ben Cohen@blc_16·
@Kaz_Khadem Maybe because every incubator application asks them what they are top 1% in the world at
0
0
138
5.3K
Kasra Khadem
Kasra Khadem@Kaz_Khadem·
Young gen z founders in SF today are blatantly embellishing their accomplishments / straight up lying about themselves to VCs to an extent I’ve never seen before… “I spearheaded the AI roll-out of—“ bro you were a freshman summer intern at Meta what are you talking about
69
69
2.4K
162.1K
Dan Shipper 📧
Dan Shipper 📧@danshipper·
new model for engineering team structure in 2026: 2 people only, one pirate and one architect. the pirate's job is to move as fast as possible to develop valuable, shipped product features by vibe coding. the architect's job is to turn the product surface discovered by the pirate into a reliable, structured machine, also by vibe coding, but at a slower, more well-reasoned pace. every product needs a pirate but most products only need an architect once they have some form of PMF, and in that case they usually don't need one full-time. architects can work across many codebases and solve interesting technical challenges. pirates go hard on a product that they own end-to-end.
340
297
4.5K
619.4K
Ben Cohen
Ben Cohen@blc_16·
@jamesonhaslam I assumed it was because every other commercial I see is now either a Cursor or a Lovable commercial
0
0
0
223