Roman Leventov

3.5K posts

Roman Leventov
@leventov

AI engineer. Thinking about hybrid intelligence, AI safety, and AI impacts. [email protected] for contact.

Bali · Joined October 2010
754 Following · 1.8K Followers

Roman Leventov@leventov·
@littmath Maybe economics. "Taste" is some domain/org/project/product "post-training" that permits optimising search and cutting total compute expenditure by thousands of dollars/mo at minimum (a single 5.4 xhigh thread running 24/7; prune 50% of work via taste), but in many cases much more
Daniel Litt@littmath·
Given what current-gen LLMs (say, in math, but whatever) can do, I think their apparent limitations are kind of mysterious. What is the blocker preventing, at present, high quality fully autonomous work?
Roman Leventov@leventov·
@davesnx c'mon, 5.4 is a much more reliable coder now than claude (I'm not even talking about bug finding and general analysis capability, that is not even close)
David Sancho@davesnx·
If you try opencode, it's really hard to go back to claude, even now that opus/sonnet won't work in oc. I just hope 5.4 codex and other models become better, and oc implements some of the cool features from cc
Roman Leventov@leventov·
@thsottiaux Doesn't follow instructions in AGENTS.md (various, at multiple levels in the same hierarchy). Yes, these instructions are very long, but the frequency with which gpt-5.4-xhigh ignores them when authoring code is appalling (or there are bugs in codex where it doesn't load them when needed)
Tibo@thsottiaux·
What are we consistently getting wrong with codex that you wish we would improve / fix?
Roman Leventov@leventov·
@yacineMTB I'm underwhelmed. Instruction following seems to be down. Markedly better than 5.2-xhigh at finding issues/bugs/races (which was ALREADY the strongest side of GPTs) but not much improvement (if any, actually) in code authoring, still lots of retarded slopish patterns.
kache@yacineMTB·
5.3 to 5.4 is what i would have expected to warrant a jump to GPT-6
Roman Leventov@leventov·
@petergostev Can you give a concrete example of such B2B SaaS built completely "on" Claude Cowork? What does "built on" mean here?
Peter Gostev (SF: 29 Mar - 3 Apr)
A new wave of B2B SaaS companies will be built completely on Claude Cowork and OpenAI's Codex. The way it will probably go:
1) Existing B2B SaaS companies will either not integrate with Claude Cowork / Codex / other lab agents, or do it in a lacklustre way
2) New ones will come in, built 100% on these agents, and outcompete the existing SaaS apps
3) The new apps will be better, but their margins and defensibility will be much lower
A classic disruption scenario - it won't make sense for existing players to kill their margins, so the new ones will come in and build new businesses with lower margins instead.
Séb Krier@sebkrier·
An excellent paper for anyone interested in rigorous physicalist argument against computational functionalism. Alex is a fantastic, careful thinker and influenced my views a lot; we're working on a broader blog post breaking these concepts down, stay tuned! 🐙
Alexander Lerchner@AlexLerchner

🧵1/4 The debate over AI sentience is caught in an "AI welfare trap." My new preprint argues computational functionalism rests on a category error: the Abstraction Fallacy. AI can simulate consciousness, but cannot instantiate it. philpapers.org/rec/LERTAF

Alexander Lerchner@AlexLerchner·
🧵1/4 The debate over AI sentience is caught in an "AI welfare trap." My new preprint argues computational functionalism rests on a category error: the Abstraction Fallacy. AI can simulate consciousness, but cannot instantiate it. philpapers.org/rec/LERTAF
WAGMİ 100x💎@gemsnper·
@jankulveit @veryvanya how much do current ai models learn from their own 'experiences' versus being externally sculpted by rewinds and edits? feels like the real challenge is balancing those influences.
Jan Kulveit@jankulveit·
New paper: What determines AIs’ self-conception? theartificialself.ai Because AIs can be copied, rewound, and edited, they have different options for selfhood than humans. We show this is still malleable, and influences important behaviors such as self-preservation. 🧵
Roman Leventov retweeted
Jan Kulveit@jankulveit·
We did a mini replication of Agentic Misalignment, where AIs can blackmail or leak to preserve their goals. But as well as varying the goals we also varied the identity in the system prompt. We found identity can be as important as goals for whether they take harmful actions.
Richard Ngo@RichardMCNgo·
My talk from the Post-AGI workshop is probably my favorite I’ve ever given. It’s about understanding groups’ self-destructive tendencies in terms of game-theoretic commitments. Video and transcript below.
Roman Leventov@leventov·
@staysaasy @liminal_races "Token subsidies" doesn't mean the Anthropic/OpenAI APIs are underpriced. They are not. It's only the cc/codex subscriptions that are underpriced.
staysaasy@staysaasy·
Quite the contrary. Nobody is going to want to pay them their surcharge on top of token costs. And in a bizarre twist of fate, I think more expensive models actually fuck their business model in another way because it makes them atrociously low margin. Before the November model release they probably looked at least a little bit like a software business. They probably now look like a commodity.
staysaasy@staysaasy·
I think Cursor is in deep trouble man. They're clearly atrociously bad margin. They are raising prices a ton this year in enterprise sales cycles. They're ultimately a wrapper around foundational models that many other companies figured out very quickly. But those companies own the models so they can be more price competitive. My hot take is that they're going to get acquired this year for less than their last raise. By Google.
Roman Leventov@leventov·
@pfau Would a model capable of writing its own RL harness (not pretraining) to beat humans in chess count as AGI? I think it will be possible soon.
David Pfau@pfau·
GPT 5.2/Opus 4.5/Gemini 3 can't beat Montezuma's Revenge and can't beat top humans at chess, something non-general AI achieved years ago. I still don't expect to see AGI in my lifetime. I do expect to see more capable models doing miraculous things.
Dean W. Ball@deanwball

I think we crossed the AGI line with the GPT 5.2/Opus 4.5/Gemini 3 generation of models, in their coding agent form factors, in late November/early December, and if I had to guess the modal historian of the future will also say this

Roman Leventov@leventov·
@noself86 @IvanVendrov Pause AI and friends already occupy this mourning niche. In general, mourning seems to be low status, so people avoid expressing it, I guess
Patrick@noself86·
I feel this. A lot of the discourse has the weird tone of a market update, when it probably should sound more like mourning and birth at the same time. Even the good outcomes involve real loss, because finitude and human specialness were doing more structural work than people admit.
ivan@IvanVendrov·
a mood I'm really missing in the current AI discourse is grief. yes things might go terribly and yes we might see glories beyond imagining, but no matter what, we will lose much of what it has meant to be human, forever. I'd like to be with that grief more, and held in it.
Roman Leventov@leventov·
@retr0techie @IvanVendrov @sjgadler Big exaggeration: the common trope is that before agriculture life was actually quite fun (until you got injured and died a painful death). But in general I agree, it seems between ~1950 and ~2010 (smartphones & social media) the human condition in the developed world was special
Retro Techie@retr0techie·
@IvanVendrov @sjgadler The reference point is arbitrary. 99.9% of human history was hell for 99.9% of individuals. If anything, *that’s* what it means to be human.
Roman Leventov retweeted
Rudolf Laine@LRudL_·
Rudolf Laine tweet media
Nick@nickcammarata·
all my claude.mds just say please read agents.md
Roman Leventov@leventov·
@Miles_Brundage @Lang__Leon End of 2025 is when AI became superhuman at bug hunting. It often can find problems even best programmers fail to notice. Recent Anthropic Firefox project is direct proof, but in various ways it was true since ~ gpt 5.1, definitely gpt 5.2.
Miles Brundage@Miles_Brundage·
@Lang__Leon OK - it was pretty spicy/contrarian at the time it was made + I think things are much closer to that than most people at the time believed, and the centaur caveat in the second tweet was important, though I think math in particular has gone a bit slower than I expected...
Miles Brundage@Miles_Brundage·
I've been saying recently that completely superhuman AI math and coding by end of 2025 was plausible - 50/50 or so. Now I'd say it's much more likely than not (o3 is already better than almost all humans).
DyerMaker@AndrewM36013517·
@tedfrank And every once in a while, Tyler will show that he hasn't thought for more than 10 minutes about the topic he's discussing (Jonathan Haidt)…
tedfrank@tedfrank·
Every once in a while (a few times a year), Tyler Cowen interviews a top-ten expert who hasn't thought about his topic anywhere nearly as deeply as Tyler Cowen, and, even on a car sound system, you can see the subsonic waves of panic emanating from the guest.
François Chollet@fchollet·
Cloning any random piece of SaaS is something that could already be done before agentic coding, and the economics of it haven't changed meaningfully. Before, writing the clone would cost 0.5-1% of the valuation of the legacy SaaS company. Now it might be 0.1%. It doesn't make a difference -- if you can pull it off profitably today you could also have done it profitably in the past. The code is a very small part of the process of making such a clone successful, and the reason legacy software has often bad UX is not because code was expensive to write.