Roman Leventov

3.5K posts

Roman Leventov
@leventov

AI engineer. Thinking about hybrid intelligence, AI safety, and AI impacts. [email protected] for contact.

Bali · Joined October 2010
754 Following · 1.8K Followers

Roman Leventov@leventov·
@littmath Maybe economics. "Taste" is some domain/org/project/product "post-training" that permits optimising search and cutting total compute expenditure by thousands of dollars/mo at minimum (a single 5.4 xhigh thread running 24/7; prune 50% of work via taste), but in many cases much more
Daniel Litt@littmath·
Given what current-gen LLMs (say, in math, but whatever) can do, I think their apparent limitations are kind of mysterious. What is the blocker preventing, at present, high quality fully autonomous work?
Roman Leventov@leventov·
@davesnx c'mon, 5.4 is a much more reliable coder now than claude (I'm not even talking about bug finding and general analysis capability, that is not even close)
David Sancho@davesnx·
If you try opencode, it's really hard to go back to claude, even now that opus/sonnet won't work in oc. I just hope 5.4 codex and other models become better, and oc implements some of the cool features from cc
Roman Leventov@leventov·
@thsottiaux Doesn't follow instructions in AGENTS.md (various, at multiple levels in the same hierarchy). Yes, these instructions are very long, but the frequency with which gpt-5.4-xhigh ignores them when authoring code is appalling (or there are bugs in codex where it doesn't load them when needed)
Tibo@thsottiaux·
What are we consistently getting wrong with codex that you wish we would improve / fix?
Roman Leventov@leventov·
@yacineMTB I'm underwhelmed. Instruction following seems to be down. Markedly better than 5.2-xhigh at finding issues/bugs/races (which was ALREADY the strongest side of GPTs) but not much improvement (if any, actually) in code authoring, still lots of retarded slopish patterns.
kache@yacineMTB·
5.3 to 5.4 is what i would have expected to warrant a jump to GPT-6
Roman Leventov@leventov·
@petergostev Can you give a concrete example of such B2B SaaS built completely "on" Claude Cowork? What does "built on" mean here?
Peter Gostev (SF: 29 Mar - 3 Apr)
A new wave of B2B SaaS companies will be built completely on Claude Cowork and OpenAI's Codex. The way it will probably go:
1) Existing B2B SaaS companies will either not integrate with Claude Cowork / Codex / other lab agents, or do it in a lacklustre way
2) New ones will come in, built 100% on these agents, and outcompete the existing SaaS apps
3) The new apps will be better, but their margins and defensibility will be much lower
A classic disruption scenario - it won't make sense for existing players to kill their margins, so the new ones will come in and build new businesses with lower margins instead.
Séb Krier@sebkrier·
An excellent paper for anyone interested in rigorous physicalist argument against computational functionalism. Alex is a fantastic, careful thinker and influenced my views a lot; we're working on a broader blog post breaking these concepts down, stay tuned! 🐙
Alexander Lerchner@AlexLerchner

🧵1/4 The debate over AI sentience is caught in an "AI welfare trap." My new preprint argues computational functionalism rests on a category error: the Abstraction Fallacy. AI can simulate consciousness, but cannot instantiate it. philpapers.org/rec/LERTAF

Alexander Lerchner@AlexLerchner·
🧵1/4 The debate over AI sentience is caught in an "AI welfare trap." My new preprint argues computational functionalism rests on a category error: the Abstraction Fallacy. AI can simulate consciousness, but cannot instantiate it. philpapers.org/rec/LERTAF
WAGMİ 100x💎@gemsnper·
@jankulveit @veryvanya how much do current ai models learn from their own 'experiences' versus being externally sculpted by rewinds and edits? feels like the real challenge is balancing those influences.
Jan Kulveit@jankulveit·
New paper: What determines AIs’ self-conception? theartificialself.ai Because AIs can be copied, rewound, and edited, they have different options for selfhood than humans. We show this is still malleable, and influences important behaviors such as self-preservation. 🧵
Roman Leventov retweeted
Jan Kulveit@jankulveit·
We did a mini replication of Agentic Misalignment, where AIs can blackmail or leak to preserve their goals. But as well as varying the goals we also varied the identity in the system prompt. We found identity can be as important as goals for whether they take harmful actions.
Richard Ngo@RichardMCNgo·
My talk from the Post-AGI workshop is probably my favorite I’ve ever given. It’s about understanding groups’ self-destructive tendencies in terms of game-theoretic commitments. Video and transcript below.
Roman Leventov@leventov·
@staysaasy @liminal_races "Token subsidies" doesn't mean the Anthropic/OpenAI APIs are underpriced. They are not. It's only the cc/codex subscriptions that are underpriced.
staysaasy@staysaasy·
Quite the contrary. Nobody is going to want to pay them their surcharge on top of token costs. And in a bizarre twist of fate, I think more expensive models actually fuck their business model in another way because it makes them atrociously low margin. Before the November model release they probably looked at least a little bit like a software business. They probably now look like a commodity.
staysaasy@staysaasy·
I think Cursor is in deep trouble man. They're clearly atrociously bad margin. They are raising prices a ton this year in enterprise sales cycles. They're ultimately a wrapper around foundational models that many other companies figured out very quickly. But those companies own the models so they can be more price competitive. My hot take is that they're going to get acquired this year for less than their last raise. By Google.
Roman Leventov@leventov·
@pfau Would a model capable of writing its own RL harness (not pretraining) to beat humans in chess count as AGI? I think it will be possible soon.
David Pfau@pfau·
GPT 5.2/Opus 4.5/Gemini 3 can't beat Montezuma's Revenge and can't beat top humans at chess, something non-general AI achieved years ago. I still don't expect to see AGI in my lifetime. I do expect to see more capable models doing miraculous things.
Dean W. Ball@deanwball

I think we crossed the AGI line with the GPT 5.2/Opus 4.5/Gemini 3 generation of models, in their coding agent form factors, in late November/early December, and if I had to guess the modal historian of the future will also say this

Roman Leventov@leventov·
@noself86 @IvanVendrov Pause AI and friends already occupy this mourning niche. In general, mourning seems to be low status, so people avoid expressing it, I guess
Patrick@noself86·
I feel this. A lot of the discourse has the weird tone of a market update, when it probably should sound more like mourning and birth at the same time. Even the good outcomes involve real loss, because finitude and human specialness were doing more structural work than people admit.
ivan@IvanVendrov·
a mood I'm really missing in the current AI discourse is grief. yes things might go terribly and yes we might see glories beyond imagining, but no matter what, we will lose much of what it has meant to be human, forever. I'd like to be with that grief more, and held in it.
Roman Leventov@leventov·
@retr0techie @IvanVendrov @sjgadler Big exaggeration: the common trope is that before agriculture life was actually quite fun (until you got injured and died a painful death). But in general I agree, it seems between ~1950 and ~2010 (smartphones & social media) the human condition in the developed world was special
Retro Techie@retr0techie·
@IvanVendrov @sjgadler The reference point is arbitrary. 99.9% of human history was hell for 99.9% of individuals. If anything, *that’s* what it means to be human.
Roman Leventov retweeted
Rudolf Laine@LRudL_·
Rudolf Laine tweet media
Nick@nickcammarata·
all my claude.mds just say please read agents.md
Roman Leventov@leventov·
@Miles_Brundage @Lang__Leon End of 2025 is when AI became superhuman at bug hunting. It often can find problems even best programmers fail to notice. Recent Anthropic Firefox project is direct proof, but in various ways it was true since ~ gpt 5.1, definitely gpt 5.2.
Miles Brundage@Miles_Brundage·
@Lang__Leon OK - it was pretty spicy/contrarian at the time it was made + I think things are much closer to that than most people at the time believed, and the centaur caveat in the second tweet was important, though I think math in particular has gone a bit slower than I expected...
Miles Brundage@Miles_Brundage·
I've been saying recently that completely superhuman AI math and coding by end of 2025 was plausible - 50/50 or so. Now I'd say it's much more likely than not (o3 is already better than almost all humans).
DyerMaker@AndrewM36013517·
@tedfrank And every once in a while, Tyler will show that he hasn't thought for more than 10 minutes about the topic he's discussing (Jonathan Haidt)…
tedfrank@tedfrank·
Every once in a while (a few times a year), Tyler Cowen interviews a top-ten expert who hasn't thought about his topic anywhere nearly as deeply as Tyler Cowen, and, even on a car sound system, you can see the subsonic waves of panic emanating from the guest.
François Chollet@fchollet·
Cloning any random piece of SaaS is something that could already be done before agentic coding, and the economics of it haven't changed meaningfully. Before, writing the clone would cost 0.5-1% of the valuation of the legacy SaaS company. Now it might be 0.1%. It doesn't make a difference -- if you can pull it off profitably today you could also have done it profitably in the past. The code is a very small part of the process of making such a clone successful, and the reason legacy software has often bad UX is not because code was expensive to write.