Raghav

15 posts

@rgvbansal

NYC · Joined July 2025
462 Following · 18 Followers

Raghav (@rgvbansal)
don't know what's in their AI safety filters specifically. if the argument is that all the generated images are permissible, agreed, the conversation is uninteresting. there's a more interesting one to be had if you think this is an example of safety training/model "alignment" techniques being fairly non-robust

fdf (@0xfdf)
@rgvbansal @natolambert Why would they train it not to generate a photo of a Donnie Darko-esque bunny suit or a person lying on a floor?

Raghav (@rgvbansal)
@0xfdf @natolambert Sure, but the model was presumably safety pre/mid/post-trained not to generate the first set of photos, and was told it's OK to generate examples similar to yours.

fdf (@0xfdf)
@natolambert The prompt is priming the model with "strange" and "apologize for the content." This is not an AI safety example; the model is dutifully making up polarizing content, as it was asked to do. Here is what I get with slightly different priming.
[image]

Raghav retweeted
demitria (@wannabfisherman)
I don't think enough people realize we're about five years out from people going to rehab for phones

Raghav retweeted
Dimitris Papailiopoulos (@DimitrisPapail)
Math AI is roughly where coding was before CLI agents: single-turn and mostly ungrounded, without a dense feedback loop. The best math prover we have today is GPT-5.5 Pro, which for the most part writes single-turn natural-language proofs, without a real reactive environment, grounding, or real multi-turn correction. That is very much the opposite of the setting CLI agents like Codex or Claude Code operate in.

In current top math AI models, you generate and then verify after the fact. Terminal agents work so well because the terminal grounds them after every turn and lets them self-correct as they go. Each step gets verified on the way to the solution, and this also helps during both training and test time! The bash terminal offers so much signal (literally thousands of tokens) during training and inference alike. That kind of reactive, very verbose environment is exactly why Claude Code and Codex have taken off, and they are the closest an LLM has come to being an embodied agent.

My conjecture is that math needs the equivalent: a reactive environment, a "file system", and a "math terminal" that builds pieces of the proof as you go, verifies them, and lets the model backtrack and redo steps without keeping the entire proof/process in its context. When a real agentic math model is trained by experience inside that kind of environment, my conjecture is that it'll be a phase transition, given how strong GPT-5.5 and Gemini 3.1 Pro already are in ungrounded, single-turn settings.

Quoting Kimon Fountoulakis (@kfountou):

Verification of math currently takes multiple days, if it succeeds at all. It needs to drop to O(generation time).


Raghav (@rgvbansal)
Would love an easy way to shove my ChatGPT context into a Codex thread inside the Codex app, and vice versa.

Raghav retweeted
Nick Levine (@status_effects)
New work with @AlecRad and @DavidDuvenaud: Have you ever dreamed of talking to someone from the past? Introducing talkie, a 13B model trained only on pre-1931 text. Vintage models should help us to understand how LMs generalize (e.g., can we teach talkie to code?). Thread:

Raghav retweeted
Chase Brower (@ChaseBrowe32432)
I painstakingly ran all 20 EsoLang-Bench hard problems through Claude webui. It solved 20/20 (100%). No specialized scaffolding, no expert prompting, no few-shot examples, it just solves them natively. This benchmark just suffocated the models with constrictive scaffolding.

Quoting Lossfunk (@lossfunk):

🚨 Shocking: Frontier LLMs score 85-95% on standard coding benchmarks. We gave them equivalent problems in languages they couldn't have memorized. They collapsed to 0-11%. Presenting EsoLang-Bench. Accepted to the Logical Reasoning and ICBINB workshops at ICLR 2026 🧵


Raghav (@rgvbansal)
feels like many people are very bullish on AI but continue to work on research and products with a half-life of a couple of months. if you believe in the trend, it seems like a better bet to swing big

Raghav retweeted
Bojan Tunguz (@tunguz)
99% of the economy are the edge cases.

Raghav retweeted
Nathan Lambert (@natolambert)
My current AI stack (ranked by usage):
1. GPT 5.2 Thinking/Pro: primary driver, search, information, synthesis, planning.
2. Claude Opus 4.5: feedback, basic debugging, data viz.
3. Gemini 3: multimodal, queries.
4. Grok 4: X search.

This has changed a lot since last summer, when I was almost exclusively using GPT 5 Thinking (previously o3). Diversity points to a more exciting ecosystem. If you want to feel the frontier of AI, you need to be using multiple models for their best strengths and learning to hand off between them. In my latest post, I explain what this means about the models and how they're evolving (the jagged frontier).
[image]

Raghav retweeted
lala (@zxlava)
i'm generally bullish on the cohort of people who turned 26 in 2022 (plus or minus a few years). young enough to be ai native, senior enough to have solid judgement, and still had a normal prefrontal cortex. also got through middle and most of high school without their brightest minds getting cooked by algo slop feeds

Quoting delian (@zebulgar):

just read this in an investor update: "older engineers who graduated from college pre-GPT are actually the best-suited for our purposes. They have fundamental programming ability that's lost amongst most of the current-gen." the AI-induced thinking/skills decay has begun. wild
