jeremy

99 posts

@jercarin

member of the technical staff @ mit csail

Cambridge · Joined February 2025
528 Following · 26 Followers
jeremy@jercarin·
@willdepue ant does this significantly better than OAI, should copy their approach. Claude web has “full” network access through a proxy, whereas GPT can only access approved package managers. No reason why GPT environment needs to be so firewalled, it’s a smart guy now
will depue@willdepue·
yes local full-access coding agent was always the right interface, but it’s worth noting that the reason that took off so hard is how fucking unbelievably bad code interpreter was executed. tried today: cant download packages, dies and wipes itself, errors. massive unforced error
jeremy@jercarin·
@badlogicgames I mean this with no malice but genuinely who uses kilo code?
Dennis Kacz@Suolar_·
@jercarin @0ranguchad @sama @scaling01 This is always killer marketing, and every company does it. Anthropic did the same thing with 4.7 before it released. I'm almost certain this is planned, and if it was accidental I doubt they're worried about it
Lisan al Gaib@scaling01·
it's really the dumbest fucking thing I've seen from Anthropic
you know how much I love them but this is borderline suicidal
they could've just said: "here's Haiku and Sonnet 5, and btw Pro subs no longer get access to Opus and only low thinking effort"
but removing claude code entirely is such an idiotic move when everything you are known for is coding, especially in the same week we likely get Spud/GPT-5.5 and potentially DeepSeek-V4
they are begging you poor shits to unsubscribe and to either pay up or get lost, so that they can allocate that juicy compute to higher-margin customers
Lisan al Gaib@scaling01

Anthropic removed Claude Code from the Pro plan. I'm obviously going to cancel my subscription if I lose access to Claude Code. Mythos was actually the top of the Anthropic hype cycle

Jack@0ranguchad·
@jercarin @sama @Suolar_ @scaling01 I mean at this point people at oAI are definitely aware, I guarantee you SamA isn’t finding out about this through a twitter reply lmao. Models are already gone. It’s not like alerting them sooner would change the engineer’s fate anyway.
jeremy@jercarin·
@yifan_zhang_ so are people like sure sure that gpt 5.5 == spud and not distilled spud
jeremy@jercarin·
@sama @Suolar_ @scaling01 hello mr sam altman are u aware gpt 5.5 and oai-2.1 are both listed in the model options right now. i feel like that was an accident
jeremy@jercarin·
@letmutex When I ask GPT 5.5 which model it is, it says "I'm GPT 5". when I ask the model listed as oai-2.1, it says "I'm GPT 5.5". idk.
jeremy@jercarin·
@letmutex I just searched latest on twitter to see if anyone else noticed this. You're the first person I've seen with it (me too!)
letmutex@letmutex·
Wait what? I got GPT 5.5?
jeremy@jercarin·
@badlogicgames @thsottiaux It is perplexing to imagine why someone would sit and pick between low/medium/high when the direct messaging of ant is that you should basically always use xhigh lol
Mario Zechner@badlogicgames·
@thsottiaux well, ant outdid you with:
- max
- anything below high is now useless, but can still be configured
Mario Zechner@badlogicgames·
how many more thinking levels do we need? i really wonder what everyone is smoking at the model labs.
jeremy@jercarin·
@bcherny please scrape my ssh config à la vscode :)
jeremy@jercarin·
@kalomaze may be right. in any case we will slowly climb back to the heights of GPT-2
kalomaze (is at iclr)@kalomaze·
@jercarin i distinctly recall violent use of the em-dash as being a 4o and beyond phenomenon
jeremy@jercarin·
@kalomaze From my memory the em-dash was a gpt-3.5 artifact, no? And got turned up to 11 by data contamination/cursed rlhf in gpt-4
kalomaze (is at iclr)@kalomaze·
echoes of that one deprecated gemini ckpt, and early pre-sycophancy 4o checkpoints (yes, the early ones that introduced the em dash pre-RLHF makeover, and also seemed to really really like curly quotes beyond all logical justification)
jeremy@jercarin·
@AlexPalcuie fwiw, sonnet 4.6 is similarly quite funny (more than opus). we have it in household group chat and it provides great content
palcu@AlexPalcuie·
oh and one more thing about mythos preview -- genuinely good company in our slack
jeremy@jercarin·
@stalkermustang @jukan05 My impression (based on some Epoch AI reporting, I think, that I can't find right now) is that all of these models have been midtraining + RL on top of the gpt-4o base. Happy to be proven wrong, but given all the news about these new pretrains coming out I think it's likely true.
Igor Kotenkov@stalkermustang·
@jercarin @jukan05 I'm sure there was a new pretrain base model after original GPT-5. Not sure where exactly though, 5.2 or 5.3 or 5.4.
Jukan@jukan05·
AI lab folks, when the hell is the Blackwell-trained model finally dropping? Doesn’t look like it’s Gemini, and people are going crazy saying Claude Mythos is performing ridiculously well. Was that trained on Blackwell?
Igor Kotenkov@stalkermustang·
@jukan05 weren't GPT-5.4 / 5.3-Codex pretrained on blackwell?
jeremy@jercarin·
@AndrewCurran_ Are people sure that Gemini 3 wasn't a larger pretrain than 4.5? I had the impression that it was a larger or comparable pretrain with a very weak/bad posttrain, just from vibes. in other words, I have no idea :)
Andrew Curran@AndrewCurran_·
From the post:
Krishna Kaasyap@krishnakaasyap

From QT:

//But if Anthropic found that training above a certain scale, or in a certain way at that scale, produces capabilities that sit far above the prior trendline, then that is an architectural breakthrough.//

I believe this is the case, not just because an architectural and algorithmic breakthrough at this scale cannot be achieved in isolation, but also because, even if it were, it would soon leak via employee turnover, corporate espionage, or many other means. The moat of a frontier lab lies in enormously scaling an advancement, or simply in scaling a Transformer++ arch. I don't think any of the frontier labs would purely bet on an architectural or algorithmic breakthrough (one that could be easily replicated, the way CoT reasoning/thinking was replicated by almost everyone) to stay at the frontier!

In addition to this business logic, research from @EpochAIResearch supports the same conclusion. From @ansonwhho's research:

//For example, @MITFutureTech found that shifting from LSTMs (green) to Modern Transformers (purple) has an efficiency gain that depends on the compute scale:
- At 1e15 FLOP, the gain is 6.3×
- At 3e16 FLOP, the gain is 26×
Naively extrapolating to 1e23 FLOP, the gain is 20,000×!//

If Anthropic found that training above a certain scale... produces capabilities that sit far above the prior trendline... they would definitely attempt it, as it can be done by only two other labs in the world. This is especially relevant given that those two labs have their tentacles in everything from adult-content slop to search engine and browser wars, thinning the compute they have available for a final training run of a single model. Source - epoch.ai/gradient-updat…

Since past final training runs have typically accounted for <30% of total R&D compute, a significant amount of compute remains unused for these runs. It is possible that the compute allocated to final training runs has now been increased substantially.

The largest final training run known to humanity occurred in late 2024 for GPT-4.5, which OpenAI officially released on February 27, 2025. Not a single GB200 NVL72 was available at that time. However, by early 2026, we have access to thousands of GB200 and GB300 NVL72 racks, along with more diversified compute from AMD, Google (TPUs), AWS (Trainium), and many other providers.

All available evidence and reasonable inferences suggest that the observed step-change improvements are large primarily because Ant scaled final-training-run compute significantly, rather than due to a multitude of new innovations. Source - epoch.ai/gradient-updat…

Total @EpochAIResearch victory - @datagenproc @cherylwoooo @Jsevillamol and the team!
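The "naive extrapolation" quoted above can be reproduced with a simple two-point power-law fit. A minimal sketch (the 6.3× and 26× data points are the quoted MIT FutureTech figures; the fitting method and the extrapolated value are my own arithmetic, which lands at the same order of magnitude as the quoted 20,000×):

```python
import math

# Efficiency gain of Modern Transformers over LSTMs at two compute scales
# (figures quoted from the MIT FutureTech numbers cited in the tweet above).
c1, g1 = 1e15, 6.3   # FLOP, gain
c2, g2 = 3e16, 26.0

# Assume gain(C) = a * C^b and fit the power law through the two points.
b = math.log(g2 / g1) / math.log(c2 / c1)
a = g1 / c1**b

# Naively extrapolate to a frontier-scale training run.
gain_1e23 = a * 1e23**b
print(f"exponent b = {b:.3f}, extrapolated gain at 1e23 FLOP ~ {gain_1e23:,.0f}x")
```

This two-point fit gives roughly a 14,000× gain at 1e23 FLOP; the quoted 20,000× presumably comes from a fit over more data points, but the order of magnitude is the same.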

jeremy@jercarin·
@dbreunig tbf I'm not sure this is a fully novel change for people deep in something technical. for example coming from OS/systems, I say "context switch overhead" or "race condition" way too much when describing things. it just so happens the technical thing here acts a lot like a human
Drew Breunig@dbreunig·
One takeaway from the recent Andreessen interview, and something I'm seeing more of lately, is that people _deep_ in AI have started using the LLM as a metaphor for reasoning about how brains and intelligence work. "You're a 15 second sliding context window." Here we see it again: "Kepler was a high temperature LLM."

I'm not sure what to take from this, other than that many in this field/culture have crossed the line from projecting our theories about intelligence onto AI architecture designs, to projecting AI architecture designs onto ourselves. Meanwhile, most people have no clue how an LLM even replies, let alone what a context window is.
Dwarkesh Patel@dwarkesh_sp

The Terence Tao episode.

We begin with the absolutely ingenious and surprising way in which Kepler discovered the laws of planetary motion. People sometimes say that AI will make especially fast progress at scientific discovery because of tight verification loops. But the story of how we discovered the shape of our solar system shows how the verification loop for correct ideas can be decades (or even millennia) long. During this time, what we know today as the better theory can often actually make worse predictions (Copernicus's model of circular orbits around the sun was actually less accurate than Ptolemy's geocentric model). And the reason it survives this epistemic hell is some mixture of judgment and heuristics that we don't even understand well enough to actually articulate, much less codify into an RL loop.

Hope you enjoy!

0:00:00 – Kepler was a high temperature LLM
0:11:44 – How would we know if there's a new unifying concept within heaps of AI slop?
0:26:10 – The deductive overhang
0:30:31 – Selection bias in reported AI discoveries
0:46:43 – AI makes papers richer and broader, but not deeper
0:53:00 – If AI solves a problem, can humans get understanding out of it?
0:59:20 – We need a semi-formal language for the way that scientists actually talk to each other
1:09:48 – How Terry uses his time
1:17:05 – Human-AI hybrids will dominate math for a lot longer

Look up Dwarkesh Podcast on YouTube, Apple Podcasts, or Spotify.

Lydia Hallie ✨@lydiahallie·
if your skill depends on dynamic content, you can embed !`command` in your SKILL.md to inject shell output directly into the prompt. Claude Code runs it when the skill is invoked and swaps the placeholder inline; the model only sees the result!
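For illustration, a minimal hypothetical SKILL.md using this syntax (the skill name, description, and git commands are my own example, not from the tweet):

```markdown
---
name: repo-status
description: Summarize the current state of the git repository
---

# Repo status

Current branch: !`git branch --show-current`

Last five commits:
!`git log --oneline -5`
```

When the skill is invoked, each !`command` placeholder is replaced by that command's output before the prompt reaches the model.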