jeremy

99 posts

@jercarin

member of the technical staff @ mit csail

Cambridge · Joined February 2025
528 Following · 26 Followers
jeremy@jercarin·
@willdepue ant does this significantly better than OAI, should copy their approach. Claude web has “full” network access through a proxy, whereas GPT can only access approved package managers. No reason why GPT environment needs to be so firewalled, it’s a smart guy now
will depue@willdepue·
yes local full-access coding agent was always the right interface, but it’s worth noting that the reason that took off so hard is how fucking unbelievably bad code interpreter was executed. tried today: cant download packages, dies and wipes itself, errors. massive unforced error
jeremy@jercarin·
@badlogicgames I mean this with no malice but genuinely who uses kilo code?
Dennis Kacz@Suolar_·
@jercarin @0ranguchad @sama @scaling01 This is always killer marketing, and every company does it. Anthropic did the same thing with 4.7 before it released. I'm almost certain this is planned, and if it was accidental I doubt they're worried about it
Lisan al Gaib@scaling01·
it's really the dumbest fucking thing I've seen from Anthropic
you know how much I love them but this is borderline suicidal
they could've just said: "here's Haiku and Sonnet 5, and btw Pro subs no longer get access to Opus and only low thinking effort"
but removing claude code entirely is such an idiotic move when everything you are known for is coding, especially in the same week we likely get Spud/GPT-5.5 and potentially DeepSeek-V4
they are begging you poor shits to unsubscribe and to either pay up or get lost, so that they can allocate that juicy compute to higher-margin customers
Lisan al Gaib@scaling01

Anthropic removed Claude Code from the Pro plan. I'm obviously going to cancel my subscription if I lose access to Claude Code. Mythos was actually the top of the Anthropic hype cycle

Jack@0ranguchad·
@jercarin @sama @Suolar_ @scaling01 I mean at this point people at oAI are definitely aware, I guarantee you SamA isn’t finding out about this through a twitter reply lmao. Models are already gone. It’s not like alerting them sooner would change the engineer’s fate anyway.
jeremy@jercarin·
@yifan_zhang_ so are people like sure sure that gpt 5.5 == spud and not distilled spud
jeremy@jercarin·
@sama @Suolar_ @scaling01 hello mr sam altman are u aware gpt 5.5 and oai-2.1 are both listed in the model options right now. i feel like that was an accident
jeremy@jercarin·
@letmutex When I ask GPT 5.5 which model it is, it says "I'm GPT 5". when I ask the model listed as oai-2.1, it says "I'm GPT 5.5". idk.
jeremy@jercarin·
@letmutex I just searched latest on twitter to see if anyone else noticed this. You're the first person I've seen with it (me too!)
letmutex@letmutex·
Wait what? I got GPT 5.5?
jeremy@jercarin·
@badlogicgames @thsottiaux It is perplexing to imagine why someone would sit and pick between low/medium/high when the direct messaging of ant is that you should basically always use xhigh lol
Mario Zechner@badlogicgames·
@thsottiaux well, ant outdid you with:
- max
- anything below high is now useless, but can still be configured
Mario Zechner@badlogicgames·
how many more thinking levels do we need? i really wonder what everyone is smoking at the model labs.
jeremy@jercarin·
@bcherny please scrape my ssh config à la vscode :)
jeremy@jercarin·
@kalomaze may be right. in any case we will slowly climb back to the heights of GPT-2
kalomaze (is at iclr)@kalomaze·
@jercarin i distinctly recall violent use of the em-dash as being a 4o and beyond phenomenon
jeremy@jercarin·
@kalomaze From my memory the em-dash was a gpt-3.5 artifact, no? And got turned up to 11 by data contamination/cursed rlhf in gpt-4
kalomaze (is at iclr)@kalomaze·
echoes of that one deprecated gemini ckpt, and early pre-sycophancy 4o checkpoints (yes, the early ones that introduced the em dash pre-RLHF makeover, and also seemed to really really like curly quotes beyond all logical justification)
jeremy@jercarin·
@AlexPalcuie fwiw, sonnet 4.6 is similarly quite funny (more than opus). we have it in household group chat and it provides great content
palcu@AlexPalcuie·
oh and one more thing about mythos preview -- genuinely good company in our slack
jeremy@jercarin·
@stalkermustang @jukan05 My impression (based on some Epoch AI reporting, I think, that I can't find right now) is that all of these models have been midtraining + RL on top of the gpt-4o base. Happy to be proven wrong, but given all the news about these new pretrains coming out I think it's likely true.
Igor Kotenkov@stalkermustang·
@jercarin @jukan05 I'm sure there was a new pretrain base model after original GPT-5. Not sure where exactly though, 5.2 or 5.3 or 5.4.
Jukan@jukan05·
AI lab folks, when the hell is the Blackwell-trained model finally dropping? Doesn’t look like it’s Gemini, and people are going crazy saying Claude Mythos is performing ridiculously well. Was that trained on Blackwell?
Igor Kotenkov@stalkermustang·
@jukan05 weren't GPT-5.4 / 5.3-Codex pretrained on blackwell?
jeremy@jercarin·
@AndrewCurran_ Are people sure that Gemini 3 wasn't a larger pretrain than 4.5? I had the impression that it was a larger or comparable pretrain with a very weak/bad posttrain, just from vibes. in other words, I have no idea :)
Andrew Curran@AndrewCurran_·
From the post:
Krishna Kaasyap@krishnakaasyap

From QT:

//But if Anthropic found that training above a certain scale, or in a certain way at that scale, produces capabilities that sit far above the prior trendline, then that is an architectural breakthrough.//

I believe this is the case, not just because an architectural and algorithmic breakthrough at this scale cannot be achieved in isolation, but also because, even if it were, it would soon leak via employee turnover, corporate espionage, or many other means. The moat of a frontier lab lies in enormously scaling an advancement, or simply in scaling a Transformer++ arch. I don't think any of the frontier labs would purely bet on an architectural or algorithmic breakthrough (one that could be easily replicated, the way CoT reasoning/thinking was replicated by almost everyone) to stay at the frontier!

In addition to this business logic, research from @EpochAIResearch supports the same conclusion. From @ansonwhho's research:

//For example, @MITFutureTech found that shifting from LSTMs (green) to Modern Transformers (purple) has an efficiency gain that depends on the compute scale:
- At 1e15 FLOP, the gain is 6.3×
- At 3e16 FLOP, the gain is 26×
Naively extrapolating to 1e23 FLOP, the gain is 20,000×!//

If Anthropic found that training above a certain scale... produces capabilities that sit far above the prior trendline... they would definitely attempt it, as it can be done by only two other labs in the world. This is especially relevant given that those two labs have their tentacles in everything from adult-content slop to search engine and browser wars, thinning the compute they have available for a final training run of a single model. Source - epoch.ai/gradient-updat…

Since past final training runs have typically accounted for <30% of total R&D compute, a significant amount of compute remains unused for these runs. It is possible that the compute allocated to final training runs has now been increased substantially.

The largest final training run known to humanity occurred in late 2024 for GPT-4.5, which OpenAI officially released on February 27, 2025. Not a single GB200 NVL72 was available at that time. However, by early 2026, we have access to thousands of GB200 and GB300 NVL72 racks, along with more diversified compute from AMD, Google (TPUs), AWS (Trainium), and many other providers.

All available evidence and reasonable inferences suggest that the observed step-change improvements are large primarily because Ant scaled final-training-run compute significantly, rather than due to a multitude of new innovations. Source - epoch.ai/gradient-updat…

Total @EpochAIResearch victory - @datagenproc @cherylwoooo @Jsevillamol and the team!
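The "naive extrapolation" quoted above can be reproduced with a simple two-point power-law fit. A minimal sketch (the 6.3× and 26× data points are the quoted MIT FutureTech figures; the fitting method and the extrapolated value are my own arithmetic, which lands at the same order of magnitude as the quoted 20,000×):

```python
import math

# Efficiency gain of Modern Transformers over LSTMs at two compute scales
# (figures quoted from the MIT FutureTech numbers cited in the tweet above).
c1, g1 = 1e15, 6.3   # FLOP, gain
c2, g2 = 3e16, 26.0

# Assume gain(C) = a * C^b and fit the power law through the two points.
b = math.log(g2 / g1) / math.log(c2 / c1)
a = g1 / c1**b

# Naively extrapolate to a frontier-scale training run.
gain_1e23 = a * 1e23**b
print(f"exponent b = {b:.3f}, extrapolated gain at 1e23 FLOP ~ {gain_1e23:,.0f}x")
```

This two-point fit gives roughly a 14,000× gain at 1e23 FLOP; the quoted 20,000× presumably comes from a fit over more data points, but the order of magnitude is the same.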

jeremy@jercarin·
@dbreunig tbf I'm not sure this is a fully novel change for people deep in something technical. for example coming from OS/systems, I say "context switch overhead" or "race condition" way too much when describing things. it just so happens the technical thing here acts a lot like a human
Drew Breunig@dbreunig·
One takeaway from the recent Andreessen interview, and something I'm seeing more of lately, is that people _deep_ in AI have started using the LLM as a metaphor for reasoning about how brains and intelligence work. "You're a 15 second sliding context window." Here we see it again: "Kepler was a high temperature LLM."

I'm not sure what to take from this, other than that many in this field/culture have crossed the line from projecting our theories about intelligence onto AI architecture designs, to projecting AI architecture designs onto ourselves. Meanwhile, most people have no clue how an LLM even replies, let alone what a context window is.
Dwarkesh Patel@dwarkesh_sp

The Terence Tao episode.

We begin with the absolutely ingenious and surprising way in which Kepler discovered the laws of planetary motion. People sometimes say that AI will make especially fast progress at scientific discovery because of tight verification loops. But the story of how we discovered the shape of our solar system shows how the verification loop for correct ideas can be decades (or even millennia) long. During this time, what we know today as the better theory can often actually make worse predictions (Copernicus's model of circular orbits around the sun was actually less accurate than Ptolemy's geocentric model). And the reason it survives this epistemic hell is some mixture of judgment and heuristics that we don't even understand well enough to actually articulate, much less codify into an RL loop.

Hope you enjoy!

0:00:00 – Kepler was a high temperature LLM
0:11:44 – How would we know if there's a new unifying concept within heaps of AI slop?
0:26:10 – The deductive overhang
0:30:31 – Selection bias in reported AI discoveries
0:46:43 – AI makes papers richer and broader, but not deeper
0:53:00 – If AI solves a problem, can humans get understanding out of it?
0:59:20 – We need a semi-formal language for the way that scientists actually talk to each other
1:09:48 – How Terry uses his time
1:17:05 – Human-AI hybrids will dominate math for a lot longer

Look up Dwarkesh Podcast on YouTube, Apple Podcasts, or Spotify.

Lydia Hallie ✨@lydiahallie·
if your skill depends on dynamic content, you can embed !`command` in your SKILL.md to inject shell output directly into the prompt. Claude Code runs it when the skill is invoked and swaps the placeholder inline; the model only sees the result!
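For illustration, a minimal hypothetical SKILL.md using this syntax (the skill name, description, and git commands are my own example, not from the tweet):

```markdown
---
name: repo-status
description: Summarize the current state of the git repository
---

# Repo status

Current branch: !`git branch --show-current`

Last five commits:
!`git log --oneline -5`
```

When the skill is invoked, each !`command` placeholder is replaced by that command's output before the prompt reaches the model.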