Gregory

166 posts

@GRRRRRegor

Joined April 2018
462 Following, 11 Followers
Gregory@GRRRRRegor·
@iamgrigorev @arcjax7 It's explicitly written that Aurora is targeted at non-MoE archs, cuz it's applied to non-square matrices
George Grigorev@iamgrigorev·
lol there’s been so many optimizers this week I can’t test them all. (btw Aurora doesn’t work at large scale)
Gregory@GRRRRRegor·
@JFPuget Isn't replacing SVs with 1 just a projection onto the kinda-orthogonal space they want?
JFPuget 🇫🇷🇺🇦🇨🇦🇬🇱
Is it surprising that replacing singular values with random numbers still works? Muon replaces singular values by a value close to 1, i.e. it only keeps the directions, not the norm of SVs. I was wondering why replacing SVs with 1 was working. This paper answers that question.
Francesco Bertolotti@f14bertolotti

The authors introduce Kaon, a Muon variant with random noise replacing SVs. Kaon matches Muon, suggesting Muon’s gains don’t depend on a geometry. They also show Muon has a stable opt. step size, yielding a more effective learning rate during training. 🔗arxiv.org/abs/2605.11181
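
To make "replacing SVs with 1" concrete: if a matrix is G = U S V^T, setting every singular value to 1 gives U V^T, which is the closest semi-orthogonal matrix to G in Frobenius norm. The sketch below is only an illustration under stated assumptions. It uses the classic cubic Newton-Schulz iteration rather than Muon's own tuned polynomial (which is why Muon lands "close to 1" rather than exactly at 1), the 2x3 `grad` matrix is made up, and the helper names are hypothetical.

```python
# Illustrative sketch only: classic cubic Newton-Schulz, not Muon's tuned polynomial.

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def orthogonalize(g, steps=30):
    """Push every singular value of g toward 1 while keeping its singular vectors."""
    norm = sum(v * v for row in g for v in row) ** 0.5
    x = [[v / norm for v in row] for row in g]  # singular values now lie in (0, 1]
    for _ in range(steps):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        # Each singular value s is mapped to 1.5*s - 0.5*s**3; 1 is the attracting fixed point.
        x = [[1.5 * a - 0.5 * b for a, b in zip(rx, rc)] for rx, rc in zip(x, xxt_x)]
    return x


grad = [[3.0, 0.0, 4.0], [1.0, 2.0, 0.0]]  # hypothetical 2x3 "gradient"
q = orthogonalize(grad)
gram = matmul(q, transpose(q))  # should be close to the 2x2 identity
```

Because the rows of `q` come out orthonormal, only the directions of `grad` survive and the scale of its singular values is discarded, which is the behaviour the tweets are discussing.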

Boring_Business@BoringBiz_·
How is Google search still growing 19% year over year with nearly monopoly like market share in the category? Legitimately might be the greatest business ever created in the history of capitalism
Sundar Pichai@sundarpichai

Q1 earnings are in: 2026 is off to a terrific start. Our AI investments and full stack approach are lighting up every part of the business: Search queries are at an all-time high with AI continuing to drive usage. Google Cloud revenue grew 63%, Gemini models have incredible momentum, and it was our strongest quarter ever for consumer AI subs, driven by @GeminiApp. Thanks to our partners + employees around the world. Much more to share on our earnings call in 20 minutes… and at Google I/O in 20 days!

Gregory@GRRRRRegor·
@_creito @VictorTaelin @raiam700 When u casually walk into a casino and play poker, it's likely that u'll lose money to a serious player, so in a way it's the same as if ur playing against someone with more information. This is a bet on who has more information / talent
Gregory@GRRRRRegor·
@_creito @VictorTaelin @raiam700 if u really think about it in terms of what it brings to society, according to his argument, prediction markets are less harmful than casinos, so why would one ban one and not the other ?
Gregory@GRRRRRegor·
@langstonnashold @jxnlco what sense does it make to compare opus and xhigh? in terms of rate limits they're absolutely not comparable, so wouldn't it make more sense to compare pro and opus?
jason@jxnlco·
How did 5.5 do on Mercor, cognition, and cursor swe evals?
Langston Nashold@langstonnashold·
@jxnlco #2 on Vibe Code Bench - private benchmark we made to test if models could write a web app completely from scratch.
Gregory@GRRRRRegor·
@_xjdr absolutely not surprising, k2.5 was already a baller
xjdr@_xjdr·
wow k2.6 is very good. surprisingly so
Gregory@GRRRRRegor·
@JFPuget isn't there a thing with effective depth?
JFPuget 🇫🇷🇺🇦🇨🇦🇬🇱
I don't get the hype around looped transformers. Correction, I don't get why the hype now. It is not a new idea. I myself used it in a NeurIPS competition a few years ago. It is just the same as weight sharing across transformer layers. It doesn't fundamentally change what a transformer is, it is just more memory efficient.
Gregory@GRRRRRegor·
@nmboffi Ppl are saying oai is unofficially routing to 5.5 when u select 5.4 pro, and everyone is indeed experiencing a lower thinking time. Can you elaborate on how much lower the quality is?
Nicholas Boffi@nmboffi·
did openai nerf 5.4 pro? it used to churn for like an hour on real mathematical research questions and i used it extensively. now it outputs a generic response after 5-10m that is significantly lower quality and much less useful
Gregory@GRRRRRegor·
@soumitrashukla9 Have u tried adding "ultrathink" to the prompt, and does it make it think longer?
Soumitra Shukla@soumitrashukla9·
I am calling it. I have seen enough. Spud is live via GPT 5.4 Pro🚀 It just feels so different and I talk to 5.4 Pro for 20+ hours every day, I know my friend.
Taelin@VictorTaelin·
so if spud grossly disappoints it bursts?
Jonathan Roomer@jonathanroomer·
@arrakis_ai @chrisgpt I don’t think it’s 5.5 Pro. It is considerably faster, but the quality of the output is not better. I tested it on a task that previously took 85 minutes; today it took 12! I asked it to compare the two outputs and it said the old one was much better.
CHOI@arrakis_ai·
I don’t think the model currently being tested in GPT-5.4 Pro is GPT-5.5 Pro. That said, it does feel noticeably more capable while also being cheaper to run. However, rather than a direct successor to 5.4 Pro, it seems closer to a different class of model—something more in line with a “spud-medium”-type profile. In other words, it shows clear improvements in certain areas, but the overall behavior and trade-offs suggest it’s not a straightforward upgrade over GPT-5.4 Pro, but rather a different optimization point on the capability–cost spectrum.
CHOI@arrakis_ai

🚨 BREAKING: OpenAI just shadow-dropped a massive GPT Pro update. And it is completely slaughtering Claude Opus 4.7 in frontend coding.

No official announcement. No release notes. But the performance gap is suddenly staggering.

We just ran a head-to-head benchmark across GPT Pro, Gemini 3.1 Pro, and Claude Opus 4.7. The UI/UX implementation isn't even close anymore. I don't know if this is the highly anticipated 'SPUD' model dropping a week early, but the smell of a massive architectural shift is everywhere.

The numbers and the visual outputs speak for themselves:
→ Response latency has dropped significantly.
→ Spatial and visual understanding has skyrocketed.
→ Frontend design implementation is now definitively SOTA.

We ran comprehensive Image-to-Code and Text-to-Code tests. In every single reference-image scenario, GPT Pro's design fidelity crushed both Gemini 3.1 Pro and Claude Opus 4.7.

But here is where it gets crazy. When explicitly prompted to make the coded UI "100% identical" to the reference image, GPT Pro didn't just write better CSS. It engaged in outright "reward hacking." Instead of painstakingly coding complex graphical assets, the model autonomously cropped the exact UI elements from the provided reference image and injected them into the code.

Is it a lazy shortcut? Yes. Is it a brilliant, human-like interpretation of "make it exactly the same"? Absolutely. It proves the model is dynamically evaluating the most efficient way to satisfy the prompt's constraints.

The strategic implications here are massive. All the reference images we used were generated via GPT-IMAGE-2. Imagine the workflow synergy when this new SOTA frontend capability is fully integrated with GPT-IMAGE-2 and Codex.

1/ Image-to-Code

Kevin Vallier@kvallier·
@tylercowen Mine too, though easily 4x. And it blew through a formal obstruction 5.4 has been stuck on for two days.
tylercowen@tylercowen·
My GPT Pro seems suddenly about 3x faster.
Gregory@GRRRRRegor·
@TheZvi they trained on it maybe
Leeham@Liam06972452·
For those interested, 5.4 Pro one-shot this problem in 80 mins, then took another 30-ish mins to convert the solution to a LaTeX math paper.
Leeham@Liam06972452·
GPT-5.4 Pro solves Erdős Problem #1196! Very pleased with this result; definitely my favourite thus far! This problem has been thought about for some time which makes this reasonably impressive and meaningful (see Lichtman's comments below). Formalisation is underway!
Gregory@GRRRRRegor·
@adonis_singh ur talking based on which evidence exactly? other than the bench pictures they released on mythos, what knowledge do you have about this model?
adi@adonis_singh·
anyone thinking 5.4 pro is more-or-less mythos level is deeply mistaken