Gregory

166 posts

@GRRRRRegor

Joined April 2018

462 Following, 11 Followers

Gregory@GRRRRRegor·14 May

@iamgrigorev @arcjax7 It's explicitly written that Aurora is targeted at non-MoE archs cz it's applied to non-square matrices

George Grigorev@iamgrigorev·14 May

@arcjax7 10B+ moe

George Grigorev@iamgrigorev·14 May

lol there’s been so many optimizers this week I can’t test them all. (btw Aurora doesn’t work at large scale)

Gregory@GRRRRRegor·13 May

@JFPuget Isn't replacing SVs by 1 just a projection onto the kinda orthogonal space they want?

JFPuget 🇫🇷🇺🇦🇨🇦🇬🇱@JFPuget·13 May

Is it surprising that replacing singular values with random numbers still works? Muon replaces singular values by a value close to 1, i.e. it only keeps the directions, not the norm of SVs. I was wondering why replacing SVs with 1 was working. This paper answers that question.

Francesco Bertolotti@f14bertolotti

The authors introduce Kaon, a Muon variant with random noise replacing SVs. Kaon matches Muon, suggesting Muon’s gains don’t depend on the geometry. They also show Muon has a stable opt. step size, yielding a more effective learning rate during training. 🔗arxiv.org/abs/2605.11181
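
To make "replacing singular values by a value close to 1" concrete, here is a minimal sketch, not Muon's actual code. It assumes the classic cubic Newton-Schulz iteration X ← 1.5·X − 0.5·X·Xᵀ·X, which pushes every singular value of a normalized matrix toward 1 while leaving the singular directions untouched. Muon itself is usually described as using a tuned higher-order variant of this iteration. The helper names (`matmul`, `transpose`, `orthogonalize`) and the example matrix are made up for illustration.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def orthogonalize(g, steps=20):
    """Push all singular values of g toward 1 (assumed cubic Newton-Schulz)."""
    # Dividing by the Frobenius norm puts every singular value in (0, 1],
    # which is inside the convergence region of the iteration.
    fro = sum(v * v for row in g for v in row) ** 0.5
    x = [[v / fro for v in row] for row in g]
    for _ in range(steps):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        # Each singular value s is mapped to 1.5*s - 0.5*s**3.
        x = [[1.5 * a - 0.5 * b for a, b in zip(r1, r2)]
             for r1, r2 in zip(x, xxt_x)]
    return x


# Hypothetical 2x3 "gradient" (non-square, full row rank).
g = [[3.0, 1.0, 0.0], [1.0, 2.0, 1.0]]
q = orthogonalize(g)

# If both singular values are now 1, q @ q.T is the 2x2 identity.
gram = matmul(q, transpose(q))
print(gram)
```

The result keeps only the directions of the original matrix, which is the "projection" reading raised in the reply above. A Kaon-style variant would presumably substitute noisy values for the 1s, but its exact recipe is not given in this thread, so it is not sketched here.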

Gregory@GRRRRRegor·4 May

@littlegoodjack Looks like hugging face

小克 🌤@littlegoodjack·2 May

If GitHub were a Taiwanese company

くるくるコーラ🥤@Shiro_Shihi

This is what it would look like if a Japanese company had built GitHub

Gregory@GRRRRRegor·2 May

@haridigresses @longgege_god @itsmnjn Even for an ambitious task that requires understanding many thousands of lines of code in a repo?

hari raghavan@haridigresses·2 May

@longgege_god @itsmnjn Yes, in my experience.

hari raghavan@haridigresses·2 May

GPT-5.5 Low (specifically Low) is the best coding model I’ve ever used, and it’s not particularly close.

OpenAI@OpenAI

One week since the launch of GPT-5.5, and it’s already our strongest model launch yet. API revenue is growing more than 2x faster than any prior release, while Codex doubled revenue in under seven days as enterprise demand for agentic coding tools keeps climbing.

Gregory@GRRRRRegor·30 Apr

@PE_Associate @giffmana @BoringBiz_ I guess it's that when u use an LLM, the LLM uses web search through Google.

PE Associate@PE_Associate·30 Apr

@giffmana @BoringBiz_ It isn’t really obvious to me. Grateful if you can explain your view

Boring_Business@BoringBiz_·29 Apr

How is Google search still growing 19% year over year with nearly monopoly like market share in the category? Legitimately might be the greatest business ever created in the history of capitalism

Sundar Pichai@sundarpichai

Q1 earnings are in: 2026 is off to a terrific start. Our AI investments and full stack approach are lighting up every part of the business: Search queries are at an all-time high with AI continuing to drive usage. Google Cloud revenue grew 63%, Gemini models have incredible momentum, and it was our strongest quarter ever for consumer AI subs, driven by @GeminiApp. Thanks to our partners + employees around the world. Much more to share on our earnings call in 20 minutes… and at Google I/O in 20 days!

Gregory@GRRRRRegor·25 Apr

@_creito @VictorTaelin @raiam700 When u casually arrive in a casino and play poker, it's likely that u'll lose money to a serious player, so in a way it's the same as if ur playing against someone with more information. This is a bet on who has more information / talent

Gregory@GRRRRRegor·25 Apr

@_creito @VictorTaelin @raiam700 if u really think about it in terms of what it brings to society, according to his argument, prediction markets are less harmful than casinos, so why would one ban one and not the other?

Raiam Santos McArn@raiam700·24 Apr

The Lula government got it right again. Prediction markets are 100x more demonic than betting houses

E. Cavendish@ducavendish

And the BCB, which decided to block "prediction markets" in Brazil just like that. @Polymarket and @Kalshi blocked. SURREAL.

Gregory@GRRRRRegor·24 Apr

@langstonnashold @jxnlco what sense does it make to compare opus and xhigh? in terms of rate limit it's absolutely not comparable, so wouldn't it make more sense to compare pro and opus?

Langston Nashold@langstonnashold·24 Apr

@GRRRRRegor @jxnlco Thinking was xhigh. The base model, not pro.

jason@jxnlco·23 Apr

How did 5.5 do on Mercor, cognition, and cursor swe evals?

Gregory@GRRRRRegor·24 Apr

@langstonnashold @jxnlco Is it pro version or thinking xhigh? Thx

Langston Nashold@langstonnashold·24 Apr

@jxnlco #2 on Vibe Code Bench - private benchmark we made to test if models could write a web app completely from scratch.

Gregory@GRRRRRegor·21 Apr

@_xjdr absolutely not surprising, k2.5 was already a baller

xjdr@_xjdr·20 Apr

wow k2.6 is very good. surprisingly so

Gregory@GRRRRRegor·20 Apr

@loveofdoing @RealTjDunham can you check ur method isn't obviously flawed before bragging

loveofdoing@loveofdoing·25 Mar

@RealTjDunham just trying different stuff

loveofdoing@loveofdoing·25 Mar

316 ARC-AGI tasks solved with zero learning. No neural net, no training, no DSL — just 19th-century projective geometry. Encode grid cell relationships as Plücker lines in P³, find transversals via Schubert calculus, score candidates by geometric incidence. 95% solve rate on the eval set (of non-timeout tasks). Single C file, runs in seconds.

Beff (e/acc)@beffjezos

The masculine urge to try to hack a new solution to ARC-AGI benchmarks

Gregory@GRRRRRegor·20 Apr

@JFPuget isn't there a thing with effective depth?

JFPuget 🇫🇷🇺🇦🇨🇦🇬🇱@JFPuget·20 Apr

I don't get the hype around looped transformers. Correction, I don't get why the hype now. It is not a new idea. I myself used it in a NeurIPS competition a few years ago. It is just the same as weight sharing across transformer layers. It doesn't fundamentally change what a transformer is, it is just more memory efficient.
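
A toy sketch of the "looped = weight sharing" point, under loud assumptions: the "layer" below is a made-up residual tanh map standing in for a transformer block, and `make_layer`, `apply_layer`, `param_count` and `forward` are hypothetical helpers, not any library's API. It only illustrates that looping one layer `depth` times gives the same computational depth with `1/depth` of the parameters.

```python
import math
import random


def make_layer(dim, rng):
    """Hypothetical toy layer: a dim x dim weight matrix plus a bias vector."""
    w = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
    b = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
    return w, b


def apply_layer(layer, x):
    """Residual update x + tanh(W x + b), standing in for a transformer block."""
    w, b = layer
    return [xi + math.tanh(sum(wij * xj for wij, xj in zip(row, x)) + bi)
            for xi, row, bi in zip(x, w, b)]


def param_count(layers):
    """Count parameters, counting a layer object shared across depth only once."""
    unique = {id(layer): layer for layer in layers}
    return sum(len(w) * len(w[0]) + len(b) for w, b in unique.values())


def forward(layers, x):
    for layer in layers:
        x = apply_layer(layer, x)
    return x


rng = random.Random(0)
dim, depth = 4, 6
stacked = [make_layer(dim, rng) for _ in range(depth)]  # depth distinct layers
looped = [stacked[0]] * depth                           # one layer applied depth times

x = [1.0, -0.5, 0.25, 0.0]
out_stacked = forward(stacked, x)
out_looped = forward(looped, x)
print(param_count(stacked), param_count(looped))  # prints 120 20
```

Both models apply six sequential updates, so the number of sequential steps is the same; only the parameter storage differs.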

Gregory@GRRRRRegor·20 Apr

@nmboffi Ppl are saying oai is unofficially branching 5.5 when u select 5.4 pro, and everyone is indeed experiencing a lower thinking time. Can you elaborate on how much lower the quality is?

Nicholas Boffi@nmboffi·19 Apr

did openai nerf 5.4 pro? it used to churn for like an hour on real mathematical research questions and i used it extensively. now it outputs a generic response after 5-10m that is significantly lower quality and much less useful

Gregory@GRRRRRegor·20 Apr

@soumitrashukla9 Have u tried adding « ultrathink » in the prompt, and does it make it think longer?

Soumitra Shukla@soumitrashukla9·20 Apr

I am calling it. I have seen enough. Spud is live via GPT 5.4 Pro🚀 It just feels so different and I talk to 5.4 Pro for 20+ hours everyday, I know my friend.

Gregory@GRRRRRegor·20 Apr

@VictorTaelin Does it disappoint ?

Taelin@VictorTaelin·20 Apr

so if spud grossly disappoints it bursts?

Gregory@GRRRRRegor·20 Apr

@jonathanroomer @arrakis_ai @chrisgpt what was the task? can you give a few details? thx

Jonathan Roomer@jonathanroomer·19 Apr

@arrakis_ai @chrisgpt I don’t think it’s 5.5 Pro. It is considerably faster, but the quality of the output is not better. I tested on a task that previously took 85 minutes; today it took 12! I asked it to compare the two outputs and it said the old one was much better.

CHOI@arrakis_ai·19 Apr

I don’t think the model currently being tested in GPT-5.4 Pro is GPT-5.5 Pro. That said, it does feel noticeably more capable while also being cheaper to run. However, rather than a direct successor to 5.4 Pro, it seems closer to a different class of model—something more in line with a “spud-medium”-type profile. In other words, it shows clear improvements in certain areas, but the overall behavior and trade-offs suggest it’s not a straightforward upgrade over GPT-5.4 Pro, but rather a different optimization point on the capability–cost spectrum.

CHOI@arrakis_ai

🚨 BREAKING: OpenAI just shadow-dropped a massive GPT Pro update. And it is completely slaughtering Claude Opus 4.7 in frontend coding.
No official announcement. No release notes. But the performance gap is suddenly staggering.
We just ran a head-to-head benchmark across GPT Pro, Gemini 3.1 Pro, and Claude Opus 4.7. The UI/UX implementation isn't even close anymore.
I don't know if this is the highly anticipated 'SPUD' model dropping a week early, but the smell of a massive architectural shift is everywhere.
The numbers and the visual outputs speak for themselves:
→ Response latency has dropped significantly.
→ Spatial and visual understanding has skyrocketed.
→ Frontend design implementation is now definitively SOTA.
We ran comprehensive Image-to-Code and Text-to-Code tests. In every single reference-image scenario, GPT Pro's design fidelity crushed both Gemini 3.1 Pro and Claude Opus 4.7.
But here is where it gets crazy. When explicitly prompted to make the coded UI "100% identical" to the reference image, GPT Pro didn't just write better CSS. It engaged in outright "reward hacking." Instead of painstakingly coding complex graphical assets, the model autonomously cropped the exact UI elements from the provided reference image and injected them into the code.
Is it a lazy shortcut? Yes. Is it a brilliant, human-like interpretation of "make it exactly the same"? Absolutely. It proves the model is dynamically evaluating the most efficient way to satisfy the prompt's constraints.
The strategic implications here are massive. All the reference images we used were generated via GPT-IMAGE-2. Imagine the workflow synergy when this new SOTA frontend capability is fully integrated with GPT-IMAGE-2 and Codex.
1/ Image-to-Code

Gregory@GRRRRRegor·20 Apr

@kvallier @tylercowen both pro versions ?

Kevin Vallier@kvallier·20 Apr

@tylercowen Mine too, thought easily 4x. And it blew through a formal obstruction 5.4 has been stuck on for two days.

tylercowen@tylercowen·20 Apr

My GPT Pro seems suddenly about 3x faster.

Gregory@GRRRRRegor·16 Apr

@TheZvi they trained on it maybe

Zvi Mowshowitz@TheZvi·16 Apr

Oh.

Gregory@GRRRRRegor·14 Apr

@Liam06972452 could you share the prompt ?

Leeham@Liam06972452·14 Apr

For those interested, 5.4 Pro one-shot this problem in 80 mins, then another 30 ish mins to convert the solution to a latex math paper.

Leeham@Liam06972452·14 Apr

GPT-5.4 Pro solves Erdős Problem #1196! Very pleased with this result; definitely my favourite thus far! This problem has been thought about for some time which makes this reasonably impressive and meaningful (see Lichtman's comments below). Formalisation is underway!

Gregory@GRRRRRegor·14 Apr

@adonis_singh ur talking based on which evidence exactly? other than the bench pictures they released on mythos, what knowledge do you have about this model?

adi@adonis_singh·13 Apr

anyone thinking 5.4 pro is more-or-less mythos level is deeply mistaken
