

Aurko Roy

@aurko79
Math & computer science | @AIatMeta (2025-2025) | @GoogleDeepmind (2023-2025) | @GoogleAI (Brain) (2017-2023) | CS PhD @Georgiatech | CS @IITKanpur




Michael Rabin passed away. Sic transit :-( en.wikipedia.org/wiki/Michael_O…




🚀 Scaling embeddings, not just experts: introducing a new path for efficient LLMs.
Key finding: in high-sparsity scenarios, N-gram embeddings yield a better Pareto frontier than simply adding more MoE experts. Building on this insight, we introduce LongCat-Flash-Lite, the first open-source model built this way.
⚙️ 68.5B total params (37.13B non-embedding) | 2.9B–4.5B active
📊 High performance: SWE-Bench 54.4 | τ²-Bench 72.8 | TerminalBench 33.75
📃 256K context window (YaRN-powered)
✨ Optimized for agentic/coding work, strong in general reasoning
⚡ ~700 tokens/s peak inference speed
The result: competitive performance at its scale, at significantly lower cost and latency.
Hugging Face: huggingface.co/meituan-longca…
Tech Report: huggingface.co/meituan-longca…
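For readers unfamiliar with the "scaling embeddings, not experts" idea, here is a minimal, hypothetical sketch of a hashed n-gram embedding (bigram case): extra capacity lives in a large lookup table, and each token only touches a handful of rows, so parameters grow while per-token compute barely does. The class name, bucket count, and hashing scheme below are illustrative assumptions, not LongCat-Flash-Lite's actual design (see the tech report for that).

# Hypothetical sketch: add a hashed bigram embedding on top of the usual
# token embedding. Sizes and the hash are made up for illustration only.
import torch
import torch.nn as nn

class HashedBigramEmbedding(nn.Module):
    def __init__(self, vocab_size=32_000, d_model=512, num_buckets=1 << 16):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.ngram_emb = nn.Embedding(num_buckets, d_model)  # sparse extra capacity
        self.num_buckets = num_buckets

    def forward(self, ids: torch.LongTensor) -> torch.Tensor:
        # ids: (batch, seq_len) token ids
        prev = torch.roll(ids, shifts=1, dims=1)
        prev[:, 0] = 0  # no left context at the first position
        # Hash each (previous, current) token pair into a fixed bucket id.
        bucket = (prev * 1_000_003 + ids) % self.num_buckets
        return self.tok_emb(ids) + self.ngram_emb(bucket)

emb = HashedBigramEmbedding()
x = torch.randint(0, 32_000, (2, 16))
print(emb(x).shape)  # torch.Size([2, 16, 512])

Only the looked-up rows of the n-gram table receive gradients and activations, which is the sense in which embedding scaling is "high sparsity" compared with adding dense or expert FFN parameters.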

We are thrilled to announce a strategic partnership with Google! Google is also making a financial investment in Sakana AI to strengthen this collaboration. This underscores their recognition of our technical depth and our mission to advance AI in Japan.

We are combining Google’s world-class products with our agile R&D to tackle complex challenges. By leveraging models like Gemini and Gemma, we will accelerate our breakthroughs in automated scientific discovery. Our work on The AI Scientist and ALE-Agent has already demonstrated the power of these models. Now we are going further.

We are scaling our deployment of reliable AI in mission-critical sectors. We are working with financial institutions and government organizations to deliver solutions that meet the highest standards of security and data sovereignty. We are excited to drive the widespread adoption of reliable AI and advance Japan’s AI ecosystem together!








In the docstring, Noam simply wrote: "Noam just made this up. Replacement for Zero++ gradient compression" and it ended up unblocking a large-scale run on an almost impossible cluster topology.



Actually, I was lucky to meet Grisha as a teen, way before he became famous. I had an internship not far from a park where he used to walk, and he went for a walk every day, same route, same time. He was quite famous in the neighborhood because of his nonstandard look (by Slavic standards). When I asked why he declined, his answer was that it didn’t feel fair: he was building his proof on top of the work of other great mathematicians, none of whom got any medals, and without them he wouldn’t have solved it.

Today, we’re excited to introduce Rnj-1, @essential_ai's first open model: a world-class 8B base + instruct pair, built with scientific rigor, intentional design, and a belief that the advancement and equitable distribution of AI depend on building in the open. We bring American open-source on par with the best in the world.

We are beyond thrilled to share our first flagship models, the Rnj-1 base and instruct 8B-parameter models. Rnj-1 is the culmination of 10 months of hard work by a phenomenal team dedicated to advancing American SOTA OSS AI. Lots of wins with Rnj-1:
1. SWE-bench performance close to GPT-4o.
2. Tool use outperforming all comparable open-source models.
3. Mathematical reasoning (AIME’25) nearly on par with GPT OSS MoE 20B.
…




Introducing Claude Opus 4.5: the best model in the world for coding, agents, and computer use. Opus 4.5 is a step forward in what AI systems can do, and a preview of larger changes to how work gets done.