Aurko Roy

107 posts

Aurko Roy
@aurko79

Math & computer science | @AIatMeta (2025) | @GoogleDeepmind (2023-2025) | @GoogleAI (Brain) (2017-2023) | CS PhD @Georgiatech | CS @IITKanpur

San Francisco · Joined February 2025
289 Following · 2.1K Followers

Pinned Tweet
Aurko Roy @aurko79 ·
Excited to share what I worked on during my time at Meta.
- We introduce a Triton-accelerated Transformer with *2-simplicial attention*, a tri-linear generalization of dot-product attention
- We show how to adapt RoPE to tri-linear forms
- We show 2-simplicial attention scales better under token constraints than dot-product attention
It was fun collaborating with amazing folks including @dvsaisurya @_arohan_ and others
[image]
26 replies · 94 reposts · 811 likes · 147.2K views
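The tri-linear form in the pinned tweet can be sketched naively as follows. This is a minimal NumPy illustration, not the paper's implementation (the actual work uses a tiled Triton kernel): the joint softmax over key pairs, the scaling, and the elementwise combination of two value streams are all assumptions for this sketch, and every name here is made up.

```python
import numpy as np

def two_simplicial_attention(Q, K1, K2, V1, V2):
    """Naive 2-simplicial attention over key pairs.

    Q, K1, K2, V1, V2: arrays of shape (n, d).
    Logit A[i, j, k] = sum_d Q[i, d] * K1[j, d] * K2[k, d] is a
    trilinear generalization of the dot product; softmax runs jointly
    over all (j, k) pairs, and the two value streams are combined
    elementwise before aggregation.
    """
    n, d = Q.shape
    # One scalar logit per (query, key-pair) triple.
    A = np.einsum("id,jd,kd->ijk", Q, K1, K2) / d ** 0.5
    A = A.reshape(n, -1)
    P = np.exp(A - A.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)  # softmax over all n*n key pairs
    P = P.reshape(n, n, n)
    # Aggregate the elementwise product of the paired value vectors.
    return np.einsum("ijk,jd,kd->id", P, V1, V2)

rng = np.random.default_rng(0)
Q, K1, K2, V1, V2 = (rng.standard_normal((4, 8)) for _ in range(5))
out = two_simplicial_attention(Q, K1, K2, V1, V2)
print(out.shape)  # (4, 8)
```

Note the cost: the logit tensor is O(n^3) versus O(n^2) for dot-product attention, which is why the Triton kernel and windowing in the actual work matter.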
Aurko Roy reposted
Sebastian Pokutta @spokutta ·
For a decade it was open whether Frank-Wolfe's O(1/√ε) rate on strongly convex sets is tight. We show it is: Ω(1/√ε), even for a simple quadratic on a unit ball. With J. Halbey, D. Deza, @maxzimmerberlin, @chrisrx13, @b_stellato. 1/2
1 reply · 9 reposts · 84 likes · 8.1K views
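For readers unfamiliar with the setting of Pokutta's result: a minimal sketch of vanilla Frank-Wolfe minimizing a simple quadratic over the unit ball (a strongly convex set). The objective, step-size schedule, and iteration count below are illustrative assumptions, not the paper's lower-bound construction.

```python
import numpy as np

def frank_wolfe_ball(grad_f, x0, steps=200):
    """Frank-Wolfe on the unit L2 ball.

    The linear minimization oracle over {x : ||x|| <= 1} is simply the
    negative normalized gradient; the step size is the standard 2/(t+2).
    """
    x = x0.copy()
    for t in range(steps):
        g = grad_f(x)
        s = -g / (np.linalg.norm(g) + 1e-12)  # LMO: argmin_{||s||<=1} <g, s>
        gamma = 2.0 / (t + 2.0)
        x = (1 - gamma) * x + gamma * s       # convex combination stays feasible
    return x

# Quadratic f(x) = 0.5 * ||x - b||^2 whose constrained minimizer
# lies on the boundary of the ball.
b = np.array([2.0, 0.0])
x_star = frank_wolfe_ball(lambda x: x - b, np.zeros(2))
print(x_star)  # close to [1, 0]
```

The tweet's point is about worst-case rates on such sets: O(1/√ε) iterations suffice, and their result shows Ω(1/√ε) are sometimes necessary, even for a quadratic on a ball like the one above.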
rohan anil @_arohan_ ·
At least several code reds in Menlo Park, Seattle, and Mountain View?
5 replies · 0 reposts · 148 likes · 11.8K views
Aurko Roy reposted
Microsoft AI @MicrosoftAI ·
Meet MAI‑Image‑2. Built with creatives, for real creative work. Ranked #5 on @arena’s text‑to‑image leaderboard. Available now: msft.it/6014QUCBe
61 replies · 121 reposts · 840 likes · 114.2K views
Aurko Roy @aurko79 ·
@_arohan_ Yonghui was Claude coding before LLMs were even a thing.
1 reply · 0 reposts · 37 likes · 8.9K views
rohan anil @_arohan_ ·
@aurko79 Aurko “Schmidhuber” Roy 🫡
1 reply · 0 reposts · 18 likes · 1.6K views
Aurko Roy reposted
Alex Cui @alexcdot ·
Okay so, we just found that over 50 papers published at @Neurips 2025 have AI hallucinations. I don't think people realize how bad the slop is right now. It's not just that researchers from @GoogleDeepMind, @Meta, @MIT, @Cambridge_Uni are using AI - they allowed LLMs to generate hallucinations in their papers and didn't notice at all. It's insane that these made it through peer review 👇
[image]
280 replies · 1.4K reposts · 6.3K likes · 996.5K views
Aurko Roy @aurko79 ·
@HarvardMath I really enjoyed watching his abstract algebra videos in undergrad. RIP.
0 replies · 0 reposts · 10 likes · 904 views
Harvard Department of Mathematics
It is with heavy hearts that we say farewell to Professor Emeritus Benedict Gross, who passed away in December 2025. We collected stories, anecdotes, and recollections about Gross’ impact on the lives of his friends, colleagues, and former students. math.harvard.edu/in-memory-of-p…
15 replies · 75 reposts · 475 likes · 53.5K views
rohan anil @_arohan_ ·
@SeunghyunSEO7 Adafactor and its rsqrt schedule probably set back many projects from T5 and PaLM-1, as they were all undertrained.
3 replies · 4 reposts · 78 likes · 6.6K views
Seunghyun Seo @SeunghyunSEO7 ·
When most people ask, “is this proven?”, CHADs just do it. Noam created the Transformer to push parallelism for scale, and already knew Adam’s optimizer state would become a bottleneck. So he created Adafactor, a rank-1 approximation of Adam, and casually used it to train a 540B LLM in 2022.
[image]

Simon Mo @simon_mo_
In the docstring, Noam simply wrote: "Noam just made this up. Replacement for Zero++ gradient compression" and it ended up unblocking a large-scale run on an almost impossible cluster topology.

4 replies · 20 reposts · 314 likes · 37.5K views
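The rank-1 approximation Seo mentions can be sketched in a few lines. This is a simplified single step under stated assumptions: Adafactor's update clipping, relative step sizes, and the rsqrt schedule criticized upthread are all omitted, and the variable names are illustrative, not Adafactor's actual API.

```python
import numpy as np

def adafactor_step(W, G, R, C, lr=1e-2, beta2=0.999, eps=1e-30):
    """One simplified Adafactor-style update for a matrix parameter.

    Adam would store a full second-moment matrix V with the same shape
    as W. Adafactor instead keeps only row statistics R (length n) and
    column statistics C (length m), then reconstructs the rank-1
    approximation V ≈ outer(R, C) / sum(R) -- O(n + m) memory
    instead of O(n * m).
    """
    R[:] = beta2 * R + (1 - beta2) * (G * G + eps).sum(axis=1)  # row sums of G^2
    C[:] = beta2 * C + (1 - beta2) * (G * G + eps).sum(axis=0)  # column sums of G^2
    V_hat = np.outer(R, C) / R.sum()    # rank-1 reconstruction of E[G^2]
    W -= lr * G / np.sqrt(V_hat)        # Adam-style preconditioned step
    return W

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))
G = rng.standard_normal((4, 3))
R = np.ones(4)
C = np.ones(3)
W = adafactor_step(W, G, R, C)
print(W.shape)  # (4, 3)
```

The memory saving is the whole point at 540B parameters: factored statistics cost n + m floats per weight matrix rather than n * m.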
Aurko Roy reposted
Ashish Vaswani @ashVaswani ·
Rnj-1-Instruct is now the #1 trending text generation model on HF!
[image]
22 replies · 33 reposts · 405 likes · 59.2K views
Aurko Roy reposted
rohan anil @_arohan_ ·
Math startups beefing with each other on my timeline over who was first at RLing math for competitions. Math is beautiful enough that history will eventually correct who did what: Zehfuss showed the factorization we now call the Kronecker product, and consider the contributions of Cardano and his predecessors en.wikipedia.org/wiki/Cubic_equ… I feel the prize of discovery in math is beyond competition math, and everyone will look silly beefing over undergraduate math.
1 reply · 1 repost · 31 likes · 2.6K views
Aurko Roy reposted
Ashish Vaswani @ashVaswani ·
We are beyond thrilled to share our first flagship models, Rnj-1 base and instruct 8B parameter models. Rnj-1 is the culmination of 10 months of hard work by a phenomenal team, dedicated to advancing American SOTA OSS AI. Lots of wins with Rnj-1:
1. SWE-bench performance close to GPT-4o.
2. Tool use outperforming all comparable open-source models.
3. Mathematical reasoning (AIME’25) nearly at par with GPT-OSS MoE 20B.
….

Essential AI @essential_ai
Today, we’re excited to introduce Rnj-1, @essential_ai's first open model; a world-class 8B base + instruct pair, built with scientific rigor, intentional design, and a belief that the advancement and equitable distribution of AI depend on building in the open. We bring American open-source at par with the best in the world.

104 replies · 169 reposts · 1.8K likes · 605.7K views
Aurko Roy reposted
Lisan al Gaib @scaling01 ·
I believe the future of LLMs is in bidirectional objectives and more complex attention mechanisms. The reason: we are data constrained, compute keeps growing, and the data constraints also put a limit on the maximum efficiently usable parameter count.*
- Bidirectional objectives help with the data constraints and make use of more FLOPs during training.
- More complex attention mechanisms make use of extra FLOPs during inference and change the slope of the scaling laws.
From the "Diffusion Language Models are Super Data Learners" blog on why diffusion language models perform well in data-constrained environments: MLM-U is the most interesting approach I have seen so far for bidirectional training, but autoregressive decoding. It also "exposes the model to a diverse distribution of token orderings". Now pair this with something like 2-simplicial attention. jinjieni.notion.site/Diffusion-Lang… arxiv.org/pdf/2406.05183 arxiv.org/abs/2507.02754
*A 100T-param LLM doesn't make much sense if you only have 50-100T tokens; plus economics - you centralize too much.
[image] [image]
4 replies · 3 reposts · 32 likes · 11.4K views