Saurabh Dash

546 posts

@TheyCallMeMr_

Bottomless pit supervisor. ML @CohereAI , PhD Student @GeorgiaTech. Previously @Apple, @IITkgp. https://t.co/yZLkUsiZ7P. Opinions expressed are my own

Atlanta, GA, United States · Joined September 2016
692 Following · 620 Followers
Saurabh Dash
Saurabh Dash@TheyCallMeMr_·
@F1 bring back the 3-digit precision in race interval visuals, you cowards
Saurabh Dash retweeted
Dwarak
Dwarak@DwaraknathG·
All this and more! We go over our design decisions and the lessons learned along the way! Please do come hang out if you are at GTC. A huge thank you to all the team members for their hard work! nvidia.com/gtc/session-ca…
Saurabh Dash retweeted
Dwarak
Dwarak@DwaraknathG·
Hey all, I will be at GTC next week talking about all the work my team and I did on large-scale MoE training in JAX on GPUs! We decided early on to have a fully dropless training stack to avoid token dropping. (1/7)
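For context, the dropless idea in toy form: instead of giving each expert a fixed capacity and dropping overflow tokens, every token is processed by its assigned expert, however many land there. Below is a minimal Python sketch of top-1 dropless routing; it is illustrative only, not the team's actual JAX stack, and all function and variable names are mine.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def dropless_moe_top1(tokens, router_logits, expert_weights):
    """Toy dropless top-1 MoE layer.

    No fixed expert capacity: each expert processes exactly the tokens
    routed to it, however many that is, so nothing is ever dropped.
    tokens: (n, d); router_logits: (n, num_experts);
    expert_weights: (num_experts, d, d).
    """
    n = tokens.shape[0]
    expert_ids = router_logits.argmax(axis=-1)               # top-1 routing choice
    gates = softmax(router_logits)[np.arange(n), expert_ids] # gate for chosen expert
    out = np.zeros_like(tokens)
    for e in range(expert_weights.shape[0]):
        idx = np.flatnonzero(expert_ids == e)                # variable-size group
        if idx.size:
            out[idx] = tokens[idx] @ expert_weights[e]       # no capacity cutoff
    return out * gates[:, None]
```

In a real dropless stack the tokens for each expert are typically made contiguous (e.g., by sorting on expert id) so each expert's work becomes one dense matmul; handling those variable-sized groups efficiently on GPU is presumably where much of the engineering discussed in the talk lives.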
Saurabh Dash
Saurabh Dash@TheyCallMeMr_·
My ICML review pile in a nutshell: "This is the best thing invented since sliced bread. Look at how good it is on CIFAR-10."
Awni Hannun
Awni Hannun@awnihannun·
Today is my last day at Apple. Building MLX with our amazing team and community has been an absolute pleasure. It's still early days for AI on Apple silicon. Apple makes the best consumer hardware on the planet. There's so much potential for it to be the leading platform for AI. And I'm confident MLX will continue to have a big role in that. To the future: MLX remains in the exceptionally capable hands of our team including @angeloskath, @zcbenz, @DiganiJagrit, @NasFilippova, @trebolloc (and others not on X). Follow them or @shshnkp for future updates.
Saurabh Dash retweeted
Matthew Leavitt
Matthew Leavitt@leavittron·
@RicardoMonti9 @KaleighMentzer @agcrnz I'm also quite pleased that @Cohere_Labs released Tiny Aya the day before we released ÜberWeb, and we were able to evaluate it and include it in our report. The whole Aya project has been a big inspiration for us. I officially declare it Multilingual Release Week!!
Saurabh Dash retweeted
Sebastian Raschka
Sebastian Raschka@rasbt·
Tiny Aya reimplementation from scratch! Have been reading through the technical reports of the recent wave of open-weight LLM releases (more on that soon). Tiny Aya (2 days ago) was a bit under the radar. Looks like a nice, small 3.35B model with the strongest multilingual support in its size class. Great for on-device translation tasks. Just did a from-scratch implementation here: github.com/rasbt/LLMs-fro…

Architecture-wise, Tiny Aya is a classic decoder-style transformer with a few noteworthy modifications (besides the obvious ones like SwiGLU and Grouped Query Attention):

1. Parallel transformer blocks. A parallel transformer block computes attention and the MLP from the same normalized input, then adds both to the residual in one step. I assume this is to reduce serial dependencies inside a layer and improve computational throughput.

2. Sliding window attention. Specifically, it uses a 3:1 local:global ratio similar to Arcee Trinity and Olmo 3. The window size is also 4096. Also, similar to Arcee, the sliding-window layers use RoPE whereas the full-attention layers use NoPE.

3. LayerNorm. Most architectures moved to RMSNorm as it is computationally a bit cheaper and performs well. Tiny Aya keeps it more classic with a modified version of LayerNorm (the implementation here is standard LayerNorm but without the shift, i.e. bias, parameter).
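To make the first and third points concrete, here is a minimal PyTorch sketch of a parallel block with a bias-free LayerNorm and a SwiGLU MLP. Class names and hyperparameters are mine, not the repo's, and the sliding-window/NoPE details (and the causal mask) are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LayerNormNoShift(nn.Module):
    """LayerNorm with a learned scale but no shift (bias) parameter."""
    def __init__(self, dim, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.scale = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        mu = x.mean(dim=-1, keepdim=True)
        var = x.var(dim=-1, keepdim=True, unbiased=False)
        return (x - mu) / torch.sqrt(var + self.eps) * self.scale

class SwiGLU(nn.Module):
    """Gated MLP: silu(x W_gate) * (x W_up), projected back down."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden, bias=False)
        self.w_up = nn.Linear(dim, hidden, bias=False)
        self.w_down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

class ParallelBlock(nn.Module):
    """Attention and MLP both read the same normalized input; their
    outputs are added to the residual in one step, rather than the
    usual sequential attn -> MLP layout."""
    def __init__(self, dim, num_heads):
        super().__init__()
        self.norm = LayerNormNoShift(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.mlp = SwiGLU(dim, 4 * dim)

    def forward(self, x):                       # x: (batch, seq, dim)
        h = self.norm(x)                        # one norm feeds both paths
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        return x + attn_out + self.mlp(h)       # single residual add
```

The point of the layout is that the attention and MLP matmuls have no data dependency on each other within a layer, so they can be scheduled or fused more freely than in the sequential design.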
Saurabh Dash retweeted
Asghar Ghorbani
Asghar Ghorbani@ghorbani_asghar·
@Cohere_Labs Tiny Aya is the true multilingual on-device model we've been waiting for. 70+ global languages in a 3.35B parameter model 🤯
Saurabh Dash retweeted
Cohere Labs
Cohere Labs@Cohere_Labs·
Mobile access unlocks the real potential of open models. Thanks to @pocketpal_ai and especially @ghorbani_asghar for helping us bring Tiny Aya to mobile! 📱 Their expertise made it possible to deliver the most capable multilingual model at this scale directly to people's phones.
Saurabh Dash retweeted
Cohere Labs
Cohere Labs@Cohere_Labs·
Introducing ✨Tiny Aya✨, a family of massively multilingual small language models built to run where people actually are. Tiny Aya delivers strong multilingual performance in 70+ global languages in a 3.35B parameter model, efficient enough to run locally, even on a phone.
aakanksha
aakanksha@____aakanksha·
@sarahookr Bukhara for North Indian, Carnatic Cafe for South Indian, and chaat + rasmalai from Haldiram's! Friends rave about Gulati's for butter chicken etc.; oh, and the tender coconut ice cream at Natural's 🤤 (should also try Indo-Chinese / momos in Delhi - maybe at Berco's!) cc @mziizm :)
Sara Hooker
Sara Hooker@sarahookr·
Ok. It is time. I have time on Saturday and Sunday night to explore New Delhi ahead of the summit. Let’s go Delhi food recommendations. 🔥
Saurabh Dash
Saurabh Dash@TheyCallMeMr_·
@sarahookr Definitely try Bukhara! Especially the kebabs and the Dal Bukhara.
Saurabh Dash
Saurabh Dash@TheyCallMeMr_·
Calling it compute rich/poor and not Jensen’s Inequality is a missed opportunity
Xuezhe Ma (Max)
Xuezhe Ma (Max)@MaxMa1987·
After about 2 years, we are proud to release Gecko, an efficient architecture that improves upon Megalodon, with the capability of efficiently and inherently processing sequences of unlimited context length.

One of the most important ideas in Gecko is Adaptive Working Memory (AWM), implemented using a linear attention mechanism with a position-aware online softmax activation. Notably, AWM globally compresses information into memory rather than discarding historical information through forgetting.

In a controlled head-to-head comparison with Llama2 and Megalodon, Gecko achieves better performance at the 7B-parameter, 2T-training-token scale: Gecko reaches a training loss of 1.68, vs. 1.67 for Llama2-13B, with half the parameter count.

Paper: arxiv.org/abs/2601.06463
Code: github.com/XuezheMax/geck…
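The tweet doesn't spell out the exact AWM update, but here is a minimal Python sketch of the standard causal linear-attention recurrence it appears to build on, with an online-softmax-style running max for numerical stability. The position-aware part is beyond this sketch, and all names are mine, not the paper's.

```python
import numpy as np

def linear_attention_memory(queries, keys, values):
    """Causal linear attention as a recurrent memory update.

    Each step folds (key, value) into a fixed-size state S instead of
    storing the full history, so memory cost is O(d_k * d_v) regardless
    of sequence length: history is compressed, never dropped. A running
    per-dimension max over keys keeps the exp() feature map stable,
    in the spirit of online softmax.
    queries, keys: (T, d_k); values: (T, d_v).
    """
    T, d_k = queries.shape
    d_v = values.shape[1]
    S = np.zeros((d_k, d_v))          # compressed "working memory"
    z = np.zeros(d_k)                 # running normalizer
    m = np.full(d_k, -np.inf)         # running max for stability
    outputs = np.zeros((T, d_v))
    for t in range(T):
        q, k, v = queries[t], keys[t], values[t]
        m_new = np.maximum(m, k)
        scale = np.exp(m - m_new)     # rescale old state to the new max
        S = S * scale[:, None] + np.exp(k - m_new)[:, None] * v[None, :]
        z = z * scale + np.exp(k - m_new)
        m = m_new
        phi_q = np.exp(q - m)         # positive feature map for the query
        outputs[t] = (phi_q @ S) / (phi_q @ z + 1e-9)
    return outputs
```

The contrast the tweet draws is with forgetting-based recurrences (gated decay of old state): here nothing in S is decayed away, so old tokens remain retrievable as long as the fixed-size state can represent them.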
Sara Hooker
Sara Hooker@sarahookr·
I'm really looking forward to also getting to know the AI ecosystem in Delhi (for the summit) + Bangalore. If you have recommendations of initiatives I should visit or people I should meet while I'm there, send them my way.
Sara Hooker
Sara Hooker@sarahookr·
My first trip to India is next month. I'm honored to be attending the India-AI Impact Summit 2026. Truly very meaningful given our commitment @adaptionlabs to building global technology and ensuring language coverage.