Daniel D'souza 

1.7K posts

Daniel D'souza  banner
Daniel D'souza 

Daniel D'souza 

@mrdanieldsouza

Research Engineer @Cohere_Labs💙 | @UMichECE Alum 〽️ | 🇮🇳✖️🇺🇸 💫"The Universe Works in Mysterious Ways"💫

Ann Arbor, MI · Joined November 2016
1K Following · 991 Followers
Pinned Tweet
Daniel D'souza 
Daniel D'souza @mrdanieldsouza·
“Best Paper Award” @ ACL 2024 🪄What an incredible culmination of perseverance to connect and represent languages around the 🗺️! 🪄 🤗 Huge thanks to the @aclmeeting committee for recognizing the massive effort behind Project Aya @CohereForAI 💙 #ACL2024
Daniel D'souza  tweet media
Ahmet Üstün@ahmetustun89

I'm incredibly proud that Aya received the #ACL2024 Best Paper Award 🥹. Huge congratulations to the Aya team and the @CohereForAI community, who made this possible by extending the frontiers of LLMs to multilingual settings and building the Aya Model and Aya Dataset 🌿🌏

English
4
9
51
9K
Daniel D'souza retweeted
MichiganAI
MichiganAI@michigan_AI·
We’re excited to announce our next #AI Seminar! Marzieh Fadaee @mziizm (Head of @Cohere_Labs) will join us virtually for a talk on "Data Strategies for Language Models: Insights from Multilingual Research:" 📅 Thursday, April 9 | 1:30 pm ET 🔗cse.engin.umich.edu/event/data-str…
MichiganAI tweet media
English
0
5
19
1.4K
Daniel D'souza retweeted
Ammar Khairi
Ammar Khairi@ammar__khairi·
So exciting to see our work 𝑭𝒖𝒔𝒊𝒐𝑵 : Making, not Taking the Best-of-N in action @OpenRouter 🔥 I will be presenting this work from @Cohere_Labs at @iclr_conf 🇧🇷 in 2 weeks, come and find us if you are there !
OpenRouter@OpenRouter

New public experiment: Model Fusion Use multiple models, analyze outputs, and fuse the results for a response that every Deep Research agent preferred to its own, in our testing. No subscription needed at all.

English
3
9
16
3.4K
Daniel D'souza 
Daniel D'souza @mrdanieldsouza·
🚨There is always exciting work that comes from The Expedition at @Cohere_Labs that goes on to become highly technical blog posts or even full-blown research papers! 🪄 Sign up for this year's edition, where we work to build BIG ideas using 🤏 "Tiny" Aya 🌎🌍🌏
Cohere Labs@Cohere_Labs

We’re officially kicking off Expedition Tiny Aya, a global, mentor-supported open build challenge built on the Tiny Aya model — and we’d love to see you there.

English
0
2
21
1.8K
Daniel D'souza retweeted
Sebastian Raschka
Sebastian Raschka@rasbt·
Tiny Aya reimplementation From Scratch!

Have been reading through the technical reports of the recent wave of open-weight LLM releases (more on that soon). Tiny Aya (2 days ago) was a bit under the radar. Looks like a nice, small 3.35B model with strongest multilingual support of that size class. Great for on-device translation tasks.

Just did a from-scratch implementation here: github.com/rasbt/LLMs-fro…

Architecture-wise, Tiny Aya is a classic decoder-style transformer with a few noteworthy modifications (besides the obvious ones like SwiGLU and Grouped Query Attention):

1. Parallel transformer blocks. A parallel transformer block computes attention and MLP from the same normalized input, then adds both to the residual in one step. I assume this is to reduce serial dependencies inside a layer to improve computational throughput.

2. Sliding window attention. Specifically, it uses a 3:1 local:global ratio similar to Arcee Trinity and Olmo 3. The window size is also 4096. Also, similar to Arcee, the sliding window layers use RoPE whereas the full attention layers use NoPE.

3. LayerNorm. Most architectures moved to RMSNorm as it's computationally a bit cheaper and performs well. Tiny Aya is keeping it more classic with a modified version of LayerNorm (the implementation here is like standard LayerNorm but without shift, i.e., bias, parameter).
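
The three modifications listed in the tweet above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the Tiny Aya code or the linked reimplementation: the function names (`layer_norm_no_bias`, `parallel_block`, `layer_kinds`, `sliding_window_mask`) are hypothetical, the position of the global layer within each group of four is a guess, and whether the window counts the current token is also assumed.

```python
import math


def layer_norm_no_bias(x, weight, eps=1e-5):
    """LayerNorm with a learned scale but no shift (bias) term."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [w * (v - mean) * inv_std for v, w in zip(x, weight)]


def parallel_block(x, attn, mlp, weight):
    """Parallel block: attention and MLP both read the same normalized
    input, and both outputs are added to the residual in one step."""
    h = layer_norm_no_bias(x, weight)
    a = attn(h)  # attn and mlp are stand-in callables (assumption)
    m = mlp(h)
    return [xi + ai + mi for xi, ai, mi in zip(x, a, m)]


def layer_kinds(n_layers, local_per_global=3):
    """3:1 local:global schedule. Placing the global layer last in each
    group of four is an assumption made for this sketch."""
    period = local_per_global + 1
    return [
        "global" if (i + 1) % period == 0 else "sliding"
        for i in range(n_layers)
    ]


def sliding_window_mask(seq_len, window):
    """mask[i][j] is True when query i may attend to key j: causal and
    within the last `window` positions (current token included)."""
    return [
        [0 <= i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]
```

With a window of 4096, as mentioned in the tweet, `sliding_window_mask` would restrict each query in a sliding layer to at most 4096 keys, while layers marked `"global"` would use an ordinary causal mask.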
Sebastian Raschka tweet media
English
22
160
1.1K
67.6K
Daniel D'souza 
Daniel D'souza @mrdanieldsouza·
@sarahookr I’m loving this journey of discovery. It’s been a long time coming :)
English
0
0
2
101
Sara Hooker
Sara Hooker@sarahookr·
The most consumed biscuit in the entire world. G stands for genius.
Sara Hooker tweet media
English
64
43
884
30.5K
Sara Hooker
Sara Hooker@sarahookr·
Huge congrats to the @Cohere_Labs team and community! Super special to see tiny Aya in the open. Congrats to everyone involved, and the large amount of care required for this type of launch.
Cohere Labs@Cohere_Labs

Introducing ✨Tiny Aya✨, a family of massively multilingual small language models built to run where people actually are. Tiny Aya delivers strong multilingual performance in 70+ global languages in a 3.35B parameter model, efficient enough to run locally, even on a phone.

English
6
10
126
8.7K
Daniel D'souza 
Daniel D'souza @mrdanieldsouza·
Extremely proud to present ✨Tiny Aya ✨ Tiny 🤏 but mighty 💪3.35B parameter models, massively multilingual from the ground up 🌎🌍🌏, built with immense care w.r.t language representation🤗 We had a blast building this! 💗 Have at it! 🎆
Cohere Labs@Cohere_Labs

Introducing ✨Tiny Aya✨, a family of massively multilingual small language models built to run where people actually are. Tiny Aya delivers strong multilingual performance in 70+ global languages in a 3.35B parameter model, efficient enough to run locally, even on a phone.

English
0
11
43
5.5K
Sara Hooker
Sara Hooker@sarahookr·
Beginnings are very special. Today is an important day for @adaptionlabs. Today a handful of one-size-fits-all-models are optimized for the average use case. Averages erase the exceptional. Everything intelligent adapts. So should AI.
English
83
84
840
220.5K
Daniel D'souza retweeted
Cohere Labs
Cohere Labs@Cohere_Labs·
🧩In modern LLM development, we often end up with many specialized checkpoints. Merging into one model is attractive, but the results depend a lot on the selected merge method & order.  Our paper introduces SimMerge: a simple way to choose the merge configuration automatically.
Cohere Labs tweet media
English
2
6
21
17.9K
Sara Hooker
Sara Hooker@sarahookr·
I always ask a trusted group for feedback at the beginning of an endeavor. critical to find 1-2 calibrated skeptics without an agenda. Take their view over time seriously since they can be swayed w evidence. If their view remains the same, you are not making progress.
English
8
3
67
5.3K
Daniel D'souza retweeted
Cohere Labs
Cohere Labs@Cohere_Labs·
Sr Research Scientist, Julia Kreutzer: Treasure Hunt paper. 🗺️ This work introduces a method to improve model performance by adding markers to tokens of the pretraining data, enabling real-time targeting of the long tail using training-time markers. youtu.be/K3BUpKag_nA
YouTube video
YouTube
English
1
1
4
1.1K
Arnav Gupta
Arnav Gupta@championswimmer·
Went to the famous Hafiz Mustafa 1864 baklava place in Istanbul and had the Humbara. By far one of the best desserts I've had in recent memory. The clotted cream is out of the world. 10/10 highly recommend. They have branches in London and Dubai too.
Arnav Gupta tweet media
Arnav Gupta tweet media
English
31
19
432
104.5K
Daniel D'souza retweeted
Sara Hooker
Sara Hooker@sarahookr·
Careers are long. My first director gave me something that has guided my entire career: treat people with respect because you will meet them again and again. Interactions are rarely one off, and what matters in the long run is that you are fair and have integrity.
English
56
1.2K
6.6K
455.8K
Daniel D'souza retweeted
SAIL Media
SAIL Media@readsail·
Doing More with Less: "Treasure Hunt" & The Future of Multilingual AI ft @mrdanieldsouza
SAIL Media tweet media
English
1
2
10
497
Daniel D'souza 
Daniel D'souza @mrdanieldsouza·
Such a fun chat with @readsail at #NeurIPS2025 ! 🤗 Got to chat about our recent work "TreasureHunt" (arxiv.org/abs/2506.14702…) and play a game of Guess Who? 👀 Kudos to @natolambert for setting this in motion! 🔥
SAIL Media@readsail

"AI should be built for the world, not just for the people who speak English." 🌍 @mrdanieldsouza from @Cohere_Labs discusses the critical need for multilingual models at #NeurIPS2025. Unlocking knowledge across languages is the next frontier. 🚀 Full video now on Youtube.

English
1
2
14
2.9K
Daniel D'souza retweeted
Convai_rg
Convai_rg@convAI2024·
📢 Join the final Conversational AI Reading Group meeting of 2025! 📅 Thursday, Dec 18th | 11 AM - 12 EST 🎙 Speaker: Daniel D'souza (@mrdanieldsouza) - @cohere Labs 📖 Topic: "Data as Leverage: Improving Foundation Models Beyond Scaling." 🔗 Details: poonehmousavi.github.io/rg.html
English
0
2
2
359