Muqeeth
@Muqeeth10
Interested in AI for social good. Grad Student @Mila_Quebec | Former RE @MITIBMLab | MS @unccs | RA @iitdelhi | BTech @iitmadras
Montréal, Québec · Joined May 2017
424 Following · 163 Followers
33 posts
Muqeeth retweeted
Cooperative AI Foundation
The Cooperative AI Summer School 2026 'Expression of interest' applications are now open! If you're an early-career professional studying or working in cooperative AI, apply to join us in Canada this August for an exciting intensive programme.
Muqeeth retweeted
Kawin Ethayarajh @ethayarajh
AI is changing economics, and, as we just saw in Dwarkesh's interview with Dario, AI researchers need to start thinking about economics too! The Center for Applied AI at UChicago will be hosting an AI & Economics Summer Institute to explore exactly this. We will bring together leading researchers with advanced graduate students in economics/AI/ML/NLP for an in-person program between Aug 6 and 11.
Muqeeth @Muqeeth10
@dvnxmvl_hdf5 As the game is played repeatedly, the agent can display reciprocity across rounds: cooperate when the other player cooperated and retaliate when the other player defected in the last round. Since the values of the items are public in this specific game, it is possible to do so.
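A minimal sketch of the reciprocity strategy described above, i.e. tit-for-tat in a repeated game; the move labels and history format are illustrative assumptions, not the paper's interface:

```python
# Tit-for-tat sketch for a repeated two-player game: cooperate first,
# then mirror the opponent's previous move. Labels/format are illustrative.

def tit_for_tat(history):
    """history: list of (my_move, opponent_move) tuples from earlier rounds."""
    if not history:
        return "COOPERATE"           # open cooperatively on round one
    _, opponent_last = history[-1]
    return opponent_last             # reciprocate cooperation, retaliate defection

# The opponent defected last round, so the policy retaliates this round.
print(tit_for_tat([("COOPERATE", "COOPERATE"), ("COOPERATE", "DEFECT")]))  # DEFECT
```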
Dane Malenfant @dvnxmvl_hdf5
@Muqeeth10 How does this split-no-comm game variant “… support reciprocity without the need for communication” if it is a textual environment?
Muqeeth @Muqeeth10
New preprint! Learning Robust Social Strategies with Large Language Models. We apply multi-agent RL finetuning to train LLMs that achieve cooperative and non-exploitable behavior in social dilemmas for the first time. 📄 arxiv.org/abs/2511.19405 🧵 ⬇️ (1/8)
Muqeeth @Muqeeth10
AdAlign agents are also robust when facing RL agents trained specifically to exploit them, while GPT-5 nano is exploitable in the same setup. The RL agent ends up cooperating with AdAlign's tit-for-tat-style policy, since that is its best response. (7/8)
Muqeeth retweeted
Anirudh Buvanesh @AnirudhBuvanesh
Zero rewards after tons of RL training? 😞 Before using dense rewards or incentivizing exploration, try changing the data. Adding easier instances of the task can unlock RL training. 🔓📈 To learn more, check out our blog post here: spiffy-airbus-472.notion.site/What-Can-You-D…. Keep reading 🧵 (1/n)
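A tiny sketch of the data change suggested above: seeding the RL training pool with easier task instances so the policy sees non-zero reward early. The dataset fields and the 30% mixing ratio are illustrative assumptions, not the blog post's recipe:

```python
import random

# Sketch: mix easier task instances into the RL training pool so the policy
# earns non-zero reward early. The `difficulty` field and 30% ratio are assumptions.

def build_training_pool(hard_instances, easy_instances, easy_fraction=0.3, seed=0):
    """Return a shuffled pool with roughly `easy_fraction` easy instances added."""
    rng = random.Random(seed)
    n_easy = int(easy_fraction * len(hard_instances))
    pool = list(hard_instances) + rng.sample(easy_instances, min(n_easy, len(easy_instances)))
    rng.shuffle(pool)
    return pool

hard = [{"prompt": f"hard-{i}", "difficulty": "hard"} for i in range(100)]
easy = [{"prompt": f"easy-{i}", "difficulty": "easy"} for i in range(100)]
pool = build_training_pool(hard, easy)
print(sum(x["difficulty"] == "easy" for x in pool), "easy instances in pool")  # 30
```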
Muqeeth @Muqeeth10
@esha_hq Good one. I do it too; I picked it up from a movie I watched in childhood.
Esha @esha_hq
Whenever I see a car crash or an ambulance/fire truck pass by, I make it a practice to say a small prayer and consciously think of them. Growing up, my parents made this a thing in our house, maybe because they'd been in a head-on collision before. It doesn't matter if they're a stranger; I've been realizing that there's nothing strange about the fact that all of our lives could completely change overnight. In that way, even if we don't know them, we're deeply connected. And sure, my thoughts may not change anything, but I believe positive energy compounds. Plus, sirens are possibly the loudest reminders to stay human: if we are desensitized to them (which is easy in a big city, headphones in), what else are we tuning out?
Muqeeth retweeted
Prateek Yadav @prateeky2806
We just released our survey on "Model MoErging". But what is MoErging? 🤔 Read on! Imagine a world where fine-tuned models, each specialized in a specific domain, can collaborate and "compose/remix" their skills using some routing mechanism to tackle new tasks and queries! 🧵👇 Co-first-author @colinraffel 📰: arxiv.org/abs/2408.07057
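A toy sketch of the routing idea behind MoErging, dispatching each query to the most relevant specialized expert; the keyword router and expert names are illustrative stand-ins for the learned routing mechanisms the survey covers:

```python
import re

# Toy query-to-expert routing: score each specialized expert against the query
# and dispatch to the best match. Keyword scoring stands in for a learned router.

EXPERTS = {
    "code_expert":  {"python", "bug", "function", "compile"},
    "math_expert":  {"integral", "prove", "equation", "matrix"},
    "legal_expert": {"contract", "liability", "clause"},
}

def route(query):
    tokens = set(re.findall(r"\w+", query.lower()))
    scores = {name: len(tokens & keywords) for name, keywords in EXPERTS.items()}
    return max(scores, key=scores.get)

print(route("Why does this Python function not compile?"))  # -> code_expert
```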
Muqeeth @Muqeeth10
@sourab_m @Tim_Dettmers Thanks for sharing your work. IIUC, the approach in your paper is similar to the Expert Ensemble, which averages expert outputs by activating all experts. SMEAR achieves comparable performance while being significantly cheaper by activating just one merged expert per example.
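A back-of-the-envelope comparison of the two costs mentioned above, with one expert modeled as a single linear layer; the shapes and FLOP counting are illustrative assumptions:

```python
# Rough FLOP comparison: output-averaging ensemble vs. one merged expert.
# Shapes are illustrative; an expert is a single hidden_dim x ffn_dim layer here.

hidden_dim, ffn_dim, n_experts, n_tokens = 768, 3072, 8, 512

expert_flops = 2 * n_tokens * hidden_dim * ffn_dim   # one expert forward pass
ensemble_flops = n_experts * expert_flops            # run every expert, average outputs
merge_flops = n_experts * hidden_dim * ffn_dim       # weighted average of expert params
smear_flops = merge_flops + expert_flops             # merge once, run the merged expert

print(f"ensemble: {ensemble_flops:.3e} FLOPs")
print(f"merged:   {smear_flops:.3e} FLOPs  (~{ensemble_flops / smear_flops:.1f}x cheaper)")
```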
Muqeeth @Muqeeth10
Introducing Soft Merging of Experts with Adaptive Routing (SMEAR) for gradient-based training of mixture-of-experts models. SMEAR matches or outperforms prior routing methods without increasing costs or relying on task metadata. 📄  arxiv.org/abs/2306.03745 🧵 ⬇️ (1/7)
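A minimal PyTorch-style sketch of the soft-merging step: the router produces a distribution over experts, the expert parameters (rather than their outputs) are averaged under that distribution, and the single merged expert is applied. Module shapes and names here are illustrative assumptions, not the paper's code:

```python
import torch
import torch.nn as nn

class SMEARLayer(nn.Module):
    """Sketch of soft merging: average expert *parameters* under router weights,
    then apply the single merged expert. Shapes and names are illustrative."""

    def __init__(self, hidden_dim, ffn_dim, n_experts):
        super().__init__()
        self.router = nn.Linear(hidden_dim, n_experts)
        # Each expert is a single linear layer here for brevity.
        self.expert_w = nn.Parameter(torch.randn(n_experts, hidden_dim, ffn_dim) * 0.02)
        self.expert_b = nn.Parameter(torch.zeros(n_experts, ffn_dim))

    def forward(self, x):                                       # x: (batch, seq, hidden)
        # Example-level routing: one distribution per example (mean-pooled input).
        probs = self.router(x.mean(dim=1)).softmax(dim=-1)      # (batch, n_experts)
        # Soft-merge parameters: weighted average of expert weights per example.
        w = torch.einsum("be,ehf->bhf", probs, self.expert_w)   # (batch, hidden, ffn)
        b = torch.einsum("be,ef->bf", probs, self.expert_b)     # (batch, ffn)
        # Apply the single merged expert.
        return torch.einsum("bsh,bhf->bsf", x, w) + b.unsqueeze(1)

layer = SMEARLayer(hidden_dim=16, ffn_dim=32, n_experts=4)
print(layer(torch.randn(2, 10, 16)).shape)                      # torch.Size([2, 10, 32])
```

Because the merge is a differentiable weighted average, gradients flow to both the router and every expert without any discrete routing decisions, which is what makes fully gradient-based training possible.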
Muqeeth @Muqeeth10
@KhanovMax That's correct! Having homogeneous experts is a simpler and more common approach. :)
Max Khanov @KhanovMax
@Muqeeth10 This is so incredibly clever!! Though this means all the experts have to be identical, right?
Muqeeth @Muqeeth10
@kleptid Therefore, the peak memory cost arises from the inner activations (num_tokens * hidden_dim) rather than the merged experts, and is the same as for other methods. Token-level routing with SMEAR is mathematically equivalent to ensembles. Please refer to our paper for a discussion of this topic.
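A quick back-of-the-envelope check of the memory argument in this thread; the concrete sizes are illustrative assumptions:

```python
# Illustrative peak-memory comparison (counts of float32 values, not bytes).
# Sizes are assumptions chosen to mirror the thread's argument:
# expert_ffn_dim << hidden_dim because experts are parameter-efficient modules.

num_tokens, hidden_dim, expert_ffn_dim = 4096, 768, 64

activations = num_tokens * hidden_dim        # inner activations dominate
merged_expert = hidden_dim * expert_ffn_dim  # one merged parameter-efficient expert

print(f"activations:   {activations:,}")     # 3,145,728
print(f"merged expert: {merged_expert:,}")   # 49,152 -- small next to activations
```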
Muqeeth @Muqeeth10
@kleptid In SMEAR, example-level routing is used, with a merged-expert memory cost of hidden_dim * expert_ffn_dim. Additionally, expert_ffn_dim is an order of magnitude smaller than hidden_dim due to our use of parameter-efficient modules as experts.