Oscar Mañas

1.1K posts


@oscmansan

Research scientist at @AIatMeta, PhD from @Mila_Quebec @UMontrealDIRO. Working on multimodal vision+language generation. A Catalan in Zurich.

Zurich, Switzerland · Joined September 2014
2.4K Following · 1.3K Followers
Oscar Mañas retweeted
kache @yacineMTB
you can outsource your thinking but you cannot outsource your understanding
238 replies · 3.6K reposts · 16.2K likes · 2.2M views
Oscar Mañas @oscmansan
How did @claudeai fix the overthinking issue? Asking for a friend
[image]
1 reply · 0 reposts · 1 like · 278 views
Oscar Mañas retweeted
Artificial Analysis @ArtificialAnlys
Meta is back! Muse Spark scores 52 on the Artificial Analysis Intelligence Index, behind only Gemini 3.1 Pro, GPT-5.4, and Claude Opus 4.6. Muse Spark is the first new release since Llama 4 in April 2025, and also Meta's first release that is not open weights.

Muse Spark is a new model from @Meta evaluated on Artificial Analysis. We were given early access by Meta to independently benchmark the model. It is the first frontier-class model from Meta since Llama 4 Maverick was released in April 2025, and notably the first @AIatMeta model that is not being released as open weights. The release follows Meta's reorganization of its AI efforts under Meta Superintelligence Labs, and signals that Meta is re-entering the frontier race after roughly a year of relative quiet.

For context, Llama 4 Maverick and Scout scored 18 and 13 respectively on the Artificial Analysis Intelligence Index as non-reasoning models at the time of their release, while Muse Spark scores 52. Muse Spark essentially closes the gap to the frontier in a single release. The model is not open source and is not yet accessible via an API, but Meta has shared that it expects this to come soon. Meta is also integrating Muse Spark into its first-party products, including the Meta AI chat product, Facebook, Instagram, and Threads.

Key takeaways from our benchmarks:
➤ Muse Spark scores 52 on the Artificial Analysis Intelligence Index, placing it within the top 5 models we have benchmarked. It sits ahead of Claude Sonnet 4.6, GLM-5.1, MiniMax-M2.7, and Grok 4.20, and behind Gemini 3.1 Pro Preview, GPT-5.4, and Claude Opus 4.6
➤ Muse Spark is notably token efficient for its intelligence level. It used 58M output tokens to run the Intelligence Index, comparable to Gemini 3.1 Pro Preview (57M) and notably lower than Claude Opus 4.6 (Adaptive Reasoning, max effort, 157M), GPT-5.4 (xhigh, 120M), and GLM-5 (110M)
➤ Muse Spark is the second-most capable vision model we have benchmarked. It scores 80.5% on MMMU-Pro, behind only Gemini 3.1 Pro Preview (82.4%)
➤ Muse Spark performs strongly on reasoning and instruction-following evaluations. It scores 39.9% on HLE, trailing only Gemini 3.1 Pro Preview (44.7%) and GPT-5.4 (xhigh, 41.6%). The model also achieved the 5th-highest score on CritPT at 11%, an eval focused on difficult physics research questions. This is substantially above Gemini 3 Flash (9%) and Claude 4.6 Sonnet (3%)
➤ Agentic performance does not stand out. On GDPval-AA, our evaluation focused on real-world work tasks, Muse Spark scores 1427, behind both Claude Sonnet 4.6 at 1648 and GPT-5.4 at 1676, but ahead of Gemini 3.1 Pro Preview at 1320. On TerminalBench Hard, Muse Spark trails Claude Sonnet 4.6, GPT-5.4, and Gemini 3.1 Pro. Muse Spark joins others in achieving a high τ²-Bench Telecom score of 92%

Key model details:
➤ Modalities: multimodal, with text and vision input and text output
➤ License: proprietary; Meta's first frontier model not released as open weights
➤ Availability: no public API at the time of publishing. Meta expects to provide API access soon, and has started integrating the model into its first-party Meta AI offering and into Facebook, Instagram, and Threads
[image]
76 replies · 323 reposts · 2.5K likes · 498.6K views
Oscar Mañas retweeted
Alexandr Wang @alexandr_wang
1/ today we're releasing muse spark, the first model from MSL. nine months ago we rebuilt our ai stack from scratch. new infrastructure, new architecture, new data pipelines. muse spark is the result of that work, and now it powers meta ai. 🧵
[image]
727 replies · 1.2K reposts · 10.3K likes · 4.5M views
Oscar Mañas retweeted
Lucas Beyer (bl16) @giffmana
If you're going to ICLR next month and are interested in research at Meta, it's a good idea to come to our event there: events.atmeta.com/iclrnetworking… (I'm unfortunately not going to iclr this year - not sure yet which conference I'll go to)
2 replies · 3 reposts · 93 likes · 12.8K views
Oscar Mañas retweeted
AK @_akhaliq
LatentLens: Revealing Highly Interpretable Visual Tokens in LLMs. paper: huggingface.co/papers/2602.00…
[image]
3 replies · 22 reposts · 120 likes · 12.5K views
Oscar Mañas retweeted
Benno Krojer @benno_krojer
🚨New paper: Are visual tokens going into an LLM interpretable? 🤔 Existing methods (e.g. logit lens) and assumptions would lead you to think "not much"... We propose LatentLens and show that most visual tokens are interpretable across *all* layers 💡 Details 🧵
[image]
3 replies · 58 reposts · 251 likes · 57.4K views
Oscar Mañas retweeted
Oscar Mañas @oscmansan
.@AIatMeta's Breakfast Club @ Zurich
[image]
Zurich, Switzerland 🇨🇭
10 replies · 0 reposts · 22 likes · 1.4K views
Oscar Mañas @oscmansan
What's this redacted thing on #CVPR review papers?
0 replies · 0 reposts · 1 like · 511 views
Lucas Beyer (bl16) @giffmana
@gdb This is one of the two reasons I hate MacBooks. I know there's `caffeinate` for this, but it's a hack
7 replies · 0 reposts · 56 likes · 11.1K views
Oscar Mañas @oscmansan
📣 Hiring Research Interns for Meta Superintelligence Labs in Zurich! Work on large-scale generative models (image/video gen, multimodal, world models) with real impact on products used by billions. 📍 Zurich | 🕒 6 months | 🎓 PhD students metacareers.com/profile/job_de…
4 replies · 27 reposts · 290 likes · 23.7K views
Oscar Mañas retweeted
Benno Krojer @benno_krojer
An interesting observation on the history of vision+language models: there are cycles in how tightly unified/integrated the two modalities are inside models. You would hope for linear progress towards more unification (after all, the field is moving towards end2end solutions, right?). But in reality we haven't really figured out how to unify vision and language --> so we keep going back and forth on how deeply we unify. (slide from a recent talk I've been giving about ongoing VL interp work)
[image]
1 reply · 1 repost · 4 likes · 751 views
Oscar Mañas @oscmansan
Happening in ~2 hours! Come say hi :)
Oscar Mañas @oscmansan

🌺 Attending @ICCVConference in Honolulu this week! I'll be presenting our work on multimodal reward-guided decoding. Come check it out on October 21 (morning), poster #122. If you’re around, I’d love to connect and chat about multimodal models and real-time video generation!

0 replies · 2 reposts · 9 likes · 2K views
Oscar Mañas @oscmansan
🌺 Attending @ICCVConference in Honolulu this week! I'll be presenting our work on multimodal reward-guided decoding. Come check it out on October 21 (morning), poster #122. If you’re around, I’d love to connect and chat about multimodal models and real-time video generation!
Oscar Mañas @oscmansan

I’m happy to share that our paper "Controlling Multimodal LLMs via Reward-guided Decoding" has been accepted to #ICCV2025! 🎉 w/ @proceduralia, @koustuvsinha, @adri_romsor, @michal_drozdzal, and @aagrawalAA 🔗 Read more: arxiv.org/abs/2508.11616 🧵 Here's what we did:

0 replies · 6 reposts · 21 likes · 3K views
Oscar Mañas retweeted
World Modeling Workshop @worldmodel_conf
🚨Announcing the World Modeling Workshop 2026 🚨 📅 When: Feb 4–6, 2026 📍Where: Mila (Montréal) + Online (free) 💡 What: Keynotes, Methods Deep Dive, and Tutorials 🌐 world-model-mila.github.io ✉️ worldmodel.mila@gmail.com 🧵 Details below:
[image]
6 replies · 61 reposts · 258 likes · 180.2K views