Shengyi Qian

109 posts

Shengyi Qian
@JasonQSY

Research Scientist at Meta FAIR | Computer Vision, NLP, Robotics | CS PhD @UMich

Seattle, WA · Joined May 2016
465 Following · 840 Followers
Pinned Tweet
Sasha Sax @iamsashasax
In a couple of weeks I'm joining @AnthropicAI to work on pretraining, after nearly 3 years at FAIR developing post-training flywheels for physical intelligence (like SAM 3D). I'm stoked to build new capabilities for a model I personally love, with such thoughtful people.
Shengyi Qian retweeted
DailyPapers @HuggingPapers
Beyond Language Modeling: Meta FAIR and NYU present a deep dive into native multimodal pretraining. They show that RAEs unify visual understanding and generation, that vision and language data are complementary, that world modeling emerges naturally, and that MoE accommodates vision's higher data appetite, paving the way for truly unified models.
Shengyi Qian retweeted
John Nguyen @__JohnNguyen__
Humans communicate through language and interact with the world through vision, yet most multimodal models are language-first. What happens when we go beyond language? 🤔 Beyond Language Modeling: a deep dive into the design space of truly native multimodal models. Paper: arxiv.org/abs/2603.03276 · Project: beyond-llms.github.io
Shengyi Qian retweeted
Peter Tong @TongPetersb
Train Beyond Language. We bet on the visual world as the critical next step alongside and beyond language modeling. So, we studied building foundation models from scratch with vision. We share our exploration: visual representations, data, world modeling, architecture, and scaling behavior! [1/9]
Shengyi Qian @JasonQSY
Excited to share our latest work from Meta Superintelligence Labs! 🚀 We’re moving beyond static AI to agents that actually evolve with you. Our PAHF framework solves "Alignment Drift" through a continuous feedback loop. Check out the paper!
Kaiqu Liang @kaiqu_liang

New Meta research 🚀 AI agents are powerful, but they don't stay aligned with you over time. When preferences shift, they don't adapt. You correct them once… they repeat the mistake. 🤦 Introducing PAHF: continual personalization where agents learn from feedback to stay in sync.

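The announcement names only the ingredients (continual personalization, a feedback loop that keeps the agent in sync), so here is a purely illustrative sketch of such a loop; every class and method name is hypothetical, not the paper's API:

```python
from dataclasses import dataclass, field

@dataclass
class PreferenceState:
    """Hypothetical store of user corrections; not PAHF's actual representation."""
    notes: list = field(default_factory=list)

    def summary(self) -> str:
        # Keep only recent feedback so stale preferences fade (drift handling).
        return "; ".join(self.notes[-10:])

class PersonalizedAgent:
    def __init__(self, llm):
        self.llm = llm            # any callable: prompt -> response
        self.prefs = PreferenceState()

    def act(self, task: str) -> str:
        # Condition every action on the current preference summary.
        return self.llm(f"User preferences: {self.prefs.summary()}\nTask: {task}")

    def incorporate_feedback(self, correction: str) -> None:
        # The continual step: each correction updates state, so the same
        # mistake is not repeated on the next interaction.
        self.prefs.notes.append(correction)
```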
Shengyi Qian @JasonQSY
We are excited to host the 2nd 3D-LLM / VLA Workshop at CVPR this June! If your research explores the synergy between spatial intelligence, robotics, and language grounding, we invite you to submit your work. We also have an incredible lineup of speakers. Join us!
Yining Hong @yining_hong

LLMs are now learning space, geometry, and how to move. 🤖📐 The 2nd CVPR 3D-LLM VLA Workshop brings together language, 3D perception, and action for embodied intelligence. 📢 Call for Papers is OPEN: openreview.net/group?id=thecv…
🌐 Website: 3d-llm-vla.github.io. If your research lives at the intersection of words, worlds, and robots, this one's for you. #CVPR2026 @CVPR

Shengyi Qian retweeted
Liliang Ren @liliang_ren
Reasoning can be made much, much faster, with fundamental changes in neural architecture. 😮 Introducing Phi4-mini-Flash-Reasoning: a 3.8B model that surpasses Phi4-mini-Reasoning on major reasoning tasks (AIME24/25, MATH500, GPQA-D), while delivering up to 10× higher throughput at 32K generation length with vLLM. 🤯 Model: huggingface.co/microsoft/Phi-… Codebase: github.com/microsoft/Arch… Blog: aka.ms/flashreasoning… Paper: aka.ms/flashreasoning… (1/8)
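For readers who want to try the throughput setup, here is a minimal vLLM sketch. The Hugging Face link in the tweet is truncated, so the model id below is an assumption; swap in the exact id from the release.

```python
# Minimal sketch: serving a small reasoning model with vLLM for long generations.
# The model id is an assumption inferred from the truncated link in the tweet.
from vllm import LLM, SamplingParams

llm = LLM(
    model="microsoft/Phi-4-mini-flash-reasoning",  # assumed id; check the release
    trust_remote_code=True,  # custom architectures often need this
)
params = SamplingParams(temperature=0.6, max_tokens=32768)  # 32K generation length

prompts = ["Solve step by step: if 3x + 5 = 20, what is x?"]
for out in llm.generate(prompts, params):
    print(out.outputs[0].text)
```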
Shengyi Qian retweeted
Wei-Chiu Ma @weichiuma
Interactable digital twins hold great promise: they allow us to train in sim and test in real. But can we go a step further? Can we deploy a robot without training? Key idea: simulate the outcome of each action with the digital twin and use a VLM as a critic to select the best action.
Chuanruo Ning @TritiumAc

How can robots solve tasks that demand both semantic and physical reasoning, like playing real-world Angry Birds, without tons of data? We introduce Prompting with the Future: an MPC framework that fuses a pretrained VLM with an interactive digital twin for grounded, open-world motion planning. 🌐 prompting-with-the-future.github.io

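The loop these two tweets describe is simple to state: propose candidate actions, imagine each one's outcome in the digital twin, and let the VLM critic choose. A minimal sketch of that receding-horizon pattern, where every callable is a hypothetical placeholder rather than the project's actual API:

```python
import random

def plan_with_future(simulate, render, vlm_rank, state, action_space,
                     n_candidates=8, horizon=5):
    """One MPC step: roll out candidate actions in a digital twin, then let a
    VLM critic rank the rendered outcomes. All callables are placeholders."""
    candidates = random.sample(action_space, k=min(n_candidates, len(action_space)))
    futures = [render(simulate(state, a, horizon)) for a in candidates]
    best = vlm_rank(futures, prompt="Which outcome best completes the task?")
    return candidates[best]  # execute this action, observe, and replan

# Toy stand-ins so the sketch runs; a real system would use a physics-based
# digital twin for `simulate`/`render` and an actual VLM for `vlm_rank`.
simulate = lambda s, a, h: s + a * h
render = lambda s: f"state={s:.1f}"
vlm_rank = lambda futures, prompt: max(range(len(futures)), key=lambda i: futures[i])
print(plan_with_future(simulate, render, vlm_rank, state=0.0, action_space=[-1.0, 0.0, 1.0]))
```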
Shengyi Qian retweeted
Voxel51 @Voxel51
One of the biggest bottlenecks in deploying visual AI and computer vision is annotation, which can be both costly and time-consuming. Today, we’re introducing Verified Auto Labeling, a new approach to AI-assisted annotation that achieves up to 95% of human-level performance while cutting labeling costs by up to 100,000x and time by 5,000x. Read the full paper: arxiv.org/abs/2506.02359
Shengyi Qian @JasonQSY
3️⃣ 3D-GRAND: Towards Better Grounding & Less Hallucination for 3D-LLMs. A large-scale dataset & models for improved 3D visual grounding. Project: 3d-grand.github.io #3DLLM #AI
DM me if you're at #CVPR or want to chat about these! Looking forward to it!
Shengyi Qian @JasonQSY
Thrilled to be heading to Nashville next week for #CVPR2025! Can't wait to connect with the community & dive into the latest in computer vision.
Shengyi Qian retweeted
Furong Huang @furongh
🔥 How can we align #LLMs effectively with messy, imbalanced real-world data? #GRPO is great 🤩: simple, strong, and it doesn't even need a learned value function. 😥 But it struggles when data isn't evenly balanced across domains. 🕺💃 Enter 🪩 DISCO 🪩: Domain- & Difficulty-Aware #RLHF! 🔗👉 arxiv.org/abs/2505.15074 #NLP #LLM #RLHF #GRPO A 🧵👇
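For context, GRPO scores each sampled response by normalizing its reward within the group generated for the same prompt. The tweet doesn't spell out DISCO's actual weighting, so the domain term below is a hypothetical inverse-frequency illustration of "domain-aware" reweighting, not the paper's scheme:

```python
import numpy as np

def grpo_advantages(rewards):
    """Standard GRPO: z-score rewards within one prompt's group of samples."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

def domain_weight(domain_counts, domain, alpha=0.5):
    """Hypothetical illustration: upweight underrepresented domains by
    inverse frequency; DISCO's real scheme is defined in the paper."""
    freq = domain_counts[domain] / sum(domain_counts.values())
    return freq ** -alpha

# Example: a group of 4 responses to a prompt from a rare "math" domain.
counts = {"math": 1_000, "chat": 99_000}
print(grpo_advantages([0.2, 0.9, 0.4, 0.7]) * domain_weight(counts, "math"))
```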
Shengyi Qian @JasonQSY
Excited to announce the launch of Llama 4, a major leap in open-source AI! As part of the team supporting Llama 4 at FAIR, I’m proud to have contributed to these cutting-edge models.
Ahmad Al-Dahle @Ahmad_Al_Dahle

Introducing our first set of Llama 4 models! We've been hard at work doing a complete re-design of the Llama series. I'm so excited to share it with the world today and mark another major milestone for the Llama herd as we release the *first* open source models in the Llama 4 collection 🦙. Here are some highlights:

📌 The Llama series has been redesigned to use a state-of-the-art mixture-of-experts (MoE) architecture and is natively trained with multimodality. We're dropping Llama 4 Scout & Llama 4 Maverick, and previewing Llama 4 Behemoth.

📌 Llama 4 Scout is the highest-performing small model, with 17B activated parameters and 16 experts. It's crazy fast, natively multimodal, and very smart. It achieves an industry-leading 10M+ token context window and can also run on a single GPU!

📌 Llama 4 Maverick is the best multimodal model in its class, beating GPT-4o and Gemini 2.0 Flash across a broad range of widely reported benchmarks, while achieving comparable results to the new DeepSeek v3 on reasoning and coding at less than half the active parameters. It offers a best-in-class performance-to-cost ratio, with an experimental chat version scoring an ELO of 1417 on LMArena. It can also run on a single host!

📌 Previewing Llama 4 Behemoth, our most powerful model yet and among the world's smartest LLMs. Llama 4 Behemoth outperforms GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on several STEM benchmarks. Llama 4 Behemoth is still training, and we're excited to share more details about it even while it's still in flight.

A big thanks to all of our launch partners (full list in the blog) for helping us bring Llama 4 to developers everywhere, including @huggingface, @togethercompute, @SnowflakeDB, @ollama, @databricks and many others 👏

This is just the start; we have more models coming and the team is really cooking. Look out for Llama 4 Reasoning 😉

A few weeks ago, we celebrated Llama being downloaded over 1 billion times. Llama 4 demonstrates our long-term commitment to open source AI, the entire open source AI community, and our unwavering belief that open systems will produce the best small, mid-size, and soon frontier models. Llama would be nothing without the global open source AI community & we are so ready to begin this next chapter with you. 🦙

Read more about the release here: llama.com, and try it in our products today.

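As a toy illustration of why "activated parameters" are far fewer than total parameters in an MoE layer (only the routed experts run per token), here is a minimal top-k routing sketch; dimensions and k are made up, not Llama 4's actual configuration:

```python
import numpy as np

def moe_layer(x, experts, router_w, k=1):
    """Toy MoE forward pass for one token vector x: route to the top-k experts,
    so only a fraction of parameters is activated per token.
    Sizes and k are illustrative, not Llama 4's actual configuration."""
    logits = router_w @ x                      # one routing logit per expert
    topk = np.argsort(logits)[-k:]             # indices of selected experts
    gates = np.exp(logits[topk])
    gates /= gates.sum()                       # renormalized softmax over top-k
    return sum(g * (experts[i][0] @ x + experts[i][1]) for g, i in zip(gates, topk))

rng = np.random.default_rng(0)
d, n_experts = 8, 16                           # 16 experts, as in Llama 4 Scout
experts = [(rng.normal(size=(d, d)), rng.normal(size=d)) for _ in range(n_experts)]
router_w = rng.normal(size=(n_experts, d))
print(moe_layer(rng.normal(size=d), experts, router_w, k=1))
```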