Jiawei Gu

131 posts


@Kuvvius

Joined March 2023
286 Following · 325 Followers
Pinned Tweet
Jiawei Gu@Kuvvius·
🚨Sensational title alert: we may have cracked the code to true multimodal reasoning. Meet ThinkMorph — thinking in modalities, not just with them. And what we found was... unexpected. 👀 Emergent intelligence, strong gains, and …🫣 🧵 arxiv.org/abs/2510.27492 (1/16)
Jiawei Gu tweet media
27 replies · 65 reposts · 316 likes · 68.6K views
Jiawei Gu reposted
Peter Tong@TongPetersb·
Train Beyond Language. We bet on the visual world as the critical next step alongside and beyond language modeling. So, we studied building foundation models from scratch with vision. We share our exploration: visual representations, data, world modeling, architecture, and scaling behavior! [1/9]
Peter Tong tweet media
34 replies · 222 reposts · 1.1K likes · 208.1K views
AIGCLINK@aigclink·
Over the past 30 days, 128 startups built on openclaw generated a combined $280K in real revenue, averaging about $2,200/month each. The top one earns $50K/month. TrustMRR currently lists 128 of them, and the number keeps growing. Their business models are still fairly concentrated: about 80% of these companies are working on lowering the barrier to using OpenClaw, and only 3-5 are building at the application layer, so commercial use cases have not been explored very deeply yet. #Openclaw #openclaw赚钱 #AIagent
53 replies · 228 reposts · 1.1K likes · 212.5K views
Jiawei Gu reposted
Manling Li@ManlingLi_·
📍Theory of Space (accepted at #ICLR2026)

Theory of Mind → hidden mental states
Theory of Space → hidden spatial beliefs

From passive observers ("What do I know?") to active explorers ("What don't I know, and how do I reduce that uncertainty?")

Theory of Space evaluates whether foundation models can actively construct, revise, and exploit internal spatial beliefs. We quantify the Active-Passive Gap: not just task accuracy, but how much uncertainty is reduced per step, and how many steps agents need in total to build stable spatial beliefs.

Exploration should prioritize information gain and reduce uncertainty per step. Instead, we observe LLMs/VLMs exploring redundantly with stalled belief updates.

Key findings:
1. Active agents perform worse than rule-based programs
2. Cognitive map failures & belief drift (beliefs about previously observed objects degrade over time; new updates corrupt earlier correct perceptions)
3. Poor visual identification & belief inertia during belief revision

Website: theory-of-space.github.io
Code: github.com/mll-lab-nu/The…
Data: huggingface.co/datasets/MLL-L…

Theory of Space is a joint effort of @NorthwesternEng, @StanfordAILab, @uwcse, @Cornell_CS. Led by the amazing @WilliamZhangNU, jointly done with @zihanhuang66, @YueYuew8314, @JieyuZhang20, @XLe41402, @wzihanw, @qineng_wang, @keshigeyan, @RuohanZhang76, @YejinChoinka, @RanjayKrishna, @jiajunwu_cs, @drfeifei
7 replies · 93 reposts · 492 likes · 51.3K views
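The "uncertainty reduced per step" idea in the Theory of Space thread can be sketched with Shannon entropy over an agent's belief distribution. A minimal illustration, not the benchmark's actual code; the belief trajectories and all numbers are invented:

```python
import math

def entropy(belief):
    """Shannon entropy (bits) of a discrete belief over candidate locations."""
    return -sum(p * math.log2(p) for p in belief if p > 0)

def info_gain_per_step(belief_trajectory):
    """Average uncertainty reduction per exploration step.

    belief_trajectory: one belief distribution per step.
    Higher values mean a more efficient explorer.
    """
    h = [entropy(b) for b in belief_trajectory]
    steps = len(h) - 1
    return (h[0] - h[-1]) / steps if steps > 0 else 0.0

# A focused explorer collapses uncertainty quickly...
focused = [[0.25, 0.25, 0.25, 0.25], [0.7, 0.1, 0.1, 0.1], [1.0, 0.0, 0.0, 0.0]]
# ...while a redundant one stalls with near-flat beliefs (the failure mode above).
redundant = [[0.25] * 4, [0.26, 0.26, 0.24, 0.24], [0.27, 0.27, 0.23, 0.23]]
```

Under this toy metric, the focused trajectory scores far higher per step than the redundant one, matching the "stalled belief updates" observation.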
Jiawei Gu@Kuvvius·
💥Exciting to see Seed 2.0 evaluated on our EMMA multimodal reasoning benchmark! Frontier-level results. Congrats to the team. 👏 emma-benchmark.github.io
Jiawei Gu tweet media
0 replies · 0 reposts · 8 likes · 621 views
Wei Liu@WeiLiu99·
Happy to share that LASER has been accepted to ICLR 2026. Also, huge congrats on the success of Kimi 2.5! It’s thrilling to see them achieve such impressive results in efficiency enhancement via RL. Their approach shares a similar philosophy with our LASER-D: using an adaptive, difficulty-aware mechanism. It’s fascinating to see this logic align so well in a more online setting (w/ rollout info). Great validation that this is a promising path for efficient reasoning w/o compromising effectiveness!
Wei Liu tweet media
Wei Liu@WeiLiu99

"What is the answer to 1 + 1?" Large Reasoning Models (LRMs) may generate 1500+ tokens just to answer this trivial question. Too much thinking 🤯

Can LRMs be both faster AND stronger? Yes. Introducing LASER💥: Learn to Reason Efficiently with Adaptive Length-based Reward Shaping.

We propose LASER and its adaptive variants LASER-D / LASER-DE:
→ +6.1 accuracy on AIME24
→ –63% token usage

🔧 What we introduce:
- A unified framework that connects truncation and previous length-based rewards under one view
- A novel Length-bAsed StEp Reward (LASER) that softly encourages conciseness
- LASER-D: adapts target lengths based on question difficulty + training dynamics
- LASER-DE: encourages exploration on incorrect attempts

🔥 Unlike prior methods that trade efficiency for accuracy, LASER-D/E achieve Pareto-optimality:
✔️ Higher accuracy
✔️ Shorter outputs
✔️ Robust across model sizes (1.5B → 32B)
✔️ Strong generalization (GPQA, LSAT, MMLU)

Example: the original LRM needs 1490 tokens to answer "1 + 1" (with many self-reflections and finger counting 🤦). The LASER-D model?
✅ Answers directly in 76 tokens
✅ No lost reasoning ability
✅ More concise and intelligent

Check it out:
📄 Paper: huggingface.co/papers/2505.15…
💻 Code & Models: github.com/hkust-nlp/Laser

2 replies · 8 reposts · 57 likes · 5.5K views
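The length-based reward-shaping idea in the LASER thread can be sketched as a correctness reward with a soft step penalty for tokens beyond a difficulty-scaled target length. A toy sketch only, not LASER's actual formula; `laser_style_reward`, `adaptive_target`, and every constant here are illustrative assumptions:

```python
def laser_style_reward(correct, n_tokens, target_len, step_penalty=0.05):
    """Toy length-based step reward (illustrative, not LASER's exact form).

    Correct answers earn full reward minus a soft penalty per 100 tokens
    generated beyond the target length; wrong answers earn 0.
    """
    if not correct:
        return 0.0
    excess_steps = max(0, n_tokens - target_len) // 100
    return max(0.0, 1.0 - step_penalty * excess_steps)

def adaptive_target(difficulty, base=256, scale=1024):
    """LASER-D-style idea: harder questions get a longer token budget."""
    return int(base + scale * difficulty)  # difficulty assumed in [0, 1]
```

Under this sketch, a concise correct answer (76 tokens to "1 + 1") keeps the full reward, while a 1490-token answer to the same easy question is softly penalized, so training pressure pushes toward shorter correct reasoning.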
Jiawei Gu reposted
Yejin Choi@YejinChoinka·
Excited to share TTT-Discover (Test-Time Training for Discovery): seeking new discoveries on long-standing problems:
✅ Erdős min overlap
✅ denoising for single-cell analysis
✅ GPU kernels!

The key insight: scientific discovery requires learning from a long sequence of trials and errors. Current approaches like AlphaEvolve operate with a frozen policy 🧊: only prompts evolve at test time. TTT-Discover instead lets the policy itself adapt 🚀, laser-focusing on one extremely hard problem for as long as it takes.

Test-Time Training (TTT): a new frontier for scaling intelligence 🔥
Mert Yuksekgonul@mertyuksekgonul

How to get AI to make discoveries on open scientific problems? Most methods just improve the prompt with more attempts. But the AI itself doesn't improve. With test-time training, AI can continue to learn on the problem it’s trying to solve: test-time-training.github.io/discover.pdf

10 replies · 30 reposts · 288 likes · 44.8K views
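The frozen-policy vs. test-time-training contrast can be illustrated with a toy loop in which the policy's own parameter is updated after each trial, so experience on the one problem accumulates in the weights rather than only in the prompt. A schematic sketch under invented assumptions; `ToyPolicy`, the learning rate, and the toy objective are all hypothetical, not the paper's algorithm:

```python
import random

class ToyPolicy:
    """Hypothetical stand-in for a trainable policy: one scalar weight."""
    def __init__(self, w=0.0):
        self.w = w
    def propose(self):
        # Sample a candidate solution around the current weight.
        return self.w + random.gauss(0, 0.5)

def ttt_discover(policy, evaluate, lr=0.2, n_trials=200):
    """Schematic test-time training loop for one hard problem."""
    best_sol, best_score = None, float("-inf")
    for _ in range(n_trials):
        sol = policy.propose()
        score = evaluate(sol)
        if score > evaluate(policy.w):
            # TTT step: move the policy's weight toward the better sample.
            policy.w += lr * (sol - policy.w)
        if score > best_score:
            best_sol, best_score = sol, score
    return best_sol, best_score

# Toy "open problem": maximize -(x - 3)^2, optimum at x = 3.
random.seed(0)
policy = ToyPolicy()
best_sol, best_score = ttt_discover(policy, lambda x: -(x - 3.0) ** 2)
```

A frozen-policy baseline would keep `policy.w` fixed and only vary the proposals; here every accepted trial shifts the weight, so later proposals concentrate near the optimum.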
Jiawei Gu@Kuvvius·
⛔️ Can MLLMs truly learn WHEN and HOW to use tools? 🛠AdaReasoner says: yes!! Like… actually decide:
- "Should I call a tool right now?"
- "Which one?"
- "How many times?"

What happened surprised us: a 7B model beats GPT-5 on visual tool-reasoning, and shows adaptive behaviors we never programmed. (1/17)🧵👇
📄 arxiv.org/abs/2601.18631
🌐 adareasoner.github.io
GIF
1 reply · 5 reposts · 15 likes · 6.4K views
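The WHEN / WHICH / HOW-MANY decisions described in the AdaReasoner thread can be sketched as a confidence- and budget-gated controller. Purely illustrative: the function names, tool names, and threshold are assumptions, not AdaReasoner's implementation:

```python
from dataclasses import dataclass

@dataclass
class ToolDecision:
    call_tool: bool       # WHEN: invoke a tool at this step?
    tool: str = ""        # WHICH: e.g. "ocr" or "crop" (hypothetical tools)
    budget_left: int = 0  # HOW MANY: remaining calls allowed

def decide(confidence, needs_text, budget_left, threshold=0.7):
    """Call a tool only when the model is unsure and budget remains."""
    if confidence >= threshold or budget_left == 0:
        return ToolDecision(call_tool=False, budget_left=budget_left)
    tool = "ocr" if needs_text else "crop"
    return ToolDecision(call_tool=True, tool=tool, budget_left=budget_left - 1)
```

The point of the sketch is the gating: a confident model answers directly, an unsure one spends from a finite tool budget, and the budget caps how many times it can loop.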