Yaoyao(Freax) Qian
@RubyFreax

86 posts

🔬Build https://t.co/ItWrF2h1eK | 🐱🐶🐯🦁🦋

Boston, MA · Joined January 2024
1.6K Following · 383 Followers
Yaoyao(Freax) Qian reposted
Yaoyao(Freax) Qian@RubyFreax·
Glad to have been part of this! There's genuinely a lot of interesting stuff in the data and tools if you're into video reasoning: video-reason.com
Hokin Deng@DengHokin

#VideoReason We are open-sourcing the entire VBVR stack to speed up the arrival of video reasoning as the next fundamental paradigm of intelligence:
- 150+ synthetic generators
- 1 million training clips
- Cloud-scale data factory
- Unified EvalKit
- 100 rule-based evaluators
- Strong baseline model
Check it out at video-reason.com

Yaoyao(Freax) Qian@RubyFreax·
Off to San Diego for #NeurIPS2025! 🌴 I'll be there the whole week (Dec 2–7). If you're around and want to talk research, I'm always up for it. DM me if you want to meet up!
Yaoyao(Freax) Qian reposted
Andrej Karpathy@karpathy·
I quite like the new DeepSeek-OCR paper. It's a good OCR model (maybe a bit worse than dots), and yes, data collection etc., but anyway that doesn't matter.

The more interesting part for me (esp as a computer vision person at heart who is temporarily masquerading as a natural language person) is whether pixels are better inputs to LLMs than text. Whether text tokens are wasteful and just terrible at the input. Maybe it makes more sense that all inputs to LLMs should only ever be images. Even if you happen to have pure text input, maybe you'd prefer to render it and then feed that in:

- More information compression (see paper) => shorter context windows, more efficiency.
- Significantly more general information stream => not just text, but e.g. bold text, colored text, arbitrary images.
- Input can now be processed with bidirectional attention easily and as default, not autoregressive attention - a lot more powerful.
- Delete the tokenizer (at the input)!! I already ranted about how much I dislike the tokenizer. Tokenizers are ugly, separate, non-end-to-end stages. They "import" all the ugliness of Unicode and byte encodings, inherit a lot of historical baggage, and carry security/jailbreak risk (e.g. continuation bytes). They make two characters that look identical to the eye into two completely different tokens internally in the network. A smiling emoji looks like a weird token, not an actual smiling face, pixels and all, with all the transfer learning that brings along. The tokenizer must go.

OCR is just one of many useful vision -> text tasks. And text -> text tasks can be made into vision -> text tasks. Not vice versa. So maybe the User message is images, but the decoder (the Assistant response) remains text. It's a lot less obvious how to output pixels realistically... or if you'd want to.

Now I have to also fight the urge to side-quest an image-input-only version of nanochat...
vLLM@vllm_project

🚀 DeepSeek-OCR — the new frontier of OCR from @deepseek_ai , exploring optical context compression for LLMs, is running blazingly fast on vLLM ⚡ (~2500 tokens/s on A100-40G) — powered by vllm==0.8.5 for day-0 model support.
🧠 Compresses visual contexts up to 20× while keeping 97% OCR accuracy at <10×.
📄 Outperforms GOT-OCR2.0 & MinerU2.0 on OmniDocBench using fewer vision tokens.
🤝 The vLLM team is working with DeepSeek to bring official DeepSeek-OCR support into the next vLLM release — making multimodal inference even faster and easier to scale.
🔗 github.com/deepseek-ai/De…
#vLLM #DeepSeek #OCR #LLM #VisionAI #DeepLearning

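The compression framing in the two posts above (text tokens vs. vision tokens for the same document) can be sketched as back-of-envelope arithmetic. Every constant below is an illustrative assumption for the sketch, not a number taken from the DeepSeek-OCR paper:

```python
# Toy comparison of text tokens vs. vision tokens for the same document,
# in the spirit of "optical context compression". All constants here are
# illustrative assumptions, not measurements.

def text_tokens(n_chars, chars_per_token=4):
    # Rough BPE heuristic: ~4 characters per text token in English.
    return n_chars // chars_per_token

def vision_tokens(width_px, height_px, patch=16):
    # A ViT-style encoder emits one token per patch x patch pixel block.
    return (width_px // patch) * (height_px // patch)

# Assume a dense page: ~4000 characters rendered onto a 640x640 canvas.
t = text_tokens(4000)        # 1000 text tokens
v = vision_tokens(640, 640)  # 1600 raw patches
# If the encoder adds a 16x token-merging stage (as OCR-oriented encoders
# often do), the page comes down to ~100 vision tokens: roughly a 10x
# compression relative to the text-token representation.
print(t, v, v // 16)  # prints: 1000 1600 100
```

The interesting regime is exactly when the merged vision-token count drops well below the text-token count while OCR accuracy stays high, which is the trade-off the post quantifies.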
Yaoyao(Freax) Qian reposted
Guohao Li 🐫@guohao_li·
Introducing Eigent — the first multi-agent workforce on your desktop. Eigent is a team of AI agents collaborating to complete complex tasks in parallel. It is your long-term working partner, with fully customizable workers and MCPs. Public beta available to download for macOS and Windows. 100% open-source on GitHub. Comment for 500 extra credits.
Yaoyao(Freax) Qian reposted
Haibo Zhao@ZhaoHaibo47588·
Excited to share our #ICML2025 paper, Hierarchical Equivariant Policy via Frame Transfer. Our Frame Transfer interface imposes the high-level decision as a coordinate frame change in the low-level policy, boosting sim performance by 20%+ and enabling complex manipulation with just 30 demos.
Yaoyao(Freax) Qian@RubyFreax·
Owen will be presenting our poster for the paper Hierarchical Equivariant Policy via Frame Transfer at ICML today (see lnkd.in/e-7p9Viq for details). If you are interested in equivariance and/or robotic manipulation, please stop by!
Yaoyao(Freax) Qian@RubyFreax·
🤣First time at RSS! Happy to meet up and chat!
Yaoyao(Freax) Qian@RubyFreax·
🥳Visual Tree Search of Web Agent has been accepted!
Danqing (Rex) Zhang@Danqing_Z

🎉 Exciting News! We're thrilled to announce that our paper "Visual Tree Search of Web Agent" has been accepted to ECML-PKDD 2025, one of the premier European conferences in machine learning and data science! This breakthrough work comes from our talented PathOnAI.org community members, with another web agent paper in the pipeline.

What is VisualTreeSearch? It's a fully-deployed system that makes web agent decision-making transparent and interpretable. For the first time, researchers and practitioners can observe how AI agents navigate and make decisions on the web in real time.

Key innovations include:
⚡ Ultra-fast API-based state reset (50s → 2s)
☁️ Scalable cloud infrastructure with WebSocket + ECS
🌳 Interactive tree visualization with live browser execution
🧠 Support for advanced algorithms including LATS (Language Agent Tree Search)

This open-source system bridges the critical gap between research and real-world deployment, providing essential infrastructure for debugging web agents, analyzing search strategies, and prototyping new planning algorithms.

Explore the project:
🔗 Project details: pathonai.org/projects/visua…
🎮 Live demo: visual-tree-search.pathonai.org
💻 GitHub: github.com/PathOnAIOrg/Vi…
#MachineLearning #AI #WebAgents #OpenSource #Research #ECMLPKDD2025

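The test-time tree search that the system above visualizes (e.g. LATS-style expansion of candidate action sequences, with cheap state reset to revisit earlier nodes) can be illustrated with a toy best-first search. The environment, `expand`, and `score` functions below are invented stand-ins for a web environment, not the VisualTreeSearch or LATS API:

```python
import heapq

# Toy best-first tree search over a stubbed environment: a minimal sketch
# of the kind of test-time search a web agent might run. States here are
# strings of actions; a real system would reset the browser to a node's
# state before expanding it.

def search(initial_state, expand, score, budget=20):
    """Expand the highest-scoring frontier node first; return the best
    state seen within the node-expansion budget."""
    counter = 0  # unique tie-breaker so heapq never compares states
    frontier = [(-score(initial_state), counter, initial_state)]
    best = initial_state
    while frontier and budget > 0:
        _, _, state = heapq.heappop(frontier)
        budget -= 1
        if score(state) > score(best):
            best = state
        for child in expand(state):
            counter += 1
            heapq.heappush(frontier, (-score(child), counter, child))
    return best

# Stub environment: two actions per step, reward for matching "AAB".
expand = lambda s: [s + "A", s + "B"] if len(s) < 3 else []
score = lambda s: sum(1 for a, b in zip(s, "AAB") if a == b)
print(search("", expand, score))  # prints: AAB
```

Swapping the priority function for a value estimate from a language model, and logging each pop/push, gives exactly the kind of search tree an interactive visualizer can render node by node.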
Yaoyao(Freax) Qian reposted
Infini-AI-Lab@InfiniAILab·
🔥 We introduce Multiverse, a new generative modeling framework for adaptive and lossless parallel generation.
🚀 Multiverse is the first open-source non-AR model to achieve AIME24 and AIME25 scores of 54% and 46%.
🌐 Website: multiverse4fm.github.io
🧵 1/n
Yaoyao(Freax) Qian reposted
Songlin Yang@SonglinYang4·
📢 (1/16) Introducing PaTH 🛣️ — a RoPE-free contextualized position encoding scheme, built for stronger state tracking, better extrapolation, and hardware-efficient training. PaTH outperforms RoPE across short and long language modeling benchmarks arxiv.org/abs/2505.16381
Yaoyao(Freax) Qian@RubyFreax·
Proud to be part of this open-source effort after joining PathOnAI! 🌱 We hope this helps push web agent research toward more robust, interpretable, and deployable systems.
Yaoyao(Freax) Qian@RubyFreax·
🚀 Excited to share VisualTreeSearch, my first project + upcoming paper with the open-source research group @PathOnAI! It is a fully-deployed system for understanding test-time tree search in web agents, now open-source & demo-ready. 👇