Yaoyao(Freax) Qian
@RubyFreax

86 posts

🔬Build https://t.co/ItWrF2h1eK | 🐱🐶🐯🦁🦋

Boston, MA · Joined January 2024
1.6K Following · 383 Followers
Yaoyao(Freax) Qian reposted
Yaoyao(Freax) Qian@RubyFreax·
Glad to have been part of this! There's genuinely a lot of interesting stuff in the data and tools if you're into video reasoning: video-reason.com
Hokin Deng@DengHokin

#VideoReason We are open-sourcing the entire VBVR stack to speed up the arrival of video reasoning as the next fundamental paradigm of intelligence:
- 150+ synthetic generators
- 1 million training clips
- Cloud-scale data factory
- Unified EvalKit
- 100 rule-based evaluators
- Strong baseline model
Check it out at video-reason.com

Yaoyao(Freax) Qian@RubyFreax·
Off to San Diego for #NeurIPS2025! 🌴 I'll be there the whole week (Dec 2–7). If you're around and want to talk research, I'm always up for it. DM me if you want to meet up!
Yaoyao(Freax) Qian reposted
Andrej Karpathy@karpathy·
I quite like the new DeepSeek-OCR paper. It's a good OCR model (maybe a bit worse than dots), and yes, data collection etc., but anyway that doesn't matter.

The more interesting part for me (esp as a computer vision person at heart who is temporarily masquerading as a natural language person) is whether pixels are better inputs to LLMs than text. Whether text tokens are wasteful and just terrible at the input. Maybe it makes more sense that all inputs to LLMs should only ever be images. Even if you happen to have pure text input, maybe you'd prefer to render it and then feed that in:

- More information compression (see paper) => shorter context windows, more efficiency.
- Significantly more general information stream => not just text, but e.g. bold text, colored text, arbitrary images.
- Input can now be processed with bidirectional attention easily and as default, not autoregressive attention - a lot more powerful.
- Delete the tokenizer (at the input)!! I already ranted about how much I dislike the tokenizer. Tokenizers are ugly, separate, non-end-to-end stages. They "import" all the ugliness of Unicode and byte encodings, inherit a lot of historical baggage, and carry security/jailbreak risk (e.g. continuation bytes). They make two characters that look identical to the eye into two completely different tokens internally in the network. A smiling emoji looks like a weird token, not an actual smiling face, pixels and all, with all the transfer learning that brings along. The tokenizer must go.

OCR is just one of many useful vision -> text tasks. And text -> text tasks can be made into vision -> text tasks. Not vice versa. So maybe the User message is images, but the decoder (the Assistant response) remains text. It's a lot less obvious how to output pixels realistically... or if you'd want to.

Now I have to also fight the urge to side-quest an image-input-only version of nanochat...
vLLM@vllm_project

🚀 DeepSeek-OCR — the new frontier of OCR from @deepseek_ai , exploring optical context compression for LLMs, is running blazingly fast on vLLM ⚡ (~2500 tokens/s on A100-40G) — powered by vllm==0.8.5 for day-0 model support.
🧠 Compresses visual contexts up to 20× while keeping 97% OCR accuracy at <10×.
📄 Outperforms GOT-OCR2.0 & MinerU2.0 on OmniDocBench using fewer vision tokens.
🤝 The vLLM team is working with DeepSeek to bring official DeepSeek-OCR support into the next vLLM release — making multimodal inference even faster and easier to scale.
🔗 github.com/deepseek-ai/De…
#vLLM #DeepSeek #OCR #LLM #VisionAI #DeepLearning

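The compression framing in the two posts above (text tokens vs. vision tokens for the same document) can be sketched as back-of-envelope arithmetic. Every constant below is an illustrative assumption for the sketch, not a number taken from the DeepSeek-OCR paper:

```python
# Toy comparison of text tokens vs. vision tokens for the same document,
# in the spirit of "optical context compression". All constants here are
# illustrative assumptions, not measurements.

def text_tokens(n_chars, chars_per_token=4):
    # Rough BPE heuristic: ~4 characters per text token in English.
    return n_chars // chars_per_token

def vision_tokens(width_px, height_px, patch=16):
    # A ViT-style encoder emits one token per patch x patch pixel block.
    return (width_px // patch) * (height_px // patch)

# Assume a dense page: ~4000 characters rendered onto a 640x640 canvas.
t = text_tokens(4000)        # 1000 text tokens
v = vision_tokens(640, 640)  # 1600 raw patches
# If the encoder adds a 16x token-merging stage (as OCR-oriented encoders
# often do), the page comes down to ~100 vision tokens: roughly a 10x
# compression relative to the text-token representation.
print(t, v, v // 16)  # prints: 1000 1600 100
```

The interesting regime is exactly when the merged vision-token count drops well below the text-token count while OCR accuracy stays high, which is the trade-off the post quantifies.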
Yaoyao(Freax) Qian reposted
Guohao Li 🐫@guohao_li·
Introducing Eigent — the first multi-agent workforce on your desktop. Eigent is a team of AI agents collaborating to complete complex tasks in parallel. It is your long-term working partner, with fully customizable workers and MCPs. Public beta available to download for macOS and Windows. 100% open-source on GitHub. Comment for 500 extra credits.
Yaoyao(Freax) Qian reposted
Haibo Zhao@ZhaoHaibo47588·
Excited to share our #ICML2025 paper, Hierarchical Equivariant Policy via Frame Transfer. Our Frame Transfer interface imposes the high-level decision as a coordinate frame change in the low-level policy, boosting sim performance by 20%+ and enabling complex manipulation with just 30 demos.
Yaoyao(Freax) Qian@RubyFreax·
Owen will be presenting our poster for the paper Hierarchical Equivariant Policy via Frame Transfer at ICML today (see lnkd.in/e-7p9Viq for details). If you are interested in equivariance and/or robotic manipulation, please stop by!
Yaoyao(Freax) Qian@RubyFreax·
🤣First time at RSS! Happy to meet up and chat!
Yaoyao(Freax) Qian@RubyFreax·
🥳Visual Tree Search of Web Agent has been accepted!
Danqing (Rex) Zhang@Danqing_Z

🎉 Exciting News! We're thrilled to announce that our paper "Visual Tree Search of Web Agent" has been accepted to ECML-PKDD 2025, one of the premier European conferences in machine learning and data science! This breakthrough work comes from our talented PathOnAI.org community members, with another web agent paper in the pipeline.

What is VisualTreeSearch? It's a fully-deployed system that makes web agent decision-making transparent and interpretable. For the first time, researchers and practitioners can observe how AI agents navigate and make decisions on the web in real time.

Key innovations include:
⚡ Ultra-fast API-based state reset (50s → 2s)
☁️ Scalable cloud infrastructure with WebSocket + ECS
🌳 Interactive tree visualization with live browser execution
🧠 Support for advanced algorithms including LATS (Language Agent Tree Search)

This open-source system bridges the critical gap between research and real-world deployment, providing essential infrastructure for debugging web agents, analyzing search strategies, and prototyping new planning algorithms.

Explore the project:
🔗 Project details: pathonai.org/projects/visua…
🎮 Live demo: visual-tree-search.pathonai.org
💻 GitHub: github.com/PathOnAIOrg/Vi…
#MachineLearning #AI #WebAgents #OpenSource #Research #ECMLPKDD2025

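The test-time tree search that the system above visualizes (e.g. LATS-style expansion of candidate action sequences, with cheap state reset to revisit earlier nodes) can be illustrated with a toy best-first search. The environment, `expand`, and `score` functions below are invented stand-ins for a web environment, not the VisualTreeSearch or LATS API:

```python
import heapq

# Toy best-first tree search over a stubbed environment: a minimal sketch
# of the kind of test-time search a web agent might run. States here are
# strings of actions; a real system would reset the browser to a node's
# state before expanding it.

def search(initial_state, expand, score, budget=20):
    """Expand the highest-scoring frontier node first; return the best
    state seen within the node-expansion budget."""
    counter = 0  # unique tie-breaker so heapq never compares states
    frontier = [(-score(initial_state), counter, initial_state)]
    best = initial_state
    while frontier and budget > 0:
        _, _, state = heapq.heappop(frontier)
        budget -= 1
        if score(state) > score(best):
            best = state
        for child in expand(state):
            counter += 1
            heapq.heappush(frontier, (-score(child), counter, child))
    return best

# Stub environment: two actions per step, reward for matching "AAB".
expand = lambda s: [s + "A", s + "B"] if len(s) < 3 else []
score = lambda s: sum(1 for a, b in zip(s, "AAB") if a == b)
print(search("", expand, score))  # prints: AAB
```

Swapping the priority function for a value estimate from a language model, and logging each pop/push, gives exactly the kind of search tree an interactive visualizer can render node by node.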
Yaoyao(Freax) Qian reposted
Infini-AI-Lab@InfiniAILab·
🔥 We introduce Multiverse, a new generative modeling framework for adaptive and lossless parallel generation.
🚀 Multiverse is the first open-source non-AR model to achieve AIME24 and AIME25 scores of 54% and 46%.
🌐 Website: multiverse4fm.github.io
🧵 1/n
Yaoyao(Freax) Qian reposted
Songlin Yang@SonglinYang4·
📢 (1/16) Introducing PaTH 🛣️ — a RoPE-free contextualized position encoding scheme, built for stronger state tracking, better extrapolation, and hardware-efficient training. PaTH outperforms RoPE across short and long language modeling benchmarks arxiv.org/abs/2505.16381
Yaoyao(Freax) Qian@RubyFreax·
Proud to be part of this open-source effort after joining PathOnAI! 🌱 We hope this helps push web agent research toward more robust, interpretable, and deployable systems.
Yaoyao(Freax) Qian@RubyFreax·
🚀 Excited to share VisualTreeSearch, my first project + upcoming paper with the open-source research group @PathOnAI! It is a fully-deployed system for understanding test-time tree search in web agents, now open-source & demo-ready. 👇