Rui Li

6 posts

@rui__li

First-year PhD student in CS at @stanford. Multimodal AI, Foundation Models, AI for Science

Stanford, CA · Joined September 2023
68 Following · 23 Followers
Rui Li retweeted
Suning Huang @suning_huang
🤖Low-data post-training can teach a VLA policy a new robot skill. But it also makes it too attached to the training demos. We call this lock-in🔒: the policy can execute the post-training task, yet fails to respond to seemingly obvious prompt changes. DeLock preserves steerability using only the policy’s own pretrained knowledge. No extra supervision needed!🚀🚀🚀 #Robotics #AI #EmbodiedAI #VLA
Rui Li retweeted
Ken Liu @kenziyuliu
Sharing a super simple, user-owned memory module we've been playing around with: nanomem.

The basic idea is to treat memory as a pure intelligence problem: ingestion, structuring, and (selective) retrieval are all just LLM calls & agent loops on an on-device markdown file tree. Each file lists a set of facts w/ metadata (timestamp, confidence, source, etc.); no embeddings/RAG/training of any kind. For example:
- `nanomem add ` starts an agent loop to walk the tree, read relevant files, and edit.
- `nanomem retrieve ` walks the tree and returns a single summary string (possibly assembled from many subtrees) related to the query.

What's nice about this approach is that the memory system is, by construction:
1. partitionable (humans/agents can easily separate `hobbies/snowboard.md` from `tax/residency.md` for data minimization + relevance)
2. portable and user-owned (it's just text files)
3. interpretable (you know exactly what's written and you can manually edit it)
4. forward-compatible (future models can read memory files just the same, and memory quality/speed improves as models get better)
5. modularized (you can optimize ingestion/retrieval/compaction prompts separately)

Privacy & utility. I'm most excited about the ability to partition + selectively disclose memory at inference time. Selective disclosure helps with both privacy (principle of least privilege & "need-to-know") and utility (too much context for a query can harm answer quality).

Composability. An inference-time memory module means: (1) you can run such a module with confidential inference (LLMs in TEEs) for provable privacy, and (2) you can selectively disclose context over unlinkable inference of remote models (demo below).

We built nanomem as part of the Open Anonymity project (openanonymity.ai), but it's meant to be a standalone module for humans and agents (e.g., you can write a SKILL for using the CLI tool). Still polishing the rough edges!
- GitHub (MIT): github.com/OpenAnonymity/…
- Blog: openanonymity.ai/blog/nanomem/
- Beta implementation in chat client soon: chat.openanonymity.ai

Work done with amazing project co-leads @amelia_kuang @cocozxu @erikchi !!
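To make the "just LLM calls and agent loops over files" idea concrete, here is a minimal Python sketch of a retrieve-style loop. This is not the nanomem implementation: `call_llm` is a hypothetical stand-in for any chat-completion client, and the prompts and file layout are illustrative assumptions.

```python
# Minimal sketch of retrieval as an agent loop over a markdown file tree.
# NOT the nanomem implementation: call_llm is a hypothetical stand-in for
# whatever model client you use; prompts/layout are assumptions.
from pathlib import Path

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; swap in any chat-completion client."""
    raise NotImplementedError

def retrieve(memory_root: str, query: str) -> str:
    """Walk the markdown tree, keep files the model deems relevant,
    then summarize them into a single string for the query."""
    root = Path(memory_root)
    relevant = []
    for f in sorted(root.rglob("*.md")):
        text = f.read_text(encoding="utf-8")
        verdict = call_llm(
            f"Query: {query}\nFile {f.relative_to(root)}:\n{text}\n"
            "Answer YES if this file is relevant to the query, else NO."
        )
        if verdict.strip().upper().startswith("YES"):
            relevant.append(f"## {f.relative_to(root)}\n{text}")
    if not relevant:
        return ""
    # Selective disclosure: only the relevant subtrees reach the final prompt.
    return call_llm(
        "Summarize the facts below as they relate to the query.\n"
        f"Query: {query}\n\n" + "\n\n".join(relevant)
    )
```

Note how partitioning falls out for free: passing `memory_root="hobbies/"` instead of the whole tree is exactly the selective-disclosure knob described above.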
Rui Li retweeted
Ken Liu @kenziyuliu
Can we build a blind, *unlinkable inference* layer where ChatGPT/Claude/Gemini can't tell which call came from which user, like a "VPN for AI inference"? Yes! Blog post below, plus we built it into an open-source infra/chat app and have served >15k prompts at Stanford so far. How it helps with AI user privacy:

# The AI user privacy problem

If you ask AI to analyze your ChatGPT history today, it's surprisingly easy to infer your demographics, health, immigration status, and political beliefs. Every prompt we send accumulates into an (identity-linked) profile that the AI lab controls completely and indefinitely. At a minimum this is a goldmine for ads (as we know now). A bigger issue is the concentration of power: AI labs can easily become (or be asked to become) a Cambridge Analytica, report your immigration status, or work with health insurers to adjust your premium if they so choose. This is a uniquely worse problem than search engines because your average query is now more revealing (not just keywords), interactive, and intelligence is now cheap. Despite this, most of us still want these remote models; they're just too good and convenient! (This is aka the "privacy paradox.")

# Unlinkable inference as a user privacy architecture

The idea of unlinkable inference is to add privacy while preserving access to remote models controlled by someone else. A "privacy wrapper" or "VPN for AI inference," so to speak. Concretely, it's a blind inference middle layer that:
(1) consists of decentralized proxies that anyone can operate;
(2) blindly authenticates requests (via blind signatures, RFCs 9474 and 9578) so requests are provably sandboxed from each other and from user identity;
(3) relays prompts over randomly chosen proxies that don't see or log traffic (via client-side ephemeral keys or hosting in TEEs); and
(4) presents the provider with only a mixed pool of anonymous prompts from the proxies.
No state, pseudonyms, or linkable metadata.

If you squint, an unlinkable inference layer is essentially a vendor for per-request, anonymous, ephemeral AI access credentials (for users and agents alike). It partitions your context so that user tracking is drastically harder. Obviously, unlinkability isn't a silver bullet: the prompt itself still goes to the remote model and can leak privacy (so don't use our chat app for a therapy session!). It aims to combat *longitudinal tracking* as a major threat to user privacy, and its statistical power grows quickly as more users and requests are mixed together. Unlinkability can be applied at any granularity: for an AI chat app, you can unlinkably request a fresh ephemeral key for every session, so tracking is virtually impossible.

# The Open Anonymity Project

We started this project with the belief that intelligence should be a truly public utility. Like water and electricity, providers should be compensated by usage, not by who you are or what you do with it. We think unlinkable inference is a first step towards this "intelligence neutrality."

# Try it out! It's quite practical

- Chat app "oa-chat": chat.openanonymity.ai (<20 seconds to get going)
- Blog post that should be a fun read: openanonymity.ai/blog/unlinkabl…
- Project page: openanonymity.ai
- GitHub: github.com/OpenAnonymity
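To make the per-request credential flow concrete, here is a rough Python sketch of the client side under RSA blind signatures (RFC 9474). It illustrates the general pattern, not the Open Anonymity protocol or API: the issuer/proxy objects and function names are assumptions, and a real deployment should use a vetted blind-signature library.

```python
# Conceptual sketch of a per-request unlinkable-credential flow built on
# RSA blind signatures (RFC 9474). Issuer/proxy interfaces below are
# hypothetical; the crypto stubs must come from a vetted library.
import os
import random

def blind(issuer_pubkey, token: bytes):
    """RFC 9474 Blind(): returns (blinded_msg, unblinding_state)."""
    raise NotImplementedError  # use a vetted RSA blind-signature library

def finalize(issuer_pubkey, token: bytes, blind_sig: bytes, state) -> bytes:
    """RFC 9474 Finalize(): unblinds the signature. The issuer can later
    verify it but cannot link it back to the signing request."""
    raise NotImplementedError

def request_inference(prompt: str, issuer, proxies: list) -> str:
    # 1. Authenticate once to the credential issuer, who blind-signs a
    #    fresh random token. The issuer never sees the token itself.
    token = os.urandom(32)
    blinded, state = blind(issuer.pubkey, token)
    blind_sig = issuer.sign_blinded(blinded)       # identity-linked step
    credential = finalize(issuer.pubkey, token, blind_sig, state)

    # 2. Relay the prompt through a randomly chosen proxy with the one-time
    #    credential. The proxy checks the signature but learns nothing about
    #    who obtained it; the provider sees only a mixed anonymous pool.
    proxy = random.choice(proxies)
    return proxy.relay(prompt, token, credential)  # fresh token per request
```

The key property is the split between step 1 (identity-linked, but blind) and step 2 (content-bearing, but anonymous): no single party ever holds both the identity and the prompt.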
Rui Li @rui__li
Super excited to finally share FineVision: 24M open-source multimodal samples built from 200+ datasets! Open data can scale, and it's all reproducible and available to everyone 🚀 Thrilled to be part of this project, and huge thanks to the amazing team! 🙌
Andi Marafioti @andimarafioti

🚨 New paper out! “FineVision: Open Data Is All You Need” 🥳 We unified 200+ data sources into 24M samples. That’s 17.3M images and 9.5B answer tokens, the largest open VLM dataset ever released. All fully documented, reproducible, and available for everyone. And there's more! 🎢
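For anyone who wants to poke at the data, a minimal sketch of streaming a few samples with the Hugging Face `datasets` library. The repo id (and whether a per-source config name is required) is an assumption here; check the official dataset card for exact names.

```python
# Minimal sketch: stream a few FineVision samples without downloading all
# 17.3M images up front. The repo id below is an assumption; check the
# official release card for the exact dataset and config names.
from datasets import load_dataset

ds = load_dataset("HuggingFaceM4/FineVision", split="train", streaming=True)

for sample in ds.take(3):
    # Field names are assumptions: typically image(s) plus question/answer
    # turns and source metadata for a VLM training sample.
    print(sample.keys())
```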

Rui Li retweeted
Xiaohan Wang @XiaohanWang96
🚀 Introducing Temporal Preference Optimization (TPO), a video-centric post-training framework that enhances temporal grounding in long-form videos for Video-LMMs! 🎥✨

🔍 Key Highlights:
✅ Self-improvement via preference learning: models learn to differentiate well-grounded from inaccurate responses without manual annotations.
✅ Multi-level temporal grounding: effectively captures both localized segments and comprehensive video sequences.
✅ Efficient and scalable: utilizes only 10k video QA pairs for post-training and can scale seamlessly to larger datasets.
✅ Proven performance: TPO enhances two state-of-the-art Video-LMMs on LongVideoBench, MLVU, and Video-MME.

🔥 LLaVA-Video-TPO is now the top-performing 7B model on Video-MME, highlighting TPO's potential in advancing temporal reasoning.

This work was co-led with the talented undergraduate @rui__li, alongside fantastic collaborators @Zhang_Yu_hui and Zeyu Wang, and advised by @yeung_levy!
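The tweet frames TPO as preference learning over well-grounded vs. inaccurate responses. As a rough sketch of the standard DPO-style objective such post-training builds on (not TPO's exact loss; see the paper for that), one might write:

```python
# Generic DPO-style preference loss, as a sketch of the kind of objective
# preference post-training frameworks like TPO build on. NOT TPO's exact
# formulation; shapes and names are illustrative assumptions.
import torch
import torch.nn.functional as F

def preference_loss(policy_logp_w, policy_logp_l,
                    ref_logp_w, ref_logp_l, beta: float = 0.1):
    """Each argument: summed log-probs of the preferred (w = well-grounded)
    and dispreferred (l = inaccurate) responses under the policy being
    trained and under a frozen reference model."""
    # Implied reward margin of the policy, relative to the reference model.
    margin = (policy_logp_w - ref_logp_w) - (policy_logp_l - ref_logp_l)
    # Maximize the probability that the well-grounded response is preferred.
    return -F.logsigmoid(beta * margin).mean()

# Usage with dummy values for a batch of 4 preference pairs:
policy_w = torch.randn(4, requires_grad=True)
policy_l = torch.randn(4, requires_grad=True)
loss = preference_loss(policy_w, policy_l, torch.randn(4), torch.randn(4))
loss.backward()
```

The self-improvement angle in the tweet corresponds to how the (w, l) pairs are constructed, from the model's own outputs rather than manual annotations.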