pat ✈️ CVPR

418 posts

@patrickamadeus_

multimodal, grounding, and embodied ai | PhD @mbzuai

Joined November 2020
708 Following · 375 Followers
Pinned Tweet
pat ✈️ CVPR @patrickamadeus_
Personal update: I am starting my PhD @mbzuai, where I look forward to working in the multimodal realm (interpretability, modality imbalance, eval & application) to address foundational gaps with @AlhamFikri and co.
pat ✈️ CVPR retweeted
Vals AI @ValsAI
Pitch us a benchmark or eval technique. We'll fund you to build it.

We're opening applications for the Vals Fellowship. 3–6 months working on the hardest open problems in AI evaluation, with the resources to actually solve them.

What you get:
- Unlimited API credits + budget capacity for GPUs and human data
- Vals’ evaluation infrastructure
- $1,000–2,500 / week stipend
- A network of evals researchers across frontier labs and academia

Location: both remote and in-person (SF) applications will be considered.
pat ✈️ CVPR retweeted
Yuki Kondo @y_kondo_vision
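Fourth-year undergraduate (B4) students from our lab, 前田 and 松ケ谷 among others, have released "GEM-4" at the #Kaggle Gemma 4 Good Hackathon! They connected Gemma to a robot arm and, in a short time, implemented an embodied-AI demo that physically assists users according to their intentions. It's truly impressive that B4 students got this far in such a short time. youtu.be/OhaIA3bYwmg?si…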
pat ✈️ CVPR @patrickamadeus_
This slaps me and couldn't be truer 😭
Garry Tan @garrytan

Bob McGrew has a framework I keep thinking about: in the AI future there are only two jobs. The Lone Genius and the Manager. That's it. Everything else gets absorbed.

The Lone Genius is the person sitting alone at a computer, amplified 1000x by AI. One person with taste, vision, and relentless focus who can now do what used to take a team of 50.

The Manager is the person who becomes CEO of their own "firm" where most of the employees are AI agents. They define the goals. They decide what matters. They coordinate. The AI does the execution.

The Marxists will hear "two jobs" and panic. "What about everyone else?!" But here's what they're missing: AI doesn't shrink these two categories. It explodes them open. More people get to be geniuses. More people get to be managers. The barrier to entry for both just collapsed.

What actually gets eliminated? David Graeber called them "bullshit jobs."

Graeber was no libertarian! He inspired Occupy Wall Street. His words: "Huge swaths of people spend their entire working lives performing tasks they secretly believe don't really need to be performed. The moral and spiritual damage that comes from this situation is profound. It is a scar across our collective soul."

Graeber said bullshit jobs are "a form of spiritual violence directed at the essence of what it means to be a human being." They induce "hopelessness, depression, and self-loathing."

This is who the left should be fighting for. Not to preserve those jobs. To liberate people from them and give them better ones.

The dirty secret of the modern economy: millions of people sit in roles so pointless that even they can't justify their existence. Compliance layers. Reporting layers. Coordination layers. Meeting-about-the-meeting layers. They know it's meaningless. It eats them alive.

AI eats those layers. Good. That's a jailbreak.

What I love about Bob's framework is where it points.

The Lone Genius used to require a PhD, a lab, institutional backing. Now a 19-year-old with taste and Codex can ship what took a research team a year. The genius bottleneck was never talent. It was access.

The Manager used to mean you needed to hire 50 people, raise money, build an org chart. Now you can orchestrate a fleet of AI agents from your laptop. The management bottleneck was never skill. It was capital.

AI doesn't concentrate genius and management into fewer hands. It distributes them into more hands. The working class kid in West Virginia. The single mom in Ohio. The 55-year-old who got laid off and now builds software for the first time. Those are some of Bob's future geniuses and managers.

The best founders I see at YC are already living this. They toggle between both modes in the same day. Morning: lone genius, creative insight, the thing nobody else sees. Afternoon: manager, spinning up agents, steering, shipping. The cycle time between genius and manager IS the new productivity metric.

So when someone tells you AI means "only two jobs and everyone else starves," quote Graeber to them, they’ll get it. Graeber knew the real violence was making people do meaningless work and pretending it was dignity.

AI ends that. More genius. More agency. Fewer spiritual prisons.

pat ✈️ CVPR retweeted
Replica @John82924749
🚨🚨Shunyu Yao is currently a Senior Staff Research Scientist at DeepMind. This interview was recorded around May 10 and runs nearly four hours in full. I selected the parts that I personally found most interesting, covering the following topics:
pat ✈️ CVPR @patrickamadeus_
Hierarchical things have really piqued my interest lately, but the applications are mostly at the applied level (agentic, harness, etc.), and even memory architecture modifications still feel incremental. Great to see that things like these are being actively explored at the foundational level.
Clarisse Wibault @ClarisseWibault

CV has CNNs, NLP has transformers - what inductive bias does RL have? How can policies generalise to regions of the dataset suffering from poor transitions? We motivate hierarchy by enabling distinct state-representations at different levels of the hierarchy @FLAIR_Ox @j_foerst

pat ✈️ CVPR retweeted
SkalskiP @skalskip92
CVPR is 2 weeks away

putting together a list of must-see papers with links to code, demos, and posters; all in one place

link: github.com/SkalskiP/top-c…
Dorsa @dorsa_rohani
This paper might be the bible of distributed inference atp
pat ✈️ CVPR retweeted
Jiafei Duan @DJiafei
A week since MolmoAct2 (@allen_ai) launched, and the community response has been incredible — deployments, builds, and out-of-the-box rollouts all over the place. 🤖 We've now released the fine-tuning code, with @LeRobotHF support coming soon. github.com/allenai/molmoa… Below: a 2x rollout fine-tuned on just 50 demos 👇
pat ✈️ CVPR @patrickamadeus_
@Lianhuiq So cool! I wonder whether it can also generate uncertainty inside the environment (e.g., a sudden car crash, or anything wild :/ )?
Lianhui Qin @Lianhuiq
Scaling embodied AI starts with automating the environments. Introducing SimWorld Studio: a self-evolving factory for endless interactive 3D environments where agents act, fail, and learn. With coding-agent + embodied-agent co-evolution, navigation success improves from 50% → 90%. 1/
Institute of Foundation Models
Zihan Liu, Technical Lead for the PAN world model at IFM’s Silicon Valley Lab, is joining us at @Stanford on May 21 to share how he and his team build interactive, long-horizon world simulations.
pat ✈️ CVPR retweeted
Chubby♨️ @kimmonismus
We are still so early. And sometimes we forget that we live in an AI ivory tower. The majority don't use AI as intensively as we do. (Source: Reddit, h/t r/Terrible-Priority-21)
Gabriele Berton @gabriberton
Are you throwing an event at CVPR this year? If you want it to be special, throw it on June 2, 3 or 7. There are countless events on the 4th, 5th, and 6th, but the 2nd, 3rd, and 7th are quite empty. And researchers are in Denver already!
pat ✈️ CVPR retweeted
Google DeepMind @GoogleDeepMind
We’re dropping Gemini Omni: our first step towards a model that can create anything from anything - starting with video. It combines Gemini’s intelligence with our generative media systems - representing a leap forward in world understanding, multimodality, and editing 🧵
Gabriele Berton @gabriberton
Cool paper from Meta suggesting that future MLLMs will be Native Multimodal Models (NMM), hence no vision encoders anymore

But I disagree

I actually think we'll go in the other direction (what? more encoders? yes! read on...)

All you need to know about the future of MLLMs 🧵
Weiming Ren @wmren993

1/ 🚀 We’re excited to share Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation!

Tuna-2 is a native unified multimodal model that supports visual understanding, text-to-image generation, and image editing directly from pixel embeddings. 🐟✨

📄 Paper: arxiv.org/abs/2604.24763
🌐 Project: tuna-ai.org/tuna-2
💻 Code: github.com/facebookresear…

Most unified multimodal models still rely on pretrained vision encoders, which add architectural complexity and can create representation mismatches between understanding and generation.

Tuna-2 asks a simple question: Do we still need vision encoders? 👀

Our answer is No! Tuna-2 has a completely encoder-free architecture, where images are processed directly by a unified transformer together with text tokens.

Take a glimpse at what our model can generate ↓ 🎨🖼️

pat ✈️ CVPR @patrickamadeus_
Any playlist / song recommendations for melancholic banger songs? Stuff like Stressed Out by Twenty One Pilots, High Hopes by Panic! at the Disco, etc.