pat

299 posts

@patrickamadeus_

PhD stud @mbzuai | Past: @sgsmu @ucdavis @itbofficial | multimoding in NLP and life

Joined November 2020
568 Following · 314 Followers
Pinned Tweet
pat@patrickamadeus_·
2026 goal: do whatever sorcery it takes to relive these moments✌️
[image]
3 replies · 0 reposts · 9 likes · 961 views
pat retweeted
Shenzhi Wang🌟@ShenzhiWang_THU·
When training Qwen3.5, we kept asking ourselves: 🧐 What kind of multimodal RLVR data actually leads to generalizable gains? 💡 We believe the answer may not lie only in data tightly tailored to specific benchmarks, but also in OOD proxy tasks that train the foundational abilities behind long-chain visual reasoning.

The motivation is simple: VLMs are still unreliable in long-CoT settings. Small mistakes in perception, reasoning, knowledge use, or grounding can compound across intermediate steps and eventually lead to much larger final errors. However, much of today's RLVR data still does not require complex reasoning chains grounded in visual evidence throughout, meaning these failure modes are often not sufficiently stressed during training.

🚀 Excited to share our new work from Qwen and Tsinghua LeapLab: HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning. This is also one of the training task sources used in Qwen3.5 VL RLVR.

To study this question, we propose HopChain, a scalable framework for synthesizing multi-hop vision-language reasoning data for RLVR training. The key idea is to build each query as a chain of logically dependent hops: earlier hops establish the instances, sets, or conditions needed for later hops, while the model must repeatedly return to the image for fresh visual grounding along the way. At the same time, each query ends with a specific, unambiguous numerical answer, making it naturally suitable for verifiable rewards.

Concretely, HopChain combines two complementary structures: perception-level hops and instance-chain hops. We require each synthesized example to involve both, so the model cannot simply continue reasoning from language inertia. Instead, it is forced to keep grounding intermediate steps in the image, maintain cross-step dependencies, and control error accumulation across long reasoning trajectories.

Our goal is not to mimic any specific downstream benchmark, but to strengthen the more fundamental abilities that long-CoT vision-language reasoning depends on. We add HopChain-synthesized data into RLVR training for Qwen3.5-35B-A3B and Qwen3.5-397B-A17B, and evaluate on 24 benchmarks spanning diverse domains. Despite not being designed for any particular benchmark, HopChain improves 20 out of 24 benchmarks on both models, indicating broad and generalizable gains. We also find that fully chained multi-hop queries are crucial: replacing them with half-multi-hop or single-hop variants reduces performance substantially. Most notably, the gains are especially strong on long-CoT and ultra-long-CoT vision-language reasoning, peaking at more than 50 accuracy points in the ultra-long-CoT regime.

Our main takeaway is simple: beyond benchmark-aligned data, OOD proxy tasks that systematically train the core mechanics of long-chain visual reasoning can be a powerful and scalable source of RLVR supervision for VLMs, and can lead to more generalizable improvements.

🔗 huggingface.co/papers/2603.17…
[3 images]
2 replies · 50 reposts · 415 likes · 50.8K views
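The chained-hop construction described in the tweet above can be sketched in miniature. This is purely illustrative: the scene schema, the `Hop` class, and `run_chain` are hypothetical names invented here, not HopChain's actual interface or data format.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical scene annotation standing in for what perception would extract
# from an image (fields are illustrative, not HopChain's actual schema).
scene = {
    "objects": [
        {"type": "cup", "color": "red", "on": "table"},
        {"type": "cup", "color": "blue", "on": "table"},
        {"type": "cup", "color": "red", "on": "shelf"},
        {"type": "book", "color": "red", "on": "table"},
    ]
}

@dataclass
class Hop:
    """One step of the chain; each hop consumes the previous hop's output,
    so later hops logically depend on earlier ones."""
    question: str
    fn: Callable  # maps previous result -> new result, grounded in the scene

def run_chain(hops: List[Hop], state=None):
    for hop in hops:
        state = hop.fn(state)
    return state

# Hop 1 (perception-level): find all red objects in the scene.
# Hop 2 (instance-chain): of those instances, keep the ones on the table.
# Hop 3: count them, yielding a single unambiguous number.
hops = [
    Hop("Which objects are red?",
        lambda _: [o for o in scene["objects"] if o["color"] == "red"]),
    Hop("Of those, which are on the table?",
        lambda objs: [o for o in objs if o["on"] == "table"]),
    Hop("How many are there?", lambda objs: len(objs)),
]

gold = run_chain(hops)  # numeric ground truth for the verifiable reward

def reward(model_answer: int) -> float:
    """Binary verifiable reward, in the spirit of RLVR."""
    return 1.0 if model_answer == gold else 0.0
```

Because every hop's output feeds the next and the final answer is a single number, a mistake at any intermediate step changes the final answer, which is exactly the error-accumulation pressure the tweet describes.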
pat retweeted
Nunchi@nunchi·
Introducing Auto-Research Trading: a fully autonomous research loop that lets your AI trading agent teach itself to trade. Inspired by Karpathy's auto-research. Others use it to evolve how AI agents work together; we built it for trading. Now open-sourced on our GitHub.
81 replies · 131 reposts · 1.6K likes · 148.6K views
pat retweeted
OpenAI Newsroom@OpenAINewsroom·
We've reached an agreement to acquire Astral. After we close, OpenAI plans for @astral_sh to join our Codex team, with a continued focus on building great tools and advancing the shared mission of making developers more productive. openai.com/index/openai-t…
486 replies · 828 reposts · 7.3K likes · 4M views
pat retweeted
Sam Altman@sama·
I have so much gratitude to people who wrote extremely complex software character-by-character. It already feels difficult to remember how much effort it really took. Thank you for getting us to this point.
4.5K replies · 2.2K reposts · 35.9K likes · 5.5M views
pat retweeted
DailyPapers@HuggingPapers·
Visual-ERM: an 8B multimodal reward model that judges visual equivalence in the rendered space for charts, tables & SVGs. It beats 235B models on fine-grained discrepancy detection and improves vision-to-code RL by +8.4 points.
[image]
3 replies · 5 reposts · 35 likes · 2.2K views
pat retweeted
Alham Fikri Aji@AlhamFikri·
VLMs can easily get distracted by unrelated cultural cues. Happy to present our work on this soon at #CVPR2026🥳 Working on multilingual VLMs? Consider using our benchmark: 📜arxiv.org/pdf/2511.17004 🤗huggingface.co/datasets/patri… Amazing work by @patrickamadeus_ and colleagues!
[image]
pat@patrickamadeus_

Excited to share that we have committed our paper “Vision-Language Models are Confused Tourists” to #CVPR2026 (Findings)! 🇺🇸🏔

Arxiv: arxiv.org/abs/2511.17004

We question whether current SOTA VLMs remain robust in simple cultural grounding QA when distracting contextual objects are present. For example, if you eat chicken schnitzel with Mt. Fuji in the background, will the model fail to recognize it as Japanese katsu?

ConfusedTourists introduces:
👉 5k+ evaluation samples across 3 cultural item categories, comprising 243 unique cultural items from 57 countries and 11 sub-regions 🌍
👉 Evaluation of 14 VLMs across 12 data features 🤖
👉 Findings showing that simple concept mixing can cause up to a -40% drop in performance 📉

Special thanks to my co-authors @IkhlasulHanif0, @emthehunt, @gentaiscool, @FajriKoto, and my advisor @AlhamFikri for the valuable contributions along the way! #multimodal #vlm #multicultural #robustness #evaluation #NLProc #ComputerVision

2 replies · 18 reposts · 72 likes · 7.2K views
pat@patrickamadeus_·
[original post of the “Vision-Language Models are Confused Tourists” announcement quoted above]
[3 images]
5 replies · 14 reposts · 42 likes · 9.7K views
pat@patrickamadeus_·
@zmkzmkz Thank you!
0 replies · 0 reposts · 1 like · 85 views
pat retweeted
Hanif | AI NOT FOR PRODUCTIVITY@IkhlasulHanif0·
Check out our work! (CVPR Findings 2026)
pat@patrickamadeus_

[quoted: the “Vision-Language Models are Confused Tourists” announcement above]

0 replies · 1 repost · 6 likes · 229 views
pat@patrickamadeus_·
@erla_ndpg Yes! We found that the misattention pattern is very consistent with the wrongly guessed answer 🤔 Please take a look at our paper hook + more detailed threads here! x.com/i/status/20034…
[image]
pat@patrickamadeus_

Craving holiday-themed paper? Say less🎄 Turns out, Vision Language Models are Confused Tourists ✈️😵‍💫 We show that adversarially induced cultural scenes significantly impair VLM cultural comprehension and trigger potential bias #NLProc #multimodal #robustness /thread 🧵(1/8)

1 reply · 1 repost · 5 likes · 108 views
Edd@erla_ndpg·
Very interesting probes. What does the attention map look like when the model is given such probes?
pat@patrickamadeus_

[quoted: the “Vision-Language Models are Confused Tourists” announcement above]

1 reply · 0 reposts · 7 likes · 199 views
pat@patrickamadeus_·
@AlhamFikri Thank you master! 🫡
0 replies · 0 reposts · 2 likes · 148 views
pat@patrickamadeus_·
ch1: progress can take 2 forms, (horizontal / extensive / globalization) or (vertical / intensive / technology). Without new technology, globalization will just speed up catastrophe.

Startup thinking:
- question received ideas and rethink the venture from scratch
- big orgs move slowly; conversely, a lone genius can invent greatness but can't change a whole industry
- so work with other people to get stuff done, but stay small enough that you actually can
0 replies · 0 reposts · 0 likes · 29 views
pat@patrickamadeus_·
@naval zero to one - peter thiel (w/ blake masters)
1 reply · 0 reposts · 0 likes · 31 views
pat@patrickamadeus_·
@naval Happiness is emptiness of desire
0 replies · 0 reposts · 0 likes · 30 views
pat@patrickamadeus_·
@naval the almanack of naval ravikant - eric jorgenson
4 replies · 0 reposts · 0 likes · 86 views