Alan Melling

7.4K posts

@alanmelling

junction of Computer Vision and Graphics, Principal R&D Engineer at Carvana, Co-Creator at Nature Time

ATL · Joined December 2010
2.2K Following · 532 Followers
Alan Melling reposted
Tanmay Gupta @tanmay2099
Had the surreal experience of telling a room full of computer vision researchers at the ICCV25 AC workshop why "computer vision researcher" won't be a thing in 5 years 🌶️ Of course, this was an extreme stance to keep things lively in a fun debate setting, but it echoed some of my own internal monologue over the past few years as someone who has identified as a computer vision researcher for the last decade. The argument went as follows:
⚡️ A research community needs a set of core problems and methods that are specific to that community.
⚡️ The vision community had these 10-15 years ago, but today's general-purpose multimodal architectures assume very little about the input/output modality and are likely to subsume more tasks and modalities over time.
⚡️ Time and again we have had to swallow the bitter pill: methods that bake human intuition into learning algorithms might show gains in the short term but are eventually surpassed by more general methods that use more data and compute (LLMs, VLMs, Sora, Genie, etc.).
⚡️ Gains in vision systems over the last many years have come not from anything specific to vision or images but from general advances in deep learning: optimizers, normalization layers, attention, residual connections, quantization, parallelization methods, larger models, etc. Computer vision ends at tokenization, and then deep learning and distributed systems engineering take over.
⚡️ So not only would "computer vision researcher" become obsolete, we must actively fight the urge to play one, to keep our biases from creeping into our AI systems.
⚡️ In short, there is nothing uniquely "vision" in today's computer vision research, and there is too much overlap with other specialized communities like robotics, graphics, etc.
Thanks again to the @ICCVConference PCs for hosting the debate and to @anikembhavi for inviting me to participate!
It was incredibly awesome for everyone at the AC workshop to take this discussion in a fun spirit 🙌 Arguing for the motion with me were @sarameghanbeery and @RoozbehMottaghi. In our opposition were @HildeKuehne, @aagrawalAA, and @bluevincent. If you are a vision researcher, share your thoughts, whether you agree or not!
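The "computer vision ends at tokenization" point above can be made concrete with a minimal, hypothetical ViT-style patchify sketch (pure Python, toy shapes, not from the talk): the only image-specific step is cutting pixels into flattened patch tokens; everything downstream is generic sequence modeling.

```python
# Minimal sketch of ViT-style image tokenization (pure Python, illustrative
# shapes only): once the image is cut into flattened patch vectors, the
# resulting token sequence is modality-agnostic.

def patchify(image, patch):
    """Split an H x W x C image (nested lists) into flattened patch tokens."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0
    tokens = []
    for py in range(0, h, patch):
        for px in range(0, w, patch):
            flat = []
            for dy in range(patch):
                for dx in range(patch):
                    flat.extend(image[py + dy][px + dx])
            tokens.append(flat)
    return tokens  # (h//patch * w//patch) tokens, each patch*patch*C long

# Toy 4x4 RGB image -> 2x2 grid of 2x2 patches = 4 tokens of length 12.
img = [[[y, x, 0] for x in range(4)] for y in range(4)]
tokens = patchify(img, patch=2)
print(len(tokens), len(tokens[0]))  # 4 tokens, each 2*2*3 = 12 values
```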
Alan Melling reposted
Moritz Reuss @moritz_reuss
VLAs have become the fastest-growing subfield in robot learning. So where are we now? After reviewing ICLR 2026 submissions and conversations at CoRL, I wrote an overview of the current state of VLA research with some personal takes: is.gd/1pqw9w
Alan Melling reposted
Chris Offner @chrisoffner3d
Is the reign of terror of redundant scene representations ending? Where VGGT, CUT3R, and other recent models relied on godless redundant outputs (depth+points+pose) without guaranteeing internal prediction consistency, MapAnything and DepthAnything 3 are now heroically pushing back.
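The consistency concern is mechanical: under a pinhole model, predicted depth plus predicted pose already determine the world points, so an independently predicted point map can silently disagree. A toy sketch of the check (pure Python, assumed intrinsics, not from any of the cited models):

```python
# Toy check of the depth/points/pose redundancy: under a pinhole model,
# world_point = R @ unproject(pixel, depth) + t, so separately predicted
# depth maps and point maps can disagree unless tied together.

def unproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) at the given depth into camera coordinates."""
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return (x, y, depth)

def to_world(p_cam, rotation, translation):
    """Apply a camera-to-world pose (row-major 3x3 R, 3-vector t)."""
    return tuple(
        sum(rotation[i][j] * p_cam[j] for j in range(3)) + translation[i]
        for i in range(3)
    )

# Identity pose and simple intrinsics: consistency should hold exactly.
R = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
t = (0.0, 0.0, 0.0)
p_cam = unproject(u=320, v=240, depth=2.0, fx=500, fy=500, cx=320, cy=240)
p_world = to_world(p_cam, R, t)
print(p_world)  # (0.0, 0.0, 2.0): a predicted point map must match this
```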
Alan Melling reposted
Stephen James @stepjamUK
DLR researchers gave a robotic arm full-body touch sensitivity with no artificial skin needed. They used internal force-torque sensors at 8 kHz + deep learning. The robot can feel where you touch it, recognize letters drawn on its surface, and respond to virtual buttons placed anywhere on its body.
What's interesting is the infrastructure behind it. To train these models, you need high-frequency sensor streams, manifold learning to unfold trajectories, and the ability to iterate fast. They collected 2,300 samples from 20 people and hit 95.5% accuracy on digit recognition. This is what's possible when you have the right data infrastructure.
📄 lnkd.in/exgWfeXf
Video credit: @DLR_en
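As a rough illustration of that kind of pipeline (entirely synthetic data, and a deliberately simple nearest-centroid rule standing in for the DLR team's learned models): window the force-torque stream, extract features, classify the touch.

```python
# Hypothetical sketch of the pipeline described above: window a 6-channel
# force-torque stream, extract simple features, and classify touch location
# with a nearest-centroid rule. Synthetic data; the real system trains deep
# models on 8 kHz sensor streams.
import random

def features(window):
    """Mean of each of the 6 force-torque channels over one window."""
    n = len(window)
    return [sum(sample[ch] for sample in window) / n for ch in range(6)]

def nearest_centroid(feat, centroids):
    """Return the label whose centroid is closest in squared distance."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist(feat, centroids[label]))

random.seed(0)
def stream(bias):
    """Synthetic touch: 80 noisy 6-channel samples around a force signature."""
    return [[bias + random.gauss(0, 0.1) for _ in range(6)] for _ in range(80)]

centroids = {"upper_arm": features(stream(1.0)), "wrist": features(stream(-1.0))}
probe = features(stream(1.0))
print(nearest_centroid(probe, centroids))  # upper_arm
```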
Alan Melling reposted
Interintellect 🧭 @interintellect_
Can mathematical models of history help us escape the worst outcomes? @Peter_Turchin, the pioneering complexity scientist, and Interintellect founder @TheAnnaGat explore his groundbreaking work on cliodynamics, recurring cycles of societal collapse, and America's current position on the brink of upheaval. Watch the full, thought-provoking conversation: youtu.be/52hM9vbi-ck
Alan Melling reposted
Thinking Machines @thinkymachines
LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely, more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA. thinkingmachines.ai/blog/lora/
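For context, the LoRA idea being compared to full fine-tuning fits in a few lines (toy pure-Python matrices; real implementations apply this per attention/MLP projection):

```python
# Minimal LoRA sketch (pure Python, illustrative): instead of updating a
# d x d weight W, train a low-rank pair B (d x r) and A (r x d) and use
# W_eff = W + (alpha / r) * B @ A. Trainables drop from d*d to 2*d*r.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def lora_effective_weight(W, A, B, alpha):
    r = len(A)  # rank = number of rows of A
    delta = matmul(B, A)
    return [[W[i][j] + (alpha / r) * delta[i][j]
             for j in range(len(W[0]))] for i in range(len(W))]

d = 4
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen
A = [[1.0, 0.0, 0.0, 0.0]]          # r x d, trainable (rank r = 1)
B = [[0.0], [2.0], [0.0], [0.0]]    # d x r, trainable
W_eff = lora_effective_weight(W, A, B, alpha=1.0)
print(W_eff[1][0])  # 2.0: the rank-1 update added B[1][0] * A[0][0] at (1, 0)
```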
Alan Melling reposted
Zhiqiu Lin @ZhiqiuLin
🎉 CameraBench has been accepted as a Spotlight (3%) @ NeurIPS 2025. Huge congrats to all collaborators at CMU, MIT-IBM, UMass, Harvard, and Adobe. CameraBench is a large-scale effort that pushes video-language models to reason about the language of camera motion just like professional cinematographers.
🌍 Our open-source dataset, models, and code are also gaining strong interest and adoption from frontier labs such as DeepMind and Kling to advance video generation research.
📄 Paper: arxiv.org/abs/2504.15376
🌐 Website: linzhiqiu.github.io/papers/camerab…
Zhiqiu Lin @ZhiqiuLin

📷 Can AI understand camera motion like a cinematographer? Meet CameraBench: a large-scale, expert-annotated dataset for understanding camera motion geometry (e.g., trajectories) and semantics (e.g., scene contexts) in any video: films, games, drone shots, vlogs, etc. Links below!
We contribute a taxonomy of motion primitives, co-designed over months with professional cinematographers, and apply rigorous quality control to label and caption all aspects of camera motion. CameraBench shows that even the best SfMs and VLMs struggle with real-world, dynamic videos.
Yet a generative VLM post-trained on our high-quality data matches SOTA SfM (MegaSAM) in geometric understanding and outperforms SOTA VLMs (Gemini-2.5 / GPT-4o) in semantic understanding, e.g., describing how the camera moves.
📄 Paper: huggingface.co/papers/2504.15…
🌐 Website: linzhiqiu.github.io/papers/camerab…
Work led by CMU, MIT-IBM, UMass, Adobe, Harvard, Emerson with @censiyuan1, @chancharikm, @JayKarhade, @du_yilun, @gan_chuang, and @RamananDeva.

Dragoneyes Hatesg00gle @Dragoneyes_001
@MatthewBerman @jonah_lipsitt And yet windmills are still a worthless way of creating energy. Just the energy footprint consumed to create and install them exceeds the total energy they produce before the leading edges of the blades start breaking down and need servicing!
Alan Melling reposted
himanshu @himanshustwts
how crazy that they have put 15TB worth of physics simulation datasets on the internet
Alan Melling reposted
JingyuanLiu @JingyuanLiu123
I was lucky to work in both China and US LLM labs, and I've been thinking about this for a while. The current values of pretraining are indeed different.
US labs be like:
- lots of GPUs and much larger flops runs
- treat stability more seriously and cannot tolerate spikes in large flops runs, and thus invented so many stability-related tricks, including all kinds of soft-cap, MuP, and spectral norm control tricks
- treat predictability more seriously; check the GPT-4 report for reference, which even tries to predict eval task performance
- because of the stability and predictability demands, treat hyper-params and optimization more seriously
- generally believe more in data and optimization than arch
China labs be like:
- have very limited GPUs, e.g. K2 on 4k GPUs and V3 on 2k GPUs
- as a result, push the limit of pretrain modeling-infra co-design; see so many tricks in V3, and K2 has some cool stuff too (the offload trick helps remove the stupid MoE gating constraint and uses only EP 16)
- care about model arch/token efficiency over optimization and stability
- care more about data quality than data quantity
- take inference into consideration from day 0, even before training starts
In general, China labs are trying to use <4e+24 flops models to catch up with >1e+25 flops models. It is hard or impossible, but they are making good progress. I am actually very happy to see Qwen's new try on model archs; they used to focus more on the data side than on the model arch side. They developed linear attn not just so people would think they are innovating; it is actually about pushing the limit for test-time scaling. Llama4 failed for many reasons, but Qwen-Next is different. They used very limited flops, and it is a brave try for good reasons.
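One of the stability tricks mentioned, logit soft-capping, is small enough to sketch (a common formulation; the exact cap value varies by lab):

```python
# Sketch of logit soft-capping, one of the stability tricks mentioned above:
# squash values smoothly into (-cap, cap) via cap * tanh(x / cap), so a rare
# huge logit cannot blow up the loss while small logits pass through almost
# unchanged.
import math

def soft_cap(x, cap=30.0):
    return cap * math.tanh(x / cap)

print(round(soft_cap(1.0), 4))     # 0.9996: near-identity for small logits
print(round(soft_cap(1000.0), 1))  # 30.0: large spikes are bounded
```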
Charuru @CharuruCha14310

@teortaxesTex @JingyuanLiu123 I bet OpenAI/xAI is laughing so hard, this result is obvious tbh, they took a permanent architectural debuff in order to save on compute costs.

Alan Melling reposted
Emma P @emmaconcepts
@fchollet @generativist You say this but a lot of early to mid 20th century physics papers were like "notes about what kind of things I'm thinking about the stuff we are all thinking about" 😛
Alan Melling reposted
Jonathon Luiten @JonathonLuiten
Introducing: Hyperscape Capture 📷 Last year we showed the world's highest quality Gaussian Splatting, and the first time GS was viewable in VR. Now, capture your own Hyperscapes, directly from your Quest headset in only 5 minutes of walking around. meta.com/experiences/87…
Jonathon Luiten @JonathonLuiten

Hyperscape: The future of VR and the Metaverse Excited that Zuckerberg @finkd announced what I have been working on at Connect. Hyperscape enables people to create high fidelity replicas of physical spaces, and embody them in VR. Check out the demo app: meta.com/experiences/79…

Alan Melling reposted
PlayCanvas @playcanvas
Goodbye SOGS, hello SOG! 👋 PlayCanvas open sources Spatially Ordered Gaussians - a new super-compressed format for 3D Gaussian Splatting. blog.playcanvas.com/playcanvas-ope…
Alan Melling reposted
Jia-Bin Huang @jbhuang0604
How AI Taught Itself to See Self-supervised learning is fascinating! How can AI learn from images only without labels? In this video, we’ll build the method from first principles and uncover the key ideas behind CLIP, MAE, SimCLR, and DINO (v1–v3). Video 👇
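The shared idea behind contrastive methods like SimCLR and CLIP can be sketched as an InfoNCE loss over toy embeddings (illustrative vectors, not from the video): pull two views of the same image together, push views of other images apart.

```python
# Minimal sketch of the contrastive idea behind SimCLR/CLIP: two augmented
# views of the same image should embed close together, views of other images
# far apart. InfoNCE loss for one anchor, pure Python, toy 2-D embeddings.
import math

def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def info_nce(anchor, positive, negatives, temp=0.5):
    """Cross-entropy of picking the positive among all candidates."""
    logits = [cos(anchor, positive) / temp] + [cos(anchor, n) / temp for n in negatives]
    m = max(logits)  # subtract max for numerical stability
    denom = sum(math.exp(l - m) for l in logits)
    return -(logits[0] - m - math.log(denom))

anchor    = [1.0, 0.0]                  # view 1 of an image
positive  = [0.9, 0.1]                  # view 2 of the same image
negatives = [[0.0, 1.0], [-1.0, 0.2]]   # views of other images
loss = info_nce(anchor, positive, negatives)
print(round(loss, 3))  # small loss: the positive is already the closest
```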
Alan Melling reposted
nikita diakur @nikitadiakur
beach 💔 dolly in
Alan Melling reposted
Jasper @zjasper
AI is great at hitting explicit goals, but often at the cost of the hidden ones. Terence Tao just wrote about this. He points out: AI is the ultimate executor of Goodhart's law, i.e. when a measure becomes the target, it stops measuring what we care about.
Take a call center. Management sets a KPI: "shorten average call time." Sounds reasonable: shorter calls should mean faster resolutions, happier customers. At first, it works. Agents become more efficient. But soon, people start gaming it: nudging customers to hang up when the problem is tricky, or just dropping the call themselves. The numbers look amazing. Call times plummet. But customer satisfaction? Straight into the ground.
Now replace "call time" with "prove theorem X." If human mathematicians did it, they'd refine definitions, polish lemmas, contribute back to Mathlib, train juniors, deepen the understanding of math structures, and strengthen the community. The AI, by contrast, optimizes only for the explicit goal. It might generate a 10,000-line proof in hours. Perfectly correct, but unreadable, unusable, and useless for human learning. The summit is reached but the forest along the way is gone.
We need to start making our implicit goals explicit and design systems that protect the values we actually care about, not just the numbers we can measure.
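The call-center example runs as a toy calculation (made-up numbers) showing Goodhart's law in action: the KPI improves while the quantity we actually care about collapses.

```python
# Toy numeric version of the call-center example: once agents game the
# "average call time" KPI by dropping hard calls, the metric improves while
# the thing we actually care about (resolution rate) collapses.

calls = [("easy", 3) for _ in range(8)] + [("hard", 20) for _ in range(2)]

def honest(calls):
    """Handle every call fully: slower average, everything resolved."""
    times = [t for _, t in calls]
    return sum(times) / len(times), len(calls) / len(calls)

def gamed(calls):
    """Hang up after 1 minute on hard calls: fast, but nothing hard resolved."""
    times = [t if kind == "easy" else 1 for kind, t in calls]
    resolved = sum(1 for kind, _ in calls if kind == "easy")
    return sum(times) / len(times), resolved / len(calls)

print(honest(calls))  # (6.4, 1.0): slower, but every problem solved
print(gamed(calls))   # (2.6, 0.8): KPI looks great, satisfaction drops
```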
Alan Melling reposted
Stephen Wolfram @stephen_wolfram
If you do functional programming (like in Wolfram Language) you've probably used lots of pure functions, or lambdas. But what are lambdas like in the wild? Things I'm doing in CS, bio and ML converged to make me curious to find out... And as seems to happen whenever I go exploring in the computational universe ... they surprised me ... writings.stephenwolfram.com/2025/09/the-ru…
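For readers outside functional programming: a pure function's output depends only on its inputs, with no side effects, so calls compose and substitute freely. A minimal sketch in Python (standing in for Wolfram Language's Function):

```python
# "Pure function" in the sense used above: the result depends only on the
# arguments, with no side effects, so calls can be composed and substituted
# freely. Python lambdas play the role of Wolfram Language's Function[...].

square = lambda x: x * x              # pure: same input, same output, always
compose = lambda f, g: lambda x: f(g(x))

inc_then_square = compose(square, lambda x: x + 1)
print(inc_then_square(3))  # 16: (3 + 1) ** 2
```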