Alan Melling

7.4K posts

@alanmelling

junction of Computer Vision and Graphics, Principal R&D Engineer at Carvana, Co-Creator at Nature Time

ATL · Joined December 2010
2.2K Following · 532 Followers
Alan Melling reposted
Tanmay Gupta @tanmay2099
Had the surreal experience of telling a room full of computer vision researchers at the ICCV25 AC workshop why "computer vision researcher" won't be a thing in 5 years 🌶️ Of course, this was an extreme stance to keep things lively in a fun debate setting, but it echoed some of my own internal monologue over the past few years as someone who has identified as a computer vision researcher for the last decade. The argument went as follows:
⚡️ A research community needs a set of core problems and methods that are specific to that community.
⚡️ The vision community had these 10-15 years ago, but today's general-purpose multimodal architectures assume very little about the input/output modality and are likely to subsume more tasks and modalities over time.
⚡️ Time and again we have had to swallow the bitter pill: methods that bake human intuition into learning algorithms might show gains in the short term but are eventually surpassed by more general methods that use more data and compute (LLMs, VLMs, Sora, Genie, etc.).
⚡️ Gains in vision systems over the last many years have come not from anything specific to vision or images but from general advances in deep learning: optimizers, normalization layers, attention, residual connections, quantization, parallelization methods, larger models, etc. Computer vision ends at tokenization, and then deep learning and distributed systems engineering take over.
⚡️ So not only would "computer vision researcher" become obsolete, we must actively fight the urge to play one, to keep our biases from creeping into our AI systems.
⚡️ In short, there is nothing uniquely "vision" in today's computer vision research, and there is too much overlap with other specialized communities like robotics, graphics, etc.
Thanks again to the @ICCVConference PCs for hosting the debate and to @anikembhavi for inviting me to participate!
It was incredibly awesome for everyone at the AC workshop to take this discussion in a fun spirit 🙌 Arguing for the motion with me were @sarameghanbeery and @RoozbehMottaghi. In our opposition were @HildeKuehne, @aagrawalAA, and @bluevincent. If you are a vision researcher, share your thoughts, whether you agree or not!
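The "computer vision ends at tokenization" point above can be made concrete with a minimal, hypothetical ViT-style patchify sketch (pure Python, toy shapes, not from the talk): the only image-specific step is cutting pixels into flattened patch tokens; everything downstream is generic sequence modeling.

```python
# Minimal sketch of ViT-style image tokenization (pure Python, illustrative
# shapes only): once the image is cut into flattened patch vectors, the
# resulting token sequence is modality-agnostic.

def patchify(image, patch):
    """Split an H x W x C image (nested lists) into flattened patch tokens."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0
    tokens = []
    for py in range(0, h, patch):
        for px in range(0, w, patch):
            flat = []
            for dy in range(patch):
                for dx in range(patch):
                    flat.extend(image[py + dy][px + dx])
            tokens.append(flat)
    return tokens  # (h//patch * w//patch) tokens, each patch*patch*C long

# Toy 4x4 RGB image -> 2x2 grid of 2x2 patches = 4 tokens of length 12.
img = [[[y, x, 0] for x in range(4)] for y in range(4)]
tokens = patchify(img, patch=2)
print(len(tokens), len(tokens[0]))  # 4 tokens, each 2*2*3 = 12 values
```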
Alan Melling reposted
Moritz Reuss @moritz_reuss
VLAs have become the fastest-growing subfield in robot learning. So where are we now? After reviewing ICLR 2026 submissions and conversations at CoRL, I wrote an overview of the current state of VLA research with some personal takes: is.gd/1pqw9w
Alan Melling reposted
Chris Offner @chrisoffner3d
Is the reign of terror of redundant scene representations ending? Where VGGT, CUT3R, and other recent models relied on godless redundant outputs (depth+points+pose) without guaranteeing internal prediction consistency, MapAnything and DepthAnything 3 are now heroically pushing back.
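The consistency concern is mechanical: under a pinhole model, predicted depth plus predicted pose already determine the world points, so an independently predicted point map can silently disagree. A toy sketch of the check (pure Python, assumed intrinsics, not from any of the cited models):

```python
# Toy check of the depth/points/pose redundancy: under a pinhole model,
# world_point = R @ unproject(pixel, depth) + t, so separately predicted
# depth maps and point maps can disagree unless tied together.

def unproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) at the given depth into camera coordinates."""
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return (x, y, depth)

def to_world(p_cam, rotation, translation):
    """Apply a camera-to-world pose (row-major 3x3 R, 3-vector t)."""
    return tuple(
        sum(rotation[i][j] * p_cam[j] for j in range(3)) + translation[i]
        for i in range(3)
    )

# Identity pose and simple intrinsics: consistency should hold exactly.
R = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
t = (0.0, 0.0, 0.0)
p_cam = unproject(u=320, v=240, depth=2.0, fx=500, fy=500, cx=320, cy=240)
p_world = to_world(p_cam, R, t)
print(p_world)  # (0.0, 0.0, 2.0): a predicted point map must match this
```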
Alan Melling reposted
Stephen James @stepjamUK
DLR researchers gave a robotic arm full-body touch sensitivity with no artificial skin needed. They used internal force-torque sensors at 8 kHz + deep learning. The robot can feel where you touch it, recognize letters drawn on its surface, and respond to virtual buttons placed anywhere on its body.
What's interesting is the infrastructure behind it. To train these models, you need high-frequency sensor streams, manifold learning to unfold trajectories, and the ability to iterate fast. They collected 2,300 samples from 20 people and hit 95.5% accuracy on digit recognition. This is what's possible when you have the right data infrastructure.
📄 lnkd.in/exgWfeXf
Video credit: @DLR_en
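As a rough illustration of that kind of pipeline (entirely synthetic data, and a deliberately simple nearest-centroid rule standing in for the DLR team's learned models): window the force-torque stream, extract features, classify the touch.

```python
# Hypothetical sketch of the pipeline described above: window a 6-channel
# force-torque stream, extract simple features, and classify touch location
# with a nearest-centroid rule. Synthetic data; the real system trains deep
# models on 8 kHz sensor streams.
import random

def features(window):
    """Mean of each of the 6 force-torque channels over one window."""
    n = len(window)
    return [sum(sample[ch] for sample in window) / n for ch in range(6)]

def nearest_centroid(feat, centroids):
    """Return the label whose centroid is closest in squared distance."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist(feat, centroids[label]))

random.seed(0)
def stream(bias):
    """Synthetic touch: 80 noisy 6-channel samples around a force signature."""
    return [[bias + random.gauss(0, 0.1) for _ in range(6)] for _ in range(80)]

centroids = {"upper_arm": features(stream(1.0)), "wrist": features(stream(-1.0))}
probe = features(stream(1.0))
print(nearest_centroid(probe, centroids))  # upper_arm
```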
Alan Melling reposted
Interintellect 🧭 @interintellect_
Can mathematical models of history help us escape the worst outcomes? @Peter_Turchin, the pioneering complexity scientist, and Interintellect founder @TheAnnaGat explore his groundbreaking work on cliodynamics, recurring cycles of societal collapse, and America's current position on the brink of upheaval. Watch the full, thought-provoking conversation: youtu.be/52hM9vbi-ck
Alan Melling reposted
Thinking Machines @thinkymachines
LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely, more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA. thinkingmachines.ai/blog/lora/
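For context, the LoRA idea being compared to full fine-tuning fits in a few lines (toy pure-Python matrices; real implementations apply this per attention/MLP projection):

```python
# Minimal LoRA sketch (pure Python, illustrative): instead of updating a
# d x d weight W, train a low-rank pair B (d x r) and A (r x d) and use
# W_eff = W + (alpha / r) * B @ A. Trainables drop from d*d to 2*d*r.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def lora_effective_weight(W, A, B, alpha):
    r = len(A)  # rank = number of rows of A
    delta = matmul(B, A)
    return [[W[i][j] + (alpha / r) * delta[i][j]
             for j in range(len(W[0]))] for i in range(len(W))]

d = 4
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen
A = [[1.0, 0.0, 0.0, 0.0]]          # r x d, trainable (rank r = 1)
B = [[0.0], [2.0], [0.0], [0.0]]    # d x r, trainable
W_eff = lora_effective_weight(W, A, B, alpha=1.0)
print(W_eff[1][0])  # 2.0: the rank-1 update added B[1][0] * A[0][0] at (1, 0)
```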
Alan Melling reposted
Zhiqiu Lin @ZhiqiuLin
🎉 CameraBench has been accepted as a Spotlight (3%) @ NeurIPS 2025. Huge congrats to all collaborators at CMU, MIT-IBM, UMass, Harvard, and Adobe. CameraBench is a large-scale effort that pushes video-language models to reason about the language of camera motion just like professional cinematographers.
🌍 Our open-source dataset, models, and code are also gaining strong interest and adoption from frontier labs such as DeepMind and Kling to advance video generation research.
📄 Paper: arxiv.org/abs/2504.15376
🌐 Website: linzhiqiu.github.io/papers/camerab…
Zhiqiu Lin @ZhiqiuLin

📷 Can AI understand camera motion like a cinematographer? Meet CameraBench: a large-scale, expert-annotated dataset for understanding camera motion geometry (e.g., trajectories) and semantics (e.g., scene contexts) in any video: films, games, drone shots, vlogs, etc. Links below!
We contribute a taxonomy of motion primitives, co-designed over months with professional cinematographers, and apply rigorous quality control to label and caption all aspects of camera motion. CameraBench shows that even the best SfMs and VLMs struggle with real-world, dynamic videos.
Yet a generative VLM post-trained on our high-quality data matches SOTA SfM (MegaSAM) in geometric understanding and outperforms SOTA VLMs (Gemini-2.5 / GPT-4o) in semantic understanding, e.g., describing how the camera moves.
📄 Paper: huggingface.co/papers/2504.15…
🌐 Website: linzhiqiu.github.io/papers/camerab…
Work led by CMU, MIT-IBM, UMass, Adobe, Harvard, Emerson with @censiyuan1, @chancharikm, @JayKarhade, @du_yilun, @gan_chuang, and @RamananDeva.

Dragoneyes Hatesg00gle @Dragoneyes_001
@MatthewBerman @jonah_lipsitt And yet windmills are still a worthless way of creating energy. Just the energy footprint consumed to create and install them exceeds the total energy they produce before the leading edges of the blades start breaking down and need servicing!
Alan Melling reposted
himanshu @himanshustwts
how crazy that they have put 15TB worth of physics simulation datasets on the internet
Alan Melling reposted
JingyuanLiu @JingyuanLiu123
I was lucky to work in both China and US LLM labs, and I've been thinking about this for a while. The current values of pretraining are indeed different.
US labs be like:
- lots of GPUs and much larger flops runs
- treat stability more seriously and cannot tolerate spikes in large flops runs, and thus invented so many stability-related tricks, including all kinds of soft-cap, MuP, and spectral norm control tricks
- treat predictability more seriously; check the GPT-4 report for reference, which even tries to predict eval task performance
- because of the stability and predictability demands, treat hyper-params and optimization more seriously
- generally believe more in data and optimization than arch
China labs be like:
- have very limited GPUs, e.g. K2 on 4k GPUs and V3 on 2k GPUs
- as a result, push the limit of pretrain modeling-infra co-design; see so many tricks in V3, and K2 has some cool stuff too (the offload trick helps remove the stupid MoE gating constraint and uses only EP 16)
- care about model arch/token efficiency over optimization and stability
- care more about data quality than data quantity
- take inference into consideration from day 0, even before training starts
In general, China labs are trying to use <4e+24 flops models to catch up with >1e+25 flops models. It is hard or impossible, but they are making good progress. I am actually very happy to see Qwen's new try on model archs; they used to focus more on the data side than on the model arch side. They developed linear attn not just so people would think they are innovating; it is actually about pushing the limit for test-time scaling. Llama4 failed for many reasons, but Qwen-Next is different. They used very limited flops, and it is a brave try for good reasons.
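One of the stability tricks mentioned, logit soft-capping, is small enough to sketch (a common formulation; the exact cap value varies by lab):

```python
# Sketch of logit soft-capping, one of the stability tricks mentioned above:
# squash values smoothly into (-cap, cap) via cap * tanh(x / cap), so a rare
# huge logit cannot blow up the loss while small logits pass through almost
# unchanged.
import math

def soft_cap(x, cap=30.0):
    return cap * math.tanh(x / cap)

print(round(soft_cap(1.0), 4))     # 0.9996: near-identity for small logits
print(round(soft_cap(1000.0), 1))  # 30.0: large spikes are bounded
```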
Charuru @CharuruCha14310

@teortaxesTex @JingyuanLiu123 I bet OpenAI/xAI is laughing so hard, this result is obvious tbh, they took a permanent architectural debuff in order to save on compute costs.

Alan Melling reposted
Emma P @emmaconcepts
@fchollet @generativist You say this but a lot of early to mid 20th century physics papers were like "notes about what kind of things I'm thinking about the stuff we are all thinking about" 😛
Alan Melling reposted
Jonathon Luiten @JonathonLuiten
Introducing: Hyperscape Capture 📷 Last year we showed the world's highest quality Gaussian Splatting, and the first time GS was viewable in VR. Now, capture your own Hyperscapes, directly from your Quest headset in only 5 minutes of walking around. meta.com/experiences/87…
Jonathon Luiten @JonathonLuiten

Hyperscape: The future of VR and the Metaverse Excited that Zuckerberg @finkd announced what I have been working on at Connect. Hyperscape enables people to create high fidelity replicas of physical spaces, and embody them in VR. Check out the demo app: meta.com/experiences/79…

Alan Melling reposted
PlayCanvas @playcanvas
Goodbye SOGS, hello SOG! 👋 PlayCanvas open sources Spatially Ordered Gaussians - a new super-compressed format for 3D Gaussian Splatting. blog.playcanvas.com/playcanvas-ope…
Alan Melling reposted
Jia-Bin Huang @jbhuang0604
How AI Taught Itself to See Self-supervised learning is fascinating! How can AI learn from images only without labels? In this video, we’ll build the method from first principles and uncover the key ideas behind CLIP, MAE, SimCLR, and DINO (v1–v3). Video 👇
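The shared idea behind contrastive methods like SimCLR and CLIP can be sketched as an InfoNCE loss over toy embeddings (illustrative vectors, not from the video): pull two views of the same image together, push views of other images apart.

```python
# Minimal sketch of the contrastive idea behind SimCLR/CLIP: two augmented
# views of the same image should embed close together, views of other images
# far apart. InfoNCE loss for one anchor, pure Python, toy 2-D embeddings.
import math

def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def info_nce(anchor, positive, negatives, temp=0.5):
    """Cross-entropy of picking the positive among all candidates."""
    logits = [cos(anchor, positive) / temp] + [cos(anchor, n) / temp for n in negatives]
    m = max(logits)  # subtract max for numerical stability
    denom = sum(math.exp(l - m) for l in logits)
    return -(logits[0] - m - math.log(denom))

anchor    = [1.0, 0.0]                  # view 1 of an image
positive  = [0.9, 0.1]                  # view 2 of the same image
negatives = [[0.0, 1.0], [-1.0, 0.2]]   # views of other images
loss = info_nce(anchor, positive, negatives)
print(round(loss, 3))  # small loss: the positive is already the closest
```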
Alan Melling reposted
nikita diakur @nikitadiakur
beach 💔 dolly in
Alan Melling reposted
Jasper @zjasper
AI is great at hitting explicit goals, but often at the cost of the hidden ones. Terence Tao just wrote about this. He points out: AI is the ultimate executor of Goodhart's law, i.e. when a measure becomes the target, it stops measuring what we care about.
Take a call center. Management sets a KPI: "shorten average call time." Sounds reasonable: shorter calls should mean faster resolutions, happier customers. At first, it works. Agents become more efficient. But soon, people start gaming it: nudging customers to hang up when the problem is tricky, or just dropping the call themselves. The numbers look amazing. Call times plummet. But customer satisfaction? Straight into the ground.
Now replace "call time" with "prove theorem X." If human mathematicians did it, they'd refine definitions, polish lemmas, contribute back to Mathlib, train juniors, deepen the understanding of math structures, and strengthen the community. The AI, by contrast, optimizes only for the explicit goal. It might generate a 10,000-line proof in hours. Perfectly correct, but unreadable, unusable, and useless for human learning. The summit is reached but the forest along the way is gone.
We need to start making our implicit goals explicit and design systems that protect the values we actually care about, not just the numbers we can measure.
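The call-center example runs as a toy calculation (made-up numbers) showing Goodhart's law in action: the KPI improves while the quantity we actually care about collapses.

```python
# Toy numeric version of the call-center example: once agents game the
# "average call time" KPI by dropping hard calls, the metric improves while
# the thing we actually care about (resolution rate) collapses.

calls = [("easy", 3) for _ in range(8)] + [("hard", 20) for _ in range(2)]

def honest(calls):
    """Handle every call fully: slower average, everything resolved."""
    times = [t for _, t in calls]
    return sum(times) / len(times), len(calls) / len(calls)

def gamed(calls):
    """Hang up after 1 minute on hard calls: fast, but nothing hard resolved."""
    times = [t if kind == "easy" else 1 for kind, t in calls]
    resolved = sum(1 for kind, _ in calls if kind == "easy")
    return sum(times) / len(times), resolved / len(calls)

print(honest(calls))  # (6.4, 1.0): slower, but every problem solved
print(gamed(calls))   # (2.6, 0.8): KPI looks great, satisfaction drops
```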
Alan Melling reposted
Stephen Wolfram @stephen_wolfram
If you do functional programming (like in Wolfram Language) you've probably used lots of pure functions, or lambdas. But what are lambdas like in the wild? Things I'm doing in CS, bio and ML converged to make me curious to find out... And as seems to happen whenever I go exploring in the computational universe ... they surprised me ... writings.stephenwolfram.com/2025/09/the-ru…
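For readers outside functional programming: a pure function's output depends only on its inputs, with no side effects, so calls compose and substitute freely. A minimal sketch in Python (standing in for Wolfram Language's Function):

```python
# "Pure function" in the sense used above: the result depends only on the
# arguments, with no side effects, so calls can be composed and substituted
# freely. Python lambdas play the role of Wolfram Language's Function[...].

square = lambda x: x * x              # pure: same input, same output, always
compose = lambda f, g: lambda x: f(g(x))

inc_then_square = compose(square, lambda x: x + 1)
print(inc_then_square(3))  # 16: (3 + 1) ** 2
```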