Luis

189 posts

@lusxvr

CS @ TUM

Joined January 2015
404 Following · 1K Followers
Pinned Tweet
Luis@lusxvr·
Today, we are releasing FineVision, a huge open-source dataset for training state-of-the-art Vision-Language Models:
> 17.3M images
> 24.3M samples
> 88.9M turns
> 9.5B answer tokens
Here are my favourite findings:
19 replies · 204 reposts · 1.4K likes · 109.6K views
Luis retweeted
Fulcrum@fulcrum_inc·
🚨 We're open-sourcing Druids, a library for coordinating and deploying coding agents across machines. Our beta users have used Druids to work on open math problems, conduct ML "autoresearch," and make software faster.
3 replies · 31 reposts · 219 likes · 22.7K views
Luis@lusxvr·
@lvwerra So you're saying FineVision2 is going to be coming? 👀
0 replies · 0 reposts · 2 likes · 226 views
Leandro von Werra@lvwerra·
Auto-research for training ML models is all the rage now, but underrated is: auto-research for data!

Sure, you can squeeze out a bit of model performance by optimizing hyperparameters, but code agents can effortlessly do data work that has been very labour intensive and required a lot of attention to detail:
> download data from many different data sources
> bring all the data sources into a uniform format
> do detailed EDA: find patterns and outliers
> look at 100s of samples and take detailed notes
> make beautiful infographics rather than mpl plots
> iterate on data filtering by looking at more samples
> make simple pipelines robust and scalable

It's now possible to write data pipelines for dozens of data sources in hours, where it would have taken weeks of reading docs, debugging APIs and data formats, and wrangling outliers and missing data.

A few weeks ago we gave Claude access to the CPU partition of our cluster and it iteratively refined filters to retrieve a domain subset of FineWeb. This would have taken me 2-3 days to work through, while Claude did it in just a few hours with almost no babysitting, and it kept a nice logbook.

Thus the long tail of small, niche data sources becomes more accessible and can be aggregated into even larger high-quality datasets for cool applications. Data has been fuelling LLM progress more than model architecture innovations, so I am very excited about this!
11 replies · 30 reposts · 276 likes · 21.7K views
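The data steps in the tweet above can be sketched as a tiny pipeline. Everything here is hypothetical (the source names, the per-source field names, the length filter); it only illustrates the "bring sources into a uniform format, then iterate on filtering" pattern an agent would refine:

```python
def to_uniform(record, source):
    """Map a source-specific record onto one shared schema.
    The per-source field names are made up for illustration."""
    if source == "forum":
        return {"text": record["body"], "source": source}
    if source == "wiki":
        return {"text": record["content"], "source": source}
    raise ValueError(f"unknown source: {source}")

def keep(sample, min_chars=10):
    """A first-pass quality filter an agent would tighten after
    looking at more samples."""
    return len(sample["text"]) >= min_chars

def build_dataset(sources):
    """sources: {source_name: iterable of raw records}.
    Normalize everything, then filter."""
    uniform = (to_uniform(r, name)
               for name, records in sources.items()
               for r in records)
    return [s for s in uniform if keep(s)]
```

The point of the uniform schema is that every later step (EDA, filtering, note-taking) only has to know one format, no matter how many sources feed in.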
Francesco Capuano@_fracapuano·
be me, aka my three body problem
> wake up: jax is so going to take over
> sip coffee: but jax-metal sucks
> get coding: apple silicon training rocks
> more coffee: torch.device("mps") is my best friend
> more more coffee: torch is king
> more coding: but jax is so cool...
2 replies · 1 repost · 13 likes · 1.7K views
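The mps-vs-cuda dilemma above usually ends in a small device-picking helper. This is a generic sketch; the boolean arguments stand in for `torch.cuda.is_available()` and `torch.backends.mps.is_available()` so the logic is shown without assuming a torch install:

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, fall back to Apple-silicon MPS, else CPU.
    In real code the flags come from torch.cuda.is_available()
    and torch.backends.mps.is_available()."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

# Typical use:
# device = torch.device(pick_device(torch.cuda.is_available(),
#                                   torch.backends.mps.is_available()))
```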
Luis@lusxvr·
@eliebakouch One of the goats, it was a pleasure to work with you
0 replies · 0 reposts · 1 like · 38 views
elie@eliebakouch·
today is my last day at hugging face

feeling really grateful to have worked with such an amazing team and learned so much along the way. i’m proud of what we accomplished together, especially the smollm series. building that project from scratch, putting so much into it, and getting to iterate on a model and training recipe that pushed the frontier for its size was really rewarding

i hope i was able to play a part in making model training more accessible and in pushing the open model ecosystem forward. i’m also very thankful to hf for giving me the chance to share my passion for llm research, especially here, and to connect with so many awesome people

things can get quite intense in this field, but i’m still very excited about the next challenges and about the good this technology can do

but first, taking a few weeks break :)
116 replies · 10 reposts · 745 likes · 33K views
Luis@lusxvr·
@andimarafioti Ofc, had to check if the model's opinion of you was accurate ;)
0 replies · 0 reposts · 0 likes · 21 views
Luis@lusxvr·
@cgeorgiaw Can highly recommend, improves mental clarity and creativity for me if I sleep longer
0 replies · 0 reposts · 1 like · 159 views
Georgia Channing@cgeorgiaw·
I’ve been sleeping 10+ hours every night for the last week Is this normal?
7 replies · 0 reposts · 21 likes · 3.9K views
Andi Marafioti@andimarafioti·
Pretty excited about this project I'm releasing later this week :) (audio only)
1 reply · 1 repost · 26 likes · 1.8K views
Francesco Capuano@_fracapuano·
Had a ton of fun giving a talk today on how I’ve been looking at Robot Learning. Giving talks is a great way to pull one’s ideas together, and I was happy to share my (highly opinionated, hehe) take with such a techie crowd :)
3 replies · 1 repost · 18 likes · 1.1K views
Luis retweeted
Fulcrum@fulcrum_inc·
We're launching Lunette: a platform that uses investigator agents to audit your AI agents and environments. It answers questions like: why does my agent fail? Are there bugs in my eval? What behavioral patterns emerge across tasks?
2 replies · 8 reposts · 46 likes · 9.5K views
Vishaal Udandarao@vishaal_urao·
We also emailed the authors of Cambrian-S sharing our findings, and we included their response in the Appendix! +1 for Open Science!🧑‍🔬 Great 1st step by the Cambrian-S team and we hope to jointly push for better spatial supersensing video models in the near future!🙏 @shushengyang
3 replies · 1 repost · 16 likes · 979 views
Vishaal Udandarao@vishaal_urao·
🚀 New paper! arxiv.org/abs/2511.16655 Recently, Cambrian-S released models & two benchmarks (VSR & VSC) for “spatial supersensing” in video! We found: 1️⃣ Simple no-frame baseline (NoSense) ~perfectly solves VSR! 2️⃣ Tiny sanity check collapses Cambrian-S perf to 0% on VSC! 🧵👇
5 replies · 23 reposts · 122 likes · 40.2K views
Chris Offner@chrisoffner3d·
That’s the direction I want. NeRF/3DGS-SLAM works along roughly those lines. We predict what we’ll see next, then update our model based on what we predicted vs. what we actually see. Except there, the prior/model is based solely on previous test-time images from the same scene.
Saining Xie@sainingxie

looking ahead, we’re prototyping something new -- we call it predictive sensing.

our paper cited tons of work from cogsci and developmental psychology. the more we read, the more amazed we became by human / animal sensing. the human visual system is super high-bandwidth, yet insanely efficient. each eye’s 6 million cone receptors can transmit ~1.6 Gbit/s, yet the brain uses only about 10 bits/s to guide behavior. most sensory data is filtered, compressed, and everything is autopiloted -- you don’t even notice.

how does our brain pull that off? one leading theory: your brain runs a predictive world model in the background for sensing, constantly forecasting the future and comparing it to what actually happens.
> if the prediction error is low → it’s expected, you can ignore it.
> if it’s high → it’s a surprise, and your brain pays attention, updating memory.

we don't have anything comparable in LLMs right now. to test this idea, we trained a latent frame prediction (LFP) head on top of Cambrian-S. we estimate "surprise" during inference, and use it in two ways:
1️⃣ surprise-driven memory management -- compress or skip non-surprising frames, focus compute on surprising ones.
2️⃣ surprise-driven event segmentation -- use surprise spikes to detect event boundaries or scene changes.

by leveraging signals from this internal predictive model, we’re already seeing promising gains on spatial cognition tasks. it’s just a toy predictive world model -- but with this mechanism, our small model outperforms gemini on vsi-super. [6/n]

7 replies · 13 reposts · 177 likes · 23.7K views
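The surprise mechanism in the quoted thread can be sketched in a few lines: treat the latent prediction error as "surprise" and keep only the frames above a threshold. The identity-style predictor and the threshold below are stand-ins for illustration, not the actual LFP head:

```python
import numpy as np

def surprise_filter(latents, predict, threshold):
    """Return indices of frames whose prediction error ('surprise')
    exceeds the threshold; low-surprise frames would be compressed
    or skipped under surprise-driven memory management."""
    kept = [0]  # always keep the first frame; nothing predicts it
    for t in range(1, len(latents)):
        # surprise = distance between predicted and observed latent
        surprise = float(np.linalg.norm(latents[t] - predict(latents[t - 1])))
        if surprise > threshold:
            kept.append(t)
    return kept
```

With a trivial "next frame looks like this frame" predictor, a static scene yields near-zero surprise and gets skipped, while a scene cut spikes the error; those spikes are exactly what the thread's second use case (event segmentation) keys on.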
Gabriele Berton@gabriberton·
Summary of this conversation:
> VGGT is faster. But not robust to OOD data. Robust to doppelgangers (e.g. buildings with two similar walls on opposite sides).
> COLMAP gives more precise poses. More scalable. Works well on ~anything except doppelgangers.
> VGGT with BA would help pose precision, and is probably coming soon.
> For most people speed is not a priority (nobody cares about "online reconstruction").
> COLMAP is actively maintained and improved by the goats of 3D. It is here to stay.
Gabriele Berton@gabriberton

Is COLMAP still widely used or are Mast3r / VGGT taking over?

9 replies · 9 reposts · 85 likes · 10.7K views
Luis@lusxvr·
@andimarafioti Yeah, that would have been confusing for sure haha
0 replies · 0 reposts · 0 likes · 60 views
Andi Marafioti@andimarafioti·
@lusxvr you broke the trend with naming! But also, SmolVision? didn't make sense xD
1 reply · 0 reposts · 1 like · 138 views
Andi Marafioti@andimarafioti·
2025 was the year of smol publications for me
10 replies · 4 reposts · 213 likes · 14.2K views
Andi Marafioti@andimarafioti·
@lusxvr Did you order one? We are getting some shipped in time for Christmas 💕
1 reply · 0 reposts · 0 likes · 78 views
Andi Marafioti@andimarafioti·
I finally got a reachy mini beta! Sitting down to build it now 🤗
4 replies · 2 reposts · 36 likes · 5.2K views
Andi Marafioti@andimarafioti·
You can now train SOTA models without any storage!🌩️ We completely revamped the Hub’s backend to enable streaming at scale. We streamed TBs of data to 100s of H100s to train SOTA VLMs and saw serious speed-ups. But how?
15 replies · 30 reposts · 223 likes · 49.9K views
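The tweet above doesn't spell out the mechanics, but the core pattern behind storage-free training is iterating over remote shards without ever materializing the dataset. A generic sketch (here a "shard" is just any iterable of samples, standing in for a file streamed from the Hub):

```python
def stream_batches(shards, batch_size):
    """Yield fixed-size training batches while walking shards one sample
    at a time, so peak memory stays at roughly one batch regardless of
    how many terabytes the full dataset holds."""
    batch = []
    for shard in shards:
        for sample in shard:
            batch.append(sample)
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:  # flush the trailing partial batch
        yield batch
```

The training loop consumes batches as they arrive over the network, which is what lets 100s of GPUs train without local copies of the data.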