Thomas Kollar

113 posts

@tkollar

Head of AI @ Wayve; ex-Research Manager @ TRI; ex-Amazon; ex-Apple. CMU / MIT

San Jose, CA · Joined October 2008
507 Following · 331 Followers
Thomas Kollar retweeted
Wayve
Wayve@wayve_ai·
Meet LA-Pose. Our latest model takes Wayve another step towards generalization at scale. LA-Pose employs large-scale self-supervised learning, building strong motion representations for 3D perception from 10.2 million unlabeled driving video snippets, unlike today's strongest approaches, which often depend on expensive, carefully curated 3D supervision. With only a lightweight pose head and limited labelled data, LA-Pose achieves:
📷 State-of-the-art camera pose estimation
🌎 Strong zero-shot generalization across diverse driving scenarios
🏷️ Orders of magnitude less labelled data than fully supervised 3D approaches
Our full blog post: wayve.ai/thinking/la-po…
Explore the full paper here: la-pose.github.io
1 reply · 36 reposts · 145 likes · 35.1K views
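The recipe described in the LA-Pose announcement (a large self-supervised encoder whose weights are frozen, plus a lightweight pose head trained on limited labels) can be sketched as a toy NumPy example. Everything here is illustrative: the encoder, feature dimensions, and head architecture are assumptions for exposition, not Wayve's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a large self-supervised video encoder (hypothetical).
# In the real recipe its weights come from pretraining on unlabeled
# driving clips and are then frozen.
ENC_W = rng.standard_normal((2 * 32 * 32, 256)) * 0.05

def frozen_encoder(frame_pair):
    """Map a pair of 32x32 grayscale frames to a motion feature vector."""
    x = frame_pair.reshape(-1)           # flatten both frames: (2048,)
    return np.tanh(x @ ENC_W)            # frozen features: (256,)

# Lightweight pose head: one small MLP, the only part trained on labels.
W1 = rng.standard_normal((256, 64)) * 0.1
b1 = np.zeros(64)
W2 = rng.standard_normal((64, 6)) * 0.1
b2 = np.zeros(6)

def pose_head(feat):
    """Predict a 6-DoF relative camera pose: 3 translation + 3 rotation."""
    h = np.maximum(feat @ W1 + b1, 0.0)  # ReLU hidden layer
    return h @ W2 + b2                   # (6,)

frames = rng.standard_normal((2, 32, 32))    # two consecutive frames
pose = pose_head(frozen_encoder(frames))
print(pose.shape)  # (6,)
```

The point of the split is that only the tiny head needs labelled poses; the heavy representation learning happens on unlabeled video.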
Thomas Kollar retweeted
Wayve
Wayve@wayve_ai·
GAIA-3 introduces four powerful new capabilities that unlock richer and more scalable evaluation of autonomous driving systems. 🌍 🧵 Follow the thread below to see examples of:
1. Long perturb generations 🚗
2. Safety augmentations ⚠️
3. Semantic augmentations 🌤️🌅🌙
4. Embodiment transfer 🚘📷
GAIA-3 re-generates the same scenario as if observed from different vehicles with different camera positions. One scene, three embodiments, consistent dynamics. Ideal for testing models across different hardware setups.
These advances show how GAIA-3 brings new realism, diversity, and scale to the evaluation of end-to-end driving systems. 🚀
Dive into the full blog: wayve.ai/thinking/gaia-…
Every clip you see below is generated by GAIA-3. 👇
#GAIA3 #EmbodiedAI #AISafety #GenerativeAI #AutonomousVehicles
1 reply · 8 reposts · 37 likes · 3.9K views
Thomas Kollar retweeted
Jamie Shotton
Jamie Shotton@Jamie_Shotton·
Big things cooking in Tahoe... 🚀
1 reply · 1 repost · 20 likes · 1.7K views
Siddharth Karamcheti
Siddharth Karamcheti@siddkaramcheti·
Thrilled to share that I'll be starting as an Assistant Professor at Georgia Tech (@ICatGT / @GTrobotics / @mlatgt) in Fall 2026. My lab will tackle problems in robot learning, multimodal ML, and interaction. I'm recruiting PhD students this next cycle – please apply/reach out!
72 replies · 23 reposts · 565 likes · 61.1K views
Thomas Kollar retweeted
Jamie Shotton
Jamie Shotton@Jamie_Shotton·
It's awesome to be back in the Bay Area this week at @wayve_ai's other North American office. I can't wait to test the massive progress the team's been making on rides around the Bay Area and city while I'm here, and to meet with our science leaders @vijaycivs @tkollar @gianlucacorrado and others to galvanise the groups at the start of an incredibly exciting #YearOfEmbodiedAI ahead! #Science #Team #EmbodiedAI
1 reply · 1 repost · 32 likes · 1.3K views
Thomas Kollar
Thomas Kollar@tkollar·
Building language models is difficult and requires high-quality preprocessing, modeling, evaluation, and large-scale training. As significant collaborators on this project at TRI, we're proud of the resulting 7B model: DCLM-7B is a major achievement, competitive with Mistral-7B and LLaMA-7B even though it was trained on less data. And it's fully open. And that's just the start of the competition. Excited to see how others leverage these results to build even more capable language models and improve dataset quality.
Vaishaal Shankar@Vaishaal

I am really excited to introduce DataComp for Language Models (DCLM), our new testbed for controlled dataset experiments aimed at improving language models. 1/x

1 reply · 1 repost · 3 likes · 884 views
Thomas Kollar
Thomas Kollar@tkollar·
Excited to release Prismatic! Cutting through the noise of vision-language modeling, Prismatic is a release of 42 pre-trained VLMs at the 7B to 13B scale, a codebase for rigorous evaluation, and a myriad of insights into what matters for performance.
Siddharth Karamcheti@siddkaramcheti

What design choices matter when developing a visually-conditioned language model (VLM)? Check out our paper – Prismatic VLMs – and open-source training code, evaluation suite, and 42 pretrained VLMs at the 7B-13B scale! 📜 arxiv.org/abs/2402.07865 ⚙️ + 🤗 github.com/TRI-ML/prismat…

2 replies · 1 repost · 10 likes · 1.6K views
Thomas Kollar
Thomas Kollar@tkollar·
By first developing some of the best vision-language models with Prismatic at TRI (github.com/TRI-ML/prismat…), OpenVLA was able to quickly build some of the best generalist policies for robotics. Code, data, and weights are all open source: openvla.github.io This is a great achievement! Congrats @moo_jin_kim @siddkaramcheti @KarlPertsch @ashwinb96 @SurajNair_1 and all collaborators.
Moo Jin Kim@moo_jin_kim

✨ Introducing 𝐎𝐩𝐞𝐧𝐕𝐋𝐀 — an open-source vision-language-action model for robotics! 👐
- SOTA generalist policy
- 7B params
- outperforms Octo, RT-2-X on zero-shot evals 🦾
- trained on 970k episodes from OpenX dataset 🤖
- fully open: model/code/data all online 🤗
🧵👇

0 replies · 2 reposts · 8 likes · 1.5K views
Dan Roy
Dan Roy@roydanroy·
Where I mess with a "scammer" posing as Kyunghyun Cho (or is it Cho himself???? we will find out!)
Dan Roy@roydanroy

@kchonyc

4 replies · 0 reposts · 5 likes · 11K views
Thomas Kollar retweeted
Sedrick Keh
Sedrick Keh@sedrickkeh2·
Recurrent models like RWKV and Mamba have gained attention recently, but these can be costly to train and iterate on. What if we could simply... turn Mistral/Llama/Gemma into an RNN? 🎩🪄 Presenting our work, Linearizing Large Language Models! arxiv.org/abs/2405.06640
4 replies · 32 reposts · 165 likes · 19.6K views
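The core trick behind linearizing a transformer, replacing softmax attention with a feature-map ("linear") attention whose causal form runs as an RNN with a fixed-size state, can be sketched in NumPy. This is a generic linear-attention sketch using the common elu(x)+1 feature map, not the paper's exact method:

```python
import numpy as np

def linear_attention(Q, K, V):
    """Causal linear attention: softmax(QK^T)V is replaced by
    phi(Q) (phi(K)^T V), which can be computed as an RNN with a
    fixed-size state, i.e. constant memory per generated token."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1
    Qf, Kf = phi(Q), phi(K)
    T, d = Q.shape
    S = np.zeros((d, V.shape[1]))   # running sum of k_t v_t^T (the RNN state)
    z = np.zeros(d)                 # running sum of k_t (normalizer)
    out = np.empty_like(V)
    for t in range(T):              # one recurrent step per token
        S += np.outer(Kf[t], V[t])
        z += Kf[t]
        out[t] = (Qf[t] @ S) / (Qf[t] @ z + 1e-6)
    return out

rng = np.random.default_rng(0)
Q, K, V = rng.standard_normal((3, 8, 16))   # T=8 tokens, d=16 per head
out = linear_attention(Q, K, V)
print(out.shape)  # (8, 16)
```

Because the loop only carries `S` and `z`, inference cost per token is independent of sequence length, which is why a linearized Mistral/Llama/Gemma behaves like an RNN at generation time.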
Thomas Kollar
Thomas Kollar@tkollar·
Over the last year at TRI we've been training large language models, including results in the following areas:
Scaling: arxiv.org/abs/2403.08540
Alignment: arxiv.org/abs/2402.12366
As part of upcoming work, we are sharing back with the open-source community and releasing a performant Mamba model that we've trained at the 7B-parameter scale. More results on linear transformers are upcoming.
Sedrick Keh@sedrickkeh2

📢 Releasing TRI's open-source Mamba-7B trained on 1.2T tokens of RefinedWeb! Mamba-7B is the largest fully recurrent Mamba model trained and is a state-of-the-art recurrent LLM. 🚀🚀🚀 huggingface.co/TRI-ML/mamba-7…

1 reply · 3 reposts · 12 likes · 3K views
(((ل()(ل() 'yoav))))👾
Here is a meta-review we got for the third submission of a paper that aims to study the text-understanding capacities of LLMs, focusing on very simple, if not trivial, cases where they systematically fail. We see every stated weakness as a strength, and they are all by design.
23 replies · 24 reposts · 231 likes · 94.3K views