Kushal Arora
@karora4u
439 posts

Research Scientist at Toyota Research Institute. Ph.D. student at @rllabmcgill, @MILAMontreal. Prev: FAIR, MSR, BorealisAI, Amazon, UF.

Seattle, WA · Joined May 2009
1.2K Following · 429 Followers
Kushal Arora retweeted
Jean Mercat@MercatJean·
Releasing VLA Foundry: an open-source framework that unifies LLM, VLM, and VLA training in a single codebase. End-to-end control from language pretraining to action-expert fine-tuning — no more stitching together incompatible repos.
10 replies · 76 reposts · 490 likes · 73.5K views
Kushal Arora retweeted
Katherine Liu@robo_kat·
ReFiNe expands on a neat idea we first presented at CoRL with Recursive Octree Auto-Decoders: that recursion can enable very high compression rates of 3D data. In ReFiNe, we use this property to represent continuous fields and can decode multiple NeRFs/SDFs with a single network.
Sergey Zakharov@ZakharovSergeyN

Excited to introduce our paper, ReFiNe, at #SIGGRAPH2024 this Thursday! Learn how we encode multiple assets as continuous neural fields with high precision & low memory usage by exploiting object self-similarity. @RaresAmbrus @robo_kat @adnothing Webpage: zakharos.github.io/projects/refin…

0 replies · 2 reposts · 10 likes · 1K views
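For readers curious about the "many fields, one network" idea in the ReFiNe thread above, here is a minimal auto-decoder sketch in PyTorch. It is not ReFiNe's recursive octree architecture, only the underlying pattern of decoding multiple SDFs from per-object latent codes with a single shared MLP; all names and sizes are illustrative.

import torch
import torch.nn as nn

# Minimal auto-decoder sketch (NOT ReFiNe's recursive octree model): one shared
# MLP decodes many objects, each represented only by a learned latent code.
class SharedFieldDecoder(nn.Module):
    def __init__(self, num_objects: int, latent_dim: int = 64, hidden: int = 128):
        super().__init__()
        # One optimizable latent per object; there is no encoder ("auto-decoder").
        self.latents = nn.Embedding(num_objects, latent_dim)
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # signed distance at the queried 3D point
        )

    def forward(self, obj_ids: torch.Tensor, xyz: torch.Tensor) -> torch.Tensor:
        # obj_ids: (B,) object indices, xyz: (B, 3) query coordinates
        z = self.latents(obj_ids)
        return self.mlp(torch.cat([z, xyz], dim=-1))

# Query two different objects at the same point with the same network.
decoder = SharedFieldDecoder(num_objects=10)
points = torch.zeros(2, 3)
sdf = decoder(torch.tensor([0, 7]), points)  # shape (2, 1)

ReFiNe adds recursion over an octree on top of this kind of shared decoder, which is where the high compression rates come from.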
Kushal Arora@karora4u·
@ke_huang275 @achalddave Though it is difficult to say why the benchmarks got better with IT, my speculation is that this is due to the DCLM-IT data: it contains datasets such as Nectar, no_robots, and StarCoder2-Self-OSS-Instruct, which include math, code, and QA data that might help improve benchmark performance.
0 replies · 0 reposts · 2 likes · 78 views
Kushal Arora@karora4u·
@ke_huang275 @achalddave We trained for 10 epochs because we saw the AlpacaEval score still improving beyond the first few epochs, so we decided to keep fine-tuning. Here is how AlpacaEval looked at each epoch:
[image attached]
1 reply · 0 reposts · 2 likes · 75 views
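A rough sketch of the epoch-wise loop described in the reply above: fine-tune, score with AlpacaEval after every epoch, and keep going while the score improves. The train_one_epoch and score_alpaca_eval callables are hypothetical placeholders, not code from the thread.

# Hypothetical driver for "evaluate after every epoch, keep fine-tuning while improving".
# train_one_epoch(model, loader) and score_alpaca_eval(model) are placeholder callables
# supplied by whatever trainer / judge pipeline is actually used.
def finetune_with_epoch_eval(model, train_loader, train_one_epoch, score_alpaca_eval, max_epochs=10):
    history, best = [], float("-inf")
    for epoch in range(1, max_epochs + 1):
        train_one_epoch(model, train_loader)   # one pass over the instruction-tuning data
        score = score_alpaca_eval(model)       # e.g. AlpacaEval win rate against a reference model
        history.append((epoch, score))
        print(f"epoch {epoch}: AlpacaEval = {score:.2f}")
        if score <= best:                      # stop once the score stops improving
            break
        best = score
    return history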
Achal Dave@achalddave·
Excited to share our new-and-improved 1B models trained with DataComp-LM!
- 1.4B model trained on 4.3T tokens
- 5-shot MMLU 47.5 (base model) => 51.4 (w/ instruction tuning)
- Fully open models: public code, weights, dataset!
[image attached]
3 replies · 29 reposts · 114 likes · 30.6K views
Kushal Arora retweeted
Sachin Grover@sachingrover·
I am looking for positions in LLM-based agents and in combining planning and learning techniques/systems. I have around 2.5 years of industry research experience, including two years at @PARCinc as a research scientist and multiple summer internships at @amazon @alexa99. 1/6
3 replies · 5 reposts · 16 likes · 5.8K views
Kushal Arora retweeted
Thomas Kollar@tkollar·
Building language models is difficult and requires high-quality preprocessing, modeling, evaluation, and large-scale training. As significant collaborators on this project at TRI, we see the resulting 7B model, DCLM-7B, as a major achievement: it competes with Mistral 7B and LLaMA-7B even though it was trained on less data, and it's fully open. And that's just the start of the competition. Excited to see how others leverage these results to build even more capable language models and improve dataset quality.
Vaishaal Shankar@Vaishaal

I am really excited to introduce DataComp for Language Models (DCLM), our new testbed for controlled dataset experiments aimed at improving language models. 1/x

1 reply · 1 repost · 3 likes · 884 views
Kushal Arora@karora4u·
One thing I have come to greatly appreciate over the last year is the role of data filtering in building SOTA language models. DCLM introduces a 240T-token corpus, a pipeline for filtering and building new datasets, and an open-source 7B model that is competitive with Llama 3 while trained on 2-6x fewer tokens.
Vaishaal Shankar@Vaishaal

I am really excited to introduce DataComp for Language Models (DCLM), our new testbed for controlled dataset experiments aimed at improving language models. 1/x

1 reply · 0 reposts · 6 likes · 735 views
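As a rough illustration of the classifier-based filtering step used in pipelines like DCLM's, one common pattern is to score each crawled document with a fastText quality classifier and keep only high-scoring pages. The model path, label name, and threshold below are placeholders, not the actual DCLM artifacts.

import fasttext  # pip install fasttext

# Placeholder classifier; a real pipeline trains its own quality model and
# tunes the keep-threshold on held-out data.
model = fasttext.load_model("quality_classifier.bin")

def keep_document(text: str, threshold: float = 0.5) -> bool:
    # fastText predict() expects a single line, so strip newlines first.
    labels, probs = model.predict(text.replace("\n", " "), k=2)
    score = dict(zip(labels, probs)).get("__label__high_quality", 0.0)
    return score >= threshold

corpus = ["first crawled page ...", "second crawled page ..."]
filtered = [doc for doc in corpus if keep_document(doc)]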
Kushal Arora retweeted
Aran Komatsuzaki@arankomatsuzaki·
DataComp-LM: In search of the next generation of training sets for language models
- Provides a corpus of 240T tokens from Common Crawl
- Trains an LM on their filtered dataset that performs similarly on NLU tasks w/ 6.6x less compute than Llama 3 8B
proj: datacomp.ai/dclm/
abs: arxiv.org/abs/2406.11794
[image attached]
1 reply · 42 reposts · 202 likes · 33.9K views
Kushal Arora retweeted
Achal Dave@achalddave·
Check out DataComp for language models! Open data, open code, open training recipe, and close to Llama3-8B performance. This has been a labor of love over the last year, a huge thanks to all the collaborators for helping make this happen!
Vaishaal Shankar@Vaishaal

I am really excited to introduce DataComp for Language Models (DCLM), our new testbed for controlled dataset experiments aimed at improving language models. 1/x

1 reply · 10 reposts · 27 likes · 4.4K views
Kushal Arora retweeted
Vaishaal Shankar@Vaishaal·
I am really excited to introduce DataComp for Language Models (DCLM), our new testbed for controlled dataset experiments aimed at improving language models. 1/x
[image attached]
7 replies · 79 reposts · 274 likes · 120.1K views
Kushal Arora@karora4u·
Sedrick is an amazing researcher who, over the last year, has done excellent work on pre-training, scaling, evaluation, Japanese LMs, code models, VLMs, and more. If you are at NAACL, do grab a coffee with him!
Sedrick Keh@sedrickkeh2

I'm attending #NAACL2024 at Mexico City this week! Excited to chat about pre-training, evaluation, and multimodality! (also excited for🌮🌯🫔)

0 replies · 1 repost · 6 likes · 673 views
Kushal Arora retweeted
Sedrick Keh@sedrickkeh2·
Recurrent models like RWKV and Mamba have gained attention recently, but these can be costly to train and iterate on. What if we could simply... turn Mistral/Llama/Gemma into an RNN? 🎩🪄 Presenting our work, Linearizing Large Language Models! arxiv.org/abs/2405.06640
4 replies · 32 reposts · 165 likes · 19.6K views
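For context on what "turning a transformer into an RNN" means mechanically, here is a sketch of the generic linear-attention recurrence that such linearizations build on (not necessarily the exact recipe in the paper): with a feature map phi in place of softmax, the attention context is carried in a fixed-size state instead of a growing KV cache. The feature map and dimensions below are illustrative.

import numpy as np

def phi(x):
    # Simple positive feature map standing in for softmax; real methods choose this carefully.
    return np.maximum(x, 0.0) + 1e-6

def linear_attention_rnn(Q, K, V):
    # Q, K, V: (T, d) per-token queries/keys/values for one attention head.
    d_k, d_v = K.shape[1], V.shape[1]
    S = np.zeros((d_k, d_v))   # running sum of phi(k_t) v_t^T
    z = np.zeros(d_k)          # running normalizer, sum of phi(k_t)
    outputs = []
    for q_t, k_t, v_t in zip(Q, K, V):  # one token at a time: constant-size state, like an RNN
        S += np.outer(phi(k_t), v_t)
        z += phi(k_t)
        outputs.append(phi(q_t) @ S / (phi(q_t) @ z + 1e-6))
    return np.stack(outputs)

T, d = 5, 8
out = linear_attention_rnn(np.random.randn(T, d), np.random.randn(T, d), np.random.randn(T, d))
print(out.shape)  # (5, 8): same output shape as causal attention over the sequence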