Kushal Arora
@karora4u
439 posts

Research Scientist at Toyota Research Institute. Ph.D. student at @rllabmcgill, @MILAMontreal. Prev: FAIR, MSR, BorealisAI, Amazon, UF.

Seattle, WA · Joined May 2009
1.2K Following · 429 Followers
Kushal Arora retweeted
Jean Mercat@MercatJean·
Releasing VLA Foundry: an open-source framework that unifies LLM, VLM, and VLA training in a single codebase. End-to-end control from language pretraining to action-expert fine-tuning — no more stitching together incompatible repos.
10 replies · 76 reposts · 490 likes · 73.5K views
Kushal Arora retweeted
Katherine Liu@robo_kat·
ReFiNe expands on a neat idea we first presented at CoRL with Recursive Octree Auto-Decoders: that recursion can enable very high compression rates of 3D data. In ReFiNe, we use this property to represent continuous fields and can decode multiple NeRFs/SDFs with a single network.
Sergey Zakharov@ZakharovSergeyN

Excited to introduce our paper, ReFiNe, at #SIGGRAPH2024 this Thursday! Learn how we encode multiple assets as continuous neural fields with high precision & low memory usage by exploiting object self-similarity. @RaresAmbrus @robo_kat @adnothing Webpage: zakharos.github.io/projects/refin…

0 replies · 2 reposts · 10 likes · 1K views
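For readers curious about the "many fields, one network" idea in the ReFiNe thread above, here is a minimal auto-decoder sketch in PyTorch. It is not ReFiNe's recursive octree architecture, only the underlying pattern of decoding multiple SDFs from per-object latent codes with a single shared MLP; all names and sizes are illustrative.

import torch
import torch.nn as nn

# Minimal auto-decoder sketch (NOT ReFiNe's recursive octree model): one shared
# MLP decodes many objects, each represented only by a learned latent code.
class SharedFieldDecoder(nn.Module):
    def __init__(self, num_objects: int, latent_dim: int = 64, hidden: int = 128):
        super().__init__()
        # One optimizable latent per object; there is no encoder ("auto-decoder").
        self.latents = nn.Embedding(num_objects, latent_dim)
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # signed distance at the queried 3D point
        )

    def forward(self, obj_ids: torch.Tensor, xyz: torch.Tensor) -> torch.Tensor:
        # obj_ids: (B,) object indices, xyz: (B, 3) query coordinates
        z = self.latents(obj_ids)
        return self.mlp(torch.cat([z, xyz], dim=-1))

# Query two different objects at the same point with the same network.
decoder = SharedFieldDecoder(num_objects=10)
points = torch.zeros(2, 3)
sdf = decoder(torch.tensor([0, 7]), points)  # shape (2, 1)

ReFiNe adds recursion over an octree on top of this kind of shared decoder, which is where the high compression rates come from.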
Kushal Arora@karora4u·
@ke_huang275 @achalddave Though it is difficult to say why the benchmarks got better with IT, my speculation is that this is due to the DCLM-IT data: it contains datasets such as Nectar, no_robots, and StarCoder2-Self-OSS-Instruct, which include math, code, and QA data that might help improve benchmark performance.
0 replies · 0 reposts · 2 likes · 78 views
Kushal Arora@karora4u·
@ke_huang275 @achalddave We trained for 10 epochs because we saw the AlpacaEval score still improving beyond the first few epochs, so we decided to keep fine-tuning. Here is how AlpacaEval looked at each epoch:
[image attached]
1 reply · 0 reposts · 2 likes · 75 views
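A rough sketch of the epoch-wise loop described in the reply above: fine-tune, score with AlpacaEval after every epoch, and keep going while the score improves. The train_one_epoch and score_alpaca_eval callables are hypothetical placeholders, not code from the thread.

# Hypothetical driver for "evaluate after every epoch, keep fine-tuning while improving".
# train_one_epoch(model, loader) and score_alpaca_eval(model) are placeholder callables
# supplied by whatever trainer / judge pipeline is actually used.
def finetune_with_epoch_eval(model, train_loader, train_one_epoch, score_alpaca_eval, max_epochs=10):
    history, best = [], float("-inf")
    for epoch in range(1, max_epochs + 1):
        train_one_epoch(model, train_loader)   # one pass over the instruction-tuning data
        score = score_alpaca_eval(model)       # e.g. AlpacaEval win rate against a reference model
        history.append((epoch, score))
        print(f"epoch {epoch}: AlpacaEval = {score:.2f}")
        if score <= best:                      # stop once the score stops improving
            break
        best = score
    return history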
Achal Dave@achalddave·
Excited to share our new-and-improved 1B models trained with DataComp-LM!
- 1.4B model trained on 4.3T tokens
- 5-shot MMLU 47.5 (base model) => 51.4 (w/ instruction tuning)
- Fully open models: public code, weights, dataset!
[image attached]
3 replies · 29 reposts · 114 likes · 30.6K views
Kushal Arora retweeted
Sachin Grover@sachingrover·
I am looking for positions in LLM-based agents and in combining planning and learning techniques/systems. I have around 2.5 years of industry research experience, including two years at @PARCinc as a research scientist and multiple summer internships at @amazon @alexa99. 1/6
3 replies · 5 reposts · 16 likes · 5.8K views
Kushal Arora retweeted
Thomas Kollar@tkollar·
Building language models is difficult and requires high-quality preprocessing, modeling, evaluation, and large-scale training. As significant collaborators on this project at TRI, we see the resulting 7B model, DCLM-7B, as a major achievement: it competes with Mistral 7B and LLaMA-7B even though it was trained on less data, and it's fully open. And that's just the start of the competition. Excited to see how others leverage these results to build even more capable language models and improve dataset quality.
Vaishaal Shankar@Vaishaal

I am really excited to introduce DataComp for Language Models (DCLM), our new testbed for controlled dataset experiments aimed at improving language models. 1/x

1 reply · 1 repost · 3 likes · 884 views
Kushal Arora@karora4u·
One thing I have come to greatly appreciate over the last year is the role of data filtering in building SOTA language models. DCLM introduces a 240T-token corpus, a pipeline for filtering and building new datasets, and an open-source 7B model that is competitive with Llama 3 while trained on 2-6x fewer tokens.
Vaishaal Shankar@Vaishaal

I am really excited to introduce DataComp for Language Models (DCLM), our new testbed for controlled dataset experiments aimed at improving language models. 1/x

1 reply · 0 reposts · 6 likes · 735 views
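As a rough illustration of the classifier-based filtering step used in pipelines like DCLM's, one common pattern is to score each crawled document with a fastText quality classifier and keep only high-scoring pages. The model path, label name, and threshold below are placeholders, not the actual DCLM artifacts.

import fasttext  # pip install fasttext

# Placeholder classifier; a real pipeline trains its own quality model and
# tunes the keep-threshold on held-out data.
model = fasttext.load_model("quality_classifier.bin")

def keep_document(text: str, threshold: float = 0.5) -> bool:
    # fastText predict() expects a single line, so strip newlines first.
    labels, probs = model.predict(text.replace("\n", " "), k=2)
    score = dict(zip(labels, probs)).get("__label__high_quality", 0.0)
    return score >= threshold

corpus = ["first crawled page ...", "second crawled page ..."]
filtered = [doc for doc in corpus if keep_document(doc)]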
Kushal Arora retweeted
Aran Komatsuzaki@arankomatsuzaki·
DataComp-LM: In search of the next generation of training sets for language models
- Provides a corpus of 240T tokens from Common Crawl
- Trains an LM on their filtered dataset that performs similarly on NLU tasks w/ 6.6x less compute than Llama 3 8B
proj: datacomp.ai/dclm/
abs: arxiv.org/abs/2406.11794
[image attached]
1 reply · 42 reposts · 202 likes · 33.9K views
Kushal Arora retweeted
Achal Dave@achalddave·
Check out DataComp for language models! Open data, open code, open training recipe, and close to Llama3-8B performance. This has been a labor of love over the last year, a huge thanks to all the collaborators for helping make this happen!
Vaishaal Shankar@Vaishaal

I am really excited to introduce DataComp for Language Models (DCLM), our new testbed for controlled dataset experiments aimed at improving language models. 1/x

1 reply · 10 reposts · 27 likes · 4.4K views
Kushal Arora retweeted
Vaishaal Shankar@Vaishaal·
I am really excited to introduce DataComp for Language Models (DCLM), our new testbed for controlled dataset experiments aimed at improving language models. 1/x
[image attached]
7 replies · 79 reposts · 274 likes · 120.1K views
Kushal Arora@karora4u·
Sedrick is an amazing researcher who, over the last year, has done excellent work on pre-training, scaling, evaluation, Japanese LMs, code models, VLMs, and more. If you are at NAACL, do grab a coffee with him!
Sedrick Keh@sedrickkeh2

I'm attending #NAACL2024 at Mexico City this week! Excited to chat about pre-training, evaluation, and multimodality! (also excited for🌮🌯🫔)

0 replies · 1 repost · 6 likes · 673 views
Kushal Arora retweeted
Sedrick Keh@sedrickkeh2·
Recurrent models like RWKV and Mamba have gained attention recently, but these can be costly to train and iterate on. What if we could simply... turn Mistral/Llama/Gemma into an RNN? 🎩🪄 Presenting our work, Linearizing Large Language Models! arxiv.org/abs/2405.06640
4 replies · 32 reposts · 165 likes · 19.6K views
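For context on what "turning a transformer into an RNN" means mechanically, here is a sketch of the generic linear-attention recurrence that such linearizations build on (not necessarily the exact recipe in the paper): with a feature map phi in place of softmax, the attention context is carried in a fixed-size state instead of a growing KV cache. The feature map and dimensions below are illustrative.

import numpy as np

def phi(x):
    # Simple positive feature map standing in for softmax; real methods choose this carefully.
    return np.maximum(x, 0.0) + 1e-6

def linear_attention_rnn(Q, K, V):
    # Q, K, V: (T, d) per-token queries/keys/values for one attention head.
    d_k, d_v = K.shape[1], V.shape[1]
    S = np.zeros((d_k, d_v))   # running sum of phi(k_t) v_t^T
    z = np.zeros(d_k)          # running normalizer, sum of phi(k_t)
    outputs = []
    for q_t, k_t, v_t in zip(Q, K, V):  # one token at a time: constant-size state, like an RNN
        S += np.outer(phi(k_t), v_t)
        z += phi(k_t)
        outputs.append(phi(q_t) @ S / (phi(q_t) @ z + 1e-6))
    return np.stack(outputs)

T, d = 5, 8
out = linear_attention_rnn(np.random.randn(T, d), np.random.randn(T, d), np.random.randn(T, d))
print(out.shape)  # (5, 8): same output shape as causal attention over the sequence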