Arjun Virk
@virkvarjun
756 posts

I research ML for generalist robots and share my learnings here | Researcher @UCLA's BAIR Lab | Software Engineering @UWaterloo

SF + Waterloo · Joined September 2023
1.2K Following · 1.1K Followers

Pinned Tweet
Arjun Virk
Arjun Virk@virkvarjun·
Day 9/30: Learning RL in Robotics

Today, I'm proud to release my new research: FAACT (Failure-Aware Action Chunking Transformer). Through lots of testing and design, I built an embedding space that lets my model recognize failure states and exploit ACT to self-correct its motion.

I demoed this today at @newsystems_ to @join_ef and @KrishivThakuria, @thenadsusanto. More to come in the next few days at @socraticainfo...
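The idea in the pinned tweet — an embedding space in which the model can recognize failure states — can be sketched as a small failure-prediction head over a policy's latent embeddings. Everything below is a toy stand-in under assumed names and dimensions; it is not FAACT's actual architecture or training code.

```python
import numpy as np

rng = np.random.default_rng(0)

obs_dim, latent_dim = 14, 32
# Hypothetical frozen encoder standing in for the policy's embedding space
W_enc = rng.normal(size=(obs_dim, latent_dim)) / np.sqrt(obs_dim)

def encode(obs):
    # Toy stand-in for the policy's latent embedding of an observation
    return np.tanh(obs @ W_enc)

def failure_risk(z, w, b):
    # Sigmoid head: estimated probability of failure in the near future
    return 1.0 / (1.0 + np.exp(-(z @ w + b)))

# Synthetic rollout data with a toy "failure" label (purely illustrative)
X = rng.normal(size=(256, obs_dim))
y = (X[:, 0] > 0).astype(float)
Z = encode(X)

# Fit the head with plain logistic-regression gradient descent
w, b, lr = np.zeros(latent_dim), 0.0, 0.5
for _ in range(200):
    p = failure_risk(Z, w, b)
    w -= lr * (Z.T @ (p - y)) / len(y)
    b -= lr * (p - y).mean()

acc = ((failure_risk(Z, w, b) > 0.5) == y).mean()
```

A head like this gives the policy a scalar risk signal per timestep, which is what a self-correction step can then act on.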
Anton
Anton@notavp·
Waterloo I am in you
[2 images]
Jibraan
Jibraan@KadriJibraan·
hosting an intimate dinner on saturday in waterloo (few spots only) bringing together exceptional founders, builders, and creators. leave a comment if you want an invite. co-hosted by a16z speedrun, headstarter, and @powelldotst
Arjun Virk reposted
Physical Intelligence
Physical Intelligence@physical_int·
We developed an RL method for fine-tuning our models for precise tasks in just a few hours or even minutes. Instead of training the whole model, we add an “RL token” output to π-0.6, our latest model, which is used by a tiny actor and critic to learn quickly with RL.
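The structure Physical Intelligence describes — a frozen large model emitting an extra "RL token" that only a tiny actor and critic read — can be sketched as follows. All names, dimensions, and the frozen backbone are illustrative assumptions, not π-0.6's actual design.

```python
import numpy as np

rng = np.random.default_rng(1)

TOKEN_DIM, ACTION_DIM, HIDDEN, OBS_DIM = 64, 7, 32, 10

# Frozen backbone stand-in: emits the extra "RL token" feature vector.
# In the real system this would be the large pretrained policy, left untouched.
W_frozen = rng.normal(size=(OBS_DIM, TOKEN_DIM)) / np.sqrt(OBS_DIM)

def rl_token(obs):
    return np.tanh(obs @ W_frozen)

def mlp(x, W1, b1, W2, b2):
    # Tiny ReLU MLP head
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2

def init_head(out_dim):
    # A few thousand parameters — only these would be trained with RL
    return (rng.normal(size=(TOKEN_DIM, HIDDEN)) * 0.1, np.zeros(HIDDEN),
            rng.normal(size=(HIDDEN, out_dim)) * 0.1, np.zeros(out_dim))

actor = init_head(ACTION_DIM)   # token -> action
critic = init_head(1)           # token -> scalar value estimate

obs = rng.normal(size=OBS_DIM)
token = rl_token(obs)
action = mlp(token, *actor)
value = mlp(token, *critic)
```

Because gradients only flow through the two small heads, RL updates are cheap, which is consistent with the "hours or even minutes" claim in the tweet.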
Arjun Virk
Arjun Virk@virkvarjun·
Day 11/30: Learning RL in Robotics

I built FAACT (Failure-Aware Action Chunking Transformer) as a wrapper around @physical_int's pi0 to identify and self-correct failures with minimal training (100K steps). The hypothesis held: FAACT completed 20% more successful trials than pi0 alone! This is evidence that FAACT can improve robotics results when randomness is involved, compared to other frameworks. Check out the demo below in-sim with ALOHA.
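The self-correction loop described in this thread — sample N perturbed action chunks, score each for failure risk, and execute the lowest-risk one — can be sketched like this. The risk function here is a toy stand-in, not FAACT's learned predictor, and the chunk shape is taken from the Day 10 post (100 steps × 14 action dims).

```python
import numpy as np

rng = np.random.default_rng(2)

CHUNK = (100, 14)   # action chunk size from the thread: 100 steps x 14 dims

def toy_risk(chunk):
    # Stand-in for the learned failure predictor: penalize large actions
    return float(np.square(chunk).mean())

def self_correct(nominal, n_candidates=16, noise=0.05):
    # Keep the nominal chunk in the candidate set, add N perturbed copies,
    # and pick whichever candidate the risk model scores lowest.
    candidates = [nominal] + [
        nominal + rng.normal(scale=noise, size=nominal.shape)
        for _ in range(n_candidates)
    ]
    risks = [toy_risk(c) for c in candidates]
    best = int(np.argmin(risks))
    return candidates[best], risks[best]

nominal = rng.normal(scale=0.1, size=CHUNK)
best_chunk, best_risk = self_correct(nominal)
```

Including the nominal chunk among the candidates guarantees the intervention never selects something the risk model considers worse than the original plan.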
theobot
theobot@Theonash_·
who wants free merch at waterloo?
Rodney
Rodney@992rodney·
Who wants to meet up with me and @0xTraderJoes @emm4x3 at Symposium! We will buy you Ice capps and Lazeez with venture capital dollars
Adib
Adib@adibvafa·
I'll be in Waterloo this weekend for Socratica! Anyone around for coffee? Would love to chat
[image]
chris
chris@chrislevan·
who's going to @socraticainfo next week? want to meet some folks!
Jerry Jiang
Jerry Jiang@TheMingjie·
Your next Waterloo intern is already here
You just didn't know it yet
Arjun Virk@virkvarjun
[quoted tweet: the Day 10/30 FAACT technical overview, which appears in full below]
Arjun Virk
Arjun Virk@virkvarjun·
Day 10/30: Learning RL in Robotics

As soon as I saw @karpathy's autoresearch, I knew I had to test it on my research, FAACT: Failure-Aware Action Chunking Transformer. While it's running, I'll share the technical overview:

Failure predictor: AUROC ~0.95
Embedding dimension: 512-d (decoder mean), 32-d (latent)
Action chunk size: 100 × 14
Failure horizon K: 10 steps
Dataset: 30 episodes, 60% success
Intervention candidates: N perturbed chunks, pick the lowest risk

Tomorrow, I'll share the results, which should hopefully bring up the current 75% accuracy.
Andrej Karpathy@karpathy

Three days ago I left autoresearch tuning nanochat for ~2 days on depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement), this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference.

I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project. This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc etc. This is the bread and butter of what I do daily for 2 decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real", I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things e.g.:

- It noticed an oversight that my parameterless QKnorm didn't have a scaler multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (i forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism. github.com/karpathy/nanoc…

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges. And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.

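The first bullet in Karpathy's list — parameterless QK-norm leaving attention "too diffuse" until a scale multiplier is attached — can be illustrated in a few lines. The shapes and softmax layout below are illustrative, not nanochat's actual code: with unit-normalized q and k, logits are cosine similarities confined to [-1, 1], so the softmax is nearly uniform; a learnable scale widens the logit range and sharpens the distribution.

```python
import numpy as np

rng = np.random.default_rng(3)

def l2norm(x, eps=1e-6):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def attn_weights(q, k, scale=1.0):
    # Parameterless QK-norm: logits are cosine similarities in [-1, 1].
    # `scale` plays the role of the learnable sharpening multiplier.
    logits = scale * (l2norm(q) @ l2norm(k).T)
    logits -= logits.max(axis=-1, keepdims=True)   # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=-1, keepdims=True)

def entropy(w):
    # Mean per-query entropy of the attention distribution
    return float(-(w * np.log(w + 1e-12)).sum(axis=-1).mean())

q = rng.normal(size=(4, 16))   # 4 queries, head dim 16
k = rng.normal(size=(8, 16))   # 8 keys

ent_diffuse = entropy(attn_weights(q, k, scale=1.0))   # near-uniform
ent_sharp = entropy(attn_weights(q, k, scale=12.0))    # concentrated
```

The entropy drop from `ent_diffuse` to `ent_sharp` is exactly the "sharpening" effect the multiplier buys.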
Krishiv
Krishiv@KrishivThakuria·
Toronto Demo Night was a dream come true

There's something different about getting to see all your friends from Canada in 1 packed room while watching some of the most inspiring demos ever

I'm so grateful that Canada showed up last night
[4 images]
Arjun Virk reposted