Greg Tarr

178 posts

@Greg_Tarr

ai researcher, cto @markovrobotics

Joined September 2019
1.8K Following · 1.8K Followers
Greg Tarr@Greg_Tarr·
@BenAybar @agi_inc The approach certainly does. We have the best computer use, mobile use, and web agents in the world.
Greg Tarr@Greg_Tarr·
we @agi_inc just achieved 76.3% on the OSWorld benchmark taking the #1 spot from ByteDance (53.1%)
Greg Tarr@Greg_Tarr·
@BenAybar @agi_inc I'm not sure that's public information right now. I can tell you that it doesn't matter enormously which base model one chooses as long as you have really good data and RL.
Ben Aybar 🦑🧲@BenAybar·
@Greg_Tarr @agi_inc Cool, interesting. I saw that training used a base reasoning model. Can you say what that base reasoning model was?
Greg Tarr@Greg_Tarr·
@BenAybar @agi_inc Yes, it's a general model. It can perform coding tasks, but it's not optimized for programming - tools like Codex or Claude are much faster and better suited for that. However, it can still assist in areas like automated UI testing when used alongside those specialised agents!
Ben Aybar 🦑🧲@BenAybar·
@Greg_Tarr @agi_inc Is this a general model? Can it perform coding tasks and whatever as well? It's listed that way on the Osworld-verified leaderboard but just want to confirm
Taco Cohen@TacoCohen·
The two main issues with GRPO:
1) No credit assignment, unless you do rollouts from each state (VinePPO-style), which is super expensive.
2) Doing multiple rollouts from the same state requires state resetting / copying capabilities. This is fine for question answering and simple virtual environments, but quickly becomes unrealistic in more complex envs. Even exactly cloning a running docker container (including the state of running processes, etc.) is nearly impossible.
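The credit-assignment issue can be seen in a minimal sketch of GRPO's group-relative advantage (names and reward values are illustrative): each rollout from the same start state gets one scalar advantage, so good and bad steps inside a rollout are never distinguished.

```python
import statistics

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantage: normalize each rollout's scalar reward
    against the other rollouts sampled from the *same* start state.
    Every token of a rollout then shares this single advantage, which is
    the "no credit assignment" problem: step quality within a rollout is
    never distinguished."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four rollouts from one prompt: two succeed, two fail.
advs = grpo_advantages([1.0, 0.0, 0.0, 1.0])
# Successful rollouts get ~+1 advantage, failed ones ~-1,
# regardless of which individual actions were good or bad.
```

Note this also shows why issue 2 bites: computing the group statistics requires several rollouts from one state, hence the need to reset or clone the environment.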
Greg Tarr@Greg_Tarr·
As promised here's a TPA implementation in PyTorch w/ KV cache: github.com/Greg-Tarr/tpa-… Didn't reference the authors' code so it might deviate a bit. Next step is to add a kernel to compute attn scores without materializing a ⊗ b in memory
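The idea behind that kernel can be sketched in a few lines of PyTorch. Assuming (toy shapes, not the paper's exact parameterization) that each token's per-head query and key are sums of R rank-one factors, the per-head attention score reduces to a contraction through a small R×R Gram matrix of the B factors, so the full head vectors never need to be materialized.

```python
import torch

R, H, d = 2, 4, 8  # rank, heads, head dim (toy sizes)
aq, bq = torch.randn(R, H), torch.randn(R, d)   # factors of one query token
ak, bk = torch.randn(R, H), torch.randn(R, d)   # factors of one key token

# Naive: materialize the full per-head vectors q = aq^T bq, k = ak^T bk
q = torch.einsum('rh,rd->hd', aq, bq)
k = torch.einsum('rh,rd->hd', ak, bk)
scores_naive = (q * k).sum(-1)                  # one score per head

# Factorized: contract the small R x R Gram matrix of the B factors first,
# never forming the H x d outer products
gram = bq @ bk.T                                # (R, R)
scores_fact = torch.einsum('rh,rs,sh->h', aq, gram, ak)
```

With R much smaller than H and d, the Gram-matrix path does far less work and memory traffic, which is presumably what a fused kernel would exploit.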
Greg Tarr@Greg_Tarr·
TPA (arxiv.org/pdf/2501.06425) is another banger paper. It has better (10x) KV cache compression than MLA ~and~ it's RoPE compatible. Haven't read the repo yet as I want to do a blind implementation tomorrow but I hear the authors are working on a kernel that computes attn scores without materializing QKV (directly from their factorized forms).
Greg Tarr@Greg_Tarr·
Titans (arxiv/2501.00663) is an all-round great paper. It reads almost like a blog post: probes prior research, asks pertinent questions, and naturally leads to a few elegant architectures that perform really well! I'd have missed it if not for gh/lucidrains as I haven't seen anyone mentioning it here.
(((ل()(ل() 'yoav))))👾
i was annoyed at having many chrome tabs with PDF papers having uninformative titles, so i created a small chrome extension to fix it. i've been using it for a while now; works well. today i put it on github. enjoy. github.com/yoavg/pdf-tab-…
Greg Tarr@Greg_Tarr·
@scaling01 We’ll unlock better latent reasoning by lifting some unnecessary burdens we’ve been placing on our models. Best I can think of is dense MLPs vs Memory+ layers. All those flops can go back to world modelling!
Lisan al Gaib@scaling01·
Don't you fucking dare make both options 50-50 😭 I know that scaling TTC is much better economically and probably also better for model performance. It's the easier path. But I think getting much deeper and larger models might also unlock better latent reasoning.
Lisan al Gaib@scaling01·
I'm so irrational about scaling laws. I would love to see 100 trillion parameters before we see 1 million token CoTs. What scaling are you more excited about?
Greg Tarr@Greg_Tarr·
@0xluffy @scaling01 It’s more important to deeply understand the landmark papers (>3m old and still influential) than it is to skim >10 bleeding-edge papers every day. Pick a landmark paper, set full reimplementation as your project for the week, and read as much as you can in the downtime.
luffy@0xluffy·
how do you guys keep up with frontier research rn? is there an easier way (than arxiv) to find all these papers or is there a knowledge graph that aggregates it? rn only following people like @scaling01 closely
Greg Tarr@Greg_Tarr·
@shxf0072 @teortaxesTex @hwchung27 Structure tends to be temporary. I remember going from handwritten features to one-shot YOLO, and we’ll go from post-model MCTS to intra-model unstructured search spaces. It’s just a consequence of identifying a problem and hacking a quick solution on top.
Joey (e/λ)@shxf0072·
@teortaxesTex @hwchung27 fixing problems, instead of letting the model learn to fix problems, is structure. wild guess, but i think it's the same story with o1: orca/step-by-step-solving-like datasets added human CoT structure to LMs. let them think free
Greg Tarr@Greg_Tarr·
@gwern @jxmnop yeah, with the same training set the models will converge, performing similarly (and also platonic representation hypo)
dr. jack morris@jxmnop·
startup idea: Ramanujan AI
premise: humans all have similar brain structures, but only one in a billion is a true genius
hypothesis: maybe this is true for LLMs too
Step 1: train a billion 7B llama models from scratch w/ random initializations
Step 2: search through to find the Ramanujan LLM
Step 3: ???
Step 4: profit
Greg Tarr@Greg_Tarr·
@_arohan_ I use runpod too (not for notebooks). If you write a script to ssh and setup everything (e.g. git creds, clone, download buckets or whatever) you’ll save so much time. Still a pain to setup every morning - was thinking of making a dev containers equivalent for my projects
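The kind of bootstrap script described above might look like the sketch below. The host argument, git identity, repo URL, and install steps are all hypothetical placeholders, not Greg's actual setup.

```shell
#!/usr/bin/env bash
# One-shot setup for a fresh cloud GPU box (e.g. a runpod pod).
# Usage: ./bootstrap.sh "root@1.2.3.4 -p 12345"   (connection string from the console)
set -euo pipefail

HOST="${1:?usage: bootstrap.sh <ssh-target>}"

ssh $HOST 'bash -s' <<'EOF'
set -euo pipefail
# git credentials + clone (placeholder identity and repo)
git config --global user.name  "me"
git config --global user.email "me@example.com"
git clone https://github.com/me/project.git || true
# download datasets / buckets and install deps
cd project && pip install -r requirements.txt
EOF
```

A dev-containers-style equivalent would essentially bake these same steps into an image so the morning setup disappears entirely.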
Greg Tarr@Greg_Tarr·
@Springcoil Living between York (gf's uni) and Ireland - 90/10. Planning to apply to an AI lab next year which will probably bring me back to London!
Greg Tarr@Greg_Tarr·
@finbarrtimbers unfortunately discovered this by training 100-800m param models thinking my MTP implementation was faulty. Turns out 1B+ is where models prepare +1 token context and MTP starts paying off. Mitigated slightly by sequential MTP modules and scaling loss contrib. during training
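The mitigation mentioned above (sequential MTP modules with a scaled loss contribution) can be sketched roughly as follows. The tiny sizes, tanh trunk, and geometric decay are my own illustrative choices, not the paper's recipe: each small module transforms the hidden state once more and predicts one token further ahead, with far-ahead heads contributing progressively less to the loss.

```python
import torch
import torch.nn as nn

class MTPHeads(nn.Module):
    """Toy multi-token-prediction tail: k sequential modules, the i-th
    predicting the token i steps ahead, with geometrically decayed
    loss weights so distant predictions contribute less."""
    def __init__(self, d_model=32, vocab=100, k=3, decay=0.5):
        super().__init__()
        self.trunks = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(k))
        self.heads = nn.ModuleList(nn.Linear(d_model, vocab) for _ in range(k))
        self.decay = decay

    def forward(self, h, targets):
        # h: (batch, seq, d_model) hidden states; targets: (batch, seq) token ids
        loss, w = 0.0, 1.0
        for i, (trunk, head) in enumerate(zip(self.trunks, self.heads), start=1):
            h = torch.tanh(trunk(h))            # sequential module for the +i-th token
            logits = head(h[:, :-i])            # positions that have a token i ahead
            loss = loss + w * nn.functional.cross_entropy(
                logits.reshape(-1, logits.size(-1)),
                targets[:, i:].reshape(-1))
            w *= self.decay                     # scale down far-ahead contributions
        return loss

# Toy usage: batch of 2, sequence of 10 hidden states
mtp = MTPHeads()
loss = mtp(torch.randn(2, 10, 32), torch.randint(0, 100, (2, 10)))
```

The decayed weights are one way to keep the auxiliary far-ahead objectives from drowning out next-token prediction on small models, where the extra heads otherwise don't pay off.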
finbarr@finbarrtimbers·
there must be many similar results that weren't discovered bc research stopped due to poor results on small models
finbarr@finbarrtimbers·
I'm reading the multi-token prediction paper (which is great, writeup coming soon) and one of the troublesome results from the paper is that their results are much better with larger models