Greg Tarr

178 posts

@Greg_Tarr

ai researcher, cto @markovrobotics

Joined September 2019
1.8K Following · 1.8K Followers
Greg Tarr@Greg_Tarr·
@BenAybar @agi_inc The approach certainly does. We have the best computer use, mobile use, and web agents in the world.
Greg Tarr@Greg_Tarr·
we @agi_inc just achieved 76.3% on the OSWorld benchmark taking the #1 spot from ByteDance (53.1%)
Greg Tarr@Greg_Tarr·
@BenAybar @agi_inc I'm not sure that's public information right now. I can tell you that it doesn't matter enormously which base model one chooses as long as you have really good data and RL.
Ben Aybar 🦑🧲@BenAybar·
@Greg_Tarr @agi_inc Cool, interesting. I saw that training used a base reasoning model. Can you say what that base reasoning model was?
Greg Tarr@Greg_Tarr·
@BenAybar @agi_inc Yes, it's a general model. It can perform coding tasks, but it's not optimized for programming - tools like Codex or Claude are much faster and better suited for that. However, it can still assist in areas like automated UI testing when used alongside those specialised agents!
Ben Aybar 🦑🧲@BenAybar·
@Greg_Tarr @agi_inc Is this a general model? Can it perform coding tasks and whatever as well? It's listed that way on the Osworld-verified leaderboard but just want to confirm
Taco Cohen@TacoCohen·
The two main issues with GRPO:
1) No credit assignment, unless you do rollouts from each state (VinePPO-style), which is super expensive.
2) Doing multiple rollouts from the same state requires state resetting / copying capabilities. This is fine for question answering and simple virtual environments, but quickly becomes unrealistic in more complex envs. Even exactly cloning a running docker container (including the state of running processes, etc.) is nearly impossible.
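The credit-assignment issue can be seen in a minimal sketch of GRPO's group-relative advantage (names and reward values are illustrative): each rollout from the same start state gets one scalar advantage, so good and bad steps inside a rollout are never distinguished.

```python
import statistics

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantage: normalize each rollout's scalar reward
    against the other rollouts sampled from the *same* start state.
    Every token of a rollout then shares this single advantage, which is
    the "no credit assignment" problem: step quality within a rollout is
    never distinguished."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four rollouts from one prompt: two succeed, two fail.
advs = grpo_advantages([1.0, 0.0, 0.0, 1.0])
# Successful rollouts get ~+1 advantage, failed ones ~-1,
# regardless of which individual actions were good or bad.
```

Note this also shows why issue 2 bites: computing the group statistics requires several rollouts from one state, hence the need to reset or clone the environment.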
Greg Tarr@Greg_Tarr·
As promised here's a TPA implementation in PyTorch w/ KV cache: github.com/Greg-Tarr/tpa-… Didn't reference the authors' code so it might deviate a bit. Next step is to add a kernel to compute attn scores without materializing a ⊗ b in memory
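The idea behind that kernel can be sketched in a few lines of PyTorch. Assuming (toy shapes, not the paper's exact parameterization) that each token's per-head query and key are sums of R rank-one factors, the per-head attention score reduces to a contraction through a small R×R Gram matrix of the B factors, so the full head vectors never need to be materialized.

```python
import torch

R, H, d = 2, 4, 8  # rank, heads, head dim (toy sizes)
aq, bq = torch.randn(R, H), torch.randn(R, d)   # factors of one query token
ak, bk = torch.randn(R, H), torch.randn(R, d)   # factors of one key token

# Naive: materialize the full per-head vectors q = aq^T bq, k = ak^T bk
q = torch.einsum('rh,rd->hd', aq, bq)
k = torch.einsum('rh,rd->hd', ak, bk)
scores_naive = (q * k).sum(-1)                  # one score per head

# Factorized: contract the small R x R Gram matrix of the B factors first,
# never forming the H x d outer products
gram = bq @ bk.T                                # (R, R)
scores_fact = torch.einsum('rh,rs,sh->h', aq, gram, ak)
```

With R much smaller than H and d, the Gram-matrix path does far less work and memory traffic, which is presumably what a fused kernel would exploit.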
Greg Tarr@Greg_Tarr·
TPA (arxiv.org/pdf/2501.06425) is another banger paper. It has better (10x) KV cache compression than MLA ~and~ it's RoPE compatible. Haven't read the repo yet as I want to do a blind implementation tomorrow but I hear the authors are working on a kernel that computes attn scores without materializing QKV (directly from their factorized forms).
Greg Tarr@Greg_Tarr·
Titans (arxiv/2501.00663) is an all-round great paper. It reads almost like a blog post: probes prior research, asks pertinent questions, and naturally leads to a few elegant architectures that perform really well! I'd have missed it if not for gh/lucidrains as I haven't seen anyone mentioning it here.
(((ل()(ل() 'yoav))))👾
i was annoyed at having many chrome tabs with PDF papers having uninformative titles, so i created a small chrome extension to fix it. i've been using it for a while now; works well. today i put it on github. enjoy. github.com/yoavg/pdf-tab-…
Greg Tarr@Greg_Tarr·
@scaling01 We’ll unlock better latent reasoning by lifting some unnecessary burdens we’ve been placing on our models. Best I can think of is dense MLPs vs Memory+ layers. All those flops can go back to world modelling!
Lisan al Gaib@scaling01·
Don't you fucking dare make both options 50-50 😭 I know that scaling TTC is much better economically and probably also better for model performance. It's the easier path. But I think getting much deeper and larger models might also unlock better latent reasoning.
Lisan al Gaib@scaling01·
I'm so irrational about scaling laws. I would love to see 100 trillion parameters before we see 1 million token CoTs. What scaling are you more excited about?
Greg Tarr@Greg_Tarr·
@0xluffy @scaling01 It’s more important to deeply understand the landmark papers (>3m old and still influential) than it is to skim >10 bleeding-edge papers every day. Pick a landmark paper, set full reimplementation as your project for the week, and read as much as you can in the downtime.
luffy@0xluffy·
how do you guys keep up with frontier research rn? is there an easier way (than arxiv) to find all these papers or is there a knowledge graph that aggregates it? rn only following people like @scaling01 closely
Greg Tarr@Greg_Tarr·
@shxf0072 @teortaxesTex @hwchung27 Structure tends to be temporary. I remember going from handwritten features to one-shot YOLO, and we’ll go from post-model MCTS to intra-model unstructured search spaces. It’s just a consequence of identifying a problem and hacking a quick solution on top.
Joey (e/λ)@shxf0072·
@teortaxesTex @hwchung27 fixing problems, instead of letting the model learn to fix problems, is structure. wild guess, but i think it's the same story with o1: orca/step-by-step-solving-like datasets added human CoT structure to LMs. let them think free
Greg Tarr@Greg_Tarr·
@gwern @jxmnop yeah, with the same training set the models will converge, performing similarly (and also platonic representation hypo)
dr. jack morris@jxmnop·
startup idea: Ramanujan AI
premise: humans all have similar brain structures, but only one in a billion is a true genius
hypothesis: maybe this is true for LLMs too
Step 1: train a billion 7B llama models from scratch w/ random initializations
Step 2: search through to find the Ramanujan LLM
Step 3: ???
Step 4: profit
Greg Tarr@Greg_Tarr·
@_arohan_ I use runpod too (not for notebooks). If you write a script to ssh and setup everything (e.g. git creds, clone, download buckets or whatever) you’ll save so much time. Still a pain to setup every morning - was thinking of making a dev containers equivalent for my projects
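The kind of bootstrap script described above might look like the sketch below. The host argument, git identity, repo URL, and install steps are all hypothetical placeholders, not Greg's actual setup.

```shell
#!/usr/bin/env bash
# One-shot setup for a fresh cloud GPU box (e.g. a runpod pod).
# Usage: ./bootstrap.sh "root@1.2.3.4 -p 12345"   (connection string from the console)
set -euo pipefail

HOST="${1:?usage: bootstrap.sh <ssh-target>}"

ssh $HOST 'bash -s' <<'EOF'
set -euo pipefail
# git credentials + clone (placeholder identity and repo)
git config --global user.name  "me"
git config --global user.email "me@example.com"
git clone https://github.com/me/project.git || true
# download datasets / buckets and install deps
cd project && pip install -r requirements.txt
EOF
```

A dev-containers-style equivalent would essentially bake these same steps into an image so the morning setup disappears entirely.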
Greg Tarr@Greg_Tarr·
@Springcoil Living between York (gf's uni) and Ireland - 90/10. Planning to apply to an AI lab next year which will probably bring me back to London!
Greg Tarr@Greg_Tarr·
@finbarrtimbers unfortunately discovered this by training 100-800m param models thinking my MTP implementation was faulty. Turns out 1B+ is where models prepare +1 token context and MTP starts paying off. Mitigated slightly by sequential MTP modules and scaling loss contrib. during training
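The mitigation mentioned above (sequential MTP modules with a scaled loss contribution) can be sketched roughly as follows. The tiny sizes, tanh trunk, and geometric decay are my own illustrative choices, not the paper's recipe: each small module transforms the hidden state once more and predicts one token further ahead, with far-ahead heads contributing progressively less to the loss.

```python
import torch
import torch.nn as nn

class MTPHeads(nn.Module):
    """Toy multi-token-prediction tail: k sequential modules, the i-th
    predicting the token i steps ahead, with geometrically decayed
    loss weights so distant predictions contribute less."""
    def __init__(self, d_model=32, vocab=100, k=3, decay=0.5):
        super().__init__()
        self.trunks = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(k))
        self.heads = nn.ModuleList(nn.Linear(d_model, vocab) for _ in range(k))
        self.decay = decay

    def forward(self, h, targets):
        # h: (batch, seq, d_model) hidden states; targets: (batch, seq) token ids
        loss, w = 0.0, 1.0
        for i, (trunk, head) in enumerate(zip(self.trunks, self.heads), start=1):
            h = torch.tanh(trunk(h))            # sequential module for the +i-th token
            logits = head(h[:, :-i])            # positions that have a token i ahead
            loss = loss + w * nn.functional.cross_entropy(
                logits.reshape(-1, logits.size(-1)),
                targets[:, i:].reshape(-1))
            w *= self.decay                     # scale down far-ahead contributions
        return loss

# Toy usage: batch of 2, sequence of 10 hidden states
mtp = MTPHeads()
loss = mtp(torch.randn(2, 10, 32), torch.randint(0, 100, (2, 10)))
```

The decayed weights are one way to keep the auxiliary far-ahead objectives from drowning out next-token prediction on small models, where the extra heads otherwise don't pay off.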
finbarr@finbarrtimbers·
there must be many similar results that weren't discovered bc research stopped due to poor results on small models
finbarr@finbarrtimbers·
I'm reading the multi-token prediction paper (which is great, writeup coming soon) and one of the troublesome results from the paper is that their results are much better with larger models