Alex Dimakis

4.5K posts

@AlexGDimakis

Professor, UC Berkeley | Founder @bespokelabsai

Berkeley, CA · Joined April 2009
2.6K Following · 23.1K Followers
Bespoke Labs@bespokelabsai·
We are excited to welcome Avinash Arjavalingam as a Member of Technical Staff at Bespoke Labs. In his previous role, Avinash was a Software Engineer at LinkedIn, working on their Relational Databases team. He also holds Master's and Bachelor's degrees in computer science from UC Berkeley, specializing in databases and distributed systems.
Alex Dimakis@AlexGDimakis·
Sorry for mentioning papers I'm involved in, but the Datacomp projects focused on making data curation a first-class citizen. Each one took about a year: 1. Datacomp for multimodal CLIP data, 2. Datacomp for language models (DCLM) for pretraining data curation, 3. OpenThoughts (Datacomp for reasoning post-training), and 4. OpenThoughts-Agent (ongoing) for Terminal-Bench RL environments. datacomp.ai
(((ل()(ل() 'yoav))))👾
The big dilemma with teaching an "LLM course" is that it is really easy to get drawn into teaching the various technical things like efficiency tricks, attention variants, PPO vs GRPO, etc. But the real "meat" is not there; it is in the data: data for pre-training, for mid-training, for SFT, for RL, and for "reasoning"; synthetic data, curated data, annotated data... cleaning, evaluating, improving, mixing... lots of stuff.

But "data" is so much harder to teach: it is not "mathematical" or "algorithmic" like the technical things, and it is not clear what the teachable thing there is. It is also a lot less transparent than the technical topics, both because it is semi-secret and because it is not appealing for publishing, for roughly the same reasons it is not appealing for teaching.

So, what would you teach about data? What are the key lessons and insights one should know? Any good papers or resources? Good existing classes? Blogs? Hit me with what you have.
Alex Dimakis reposted
Lakshya A Agrawal@LakshyAAAgrawal·
Thrilled to present GEPA as an Oral Talk and Poster at ICLR 2026 this Friday in Rio! 🇧🇷
Apr 24: Oral Session 3A (Agents), 10:30 AM BRT, Amphitheater; Poster Session 4, 3:15 PM, Pavilion 3
x.com/LakshyAAAgrawa…
Let's recap what's happened since we released GEPA last year 🧵
Lakshya A Agrawal@LakshyAAAgrawal

How does prompt optimization compare to RL algos like GRPO? GRPO needs 1000s of rollouts, but humans can learn from a few trials—by reflecting on what worked & what didn't. Meet GEPA: a reflective prompt optimizer that can outperform GRPO by up to 20% with 35x fewer rollouts!🧵

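For readers curious about the mechanics, below is a minimal sketch of a reflective prompt-optimization loop in the spirit of what the quoted tweet describes. Everything in it is illustrative: `call_llm` is a placeholder for a model client, and `evaluate`, `reflect_and_revise`, and `optimize` are hypothetical names, not the GEPA codebase.

```python
# Minimal sketch of a reflective prompt-optimization loop. All names are
# illustrative; `call_llm` stands in for whatever model client you use.

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call."""
    raise NotImplementedError("wire up a model client here")

def evaluate(prompt: str, examples: list[tuple[str, str]]) -> tuple[float, list[str]]:
    """Score a candidate prompt on a few examples, keeping failure traces."""
    failures, correct = [], 0
    for question, expected in examples:
        answer = call_llm(f"{prompt}\n\nQ: {question}\nA:")
        if expected.lower() in answer.lower():
            correct += 1
        else:
            failures.append(f"Q: {question}\nExpected: {expected}\nGot: {answer}")
    return correct / len(examples), failures

def reflect_and_revise(prompt: str, failures: list[str]) -> str:
    """Ask the model to reflect on the failures and propose a better prompt."""
    return call_llm(
        "You are improving an instruction prompt.\n"
        f"Current prompt:\n{prompt}\n\n"
        "It failed on these cases:\n" + "\n---\n".join(failures) +
        "\n\nExplain what went wrong, then output only the improved prompt."
    )

def optimize(seed_prompt: str, examples, budget: int = 8) -> str:
    """Hill-climb on prompts: evaluate, reflect on failures, keep the best."""
    best_prompt, best_score = seed_prompt, -1.0
    candidate = seed_prompt
    for _ in range(budget):  # each round costs a handful of rollouts, not 1000s
        score, failures = evaluate(candidate, examples)
        if score > best_score:
            best_prompt, best_score = candidate, score
        if not failures:
            break
        candidate = reflect_and_revise(candidate, failures)
    return best_prompt
```

The contrast with GRPO drawn in the quoted tweet falls out of the loop structure: each round spends a handful of evaluation rollouts plus one reflection call, rather than thousands of policy-gradient rollouts.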
Alex Dimakis reposted
Sara Hooker@sarahookr·
A really excellent book. A few people independently told me this was one of their favorite books over a decade ago. I bought it, and it became one of the textbooks on my shelf that I revisit from time to time to spark the joy of holding ideas to a different light. It brings to life the elegance of information theory. A good day to mark ten years since the passing of David MacKay.
Alex Dimakis@AlexGDimakis·
Check out our cool new workshop on Agents, Discovery, and Optimization: CAIS AI Agents for Discovery in the Wild. We have a pretty good speaker lineup. Submit your papers by May 1st. (1/2)
Alex Dimakis@AlexGDimakis·
@erichorvitz You're welcome, Eric. I'm very happy that Microsoft continues to openly share top research with the world and the scientific community.
Danny Wallace@maestroalvarez·
@AlexGDimakis @claudeai Wait, so the advisor can be an open-weights model you fine-tune yourself? That completely changes the economics for solo builders. You get the quality boost without paying for Opus on every call. How well does that personalized advice transfer across task types?
Claude@claudeai·
We're bringing the advisor strategy to the Claude Platform. Pair Opus as an advisor with Sonnet or Haiku as an executor, and get near Opus-level intelligence in your agents at a fraction of the cost.
Alex Dimakis@AlexGDimakis·
Great post, and thanks for highlighting the connection. There are two design choices: 1. Is the advisor model stronger or weaker than the base model? 2. When do you get the advice? Does the base model ask for advice (the advisor is a tool), or does the advisor actively inject advice into the base model's context?

For 1: It's clear that a Haiku base model can benefit from advice from Opus. That is still useful because it saves expensive tokens, and it's great that Anthropic implemented this in production. Our paper studies the more interesting case where the advisor is trained to *personalize* the big model. The advisor can be trained with RL (or SFT), as we do, which lets you personalize a black-box model like Haiku or Opus. It's a new way of finetuning small models to collaborate with frontier models, for personalization, increasing engagement, etc.

For 2: It's unclear whether the advice should be requested by the base model or injected by the advisor; it depends on the use case. Both can make sense in different applications.
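A minimal sketch of the injection pattern (design choice 2, second option) may help make this concrete. The names below are hypothetical, not the paper's actual code; only the advisor is assumed trainable.

```python
# Hypothetical sketch of the "advisor injects advice" pattern: a small
# trained advisor writes per-instance guidance that is prepended to a
# frozen black-box model's prompt. Names are illustrative only.

def advisor_generate(task: str) -> str:
    """Small open-weights advisor (the only model that is ever trained)."""
    raise NotImplementedError("call your fine-tuned advisor model here")

def base_model(prompt: str) -> str:
    """Frozen black-box model; its weights are never touched."""
    raise NotImplementedError("call the black-box model here")

def solve_with_advisor(task: str) -> str:
    advice = advisor_generate(task)  # personalized, per-instance advice
    prompt = f"Advice from a specialized advisor:\n{advice}\n\nTask:\n{task}"
    return base_model(prompt)
```

In the RL setup described here, the task reward earned by the base model's output serves as the training signal for the advisor, which is what lets you effectively "finetune" a model you can only call through an API.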
Akshay 🚀@akshay_pachaar·
this is one of the most important ideas in AI right now, and it just got two independent validations.

yesterday, Anthropic shipped an "advisor tool" in the Claude API that lets Sonnet or Haiku consult Opus mid-task, only when the executor needs help. the benefit is straightforward: you get near Opus-level intelligence on the hard decisions while paying Sonnet or Haiku rates for everything else. frontier reasoning only kicks in when it's actually needed, not on every token.

back in February, UC Berkeley published a paper called "Advisor Models" that trains a small 7B model with RL to generate per-instance advice for a frozen black-box model. same idea, two very different implementations.

the paper's approach: take Qwen2.5 7B, train it with GRPO to generate natural language advice, and inject that advice into the prompt of a black-box model. the black-box model never changes; the advisor learns what to say to make it perform better. GPT-5 scores 31.2% on a tax-filing benchmark; add the trained advisor and it jumps to 53.6%. on SWE agent tasks, a trained advisor cuts Gemini 3 Pro's steps from 31.7 to 26.3 while keeping the same resolve rate. training is cheap too: you train with GPT-4o Mini, then swap in GPT-5 at inference. the advisor even transfers across families: a GPT-trained advisor improves Claude 4.5 Sonnet.

Anthropic's advisor tool takes a different path to the same idea. Sonnet runs as executor and handles tools and iteration. when it hits something it can't resolve, it consults Opus, gets a plan or correction, and continues. Sonnet with Opus as advisor gained 2.7 points on SWE-bench Multilingual over Sonnet alone, while costing 11.9% less per task. Haiku with Opus scored 41.2% on BrowseComp, more than double its solo 19.7%. it's a one-line API change: advisor tokens bill at Opus rates, and the advisor typically generates only 400-700 tokens per call, so blended cost stays well below running Opus end-to-end.

both approaches point at the same thing: you don't need the most powerful model on every token. you need it at the right moments, for the right inputs.

Paper: arxiv.org/abs/2510.02453
Code: github.com/az1326/advisor…
Claude@claudeai

We're bringing the advisor strategy to the Claude Platform. Pair Opus as an advisor with Sonnet or Haiku as an executor, and get near Opus-level intelligence in your agents at a fraction of the cost.

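To make the contrast concrete, here is an illustrative sketch of the executor-consults-advisor loop described in the thread. The tool schema and function names are assumptions, not Anthropic's actual API surface.

```python
# Illustrative sketch of the "advisor as a tool" pattern: a cheap executor
# model runs the task and calls a stronger advisor only when it decides it
# is stuck. All names and the action schema here are hypothetical.

def executor_step(context: str) -> dict:
    """Cheap executor model decides the next action; may ask for help."""
    raise NotImplementedError("call the executor model here")

def advisor_consult(question: str) -> str:
    """Stronger advisor answers a focused question (a few hundred tokens)."""
    raise NotImplementedError("call the advisor model here")

def run_agent(task: str, max_steps: int = 50) -> str:
    context = task
    for _ in range(max_steps):
        action = executor_step(context)
        if action["type"] == "consult_advisor":
            # frontier reasoning only on the hard decisions, not every token
            context += "\nAdvisor: " + advisor_consult(action["question"])
        elif action["type"] == "finish":
            return action["answer"]
        else:  # ordinary tool use stays with the cheap executor
            context += "\n" + action.get("observation", "")
    return context  # ran out of steps
```

The cost profile in the thread follows from this structure: advisor tokens are expensive but rare, so the blended rate stays close to the executor's.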
Alex Dimakis@AlexGDimakis·
Very excited that Microsoft is using our dataset OpenThoughts and summarizing it in this super-clever way to make reasoning much more efficient. Most people do not appreciate how verbose reasoning models can get: they often produce 30k tokens to answer one math question (that is 1/3 of a Harry Potter novel). In earlier research, we tried summarizing the reasoning traces and SFTing on that, but it killed reasoning performance. Microsoft used many clever tricks to break the reasoning traces into pieces (with dynamic programming!) and summarize them separately into self-contained nuggets they call mementos. They release these compactified reasoning traces in a new dataset called OpenMementos that is 6 times more compact on average. Very cool work on efficient reasoning.
Eric Horvitz@erichorvitz

A core dimension of intelligence is learning how to optimize learning and thinking under constraints of architecture, compute, and data resources. There are numerous challenges to solve in the pursuit of such “bounded optimality.” One question and opportunity is: “What should be remembered and recalled?” We’ve just published our paper on one piece of the memory challenge—on the effective compression of test-time reflection to reduce the size of context while keeping an eye on the coherence of the string of contextual memory. Read more here about our Memento project. Enjoyed the collaboration! @MSFTResearch @vkontonis @DimitrisPapail

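As a toy illustration of the segment-then-summarize idea Alex describes, the sketch below cuts a trace into pieces with a small dynamic program and compresses each piece separately. The cost function and the summarizer are stand-ins, not the actual Memento method.

```python
# Toy sketch: DP segmentation of a long reasoning trace into pieces,
# each summarized independently. Cost function and summarizer are fake.

def segment_trace(steps: list[str], max_len: int = 5) -> list[list[str]]:
    """DP over cut points: minimize total cost of grouping the steps."""
    n = len(steps)
    INF = float("inf")

    def cost(i: int, j: int) -> float:
        # Stand-in cost: fixed price per piece plus a penalty for long
        # pieces. A real system would score how self-contained a span is.
        return 1.0 + 0.1 * (j - i) ** 2

    best = [0.0] + [INF] * n   # best[j] = min cost to segment steps[:j]
    cut = [0] * (n + 1)        # cut[j] = start of the last piece in steps[:j]
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            if best[i] + cost(i, j) < best[j]:
                best[j], cut[j] = best[i] + cost(i, j), i

    pieces, j = [], n          # walk back through the optimal cut points
    while j > 0:
        pieces.append(steps[cut[j]:j])
        j = cut[j]
    return pieces[::-1]

def summarize(piece: list[str]) -> str:
    """Placeholder: an LLM call would compress each piece into a memento."""
    return f"[memento covering {len(piece)} steps] " + piece[0][:40]

trace = [f"step {k}: partial derivation ..." for k in range(12)]
mementos = [summarize(p) for p in segment_trace(trace)]
print(f"{len(trace)} steps -> {len(mementos)} mementos")
```

The point of using a DP rather than a greedy cut is that the pieces are chosen jointly, so each summary can stand alone without losing the thread of the derivation.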
Alex Dimakis@AlexGDimakis·
The advisor injects advice into the context, prompting the model. The model can ask for this advice, or the advice can be injected periodically. The biggest difference is that we train the advisor to produce personalized advice, whereas Anthropic calls a stronger model as the advisor (Opus advises Haiku). That is a natural way to save tokens from the bigger model, which is great, but our trained advisors can further personalize and be trained with RL.
Emre Coklar@EmreCoklar·
@AlexGDimakis @claudeai Hi Alex, if I remember correctly, the paper's proposed approach was to have the Advisor Model sit between the user input and the working model, while Anthropic's approach is to have the worker call the advisor. Those seem completely different; what am I missing?
Alex Dimakis@AlexGDimakis·
The production implementation is significantly simpler: it shows that a Haiku model can benefit from having a strong advisor (Opus). Our main finding in the paper is that you can further make the advisor an open-weights model (e.g., a Qwen) and train it to give personalized advice.
Danny Wallace@maestroalvarez·
@AlexGDimakis @claudeai Fair ask. Research paves the road; platform teams pave it over and call it a feature. At least the naming stuck. Do you see the production implementation diverging from what the paper laid out, or does it track pretty close?
Alex Dimakis@AlexGDimakis·
@koushik77 @pgasawa Yup, saw it - cool work. We know people can train to the test set. The defense is that when Terminal-Bench 3 comes out, those who overfit will be clearly exposed.