Alex Dimakis

4.5K posts

@AlexGDimakis

Professor, UC Berkeley | Founder @bespokelabsai

Berkeley, CA · Joined April 2009
2.6K Following · 23.1K Followers
Bespoke Labs@bespokelabsai·
We are excited to welcome Avinash Arjavalingam as a Member of Technical Staff at Bespoke Labs. In his previous role, Avinash was a Software Engineer at LinkedIn, working on their Relational Databases team. He also holds Master's and Bachelor's degrees in computer science from UC Berkeley, specializing in databases and distributed systems.
Alex Dimakis@AlexGDimakis·
Sorry for mentioning papers I'm involved in, but the Datacomp projects focused on making data curation a first-class citizen. Each one took about a year: 1. Datacomp for multimodal CLIP data, 2. Datacomp for language models (DCLM) for pretraining data curation, 3. OpenThoughts (Datacomp for reasoning post-training), and 4. OpenThoughts-Agent (ongoing) for Terminal-Bench RL environments. datacomp.ai
(((ل()(ل() 'yoav))))👾
The big dilemma with teaching an "LLM course" is that it is really easy to get drawn into teaching the various technical things like efficiency tricks, attention variants, PPO vs GRPO, etc. But the real "meat" is not there; it is in the data: data for pre-training, for mid-training, for SFT, for RL, and for "reasoning"; synthetic data, curated data, annotated data... cleaning, evaluating, improving, mixing... lots of stuff.

But "data" is so much harder to teach: it is not "mathematical" or "algorithmic" like the technical things, and it is not clear what the teachable thing there is. It is also a lot less transparent than the technical topics, both because it is semi-secret and because it is not appealing for publishing, for roughly the same reasons it is not appealing for teaching.

So, what would you teach about data? What are the key lessons and insights one should know? Any good papers or resources? Good existing classes? Blogs? Hit me with what you have.
Alex Dimakis reposted
Lakshya A Agrawal@LakshyAAAgrawal·
Thrilled to present GEPA as an Oral Talk and Poster at ICLR 2026 this Friday in Rio! 🇧🇷
Apr 24: Oral Session 3A (Agents), 10:30 AM BRT, Amphitheater; Poster Session 4, 3:15 PM, Pavilion 3
x.com/LakshyAAAgrawa…
Let's recap what's happened since we released GEPA last year 🧵
Lakshya A Agrawal@LakshyAAAgrawal

How does prompt optimization compare to RL algos like GRPO? GRPO needs 1000s of rollouts, but humans can learn from a few trials—by reflecting on what worked & what didn't. Meet GEPA: a reflective prompt optimizer that can outperform GRPO by up to 20% with 35x fewer rollouts!🧵

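For readers curious about the mechanics, below is a minimal sketch of a reflective prompt-optimization loop in the spirit of what the quoted tweet describes. Everything in it is illustrative: `call_llm` is a placeholder for a model client, and `evaluate`, `reflect_and_revise`, and `optimize` are hypothetical names, not the GEPA codebase.

```python
# Minimal sketch of a reflective prompt-optimization loop. All names are
# illustrative; `call_llm` stands in for whatever model client you use.

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call."""
    raise NotImplementedError("wire up a model client here")

def evaluate(prompt: str, examples: list[tuple[str, str]]) -> tuple[float, list[str]]:
    """Score a candidate prompt on a few examples, keeping failure traces."""
    failures, correct = [], 0
    for question, expected in examples:
        answer = call_llm(f"{prompt}\n\nQ: {question}\nA:")
        if expected.lower() in answer.lower():
            correct += 1
        else:
            failures.append(f"Q: {question}\nExpected: {expected}\nGot: {answer}")
    return correct / len(examples), failures

def reflect_and_revise(prompt: str, failures: list[str]) -> str:
    """Ask the model to reflect on the failures and propose a better prompt."""
    return call_llm(
        "You are improving an instruction prompt.\n"
        f"Current prompt:\n{prompt}\n\n"
        "It failed on these cases:\n" + "\n---\n".join(failures) +
        "\n\nExplain what went wrong, then output only the improved prompt."
    )

def optimize(seed_prompt: str, examples, budget: int = 8) -> str:
    """Hill-climb on prompts: evaluate, reflect on failures, keep the best."""
    best_prompt, best_score = seed_prompt, -1.0
    candidate = seed_prompt
    for _ in range(budget):  # each round costs a handful of rollouts, not 1000s
        score, failures = evaluate(candidate, examples)
        if score > best_score:
            best_prompt, best_score = candidate, score
        if not failures:
            break
        candidate = reflect_and_revise(candidate, failures)
    return best_prompt
```

The contrast with GRPO drawn in the quoted tweet falls out of the loop structure: each round spends a handful of evaluation rollouts plus one reflection call, rather than thousands of policy-gradient rollouts.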
Alex Dimakis reposted
Sara Hooker@sarahookr·
A really excellent book. A few people independently told me this was one of their favorite books over a decade ago. I bought it, and it became one of the textbooks on my shelf that I revisit from time to time to spark the joy of holding ideas to a different light. It brings to life the elegance of information theory. A good day to mark ten years since the passing of David MacKay.
Alex Dimakis@AlexGDimakis·
Check out our cool new workshop on Agents, Discovery, and Optimization: CAIS AI Agents for Discovery in the Wild. We have a pretty good speaker lineup. Submit your papers by May 1st. (1/2)
Alex Dimakis@AlexGDimakis·
@erichorvitz You're welcome, Eric. I'm very happy that Microsoft continues to openly share top research with the world and the scientific community.
Danny Wallace@maestroalvarez·
@AlexGDimakis @claudeai Wait, so the advisor can be an open-weights model you fine-tune yourself? That completely changes the economics for solo builders. You get the quality boost without paying for Opus on every call. How well does that personalized advice transfer across task types?
Claude@claudeai·
We're bringing the advisor strategy to the Claude Platform. Pair Opus as an advisor with Sonnet or Haiku as an executor, and get near Opus-level intelligence in your agents at a fraction of the cost.
Alex Dimakis@AlexGDimakis·
Great post, and thanks for highlighting the connection. There are two design choices: 1. Is the advisor model stronger or weaker than the base model? 2. When do you get the advice? Does the base model ask for advice (the advisor is a tool), or does the advisor actively inject advice into the base model's context?

For 1: It's clear that a Haiku base model can benefit from advice from Opus. That is still useful because it saves expensive tokens, and it's great that Anthropic implemented this in production. Our paper studies the more interesting case where the advisor is trained to *personalize* the big model. The advisor can be trained with RL (or SFT), as we do, which lets you personalize a black-box model like Haiku or Opus. It's a new way of finetuning small models to collaborate with frontier models, for personalization, increasing engagement, etc.

For 2: It's unclear whether the advice should be requested by the base model or injected by the advisor; it depends on the use case. Both can make sense in different applications.
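A minimal sketch of the injection pattern (design choice 2, second option) may help make this concrete. The names below are hypothetical, not the paper's actual code; only the advisor is assumed trainable.

```python
# Hypothetical sketch of the "advisor injects advice" pattern: a small
# trained advisor writes per-instance guidance that is prepended to a
# frozen black-box model's prompt. Names are illustrative only.

def advisor_generate(task: str) -> str:
    """Small open-weights advisor (the only model that is ever trained)."""
    raise NotImplementedError("call your fine-tuned advisor model here")

def base_model(prompt: str) -> str:
    """Frozen black-box model; its weights are never touched."""
    raise NotImplementedError("call the black-box model here")

def solve_with_advisor(task: str) -> str:
    advice = advisor_generate(task)  # personalized, per-instance advice
    prompt = f"Advice from a specialized advisor:\n{advice}\n\nTask:\n{task}"
    return base_model(prompt)
```

In the RL setup described here, the task reward earned by the base model's output serves as the training signal for the advisor, which is what lets you effectively "finetune" a model you can only call through an API.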
Akshay 🚀@akshay_pachaar·
this is one of the most important ideas in AI right now, and it just got two independent validations.

yesterday, Anthropic shipped an "advisor tool" in the Claude API that lets Sonnet or Haiku consult Opus mid-task, only when the executor needs help. the benefit is straightforward: you get near Opus-level intelligence on the hard decisions while paying Sonnet or Haiku rates for everything else. frontier reasoning only kicks in when it's actually needed, not on every token.

back in February, UC Berkeley published a paper called "Advisor Models" that trains a small 7B model with RL to generate per-instance advice for a frozen black-box model. same idea, two very different implementations.

the paper's approach: take Qwen2.5 7B, train it with GRPO to generate natural language advice, and inject that advice into the prompt of a black-box model. the black-box model never changes; the advisor learns what to say to make it perform better. GPT-5 scores 31.2% on a tax-filing benchmark; add the trained advisor and it jumps to 53.6%. on SWE agent tasks, a trained advisor cuts Gemini 3 Pro's steps from 31.7 to 26.3 while keeping the same resolve rate. training is cheap too: you train with GPT-4o Mini, then swap in GPT-5 at inference. the advisor even transfers across families: a GPT-trained advisor improves Claude 4.5 Sonnet.

Anthropic's advisor tool takes a different path to the same idea. Sonnet runs as executor and handles tools and iteration. when it hits something it can't resolve, it consults Opus, gets a plan or correction, and continues. Sonnet with Opus as advisor gained 2.7 points on SWE-bench Multilingual over Sonnet alone, while costing 11.9% less per task. Haiku with Opus scored 41.2% on BrowseComp, more than double its solo 19.7%. it's a one-line API change: advisor tokens bill at Opus rates, and the advisor typically generates only 400-700 tokens per call, so blended cost stays well below running Opus end-to-end.

both approaches point at the same thing: you don't need the most powerful model on every token. you need it at the right moments, for the right inputs.

Paper: arxiv.org/abs/2510.02453
Code: github.com/az1326/advisor…
Claude@claudeai

We're bringing the advisor strategy to the Claude Platform. Pair Opus as an advisor with Sonnet or Haiku as an executor, and get near Opus-level intelligence in your agents at a fraction of the cost.

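To make the contrast concrete, here is an illustrative sketch of the executor-consults-advisor loop described in the thread. The tool schema and function names are assumptions, not Anthropic's actual API surface.

```python
# Illustrative sketch of the "advisor as a tool" pattern: a cheap executor
# model runs the task and calls a stronger advisor only when it decides it
# is stuck. All names and the action schema here are hypothetical.

def executor_step(context: str) -> dict:
    """Cheap executor model decides the next action; may ask for help."""
    raise NotImplementedError("call the executor model here")

def advisor_consult(question: str) -> str:
    """Stronger advisor answers a focused question (a few hundred tokens)."""
    raise NotImplementedError("call the advisor model here")

def run_agent(task: str, max_steps: int = 50) -> str:
    context = task
    for _ in range(max_steps):
        action = executor_step(context)
        if action["type"] == "consult_advisor":
            # frontier reasoning only on the hard decisions, not every token
            context += "\nAdvisor: " + advisor_consult(action["question"])
        elif action["type"] == "finish":
            return action["answer"]
        else:  # ordinary tool use stays with the cheap executor
            context += "\n" + action.get("observation", "")
    return context  # ran out of steps
```

The cost profile in the thread follows from this structure: advisor tokens are expensive but rare, so the blended rate stays close to the executor's.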
Alex Dimakis@AlexGDimakis·
Very excited that Microsoft is using our dataset OpenThoughts and summarizing it in this super-clever way to make reasoning much more efficient. Most people do not appreciate how verbose reasoning models can get: they often produce 30k tokens to answer one math question (that is 1/3 of a Harry Potter novel). In earlier research, we tried summarizing the reasoning traces and SFTing on that, but it killed reasoning performance. Microsoft used many clever tricks to break the reasoning traces into pieces (with dynamic programming!) and summarize them separately into self-contained nuggets they call mementos. They release these compactified reasoning traces in a new dataset called OpenMementos that is 6 times more compact on average. Very cool work on efficient reasoning.
Eric Horvitz@erichorvitz

A core dimension of intelligence is learning how to optimize learning and thinking under constraints of architecture, compute, and data resources. There are numerous challenges to solve in the pursuit of such “bounded optimality.” One question and opportunity is: “What should be remembered and recalled?” We’ve just published our paper on one piece of the memory challenge—on the effective compression of test-time reflection to reduce the size of context while keeping an eye on the coherence of the string of contextual memory. Read more here about our Memento project. Enjoyed the collaboration! @MSFTResearch @vkontonis @DimitrisPapail

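As a toy illustration of the segment-then-summarize idea Alex describes, the sketch below cuts a trace into pieces with a small dynamic program and compresses each piece separately. The cost function and the summarizer are stand-ins, not the actual Memento method.

```python
# Toy sketch: DP segmentation of a long reasoning trace into pieces,
# each summarized independently. Cost function and summarizer are fake.

def segment_trace(steps: list[str], max_len: int = 5) -> list[list[str]]:
    """DP over cut points: minimize total cost of grouping the steps."""
    n = len(steps)
    INF = float("inf")

    def cost(i: int, j: int) -> float:
        # Stand-in cost: fixed price per piece plus a penalty for long
        # pieces. A real system would score how self-contained a span is.
        return 1.0 + 0.1 * (j - i) ** 2

    best = [0.0] + [INF] * n   # best[j] = min cost to segment steps[:j]
    cut = [0] * (n + 1)        # cut[j] = start of the last piece in steps[:j]
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            if best[i] + cost(i, j) < best[j]:
                best[j], cut[j] = best[i] + cost(i, j), i

    pieces, j = [], n          # walk back through the optimal cut points
    while j > 0:
        pieces.append(steps[cut[j]:j])
        j = cut[j]
    return pieces[::-1]

def summarize(piece: list[str]) -> str:
    """Placeholder: an LLM call would compress each piece into a memento."""
    return f"[memento covering {len(piece)} steps] " + piece[0][:40]

trace = [f"step {k}: partial derivation ..." for k in range(12)]
mementos = [summarize(p) for p in segment_trace(trace)]
print(f"{len(trace)} steps -> {len(mementos)} mementos")
```

The point of using a DP rather than a greedy cut is that the pieces are chosen jointly, so each summary can stand alone without losing the thread of the derivation.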
Alex Dimakis@AlexGDimakis·
The advisor injects advice into the context, prompting the model. The model can ask for this advice, or the advice can be injected periodically. The biggest difference is that we train the advisor to produce personalized advice, whereas Anthropic calls a stronger model as the advisor (Opus advises Haiku). That is a natural way to save tokens from the bigger model, which is great, but our trained advisors can further personalize and be trained with RL.
Emre Coklar@EmreCoklar·
@AlexGDimakis @claudeai Hi Alex, if I remember correctly, the paper's proposed approach was to have the Advisor Model sit between the user input and the working model, while Anthropic's approach is to have the worker call the advisor. Those seem completely different; what am I missing?
Alex Dimakis@AlexGDimakis·
The production implementation is significantly simpler: it shows that a Haiku model can benefit from having a strong advisor (Opus). Our main finding in the paper is that you can further make the advisor an open-weights model (e.g., a Qwen) and train it to give personalized advice.
Danny Wallace@maestroalvarez·
@AlexGDimakis @claudeai Fair ask. Research paves the road; platform teams pave it over and call it a feature. At least the naming stuck. Do you see the production implementation diverging from what the paper laid out, or does it track pretty close?
Alex Dimakis@AlexGDimakis·
@koushik77 @pgasawa Yup, saw it - cool work. We know people can train to the test set. The defense is that when Terminal-Bench 3 comes out, those who overfit will be clearly exposed.