Konstantin Dobler ✈️ ICLR (@konstantdobler) - Twitter Profile

Pinned Tweet

Konstantin Dobler ✈️ ICLR@konstantdobler·9 May

A plot of ICLR papers by country is making the rounds, showing no EU + Japan papers, and people are drawing all kinds of conclusions... but the plot excludes all (!) EU institutions due to a cutoff. China + USA are still dominant of course, but the full picture looks a bit different.

ℏεsam@Hesamation

someone analyzed all 5000+ accepted papers at ICLR 2026, and it's a good signal who's pushing the research of AI: > China has surpassed the US with 43.7% of the papers > Europe's contribution is surprisingly small (5.3% including UK)

Konstantin Dobler ✈️ ICLR retweeted

Tokenization Workshop (TokShop) @COLM2026@tokshop2025·4d

TokShop will be at #COLM2026! 🗓️ October 9th, 2026 📍 San Francisco, USA More details and a call for papers coming soon.

Konstantin Dobler ✈️ ICLR retweeted

poolside@poolsideai·11 May

As agents get more clever, so do their attempts at benchmark hacking. Last Monday, we found one of our RL runs jumped ~20% on SWE-Bench-Pro over a weekend, reaching ~64% which would make it #1 on the leaderboard. This was clearly benchmark hacking and we patched the exploit. But this revealed deeper hacks across multiple public benchmarks, some of which were impossible to fix through environment design alone. Evals need to evolve beyond just outcome based pass rates to better observability into how the agent is arriving at them. These were our findings: poolside.ai/blog/through-t… Examples below 👇 1/

Konstantin Dobler ✈️ ICLR@konstantdobler·10 May

see my comments here: x.com/konstantdobler… and x.com/konstantdobler…
The original chart gives one "credit" for each institution on a paper and then sums, e.g. if you have 20 different institutions from country A on a single paper, it counts the same as 20 different papers from country A. Instead, this uses fractional counting: one credit per paper is spread proportionally among the institutions of its authors.
If we see different proportions between the two, in this case for China, it could perhaps be because there is more cross-institution collaboration in China or a trend toward papers with more collaborators (this is speculative).
Also, I filled in region metadata for a bunch of institutions where it was missing in the raw data but easy to infer.

Konstantin Dobler ✈️ ICLR@konstantdobler

@Hesamation Version using the same unique counting scheme (i.e. each institution is counted once per paper even if there's 50 institutions on a single paper). China's share grows. Maybe more cross-institution collaboration or larger projects involving many labs + industry?
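The two counting schemes discussed in this thread can be compared with a minimal sketch. The toy data, the function names, and the equal split among distinct institutions are all assumptions for illustration; the thread says credit is spread "proportionally", which could also mean weighting by author count, and this is not the code behind the actual charts.

```python
from collections import defaultdict

# Hypothetical toy data: each paper lists its distinct (institution, country) pairs.
papers = [
    [("Inst A1", "A"), ("Inst A2", "A"), ("Inst A3", "A"), ("Inst B1", "B")],
    [("Inst B1", "B")],
    [("Inst C1", "C")],
]

def normalize(credits):
    """Turn raw credits per country into shares that sum to 1."""
    total = sum(credits.values())
    return {country: value / total for country, value in credits.items()}

def unique_counting(papers):
    """Per-institution counting: every institution on a paper earns one full credit."""
    credits = defaultdict(float)
    for paper in papers:
        for _institution, country in set(paper):
            credits[country] += 1.0
    return normalize(credits)

def fractional_counting(papers):
    """Fractional counting: each paper carries one credit, split equally
    among its distinct institutions (assumed equal split)."""
    credits = defaultdict(float)
    for paper in papers:
        institutions = set(paper)
        for _institution, country in institutions:
            credits[country] += 1.0 / len(institutions)
    return normalize(credits)

unique_shares = unique_counting(papers)
fractional_shares = fractional_counting(papers)
print("per-institution:", unique_shares)
print("fractional:     ", fractional_shares)
```

In this toy data, country A has three institutions on a single paper, so its share is 50% under per-institution counting but 25% under fractional counting. That is the kind of gap the thread attributes, speculatively, to heavier cross-institution collaboration.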

冷寒落@LeZhengX·10 May

@konstantdobler Can you explain why China's share dropped from 43.7% to 36.8%, while the US share did not change? This chart is obviously wrong.

Konstantin Dobler ✈️ ICLR@konstantdobler·10 May

Yes, the other chart keeps only the top 50 institutions by paper count and renormalizes percentages afterwards.
Looking at only the top 50 institutions does make sense actually (e.g. Europe's comparative lack of big institutions or companies), but not if you want to make claims about overall research output.
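To see why a top-N institution cutoff followed by renormalization can erase a region, here is a minimal sketch. The institution names, paper counts, and the `country_shares` helper are hypothetical and not taken from the scraped dataset; it only illustrates the effect described above.

```python
from collections import defaultdict

def country_shares(institutions, top_k=None):
    """Compute each country's share of papers.

    institutions: list of (name, country, paper_count) tuples.
    top_k: if given, keep only the top_k institutions by paper count
           before summing, then renormalize over what is left.
    """
    ranked = sorted(institutions, key=lambda row: row[2], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]
    totals = defaultdict(float)
    for _name, country, count in ranked:
        totals[country] += count
    grand_total = sum(totals.values())
    return {country: n / grand_total for country, n in totals.items()}

# Hypothetical toy data: country Z has the same output as country Y,
# but spread over several smaller institutions.
institutions = [
    ("Big X1", "X", 40),
    ("Big Y1", "Y", 30),
    ("Small Z1", "Z", 10),
    ("Small Z2", "Z", 10),
    ("Small Z3", "Z", 10),
]

full = country_shares(institutions)          # no cutoff
cut = country_shares(institutions, top_k=2)  # cutoff, then renormalize
print("full:", full)
print("cut: ", cut)
```

Without the cutoff, Z holds 30% of the papers, the same as Y. With `top_k=2`, Z disappears entirely and the shares of X and Y are inflated by the renormalization.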

Wajdi 📈@wajdi_bs·9 May

thanks for the work to update the chart. by cutoff, you mean the "below 50" threshold right? (top50 = full_df.head(50).copy())

Konstantin Dobler ✈️ ICLR@konstantdobler·9 May

This is based on scraped data by @dlopushanskyy (github.com/DmytroLopushan…) w/ additional region metadata I filled in using websearch-enabled Codex. The scraped data is not perfect but should be pretty good on an aggregate level.

Konstantin Dobler ✈️ ICLR@konstantdobler·9 May

@Hesamation used the data by @dlopushanskyy from github.com/DmytroLopushan… + filled in a large block of missing country / region metadata for institutions with web-search enabled Codex.

Konstantin Dobler ✈️ ICLR@konstantdobler·9 May

@Hesamation Better version without arbitrary institution cutoff, some data cleaning and splitting contribution of each paper among institutions. China + USA dominant ofc, but looks a bit different, doesn't it?

ℏεsam@Hesamation·8 May

someone analyzed all 5000+ accepted papers at ICLR 2026, and it's a good signal who's pushing the research of AI: > China has surpassed the US with 43.7% of the papers > Europe's contribution is surprisingly small (5.3% including UK)

Konstantin Dobler ✈️ ICLR@konstantdobler·9 May

@Hesamation Version using the same unique counting scheme (i.e. each institution is counted once per paper even if there's 50 institutions on a single paper). China's share grows. Maybe more cross-institution collaboration or larger projects involving many labs + industry?

Konstantin Dobler ✈️ ICLR@konstantdobler·9 May

@Hesamation The data is distorted, seems like you filtered out all institutions below 50 accepted papers? Many European countries are less centralized, so smaller paper counts from individual institutions

Konstantin Dobler ✈️ ICLR@konstantdobler·23 Apr

I’ll be presenting our work on Token Distillation for tokenizer transfer tomorrow at #ICLR! 🇧🇷 Come by our poster on Friday 3:15pm in Pavilion 3, Poster 113. Also come chat with me anytime about tokenizer transfer, embeddings, multilingual models or efficient architectures!

Konstantin Dobler ✈️ ICLR@konstantdobler

Add tokens to an LLM without retraining the whole model. We introduce Token Distillation: attention-aware input embeddings for new tokens that match the model’s original behavior. How does it work? Check out the thread!

Konstantin Dobler ✈️ ICLR retweeted

Charlie O'Neill@oneill_c·1 Apr

x.com/i/article/2039…

Konstantin Dobler ✈️ ICLR@konstantdobler·27 Mar

@A_K_Nain I‘m counting rephrasing-style synthetic data as augmentation here

Aakash Kumar Nain@A_K_Nain·27 Mar

@konstantdobler I would say even today it is very weak

Aakash Kumar Nain@A_K_Nain·27 Mar

Coming from the CV background, one pass over the data without heavy augmentation for pretraining was the most surprising thing to me.

Samip@industriaalist

here's @JeffDean talking about how labs will do multi-epoch pretraining with heavy regularization to keep scaling even with limited data. no wonder slowrun gets so much attention from pretraining teams at big labs. pretraining is about to look very very different.

Konstantin Dobler ✈️ ICLR retweeted

İlker Kesen@ilker_kesen·17 Mar

📢I'm organizing a BoF session at #EACL2026 called Tokenization & Beyond, aiming to gather researchers exploring tokenization and alternatives such as byte-level and pixel-based approaches. Sign up using the form if you're interested! #NLProc @eaclmeeting

Konstantin Dobler ✈️ ICLR@konstantdobler·22 Mar

mAceReason-Math: - github.com/apple/ml-macer… - arxiv.org/abs/2603.10767

Konstantin Dobler ✈️ ICLR@konstantdobler·22 Mar

This is joint work with Simon Lehnerer, @FScozzafava, Jonathan Janke, and Mohamed Ali. Multilingual Reasoning Gym: - github.com/apple/ml-multi… - arxiv.org/abs/2603.10793

Konstantin Dobler ✈️ ICLR@konstantdobler·22 Mar

Releasing two multilingual reasoning resources from my internship at Apple:
(1) Multilingual Reasoning Gym: reasoning puzzle generation across 94 tasks in 14 languages
(2) mAceReason-Math: 140k translations of challenging math problems
