Maxim Bobrin

14 posts

@maxsbob21

Interests: Optimal Transport, RL, Generative Models. BSc & MSc in Mathematics.

Joined April 2025
120 Following · 38 Followers
Maxim Bobrin@maxsbob21·
@heyrimsha Can we please stop already? There are a lot of services for this kind of thing… The research community needs to heal asap
2 replies · 0 reposts · 0 likes · 307 views
Rimsha Bhardwaj@heyrimsha·
🚨 BREAKING: Someone built an AI that replaces 6 months of literature review with a single workspace. It's called Moara. You dump your research papers in. The AI organizes, analyzes, and synthesizes everything across hundreds of millions of academic sources. You come back and there's a structured literature review waiting. Not a chatbot answer. Not a vague summary. The actual research synthesis. A full research workspace that thinks like a scientist. Works for medicine, engineering, social sciences, law, business, humanities — every discipline.

Here's what it does on its own:
→ Searches hundreds of millions of papers across PubMed, arXiv, Google Scholar, Semantic Scholar, ClinicalTrials.gov, Cochrane, and more in one shot
→ Screens and tags results automatically, pulling out methodologies, findings, and contributions from every paper
→ Lets you chat with your entire library at once, find themes, spot gaps, and compare findings across dozens of papers simultaneously
→ Integrates with Zotero, LibKey, RIS, and BibTeX so your existing research workflow doesn't break
→ Traces every AI insight back to its original source so nothing is hallucinated and everything is verifiable
→ Supports full team collaboration so entire research groups can screen and synthesize together

Try it here: moara.io/?utm_source=rb
10 replies · 107 reposts · 529 likes · 48.4K views
Agentica@agenticasdk·
We scored 36.08% on ARC-AGI-3 in one day using the Agentica SDK.
72 replies · 130 reposts · 1.4K likes · 411.9K views
Maxim Bobrin@maxsbob21·
@KeyTryer You can check several replays and see that Gemini is the only model that tries to understand what to do through its reasoning traces, while the other models just return an action
0 replies · 0 reposts · 1 like · 279 views
Maxim Bobrin@maxsbob21·
@loveofdoing How did you come up with this approach? Some kind of blog post would be helpful
0 replies · 0 reposts · 0 likes · 578 views
loveofdoing@loveofdoing·
316 ARC-AGI tasks solved with zero learning. No neural net, no training, no DSL — just 19th-century projective geometry. Encode grid cell relationships as Plücker lines in P³, find transversals via Schubert calculus, score candidates by geometric incidence. 95% solve rate on the eval set (of non-timeout tasks). Single C file, runs in seconds.
Beff (e/acc)@beffjezos

The masculine urge to try to hack a new solution to ARC-AGI benchmarks

45 replies · 74 reposts · 961 likes · 188.3K views
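The pipeline in the tweet (Plücker lines in P³, transversals, incidence scoring) rests on one classical primitive: two lines in P³ meet iff their reciprocal product vanishes. A minimal Python sketch of those standard formulas (my own illustration, not the author's single C file):

```python
import numpy as np

def plucker(a, b):
    """Plücker coordinates of the line through homogeneous points a, b in P^3.

    Points use the convention (w, x, y, z); output is (l01, l02, l03, l23, l31, l12).
    """
    l = lambda i, j: a[i] * b[j] - a[j] * b[i]
    return np.array([l(0, 1), l(0, 2), l(0, 3), l(2, 3), l(3, 1), l(1, 2)])

def incidence(l, m):
    """Reciprocal product: zero iff the two lines are incident (meet in P^3)."""
    return (l[0] * m[3] + l[1] * m[4] + l[2] * m[5]
            + l[3] * m[0] + l[4] * m[1] + l[5] * m[2])

# The x-axis and y-axis both pass through the origin, so they meet: product is 0.
x_axis = plucker(np.array([1, 0, 0, 0]), np.array([1, 1, 0, 0]))
y_axis = plucker(np.array([1, 0, 0, 0]), np.array([1, 0, 1, 0]))

# The y-axis and a line parallel to x at height z=1 are skew: product is nonzero.
skew = plucker(np.array([1, 0, 0, 1]), np.array([1, 1, 0, 1]))
```

Scoring grid-cell relationships by "geometric incidence" then plausibly amounts to checking how close such reciprocal products are to zero for candidate transversals.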
Maxim Bobrin@maxsbob21·
@ChenTessler It seems there is no method that can do this even in simulation without spending hours on RL for this single level alone
0 replies · 0 reposts · 0 likes · 8 views
Maxim Bobrin@maxsbob21·
@itsolelehmann What is 56%? How is it measured? The agent can basically just find some adversarial solution, improve a metric it chose for itself, and you will never notice. dafuq?
0 replies · 0 reposts · 0 likes · 132 views
Maxim Bobrin@maxsbob21·
@mikeknoop But what about the NetHack challenge? It measures the same skills ARC-AGI-3 aims to test
0 replies · 0 reposts · 0 likes · 56 views
Mike Knoop@mikeknoop·
I am very excited about ARC-AGI-3. It's shaping up to be the best (and only, as far as I know) unsaturated general AI agent benchmark. The team has done great work. Launch next week.
14 replies · 6 reposts · 151 likes · 12.3K views
Maxim Bobrin@maxsbob21·
@deliprao @karpathy As far as I understand, the program.md needs to be different for other domains (e.g. for RL/robotics)
1 reply · 0 reposts · 2 likes · 247 views
Delip Rao e/σ@deliprao·
Annotated Autoresearch: The heart of @karpathy's autoresearch is a single elegantly constructed prompt file -- program.md. I put together this "mini app" as a learning/teaching opportunity. Perhaps AGI is already here, just not evenly prompted. delip.github.io/mini-apps/anno…
[image]
10 replies · 23 reposts · 303 likes · 21.6K views
Maxim Bobrin reposted
Arip@machinestein·
Zero-Shot Off-Policy Learning

Behavioral foundation models are pretrained on large, reward-free transition datasets. At deployment time, they can be "prompted" to infer a policy for a new reward in a zero-shot manner, without any fine-tuning.

This falls under offline or off-policy RL: once the inferred policy is executed, its state-action visitation may diverge from the dataset, leading to distribution shift, value overestimation, and other typical off-policy issues. The missing ingredient is a principled off-policy correction—specifically, stationary occupancy (density-ratio) correction.

In this paper, we show that by using Forward–Backward successor representations, this density-ratio correction can also be performed in a zero-shot manner!

Paper: alphaxiv.org/abs/2602.01962
Code: github.com/machinestein/Z…
2 replies · 37 reposts · 183 likes · 10.8K views
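To picture the claimed zero-shot correction: in the standard forward–backward (FB) factorization, the successor measure of the prompted policy satisfies M^{π_z}(s, a, ds') ≈ F(s, a, z)ᵀ B(s') ρ(ds'), so the inner product F·B plays the role of a density ratio between the policy's occupancy and the data distribution ρ. A toy numpy sketch under that reading (illustrative only; the shapes, names, and clipping are my assumptions, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16   # FB embedding dimension (assumed)
n = 128  # batch of transitions sampled from the offline dataset

F = rng.normal(size=(n, d))  # F(s_i, a_i, z) for one fixed task vector z
B = rng.normal(size=(n, d))  # B(s_i) backward embeddings of the same states

# Zero-shot density-ratio estimate per transition: the FB inner product,
# clipped so it can serve as a nonnegative importance weight.
w = np.clip(np.einsum("nd,nd->n", F, B), 0.0, None)

# Example use: reweight a TD error by w before averaging, i.e. an
# occupancy-corrected off-policy loss.
td_error = rng.normal(size=n)
corrected_loss = np.mean(w * td_error**2)
```

The point of the tweet is that w needs no extra training: it is read off the pretrained F and B for whatever task vector z the model is prompted with.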
Maxim Bobrin@maxsbob21·
@ChenTessler Is this a sample video replicating some motion from the AMASS training set? If so, how was it prompted at inference time? Or was the agent trained with this particular option only?
1 reply · 0 reposts · 1 like · 248 views
Chen Tessler@ChenTessler·
Crazy we can learn to track the entire AMASS dataset in 24h. For MaskedMimic, 1.5 years ago, it took us 2 weeks to train the tracker and another MONTH(!!!) to train the generative model. Now it takes 24h for the tracker and another 24h for the generative model 🤯
Chen Tessler@ChenTessler

20 hours -- 99.93% PPO is all you need. 4 GPUs with 8k envs each. (Slightly better parameters than the current default in ProtoMotions, will update after verifying results are stable)

8 replies · 23 reposts · 290 likes · 29.2K views
Maxim Bobrin reposted
Arip@machinestein·
While we are going back to the era of research… Introducing Deep Improvement Supervision (DIS) – a new learning method for recursive reasoning.

DIS builds on the elegant Tiny Recursive Model (TRM) (@jm_alexia) but makes recursion radically simpler:
- 18× fewer forward passes
- No halting mechanism
- And a tiny 0.8M-parameter model reaching 24% accuracy on ARC-AGI-1 (@arcprize)

Paper: arxiv.org/pdf/2511.16886
Code: github.com/machinestein/D…
[image]
1 reply · 6 reposts · 21 likes · 5.2K views
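"No halting mechanism" means recursion depth becomes a plain fixed loop over one tiny shared network, rather than a learned stopping rule. A hypothetical numpy sketch of that general idea (not the DIS implementation; every name, shape, and nonlinearity here is my own assumption):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 32
# One tiny shared layer, reused at every recursion step (weight tying is
# what keeps the parameter count small).
W = rng.normal(scale=0.1, size=(2 * d, d))

def refine(x, z, steps=4):
    """Apply the same tiny model a fixed number of times, feeding its
    latent answer state z back into itself -- no halting decision."""
    for _ in range(steps):
        z = np.tanh(np.concatenate([x, z]) @ W)
    return z

x = rng.normal(size=d)   # encoded puzzle input (stand-in for an ARC grid)
z = np.zeros(d)          # initial latent answer state
z_out = refine(x, z)
```

Fewer forward passes then simply means a smaller fixed `steps`, traded against how much each pass must improve the latent state.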
Maxim Bobrin reposted
Ilya Zisman@suessmannn·
🔥 Zero-shot generalization is the dream: adapt instantly, no fine-tuning. It's why LLMs blew up—but it's not just a language modeling thing. It’s happening in RL too. 🚨 @maxsbob21's new paper dives deep into zero-shot RL under shifting dynamics—and why current methods break.
[image]
4 replies · 20 reposts · 143 likes · 14.3K views