Petr Baudis

7.2K posts


@xpasky

CTO @RossumAi, AlphaGo baseline pachi, git, elinks & other oss... "The world is awful. The world is much better. The world can be much better."

Prague · Joined August 2008
1.7K Following · 4.2K Followers
Pinned Tweet
Petr Baudis @xpasky
> be @RossumAi
> take all the AI advances we are hyped about here
> gloriously plug them in a ✨B2B SaaS✨
> run AI agents on many Mpages/week
> automate a common business process (transactional paperwork) for 100s enterprises
> fix a menial clerical job that people hated to do
Petr Baudis retweeted
Petr Baudis @xpasky
In NYC next week. Anything I shouldn't miss?
Petr Baudis retweeted
Nick @nickcammarata
i think this meme is hilarious. my take on all this: the point of introspection is to end up thinking less, not more, to be more in the flow, more productive, to dissolve into being itself. if your introspection is making you think more i recommend getting another one
Pablo A. Penietzsche@PabloPeniche

Lisan al Gaib @scaling01
that looks pretty fucking good
Petr Baudis retweeted
Thomas Wolf @Thom_Wolf
This is really cool. It got me thinking more deeply about personalized RL: what's the real point of personalizing a model in a world where base models can become obsolete so quickly? The reality in AI is that new models ship every few weeks, each better than the last, and the pace is only accelerating, as we see on the Hugging Face Hub. We are not far away from better base models dropping daily.

There's a research gap in RL here that almost no one is working on. Most LLM personalization research assumes a fixed base model, but very few ask what happens to that personalization when you swap the base model. Think about going from Llama 3 to Llama 4: all the tuned preferences, reward signals, and LoRAs are suddenly tied to yesterday's model. As a user or a team, you don't want to reteach every new model your preferences, but you also don't want to be stuck on an older one just because it knows you.

We could call this "RL model transferability": how can an RL trace, a reward signal, or a preference representation trained on model N be distilled, stored, and automatically reapplied to model N+1 without too much user involvement? We solved this for SFT, where a training dataset can be stored and reused to train a future model. We also tackled a version of it in RLHF phases to some extent, but it remains unclear more generally for RL deployed in the real world.

There are some related threads (RLTR for transferable reasoning traces, P-RLHF and PREMIUM for model-agnostic user representations, HCP for portable preference protocols), but the full loop seems under-studied to me. Some of these questions are about off-policy learning, but others are about capabilities versus personalization: which of the old customizations/fixes does the new model already handle out of the box, and which ones are genuinely user/team-specific and will never be solved by default? The latter you would store in a skill for now, but RL allows extending beyond the written-guidance level.

I have surely missed some work, so please post any good work you've seen on this topic in the comments.
Ronak Malde@rronak_

This paper is so good I almost didn't want to share it. Ignore the OpenClaw clickbait: OPD + RL on real agentic tasks with significant results is very exciting, and moves us away from needing verifiable rewards.

Authors: @YinjieW2024 Xuyang Chen, Xialong Jin, @MengdiWang10 @LingYang_PU

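A minimal sketch of the storage half of the loop Wolf describes, assuming the personalization signal is kept as model-agnostic preference pairs rather than as weights tied to one base model. The record shape, the JSON store, and the `already_handled` check are illustrative assumptions, not any of the cited methods (RLTR, P-RLHF, PREMIUM, HCP), and the actual re-application step (DPO-style tuning, a reward model, or a written skill) is deliberately left out.

```python
from dataclasses import dataclass, asdict
from typing import Callable, List
import json

@dataclass
class PreferenceRecord:
    """One model-agnostic unit of personalization: what was asked,
    what the user preferred, and what they rejected."""
    prompt: str
    chosen: str
    rejected: str
    source_model: str  # model N the feedback was originally collected on

def save_store(records: List[PreferenceRecord], path: str) -> None:
    # Persist the preferences independently of any base model's weights.
    with open(path, "w") as f:
        json.dump([asdict(r) for r in records], f, indent=2)

def load_store(path: str) -> List[PreferenceRecord]:
    with open(path) as f:
        return [PreferenceRecord(**r) for r in json.load(f)]

def records_to_reapply(
    records: List[PreferenceRecord],
    already_handled: Callable[[PreferenceRecord], bool],
) -> List[PreferenceRecord]:
    """Split the two cases from the tweet: preferences that model N+1 already
    satisfies out of the box are dropped; the genuinely user/team-specific
    ones are kept and handed to whatever preference-tuning step you use."""
    return [r for r in records if not already_handled(r)]

if __name__ == "__main__":
    store = [
        PreferenceRecord("Summarize this invoice", "terse bullet summary",
                         "long narrative summary", source_model="model-N"),
    ]
    save_store(store, "preferences.json")
    # `already_handled` would normally query model N+1 and judge its output;
    # here it is stubbed so the sketch stays self-contained.
    todo = records_to_reapply(load_store("preferences.json"),
                              already_handled=lambda r: False)
    print(f"{len(todo)} preference(s) still need to be re-taught to model N+1")
```

The hard part Wolf points at is exactly the `already_handled` judgment: automatically deciding which stored preferences the new base model already satisfies, and which still have to be re-applied.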
Taelin @VictorTaelin
To elaborate - I never liked the "Linux way":
> ship half-assed software, and let the user modify it
Instead, I always bought the Apple way:
> pay us and we'll give you the best possible defaults
This worked for me, because I wanted to spend my time writing compilers, NOT fixing driver issues. So, when people told me "Pi comes with just the bare essentials and you can add what you want", that definitely did NOT paint a good picture.

But it is different. The time to modify is minimal. "Pi, extend yourself so that I can spawn sub agents in a specific way that works with Bend2's prelude" - one or two prompts later, and it is done. It modifies itself for what I need, and I suddenly have a new tool to help me get things done. It just works. You can't do that with Claude Code and Codex.

That said, I'm still not sure that'll always work. How the hell do I make my Pi browse the web, for example? Seems like the author doesn't want it, it is definitely important, and there's no easy / satisfactory way to add it 🤔
Petr Baudis retweeted
Taelin @VictorTaelin
Ok so I thought that was a dumb gimmick, but now I'm completely sold on how pi is self-modifiable software. It literally knows how to modify itself very cleanly, and that's extremely useful in practice. I'm not using Codex / Claude Code anymore.

Bend2 should definitely be like this! I mean, constructed in a way that AIs can easily navigate it and know how to modify it to add any feature the user wants. Perhaps we're past the era of open source software and into the era of forkable software, where the most hackable project wins?
Lisan al Gaib @scaling01
@xpasky gonna run lisanbench for GPT-5.4, Mistral Small 4 and M2.7 this weekend
Petr Baudis retweeted
OpenRouter @OpenRouter
Stealth Model Reveal: Hunter and Healer Alpha are @XiaomiMiMo MiMo-V2-Pro and MiMo-V2-Omni

Both models are live now on OpenRouter, and free to use in @OpenClaw via the OpenRouter provider for the next week!
fallpeak @_fallpeak
@xpasky @scaling01 I like it, feels a lot like M2.5 but with a bit less of that autistic tendency to take you extremely literally and get to work without asking for clarification. Definitely not frontier-level smart, and it degrades above ~80k context, but it's fast and takes direction well.
François Fleuret @francoisfleuret
1. What are the best open source coding / general purpose models?
2. What hardware to run them comfortably?
3. How do they compare to the flagships?
Petr Baudis @xpasky
@scaling01 doesn't seem to be a good model - at least it's unusable in its current form x.com/xpasky/status/…
Petr Baudis@xpasky

@MiniMax_AI @OpenRouter It's been a long time since I had a (near-)frontier model stop on me like that w/o calling any tools when it said it's going to do something. It's hard to believe it's posttrained so poorly - looks more like the deployment got botched in some way?

Petr Baudis @xpasky
@scaling01 what the heck is this response, it's the first time i see randomly interleaved thinking/text blocks (and i called *lots* of models)
Petr Baudis @xpasky
@MiniMax_AI @OpenRouter It's been a long time since I had a (near-)frontier model stop on me like that w/o calling any tools when it said it's going to do something. It's hard to believe it's posttrained so poorly - looks more like the deployment got botched in some way?
Petr Baudis @xpasky
@MiniMax_AI Hey guys, looks like a nice release, but this kind of interleaving thinking/text blocks in the middle of a sentence on @OpenRouter seems pretty unreasonable (and is breaking API clients - what even *is* the correct behavior, should text blocks be pasted together w/o separators?)
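For context, a defensive client-side workaround is easy to sketch, assuming the response arrives as an ordered list of content blocks, each a dict with a `type` of "thinking" or "text" (the field names are illustrative, not a documented OpenRouter or MiniMax schema). It concatenates text blocks with no separator, on the guess that a block split mid-sentence continues the same sentence; whether that is actually the intended behavior is exactly the open question in the tweet above.

```python
from typing import Dict, List, Tuple

def merge_interleaved_blocks(blocks: List[Dict[str, str]]) -> Tuple[str, str]:
    """Collapse a response whose 'thinking' and 'text' blocks are interleaved
    mid-sentence into (visible_text, reasoning).

    Assumes each block looks like {"type": "thinking"|"text", "content": "..."};
    this shape is an assumption for the sketch, not a documented API contract.
    Text blocks are joined with no separator, reasoning blocks are kept apart.
    """
    text_parts: List[str] = []
    thinking_parts: List[str] = []
    for block in blocks:
        if block.get("type") == "text":
            text_parts.append(block.get("content", ""))
        elif block.get("type") == "thinking":
            thinking_parts.append(block.get("content", ""))
        # Unknown block types are ignored rather than breaking the client.
    return "".join(text_parts), "\n".join(thinking_parts)

if __name__ == "__main__":
    demo = [
        {"type": "text", "content": "I'll start by reading the"},
        {"type": "thinking", "content": "Need to check the repo layout first."},
        {"type": "text", "content": " repository's README."},
    ]
    visible, reasoning = merge_interleaved_blocks(demo)
    print(visible)  # -> "I'll start by reading the repository's README."
```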
MiniMax (official) @MiniMax_AI
Introducing MiniMax-M2.7, our first model which deeply participated in its own evolution, with an 88% win-rate vs M2.5

- Production-Ready SWE: With SOTA performance in SWE-Pro (56.22%) and Terminal Bench 2 (57.0%), M2.7 reduced intervention-to-recovery time for online incidents to 3-min on certain occasions.
- Advanced Agentic Abilities: Trained for Agent Teams and tool search tool, with 97% skill adherence across 40+ complex skills. M2.7 is on par with Sonnet 4.6 in OpenClaw.
- Professional Workspace: SOTA in professional knowledge, supports multi-turn, high-fidelity Office file editing.

MiniMax Agent: agent.minimax.io
API: platform.minimax.io
Token Plan: platform.minimax.io/subscribe/toke…
Petr Baudis @xpasky
sure, web search using ollama, your local model execution tool*

* sure, the web search is all wrapping ollama cloud API, oh, we didn't mention that? well, isn't it kinda obvious for an ollama skill mentioned right next to "local models", silly!
ollama@ollama

Ollama 0.18.1 is here!

🌐 Web search and fetch in OpenClaw
Ollama now ships with a web search and web fetch plugin for OpenClaw. This allows Ollama's models (local or cloud) to search the web for the latest content and news. It also allows OpenClaw with Ollama to fetch the web and extract readable content for processing. This feature does not execute JavaScript.
If you have OpenClaw already running: openclaw plugins install @ollama/openclaw-web-search

🤖 Non-interactive (headless) mode for ollama launch
The ollama launch command can now run in non-interactive mode. This is perfect for:
- Docker/containers: spin up an integration as a pipeline step to run evals, test prompts, or validate model behavior as part of your build. Tear it down when the job ends.
- CI/CD: generate code reviews, security checks, and other tasks within your CI
- Scripts/automation: kick off automated tasks with Ollama and claude code
Try with: ollama launch claude --model kimi-k2.5:cloud --yes -- -p "how does this repository work?"
