Gregor Mitscha-Baude

755 posts

@mitschabaude

Co-founder @zksecurityXYZ Math & crypto. Agentic coding addict. TypeScript magician. Lean4 enthusiast.

Wien · Joined February 2021
1.4K Following · 1.4K Followers
GitHub@github·
It's true: TypeScript surpassed Python and JavaScript to become the most-used language on GitHub. 📈
Gregor Mitscha-Baude@mitschabaude·
@natolambert I think it can be framed as social intelligence, and IMO there should be more benchmarks for that. Claude probably has a huge lead
Nathan Lambert@natolambert·
We need to transition the conversation from Claude being the first company to go all in on code to how they clearly were way ahead on general agent behavior. Could be a bigger deal, as I suspect all the labs will “solve” coding. Not sure what the agent secret sauce is.
Gregor Mitscha-Baude@mitschabaude·
Huge respect to @AnthropicAI for standing up to this
Secretary of War Pete Hegseth@SecWar

This week, Anthropic delivered a master class in arrogance and betrayal as well as a textbook case of how not to do business with the United States Government or the Pentagon. Our position has never wavered and will never waver: the Department of War must have full, unrestricted access to Anthropic’s models for every LAWFUL purpose in defense of the Republic.

Instead, @AnthropicAI and its CEO @DarioAmodei have chosen duplicity. Cloaked in the sanctimonious rhetoric of “effective altruism,” they have attempted to strong-arm the United States military into submission - a cowardly act of corporate virtue-signaling that places Silicon Valley ideology above American lives. The Terms of Service of Anthropic’s defective altruism will never outweigh the safety, the readiness, or the lives of American troops on the battlefield.

Their true objective is unmistakable: to seize veto power over the operational decisions of the United States military. That is unacceptable. As President Trump stated on Truth Social, the Commander-in-Chief and the American people alone will determine the destiny of our armed forces, not unelected tech executives.

Anthropic’s stance is fundamentally incompatible with American principles. Their relationship with the United States Armed Forces and the Federal Government has therefore been permanently altered. In conjunction with the President's directive for the Federal Government to cease all use of Anthropic's technology, I am directing the Department of War to designate Anthropic a Supply-Chain Risk to National Security.

Effective immediately, no contractor, supplier, or partner that does business with the United States military may conduct any commercial activity with Anthropic. Anthropic will continue to provide the Department of War its services for a period of no more than six months to allow for a seamless transition to a better and more patriotic service. America’s warfighters will never be held hostage by the ideological whims of Big Tech.
This decision is final.

Gregor Mitscha-Baude@mitschabaude·
@jxnlco please please please train it to have better social intelligence. sure it's good at coding but it's such a pain to just talk to. it'll constantly misunderstand you or make weird alien judgments. claude opus is light-years better at "getting humans".
Gregor Mitscha-Baude@mitschabaude·
@cryptodavidw I think all of those prompt "hacking" ideas are outdated and/or so random that I wouldn't even bother trying. Just focus on giving good context
David Wong@cryptodavidw·
We don't understand much about improving prompts in some weird and fundamental ways. For example, should you start your prompt with "you are a Rust engineer" or "you are the world's best Rust engineer"? Should you tell the agent to role-play as the world's best Rust engineer, or should you tell it to *be* a Rust engineer? It's not clear. Some people will use prompts like "You've read all the Rust books that exist, ...". Does it work better? I've heard people saying you should insult your agent and it will perform better, but is it really better to start your prompts with "you MOTHER FUCKER you better do good to me this time"? I do it sometimes, feels good, but does it really work? I've also heard things like "if you don't write good Rust code, 10 puppies will die". If I had to choose, I would say this is the most likely to produce good results, although the agent most often doesn't believe you. So yeah, we have no idea.
Gregor Mitscha-Baude@mitschabaude·
now that code is cheap, doesn't someone want to build a good, open Linux-based OS? 😄
Gregor Mitscha-Baude@mitschabaude·
@thsottiaux how to make a more high agency/human-spirit version of codex that's as pleasant to use as a co-worker as claude
Tibo@thsottiaux·
Codex team is fairly distributed, but most of the team is gathering in person over next 48 hours to take a step back and align on what’s next this year. What should we discuss?
Gregor Mitscha-Baude@mitschabaude·
@kimmonismus i have to say i didn't find the interview painful. really well done actually, especially peter's answers. armin wolf is known for a sharp interviewing style, here he was comparatively friendly :D and the questions are also supposed to represent those of the audience
Chubby♨️@kimmonismus·
As most of you know, I'm from Germany, so I was able to watch the entire interview with Peter Steinberger on "Zeit im Bild." It was incredibly painful. Not because of Peter's answers, but because the journalist's questions, typically German-Austrian, almost exclusively revolved around whether we should be afraid of AI, whether data privacy is being respected, what dangers OpenClaw poses, and so on. The hottest topic in the world was talked down. Instead of sparking curiosity and enthusiasm among viewers, the program ultimately only stirred up more anxiety and resentment. A damning indictment of Europe.
Peter Steinberger 🦞@steipete

In the USA most people are enthusiastic. In Europe I get insulted, people scream REGULATION and RESPONSIBILITY. And if I actually build a company here, I get to battle with topics like the investment protection act, employee equity participation, and paralyzing labor regulations. At OAI most people work 6-7 days a week and are paid accordingly. Here, that's illegal.

Thomas Wolf@Thom_Wolf·
Shifting structures in a software world dominated by AI. Some first-order reflections (TL;DR at the end):

Reducing software supply chains, the return of software monoliths – When rewriting code and understanding large foreign codebases becomes cheap, the incentive to rely on deep dependency trees collapses. Writing from scratch ¹ or extracting the relevant parts from another library is far easier when you can simply ask a code agent to handle it, rather than spending countless nights diving into an unfamiliar codebase. The reasons to reduce dependencies are compelling: a smaller attack surface for supply chain threats, smaller packaged software, improved performance, and faster boot times. By leveraging the tireless stamina of LLMs, the dream of coding an entire app from bare-metal considerations all the way up is becoming realistic.

End of the Lindy effect – The Lindy effect holds that things which have been around for a long time are there for good reason and will likely continue to persist. It's related to Chesterton's fence: before removing something, you should first understand why it exists, which means removal always carries a cost. But in a world where software can be developed from first principles and understood by a tireless agent, this logic weakens. Older codebases can be explored at will; long-standing software can be replaced with far less friction. A codebase can be fully rewritten in a new language. ² Legacy software can be carefully studied and updated in situations where humans would have given up long ago. The catch: unknown unknowns remain unknown. The true extent of AI's impact will hinge on whether complete coverage of testing, edge cases, and formal verification is achievable. In an AI-dominated world, formal verification isn't optional—it's essential.

The case for strongly typed languages – Historically, programming language adoption has been driven largely by human psychology and social dynamics. A language's success depended on a mix of factors: individual considerations like being easy to learn and simple to write correctly; community effects like how active and welcoming a community was, which in turn shaped how fast its ecosystem would grow; and fundamental properties like provable correctness, formal verification, and striking the right balance between dynamic and static checks—between the freedom to write anything and the discipline of guarding against edge cases and attacks. As the human factor diminishes, these dynamics will shift. Less dependence on human psychology will favor strongly typed, formally verifiable and/or high performance languages.³ These are often harder for humans to learn, but they're far better suited to LLMs, which thrive on formal verification and reinforcement learning environments. Expect this to reshape which languages dominate.

Economic restructuring of open source – For decades, open-source communities have been built around humans finding connection through writing, learning, and using code together. In a world where most code is written—and perhaps more importantly, read—by machines, these incentives will start to break down.⁴ Communities of AIs building libraries and codebases together will likely emerge as a replacement, but such communities will lack the fundamentally human motivations that have driven open source until now. If the future of open-source development becomes largely devoid of humans, alignment of AI models won't just matter—it will be decisive.

The future of new languages – Will AI agents face the same tradeoffs we do when developing or adopting new programming languages? Expressiveness vs. simplicity, safety vs. control, performance vs. abstraction, compile time vs. runtime, explicitness vs. conciseness. It's unclear that they will. In the long term, the reasons to create a new programming language will likely diverge significantly from the human-driven motivations of the past. There may well be an optimal programming language for LLMs—and there's no reason to assume it will resemble the ones humans have converged on.

TL;DR:
- Monoliths return – cheap rewriting kills dependency trees; smaller attack surface, better performance, bare-metal becomes realistic
- Lindy effect weakens – legacy code loses its moat, but unknown unknowns persist; formal verification becomes essential
- Strongly typed languages rise – human psychology mattered for adoption; now formal verification and RL environments favor types over ergonomics
- Open source restructures – human connection drove the community; AI-written/read code breaks those incentives; alignment becomes decisive
- New languages diverge – AI may not share our tradeoffs; optimal LLM programming languages may look nothing like what humans converged on

¹ x.com/mntruell/statu…
² x.com/anthropicai/st…
³ wesmckinney.com/blog/agent-erg…
⁴ github.com/tailwindlabs/t…
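Wolf's point that formal verification becomes essential once agents write and rewrite most code can be made concrete with a toy Lean 4 sketch (illustrative only, not from the thread): the invariant is stated as a theorem next to the function, so any agent rewrite that breaks it fails to compile rather than slipping through review.

```lean
-- Toy sketch: the invariant ships with the code.
-- If an agent rewrites `double` in a way that breaks evenness,
-- the proof below no longer checks and compilation fails.
def double (n : Nat) : Nat := n + n

theorem double_even (n : Nat) : double n % 2 = 0 := by
  unfold double
  omega
```

The proof is trivially small, but the mechanism scales: the theorem acts as a machine-checked contract that survives arbitrary refactors of the implementation.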
Gregor Mitscha-Baude@mitschabaude·
@levelsio nah it's not, it was intended as a "good first issue" for onboarding people, not as an actual problem to solve
Aakash Harish@0_Aakash_0·
Biggest gap: Codex treats every task like a greenfield problem. In reality, 90% of real dev work is modifying existing code within strict constraints.

What would change everything:
1. Better repo-level context. Let me tell Codex "this folder is sacred, never touch it" or "always match the patterns in /lib/utils"
2. Persistent memory across sessions. Right now each task starts from zero. If I corrected Codex on a style preference yesterday, it should remember today.
3. A "dry run" mode. Show me the plan and file diffs before executing. The biggest trust killer is when it edits 15 files and you have to reverse-engineer what changed.

The model quality is already great. The workflow around the model is where the wins are hiding.
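Point 1 has a partial approximation today: Codex and several other agents read repo-level instruction files such as AGENTS.md. A hypothetical sketch of encoding the "sacred folder" and pattern-matching constraints that way (the file contents below are illustrative, not an official schema):

```markdown
# AGENTS.md — illustrative sketch, not an official schema

## Hard constraints
- Do not modify anything under `legacy/billing/`; propose changes in the plan instead.
- New utilities must match the existing patterns in `lib/utils/`.

## Workflow
- List the files you intend to touch before editing.
- Run the test suite and include its output with every diff.
```

This covers stated constraints but not the other two wishes: persistent memory and a true dry-run mode still need support in the agent itself.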
Tibo@thsottiaux·
What could we do better on Codex? App, model, strategy and features… what’s wrong in how we approach things that we should improve immediately?
Gregor Mitscha-Baude@mitschabaude·
@thsottiaux Model: still bad at writing clean, minimal, elegant code, and keeping a code base long-term maintainable. Introducing (only) the right abstractions, deduplicating and unifying, stuff like that. Feels like the RL is just geared towards immediate problem solving
Gregor Mitscha-Baude@mitschabaude·
I think the ideal programming language for agents will
- look roughly like TS
- transpile to JS
- have dependent types like Lean (plus a good built-in model of mutation/effects), so that we can enforce invariants at will

I'd love to build that language 😄
Armin Ronacher ⇌@mitsuhiko

This weekend I was thinking about programming languages. Programming languages for agents. Will we see them? I believe people will (and should!) try to build some. lucumr.pocoo.org/2026/2/9/a-lan…

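The "enforce invariants at will" idea can be weakly approximated in today's TypeScript with a branded type plus a runtime-checked smart constructor; a minimal sketch (all names hypothetical) of what a dependently typed TS successor would instead discharge at compile time:

```typescript
// Hypothetical sketch: plain TypeScript can only approximate an invariant
// like "n is positive" via a brand plus a runtime check. A dependently
// typed language would prove the check once, at compile time.

type Positive = number & { readonly __brand: "Positive" };

// Smart constructor: the only way to obtain a `Positive`.
function positive(n: number): Positive {
  if (n <= 0) throw new Error(`${n} is not positive`);
  return n as Positive;
}

// Downstream functions state the invariant in their signature.
function sqrtPositive(n: Positive): number {
  return Math.sqrt(n);
}

console.log(sqrtPositive(positive(9))); // 3
// sqrtPositive(-9)  // rejected by the compiler: number is not Positive
```

The brand keeps untrusted numbers out at the type level, but the check itself still happens at runtime, which is exactly the gap dependent types would close.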