Gregor Mitscha-Baude

755 posts

@mitschabaude

Co-founder @zksecurityXYZ Math & crypto. Agentic coding addict. TypeScript magician. Lean4 enthusiast.

Wien · Joined February 2021
1.4K Following · 1.4K Followers
GitHub@github·
It's true: TypeScript surpassed Python and JavaScript to become the most-used language on GitHub. 📈
Gregor Mitscha-Baude@mitschabaude·
@natolambert I think it can be framed as social intelligence, and IMO there should be more benchmarks for that. Claude probably has a huge lead
Nathan Lambert@natolambert·
We need to transition the conversation from Claude being the first company to go all in on code to how they clearly were way ahead on general agent behavior. Could be a bigger deal, as I suspect all the labs will “solve” coding. Not sure what the agent secret sauce is.
Gregor Mitscha-Baude@mitschabaude·
Huge respect to @AnthropicAI for standing up to this
Secretary of War Pete Hegseth@SecWar

This week, Anthropic delivered a master class in arrogance and betrayal as well as a textbook case of how not to do business with the United States Government or the Pentagon. Our position has never wavered and will never waver: the Department of War must have full, unrestricted access to Anthropic’s models for every LAWFUL purpose in defense of the Republic.

Instead, @AnthropicAI and its CEO @DarioAmodei have chosen duplicity. Cloaked in the sanctimonious rhetoric of “effective altruism,” they have attempted to strong-arm the United States military into submission - a cowardly act of corporate virtue-signaling that places Silicon Valley ideology above American lives. The Terms of Service of Anthropic’s defective altruism will never outweigh the safety, the readiness, or the lives of American troops on the battlefield.

Their true objective is unmistakable: to seize veto power over the operational decisions of the United States military. That is unacceptable. As President Trump stated on Truth Social, the Commander-in-Chief and the American people alone will determine the destiny of our armed forces, not unelected tech executives.

Anthropic’s stance is fundamentally incompatible with American principles. Their relationship with the United States Armed Forces and the Federal Government has therefore been permanently altered. In conjunction with the President's directive for the Federal Government to cease all use of Anthropic's technology, I am directing the Department of War to designate Anthropic a Supply-Chain Risk to National Security.

Effective immediately, no contractor, supplier, or partner that does business with the United States military may conduct any commercial activity with Anthropic. Anthropic will continue to provide the Department of War its services for a period of no more than six months to allow for a seamless transition to a better and more patriotic service. America’s warfighters will never be held hostage by the ideological whims of Big Tech.
This decision is final.

Gregor Mitscha-Baude@mitschabaude·
@jxnlco please please please train it to have better social intelligence. sure it's good at coding but it's such a pain to just talk to. it'll constantly misunderstand you or make weird alien judgments. claude opus is light-years better at "getting humans".
Gregor Mitscha-Baude@mitschabaude·
@cryptodavidw I think all of those prompt "hacking" ideas are outdated and/or so random that I wouldn't even bother trying. Just focus on giving good context
David Wong@cryptodavidw·
We don't understand much about improving prompts in some weird and fundamental ways. For example, should you start your prompt with "you are a Rust engineer" or "you are the world's best Rust engineer"? Should you tell the agent to role-play as the world's best Rust engineer, or should you tell it to *be* a Rust engineer? It's not clear. Some people will use prompts like "You've read all the Rust books that exist, ...". Does it work better? I've heard people saying you should insult your agent and it will perform better, but is it really better to start your prompts with "you MOTHER FUCKER you better do good to me this time"? I do it sometimes, feels good, but does it really work? I've also heard things like "if you don't write good Rust code, 10 puppies will die". If I had to choose, I would say this is the most likely to produce good results, although the agent most often doesn't believe you. So yeah, we have no idea.
Gregor Mitscha-Baude@mitschabaude·
now that code is cheap, doesn't someone want to build a good, open Linux-based OS? 😄
Gregor Mitscha-Baude@mitschabaude·
@thsottiaux how to make a more high agency/human-spirit version of codex that's as pleasant to use as a co-worker as claude
Tibo@thsottiaux·
Codex team is fairly distributed, but most of the team is gathering in person over next 48 hours to take a step back and align on what’s next this year. What should we discuss?
Gregor Mitscha-Baude@mitschabaude·
@kimmonismus i have to say i didn't find the interview painful. really well done actually, especially peter's answers. armin wolf is known for a sharp interviewing style, here he was comparatively friendly :D and the questions are also supposed to represent those of the audience
Chubby♨️@kimmonismus·
As most of you know, I'm from Germany, so I was able to watch the entire interview with Peter Steinberger on "Zeit im Bild." It was incredibly painful. Not because of Peter's answers, but because the journalist's questions, typically German-Austrian, almost exclusively revolved around whether we should be afraid of AI, whether data privacy is being respected, what dangers OpenClaw poses, and so on. The hottest topic in the world was talked down. Instead of sparking curiosity and enthusiasm among viewers, the program ultimately only stirred up more anxiety and resentment. A damning indictment of Europe.
Peter Steinberger 🦞@steipete

In the USA most people are enthusiastic. In Europe I get insulted, people scream REGULATION and RESPONSIBILITY. And if I actually build a company here, I get to battle with topics like the investment protection act, employee equity participation, and paralyzing labor regulations. At OAI most people work 6-7 days a week and are paid accordingly. Here, that's illegal.

Thomas Wolf@Thom_Wolf·
Shifting structures in a software world dominated by AI. Some first-order reflections (TL;DR at the end):

Reducing software supply chains, the return of software monoliths – When rewriting code and understanding large foreign codebases becomes cheap, the incentive to rely on deep dependency trees collapses. Writing from scratch ¹ or extracting the relevant parts from another library is far easier when you can simply ask a code agent to handle it, rather than spending countless nights diving into an unfamiliar codebase. The reasons to reduce dependencies are compelling: a smaller attack surface for supply chain threats, smaller packaged software, improved performance, and faster boot times. By leveraging the tireless stamina of LLMs, the dream of coding an entire app from bare-metal considerations all the way up is becoming realistic.

End of the Lindy effect – The Lindy effect holds that things which have been around for a long time are there for good reason and will likely continue to persist. It's related to Chesterton's fence: before removing something, you should first understand why it exists, which means removal always carries a cost. But in a world where software can be developed from first principles and understood by a tireless agent, this logic weakens. Older codebases can be explored at will; long-standing software can be replaced with far less friction. A codebase can be fully rewritten in a new language. ² Legacy software can be carefully studied and updated in situations where humans would have given up long ago. The catch: unknown unknowns remain unknown. The true extent of AI's impact will hinge on whether complete coverage of testing, edge cases, and formal verification is achievable. In an AI-dominated world, formal verification isn't optional—it's essential.

The case for strongly typed languages – Historically, programming language adoption has been driven largely by human psychology and social dynamics. A language's success depended on a mix of factors: individual considerations like being easy to learn and simple to write correctly; community effects like how active and welcoming a community was, which in turn shaped how fast its ecosystem would grow; and fundamental properties like provable correctness, formal verification, and striking the right balance between dynamic and static checks—between the freedom to write anything and the discipline of guarding against edge cases and attacks. As the human factor diminishes, these dynamics will shift. Less dependence on human psychology will favor strongly typed, formally verifiable and/or high performance languages.³ These are often harder for humans to learn, but they're far better suited to LLMs, which thrive on formal verification and reinforcement learning environments. Expect this to reshape which languages dominate.

Economic restructuring of open source – For decades, open-source communities have been built around humans finding connection through writing, learning, and using code together. In a world where most code is written—and perhaps more importantly, read—by machines, these incentives will start to break down.⁴ Communities of AIs building libraries and codebases together will likely emerge as a replacement, but such communities will lack the fundamentally human motivations that have driven open source until now. If the future of open-source development becomes largely devoid of humans, alignment of AI models won't just matter—it will be decisive.

The future of new languages – Will AI agents face the same tradeoffs we do when developing or adopting new programming languages? Expressiveness vs. simplicity, safety vs. control, performance vs. abstraction, compile time vs. runtime, explicitness vs. conciseness. It's unclear that they will. In the long term, the reasons to create a new programming language will likely diverge significantly from the human-driven motivations of the past. There may well be an optimal programming language for LLMs—and there's no reason to assume it will resemble the ones humans have converged on.

TL;DR:
- Monoliths return – cheap rewriting kills dependency trees; smaller attack surface, better performance, bare-metal becomes realistic
- Lindy effect weakens – legacy code loses its moat, but unknown unknowns persist; formal verification becomes essential
- Strongly typed languages rise – human psychology mattered for adoption; now formal verification and RL environments favor types over ergonomics
- Open source restructures – human connection drove the community; AI-written/read code breaks those incentives; alignment becomes decisive
- New languages diverge – AI may not share our tradeoffs; optimal LLM programming languages may look nothing like what humans converged on

¹ x.com/mntruell/statu…
² x.com/anthropicai/st…
³ wesmckinney.com/blog/agent-erg…
⁴ github.com/tailwindlabs/t…
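Wolf's point that formal verification becomes essential once agents write and rewrite most code can be made concrete with a toy Lean 4 sketch (illustrative only, not from the thread): the invariant is stated as a theorem next to the function, so any agent rewrite that breaks it fails to compile rather than slipping through review.

```lean
-- Toy sketch: the invariant ships with the code.
-- If an agent rewrites `double` in a way that breaks evenness,
-- the proof below no longer checks and compilation fails.
def double (n : Nat) : Nat := n + n

theorem double_even (n : Nat) : double n % 2 = 0 := by
  unfold double
  omega
```

The proof is trivially small, but the mechanism scales: the theorem acts as a machine-checked contract that survives arbitrary refactors of the implementation.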
Gregor Mitscha-Baude@mitschabaude·
@levelsio nah it's not, it was intended as a "good first issue" for onboarding people, not as an actual problem to solve
Aakash Harish@0_Aakash_0·
Biggest gap: Codex treats every task like a greenfield problem. In reality, 90% of real dev work is modifying existing code within strict constraints.

What would change everything:
1. Better repo-level context. Let me tell Codex "this folder is sacred, never touch it" or "always match the patterns in /lib/utils"
2. Persistent memory across sessions. Right now each task starts from zero. If I corrected Codex on a style preference yesterday, it should remember today.
3. A "dry run" mode. Show me the plan and file diffs before executing. The biggest trust killer is when it edits 15 files and you have to reverse-engineer what changed.

The model quality is already great. The workflow around the model is where the wins are hiding.
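Point 1 has a partial approximation today: Codex and several other agents read repo-level instruction files such as AGENTS.md. A hypothetical sketch of encoding the "sacred folder" and pattern-matching constraints that way (the file contents below are illustrative, not an official schema):

```markdown
# AGENTS.md — illustrative sketch, not an official schema

## Hard constraints
- Do not modify anything under `legacy/billing/`; propose changes in the plan instead.
- New utilities must match the existing patterns in `lib/utils/`.

## Workflow
- List the files you intend to touch before editing.
- Run the test suite and include its output with every diff.
```

This covers stated constraints but not the other two wishes: persistent memory and a true dry-run mode still need support in the agent itself.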
Tibo@thsottiaux·
What could we do better on Codex? App, model, strategy and features… what’s wrong in how we approach things that we should improve immediately?
Gregor Mitscha-Baude@mitschabaude·
@thsottiaux Model: still bad at writing clean, minimal, elegant code, and keeping a code base long-term maintainable. Introducing (only) the right abstractions, deduplicating and unifying, stuff like that. Feels like the RL is just geared towards immediate problem solving
Gregor Mitscha-Baude@mitschabaude·
I think the ideal programming language for agents will
- look roughly like TS
- transpile to JS
- have dependent types like Lean (plus a good built-in model of mutation/effects), so that we can enforce invariants at will

I'd love to build that language 😄
Armin Ronacher ⇌@mitsuhiko

This weekend I was thinking about programming languages. Programming languages for agents. Will we see them? I believe people will (and should!) try to build some. lucumr.pocoo.org/2026/2/9/a-lan…

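The "enforce invariants at will" idea can be weakly approximated in today's TypeScript with a branded type plus a runtime-checked smart constructor; a minimal sketch (all names hypothetical) of what a dependently typed TS successor would instead discharge at compile time:

```typescript
// Hypothetical sketch: plain TypeScript can only approximate an invariant
// like "n is positive" via a brand plus a runtime check. A dependently
// typed language would prove the check once, at compile time.

type Positive = number & { readonly __brand: "Positive" };

// Smart constructor: the only way to obtain a `Positive`.
function positive(n: number): Positive {
  if (n <= 0) throw new Error(`${n} is not positive`);
  return n as Positive;
}

// Downstream functions state the invariant in their signature.
function sqrtPositive(n: Positive): number {
  return Math.sqrt(n);
}

console.log(sqrtPositive(positive(9))); // 3
// sqrtPositive(-9)  // rejected by the compiler: number is not Positive
```

The brand keeps untrusted numbers out at the type level, but the check itself still happens at runtime, which is exactly the gap dependent types would close.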