Halvar Flake

@halvarflake

65.4K posts

Choose disfavour where obedience does not bring honour. I do math. And was once asked by R. Morris Sr.: "For whom?" @[email protected]

Joined June 2008
2.6K Following · 44.4K Followers
Halvar Flake @halvarflake
@aviramyh A typo is an interesting thing because it'd imply a typo'ed token?
Aviram Hassan 🐻 @aviramyh
Anyone else encountered agents having typos? A friend told me it never happened to them, and to me it happened twice the same day with two different harnesses+models (CC, Codex).
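
A concrete way to see the "typo'ed token" point above, as a minimal sketch (the tokenizer choice is an assumption for illustration, not from the thread; requires `pip install tiktoken`):

```python
# A typo in model output implies the sampler actually emitted the (rarer)
# subword tokens that spell the typo, one after another.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for word in (" receive", " recieve"):
    ids = enc.encode(word)
    print(repr(word), "->", [enc.decode([i]) for i in ids])
# The correctly spelled word is typically a single token, while the
# misspelling splits into several pieces -- every one of which the model
# had to sample in sequence for the typo to show up in the output.
```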
Halvar Flake retweeted
eyitemi @eeyitemi
Looks like I’m ahead of schedule already
[image]
Halvar Flake retweeted
Ian Livingstone @ianlivingstone
Incredibly excited to announce Keycard for Coding Agents - no more copy & pasting credentials or approving individual tool calls. Agents get task-scoped access, so you can stay in flow and actually build. You’re only pulled in when it matters. Yolo mode, without compromise.
Keycard @KeycardLabs

Your coding agents inherit your credentials and your permissions. No identity system in the stack can tell the difference between you and the agent acting in your name. Today: Keycard for Coding Agents 🧵

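For readers who want the shape of the idea: a minimal sketch of task-scoped credentials, assuming a broker that mints short-lived tokens per task. All names here are hypothetical stand-ins, not Keycard's actual API.

```python
# Instead of the agent inheriting the user's long-lived secrets, a broker
# mints a short-lived token bound to one task and a narrow scope set.
import secrets
import time
from dataclasses import dataclass

@dataclass
class TaskToken:
    value: str
    task_id: str
    scopes: frozenset
    expires_at: float

class CredentialBroker:
    def __init__(self) -> None:
        self._live: dict[str, TaskToken] = {}

    def issue(self, task_id: str, scopes: set, ttl_s: float = 900) -> TaskToken:
        tok = TaskToken(secrets.token_urlsafe(32), task_id,
                        frozenset(scopes), time.time() + ttl_s)
        self._live[tok.value] = tok
        return tok

    def check(self, value: str, scope: str) -> bool:
        tok = self._live.get(value)
        return bool(tok and time.time() < tok.expires_at and scope in tok.scopes)

broker = CredentialBroker()
tok = broker.issue("fix-issue-1234", {"repo:read", "repo:push"})
assert broker.check(tok.value, "repo:push")        # in scope for this task
assert not broker.check(tok.value, "prod:deploy")  # outside the task's scope
```

The identity point in the quoted tweet maps to the `task_id` binding: an audit log can now distinguish "the agent, on task X" from "the user".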
Halvar Flake retweeted
Aakash Gupta @aakashgupta
The entire AI industry spent a week convinced DeepSeek had secretly launched V4. Reuters reported it. Developers debated it. OpenRouter usage charts broke.

It was Xiaomi. A smartphone and electric vehicle company just shipped a 1-trillion-parameter model that topped the world's largest API aggregation platform, and nobody guessed the origin because the model was too good to be associated with a hardware company.

The stealth launch as "Hunter Alpha" on March 11 was the most elegant product validation in recent AI history. No brand, no attribution, no expectations. Just raw performance. The model processed over 1 trillion tokens in 8 days. Developers organically chose it over every labeled frontier model on the platform. When Reuters tested the chatbot, it identified itself only as "a Chinese AI model primarily trained in Chinese" with a May 2025 knowledge cutoff, the exact same cutoff DeepSeek reports.

The person behind this is Luo Fuli. Born in 1995. Eight papers at ACL as a graduate student at Peking University. Alibaba DAMO Academy. Then DeepSeek, where she co-developed V2 and contributed to R1. Lei Jun reportedly offered tens of millions of yuan to recruit her. She joined Xiaomi in November 2025. Four months later, she's shipping a model that benchmarks alongside Claude Sonnet 4.6 and GPT-5.2 at one-fifth the API cost.

The detail that tells you everything about how this team operates: when Luo first experienced a complex agentic scaffold, she tried to convince the MiMo team to adopt it. They resisted. So she issued a mandate. Anyone on the team with fewer than 100 conversations with the system by tomorrow can quit. They all stayed. The imagination converted into research velocity.

The architectural bets matter. Hybrid Attention for long-context efficiency. MTP inference for low latency. 1M context window. 42B activated parameters out of 1T total. These are infrastructure decisions optimized for agents that run autonomously for hours, not chatbots that answer one question at a time.

Pricing: $1/$3 per million tokens up to 256K context. $2/$6 for 256K to 1M. Claude Sonnet 4.6 costs roughly 5x that. Xiaomi's shares rose 5.8% on the announcement.

The real DeepSeek V4 still hasn't shipped. The model everyone mistook for it already has a trillion tokens of real-world usage data.
Fuli Luo @_LuoFuli

MiMo-V2-Pro & Omni & TTS are out. Our first full-stack model family built truly for the Agent era. I call this a quiet ambush — not because we planned it, but because the shift from Chat to Agent paradigm happened so fast, even we barely believed it. Somewhere in between was a process that was thrilling, painful, and fascinating all at once.

The 1T base model started training months ago. The original goal was long-context reasoning efficiency. Hybrid Attention carries real innovation, without overreaching — and it turns out to be exactly the right foundation for the Agent era. 1M context window. MTP inference for ultra-low latency and cost. These architectural decisions weren't trendy. They were a structural advantage we built before we needed it.

What changed everything was experiencing a complex agentic scaffold — what I'd call orchestrated Context — for the first time. I was shocked on day one. I tried to convince the team to use it. That didn't work. So I gave a hard mandate: anyone on MiMo Team with fewer than 100 conversations tomorrow can quit. It worked. Once the team's imagination was ignited by what agentic systems could do, that imagination converted directly into research velocity.

People ask why we move so fast. I saw it firsthand building DeepSeek R1. My honest summary:
— Backbone and Infra research has long cycles. You need strategic conviction a year before it pays off.
— Posttrain agility is a different muscle: product intuition driving evaluation, iteration cycles compressed, paradigm shifts caught early.
— And the constant: curiosity, sharp technical instinct, decisive execution, full commitment — and something that's easy to underestimate: a genuine love for the world you're building for.

We will open-source — when the models are stable enough to deserve it. From Beijing, very late, not quite awake.

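A quick back-of-envelope using only the figures quoted above; the workload numbers are invented for illustration.

```python
# Prices quoted in the thread, USD per million tokens (input, output).
PRICE = {"up_to_256k": (1.0, 3.0), "256k_to_1m": (2.0, 6.0)}

def run_cost(input_toks: int, output_toks: int, tier: str) -> float:
    p_in, p_out = PRICE[tier]
    return (input_toks * p_in + output_toks * p_out) / 1_000_000

# e.g. one long agent run: 400K input tokens, 50K output, in the long tier.
cost = run_cost(400_000, 50_000, "256k_to_1m")
print(f"${cost:.2f}")             # $1.10; at "roughly 5x", ~$5.50 elsewhere
print(f"{42 / 1000:.1%} active")  # 42B of 1T parameters activated per token
```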
Jacob B ⚜️🦧 @BenevOrang
@halvarflake I would love to read your slightly more translated thoughts for the layperson who read and enjoyed your rec for The Story of Magic. Is it surmountable, in your assessment, given the tech debt? Any timelines?
Halvar Flake @halvarflake
People say CUDA is a moat, but if you stare into this moat, it's an abyss with Lovecraftian horrors in it. People say the moat is deep, and man, technically it is a great old one.
Halvar Flake retweeted
Lester Mackey @LesterMackey
Qiang Liu, Chris Oates, and I are writing a monograph on Probabilistic Inference and Learning with Stein’s Method, and we’d love to get your feedback on the first draft
[image]
JD Work @HostileSpectrum
The only real test of theories of war is operational practice. It follows, then, that when so many seek, despite all other reasons, to avoid acknowledging live cases, the theory in question cannot be sustained. But they are just hoping you won't notice, and that events will remain opaque long enough to bury what would otherwise be a very public failure.
Halvar Flake retweeted
chompie @chompie1337
Wonder what I mean? Well, for one, even with seamless tool integration, the frontier models are still pretty poor at debugging for xdev purposes. It makes sense — the public training data for that is nonexistent…
chompie @chompie1337

@seanhn I'm a sceptic for now. I'm building out an agent-based system, and while I'm extremely impressed, my benchmarks aren't being met. Human experts are still way better.

Halvar Flake retweeted
David Crawshaw @davidcrawshaw
@halvarflake All real moats contain Lovecraftian horrors.
Halvar Flake retweeted
Erik Bernhardsson @bernhardsson
I love this. Tests are a class of "embarrassingly parallel" computing problem, and scaling out makes so much sense. Next step: a GitHub Actions replacement.
Imbue @imbue_ai

Your parallel agents needed scalable test coverage yesterday. Introducing Offload: a Rust CLI that spreads your test suite across 200+ @Modal sandboxes, freeing your CPU to keep your agents shipping. On our Playwright suite, it cut a 12-minute run to 2 minutes, at $0.08 a run.

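The mechanics of the fan-out, as a minimal sketch: the shapes are assumed from the tweet, not Offload's internals, and Modal's Python SDK stands in for illustration. It assumes the repo and tests are baked into the image, which is omitted here.

```python
# Shard a test suite and fan the shards out across Modal containers.
import subprocess
import modal

app = modal.App("test-fanout")
image = modal.Image.debian_slim().pip_install("pytest")

@app.function(image=image)
def run_shard(test_files: list[str]) -> int:
    # One container per shard; return pytest's exit code (0 == all passed).
    return subprocess.run(["pytest", "-q", *test_files]).returncode

@app.local_entrypoint()
def main():
    tests = [f"tests/test_{i:03d}.py" for i in range(1000)]  # placeholder paths
    n = 200                                  # one shard per sandbox
    shards = [tests[i::n] for i in range(n)]
    codes = list(run_shard.map(shards))      # fan out, gather exit codes
    print("suite passed" if all(c == 0 for c in codes) else "suite failed")
```

The 12-minutes-to-2 claim is just the parallel speedup at work: wall-clock time approaches the slowest shard plus scheduling overhead.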
Halvar Flake retweeted
David Bessis @davidbessis
Attention is all we have: a conjectural theory of cognitive inequality — I never expected this to become the most-liked piece of my Substack, thank you everyone🥰! davidbessis.substack.com/p/attention-is…
[image]
Sean Heelan @seanhn
Using CC/Codex in interactive sessions has given me more empathy for scepticism about their use in hard exploit-dev scenarios. You are working with a fundamentally different category of system when you treat agents as a primitive for building search algorithms versus as interactive tools.
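One way to read the distinction, as a toy sketch: `run_agent` and `score` are hypothetical stand-ins, not any specific harness's API.

```python
# Toy best-of-N search with an agent rollout as the inner primitive.
# A real setup would invoke a harness (CC, Codex, ...) and a concrete
# oracle (did the crash reproduce? was the primitive gained?).
import random

def run_agent(task: str, seed: int) -> str:
    random.seed(seed)
    return f"candidate-{seed}-{random.randint(0, 9999)}"  # stand-in rollout

def score(candidate: str) -> float:
    return random.random()  # stand-in oracle

def best_of_n(task: str, n: int = 64) -> str:
    # Treat each rollout as one sample from a stochastic generator and
    # search over samples: variance across rollouts becomes the resource
    # being mined, a different regime from a single interactive session.
    return max((run_agent(task, s) for s in range(n)), key=score)

print(best_of_n("turn this OOB write into an arbitrary read"))
```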
Halvar Flake retweeted
Marcel van Oost @oost_marcel
🚨 BREAKING: European Commission President Ursula von der Leyen unveiled EU-INC, a new framework that lets you launch a company in 48 hours for under €100.

Starting a company across the EU today = 27 legal systems, 60+ company structures 🤯 That might be about to change… The European Commission just introduced EU Inc., a new optional corporate framework designed to make Europe actually function like one market.

Here's what stands out:
→ Set up a company in 48 hours
→ Cost: < €100
→ Fully online, no minimum capital
→ One single framework across all EU countries
→ Easier share transfers & fundraising
→ EU-wide employee stock options (huge for talent)

Especially the EU-wide stock option plans, taxed only when employees actually sell (instead of when granted), are huge. This makes it far easier for startups to attract and retain top talent, finally putting Europe closer to the US playbook.

Source/More info: ec.europa.eu/commission/pre…

In short: this is Europe trying to compete with the simplicity of a Delaware C-Corp 🇺🇸 And honestly… it's long overdue.

For years, European founders had 2 choices:
1. Stay local and deal with fragmentation
2. Move to the US to scale

EU Inc. is trying to remove that trade-off. If executed well, this could be one of the most important structural changes for European startups in decades. What do you think?
Halvar Flake @halvarflake
@eeyitemi @thegrugq It is almost always possible to express almost anything in any language, sometimes more cumbersome, sometimes more easily.
Halvar Flake @halvarflake
@eeyitemi @thegrugq So I think the questions for any terminology or language are: (1) Does it help me understand something better? (2) Does it help me connect it to something else so I can understand both better? (3) Does it improve the way I am doing it?
eyitemi @eeyitemi
Hello 👋 @halvarflake @thegrugq I have a question that I fear may be badly posed, but I think you both are the right people to ask.

I've been doing a bit of pressure-testing on whether ideas borrowed from measure theory can actually sharpen vulnerability-research methodology, or whether they mostly give elegant language to something that is still fundamentally craft, intuition, and situational judgment.

What keeps pulling me toward the analogy is that a lot of serious bugs I've found and reported recently seem to survive in interestingly low-coverage, high-consequence, but still reachable regions of behavior, especially where a lot of assumptions are relied on but never actually enforced. So concepts like observability, rarity, and shifts in sampling pressure somehow feel pretty relevant.

But then, the more I try to operationalize it, the more I worry the formal vocabulary creates fake precision. So I'm curious: where do you think the analogy genuinely produces sustainable, scalable vuln-research leverage, and at what point does it collapse into intellectual decoration?
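One way to make the analogy concrete, offered as a sketch of my own formalization rather than anything from the thread:

```latex
% Model testing as i.i.d. sampling of behaviors $b \sim \mu$, where $\mu$ is
% the measure induced by the test suite / fuzzer, and let $B$ be the buggy
% region of behavior space.
\[
  \Pr[\text{bug found in } n \text{ trials}]
  \;=\; 1 - \bigl(1 - \mu(B)\bigr)^{n}
  \;\approx\; n\,\mu(B) \quad \text{when } \mu(B) \ll 1 .
\]
% A bug with $\mu(B) \approx 0$ under the testing measure survives
% indefinitely, even if an attacker's measure $\nu$ has $\nu(B) \gg \mu(B)$.
% "Shifting sampling pressure" is then importance sampling: draw from a
% proposal $q$ with $q(B) > 0$ and reweight by $\mu/q$.
```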