Halvar Flake

@halvarflake

65.4K posts

Choose disfavour where obedience does not bring honour. I do math. And was once asked by R. Morris Sr.: "For whom?" @[email protected]

Joined June 2008
2.6K Following · 44.4K Followers
Halvar Flake @halvarflake
@aviramyh A typo is an interesting thing because it'd imply a typo'ed token?
Aviram Hassan 🐻 @aviramyh
Anyone else encountered agents having typos? A friend told me it never happened to them, and to me it happened twice the same day with two different harnesses+models (CC, Codex).
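
A concrete way to see the "typo'ed token" point above, as a minimal sketch (the tokenizer choice is an assumption for illustration, not from the thread; requires `pip install tiktoken`):

```python
# A typo in model output implies the sampler actually emitted the (rarer)
# subword tokens that spell the typo, one after another.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for word in (" receive", " recieve"):
    ids = enc.encode(word)
    print(repr(word), "->", [enc.decode([i]) for i in ids])
# The correctly spelled word is typically a single token, while the
# misspelling splits into several pieces -- every one of which the model
# had to sample in sequence for the typo to show up in the output.
```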
Halvar Flake retweeted
eyitemi @eeyitemi
Looks like I’m ahead of schedule already
[image]
Halvar Flake retweeted
Ian Livingstone @ianlivingstone
Incredibly excited to announce Keycard for Coding Agents - no more copy & pasting credentials or approving individual tool calls. Agents get task-scoped access, so you can stay in flow and actually build. You’re only pulled in when it matters. Yolo mode, without compromise.
Keycard @KeycardLabs

Your coding agents inherit your credentials and your permissions. No identity system in the stack can tell the difference between you and the agent acting in your name. Today: Keycard for Coding Agents 🧵

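For readers who want the shape of the idea: a minimal sketch of task-scoped credentials, assuming a broker that mints short-lived tokens per task. All names here are hypothetical stand-ins, not Keycard's actual API.

```python
# Instead of the agent inheriting the user's long-lived secrets, a broker
# mints a short-lived token bound to one task and a narrow scope set.
import secrets
import time
from dataclasses import dataclass

@dataclass
class TaskToken:
    value: str
    task_id: str
    scopes: frozenset
    expires_at: float

class CredentialBroker:
    def __init__(self) -> None:
        self._live: dict[str, TaskToken] = {}

    def issue(self, task_id: str, scopes: set, ttl_s: float = 900) -> TaskToken:
        tok = TaskToken(secrets.token_urlsafe(32), task_id,
                        frozenset(scopes), time.time() + ttl_s)
        self._live[tok.value] = tok
        return tok

    def check(self, value: str, scope: str) -> bool:
        tok = self._live.get(value)
        return bool(tok and time.time() < tok.expires_at and scope in tok.scopes)

broker = CredentialBroker()
tok = broker.issue("fix-issue-1234", {"repo:read", "repo:push"})
assert broker.check(tok.value, "repo:push")        # in scope for this task
assert not broker.check(tok.value, "prod:deploy")  # outside the task's scope
```

The identity point in the quoted tweet maps to the `task_id` binding: an audit log can now distinguish "the agent, on task X" from "the user".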
Halvar Flake retweeted
Aakash Gupta @aakashgupta
The entire AI industry spent a week convinced DeepSeek had secretly launched V4. Reuters reported it. Developers debated it. OpenRouter usage charts broke.

It was Xiaomi. A smartphone and electric vehicle company just shipped a 1-trillion-parameter model that topped the world's largest API aggregation platform, and nobody guessed the origin because the model was too good to be associated with a hardware company.

The stealth launch as "Hunter Alpha" on March 11 was the most elegant product validation in recent AI history. No brand, no attribution, no expectations. Just raw performance. The model processed over 1 trillion tokens in 8 days. Developers organically chose it over every labeled frontier model on the platform. When Reuters tested the chatbot, it identified itself only as "a Chinese AI model primarily trained in Chinese" with a May 2025 knowledge cutoff, the exact same cutoff DeepSeek reports.

The person behind this is Luo Fuli. Born in 1995. Eight papers at ACL as a graduate student at Peking University. Alibaba DAMO Academy. Then DeepSeek, where she co-developed V2 and contributed to R1. Lei Jun reportedly offered tens of millions of yuan to recruit her. She joined Xiaomi in November 2025. Four months later, she's shipping a model that benchmarks alongside Claude Sonnet 4.6 and GPT-5.2 at one-fifth the API cost.

The detail that tells you everything about how this team operates: when Luo first experienced a complex agentic scaffold, she tried to convince the MiMo team to adopt it. They resisted. So she issued a mandate. Anyone on the team with fewer than 100 conversations with the system by tomorrow can quit. They all stayed. The imagination converted into research velocity.

The architectural bets matter. Hybrid Attention for long-context efficiency. MTP inference for low latency. 1M context window. 42B activated parameters out of 1T total. These are infrastructure decisions optimized for agents that run autonomously for hours, not chatbots that answer one question at a time.

Pricing: $1/$3 per million tokens up to 256K context. $2/$6 for 256K to 1M. Claude Sonnet 4.6 costs roughly 5x that. Xiaomi's shares rose 5.8% on the announcement.

The real DeepSeek V4 still hasn't shipped. The model everyone mistook for it already has a trillion tokens of real-world usage data.
Fuli Luo @_LuoFuli

MiMo-V2-Pro & Omni & TTS are out. Our first full-stack model family built truly for the Agent era. I call this a quiet ambush — not because we planned it, but because the shift from Chat to Agent paradigm happened so fast, even we barely believed it. Somewhere in between was a process that was thrilling, painful, and fascinating all at once.

The 1T base model started training months ago. The original goal was long-context reasoning efficiency. Hybrid Attention carries real innovation, without overreaching — and it turns out to be exactly the right foundation for the Agent era. 1M context window. MTP inference for ultra-low latency and cost. These architectural decisions weren't trendy. They were a structural advantage we built before we needed it.

What changed everything was experiencing a complex agentic scaffold — what I'd call orchestrated Context — for the first time. I was shocked on day one. I tried to convince the team to use it. That didn't work. So I gave a hard mandate: anyone on MiMo Team with fewer than 100 conversations tomorrow can quit. It worked. Once the team's imagination was ignited by what agentic systems could do, that imagination converted directly into research velocity.

People ask why we move so fast. I saw it firsthand building DeepSeek R1. My honest summary:
— Backbone and Infra research has long cycles. You need strategic conviction a year before it pays off.
— Posttrain agility is a different muscle: product intuition driving evaluation, iteration cycles compressed, paradigm shifts caught early.
— And the constant: curiosity, sharp technical instinct, decisive execution, full commitment — and something that's easy to underestimate: a genuine love for the world you're building for.

We will open-source — when the models are stable enough to deserve it. From Beijing, very late, not quite awake.

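A quick back-of-envelope using only the figures quoted above; the workload numbers are invented for illustration.

```python
# Prices quoted in the thread, USD per million tokens (input, output).
PRICE = {"up_to_256k": (1.0, 3.0), "256k_to_1m": (2.0, 6.0)}

def run_cost(input_toks: int, output_toks: int, tier: str) -> float:
    p_in, p_out = PRICE[tier]
    return (input_toks * p_in + output_toks * p_out) / 1_000_000

# e.g. one long agent run: 400K input tokens, 50K output, in the long tier.
cost = run_cost(400_000, 50_000, "256k_to_1m")
print(f"${cost:.2f}")             # $1.10; at "roughly 5x", ~$5.50 elsewhere
print(f"{42 / 1000:.1%} active")  # 42B of 1T parameters activated per token
```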
Jacob B ⚜️🦧 @BenevOrang
@halvarflake I would love to read your slightly more translated thoughts for the layperson who read and enjoyed your rec for The Story of Magic. Is it surmountable, in your assessment, given the tech debt? Any timelines?
Halvar Flake @halvarflake
People say CUDA is a moat, but if you stare into this moat, it's an abyss with Lovecraftian horrors in it. People say the moat is deep, and man, technically it is a great old one.
Halvar Flake retweeted
Lester Mackey @LesterMackey
Qiang Liu, Chris Oates, and I are writing a monograph on Probabilistic Inference and Learning with Stein’s Method, and we’d love to get your feedback on the first draft
[image]
JD Work @HostileSpectrum
The only real test of theories of war is operational practice. It follows, then, that when so many seek, despite all other reasons, to avoid acknowledging live cases, the theory in question cannot be sustained. But they are just hoping you won't notice, and that events will remain opaque long enough to bury what would otherwise be a very public failure.
Halvar Flake retweeted
chompie @chompie1337
Wonder what I mean? Well, for one, even with seamless tool integration, the frontier models are still pretty poor at debugging for xdev purposes. It makes sense — the public training data for that is nonexistent…
chompie @chompie1337

@seanhn I'm a sceptic for now. I'm building out an agent-based system, and while I'm extremely impressed, my benchmarks aren't being met. Human experts are still way better.

Halvar Flake retweeted
David Crawshaw @davidcrawshaw
@halvarflake All real moats contain Lovecraftian horrors.
Halvar Flake retweeted
Erik Bernhardsson @bernhardsson
I love this. Tests are a class of "embarrassingly parallel" computing problem, and scaling out makes so much sense. Next step: a GitHub Actions replacement.
Imbue @imbue_ai

Your parallel agents needed scalable test coverage yesterday. Introducing Offload: a Rust CLI that spreads your test suite across 200+ @Modal sandboxes, freeing your CPU to keep your agents shipping. On our Playwright suite, it cut a 12-minute run to 2 minutes, at $0.08 a run.

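The mechanics of the fan-out, as a minimal sketch: the shapes are assumed from the tweet, not Offload's internals, and Modal's Python SDK stands in for illustration. It assumes the repo and tests are baked into the image, which is omitted here.

```python
# Shard a test suite and fan the shards out across Modal containers.
import subprocess
import modal

app = modal.App("test-fanout")
image = modal.Image.debian_slim().pip_install("pytest")

@app.function(image=image)
def run_shard(test_files: list[str]) -> int:
    # One container per shard; return pytest's exit code (0 == all passed).
    return subprocess.run(["pytest", "-q", *test_files]).returncode

@app.local_entrypoint()
def main():
    tests = [f"tests/test_{i:03d}.py" for i in range(1000)]  # placeholder paths
    n = 200                                  # one shard per sandbox
    shards = [tests[i::n] for i in range(n)]
    codes = list(run_shard.map(shards))      # fan out, gather exit codes
    print("suite passed" if all(c == 0 for c in codes) else "suite failed")
```

The 12-minutes-to-2 claim is just the parallel speedup at work: wall-clock time approaches the slowest shard plus scheduling overhead.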
Halvar Flake retweeted
David Bessis @davidbessis
Attention is all we have: a conjectural theory of cognitive inequality — I never expected this to become the most-liked piece of my Substack, thank you everyone🥰! davidbessis.substack.com/p/attention-is…
[image]
Sean Heelan @seanhn
Using CC/Codex in interactive sessions has given me more empathy for scepticism about their use in hard exploit-dev scenarios. You are working with a fundamentally different category of system when you treat agents as a primitive for building search algorithms versus as interactive tools.
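One way to read the distinction, as a toy sketch: `run_agent` and `score` are hypothetical stand-ins, not any specific harness's API.

```python
# Toy best-of-N search with an agent rollout as the inner primitive.
# A real setup would invoke a harness (CC, Codex, ...) and a concrete
# oracle (did the crash reproduce? was the primitive gained?).
import random

def run_agent(task: str, seed: int) -> str:
    random.seed(seed)
    return f"candidate-{seed}-{random.randint(0, 9999)}"  # stand-in rollout

def score(candidate: str) -> float:
    return random.random()  # stand-in oracle

def best_of_n(task: str, n: int = 64) -> str:
    # Treat each rollout as one sample from a stochastic generator and
    # search over samples: variance across rollouts becomes the resource
    # being mined, a different regime from a single interactive session.
    return max((run_agent(task, s) for s in range(n)), key=score)

print(best_of_n("turn this OOB write into an arbitrary read"))
```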
Halvar Flake retweeted
Marcel van Oost @oost_marcel
🚨 BREAKING: European Commission President Ursula von der Leyen unveiled EU-INC, a new framework that lets you launch a company in 48 hours for under €100.

Starting a company across the EU today = 27 legal systems, 60+ company structures 🤯 That might be about to change… The European Commission just introduced EU Inc., a new optional corporate framework designed to make Europe actually function like one market.

Here's what stands out:
→ Set up a company in 48 hours
→ Cost: < €100
→ Fully online, no minimum capital
→ One single framework across all EU countries
→ Easier share transfers & fundraising
→ EU-wide employee stock options (huge for talent)

Especially the EU-wide stock option plans, taxed only when employees actually sell (instead of when granted), are huge. This makes it far easier for startups to attract and retain top talent, finally putting Europe closer to the US playbook.

Source/More info: ec.europa.eu/commission/pre…

In short: this is Europe trying to compete with the simplicity of a Delaware C-Corp 🇺🇸 And honestly… it's long overdue.

For years, European founders had 2 choices:
1. Stay local and deal with fragmentation
2. Move to the US to scale

EU Inc. is trying to remove that trade-off. If executed well, this could be one of the most important structural changes for European startups in decades. What do you think?
Halvar Flake @halvarflake
@eeyitemi @thegrugq It is almost always possible to express almost anything in any language, sometimes more cumbersome, sometimes more easily.
Halvar Flake @halvarflake
@eeyitemi @thegrugq So I think the questions for any terminology or language are: (1) Does it help me understand something better? (2) Does it help me connect it to something else so I can understand both better? (3) Does it improve the way I am doing it?
eyitemi @eeyitemi
Hello 👋 @halvarflake @thegrugq I have a question that I fear may be badly posed, but I think you both are the right people to ask.

I've been doing a bit of pressure-testing on whether ideas borrowed from measure theory can actually sharpen vulnerability-research methodology, or whether they mostly give elegant language to something that is still fundamentally craft, intuition, and situational judgment.

What keeps pulling me toward the analogy is that a lot of serious bugs I've found and reported recently seem to survive in interestingly low-coverage, high-consequence, but still reachable regions of behavior, especially where a lot of assumptions are relied on but never actually enforced. So concepts like observability, rarity, and shifts in sampling pressure somehow feel pretty relevant.

But then, the more I try to operationalize it, the more I worry the formal vocabulary creates fake precision. So I'm curious: where do you think the analogy genuinely produces sustainable, scalable vuln-research leverage, and at what point does it collapse into intellectual decoration?
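One way to make the analogy concrete, offered as a sketch of my own formalization rather than anything from the thread:

```latex
% Model testing as i.i.d. sampling of behaviors $b \sim \mu$, where $\mu$ is
% the measure induced by the test suite / fuzzer, and let $B$ be the buggy
% region of behavior space.
\[
  \Pr[\text{bug found in } n \text{ trials}]
  \;=\; 1 - \bigl(1 - \mu(B)\bigr)^{n}
  \;\approx\; n\,\mu(B) \quad \text{when } \mu(B) \ll 1 .
\]
% A bug with $\mu(B) \approx 0$ under the testing measure survives
% indefinitely, even if an attacker's measure $\nu$ has $\nu(B) \gg \mu(B)$.
% "Shifting sampling pressure" is then importance sampling: draw from a
% proposal $q$ with $q(B) > 0$ and reweight by $\mu/q$.
```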