Florian Leibert 🎢

3.4K posts


@flo

GP @468capital. Founder @mesosphere, ex @twitter & @airbnb. prototype not meant for mass production. the 🇺🇸 dream is real.

ATX, NYC & rest of world · Joined January 2009
3.7K Following · 11.6K Followers
Florian Leibert 🎢 reposted
Poetiq @poetiq_ai
Poetiq's Meta-System built its own coding harness from scratch. It got SOTA on LiveCodeBench Pro. No fine-tuning, no special model access. Just standard APIs. Using Gemini 3.1 Pro, it made a harness that beat all frontier models we tested.
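The post doesn't show what the harness does internally, so here is only a rough sketch of the general pattern: a coding harness over a standard chat API is, at its core, a generate, run-tests, repair loop. Every name below (call_model, the test command) is a hypothetical stand-in, not Poetiq's actual system.

```python
import pathlib
import subprocess
import tempfile

# Skeleton of a generate -> run tests -> repair loop, the core pattern of a
# coding harness built on a plain chat-completions API. `call_model` is a
# stand-in for any standard API client; Poetiq's actual harness is not public.
def call_model(prompt: str) -> str:
    raise NotImplementedError("plug in any chat-completions client here")

def run_tests(code: str, test_cmd: list[str]) -> subprocess.CompletedProcess:
    """Write the candidate solution to a temp file and run the test command on it."""
    path = pathlib.Path(tempfile.mkdtemp()) / "solution.py"
    path.write_text(code)
    return subprocess.run(test_cmd + [str(path)], capture_output=True, text=True, timeout=60)

def solve(task: str, test_cmd: list[str], max_rounds: int = 5) -> str | None:
    prompt = task
    for _ in range(max_rounds):
        code = call_model(prompt)
        result = run_tests(code, test_cmd)
        if result.returncode == 0:
            return code  # tests pass: done
        # Otherwise feed the failure output back and ask for a repair.
        prompt = f"{task}\n\nYour last attempt failed:\n{result.stdout}\n{result.stderr}\nFix it."
    return None
```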
Florian Leibert 🎢
@KyleHessling1 Thank you for calling out the “highly experimental” part. Had it halfway deployed to do fuel estimates on my nuclear power plant. I’ll use llama 3 then…
Kyle Hessling @KyleHessling1
Negentropy-claude-opus-4.7-9B and 4B are now LIVE! HIGHLY EXPERIMENTAL, but another HUGE improvement for the 9B class and our fine-tuning workflows!

The name is inspired by the concept of negative entropy; this model is based on an experimental thesis that Jackrong and I have been developing for the last few weeks (link to inspiration paper in comments). In layman's terms, we fine-tuned an inversion model that can extract full, detailed thinking traces from the thinking summaries in large-model datasets. We were skeptical ourselves, but it seems to work shockingly well!

The biggest standout was that it actually produced results on the creative HTML canvas prompts. They have some issues, but none of the other 9B models could even publish an output on those, so it definitely has better general programming capability than the other 9B models in its class. The other day we released the DeepSeek-distilled 9B, and while it is still great, this one is smarter in general and wildly more thinking- and token-efficient (~2x more efficient or better in most cases). Please check out the full benchmark report in the comments that compares it with the DeepSeek distill and the base model!

GGUFs for the 9B below, 4B in comments. Note that I have not tested the 4B at all yet, so bear that in mind, but the 9B is stellar!

We're very excited about the method here, and will be applying it to larger models soon. In the meantime, make sure to check out the new Qwopus 3.6 35B from the repo as well; Qwopus 3.6 27B is undergoing some final long-context training and will be released soon! Jackrong and I are literally working 24 hours a day (thanks to a perfect time-zone discrepancy)! We're having a blast and we hope you all are too! Bench it and post your results below! Looking forward to chatting about it!

GGUFs here: huggingface.co/Jackrong/Negen…
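The mechanics of the inversion fine-tune aren't spelled out beyond the description above, but the data-prep step it implies (supervised pairs that map a thinking summary back to its full trace, taken from records where both exist) might look roughly like this. File names and field names are hypothetical, not the authors' actual schema.

```python
import json

# Hypothetical data prep for the "inversion" fine-tune described above:
# given records that contain BOTH a full reasoning trace and its summary,
# emit chat-style SFT pairs that train the model to go summary -> trace.
INPUT = "reasoning_records.jsonl"   # assumed schema: {"summary": ..., "trace": ...}
OUTPUT = "inversion_sft.jsonl"

SYSTEM = "Reconstruct the full step-by-step reasoning that this summary condenses."

with open(INPUT) as src, open(OUTPUT, "w") as dst:
    for line in src:
        rec = json.loads(line)
        pair = {
            "messages": [
                {"role": "system", "content": SYSTEM},
                {"role": "user", "content": rec["summary"]},
                {"role": "assistant", "content": rec["trace"]},
            ]
        }
        dst.write(json.dumps(pair) + "\n")
```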
JJ @JosephJacks_
PREDICTION: Anthropic will surpass Alphabet in revenue by mid-2028. This is not a bull case or an acceleration scenario — it is a continuation of the curve already in evidence.

Anthropic’s ARR went from $1B (Jan 2025) to $9B (Dec 2025) to $30B (Apr 2026) — a 3.3x step in a single four-month window, and the curve has been steepening, not flattening. My projection actually assumes deceleration from here: $100B by end of 2026, $340B in 2027, $850B in 2028, $1.4T in 2029, $2T by 2030.

Crossover with Alphabet happens at ~$575B in mid-2028, not because Anthropic accelerates beyond today’s pace, but because Alphabet — locked at ~15% YoY in a mature ads-and-cloud business — cannot match enterprise AI’s adoption physics. As @rodriscoll intelligently observed recently, Gemini tokens served grew by only 60% in the last quarter … while Anthropic grew by 10X.

Three drivers make the continuation structural, not speculative: customers spending >$1M/year with Anthropic doubled from 500 to 1,000 in under two months post-Series G (these are multi-year expanding contracts with near-zero churn — switching a deployed agent stack mid-flight is operationally untenable); Claude Code is the wedge, not the product, dragging the rest of the platform — agents, MCP, healthcare, biotech — into every Fortune 2000 deployment as an attach point; and compute supply is finally non-binding with the 3.5GW Google + Broadcom deal (2027+), this week’s SpaceX partnership, and 1GW of standing Google capacity for 2026. For most of 2024–2025 the bottleneck was supply, not demand. That constraint is releasing exactly when the demand curve is steepest.

The standard objection — “no company has ever sustained this at scale” — applies a software-era frame to a labor-era business. AWS, Azure, and Meta decelerated at $50–100B because they sold tools to the economy. Anthropic is selling cognitive capacity into the economy. The TAM isn’t enterprise software ($800B). It’s labor ($50T+). When the denominator is two orders of magnitude larger, “deceleration at $100B ARR” stops being a law and starts being an assumption.

The crossover isn’t a maybe. It’s a function of timing. Mid-2028 is when I think Anthropic surpasses Google.
JJ @JosephJacks_

Anthropic will have a higher valuation than Alphabet in < 18 months.
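The crossover arithmetic in the thread can be sanity-checked in a few lines. The Anthropic anchors below are the thread's own figures and projections; Alphabet's ~$400B 2025 base is an assumed round number, and the ~15% YoY growth is the thread's stated premise, not a forecast of mine.

```python
import math

# Year-end ARR anchors from the thread, in $B ($9B actual for Dec 2025,
# projections after that). Alphabet: assumed ~$400B annualized at end of
# 2025 (round figure), growing 15% YoY per the thread's premise.
anthropic = {2025: 9, 2026: 100, 2027: 340, 2028: 850, 2029: 1400}
alphabet_2025, alphabet_growth = 400.0, 0.15

def interp_log(y0: float, y1: float, frac: float) -> float:
    """Geometric interpolation between year-end values, since revenue compounds."""
    return math.exp(math.log(y0) + frac * (math.log(y1) - math.log(y0)))

for month in range(1, 49):  # months after Dec 2025
    year = 2025 + (month - 1) // 12
    frac = ((month - 1) % 12 + 1) / 12
    a = interp_log(anthropic[year], anthropic[year + 1], frac)
    g = alphabet_2025 * (1 + alphabet_growth) ** (month / 12)
    if a >= g:
        print(f"crossover ~{month} months after Dec 2025, at ~${a:.0f}B")
        break
# Lands around month 31 (mid-2028) near $580B, consistent with the ~$575B claim.
```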

Florian Leibert 🎢 reposted
Mateusz Mirkowski @llmdevguy
🇨🇳 After testing Chinese models over the last few weeks, my coding ranking currently looks like this:
1. Kimi K2.6
2. GLM-5.1
3. MiMo V2.5 Pro
4. MiniMax 2.7
5. DeepSeek V4 Pro
👉 But each of them has its own superpowers.
Frontend/Design: K2.6
Backend: K2.6 / GLM-5.1
Code review: MiMo
All-rounder: M2.7
Reasoning: DeepSeek
Now I'm waiting for MiniMax 3.0, which I hope will take the number 1 spot!
Florian Leibert 🎢 reposted
Bindu Reddy @bindureddy
SubQ, a new type of AI model, says they are 50x faster and 20x cheaper than Opus 4.7 and GPT 5.5. In fact, they also say they perform INSANELY WELL on benchmarks and have a 12M context. This would be earth-shattering, if true: Anthropic/OpenAI's valuation would go to zero 😱
Florian Leibert 🎢 reposted
Milk Road AI @MilkRoadAI
This is one of the craziest AI launches of 2026 and it came out of basically nowhere (Save this). A company called Subquadratic just shipped SubQ, and the benchmarks are almost hard to believe.

To understand why this is such a big deal, you have to understand the fundamental problem that has defined AI for the last decade. Every large language model in existence is built on transformer architecture, and transformers use a mechanism called standard attention that checks every single word in a sequence against every other word. Double the context length and compute doesn't double, it quadruples; triple it and compute goes up nine times. This quadratic scaling is why frontier models have been stuck at roughly 1 million tokens, why running them at those lengths gets expensive fast, and why the AI labs have essentially been printing money charging you more the longer you need the model to think. The industry has known this problem existed since 2017, but they scaled it anyway.

SubQ is built from the ground up to solve it. Instead of processing every possible token relationship, SubQ's sparse attention architecture identifies which relationships actually matter and ignores the rest, meaning compute is used where it counts and wasted nowhere else. The result is that compute scales linearly with context length instead of quadratically, and the implications of that one architectural shift are enormous.

At 12 million tokens, SubQ reduces attention compute by nearly 1,000x compared to standard frontier models, and at 1 million tokens, it runs 52x faster than FlashAttention. And it does all of this while posting frontier-level accuracy, scoring 95% on the RULER 128K long-context benchmark versus Claude Opus 4.6's 94.8%, and an 81.8 on SWE-Bench Verified coding tasks, besting Opus 4.6 (80.8) and DeepSeek 4.0 Pro.

The cost comparison is where it gets genuinely insane. SubQ runs at under $1.50 per million tokens, less than 5% of what Claude Opus charges. On the RULER benchmark, running the test with SubQ cost $8; running the same test with Claude Opus cost $2,600. That's a 300x cost reduction at equivalent or better accuracy.

Subquadratic launched with $29 million in funding, SubQ is available today for early access via API, and SubQ Code, a coding agent built on the architecture, ships alongside it. The transformer has been the unchallenged foundation of every major AI system since 2017. SubQ is the first serious evidence that something structurally better might have just arrived.
Alexander Whedon @alex_whedon

Introducing SubQ - a major breakthrough in LLM intelligence. It is the first model built on a fully sub-quadratic sparse-attention architecture (SSA), and the first frontier model with a 12 million token context window, which is:
- 52x faster than FlashAttention at 1MM tokens
- Less than 5% the cost of Opus
Transformer-based LLMs waste compute by processing every possible relationship between words (standard attention). Only a small fraction actually matters. @subquadratic finds and focuses only on the ones that do. That's nearly 1,000x less compute and a new way for LLMs to scale.
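Neither post specifies SSA's actual sparsity pattern, so take this as a back-of-envelope sketch only: assume the simplest form of sparse attention, where each query scores a fixed budget of k selected keys instead of all n. The value of k below is chosen purely to reproduce the quoted ~1,000x figure.

```python
# Back-of-envelope comparison behind the claims above. The real SSA pattern
# is not public; this assumes each token attends to k selected tokens.
def full_attention_pairs(n: int) -> int:
    return n * n        # standard attention: every token scored against every token

def sparse_attention_pairs(n: int, k: int) -> int:
    return n * k        # sparse attention: each token scored against k tokens

n = 12_000_000          # the 12M-token context from the announcement
k = 12_000              # assumed per-token budget, picked to hit ~1,000x

ratio = full_attention_pairs(n) / sparse_attention_pairs(n, k)
print(f"{ratio:,.0f}x fewer attention scores")  # -> 1,000x
# Full attention grows with n**2; the sparse variant grows with n * k,
# i.e. linearly in n once k is fixed, which is the whole pitch.
```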

Florian Leibert 🎢 reposted
Dan McAteer @daniel_mac8
SubQ is either the biggest breakthrough since the Transformer...
> 52x faster than FlashAttention at 1mm tok context
> 20x cheaper than Opus
...or it's AI Theranos. Requested early access so hopefully can investigate soon.
Alexander Whedon @alex_whedon
(quoted post, same as above)
Florian Leibert 🎢 reposted
Hot Aisle @HotAisle
so cute. vc writes me today to tell me they have this great company they just invested in that they want to introduce me to. the company is already a customer of ours. i told them they should have invested in me. crickets.
Thanh Pham @runsonai
I used Codex to fine-tune Qwen 3.6:

/goal I want to increase the efficiency of my qwen 3.6 27b 4bit on this machine, do anything that increases it by at least 30%

On my M3 Ultra. Guess what... it worked! It found multiple tweaks to make it more efficient.
Florian Leibert 🎢
@sethprattsf @runsonai Thank you -- could you explain some of the choices? Where do you think the win actually comes from? Why is it that this model is so hard to "optimize"?
Seth Pratt @sethprattsf
Stack
- Target: spicyneuron/Kimi-K2.6-MLX-3.6bit
- Draft: SubSir/Kimi-K2.6-DFlash-tmp
- Local draft path: /Users/sethcosmo/models/kimi-k26-dflash-tmp
- Runtime: patched native MLX DFlash at /Users/Shared/model-speed-lab/dflash-mlx
- Hardware tested: M3 Ultra, 512 GB unified memory
- Peak memory: about 466 GB

Best Result
- Exact DFlash decode: 30.38 tok/s
- Artifact: /Users/Shared/model-speed-lab/results/kimi_k26_36bit_native_dflash_tmp_lazy_code64_spec5.json
- Spec tokens: 5
- Verify mode: parallel-replay
- Avg accepted length: 4.27
- Quality: 18/24 with strict system prompt
- Without the system prompt, quality was much worse: 6/24

Server Command
Current runner is qwen-opencode-test/start-kimi-dflash-server.sh:

DFLASH_TARGET_LAZY_LOAD=1 \
/Users/Shared/model-speed-lab/dflash-mlx/.venv/bin/dflash-mlx-openai-server \
  --host 127.0.0.1 \
  --port 8104 \
  --model-id kimi-k26-dflash \
  --target-model /Users/sethcosmo/models/kimi-k26-36bit-mlx \
  --draft-model /Users/sethcosmo/models/kimi-k26-dflash-tmp \
  --max-speculative-tokens 5 \
  --verify-mode parallel-replay \
  --default-system-prompt "Follow the user instruction exactly. Return only the requested format."

Manual Benchmark Command
Runner: qwen-opencode-test/run-kimi-dflash-bench.sh

DFLASH_TARGET_LAZY_LOAD=1 \
/Users/Shared/model-speed-lab/dflash-mlx/.venv/bin/dflash-mlx \
  --target-model /Users/sethcosmo/models/kimi-k26-36bit-mlx \
  --draft-model /Users/sethcosmo/models/kimi-k26-dflash-tmp \
  --prompt "Write a Python implementation of Dijkstra algorithm with a small inline example. No prose outside code." \
  --max-new-tokens 64 \
  --speculative-tokens 5 \
  --verify-mode parallel-replay \
  --warmup-runs 1 \
  --warmup-max-new-tokens 8 \
  --profile \
  --json

Important Patches/Notes
- Added Kimi adapter support in patched dflash-mlx.
- Patched OpenAI server for Kimi tool-call parsing and thinking=False handling.
- Added --default-system-prompt; this was needed for hard-format quality.
- spec5 was the narrow best. Higher spec values did not improve enough; spec10 effectively clamps to 8 in this draft path.
- 4-bit draft quant was slower than the unquantized DFlash draft.
- Main blocker: verifier forward dominates. Profile showed most time in target MoE/MLP, not draft generation.
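A rough model of why the verifier forward dominates: with a draft window of k = 5 and an average accepted length of about 4.27, each verify cycle emits ~4.27 tokens for the price of one target forward plus five draft forwards. The per-forward latencies below are illustrative assumptions (the post doesn't report them), chosen so the output lands near the measured 30.38 tok/s.

```python
# Rough speculative-decoding throughput model using the numbers above.
# Per-forward latencies are assumptions for illustration only.
k = 5            # speculative tokens drafted per cycle (--max-speculative-tokens)
accepted = 4.27  # average tokens accepted per verify cycle (measured)
t_target = 0.120 # assumed seconds per target (verifier) forward
t_draft = 0.004  # assumed seconds per draft forward

cycle_time = t_target + k * t_draft   # one verify pass plus k draft passes
print(f"speculative: ~{accepted / cycle_time:.1f} tok/s")  # ~30.5 tok/s

# Plain autoregressive decoding with the same target, for comparison:
print(f"baseline:    ~{1 / t_target:.1f} tok/s")           # ~8.3 tok/s
# The target forward is ~86% of each cycle here, matching the profile's
# finding that time is dominated by target MoE/MLP, not draft generation.
```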
Seth Pratt @sethprattsf
@flo @runsonai One side note: I did hit an OOM crash with this setup yesterday. Unclear if it was entirely the setup's fault; I had way too much running on the machine. It was only a 16k-context bench, though.
Florian Leibert 🎢
@sethprattsf @runsonai What was the trick for K2.6? I'm hitting hard limits even on 200 GB of VRAM: around 20 t/s for a single session. Throughput scales to c=256, but there's no speed improvement for a single session.
Seth Pratt @sethprattsf
@runsonai I did the same; it got me up to 51 tokens per second decode. Single thread. Uses a 35B for speculative decode! /goals spent 30 hours on this. No quality loss across 24 benchmarks vs bf16.
Florian Leibert 🎢
A German bought this old train car and turned it into a super dope bar…
Florian Leibert 🎢
First time at jazz fest and I have to say I came with little expectations, but New Orleans has that something — call it cultural backbone, which might be rooted in music but more likely the connectedness, a result of the common suffering of people during one of the biggest natural disasters in a western city in my lifetime. None of the woke pretense of San Francisco, but real cultural glue.

It feels like the place the beatniks wrote about when they talked about Lawrence, Kansas: a rebellious place, but not with the aggression of people who want to be known to be rebellious — more like we don’t give a fuck about what you think bc we do us. This is the anti-capitalist and anti-groupthink place where you’re handed a beer at a bar at 3am and notice you can’t even pay for it, as they shut down the registers and keep going, just enjoying amazing music and vibes.

Go read Zeitoun, simple prose but a powerful story about NOLA during Katrina. New Orleans ❤️ 🤘🏻
Florian Leibert 🎢 reposted
David Goldberg 🦙 @davidrgoldberg
$199/month? I built 80% of this (local Miami only) in 30 days for $500 w AI and charge $19/month. OneGuyMiami.com
Casa @getcasa

After two years of building under wraps, today we're announcing Casa – your personal property manager. We've raised $27M to redefine the homeownership experience from the ground up.

We believe your home is your most treasured asset, emotionally & financially. It shouldn't also be a second job. Most homeowners are on their own – expected to have the time, expertise, and relationships to keep things running. Finding a plumber you can trust. Remembering when the HVAC was last serviced. Knowing what's actually wrong before someone shows up to fix it.

Casa gives every homeowner what used to be reserved for the few: a dedicated team that knows your home deeply, handles the work, and stays in your corner. We're enabling this by building a deep, technical understanding of every home we serve – something that's never existed before, across 100 million single-family homes in the country.

For $199/mo, membership includes:
- A complete inventory of your home, built using specialized hardware & software
- 1.5 hours of handyman time every month (and it rolls over)
- Unlimited Concierge requests to take on virtually any home project
- Custom, proactive care plans built specifically for your home
- Weekly package and donation pickups
- Scheduling and payments for your regular vendors
- Utility and property tax monitoring

…and we’re just getting started, with more benefits on the way to make the experience of owning your home as magical as it always should have been.

Available now in the SF Bay Area and Los Angeles. Reserve your spot everywhere else. → getcasa.com

Florian Leibert 🎢 reposted
Peter Girnus 🦅 @gothburz
@hpygoluki @SunshineSass2 Thompson would have taken three bottles and filed 4,000 words about it by morning. These people took the wine and filed nothing.