Layla CryptoWhiz
@laybitcoin1
3.9K posts
🌐 Blockchain fanatic & your guide to financial markets 💸

People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. thinkingmachines.ai/blog/interacti…





As part of our research, we are releasing the fastest GPT-oss speculative decoding models out there, increasing throughput by up to 50% on long context and large batches! huggingface.co/collections/Do… Available on day 0 with SGLang! Huge thanks to @sgl_project @lmsysorg 🙏🏻
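The speculative-decoding idea behind a release like this can be sketched in a few lines: a cheap draft model proposes a run of tokens, the expensive target model verifies the whole run in a single pass, and every token in the agreeing prefix is accepted at once. The two "models" below are toy stand-in functions, not the real draft models or SGLang API:

```python
def draft_model(context, k):
    # Hypothetical cheap draft: propose k tokens with a toy greedy rule.
    return [(context[-1] + 1 + i) % 50 for i in range(k)]

def target_model(context, proposed):
    # Hypothetical target: its own greedy pick at each proposed position,
    # computed in one "forward pass" over the whole proposed run.
    return [t if t % 7 != 0 else (t + 1) % 50 for t in proposed]

def speculative_step(context, k=4):
    # One decode step: accept the agreeing prefix, then take the target's
    # token at the first mismatch. Best case: k tokens per target call
    # instead of one, which is where the throughput gain comes from.
    proposed = draft_model(context, k)
    verified = target_model(context, proposed)
    accepted = []
    for p, v in zip(proposed, verified):
        accepted.append(v)  # the target's token is always safe to keep
        if p != v:
            break           # draft diverged; stop accepting here
    return context + accepted

out = speculative_step([1, 2, 3], k=4)  # several tokens accepted in one step
```

The output is identical to running the target model token by token; the draft only changes how many target calls are needed, which is why throughput improves most on long contexts and large batches.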


I've reached the point of no return. I'm officially doing 95% of my work on Codex.


We just crossed 1,000,000 public datasets on Hugging Face! That's petabytes of data that millions of AI builders are downloading, analyzing, and training AI models on every day!

What's interesting is the clear acceleration since agents started to get good: the number of datasets doubled over the past 8 months, after taking 4 years to reach the first 500k. It's becoming easier and faster to build, share, and use your own datasets!

Many are saying the next bottleneck for more people to build AI themselves (instead of relying on APIs) is better data, so we're just getting started! Thanks everyone for your amazing contributions, we couldn't do it without you!



Our cofounder @iamtimdavis built an AI storybook app using @BlackForestLabs' FLUX2 and @googlegemma 4 on Modular Cloud. Pick a character, make choices, and the story branches endlessly, with every page written and illustrated in real time.

Tim has spent his career obsessing over inference latency, first at Google, now at Modular. Building something his kids use settled it: in a real-time generative app, the inference platform determines the experience as much as the model.

The numbers back that up. From 24 hours of production traffic: first prose in 420ms, a full illustration in under 6 seconds, 85% of page turns in 48ms.

Create your own story with Inkwell and share it. We're sending swag to our favorites: inkwell.modular.com






New in Claude Code: agent view. One list of all your sessions, available today as a research preview.




Pi Agent vs OpenCode token usage

A lot of people recommended Pi Agent, so I decided to check. Pi Agent took 1.1k tokens in the first turn; OpenCode took 11.5k.

Setup:

1) Trimmed OpenCode (down from the usual 30k first turn to 11.5k)
- 0 MCPs
- 2 lightweight plugins (opencode-env-protect and openslimedit)
- 8k char AGENTS.md
- 11,585 deepseek-v4-flash input tokens, for $0.0016

2) Vanilla Pi
- 0 MCPs
- 0 system prompt
- 8k char AGENTS.md
- 1,114 kimi-k2.6 input tokens, for $0.0008

I think using better models with capped tokens per turn can keep usage nearly the same as uncontrolled DeepSeek V4 Flash? The challenge now is finding the sweet spot: can't cap tokens if quality drops. We'll see.

Video below: OpenCode vs Pi side by side, "say hi back" first-turn test, with OpenCode Go usage for each
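A quick sanity check on those figures: the implied per-million-token input price of each setup can be backed out from the post's own numbers (assuming cost scales linearly with input tokens):

```python
def price_per_million(cost_usd, tokens):
    # Implied $/1M input tokens, assuming linear input-token pricing.
    return cost_usd / tokens * 1_000_000

opencode = price_per_million(0.0016, 11585)  # deepseek-v4-flash turn
pi = price_per_million(0.0008, 1114)         # kimi-k2.6 turn
# opencode ≈ $0.14/M vs pi ≈ $0.72/M: the ~5x pricier model still halves
# the per-turn cost because it reads roughly 10x fewer tokens.
```

That arithmetic is the post's whole argument in miniature: a better, more expensive model can win on cost per turn if the token budget is kept tight enough.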







Narrative violation: with the right policy, new data centers can *lower* electricity prices by spreading the fixed costs of the grid out among more paying customers.
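The mechanism here is simple rate arithmetic: if a grid's fixed costs are recovered through per-kWh charges, selling more kWh lowers the charge per kWh, provided the new load doesn't trigger proportional new fixed investment (the "right policy" part). A toy illustration with entirely hypothetical numbers:

```python
# Hypothetical figures for illustration only, not real grid data.
FIXED_GRID_COST = 1_000_000_000   # $/year of fixed costs recovered via rates
base_demand_kwh = 10_000_000_000  # existing annual consumption, kWh

def fixed_cost_rate(total_kwh):
    # $/kWh needed to recover the same fixed cost from total consumption.
    return FIXED_GRID_COST / total_kwh

before = fixed_cost_rate(base_demand_kwh)                  # $0.100/kWh
after = fixed_cost_rate(base_demand_kwh + 2_000_000_000)   # ~$0.083/kWh
```

With a 20% demand increase from a new data center, the fixed-cost component of every customer's rate drops by about a sixth, even though total grid spending is unchanged.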






