
Introducing Kitten TTS V0.8: open-source TTS that fits in 25MB. Three variants: 80M | 40M | 14M (<25MB) Highly expressive. Runs on CPU. Built for edge. No GPU? No problem. Ship voice anywhere. Check it out:

look, hermes 4.3 36B has something going on. gave it the same octopus invaders prompt that qwen 3.5 built in one shot, but on 2x RTX 3090 with 128K context instead of 32K.

on 1x it compacted 8 times in 24 minutes and gave up at 7 out of 10 files. the intelligence was there but the room wasn't. on 2x it just started grinding: 15 minutes of uninterrupted autonomous coding, zero compactions. HTML structure, full CSS, game engine, collision detection, particle system, 4-layer parallax background, enemy spawning logic, audio system. 29,000 tokens in one continuous session, both GPUs splitting the load at 50%, and it never once stopped to compress and forget.

the output quality feels different. the way it structures code, names functions, handles edge cases. dense architecture with 36 billion parameters all active on every token. you can feel the weight behind every line. qwen 3.5 built a clean game. hermes writes like it understands what it's building.

48GB of VRAM on two consumer GPUs in 2026. no H100. no cloud. two 3090s that cost less than one month of API bills for most startups.

results and full game playthrough coming tonight. i want to see the design personality. NousResearch might have cooked something special here.

hermes 4.3 36B. 2x RTX 3090. Q4_K_M. the full session is in the video.
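for reference, a 50/50 two-GPU split like the one described is what llama.cpp's layer-split flags do. this is a sketch, not the poster's actual command: the GGUF filename is hypothetical, and exact flag syntax varies by llama.cpp version.

```shell
# sketch of a llama.cpp server launch matching the described setup.
# filename is hypothetical; flag names are llama.cpp's:
#   -ngl 99            offload all layers to GPU
#   --split-mode layer distribute whole layers across both 3090s
#   --tensor-split     50/50 VRAM split, matching the post
#   -c 131072          128K context window
llama-server -m hermes-4.3-36b-Q4_K_M.gguf -ngl 99 \
  --split-mode layer --tensor-split 50,50 -c 131072
```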


i added a second RTX 3090 to hermes 4.3 36B and generation speed didn't change: 35.3 tok/s on 1x, 35.53 on 2x. zero overhead. every extra byte of VRAM went to context, not speed.

on a single 3090 this model fills 21.8GB at Q4_K_M, leaving room for 32K context, 22K of it usable. ran octopus invaders on it last night: 8 compactions, 970 lines across 7 files before it got stuck in a loop. the intelligence was there but the room wasn't.

added the second GPU and pushed context until it OOM'd. 162K at q8_0 KV is the ceiling; 192K dies. that's 5x more context, 7x more usable room, and the KV cache precision doubled from 4-bit to 8-bit. this model has 512K native context. on 1x you're using 6% of what it was trained for. on 2x, 31%. it's not slow. it's starved.

full flags and specs in the chart below. same octopus invaders prompt goes in next. 7x more memory. zero compactions is the target.
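the 162K-fits, 192K-dies ceiling can be sanity-checked with back-of-envelope KV-cache math. the architecture numbers below (layer count, KV heads, head dim) are assumed for illustration, a plausible GQA layout for a 36B dense model, not published specs:

```python
# back-of-envelope KV-cache sizing. layer/head numbers are ASSUMED
# for illustration, not the model's published architecture.
def kv_cache_gib(ctx_tokens, n_layers=60, n_kv_heads=8, head_dim=128,
                 bytes_per_elem=1):  # 1 byte/elem = q8_0 KV cache
    # 2x for the K and V tensors, one pair per layer
    total = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * ctx_tokens
    return total / 2**30

weights_gb = 21.8  # Q4_K_M weights, from the post
for ctx in (32_000, 162_000, 192_000):
    kv = kv_cache_gib(ctx)
    print(f"{ctx // 1000}K ctx: KV ~{kv:.1f} GiB, "
          f"weights + KV ~{weights_gb + kv:.1f} GB")
```

with these assumed numbers, 162K of q8_0 KV lands around 18–19 GiB on top of ~22 GB of weights, inside 48 GB, while 192K pushes past 43 GB before runtime buffers are even counted. the real ceiling depends on the actual layer and head counts.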




We just turned WiFi signals into a radar that can see through walls and estimate the exact poses of people. Surveillance just got an order of magnitude easier. No cameras needed. The GitHub repo is close to 12K ⭐️ github.com/ruvnet/wifi-de… x.com/BoWang87/statu…


Footage of an Iranian ballistic missile hitting the headquarters of the US Navy's 5th Fleet at Naval Support Activity (NSA) Bahrain earlier today.
