
Sidakov's Handfight System youtu.be/lV_8E3C8Bv8
Matt Krzus
566 posts

@mattkrzus
research enjoyoor. jiu jitsu black belt. roadhouse is the greatest movie ever made.

a significant % of ml researchers might get hooked by what happened in ONE day. ai seems to run the research loop fascinatingly well (understand the problem + propose a change + train/test it + measure results + keep the better version + repeat), and it is genuinely reducing research friction. we are early in automated experimentation; how it plays out at frontier scale could be interesting to watch.
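The loop in the parentheses above is essentially greedy hill climbing: propose a change, score it, and keep it only if it beats the current best. Below is a minimal sketch under that reading. `propose` and `evaluate` are hypothetical stand-ins for "propose a change" and "train/test it + measure results", and higher scores are assumed to be better; this is an illustration, not any particular system's implementation.

```python
from typing import Callable, TypeVar

Config = TypeVar("Config")


def research_loop(
    baseline: Config,
    propose: Callable[[Config], Config],
    evaluate: Callable[[Config], float],
    iterations: int,
) -> tuple[Config, float]:
    """Greedy keep-the-better-version loop (hill climbing).

    `propose` and `evaluate` are hypothetical callables; higher scores win.
    """
    best, best_score = baseline, evaluate(baseline)
    for _ in range(iterations):
        candidate = propose(best)      # propose a change to the current best
        score = evaluate(candidate)    # train/test it and measure the result
        if score > best_score:         # keep the better version, else discard
            best, best_score = candidate, score
    return best, best_score


# Toy usage: nudge x upward and score it by closeness to 3.
print(research_loop(0, lambda x: x + 1, lambda x: -(x - 3) ** 2, 10))  # → (3, 0)
```

Because only strict improvements are kept, the loop can stall in a local optimum; real automated-experimentation setups would typically add broader proposals or restarts.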

When Ilya Sutskever explained why next-word prediction leads to intelligence, he used a metaphor: if you can piece together the clues and deduce the criminal’s name on the last page, you have a real understanding of the story. 🕵️‍♂️ Inspired by that idea, we turned to Ace Attorney to test AI's reasoning. It’s the perfect stage: the AI plays detective, collecting clues, exposing contradictions, and uncovering the truth. We put the latest top AI models (GPT-4.1, Gemini 2.5 Pro, Llama-4 Maverick, and more) to the test in Ace Attorney to see if they could shout "Objection!" ⚖️, turn the case around, and uncover the truth behind the lies.


🚀 Day 5 of #OpenSourceWeek: 3FS, Thruster for All DeepSeek Data Access

Fire-Flyer File System (3FS) - a parallel file system that utilizes the full bandwidth of modern SSDs and RDMA networks.

⚡ 6.6 TiB/s aggregate read throughput in a 180-node cluster
⚡ 3.66 TiB/min throughput on the GraySort benchmark in a 25-node cluster
⚡ 40+ GiB/s peak throughput per client node for KVCache lookup
🧬 Disaggregated architecture with strong consistency semantics
✅ Training data preprocessing, dataset loading, checkpoint saving/reloading, embedding vector search & KVCache lookups for inference in V3/R1

📥 3FS → github.com/deepseek-ai/3FS
⛲ Smallpond - data processing framework on 3FS → github.com/deepseek-ai/sm…
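To put the aggregate figures above on a per-node scale, here is a back-of-envelope sketch. It assumes load is spread evenly across all nodes in each cluster and uses binary units (1 TiB = 1024 GiB); the post does not say how the node counts split between storage and client roles, so treat the results as rough averages. `per_node_gib_per_s` is a hypothetical helper.

```python
TIB_IN_GIB = 1024  # binary units: 1 TiB = 1024 GiB


def per_node_gib_per_s(aggregate_tib: float, seconds: float, nodes: int) -> float:
    """Average GiB/s per node, assuming the load is spread evenly across nodes."""
    return aggregate_tib * TIB_IN_GIB / seconds / nodes


# 6.6 TiB/s aggregate read throughput across a 180-node cluster
read_per_node = per_node_gib_per_s(6.6, 1.0, 180)
# 3.66 TiB/min GraySort throughput across a 25-node cluster
sort_per_node = per_node_gib_per_s(3.66, 60.0, 25)

print(f"read:     {read_per_node:.1f} GiB/s per node")
print(f"graysort: {sort_per_node:.1f} GiB/s per node")
```

This works out to roughly 37.5 GiB/s per node for reads and about 2.5 GiB/s per node for GraySort. The two are not directly comparable: sort throughput counts data fully sorted, which involves reading, shuffling, and writing, not just reading.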

In 2023 and 2024, labs perfected the listicle with post-training/rlhf. In 2025, the personality training of models takes center stage. There's almost 0 academic work on Character Training, and almost 0 on the web writ large. We need to change that - it starts with this post.

🚀 Day 4 of #OpenSourceWeek: Optimized Parallelism Strategies

✅ DualPipe - a bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
🔗 github.com/deepseek-ai/Du…

✅ EPLB - an expert-parallel load balancer for V3/R1.
🔗 github.com/deepseek-ai/ep…

📊 Analyze computation-communication overlap in V3/R1.
🔗 github.com/deepseek-ai/pr…
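The post does not describe how EPLB balances load, so the following is only a generic illustration of the problem an expert-parallel load balancer addresses: placing experts with uneven token loads onto GPUs so that per-GPU load stays even. It is a greedy longest-processing-time sketch with an equal number of experts per GPU, not EPLB's actual algorithm; `place_experts` and the load values are hypothetical.

```python
import heapq


def place_experts(loads: list[float], num_gpus: int) -> list[list[int]]:
    """Greedy placement: heaviest expert goes to the currently lightest GPU.

    loads[i] is a hypothetical measured load for expert i. Every GPU ends up
    with the same number of experts, so len(loads) must divide evenly.
    """
    if len(loads) % num_gpus != 0:
        raise ValueError("number of experts must be divisible by num_gpus")
    slots = len(loads) // num_gpus
    placement: list[list[int]] = [[] for _ in range(num_gpus)]
    # Min-heap of (current load, gpu index): the lightest GPU pops first.
    heap = [(0.0, g) for g in range(num_gpus)]
    heapq.heapify(heap)
    # Visit experts from heaviest to lightest.
    for expert in sorted(range(len(loads)), key=lambda i: loads[i], reverse=True):
        load, gpu = heapq.heappop(heap)
        placement[gpu].append(expert)
        # Return the GPU to the heap only while it still has free slots.
        if len(placement[gpu]) < slots:
            heapq.heappush(heap, (load + loads[expert], gpu))
    return placement


print(place_experts([10, 9, 8, 7, 2, 1], num_gpus=2))  # → [[0, 3, 4], [1, 2, 5]]
```

In the example, the two GPUs end up with loads of 19 and 18. Production balancers typically go further, for example by replicating very hot experts, which this sketch does not attempt.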

🚀 Day 1 of #OpenSourceWeek: FlashMLA

Honored to share FlashMLA - our efficient MLA decoding kernel for Hopper GPUs, optimized for variable-length sequences and now in production.

✅ BF16 support
✅ Paged KV cache (block size 64)
⚡ 3000 GB/s memory-bound & 580 TFLOPS compute-bound on H800

🔗 Explore on GitHub: github.com/deepseek-ai/Fl…
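A paged KV cache stores each sequence's keys and values in fixed-size blocks (64 tokens here) that need not be contiguous in memory. A per-sequence block table then maps logical positions to physical blocks, which is what lets variable-length sequences share one memory pool. The sketch below shows only that address translation; `kv_slot` and `block_table` are hypothetical names, not FlashMLA's API.

```python
BLOCK_SIZE = 64  # tokens per KV-cache block, matching the block size in the post


def kv_slot(block_table: list[int], token_pos: int,
            block_size: int = BLOCK_SIZE) -> tuple[int, int]:
    """Map a token's logical position to (physical block id, offset in block).

    block_table[i] is the physical block backing the sequence's i-th logical
    block (hypothetical layout for illustration).
    """
    logical_block, offset = divmod(token_pos, block_size)
    if logical_block >= len(block_table):
        raise IndexError("token position is past the sequence's allocated blocks")
    return block_table[logical_block], offset


# A sequence of up to 192 tokens stored in three scattered physical blocks.
table = [7, 2, 11]
print(kv_slot(table, 0))    # → (7, 0)
print(kv_slot(table, 64))   # → (2, 0)
print(kv_slot(table, 130))  # → (11, 2)
```

A sequence only needs a new block once it grows past a 64-token boundary, so at most one partially filled block is wasted per sequence.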

🚀 Day 0: Warming up for #OpenSourceWeek! We're a tiny team @deepseek_ai exploring AGI. Starting next week, we'll be open-sourcing 5 repos, sharing our small but sincere progress with full transparency. These humble building blocks in our online service have been documented, deployed and battle-tested in production. As part of the open-source community, we believe that every line shared becomes collective momentum that accelerates the journey. Daily unlocks are coming soon. No ivory towers - just pure garage-energy and community-driven innovation.