Manu_TechAndGames

16.6K posts

@AndroidBlogger

Developer, interested in many different tech fields. Working at Ubisoft Paris.

Joined February 2009

204 Following · 207 Followers

Manu_TechAndGames reposted

Aleksa Gordić (水平问题)@gordic_aleksa·14h

New in-depth blog post time: "Inside TPU and GPU Clusters: The Anatomy of Collective Communication".
If you want to deeply understand the core primitives behind scaling the training / inference for MoEs and dense transformers, going a level below FSDP, expert parallelism, data parallelism, model/tensor parallelism this might be a fun read.
I cover:
* TPU cluster topology: (super)pods, slices, DCN, PCIe, ICI
* All-Gather: 1D/2D rings, and path algo (lots of visuals so should be crystal clear how these work even if you're not a perf engineer)
* Reduce-Scatter (which is the dual of AG) and All-Reduce
* All-to-All (used to dispatch tokens to target experts in MoEs)
* NVIDIA GPU cluster topology (reference DGX architecture): nodes, scalable units, fat tree
* GPU collectives within the node: rings, trees (log2 steps), and SHARP (in network compute unit)
* GPU collectives across nodes, hierarchical algorithms over InfiniBand etc.
I was heavily inspired to do this deep dive after reading the excellent Scaling book by an excellent group of people @jacobaustin132 @_sholtodouglas @reinerpope and others!
What originally started as "let me maybe just make four figures covering All-Gather, Reduce-Scatter, All-Reduce, and All-to-All so I can understand them better, it shouldn't take more than a day, right, right?" somehow turned into this 40 figures later.
Along the way, I realized that the collective algorithms only really make sense once you understand the underlying hardware topology. TPUs were a bit easier to reason about, but I couldn't skip GPUs, I love them too much. Rings are cool, but I also wanted to understand tree algorithms. But also SHARP, and fat trees, and hierarchical collectives. :')
So the scope slowly expanded, and little by little, this blog post came to fruition. Just a side-quest. Hope you like it! :)
---
Also a big thank you to my friends for reviewing the blog and providing feedback:
* @ArunDemeure (prev GPU/AI stuff at Magic, GPU architect at Apple and Imagine, my llm.c buddy!)
* @axel_s_feldmann (making GPUs go brrr at Jane Street, we met for the first time at @marksaroufim's excellent GPU mode event)
* @pranjalssh (ex xAI GPU wizard, one of two people who inspired my original matmul blog!)
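
Editor's sketch: the post above names 1D ring All-Gather and calls Reduce-Scatter its dual. The pure-Python simulation below is one way to picture both on a single ring. It is not taken from the blog post; the function names `ring_all_gather` and `ring_reduce_scatter` and the "list of per-device buffers" representation are assumptions made for illustration, and real collectives overlap sends and receives rather than stepping in lockstep.

```python
def ring_all_gather(shards):
    """Simulate a 1D ring All-Gather: every device ends with every shard."""
    n = len(shards)
    # buffers[d][i] holds shard i once device d has it (None until then).
    buffers = [[None] * n for _ in range(n)]
    for d in range(n):
        buffers[d][d] = shards[d]
    for step in range(n - 1):
        # Device d forwards the shard it obtained in the previous step
        # (its own shard in step 0) to its right-hand neighbour.
        sends = [((d + 1) % n, (d - step) % n) for d in range(n)]
        payloads = [buffers[d][idx] for d, (_, idx) in enumerate(sends)]
        for (dst, idx), payload in zip(sends, payloads):
            buffers[dst][idx] = payload
    return buffers


def ring_reduce_scatter(inputs):
    """Simulate a ring Reduce-Scatter with summation.

    inputs[d][i] is device d's contribution to chunk i. Returns a dict
    mapping each device to (chunk index it owns, fully reduced value).
    """
    n = len(inputs)
    acc = [list(row) for row in inputs]
    for step in range(n - 1):
        # Device d passes its partial sum for chunk (d - step) to the right,
        # where it is added to the neighbour's own contribution.
        sends = [((d + 1) % n, (d - step) % n) for d in range(n)]
        payloads = [acc[d][idx] for d, (_, idx) in enumerate(sends)]
        for (dst, idx), payload in zip(sends, payloads):
            acc[dst][idx] += payload
    # After n - 1 steps device d holds the complete sum for chunk (d + 1) % n.
    return {d: ((d + 1) % n, acc[d][(d + 1) % n]) for d in range(n)}


if __name__ == "__main__":
    print(ring_all_gather(["a", "b", "c", "d"])[0])
    print(ring_reduce_scatter([[d * 10 + i for i in range(4)] for d in range(4)]))
```

Both loops take n - 1 steps and each device sends one chunk per step. Running Reduce-Scatter followed by All-Gather on the reduced chunks gives one way to build an All-Reduce.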

Manu_TechAndGames reposted

Lumina@LuminaXspace·18h

🚨 GLM-5.5 could arrive in August
Z.ai is reportedly preparing GLM-5.5:
• Reportedly targeting an August release
• Rumoured to contain more than 1 trillion parameters
• Expected to build on GLM-5.2’s 1M-token context window
• Likely to focus heavily on long-running coding agents to match Western companies
No exact release date has been announced yet. With how well GLM-5.2 performs, this model should be incredible. Are you looking forward to GLM-5.5?

Manu_TechAndGames reposted

Math Pulse@themathpulse·14h

This is an elegant way to simulate falling sand: Define simple rules for all 16 possible 2×2 grids and let a cellular automaton do the rest! Watch the grains cascade, spread, and pile up realistically.
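
Editor's sketch: the post does not list its rules, so the table below is an assumed rule set, not the one from the video. It uses a block cellular automaton in which the grid is cut into 2×2 blocks, each block is rewritten from a lookup table, and the block grid is shifted by one cell on alternate steps (the Margolus neighbourhood, also an assumption here). Blocks are written as `(top_left, top_right, bottom_left, bottom_right)` with 1 for sand; `RULES`, `step` and `simulate` are illustrative names.

```python
# Nine of the 16 possible 2x2 blocks change; the other seven map to themselves.
RULES = {
    (1, 0, 0, 0): (0, 0, 1, 0),  # lone grain falls straight down
    (0, 1, 0, 0): (0, 0, 0, 1),
    (1, 1, 0, 0): (0, 0, 1, 1),  # two grains fall side by side
    (1, 0, 1, 0): (0, 0, 1, 1),  # grain on top of a grain slides sideways
    (0, 1, 0, 1): (0, 0, 1, 1),
    (1, 0, 0, 1): (0, 0, 1, 1),  # diagonal pair settles flat
    (0, 1, 1, 0): (0, 0, 1, 1),
    (1, 1, 1, 0): (1, 0, 1, 1),  # top-right grain fills the gap below it
    (1, 1, 0, 1): (0, 1, 1, 1),  # top-left grain fills the gap below it
}


def step(grid, offset):
    """Apply the block rules once; offset (0 or 1) selects the block grid."""
    h, w = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    # Blocks that would stick out past the edge are skipped, so the
    # border behaves like a solid wall.
    for r in range(offset, h - 1, 2):
        for c in range(offset, w - 1, 2):
            block = (grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1])
            tl, tr, bl, br = RULES.get(block, block)
            new[r][c], new[r][c + 1] = tl, tr
            new[r + 1][c], new[r + 1][c + 1] = bl, br
    return new


def simulate(grid, steps):
    """Run the automaton, alternating the block offset every step."""
    for t in range(steps):
        grid = step(grid, t % 2)
    return grid


if __name__ == "__main__":
    start = [[0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    for row in simulate(start, 8):
        print("".join("#" if cell else "." for cell in row))
```

Every rule keeps the number of grains in its block unchanged, so sand is never created or destroyed. Alternating the offset is what lets grains cross block boundaries.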

Manu_TechAndGames@AndroidBlogger·10h

True

Tereza Tizkova@tereza_tizkova

Integrating with an SDK is now often as much work as calling the HTTP API directly

Manu_TechAndGames@AndroidBlogger·10h

I love those optical illusions

Nebraskangooner@Nebraskangooner

This is tripping me the hell out...

Manu_TechAndGames reposted

Pankaj Kumar@pankajkumar_dev·17h

Kimi K3 Leaks: Coming Tomorrow
- Kimi K3 is set to launch tomorrow, according to a leaked Kimi API platform page.
- It will launch alongside a limited-time API top-up promotion, offering 10-30% bonus credits depending on the recharge amount.
- Kimi K3 is built on a new architectural innovation, rather than being just a larger version of previous Kimi models.

Manu_TechAndGames reposted

Frank Force 🌻@KilledByAPixel·13h

No AI. No Threejs. No bytes to spare. My new game is a 3D retro 90's fever dream that fits in only 1024 bytes! 🌈☁️ killedbyapixel.github.io/TinyCode/1K/Sk… Link to build with extra control options for accessibility (1k build has only mouse input)

Manu_TechAndGames reposted

Scott Miller - Apogee/3D Realms Founder ☢️@ScottApogee·16h

I was in the theater with friends and was telling them: "We've got that slow motion effect in Max Payne!" And we'd already trademarked "bullet time" for the game. After we released the game in 2001, Warner Bros. threatened to sue us for using "bullet time", but we were easily able to prove to them that we had the trademark first. So they backed all the way down :)

cinesthetic.@TheCinesthetic

Imagine sitting in a theater in 1999 and witnessing this for the very first time.

Manu_TechAndGames reposted

PrismML@PrismML·13h

Today, we’re announcing Bonsai 27B: the first 27B-class model to run on a phone.
Bonsai 27B is the new multimodal flagship of the Bonsai family. Based on Qwen3.6 27B, it brings a new capability tier to local AI: multi-step reasoning, structured tool use, long-context workflows, and coherent agentic loops.
Until now, models in this class have been impractical to deploy locally. A 27B model occupies roughly 54 GB in 16-bit precision, and even a strong 4-bit build is around 18 GB - too large for a phone and for most laptops.
Bonsai 27B changes that. It comes in two variants:
• Ternary Bonsai 27B: 5.9 GB, 1.71 effective bits per weight, optimized for laptop-class quality.
• 1-bit Bonsai 27B: 3.9 GB, 1.125 effective bits per weight, optimized for phone-class footprint.
Everything is open-sourced today under the Apache 2.0 license.
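
Editor's sketch: the sizes quoted above can be sanity-checked with parameters × bits per weight ÷ 8. The helper `model_size_gb` is a hypothetical name, and the calculation assumes exactly 27 billion weights and 1 GB = 10^9 bytes; neither assumption is confirmed by the announcement.

```python
def model_size_gb(params_billions, bits_per_weight):
    """Approximate weight storage in GB, taking 1 GB as 1e9 bytes.

    Assumes every parameter is stored at the same effective bit width
    and ignores any file-format overhead.
    """
    total_bits = params_billions * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


if __name__ == "__main__":
    for label, bits in [("16-bit", 16), ("ternary", 1.71), ("1-bit", 1.125)]:
        print(f"{label}: {model_size_gb(27, bits):.2f} GB")
```

Under these assumptions the 16-bit figure comes out at 54 GB, matching the post, while the two Bonsai variants land slightly under the quoted 5.9 GB and 3.9 GB. The post does not explain the difference; a parameter count somewhat above 27 billion or some tensors kept at higher precision would both be consistent with it.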

Manu_TechAndGames reposted

Daniel Glejzner@DanielGlejzner·20h

Software hiring has become absurd. At work, you’re expected to use AI to offload manual coding and move faster. Then, to get your next contract, you’re asked to code from memory with no assistance. Pass the interview - and you’re expected to use AI again. It has never been this broken.

Manu_TechAndGames reposted

Hedgie@HedgieMarkets·1d

🦔Meta's Louisiana data center just crossed $250 billion in total expected cost. The company publicly disclosed $50 billion. Bloomberg reports the rest is largely for computing chips going into a nearly 4,000 acre campus. Blue Owl Capital owns 80% of the project through an SPV, with Meta leasing it back. Entergy Louisiana is building 10 new gas-fired power plants just to supply the campus with electricity. Once operational it will support 1,000 jobs. Meta is already floating the idea of renting excess compute to outside customers.
My Take
$250 billion on a single site, and 80% of it is owned by Blue Owl Capital, the same firm that froze withdrawals from its private credit fund earlier this year. Meta leases it back. If AI revenue doesn't materialize at the scale Zuckerberg is betting on, Blue Owl's investors are the ones holding a 4,000 acre liability in rural Louisiana powered by 10 new gas plants and employing 1,000 people.
Meta also started talking about renting excess compute to outside customers. A company doesn't spend $250 billion on infrastructure and then float the idea of subleasing unless the original demand projections left room for doubt. Meta makes $50 billion a quarter in ads, so they can absorb a lot of pain. But "we'll rent the extra" applied to AI compute at this scale is a bet that enough tenants exist to fill a campus the size of a small town. I'm not sure they do.
Hedgie🤗

Manu_TechAndGames reposted

Tereza Tizkova@tereza_tizkova·15h

finally found time to look into the open-source X algo again. Yes you should quote and comment, not repost and like. my blog post in comment

Manu_TechAndGames reposted

Tencent Hy@TencentHunyuan·22h

We’ve just released the 1-bit & 4-bit versions of Hy3, a flagship-scale 295B model that can be served on a single GPU. 👌 Run Hy3 with llama.cpp, enable MTP, and experience powerful intelligence on dramatically lower hardware.🚀🚀🚀 Can’t wait to see what you build. #Hy3 #Hy #GGUF #llamacpp

Tencent Hy@TencentHunyuan

🚀Hy3 is here. 295B MoE. Best in its size class. Rivals trillion-scale flagships. Reliable and affordable for most agentic usecases. Apache 2.0. Friendly for commercial use. FREE API for 2 weeks → openrouter.ai/tencent/hy3:fr… 🤗 huggingface.co/tencent/Hy3 📖 hy.tencent.com/research/hy3

Manu_TechAndGames reposted

Tengfei Wang@DylanTFWang·1d

⚡️ Tencent HY-World 2.1 is HERE! Not a video. A world. 🪐 Just 3 months after 2.0, we've supercharged everything: ✨ Cleaner geometry ✨ Sharper rendering ✨ Larger explorable range It's a REAL 3D world you can Walk through and Interact with.

Manu_TechAndGames reposted

Eitan Borgnia@EBorgnia·1d

We found compacting agent context on cache misses reduces token costs by over 50%. The graphic is a real Fable 5 trace from our CTO, where we simulate what the cost would be before/after compacting following periods of over 5 mins of inactivity. By keeping context within 128k-256k tokens, spend goes from being quadratic to linear in the number of turns. The only problem is self-summary takes 1-2 minutes on big traces! We trained an absurdly fast compact model to get the cost savings without harming the user experience. Relace Compact is a modified LLM designed for token-level classification, which means we can run it at 50k tok/s and at a price an order of magnitude lower than cache read cost ($0.20/million input and $0.20/million output tokens). Try it out in Jacq, or visit our docs to integrate into your own product!
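
Editor's sketch: the "quadratic to linear" claim above can be illustrated with a toy model in which every turn resends the whole context as input. This is a simplification, not Relace's method: it ignores cache pricing entirely and compacts on a size threshold rather than on cache misses after inactivity. `total_input_tokens`, `tokens_per_turn`, `max_context` and `compacted_size` are hypothetical names; only the 128k-256k window comes from the post.

```python
def total_input_tokens(turns, tokens_per_turn, max_context=None, compacted_size=0):
    """Sum the input tokens sent over a conversation.

    Each turn appends tokens_per_turn tokens and resends the full context.
    If max_context is set, history that would push the context past it is
    first compacted down to compacted_size tokens.
    """
    context = 0
    total = 0
    for _ in range(turns):
        context += tokens_per_turn
        if max_context is not None and context > max_context:
            # Replace the old history with its compacted form, then add
            # the new turn on top of it.
            context = compacted_size + tokens_per_turn
        total += context
    return total


if __name__ == "__main__":
    for turns in (100, 200, 400):
        plain = total_input_tokens(turns, 10_000)
        compacted = total_input_tokens(turns, 10_000, 256_000, 128_000)
        print(turns, plain, compacted)
```

Without a cap the total grows like turns squared, since turn i resends about i × `tokens_per_turn` tokens. With the cap no turn can send more than `max_context` tokens, so the total is bounded by turns × `max_context`, which is linear.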

Manu_TechAndGames reposted

Entelligence AI@EntelligenceAI·1d

🚨 NEW MODEL ALERT
Singapore just dropped a model that's putting up frontier-level numbers.
Agnes 2.5 Pro:
• 82.7 on SWE-bench Verified
• 78.7 multilingual
• Strong gains on SWE Atlas
• Beating GLM 5.2 and DeepSeek V4 Pro on multiple cuts
• Free API available today
The biggest takeaway here isn't the benchmark. It's the country. The frontier is no longer just a US vs China story.

Manu_TechAndGames@AndroidBlogger·21h

@AlexFinn In two years? Sometime in 2029? 🤔

Alex Finn@AlexFinn·1d

You are going to be able to run Fable 5 locally on your desk
In 2 years Apple will be coming out with Mac Studios with 1.5TB of memory
With just 300GB of memory you can run Opus 4.8 level intelligence
Think of what you can do with 5x that
You need to be preparing for this now
Start getting familiar with local AI technology
Go to your Hermes/OpenClaw and use this prompt:
“I am brand new to local AI and want to get familiar. Look at the computer you are currently on. Understand the specs. Then go on Huggingface and find the best models I can run on it. Then, walk me through how these models work, how they will run locally, and use cases I can do with them. After walking me through all of that so I’m educated, you can then load it onto this computer and build an interface so I can use them”
In 2 years EVERYONE on Earth will have a local model running on their desk
The people who start preparing now will be WAY ahead of everyone else

AppleTrack@appltrack

Apple's M7 Ultra chip coming in 2029 is rumored to support 1.5TB of RAM. This would make the processor much more capable for on-device AI.

Manu_TechAndGames@AndroidBlogger·22h

This is incredible...

Chubby♨️@kimmonismus

One dose of a frog-gut bacterium completely eliminated colorectal tumors in every treated mouse. Not merely shrank them. Complete response. The bacterium, Ewingella americana, multiplied roughly 3,000-fold inside the tumors within 24 hours. It attacked cancer cells directly while recruiting T cells, B cells, and neutrophils. In the experiment, it outperformed four doses of anti-PD-L1 immunotherapy and liposomal doxorubicin. Then researchers rechallenged the cured mice with the tumor: 0/10 developed tumors. 10/10 untreated mice did. The bacterium disappeared from the bloodstream within 24 hours and wasn’t detected in healthy organs. It’s one small mouse study, not a human cancer cure. But the concept is remarkable: a living drug that finds the tumor, multiplies inside it, destroys it, and potentially teaches the immune system to remember. In the foreseeable future, we will cure all cancers.

Manu_TechAndGames@AndroidBlogger·22h

@Yun_HDY That's really great. I still believe it's not enough for games. Animations in games have many more constraints than 'just looking good', and this model won't be able to manage them. But very promising anyway.

Manu_TechAndGames reposted

Hardy@Yun_HDY·1d

AI-native game engines will probably look like this. Stop farming endless walk/run/limp state machines. Give it text + paths + keyframes and the character keeps moving in real time. NVIDIA ARDY just dropped with code and model checkpoints. research.nvidia.com/labs/sil/proje…
