
Akash Mahajan
@akashmjn
now 🎧; prev chatting with PDFs @ContextualAI; transcription @Azure Speech; @Stanford @atherenergy @iitmadras


Today's autoregressive models generate one token at a time. Mercury 2 generates tokens in parallel. Over 1,000 tok/sec on standard GPUs, at comparable quality to speed-optimized models. Since launch, the community has been showing what diffusion LLMs can unlock. Thanks to the team at Clyep for the breakdown.

Neural networks do math by rotating shapes. We found a shape-rotating calculator hidden inside an LLM – and it’s used for more than just math! (1/6)

People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. thinkingmachines.ai/blog/interacti…







Breaking LLM inference’s autoregressive bottleneck 🛠️ We've teamed up with @haozhangml, @YimingBob, and @aaronzhfeng, among others from UCSD to achieve a massive 3.13X speedup for LLM inference on Google Cloud TPUs using Diffusion-Style Speculative Decoding (DFlash). Read the blog: goo.gle/4naZ8Yv

This works really well btw: at the end of your query, ask your LLM to "structure your response as HTML", then view the generated file in your browser. I've also had some success asking the LLM to present its output as slideshows, etc.

More generally, imo audio is the human-preferred input to AIs, but vision (images/animations/video) is the preferred output from them. Roughly a third of our brains is a massively parallel processor dedicated to vision; it is the 10-lane superhighway of information into the brain. As AI improves, I think we'll see a progression that takes advantage:
1) raw text (hard/effortful to read)
2) markdown (bold, italic, headings, tables, a bit easier on the eyes) <-- current default
3) HTML (still procedural with underlying code, but a lot more flexibility on the graphics, layout, even interactivity) <-- early but forming new good default
...4,5,6,...
n) interactive neural videos/simulations

Imo the extrapolation (though the technology doesn't exist just yet) ends in some kind of interactive videos generated directly by a diffusion neural net. There are many open questions as to how exact/procedural "Software 1.0" artifacts (e.g. interactive simulations) may be woven together with neural artifacts (diffusion grids), but generally something in the direction of the recently viral x.com/zan2434/status…

There are also improvements necessary and pending at the input. Neither audio nor text nor video alone is enough, e.g. I feel a need to point/gesture to things on the screen, similar to all the things you would do with a person physically next to you and your computer screen.

TLDR: The input/output mind meld between humans and AIs is ongoing, and there is a lot of work to do and significant progress to be made, way before jumping all the way into neuralink-esque BCIs and all that. For what's worth exploring at the current stage, hot tip: try asking for HTML.
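
A minimal sketch of the "ask for HTML" tip above, in Python. It only covers the steps around the model call: appending the instruction to the end of a query, saving the reply to a file, and opening that file in a browser. The helper names (`with_html_request`, `save_response`, `view_in_browser`) and the default file name are hypothetical, and how the reply is obtained from an LLM is left out on purpose, since the post does not name a particular model or API.

```python
from pathlib import Path
import webbrowser

# Instruction quoted from the post; appended to the end of the query.
HTML_SUFFIX = "Structure your response as HTML."


def with_html_request(query: str) -> str:
    """Return the query with the HTML instruction appended at the end."""
    return f"{query.rstrip()}\n\n{HTML_SUFFIX}"


def save_response(html_text: str, path: str = "response.html") -> Path:
    """Write the model's HTML reply to disk and return its absolute path."""
    out = Path(path)
    out.write_text(html_text, encoding="utf-8")
    return out.resolve()


def view_in_browser(path: Path) -> bool:
    """Open a saved HTML file (absolute path) in the default browser."""
    return webbrowser.open(path.as_uri())
```

Typical use would be to send `with_html_request("...")` to whatever LLM you use, pass its reply to `save_response`, and then call `view_in_browser` on the returned path.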





We’re open-sourcing Perception Encoder Audiovisual (PE-AV), the technical engine that helps drive SAM Audio’s state-of-the-art audio separation. Built on our Perception Encoder model from earlier this year, PE-AV integrates audio with visual perception, achieving state-of-the-art results across a wide range of audio and video benchmarks. Its native multimodal support can assist people in everyday tasks, including sound detection and richer audio-visual scene understanding. 🔗 Read the paper: go.meta.me/e541b6 🔗 Download the code: go.meta.me/7fbef0


Introducing Real-time Transcription with Speakers!
- Step change in accuracy, surpassing top cloud APIs
- Faster than real-time on Mac and iPhone
- Still under 3 watts when all features are enabled
Available in Argmax SDK 2.0 for early access! Benchmarks and details in comments.

Today's a special day for @LightspeedIndia. Introducing INDIA ASCENDS'2026, a program purpose-built for India's youngest (<25yo), boldest cohort of world shapers & change makers. If you are one of them, put your headphones on, turn the volume to max, click on the video, and read on :)

Building something is hard, but building something the world has never seen before is nigh impossible. There is this concept of not just building a kingdom, but building a kingdom at the edge of a precipice -- founders who want to go all the way to the edge of what’s possible, beyond which there is no land, there is no road, the compass stops working, and they look into the abyss and say ‘yes, this is for me, this will be my life’s work’. These are the rarest of birds that take the plunge and know they’d fall before they fly, but when they fly, oh how glorious do they look.

We @lightspeedindia have been fortunate to partner with several of these founders. We met @PixxelSpace when the founders @awaisahmedna @kshitijgokul were just 22yo. We backed @Airbound_Aero when @TheRealNamzoo was just 17. We’ve backed many others doing their life’s work at the absolute cutting edge of what’s possible - @Arun_Vinayak_S of @ExponentEnergy, @pratykumar & @vivekrag of @SarvamAI, @devdutdalal & @XaviLaguarta at @MittiLabs and more. Beyond our portfolio, there are some amazing founders doing their life’s work - @PawanKChandana @SkyrootA, @sohamsankaran @PopVaxIndia, @khushhhi_ @AsperaAero, @nagokul @CynLr00, @adrnschm @sarlaaviation, @deepigoyal @lataerospace & @temple, @Manu_J_Nair @EtherealXTech & many more.

We need more of these founders coming out of India. Not just that, we need to fill the gap that exists in this market, which is in backing really young (<25yo) founders who are tinkering in school or college labs, or spending their weekends building, experimenting and failing fast, and are truly building globally competitive and de novo tech that, if it works, can have huge consequences in the world.

To that end, we are proud to launch INDIA ASCENDS'2026 -- our flagship yearly program for the most cracked young builders in the country doing incredible cutting-edge research in robotics, quantum, space, energy, AI, bio or more. Our program applications open today and we’ll select 12-15 of the best, boldest ideas that we think have the potential to shape the future. We’ll bring them all to BLR for a 2-day program. Each participant will get ~$100K in support from our partners @AnthropicAI, @GroqInc, @googlecloud, @awscloud, and we’ll also select 3-4 winners who will get venture funded to build their dream, starting from $200K all the way to $3M, and almost $500K of non-dilutive credits & grants from our partners. We look forward to seeing the boldest ideas you've been working on. Link to apply in the first comment:











