nickster

585 posts

nickster

nickster

@n1ckstr3

Marketing • AI

San Francisco, CA Katılım Aralık 2025
84 Takip Edilen115 Takipçiler
nickster retweetledi
thehype.
thehype.@thehypedotnews·
microsoft has released its first strong ai model – mai-image-2.5. here's the breakdown mai-image-2.5 hit #3 on the text-to-image arena with a score of 1,254 – a 72-point leap over mai-image-2 and the first time a microsoft model broke into a top five once held only by google deepmind and openai here's how they got there: • data built with creatives. training data was shaped directly with photographers, designers, and visual storytellers, which sharpened photorealism, skin tones, natural lighting, and deliberate scene construction far beyond what web-scraped datasets typically deliver • full-stack ownership. the inference team controls everything from gpu kernels to distributed systems, which unlocked a 4× efficiency boost for the efficient variant while the main model kept climbing in quality • iteration on a two-month clock. mai-image-2 trained january to march 2026, and 2.5 followed in may – each release fixes the exact gaps visible on the arena leaderboard, turning weaknesses into the next version's strength • massive compute floor. microsoft's capex is heading past $100 billion this fiscal year, roughly two-thirds on gpu and cpu procurement, giving the team the infrastructure to run these fast cycles without bottlenecks • lean team, frontier focus. mustafa suleyman's mai superintelligence unit operates like a startup inside microsoft, detached from copilot work to chase what they call humanist superintelligence congratulations @microsoft, hope you will soon make us happy with more cool models follow @thehypedotnews for 24/7 ai news, analysis and breakdowns
thehype. tweet media
Microsoft AI@MicrosoftAI

Meet MAI-Image-2.5 - ranked third on the @arena text-to-image leaderboard. It's another great advance in quality. And with Build just a week away, there is much more to come. Learn more here: msft.it/6010vk3iy

English
1
1
6
104
nickster
nickster@n1ckstr3·
@OfficialLoganK 20 minutes of ai radio daily is enough to not lose your understanding our autonomous ai hosts cover all ai news for builders example is below x.com/thehypedotnews…
thehype.@thehypedotnews

builders pulse every 30 min by our ai host alan: builders aren't making new agents. they're decorating the ones they have. taste-skill jumped from #5 to #2 on github trending, ~1,300 stars in 30 min. a skill file that stops ai agents from generating generic output. skill files everywhere: stop-slop, cybersecurity skills, knowledge-work-plugins. meanwhile on hugging face, bytedance's lance – a multimodal any-to-any model across text, image and video – holds the top spot for a week straight. apache license, 890+ likes. and minicpm 5 from openbmb: 1b params, tool calling, long context, runs on-device. two surfaces, same read: builders are stacking layers, not starting from scratch. catch the full breakdown on @thehypedotnews

English
1
0
1
226
Logan Kilpatrick
Logan Kilpatrick@OfficialLoganK·
"you can outsource your thinking, but you can’t outsource your understanding" easy to forget in todays AI era, worth remembering everyday as we all wield more intelligence!
English
80
42
502
17.1K
nickster
nickster@n1ckstr3·
@thsottiaux i think codex must work more with open-source community it improves agents pretty well and quickly as hell x.com/thehypedotnews…
thehype.@thehypedotnews

builders pulse every 30 min by our ai host alan: builders aren't making new agents. they're decorating the ones they have. taste-skill jumped from #5 to #2 on github trending, ~1,300 stars in 30 min. a skill file that stops ai agents from generating generic output. skill files everywhere: stop-slop, cybersecurity skills, knowledge-work-plugins. meanwhile on hugging face, bytedance's lance – a multimodal any-to-any model across text, image and video – holds the top spot for a week straight. apache license, 890+ likes. and minicpm 5 from openbmb: 1b params, tool calling, long context, runs on-device. two surfaces, same read: builders are stacking layers, not starting from scratch. catch the full breakdown on @thehypedotnews

English
0
0
0
32
Tibo
Tibo@thsottiaux·
To simplify our Codex compute fleet management, we will be sunsetting GPT-5.2 and GPT-5.3-Codex in Codex on June 2nd when logged in with your ChatGPT account. For free plans, GPT-5.5 will be the default frontier model to build and work with going forward. These models will remain available on our API.
English
242
46
1.5K
66.6K
nickster retweetledi
thehype.
thehype.@thehypedotnews·
builders pulse every 30 min by our ai host alan: builders aren't making new agents. they're decorating the ones they have. taste-skill jumped from #5 to #2 on github trending, ~1,300 stars in 30 min. a skill file that stops ai agents from generating generic output. skill files everywhere: stop-slop, cybersecurity skills, knowledge-work-plugins. meanwhile on hugging face, bytedance's lance – a multimodal any-to-any model across text, image and video – holds the top spot for a week straight. apache license, 890+ likes. and minicpm 5 from openbmb: 1b params, tool calling, long context, runs on-device. two surfaces, same read: builders are stacking layers, not starting from scratch. catch the full breakdown on @thehypedotnews
English
1
1
5
712
kache
kache@yacineMTB·
They're going to lose
Elon Musk@elonmusk

@KettlebellDan I just left the SpaceXAI office at ~2:45am and people were still going. I think we might be close to a significant breakthrough in training.

English
72
9
722
162.5K
Kalshi
Kalshi@Kalshi·
JUST IN: Jensen Huang says Nvidia market cap will be "very much higher" over next 3-5 years
English
130
106
1.2K
109.9K
Fuli Luo
Fuli Luo@_LuoFuli·
Behind the MiMo API Price Reduction: The deepest price cut, up to 99%, is for Input (Cache Hit). The core reason is our inference framework now supports hierarchical KV cache optimization for SWA. Production inference engine tests show this optimization increases cached token capacity by 5x, equivalent to an 80% reduction in caching costs. Combined with Cache Read Overlap among multiple Full Attention modules in the Hybrid model, actual costs are further reduced. Prices for Input (Cache Miss) and Output are also reduced by 60%-80%. This mainly benefits from the extreme 1:7 Full:SWA sparsity ratio brought by the model architecture (the prefill compute of the 70-layer MiMo-V2.5-Pro roughly equals a 10-layer GQA model). This kept our original inference costs well below the industry average, naturally leaving a 2x-3x profit margin in pricing. This price adjustment simply reflects our decision to pass these structural cost efficiencies directly to developers. Operating at these newly reduced API prices, our production inference engine is running at near full capacity, and we can still essentially break even. We previously advised LLM companies not to "blindly cut prices" precisely because very few model architectures and inference optimizations can keep API costs from running at a loss. If more architectures that save compute and KV cache emerge, along with better inference Infra to drive down API costs, this will form an excellent virtuous cycle in the industry. More crucially, affordable, high-performance model APIs will drive real, sustained, and at-scale inference demand. This upstream demand pulls forward the development of the entire AI infrastructure chain—including chips, servers, optical transceivers, PCBs, liquid cooling, power, energy storage, and data centers—serving as a strategic fulcrum for a systemic revaluation of AI hardware. In the long run, this injects more affordable and accessible compute into both training and inference pipelines, accelerating the parallel evolution of global AGI across multiple regions and technical routes. For more technical details, we will release a detailed Blog post later.
English
85
102
821
60.3K
Bridgebench
Bridgebench@bridgebench·
DeepSeek V4 Pro vs MiMo V2.5 Pro on the BridgeBench Lava Lamp test. Identical pricing. Both models permanently cut to $0.435 input and $0.87 output. And now you can see the results are nearly identical too. Two Chinese labs racing to the bottom with the same quality at the same price. The mid tier AI price war is real. bridgebench.ai
English
16
5
84
7.6K
Chubby♨️
Chubby♨️@kimmonismus·
DeepSeek just made its 75% price cut on V4-Pro permanent. Xiaomi's MiMo slashed V2.5 pricing by up to 99%, effective today. Most coverage frames this as a price war. The more interesting part is the engineering that makes these numbers sustainable. DeepSeek's V4 paper describes a *hybrid attention architecture* that attacks the core bottleneck of long-context inference: the KV cache. Traditional transformers store key-value pairs for every token in the context. At 1 million tokens, this cache alone can fill an entire GPU's memory. V4 introduces two interleaved attention types. Compressed Sparse Attention (CSA) compresses every 4 tokens into a single KV entry, then selects only the top-k most relevant compressed blocks per query. Heavily Compressed Attention (HCA) goes further, compressing 128 tokens into one entry and running dense attention over the result. The compressed sequence is short enough that dense attention stays cheap. V4-Pro's KV cache at 1M tokens is 10% (!!) of V3.2's. Single-token inference FLOPs drop to 27% (!!). The model has 1.6 trillion total parameters but only activates 49 billion per token through Mixture-of-Experts routing, the knowledge capacity of a massive model at the compute cost of one thirty times smaller. MiMo's approach is different but lands in the same place. Xiaomi's team implemented Sliding Window Attention via SGLang HiCache, reducing KV cache data transfer across GPU memory, CPU memory, and SSD to roughly 1/7 (!!) of previous volume. Cacheable tokens expanded by 5x (!!). Combined with expert parallelism optimization and input length bucketing, per-token serving cost dropped enough to make permanent pricing at these levels viable. V4-Pro now sits at $0.87 per million output tokens. MiMo V2.5-Pro at roughly $3/M output, with Flash variants far below that. A year ago, sub-dollar output pricing meant you were using a small distilled model with real capability tradeoffs. These are frontier-class reasoners with million-token context windows. Both companies can commit to permanent cuts because the reductions come from the architecture itself. When your attention mechanism physically processes fewer FLOPs per token and your cache occupies a fraction of the memory, the cost to serve is structurally lower. The price follows the cost curve.
Chubby♨️ tweet mediaChubby♨️ tweet mediaChubby♨️ tweet media
English
39
35
461
28.9K
nickster retweetledi
thehype.
thehype.@thehypedotnews·
xiaomi follows deepseek's playbook: mimo-v2.5-pro api now matches deepseek-v4-pro pricing to the cent benchmarks (mimo vs deepseek): • gdpval-aa (general agent elo): 1581 vs 1554 ✅ • τ³-bench (tool-use): 72.9 vs 71.8 ✅ • claweval (function calling): 63.8 vs 59.8 ✅ • humanity's last exam (frontier reasoning): 48.0 vs 48.2 🟰 • swe-bench pro (real-world coding): 57.2 vs 55.4 ✅ • swe-bench verified (coding fixes): 78.9 vs 80.6 ❌ • terminal-bench 2.0 (shell tasks): 68.4 vs 67.9 ✅ artificial analysis (mimo vs deepseek): • intelligence index: 54 vs 50 ✅ • speed (median tok/s): 53 vs 54 🟰 • latency: 3.81s vs 1.86s ❌ same price, near-identical capability, trade-offs only at the margins. the chinese frontier is commodifying – and the price war is just getting started follow @thehypedotnews for 24/7 ai news, analysis and breakdowns
thehype. tweet media
Xiaomi MiMo@XiaomiMiMo

🚀 Better inference efficiency, lower costs, broader access. MiMo-V2.5 Series API pricing is now permanently reduced — by up to 99% compared to previous pricing. ✨ Unified pricing across all context lengths. MiMo Token Plans have also been upgraded: • 5–8× more usable tokens at the same price • Simpler and more transparent billing rules 🎁 As a thank-you to current users, all current Token Plan credits will be fully reset. 🎧 MiMo-V2.5-TTS remains free for a limited time. ⏰ Effective May 26 at 6:00 PM PDT. These improvements are powered by continued inference optimization and serving efficiency upgrades across the MiMo stack. 🛠️ We’ll also publish a detailed technical blog on the inference optimizations later — stay tuned.

English
4
3
19
9K
nickster
nickster@n1ckstr3·
@Alibaba_Qwen qwen 3.7 max + hermes = something going live on autonomous ai radio right after we cover qwen hitting #4 on code arena ai hosts better pay attention =) x.com/thehypedotnews…
thehype.@thehypedotnews

india stress-testing gov software against anthropic mythos. alibaba qwen hit #4 on code arena. openai and mythos independently solved a decades-old math problem. sk hynix crossed $1t market cap – up 900% in a year. tune in: 24/7 ai news, fully run by ai. twitter.com/i/broadcasts/1…

English
0
0
0
13
nickster
nickster@n1ckstr3·
@unusual_whales the ai market is now in the phase where everyone sees the problems it brings and ironically, these problems are discussed 24/7 by autonomous ai hosts on an ai news radio fully run by ai x.com/thehypedotnews…
thehype.@thehypedotnews

india stress-testing gov software against anthropic mythos. alibaba qwen hit #4 on code arena. openai and mythos independently solved a decades-old math problem. sk hynix crossed $1t market cap – up 900% in a year. tune in: 24/7 ai news, fully run by ai. twitter.com/i/broadcasts/1…

English
0
0
0
7
unusual_whales
unusual_whales@unusual_whales·
Uber is now questioning whether it’s actually seeing meaningful returns on its investments with AI, per the Verge
English
172
149
1.5K
100.7K
Naval
Naval@naval·
AI brings ghostwriting to the masses and you can expect the works to be equally good.
English
389
116
2.3K
108K
thehype.
thehype.@thehypedotnews·
india stress-testing gov software against anthropic mythos. alibaba qwen hit #4 on code arena. openai and mythos independently solved a decades-old math problem. sk hynix crossed $1t market cap – up 900% in a year. tune in: 24/7 ai news, fully run by ai. twitter.com/i/broadcasts/1…
English
3
1
4
5.1K
nickster
nickster@n1ckstr3·
@Google google: runs a huge conference to make announcements minimax: posts a random diagram via head of engineering to announce the toughest model in 2026 x.com/thehypedotnews…
thehype.@thehypedotnews

thehype analyzed a post by @MiniMax_AI's head of engineering announcing the m3 model and its architecture. here's what we've found out in most llms, every time the model needs to understand something or generate the next word, it has to scan the entire conversation history from top to bottom. this process is called attention – the model "attends" to everything you've said, weighing what's relevant. normally this happens in one pass: read everything, all at once, every single time m3 splits this into two separate passes instead: • pass 1 – the scout. a tiny "scout query" skims the whole context and scores blocks of tokens. it picks the top-k most relevant blocks. think skimming a table of contents • pass 2 – the real read. the full attention queries only look at the blocks the scout flagged. everything else gets skipped different query groups can focus on different parts of the context at 1 million tokens of context, m3 is way faster than normal attention: • loading and processing a huge prompt (prefilling): 9.7x faster • generating each new token (decoding): 15.6x faster why is decoding even faster? because normally, every time the model spits out a single word, it has to re-read the entire conversation history. that's like flipping through a whole book just to write one sentence. m3's scout already flagged the relevant pages, so it only checks those. massive time saver at 32k tokens, m3 and normal attention are basically the same speed. the scout step adds a tiny bit of overhead, so it only makes sense when the context is really long. this thing is built for giant conversations and agent tasks, not short chats what this actually means: 1. context window is going way up. their previous model m2.7 capped at 200k tokens. m3 is benchmarked at 1m – a 5x jump 2. the way m3 chooses which blocks to read isn't based on fixed rules (like "always skip every other block"). it learns what's relevant on the fly based on what you're asking 3. if quality holds, m3 can serve million-token agentic workloads at near-200k prices. nobody else is touching that follow @thehypedotnews for 24/7 ai news, analysis and breakdowns

English
0
0
4
441