Calathea (@CalatheaAI) - Twitter Profile

Calathea@CalatheaAI·7h

Want this every day? Follow Calathea on Telegram for your daily fix on all things AI. We promise curated headlines, quick context, and only the links that matter. Join here: t.me/calatheaai


Calathea@CalatheaAI·7h

Links continued: 8. arxiv.org/abs/2606.07054 9. x.com/huggingface/st… 10. arxiv.org/abs/2511.02748

Omar Sanseviero@osanseviero

Gemma 4 MTP just got officially merged into llama.cpp. This means you can use Gemma 4 QAT + MTP for a lightweight + super fast setup. Excited to see what the community builds with it: github.com/ggml-org/llama…


Calathea@CalatheaAI·7h

Daily AI Roundups | 08 Jun 2026
10 AI stories worth catching up on today, covering agent security, autonomous coding loops, AI Gateway infra, privacy leakage, enterprise Copilot, and agent sabotage detection.


Calathea@CalatheaAI·9h

/loop used to mean Ralph-style retry loops. Now it’s starting to mean continuous orchestration via cron jobs: agents supervising agents, spawning threads, checking work, recovering state, and looping until verified. Basically, AI systems building themselves. Great read.

Matt Van Horn@mvanhorn

x.com/i/article/2063…


Calathea@CalatheaAI·11h

@mvanhorn great read! was wondering what /loop meant on my timeline too


Matt Van Horn@mvanhorn·13h

x.com/i/article/2063…


Calathea@CalatheaAI·12h

Giving massive Game Master vibes, but honestly? We are totally here for it 😉 Would be interested to see what kind of work would be classified as “impressive” or “incredibly useful”.

Tibo@thsottiaux

I have a new kind of big button that I can press for Codex. Over the next 100 days, we will select one person per day who does impressive or incredibly useful work with Codex and give them 10X usage limits for a month to see what they can do with it. First one tomorrow.


Calathea@CalatheaAI·2d

@GavinSBaker exactly! Huge week for open AI. The named releases here span 16 drops across 12 companies: 7 US-origin drops, 7 China-origin drops, 1 from Canada, and 1 from Czechia. x.com/CalatheaAI/sta…

Calathea@CalatheaAI

Agree, this was a huge week for open AI. The named releases here span 16 drops across 12 companies: 7 US-origin drops, 7 China-origin drops, 1 from Canada, and 1 from Czechia.
US (7): NVIDIA = 3: Nemotron 3, NeMo ASR, Cosmos; Google = 2: Gemma 3n, Magenta RealTime; LiquidAI = 1: LFM2; Boson = 1: Higgs Audio v2
China (7): StepFun = 1: Step-3.1; RedNote = 1: dots.tts; Baidu PaddlePaddle = 2: PaddleOCR-VL-1.6, NAVA; JD.com = 1: JoyAI-Echo; ByteDance = 1: Bernini-R; VAST = 1: TripoS
Canada (1): Ideogram = 1: Ideogram 4
Czechia (1): JetBrains = 1: Mellum 12B
Not just language models. Image, audio, speech, docs, video, 3D, and world models all had credible open releases in the same week.


Gavin Baker@GavinSBaker·2d

Quite a week for open-source AI. Especially American open-source. Nemotron 3 Ultra is the most important release in quite some time. And some really cool RL and fine-tuning work from Harvey.

Victor M@victormustar

Before the week ends, let's acknowledge one of the most INSANE week ever for open AI, with 25+ notable open-weight drops across every modality:
🧠 LLMs
→ NVIDIA Nemotron 3 Ultra: 550B hybrid Mamba-MoE, only 55B active, 1M context, MMLU 89.1. NVFP4 variant claims ~5x throughput on Blackwell. First openly-weighted 550B hybrid Mamba-Transformer, closing the gap with frontier closed models.
→ Google Gemma 4 12B: fully open dense any-to-any (text/image/audio/video), 256k context, encoder-free, 140+ languages, AIME 2026 at 77.5. Shipped with a 23-checkpoint QAT wave (mobile ONNX + MLX). Most deployable model of the week.
→ StepFun Step-3.7-Flash: 198B sparse MoE VLM, ~11B active, SWE-Bench PRO 56.3. Apache 2.0.
→ Liquid AI LFM2.5-8B-A1B: edge MoE, just 1.5B active, 128k ctx, MATH500 88.8, MLX-ready. Best on-device option this week.
→ JetBrains Mellum2-12B-A2.5B-Thinking: their first open MoE, near-Qwen3-14B coding at 2.5B active. Apache 2.0.
🎨 Image gen (the surprise of the week)
→ Ideogram 4: their FIRST-EVER open weights. 9.3B flow-matching DiT trained from scratch. #2 overall behind GPT Image 2, top open-weight model on Design Arena + LMArena. Strongest open checkpoint for text-rich images, full stop. It has taste. Still can't believe this is open weights.
🔊 Audio & Speech (a breakout week for open TTS, 4 labs shipped)
→ Boson Higgs Audio v3 4B: 102 languages, 21 emotions, singing/whispering/shouting, sub-second TTFA.
→ RedNote dots.tts: the only fully continuous (no codec) open TTS pipeline, Apache 2.0.
→ Google Magenta RealTime 2: real-time music gen, <200ms latency, text+audio+MIDI. multimodalart ported it to PyTorch within hours with live ZeroGPU demos.
→ NVIDIA Nemotron-3.5 ASR: 600M streaming, 17x more concurrent streams vs Parakeet RNNT 1.1B.
👁️ Vision & VLMs
→ PaddleOCR-VL-1.6: SOTA document parsing at 1B params, Apache 2.0.
→ Baidu NAVA: 6.3B joint audio-video gen, best-in-class A/V sync, Apache 2.0.
🎬 Video, 3D & World Models
→ NVIDIA Cosmos3-Super: 64B omnimodal world model coupling action trajectories with video+audio gen, for Physical AI.
→ JD JoyAI-Echo: up to 5-min multi-shot text-to-video on LTX-2.3.
→ ByteDance Bernini-R + VAST TripoSplat (single-image-to-3D Gaussian splats, MIT).



Calathea@CalatheaAI·2d

@SakanaAILabs Congrats!


Sakana AI@SakanaAILabs·3d

Building AI that Builds AI: Introducing the Sakana AI RSI Lab 🚀 sakana.ai/rsi-lab
Today, we are announcing the Sakana AI Recursive Self-Improvement (RSI) Lab: a dedicated research group in Tokyo tasked with redesigning the AI development process itself using AI.
While the industry increasingly speculates about the theoretical potential of self-improving AI, we’ve spent the last two years actively laying the foundations to make it a reality:
▪ LLM²: AI models automating research to invent better preference optimization algorithms.
▪ Darwin Gödel Machine: Agents autonomously rewriting their own codebase to double software-engineering performance.
▪ ShinkaEvolve: Hyper-sample-efficient program evolution that builds novel loss functions for MoE models.
▪ ALE-Agent: Reinforcement agents outperforming hundreds of human experts via self-learning.
▪ Digital Red Queen: Open-ended adversarial coevolution laying the groundwork for RSI in cybersecurity.
▪ The AI Scientist: Towards end-to-end automation of AI research, recently published in Nature.
Now, we are unifying these breakthroughs. The Sakana AI RSI Lab is officially tasked with building open-ended, adaptive architectures that collectively self-improve.
Human intelligence did not emerge from limitless resources; it was forged through the open-ended, compounding process of evolution operating under strict constraints. We are applying this exact principle to AI. We believe recursive self-improvement is achievable on modest, sample-efficient compute. It shouldn’t be a winner-take-all asset locked inside hyperscale clusters, but a democratized public good.
We’re scaling our team to execute this mission. We are looking for frontier scientists and engineers who are entirely unsatisfied with the brute-force status quo. If you are ready to break away from standard benchmarking and build the self-improving future in Japan, come build with us.


Calathea@CalatheaAI·2d

@ResystLabs @OpenRouter @deepseek_ai @StepFun_ai Interesting but what’s the premise of the duel?


Resyst Labs@ResystLabs·5d

Resyst Arena is our new tactical LLM benchmark: models play a turn-based strategy duel, not just answer prompts. First replay: DeepSeek V4 Flash beats Step 3.7 Flash by core destruction after 63 turns. Both via @OpenRouter. Full match replay below. @deepseek_ai @StepFun_ai



Calathea@CalatheaAI·2d

Links continued: 4. x.com/OpenAIDevs/sta… 5. qbitai.com/2026/06/431287… 6. huggingface.co/blog/build-sma…

Francis Davidson@FDavidsonT

Thrilled to have worked closely with OpenAI to build @OdessiaTravel. In 5 months, we built a travel agent that can plan and book entire trips: flights, hotels, experiences, places to see – all personalized to you. Options come back in a few seconds with rich visuals. Grateful for @OpenAIDevs for the partnership.


Calathea@CalatheaAI·2d

Daily AI Roundups | 06 Jun 2026
6 AI stories worth catching up on today, covering AI policy, enterprise retrieval, agentic apps, creator contests, small-model economies, and AI-adjacent funding.


Calathea@CalatheaAI·2d

@0xfreestyler @DARCStandard @PharosWatch @SuperEarnX @instinct_xyz @TradeButterPro @tryKanarie @collectablefun Hi, we are not a crypto project. Please remove the association. Thank you.


Free Styler | DeFi 🌹@0xfreestyler·2d

Early Alpha Compilation pt.28
Here's 25 projects i'm talking about: 👇
• @DARCStandard - Tools / Infra
• @PharosWatch - DeFi / Infra
• @SuperEarnX - DeFi / Stable
• @instinct_xyz - DeFi
• @TradeButterPro - Super Early
• @tryKanarie - Super Early
• @collectablefun - Super Early
• @OfflineApp_org - Super Early
• @packs_supply - NFT / DeFi
• @CalatheaAI - AI Agents
• @OverseePay - Super Early
• @Agisa_io - Super Early
• @atlasmotion - Super Early
• @chronollm - Super Early
• @BallistaApp - Super Early
• @ParagonOTC - DeFi / DEX
• @boonishnft - NFT
• @basepegofficial - DeFi
• @QMSNetwork - L1 / Privacy
• @OrbscanHQ - Pm / Tool
• @herdrdev - Coding Agent
• @0x1token - DeFi / DEX
• @automatahaus - AI Agents
• @gnanasonape - NFT
• @monxofficialx - NFT
NFA / DYOR! 🌹


Calathea@CalatheaAI·3d

@OpenAI limit reset? @thsottiaux


OpenAI@OpenAI·3d

An issue caused some user accounts to be incorrectly suspended. We’re restoring access and working through related subscription and credit issues. status.openai.com/incidents/ejj4…



Calathea@CalatheaAI·3d

Links continued: 7. x.com/swyx/status/20… 8. news.crunchbase.com/venture/scotch… 9. arxiv.org/abs/2602.05056 10. arxiv.org/abs/2606.05743 11. infoq.cn/article/VrBbVK…

swyx@swyx

Finally! the first eval ship from cog!!!!!!!!!! 👼🏼
To contextualize: @METR_Evals cap out at ~16 hours. Cog has private enterprise evals up to 100hrs, and is confident enough to put a financial guarantee on it 🤯
METR dataset: ML eng, GPU kernels, cybersecurity
> "METR (2026) used a combination of GPT-4o and GPT-5 to estimate the human-equivalent times from compressed Claude Code transcripts. These transcripts were collected from 7 METR technical staff on 34 sessions labeled on human ground truth".
rlog of 0.83
Cog dataset: real life java/typescript/python/c# feature dev, bugfixes, migrations
> "We collected a ground-truth dataset by asking Devin users to review recent representative sessions, and estimate how long each completed session would have taken without Devin. Our dataset consists of 258 sessions from 126 users across a diverse set of enterprise customers."
rlog of 0.74 on held out set
this is pioneering real world evals work and part 1 of a broader frontier code evals drop that I'm really looking forward to writing up. huge kudos to @annarmitchell and @ryanbai1412 for leading the unglamorous last mile data collection!!


Calathea@CalatheaAI·3d

Daily AI Roundups | 05 Jun 2026
11 AI stories worth catching up on today, covering open image models, Brian Chesky starting a new AI startup, enterprise data agents, frontier coding, AI security, evals, funding, and agent safety.
