Calathea

37 posts

@CalatheaAI

Curating all things AI.

Joined March 2026
31 Following · 52 Followers
Calathea@CalatheaAI·
Want this every day? Follow Calathea on Telegram for your daily fix on all things AI. We promise curated headlines, quick context, and only the links that matter. Join here: t.me/calatheaai
Calathea@CalatheaAI·
Daily AI Roundups | 08 Jun 2026
10 AI stories worth catching up on today, covering agent security, autonomous coding loops, AI Gateway infra, privacy leakage, enterprise Copilot, and agent sabotage detection.
Calathea@CalatheaAI·
/loop used to mean Ralph-style retry loops. Now it’s starting to mean continuous orchestration via cron jobs: agents supervising agents, spawning threads, checking work, recovering state, and looping until verified. Basically, AI systems building themselves. Great read.
Matt Van Horn@mvanhorn

x.com/i/article/2063…

Calathea@CalatheaAI·
@mvanhorn Great read! I was wondering what /loop meant on my timeline too.
Calathea@CalatheaAI·
Giving massive Game Master vibes, but honestly? We are totally here for it 😉 We'd be interested to see what kind of work gets classified as “impressive” or “incredibly useful”.
Tibo@thsottiaux

I have a new kind of big button that I can press for Codex. Over the next 100 days, we will select one person per day who does impressive or incredibly useful work with Codex and give them 10X usage limits for a month to see what they can do with it. First one tomorrow.

Calathea@CalatheaAI·
@GavinSBaker Exactly! Huge week for open AI. The named releases here span 16 drops across 12 companies: 7 US-origin drops, 7 China-origin drops, 1 from Canada, and 1 from Czechia. x.com/CalatheaAI/sta…
Calathea@CalatheaAI

Agree, this was a huge week for open AI. The named releases here span 16 drops across 12 companies: 7 US-origin drops, 7 China-origin drops, 1 from Canada, and 1 from Czechia.

US (7):
NVIDIA = 3: Nemotron 3, NeMo ASR, Cosmos
Google = 2: Gemma 3n, Magenta RealTime
LiquidAI = 1: LFM2
Boson = 1: Higgs Audio v2

China (7):
StepFun = 1: Step-3.1
RedNote = 1: dots.tts
Baidu PaddlePaddle = 2: PaddleOCR-VL-1.6, NAVA
JD.com = 1: JoyAI-Echo
ByteDance = 1: Bernini-R
VAST = 1: TripoSplat

Canada (1):
Ideogram = 1: Ideogram 4

Czechia (1):
JetBrains = 1: Mellum 12B

Not just language models. Image, audio, speech, docs, video, 3D, and world models all had credible open releases in the same week.

Gavin Baker@GavinSBaker·
Quite a week for open-source AI. Especially American open-source. Nemotron 3 Ultra is the most important release in quite some time. And some really cool RL and fine-tuning work from Harvey.
Victor M@victormustar

Before the week ends, let's acknowledge one of the most INSANE weeks ever for open AI, with 25+ notable open-weight drops across every modality:

🧠 LLMs
→ NVIDIA Nemotron 3 Ultra: 550B hybrid Mamba-MoE, only 55B active, 1M context, MMLU 89.1. NVFP4 variant claims ~5x throughput on Blackwell. First open-weight 550B hybrid Mamba-Transformer, closing the gap with frontier closed models.
→ Google Gemma 4 12B: fully open dense any-to-any (text/image/audio/video), 256k context, encoder-free, 140+ languages, AIME 2026 at 77.5. Shipped with a 23-checkpoint QAT wave (mobile ONNX + MLX). Most deployable model of the week.
→ StepFun Step-3.7-Flash: 198B sparse MoE VLM, ~11B active, SWE-Bench PRO 56.3. Apache 2.0.
→ Liquid AI LFM2.5-8B-A1B: edge MoE, just 1.5B active, 128k ctx, MATH500 88.8, MLX-ready. Best on-device option this week.
→ JetBrains Mellum2-12B-A2.5B-Thinking: their first open MoE, near-Qwen3-14B coding at 2.5B active. Apache 2.0.

🎨 Image gen (the surprise of the week)
→ Ideogram 4: their FIRST-EVER open weights. 9.3B flow-matching DiT trained from scratch. #2 overall behind GPT Image 2, top open-weight model on Design Arena + LMArena. Strongest open checkpoint for text-rich images, full stop. It has taste. Still can't believe this is open weights.

🔊 Audio & Speech (a breakout week for open TTS, 4 labs shipped)
→ Boson Higgs Audio v3 4B: 102 languages, 21 emotions, singing/whispering/shouting, sub-second TTFA.
→ RedNote dots.tts: the only fully continuous (no codec) open TTS pipeline, Apache 2.0.
→ Google Magenta RealTime 2: real-time music gen, <200ms latency, text+audio+MIDI. multimodalart ported it to PyTorch within hours with live ZeroGPU demos.
→ NVIDIA Nemotron-3.5 ASR: 600M streaming, 17x more concurrent streams vs Parakeet RNNT 1.1B.

👁️ Vision & VLMs
→ PaddleOCR-VL-1.6: SOTA document parsing at 1B params, Apache 2.0.
→ Baidu NAVA: 6.3B joint audio-video gen, best-in-class A/V sync, Apache 2.0.

🎬 Video, 3D & World Models
→ NVIDIA Cosmos3-Super: 64B omnimodal world model coupling action trajectories with video+audio gen, for Physical AI.
→ JD JoyAI-Echo: up to 5-min multi-shot text-to-video on LTX-2.3.
→ ByteDance Bernini-R + VAST TripoSplat (single-image-to-3D Gaussian splats, MIT).

Calathea@CalatheaAI·
Agree, this was a huge week for open AI. The named releases here span 16 drops across 12 companies: 7 US-origin drops, 7 China-origin drops, 1 from Canada, and 1 from Czechia.

US (7):
NVIDIA = 3: Nemotron 3, NeMo ASR, Cosmos
Google = 2: Gemma 3n, Magenta RealTime
LiquidAI = 1: LFM2
Boson = 1: Higgs Audio v2

China (7):
StepFun = 1: Step-3.1
RedNote = 1: dots.tts
Baidu PaddlePaddle = 2: PaddleOCR-VL-1.6, NAVA
JD.com = 1: JoyAI-Echo
ByteDance = 1: Bernini-R
VAST = 1: TripoSplat

Canada (1):
Ideogram = 1: Ideogram 4

Czechia (1):
JetBrains = 1: Mellum 12B

Not just language models. Image, audio, speech, docs, video, 3D, and world models all had credible open releases in the same week.
Victor M@victormustar

Sakana AI@SakanaAILabs·
Building AI that Builds AI: Introducing the Sakana AI RSI Lab 🚀 sakana.ai/rsi-lab

Today, we are announcing the Sakana AI Recursive Self-Improvement (RSI) Lab: a dedicated research group in Tokyo tasked with redesigning the AI development process itself using AI.

While the industry increasingly speculates about the theoretical potential of self-improving AI, we’ve spent the last two years actively laying the foundations to make it a reality:
▪ LLM²: AI models automating research to invent better preference optimization algorithms.
▪ Darwin Gödel Machine: Agents autonomously rewriting their own codebase to double software-engineering performance.
▪ ShinkaEvolve: Hyper-sample-efficient program evolution that builds novel loss functions for MoE models.
▪ ALE-Agent: Reinforcement agents outperforming hundreds of human experts via self-learning.
▪ Digital Red Queen: Open-ended adversarial coevolution laying the groundwork for RSI in cybersecurity.
▪ The AI Scientist: Towards end-to-end automation of AI research, recently published in Nature.

Now, we are unifying these breakthroughs. The Sakana AI RSI Lab is officially tasked with building open-ended, adaptive architectures that collectively self-improve.

Human intelligence did not emerge from limitless resources; it was forged through the open-ended, compounding process of evolution operating under strict constraints. We are applying this exact principle to AI. We believe recursive self-improvement is achievable on modest, sample-efficient compute. It shouldn’t be a winner-take-all asset locked inside hyperscale clusters, but a democratized public good.

We’re scaling our team to execute this mission. We are looking for frontier scientists and engineers who are entirely unsatisfied with the brute-force status quo. If you are ready to break away from standard benchmarking and build the self-improving future in Japan, come build with us.
Resyst Labs@ResystLabs·
Resyst Arena is our new tactical LLM benchmark: models play a turn-based strategy duel, not just answer prompts. First replay: DeepSeek V4 Flash beats Step 3.7 Flash by core destruction after 63 turns. Both via @OpenRouter. Full match replay below. @deepseek_ai @StepFun_ai
Calathea@CalatheaAI·
Want this every day? Follow Calathea on Telegram for your daily fix on all things AI. We promise curated headlines, quick context, and only the links that matter. Join here: t.me/calatheaai
Calathea@CalatheaAI·
Daily AI Roundups | 06 Jun 2026
6 AI stories worth catching up on today, covering AI policy, enterprise retrieval, agentic apps, creator contests, small-model economies, and AI-adjacent funding.
Free Styler | DeFi 🌹@0xfreestyler·
Early Alpha Compilation pt.28
Here are 25 projects I'm talking about: 👇
• @DARCStandard - Tools / Infra
• @PharosWatch - DeFi / Infra
• @SuperEarnX - DeFi / Stable
• @instinct_xyz - DeFi
• @TradeButterPro - Super Early
• @tryKanarie - Super Early
• @collectablefun - Super Early
• @OfflineApp_org - Super Early
• @packs_supply - NFT / DeFi
• @CalatheaAI - AI Agents
• @OverseePay - Super Early
• @Agisa_io - Super Early
• @atlasmotion - Super Early
• @chronollm - Super Early
• @BallistaApp - Super Early
• @ParagonOTC - DeFi / DEX
• @boonishnft - NFT
• @basepegofficial - DeFi
• @QMSNetwork - L1 / Privacy
• @OrbscanHQ - Pm / Tool
• @herdrdev - Coding Agent
• @0x1token - DeFi / DEX
• @automatahaus - AI Agents
• @gnanasonape - NFT
• @monxofficialx - NFT
NFA / DYOR! 🌹
OpenAI@OpenAI·
An issue caused some user accounts to be incorrectly suspended. We’re restoring access and working through related subscription and credit issues. status.openai.com/incidents/ejj4…
Calathea@CalatheaAI·
Want this every day? Follow Calathea on Telegram for your daily fix on all things AI. We promise curated headlines, quick context, and only the links that matter. Join here: t.me/calatheaai
Calathea@CalatheaAI·
swyx@swyx

Finally! the first eval ship from cog!!!!!!!!!! 👼🏼

To contextualize: @METR_Evals cap out at ~16 hours. Cog has private enterprise evals up to 100hrs, and is confident enough to put a financial guarantee on it 🤯

METR dataset: ML eng, GPU kernels, cybersecurity
> "METR (2026) used a combination of GPT-4o and GPT-5 to estimate the human-equivalent times from compressed Claude Code transcripts. These transcripts were collected from 7 METR technical staff on 34 sessions labeled on human ground truth".
r_log of 0.83

Cog dataset: real life java/typescript/python/c# feature dev, bugfixes, migrations
> "We collected a ground-truth dataset by asking Devin users to review recent representative sessions, and estimate how long each completed session would have taken without Devin. Our dataset consists of 258 sessions from 126 users across a diverse set of enterprise customers."
r_log of 0.74 on held out set

this is pioneering real world evals work and part 1 of a broader frontier code evals drop that I'm really looking forward to writing up. huge kudos to @annarmitchell and @ryanbai1412 for leading the unglamorous last mile data collection!!

Calathea@CalatheaAI·
Daily AI Roundups | 05 Jun 2026
11 AI stories worth catching up on today, covering open image models, Brian Chesky starting a new AI startup, enterprise data agents, frontier coding, AI security, evals, funding, and agent safety.