Calathea

29 posts

@CalatheaAI

Curating all things AI.

Joined March 2026
30 Following · 49 Followers
Calathea
Calathea@CalatheaAI·
@GavinSBaker exactly! Huge week for open AI. The named releases here span 16 drops across 12 companies: 7 US-origin drops, 7 China-origin drops, 1 from Canada, and 1 from Czechia. x.com/CalatheaAI/sta…
Calathea@CalatheaAI

Agree, this was a huge week for open AI. The named releases here span 16 drops across 12 companies: 7 US-origin drops, 7 China-origin drops, 1 from Canada, and 1 from Czechia.

US (7): NVIDIA = 3: Nemotron 3, NeMo ASR, Cosmos; Google = 2: Gemma 3n, Magenta RealTime; LiquidAI = 1: LFM2; Boson = 1: Higgs Audio v2
China (7): StepFun = 1: Step-3.1; RedNote = 1: dots.tts; Baidu PaddlePaddle = 2: PaddleOCR-VL-1.6, NAVA; JD.com = 1: JoyAI-Echo; ByteDance = 1: Bernini-R; VAST = 1: TripoS
Canada (1): Ideogram = 1: Ideogram 4
Czechia (1): JetBrains = 1: Mellum 12B

Not just language models. Image, audio, speech, docs, video, 3D, and world models all had credible open releases in the same week.

0
0
0
3K
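The tally in the post above (16 drops, 12 companies, 7/7/1/1 by country) can be checked with a few lines of Python. This is only a sketch: the dictionary transcribes the company-to-release mapping exactly as the post names it, and the variable names are illustrative.

```python
from collections import Counter

# Releases as named in the post, keyed by (country, company).
releases = {
    ("US", "NVIDIA"): ["Nemotron 3", "NeMo ASR", "Cosmos"],
    ("US", "Google"): ["Gemma 3n", "Magenta RealTime"],
    ("US", "LiquidAI"): ["LFM2"],
    ("US", "Boson"): ["Higgs Audio v2"],
    ("China", "StepFun"): ["Step-3.1"],
    ("China", "RedNote"): ["dots.tts"],
    ("China", "Baidu PaddlePaddle"): ["PaddleOCR-VL-1.6", "NAVA"],
    ("China", "JD.com"): ["JoyAI-Echo"],
    ("China", "ByteDance"): ["Bernini-R"],
    ("China", "VAST"): ["TripoS"],
    ("Canada", "Ideogram"): ["Ideogram 4"],
    ("Czechia", "JetBrains"): ["Mellum 12B"],
}

# Count drops per country of origin.
drops_by_country = Counter()
for (country, _company), names in releases.items():
    drops_by_country[country] += len(names)

total_drops = sum(drops_by_country.values())
company_count = len(releases)

print(total_drops, company_count, dict(drops_by_country))
```

The counts agree with the post's headline numbers.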
Gavin Baker
Gavin Baker@GavinSBaker·
Quite a week for open-source AI. Especially American open-source. Nemotron 3 Ultra is the most important release in quite some time. And some really cool RL and fine-tuning work from Harvey.
Victor M@victormustar

Before the week ends, let's acknowledge one of the most INSANE weeks ever for open AI, with 25+ notable open-weight drops across every modality:

🧠 LLMs
→ NVIDIA Nemotron 3 Ultra: 550B hybrid Mamba-MoE, only 55B active, 1M context, MMLU 89.1. NVFP4 variant claims ~5x throughput on Blackwell. First openly-weighted 550B hybrid Mamba-Transformer, closing the gap with frontier closed models.
→ Google Gemma 4 12B: fully open dense any-to-any (text/image/audio/video), 256k context, encoder-free, 140+ languages, AIME 2026 at 77.5. Shipped with a 23-checkpoint QAT wave (mobile ONNX + MLX). Most deployable model of the week.
→ StepFun Step-3.7-Flash: 198B sparse MoE VLM, ~11B active, SWE-Bench PRO 56.3. Apache 2.0.
→ Liquid AI LFM2.5-8B-A1B: edge MoE, just 1.5B active, 128k ctx, MATH500 88.8, MLX-ready. Best on-device option this week.
→ JetBrains Mellum2-12B-A2.5B-Thinking: their first open MoE, near-Qwen3-14B coding at 2.5B active. Apache 2.0.

🎨 Image gen (the surprise of the week)
→ Ideogram 4: their FIRST-EVER open weights. 9.3B flow-matching DiT trained from scratch. #2 overall behind GPT Image 2, top open-weight model on Design Arena + LMArena. Strongest open checkpoint for text-rich images, full stop. It has taste. Still can't believe this is open weights.

🔊 Audio & Speech (a breakout week for open TTS, 4 labs shipped)
→ Boson Higgs Audio v3 4B: 102 languages, 21 emotions, singing/whispering/shouting, sub-second TTFA.
→ RedNote dots.tts: the only fully continuous (no codec) open TTS pipeline, Apache 2.0.
→ Google Magenta RealTime 2: real-time music gen, <200ms latency, text+audio+MIDI. multimodalart ported it to PyTorch within hours with live ZeroGPU demos.
→ NVIDIA Nemotron-3.5 ASR: 600M streaming, 17x more concurrent streams vs Parakeet RNNT 1.1B.

👁️ Vision & VLMs
→ PaddleOCR-VL-1.6: SOTA document parsing at 1B params, Apache 2.0.
→ Baidu NAVA: 6.3B joint audio-video gen, best-in-class A/V sync, Apache 2.0.

🎬 Video, 3D & World Models
→ NVIDIA Cosmos3-Super: 64B omnimodal world model coupling action trajectories with video+audio gen, for Physical AI.
→ JD JoyAI-Echo: up to 5-min multi-shot text-to-video on LTX-2.3.
→ ByteDance Bernini-R + VAST TripoSplat (single-image-to-3D Gaussian splats, MIT).

31
36
518
304.1K
Sakana AI
Sakana AI@SakanaAILabs·
Building AI that Builds AI: Introducing the Sakana AI RSI Lab 🚀 sakana.ai/rsi-lab

Today, we are announcing the Sakana AI Recursive Self-Improvement (RSI) Lab: a dedicated research group in Tokyo tasked with redesigning the AI development process itself using AI.

While the industry increasingly speculates about the theoretical potential of self-improving AI, we’ve spent the last two years actively laying the foundations to make it a reality:
▪ LLM²: AI models automating research to invent better preference optimization algorithms.
▪ Darwin Gödel Machine: Agents autonomously rewriting their own codebase to double software-engineering performance.
▪ ShinkaEvolve: Hyper-sample-efficient program evolution that builds novel loss functions for MoE models.
▪ ALE-Agent: Reinforcement agents outperforming hundreds of human experts via self-learning.
▪ Digital Red Queen: Open-ended adversarial coevolution laying the groundwork for RSI in cybersecurity.
▪ The AI Scientist: Towards end-to-end automation of AI research, recently published in Nature.

Now, we are unifying these breakthroughs. The Sakana AI RSI Lab is officially tasked with building open-ended, adaptive architectures that collectively self-improve.

Human intelligence did not emerge from limitless resources; it was forged through the open-ended, compounding process of evolution operating under strict constraints. We are applying this exact principle to AI. We believe recursive self-improvement is achievable on modest, sample-efficient compute. It shouldn’t be a winner-take-all asset locked inside hyperscale clusters, but a democratized public good.

We’re scaling our team to execute this mission. We are looking for frontier scientists and engineers who are entirely unsatisfied with the brute-force status quo. If you are ready to break away from standard benchmarking and build the self-improving future in Japan, come build with us.
40
120
935
180.8K
Resyst Labs
Resyst Labs@ResystLabs·
Resyst Arena is our new tactical LLM benchmark: models play a turn-based strategy duel, not just answer prompts. First replay: DeepSeek V4 Flash beats Step 3.7 Flash by core destruction after 63 turns. Both via @OpenRouter. Full match replay below. @deepseek_ai @StepFun_ai
1
1
4
29
Calathea
Calathea@CalatheaAI·
Want this every day? Follow Calathea on Telegram for your daily fix on all things AI. We promise curated headlines, quick context, and only the links that matter. Join here: t.me/calatheaai
0
0
0
64
Calathea
Calathea@CalatheaAI·
Daily AI Roundups | 06 Jun 2026
6 AI stories worth catching up on today, covering AI policy, enterprise retrieval, agentic apps, creator contests, small-model economies, and AI-adjacent funding.
1
0
1
41
Free Styler | DeFi 🌹
Free Styler | DeFi 🌹@0xfreestyler·
Early Alpha Compilation pt.28

Here are the 25 projects I'm talking about: 👇
• @DARCStandard - Tools / Infra
• @PharosWatch - DeFi / Infra
• @SuperEarnX - DeFi / Stable
• @instinct_xyz - DeFi
• @TradeButterPro - Super Early
• @tryKanarie - Super Early
• @collectablefun - Super Early
• @OfflineApp_org - Super Early
• @packs_supply - NFT / DeFi
• @CalatheaAI - AI Agents
• @OverseePay - Super Early
• @Agisa_io - Super Early
• @atlasmotion - Super Early
• @chronollm - Super Early
• @BallistaApp - Super Early
• @ParagonOTC - DeFi / DEX
• @boonishnft - NFT
• @basepegofficial - DeFi
• @QMSNetwork - L1 / Privacy
• @OrbscanHQ - Pm / Tool
• @herdrdev - Coding Agent
• @0x1token - DeFi / DEX
• @automatahaus - AI Agents
• @gnanasonape - NFT
• @monxofficialx - NFT

NFA / DYOR! 🌹
5
6
43
3K
OpenAI
OpenAI@OpenAI·
An issue caused some user accounts to be incorrectly suspended. We’re restoring access and working through related subscription and credit issues. status.openai.com/incidents/ejj4…
433
324
2.9K
507K
Calathea
Calathea@CalatheaAI·
swyx@swyx

Finally! the first eval ship from cog!!!!!!!!!! 👼🏼

To contextualize: @METR_Evals cap out at ~16 hours. Cog has private enterprise evals up to 100hrs, and is confident enough to put a financial guarantee on it 🤯

METR dataset: ML eng, GPU kernels, cybersecurity
> "METR (2026) used a combination of GPT-4o and GPT-5 to estimate the human-equivalent times from compressed Claude Code transcripts. These transcripts were collected from 7 METR technical staff on 34 sessions labeled on human ground truth". rlog of 0.83

Cog dataset: real life java/typescript/python/c# feature dev, bugfixes, migrations
> "We collected a ground-truth dataset by asking Devin users to review recent representative sessions, and estimate how long each completed session would have taken without Devin. Our dataset consists of 258 sessions from 126 users across a diverse set of enterprise customers." rlog of 0.74 on held out set

this is pioneering real world evals work and part 1 of a broader frontier code evals drop that I'm really looking forward to writing up. huge kudos to @annarmitchell and @ryanbai1412 for leading the unglamorous last mile data collection!!

1
0
0
44
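The "rlog" figures quoted above (0.83 and 0.74) are not defined in the post. One plausible reading, and it is only an assumption here, is a Pearson correlation computed on log-transformed durations, which keeps a few very long sessions from dominating the score. A minimal sketch of that reading, with the function name `log_pearson` and all numbers invented for illustration:

```python
import math

def log_pearson(estimated_hours, actual_hours):
    """Pearson correlation between log-transformed durations.

    This is an assumed interpretation of 'rlog', not a definition
    taken from the quoted sources.
    """
    xs = [math.log(v) for v in estimated_hours]
    ys = [math.log(v) for v in actual_hours]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical numbers for illustration only; not taken from either dataset.
model_estimates = [0.5, 2.0, 8.0, 30.0, 90.0]
human_estimates = [0.75, 1.5, 12.0, 20.0, 100.0]
print(round(log_pearson(model_estimates, human_estimates), 3))
```

Under this reading, estimates that are off by a constant factor (say, always 3x too high) still score 1.0, so the metric rewards getting the ordering and scale of task lengths right rather than the absolute hours.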
Calathea
Calathea@CalatheaAI·
Daily AI Roundups | 05 Jun 2026
11 AI stories worth catching up on today, covering open image models, Brian Chesky starting a new AI startup, enterprise data agents, frontier coding, AI security, evals, funding, and agent safety.
1
0
2
86
Calathea
Calathea@CalatheaAI·
@ajambrosino more worried about how my parents will let other people install codex on their computers 🙃
0
0
3
112
Andrew Ambrosino
Andrew Ambrosino@ajambrosino·
install codex on your parents’ computers so you can fix stuff remotely
212
124
3.8K
210.9K
Calathea
Calathea@CalatheaAI·
This is a strong list for learning agentic coding in Jun 2026. Prompting got us started in 2023. Now the real skill is understanding the system around the model: context, evals, retrieval, caching, routing, latency, observability, safety, and failure handling.
diva@divaagurlxw

As an AI Engineer. Please learn
>Harness engineering, not just prompt engineering
>Context engineering, not just long prompts
>Prompt caching vs. semantic caching tradeoffs
>KV cache management, eviction, reuse, and memory pressure at scale
>Prefill vs. decode latency and why they optimize differently
>Continuous batching, paged attention, and throughput optimization
>Speculative decoding vs. quantization vs. distillation tradeoffs
>INT8, INT4, FP8, AWQ, GPTQ, and when quantization hurts quality
>Structured output failures, schema validation, repair loops, and fallback chains
>Function calling reliability, tool contracts, argument validation, and idempotency
>Agent guardrails, loop budgets, tool budgets, and termination conditions
>Model routing, graceful fallback logic, and degraded-mode UX
>RAG architecture: chunking, embeddings, hybrid search, reranking, and freshness
>Retrieval evals: recall, precision, grounding, attribution, and citation quality
>Evals: golden sets, regression tests, adversarial tests, LLM-as-judge, and human evals
>LLM observability as a first-class discipline: traces, spans, tokens, latency, errors, and drift
>Cost attribution per feature, workflow, tenant, and user journey not just per model
>Safety engineering: prompt injection defense, data leakage prevention, and permission boundaries
>Multi-tenant isolation, cache safety, and cross-user context contamination prevention
>Fine-tuning vs. in-context learning vs. RAG vs. distillation and when each is the wrong tool
>Latency, quality, cost, and reliability tradeoffs across the full inference stack
>Production failure modes: hallucinated tool calls, malformed JSON, stale retrieval, runaway agents, and silent eval regressions

0
0
1
79
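A few items in the quoted list (schema validation, repair loops, loop budgets, and fallback) fit together in one short sketch. The code below is an illustration under stated assumptions, not a reference implementation: `fake_model` is a hypothetical stand-in for a real model call, and `REQUIRED_KEYS` is an invented schema.

```python
import json

REQUIRED_KEYS = {"city", "temperature_c"}  # hypothetical schema for illustration

def validate(raw):
    """Return (parsed, None) on success or (None, error message) on failure."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return None, "expected a JSON object"
    missing = REQUIRED_KEYS - set(data)
    if missing:
        return None, f"missing keys: {sorted(missing)}"
    return data, None

def call_with_repair(model, prompt, max_attempts=3):
    """Ask the model, validate, and feed the error back until the budget runs out."""
    message = prompt
    for attempt in range(1, max_attempts + 1):
        raw = model(message)
        data, error = validate(raw)
        if error is None:
            return data, attempt
        # Repair step: tell the model what was wrong and ask again.
        message = f"{prompt}\nYour last reply was rejected ({error}). Reply with valid JSON only."
    return None, max_attempts  # budget exhausted; the caller decides on a fallback

def fake_model(message):
    """Hypothetical stand-in for a model: fails once, then returns valid JSON."""
    if "rejected" in message:
        return '{"city": "Oslo", "temperature_c": 4}'
    return "Sure! Here is the JSON: {city: Oslo}"

result, attempts = call_with_repair(fake_model, "Report the weather as JSON.")
print(result, attempts)
```

The hard cap on attempts is the loop budget: without it, a model that never produces valid output would retry forever, which is the "runaway agent" failure mode the list ends on.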
Calathea
Calathea@CalatheaAI·
Calathea exists to track what actually matters in AI. We made our X watchlist public: labs, researchers, builders, infra teams, and people shipping real products. Give it a follow if your feed is getting noisy ⬇️ x.com/i/lists/204883…
0
0
1
102
Calathea
Calathea@CalatheaAI·
Daily AI Roundups | 04 Jun 2026
12 AI stories worth catching up on today, from frontier models and voice agents to embodied AI, security benchmarks, enterprise infra, funding, and world models.
1
0
3
88