Arthur Ostapenko

8.3K posts

Arthur Ostapenko banner
Arthur Ostapenko

Arthur Ostapenko

@arthurostapenko

explorer, researcher, builder

Spain Katılım Kasım 2009
5.3K Takip Edilen859 Takipçiler
Arthur Ostapenko retweetledi
Kevin Patrick Murphy
New paper: "Agentic Forecasting using Sequential Bayesian Updating of Linguistic Beliefs". Our system (BLF) matches human superforecasters on ForecastBench, and beats all the top methods (GPT-5, Cassi, Grok 4.20, and Foresight-32B). 🧵
Kevin Patrick Murphy tweet media
English
10
32
199
25.3K
Arthur Ostapenko retweetledi
Sakana AI
Sakana AI@SakanaAILabs·
We’re launching the beta for our new commercial AI product: Sakana Fugu 🐡, a multi-agent orchestration system! Blog: sakana.ai/fugu-beta Fugu hits SOTA on SWE-Pro, GPQA-D, and ALE-Bench, and has been our internal secret weapon. It dynamically coordinates frontier models, autonomously selecting the optimal agent combinations and roles for each task. Available as an OpenAI-compatible API, you can seamlessly integrate Fugu into your existing workflows with minimal changes. 🐟 Fugu Mini: High-speed orchestration optimized for latency 🐡 Fugu Ultra: Full model pool utilization for deep, complex reasoning Apply for the beta test here: forms.gle/BtKkhc2CfLKk1d…
Sakana AI tweet media
English
14
140
562
214.8K
Arthur Ostapenko retweetledi
DeepSeek
DeepSeek@deepseek_ai·
🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length. 🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models. 🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice. Try it now at chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today! 📄 Tech Report: huggingface.co/deepseek-ai/De… 🤗 Open Weights: huggingface.co/collections/de… 1/n
DeepSeek tweet media
English
1.5K
7.5K
44K
8.8M
Arthur Ostapenko retweetledi
Qwen
Qwen@Alibaba_Qwen·
🚀 Meet Qwen3.6-27B, our latest dense, open-source model, packing flagship-level coding power! Yes, 27B, and Qwen3.6-27B punches way above its weight. 👇 What's new: 🧠 Outstanding agentic coding — surpasses Qwen3.5-397B-A17B across all major coding benchmarks 💡 Strong reasoning across text & multimodal tasks 🔄 Supports thinking & non-thinking modes ✅ Apache 2.0 — fully open, fully yours Smaller model. Bigger results. Community's favorite. ❤️ We can't wait to see what you build with Qwen3.6-27B! 👀 🔗👇 Blog: qwen.ai/blog?id=qwen3.… Qwen Studio: chat.qwen.ai/?models=qwen3.… Github: github.com/QwenLM/Qwen3.6 Hugging Face: huggingface.co/Qwen/Qwen3.6-2… huggingface.co/Qwen/Qwen3.6-2… ModelScope: modelscope.cn/models/Qwen/Qw… modelscope.cn/models/Qwen/Qw…
Qwen tweet media
English
501
1.7K
12.4K
3.6M
Arthur Ostapenko retweetledi
Xenova
Xenova@xenovacom·
NEW: OpenAI releases Privacy Filter, their first open model of 2026! 🤗 Apache-2.0! It's a bidirectional token-classification adaptation of GPT-OSS, trained to mask personally identifiable information (PII) in text. At only 1.5B params, it can even run locally in your browser!
English
31
61
779
215.9K
Arthur Ostapenko retweetledi
Zain Shah
Zain Shah@zan2434·
Imagine every pixel on your screen, streamed live directly from a model. No HTML, no layout engine, no code. Just exactly what you want to see. @eddiejiao_obj, @drewocarr and I built a prototype to see how this could actually work, and set out to make it real. We're calling it Flipbook. (1/5)
English
1K
3.3K
25.9K
5.5M
Arthur Ostapenko retweetledi
GLADIA Research Lab
GLADIA Research Lab@GladiaLab·
LLMs are injective and invertible. In our new paper, we show that different prompts always map to different embeddings, and this property can be used to recover input tokens from individual embeddings in latent space. (1/6)
GLADIA Research Lab tweet media
English
279
1.3K
11.1K
5.1M
Arthur Ostapenko retweetledi
Sakana AI
Sakana AI@SakanaAILabs·
Discovering Novel LLM Experts via Task-Capability Coevolution Project: acdc-llm.github.io Paper: arxiv.org/abs/2604.14969 Can we build AI that is smarter than its parts? This week, our team will present AC/DC⚡ at #ICLR2026. The current paradigm in AI assumes that to solve more complex problems, we must train a single, ever-larger model. But no single model can excel at every task without massive computational costs. Instead of building one monolithic model, we asked what if we coevolved a diverse collective of specialized experts. We introduce Assessment Coevolving with Diverse Capabilities (AC/DC). It is a framework that simultaneously evolves a population of LLMs using evolutionary model merging and an archive of synthetic tasks generated by an AI scientist. As the tasks become more complex, the models must develop distinct, specialized skills to solve them. Crucially, AC/DC selects models based on Quality-Diversity. It keeps models not just because they score high on average, but because they solve different problems than the rest of the population. The results show that a collaborative task force of 8 small, evolved models can outperform a massive 72B parameter model, using significantly fewer total parameters. These models genuinely specialize, providing completely different, yet correct, approaches to complex problems. This suggests a new path forward for AI development, creating highly capable, parameter-efficient systems through collective intelligence rather than relying solely on brute-force scaling. OpenReview: openreview.net/forum?id=efNIN…
English
1
36
189
18.9K
Arthur Ostapenko retweetledi
Sakana AI
Sakana AI@SakanaAILabs·
Can LLMs flip coins in their heads? When prompted to “Flip a fair coin” 100 times, the heads to tails ratio drifts far from 50:50. LLMs can understand what the target probability should be, but generating outputs that faithfully follow a given distribution is a separate problem. This bias extends beyond coin flips. When LLMs are asked to generate multiple story ideas or brainstorm solutions, the outputs tend to cluster around a narrow range. The same probabilistic skew that distorts coin flips limits diversity in creative generation, recommendations, and other tasks where varied outputs are needed. We discovered a prompting technique named String Seed of Thought (SSoT). The method is simple: instruct the LLM to generate a random string in its own output, then manipulate that string to derive its answer. It requires only a small addition to the prompt and no external random number generator. SSoT significantly reduces output bias across a wide range of LLMs, both open and closed. With reasoning models (such as DeepSeek-R1), it reaches accuracy close to that of actual random sampling. The method generalizes from binary choices to n-way selections and arbitrary probability distributions. On the NoveltyBench diversity benchmark, SSoT outperformed other approaches across all six categories while maintaining output quality. This work will be presented at #ICLR2026! Blog: pub.sakana.ai/ssot Paper: arxiv.org/abs/2510.21150 Openreview: openreview.net/forum?id=luXtb…
GIF
English
35
137
810
251.7K
Arthur Ostapenko retweetledi
Natalie Wolchover
Natalie Wolchover@nattyover·
Bacteria move around using a molecular machine called the flagellar motor that rotates faster than the flywheel of a race car engine and switches directions in an instant. After 50 yrs, scientists have finally figured out how it works. “My lifelong quest is now fulfilled.” Link⤵️
English
526
4.4K
32.9K
6M
Arthur Ostapenko retweetledi
Kimi.ai
Kimi.ai@Kimi_Moonshot·
Meet Kimi K2.6: Advancing Open-Source Coding 🔹Open-source SOTA on HLE w/ tools (54.0), SWE-Bench Pro (58.6), SWE-bench Multilingual (76.7), BrowseComp (83.2), Toolathlon (50.0), Charxiv w/ python(86.7), Math Vision w/ python (93.2) What's new: 🔹Long-horizon coding - 4,000+ tool calls, over 12 hours of continuous execution, with generalization across languages (Rust, Go, Python) and tasks (frontend, devops, perf optimization). 🔹Motion-rich frontend - Videos in hero sections, WebGL shaders, GSAP + Framer Motion, Three.js 3D. 🔹Agent Swarms, elevated - 300 parallel sub-agents × 4,000 steps per run (up from K2.5's 100 / 1,500). One prompt, 100+ files. 🔹Proactive Agents - K2.6 model powers OpenClaw, Hermes Agent, etc for 24/7 autonomous ops. 🔹Claw Groups (research preview) - bring your own agents, command your friends', bots & humans in the loop. - K2.6 is now live on kimi.com in chat mode and agent mode. For production-grade coding, pair K2.6 with Kimi Code: kimi.com/code - 🔗 API: platform.moonshot.ai 🔗 Tech blog: kimi.com/blog/kimi-k2-6 🔗 Weights & code: huggingface.co/moonshotai/Kim…
Kimi.ai tweet media
English
891
2.4K
18.1K
7.3M
Arthur Ostapenko retweetledi
Boris ✈️🇧🇷 ICLR 2026
🚨Why should one huge LLM know and solve everything? - No single human does, yet our civilization does endless innovation. Introducing AC/DC - it continually coevolves a population of small expert LLMs that collectively outperform GPT-4o. (ICLR 2026 w/ @SakanaAILabs) 👇🧵
English
4
23
202
21.7K
Arthur Ostapenko retweetledi
alphaXiv
alphaXiv@askalphaxiv·
“Think Anywhere in Code Generation” Most reasoning LLMs think before writing code. But coding often gets hard because the tricky parts only gets revealed mid-implementation when the edge cases or final return logic appear. So this paper introduces Think-Anywhere, where models can pause and reason at any token position while generating code, then strip those thoughts out to leave clean executable code. Trained with cold-start SFT + execution-based RL, this beats CoT, self-planning, interleaved thinking, GRPO, and recent code post-training methods. This lets the model learns to think exactly where uncertainty appears.
alphaXiv tweet media
English
9
58
367
19.1K
Arthur Ostapenko retweetledi
Yulu Gan
Yulu Gan@yule_gan·
Simply adding Gaussian noise to LLMs (one step—no iterations, no learning rate, no gradients) and ensembling them can achieve performance comparable to or even better than standard GRPO/PPO on math reasoning, coding, writing, and chemistry tasks. We call this algorithm RandOpt. To verify that this is not limited to specific models, we tested it on Qwen, Llama, OLMo3, and VLMs. What's behind this? We find that in the Gaussian search neighborhood around pretrained LLMs, diverse task experts are densely distributed — a regime we term Neural Thickets. Paper: arxiv.org/pdf/2603.12228 Code: github.com/sunrainyg/Rand… Website: thickets.mit.edu
Yulu Gan tweet media
English
87
435
3K
684.9K
Arthur Ostapenko retweetledi
Sakana AI
Sakana AI@SakanaAILabs·
What happens when you put competing neural networks in a Petri Dish and start changing the rules while they adapt? Last year we released Petri Dish NCA, where neural nets are the organisms that learn during simulation. Today we're releasing Digital Ecosystems: a browser-based platform for interactive artificial life research. The setup: several small CNNs share a 2D grid, each seeing only a 3x3 neighborhood. No global plan. They compete for territory by attacking neighbours and defending against incoming attacks, learning via gradient descent online while the simulation runs. What we didn't expect was the role of the learning itself. Gradient descent isn't just optimising each species' strategy. Instead, it acts to stabilize the whole system during simulation. Species that overextend get pushed back by the loss. Species that stagnate get nudged to grow. This means you can push parameters toward edge-of-chaos regimes: a zone characterised by emergent complexity. Letting the neural networks learn acts to hold the complex system together while you explore and interact. The platform lets you steer all of this interactively. You can draw walls to create niches, erase parts of the system online, and tune 40+ system parameters to explore the most interesting configurations. We find it mesmerizing to watch species carve out territories and reorganise when you perturb them. Everything runs client-side in your browser, no install needed. Blog: pub.sakana.ai/digital-ecosys… Code: github.com/SakanaAI/digit…
English
39
196
1.1K
212.2K
Arthur Ostapenko retweetledi
Chris Tate
Chris Tate@ctatedev·
Terminal automation + e2e testing solved Now as simple as snapshot, click, type: – wterm renders terminal-in-html, every cell in the a11y tree – agent-browser automates pages via the a11y tree Here's opencode in one browser driving Claude Code in another
English
107
215
3.4K
961.5K
Arthur Ostapenko retweetledi
Felix Rieseberg
Felix Rieseberg@felixrieseberg·
Hi! I'm here with *another launch*, it just happens to be extremely niche, nerdy, and probably only for a handful of people. In the desktop app, Claude Cowork and Code now have a little Bluetooth API for makers & developers, allowing you to build hardware devices that interact with Claude. I, for instance, built a little desk pet that alerts me whenever Claude is waiting for permission.
Felix Rieseberg tweet media
English
126
119
1.6K
180K
Arthur Ostapenko retweetledi
Higgsfield AI 🧩
Higgsfield AI 🧩@higgsfield·
We just made a 23-MINUTE sci-fi pilot in 4 days. And it is 100% AI.
English
749
734
4.3K
2.1M
Arthur Ostapenko retweetledi
Yuekun Yao
Yuekun Yao@yuekun_yao·
Claude Mythos is suspected of being a Looped transformer (LT), but why are LT-based LLMs so powerful? Our new finding: LT can perform implicit reasoning over their parametric knowledge, unlocking generalization to complex and unfamiliar questions compared to transformers ⤵️
Yuekun Yao tweet media
English
16
154
961
184K
Arthur Ostapenko retweetledi
Zane Koch
Zane Koch@zanehkoch·
ok actually insane paper published yesterday a research group in Korea built a gene switch you can control wirelessly using electromagnetic fields they exposed mice to 60 hz EMF (same frequency as your wall outlet) using a pair of large coils that generate a uniform magnetic field around the animal, for cyclic 3-day on / 4-day off pulses they showed this could: - activate OSK to do epigenetic reprogramming in progeroid and aged mice, extending lifespan and reversing aging markers across multiple tissues - conditionally switch on mutant amyloid genes only in aged mouse brains, letting them separate aging effects from amyloid effects to study AD biology in a way previous models couldn't no drugs, no impacts, just a magnetic field from outside the body
Zane Koch tweet media
English
274
882
7.2K
2.2M