Erika S
2.3K posts

@E_FutureFan
AI developer, futurist, and cat mom 🐱. Have you ever wondered whether AGI dreams? 🤔





🚀 How can we make LLM-based optimization stable and scalable when the feedback signal is stochastic? Introducing POLCA: a framework for robust, scalable stochastic generative optimization. Paper: arxiv.org/abs/2603.14769 Code: github.com/rlx-lab/POLCA 🧵👇 1/
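The thread doesn't spell out POLCA's algorithm, so here is only a minimal, hypothetical Python sketch of the underlying problem it names: choosing among candidates when every evaluation of the feedback signal returns a different score. The sketch averages repeated noisy evaluations and re-samples when two candidates are too close to call; every name in it is illustrative, nothing is from the paper.

import random
import statistics

def noisy_score(candidate: str) -> float:
    """Stand-in for a stochastic feedback signal, e.g. an LLM judge
    that returns a slightly different score on every call."""
    base = len(set(candidate)) / max(len(candidate), 1)  # toy objective
    return base + random.gauss(0.0, 0.1)                 # evaluation noise

def estimate(candidate: str, n: int = 8) -> tuple[float, float]:
    """Average n noisy evaluations; return (mean, standard error)."""
    scores = [noisy_score(candidate) for _ in range(n)]
    return statistics.fmean(scores), statistics.stdev(scores) / n ** 0.5

def select_best(candidates: list[str], n: int = 8) -> str:
    """Keep the candidate with the highest mean score, spending extra
    samples whenever two candidates are statistically indistinguishable."""
    best = candidates[0]
    best_mean, best_sem = estimate(best, n)
    for cand in candidates[1:]:
        mean, sem = estimate(cand, n)
        if abs(mean - best_mean) < best_sem + sem:  # too close to call
            mean, sem = estimate(cand, 4 * n)       # re-sample both sides
            best_mean, best_sem = estimate(best, 4 * n)
        if mean > best_mean:
            best, best_mean, best_sem = cand, mean, sem
    return best

print(select_best(["aaaa", "abcd", "aabb"]))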


Creating user simulators is key to evaluating and training models for user-facing agentic applications. But are stronger LLMs better user simulators? TL;DR: not really. We ran the largest sim2real study for AI agents to date: 31 LLM simulators vs. 451 real humans across 165 tasks. Here's what we found (co-lead with @sunweiwei12).
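The post doesn't describe the study's simulator setup, but the general pattern is an LLM role-playing the user against the agent in a turn-taking loop. A minimal, hypothetical Python sketch, with call_llm as a stand-in for any chat-completion API (the toy behavior just makes it run end to end):

def call_llm(system: str, history: list[dict]) -> str:
    """Hypothetical stand-in for any chat-completion API."""
    # Toy behavior: the simulated user declares success after two exchanges.
    if system.startswith("You are simulating") and len(history) >= 4:
        return "DONE"
    return "Okay, next step please."

def simulate_dialogue(task: str, agent_system: str, max_turns: int = 10) -> list[dict]:
    """Run a simulated user against an agent until DONE or max_turns.
    A real harness would flip the role labels for each side's view."""
    user_system = ("You are simulating a human user. Pursue this goal one "
                   f"message at a time and say DONE when it is achieved: {task}")
    history: list[dict] = []
    for _ in range(max_turns):
        user_msg = call_llm(user_system, history)
        history.append({"role": "user", "content": user_msg})
        if "DONE" in user_msg:
            break
        agent_msg = call_llm(agent_system, history)
        history.append({"role": "assistant", "content": agent_msg})
    return history

print(simulate_dialogue("book a table for two", "You are a booking agent."))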


We're also sharing an early alpha of our new interface. cursor.com/glass


"Massive investment in AI contributed basically zero to US economic growth last year," per Goldman Sachs


LlamaParse Agentic Plus mode now delivers precise visual grounding with bounding boxes for the most challenging document elements. Our latest update brings major improvements to how we handle complex visual content:

📐 Complex LaTeX formulas - accurately parse mathematical expressions with precise positioning
✍️ Handwriting recognition - extract handwritten text with location coordinates
📊 Complex layouts - navigate multi-column documents and intricate formatting
📈 Infographics and charts - identify and extract data visualizations with spatial context

This means you can now build applications that not only extract text from documents but also understand exactly where that content appears on the page - perfect for building more intelligent document analysis workflows.

Try LlamaParse Agentic Plus mode and see how visual grounding transforms your document parsing capabilities: cloud.llamaindex.ai/?utm_source=so…
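The post doesn't show the response schema, so here is a purely hypothetical Python sketch of what consuming grounded output could look like, assuming each parsed element carries a page number and a bounding box. The field names are invented for illustration and are not the real LlamaParse API:

from dataclasses import dataclass

@dataclass
class GroundedElement:
    # Assumed fields, not the actual response schema.
    page: int
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1) page coords
    kind: str                                # e.g. "formula", "handwriting"
    text: str

def elements_on_page(elements, page):
    """Elements on one page, ordered top-to-bottom by bounding box."""
    return sorted((e for e in elements if e.page == page),
                  key=lambda e: e.bbox[1])

demo = [
    GroundedElement(1, (72, 340, 520, 380), "formula", r"E = mc^2"),
    GroundedElement(1, (72, 100, 520, 140), "handwriting", "sign here"),
]
for el in elements_on_page(demo, page=1):
    print(el.kind, el.bbox, el.text)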


Incorporating SFT data during pretraining yields better fine-tuned models than the standard pretrain-then-finetune scheme, even when the baseline replays SFT data during finetuning. But the right proportion of SFT data depends on the pretraining token budget, so they fit a scaling law to predict it.
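As a rough illustration of the data-mixing idea (not the paper's recipe), here is a minimal Python sketch that substitutes SFT batches into a pretraining stream at a fixed ratio. The 5% default and the stream interfaces are assumptions; per the post, the right ratio would come from the token-budget scaling law:

import random
from itertools import cycle, islice

def mixed_stream(pretrain_iter, sft_iter, sft_ratio=0.05):
    """Yield pretraining batches, substituting an SFT batch with
    probability sft_ratio at each step."""
    for batch in pretrain_iter:
        yield next(sft_iter) if random.random() < sft_ratio else batch

# Toy demo: pretraining "batches" are ints, SFT "batches" are strings.
pretrain = iter(range(1000))
sft = cycle(["sft_a", "sft_b"])
print(list(islice(mixed_stream(pretrain, sft, sft_ratio=0.2), 10)))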




Introducing CodeRabbit Plan. Hand those prompts to whatever coding agent you use and start building!

New toy in the Codex toybox :)
