Vim
25.2K posts

Vim
@vim_dzl
Presidential Fitness Test Award Winner | 🤘🌁🗽| Training/Finetuning @basetenco, ex-@Plaid | Professional Attender of Weddings | DM for consulting inquiries



Long-running agents accumulate context while model memory stays fixed. This leads to a tradeoff: either discard older information or compress it. New work by @oneill_c explores repeated KV-cache compression for persistent agents using Attention Matching. Our research shows one-shot compaction preserves detailed information remarkably well with 65–80% accuracy at 2–5× compression. This far outperforms text summarization. But what happens when you compress, add more context, and compress again repeatedly? baseten.co/research/repea…
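Roughly what one compress/grow/compress cycle looks like. This is a minimal toy sketch of attention-score-based KV-cache pruning, not the actual Attention Matching implementation; the shapes, keep_ratio, and the random stand-in attention weights are all assumptions for illustration:

```python
import torch

def compress_kv_cache(keys, values, attn_weights, keep_ratio=0.5):
    """Keep the cached positions that received the most attention.

    keys, values : [num_heads, seq_len, head_dim]
    attn_weights : [num_heads, num_queries, seq_len], attention that
                   recent queries paid to each cached position.
    """
    scores = attn_weights.mean(dim=(0, 1))                # [seq_len]
    k = max(1, int(keys.shape[1] * keep_ratio))
    keep = torch.topk(scores, k).indices.sort().values    # keep original order
    return keys[:, keep, :], values[:, keep, :]

# Toy repeated-compaction loop: each "turn" appends new context to the
# cache, then compresses it back down, which is the repeated setting the
# post asks about.
heads, dim = 8, 64
K, V = torch.randn(heads, 128, dim), torch.randn(heads, 128, dim)
for turn in range(5):
    new_K, new_V = torch.randn(heads, 32, dim), torch.randn(heads, 32, dim)
    K, V = torch.cat([K, new_K], dim=1), torch.cat([V, new_V], dim=1)
    attn = torch.rand(heads, 16, K.shape[1])   # stand-in attention weights
    K, V = compress_kv_cache(K, V, attn, keep_ratio=0.5)
```

The open question in the post is how much information survives after several of these cycles, since anything pruned in an earlier pass can never be recovered by a later one.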



Inference Engineering launches today. baseten.com/inference-engi…


"No other product lets you launch ten different training jobs on four different datasets." –Head of Clinical NLP, OpenEvidence Over 40% of U.S. physicians trust @EvidenceOpen's platform for fast, accurate medical information. Their secret: custom, specialized models built on Baseten Training. Here's how we helped them save $1.9M via model training and improved their latency 23x to power 100M+ clinical consultations per year. baseten.co/resources/cust…


Come join @baseten, where you get to do research and be part of an index on AI, with exposure to all of it

We replicated Microsoft Research's Generative Adversarial Distillation (GAD) to distill Qwen3-4B from GPT-5.2. Standard black-box distillation teaches a student to copy teacher outputs, but at inference the student generates from its own prefixes: small errors compound and it drifts off the expert distribution. GAD reframes this as an on-policy distillation problem, training a co-evolving discriminator that provides adaptive reward signals on the student's own generations. Exploring methods like this is how our post-training team surfaces new training patterns. Read here: baseten.co/resources/rese…
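For intuition, here's a toy, self-contained sketch of a GAD-style training step as described above: the student generates on-policy, a co-evolving discriminator is trained to tell teacher outputs from student outputs, and the discriminator's score becomes the adaptive reward the student maximizes. Everything here (TinyScorer, embedding-level "generations", the dims and optimizers) is a stand-in assumption, not the actual setup used to distill Qwen3-4B:

```python
import torch
import torch.nn.functional as F

class TinyScorer(torch.nn.Module):
    """Discriminator: maps a response embedding to a realness logit."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)

def gad_step(student, discriminator, opt_s, opt_d, prompt_emb, teacher_emb):
    # 1) Student generates on-policy (here: maps prompts to response embeddings).
    student_emb = student(prompt_emb)

    # 2) Discriminator update: teacher outputs are "real", student outputs "fake".
    d_loss = (F.binary_cross_entropy_with_logits(
                  discriminator(teacher_emb), torch.ones(len(teacher_emb)))
              + F.binary_cross_entropy_with_logits(
                  discriminator(student_emb.detach()), torch.zeros(len(prompt_emb))))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 3) Student update: the reward is the discriminator's score of the
    #    student's own generations, i.e. the adaptive on-policy signal.
    s_loss = F.binary_cross_entropy_with_logits(
        discriminator(student_emb), torch.ones(len(prompt_emb)))
    opt_s.zero_grad(); s_loss.backward(); opt_s.step()
    return d_loss.item(), s_loss.item()

# Usage with stand-in tensors in place of real model generations.
student = torch.nn.Linear(64, 64)
disc = TinyScorer(64)
opt_s = torch.optim.Adam(student.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)
prompts = torch.randn(8, 64)
teacher_out = torch.randn(8, 64)   # stand-in for cached teacher responses
gad_step(student, disc, opt_s, opt_d, prompts, teacher_out)
```

Because the discriminator keeps adapting to the student, the reward signal stays informative on the student's own prefixes instead of only grading exact matches to teacher text.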
