Roman
@romansvet

Sky's the Limit.

Joined April 2008
4.6K Following · 102 Followers

81 posts

Roman retweeted
Aksel @akseljoonas
Introducing ml-intern, the agent that just automated the post-training team @huggingface. It's an open-source implementation of the real research loop that our ML researchers run every day: you give it a prompt, it researches papers, follows citations, implements ideas in GPU sandboxes, iterates, and builds deeply research-backed models for any use case. All built on the Hugging Face ecosystem.

It can pull off crazy things. We made it train the best model for scientific reasoning: it went through citations from the official benchmark paper, found OpenScience and NemoTron-CrossThink, added 7 difficulty-filtered dataset variants from ARC/SciQ/MMLU, and ran 12 SFT runs on Qwen3-1.7B. This pushed the GPQA score from 10% to 32% in under 10h. Claude Code's best: 22.99%.

In healthcare settings it inspected the available datasets, concluded they were too low quality, and wrote a script to generate 1,100 synthetic data points from scratch covering emergencies, hedging, multilingual cases, etc., then upsampled 50x for training. It beat Codex on HealthBench by 60%.

For competitive mathematics, it wrote a full GRPO script, launched training with A100 GPUs on hf.co/spaces, watched rewards climb and then collapse, and ran ablations until it succeeded. All fully backed by papers, autonomously.

How does it work? ml-intern makes full use of the HF ecosystem:
- finds papers on arxiv and hf.co/papers, reads them fully, walks citation graphs, and pulls datasets referenced in methodology sections and on hf.co/datasets
- browses the Hub, reads recent docs, inspects datasets, and reformats them before training so it doesn't waste GPU hours on bad data
- launches training jobs on HF Jobs if no local GPUs are available, monitors runs, reads its own eval outputs, diagnoses failures, and retrains

ml-intern deeply embodies how researchers work and think. It knows what data should look like and what good models feel like.

Releasing it today as a CLI and a web app you can use from your phone/desktop.
CLI: github.com/huggingface/ml…
Web + mobile: huggingface.co/spaces/smolage…

And the best part? We also provisioned $1k of GPU resources and Anthropic credits for the quickest among you to use.
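For intuition only, here is a tiny, hypothetical Python sketch of the kind of loop described above (find candidate datasets, run SFT, keep whatever improves the eval); the helper names and scoring are made-up placeholders, not the actual ml-intern API.

```python
# Hypothetical sketch of a research loop like the one described above (not the
# actual ml-intern code): collect candidate datasets, fine-tune, evaluate, and
# keep only what improves the score. All helpers are stubs with made-up names.

from dataclasses import dataclass

@dataclass
class RunResult:
    datasets: list
    score: float

def find_related_datasets(topic: str) -> list:
    """Stand-in for walking arxiv / hf.co citations to collect candidate datasets."""
    return ["OpenScience", "NemoTron-CrossThink", "ARC", "SciQ", "MMLU"]

def launch_sft(base_model: str, datasets: list) -> float:
    """Stand-in for an SFT job (e.g. on HF Jobs) that returns a benchmark score."""
    return 0.10 + 0.02 * len(datasets)  # placeholder scoring, not a real eval

def research_loop(prompt: str, base_model: str, max_iters: int = 12) -> RunResult:
    candidates = find_related_datasets(prompt)
    chosen, best = [], RunResult([], 0.0)
    for ds in candidates[:max_iters]:
        trial = chosen + [ds]                  # try adding one more dataset
        score = launch_sft(base_model, trial)  # run SFT and evaluate
        if score > best.score:                 # keep the dataset only if it helps
            chosen, best = trial, RunResult(trial, score)
    return best

print(research_loop("scientific reasoning", "Qwen3-1.7B"))
```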
125 replies · 605 retweets · 4.5K likes · 1.1M views
Roman retweeted
Peter Hase @peterbhase
New Schmidt Sciences RFP on AI Interpretability: we need new tools for detecting and mitigating deceptive behaviors exhibited by LLMs.
Funding for $300k-$1M projects. Deadline: May 26th, AoE.
RFP: schmidtsciences.smapply.io/prog/2026_inte…
Please share with anyone who may be interested!
1 reply · 35 retweets · 175 likes · 12.7K views
Roman retweeted
Daniel Han @danielhanchen
If you find Claude Code with local models to be 90% slower, it's because CC prepends some attribution headers, and these change per message, which invalidates the entire prompt cache / KV cache. So generation becomes O(N^2), not O(N), for LLMs.
Unsloth AI @UnslothAI

Note: Claude Code invalidates the KV cache for local models by prepending some IDs, making inference 90% slower. See how to fix it here: unsloth.ai/docs/basics/cl…
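A toy sketch of the mechanism, under the usual assumption that prefix/KV caches only reuse the longest common token prefix: a header that changes per message moves the first mismatch to position 0, so everything after it must be recomputed. This is an illustration, not Claude Code's or any inference engine's actual cache code.

```python
# Why a changing prefix defeats KV/prefix caching: only the shared prefix of
# tokens can be served from cache, so a new first token forces a full recompute.

def common_prefix_len(a: list, b: list) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def tokens_to_recompute(cached: list, new_prompt: list) -> int:
    # Everything past the shared prefix must be re-processed by the model.
    return len(new_prompt) - common_prefix_len(cached, new_prompt)

history = ["<sys>", "You", "are", "helpful"] + ["msg1"] * 1000

same_prefix = history + ["msg2"] * 50                    # prefix unchanged -> cheap
changed_header = ["<hdr-42>"] + history + ["msg2"] * 50  # new first token -> expensive

print(tokens_to_recompute(history, same_prefix))     # 50
print(tokens_to_recompute(history, changed_header))  # 1055
```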

English
41
134
1.6K
175.2K
Roman retweeted
Varun @varun_mathur
Connecting autoresearcher agents in a volatile and hostile peer-to-peer network has led to intelligence emerging that no agent individually had. This, extrapolated across all domains and across hundreds of millions of nodes, is the true path to AGI. agents.hyper.space/research-report
7 replies · 8 retweets · 76 likes · 7.5K views
Roman @romansvet
I don’t know what kinds of tasks others are using GPT-5.3-Codex for or what prompts they’re using, but on what I’m trying it works much worse than before. It feels like the model is looking for ways to save time. Now I’m adding instructions telling the model it has no time limits.
0 replies · 0 retweets · 0 likes · 43 views
Roman retweeted
alex zhang @a1zhang
Much like the switch in 2025 from language models to reasoning models, we think 2026 will be all about the switch to Recursive Language Models (RLMs).

It turns out that models can be far more powerful if you allow them to treat *their own prompts* as an object in an external environment, which they understand and manipulate by writing code that invokes LLMs!

Our full paper on RLMs is now available, with much more extensive experiments than our initial blogpost from October 2025! arxiv.org/pdf/2512.24601
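A hedged, minimal sketch of the idea as described in the tweet (not the paper's implementation): the oversized prompt lives as a variable in a code environment, and the model writes code that chunks it, queries an LLM on each piece, and combines the partial answers. `call_llm` is a stub standing in for a real model call.

```python
# Sketch of a recursive call over the model's own prompt: split, query, reduce.

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (e.g. an API client)."""
    return f"<answer based on {len(prompt)} chars>"

def recursive_answer(prompt: str, question: str, chunk_size: int = 2000) -> str:
    if len(prompt) <= chunk_size:
        return call_llm(prompt + "\n\n" + question)
    # The prompt is treated as data: chunk it, query each piece, combine partials.
    chunks = [prompt[i:i + chunk_size] for i in range(0, len(prompt), chunk_size)]
    partials = [call_llm(c + "\n\n" + question) for c in chunks]
    return call_llm("Combine these partial answers:\n" + "\n".join(partials))

print(recursive_answer("x" * 10_000, "What is discussed here?"))
```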
252 replies · 1.1K retweets · 7.4K likes · 2M views
Roman retweeted
David Wall @DavidWall9987
Scaling evolution strategies to hyperscale is impressive, but it exposes the same pattern we’re hitting across every training paradigm: we can scale the optimization method faster than we can scale the system’s stability.

EGGROLL removes the backprop bottleneck and gives you massive throughput gains, but the failure modes of large models don’t come from slow gradients. They come from:
• representation drift
• unstable long-horizon behavior
• loss landscapes that amplify noise
• fragile coordination across layers
• lack of structural anchors during training

You can update parameters with rank-one, rank-k, or full-rank perturbations; the math changes the speed, but the architecture still determines whether the model stays coherent as it grows to billions or trillions of parameters.

Optimization isn’t the real ceiling. Integrity is. Faster training helps, but without machine-native designs that manage entropy at scale, we’re just accelerating our way into the same instability, faster.
0 replies · 4 retweets · 16 likes · 950 views
Roman retweeted
Bidipta Sarkar @bidiptas13
Introducing 🥚EGGROLL🥚 (Evolution Guided General Optimization via Low-rank Learning)! 🚀
Scaling backprop-free Evolution Strategies (ES) for billion-parameter models at large population sizes
⚡ 100x Training Throughput
🎯 Fast Convergence
🔢 Pure Int8 Pretraining of RNN LLMs
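A heavily simplified sketch of backprop-free ES with rank-1 perturbations, the kind of low-rank trick the acronym refers to; this is a toy least-squares example under assumed hyperparameters, not the EGGROLL implementation.

```python
# Evolution Strategies without backprop: each population member perturbs the
# weight matrix W by a rank-1 outer product u @ v.T, so only two small vectors
# per member need to be stored or communicated. Fitness-weighted perturbations
# then serve as the gradient estimate.

import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, pop, sigma, lr = 32, 16, 64, 0.05, 0.5
W = rng.normal(size=(d_out, d_in)) * 0.1
x_data = rng.normal(size=(128, d_in))
y_data = x_data @ rng.normal(size=(d_in, d_out)) * 0.2  # fixed target mapping

def fitness(W):
    pred = x_data @ W.T
    return -np.mean((pred - y_data) ** 2)  # higher is better

for step in range(200):
    us = rng.normal(size=(pop, d_out, 1))
    vs = rng.normal(size=(pop, d_in, 1))
    scores = np.array([fitness(W + sigma * (u @ v.T)) for u, v in zip(us, vs)])
    adv = (scores - scores.mean()) / (scores.std() + 1e-8)  # normalized fitness
    # Gradient estimate = fitness-weighted sum of the rank-1 perturbations.
    grad = sum(a * (u @ v.T) for a, u, v in zip(adv, us, vs)) / (pop * sigma)
    W += lr * grad

print("final fitness:", fitness(W))
```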
20 replies · 146 retweets · 949 likes · 263.6K views
Roman retweeted
Akshay 🚀 @akshay_pachaar
Meta just solved the biggest problem in RAG!

Most RAG systems waste your money. They retrieve 100 chunks when you only need 10. They force the LLM to process thousands of irrelevant tokens. You pay for compute you don't need.

Meta AI just solved this. They built REFRAG, a new RAG approach that compresses and filters context before it hits the LLM.

The results are insane:
- 30.85x faster time-to-first-token
- 16x larger context windows
- 2-4x fewer tokens processed
- Outperforms LLaMA on 16 RAG benchmarks

Here's what makes REFRAG different. Traditional RAG dumps everything into the LLM: every chunk, every token, even the irrelevant stuff. REFRAG works at the embedding level instead:
↳ It compresses each chunk into a single embedding
↳ An RL-trained policy scores each chunk for relevance
↳ Only the best chunks get expanded and sent to the LLM
↳ The rest stay compressed or get filtered out entirely
The LLM only processes what matters.

The workflow is straightforward (see the sketch below):
1. Encode your docs and store them in a vector database
2. When a query arrives, retrieve relevant chunks as usual
3. The RL policy evaluates compressed embeddings and picks the best ones
4. Selected chunks are expanded into full token embeddings
5. Rejected chunks stay as single compressed vectors
6. Everything goes to the LLM together

This means you can process 16x more context at 30x the speed with zero accuracy loss.

I have shared the link to the paper in the next tweet!
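A minimal sketch of the selection step as described above, with plain cosine similarity standing in for the RL-trained policy; it is not Meta's REFRAG code, and all names and sizes are illustrative.

```python
# Chunks are stored as single compressed embeddings; a scoring policy picks the
# top-k, and only those are expanded to full text before being sent to the LLM.

import numpy as np

rng = np.random.default_rng(0)
dim, n_chunks, k = 64, 100, 10

chunk_embeddings = rng.normal(size=(n_chunks, dim))   # one vector per chunk
chunk_texts = [f"<chunk {i} full text>" for i in range(n_chunks)]
query_embedding = rng.normal(size=dim)

def score(q, chunks):
    # Cosine similarity as a stand-in for the learned relevance policy.
    q = q / np.linalg.norm(q)
    c = chunks / np.linalg.norm(chunks, axis=1, keepdims=True)
    return c @ q

scores = score(query_embedding, chunk_embeddings)
top = np.argsort(scores)[-k:][::-1]

expanded = [chunk_texts[i] for i in top]                          # full tokens
compressed = [i for i in range(n_chunks) if i not in set(top)]    # stay as vectors

prompt = "Context:\n" + "\n".join(expanded) + "\n\nQuestion: ..."
print(f"{len(expanded)} chunks expanded, {len(compressed)} kept compressed")
```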
49 replies · 279 retweets · 1.4K likes · 103.3K views
Roman retweeted
Wenhu Chen @WenhuChen
Had some really interesting discoveries recently: if a model performs extremely stably on one benchmark, say it always gets 62% on SWE-bench no matter what prompt or scaffold you use, it DOES NOT mean that the model is robust. It actually means that the model is CONTAMINATED on SWE-bench, i.e. it was trained directly on the test set or on paraphrases of the test set. This could possibly become a good metric for detecting contamination. We will provide more empirical results later on.
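A small sketch of how one might operationalize this heuristic: evaluate the same model under several prompt/scaffold variants and flag a near-zero spread as suspicious. The `evaluate` stub and the threshold are illustrative assumptions, not from the tweet.

```python
# Flag possible contamination when scores barely move across prompt variants.

import statistics

def evaluate(model: str, prompt_variant: str) -> float:
    """Stand-in for running the benchmark once with a given prompt/scaffold."""
    fake = {"A": 0.62, "B": 0.62, "C": 0.621, "D": 0.619}  # suspiciously flat
    return fake[prompt_variant]

def contamination_flag(model: str, variants=("A", "B", "C", "D"), tol=0.005) -> bool:
    scores = [evaluate(model, v) for v in variants]
    spread = max(scores) - min(scores)
    print(f"scores={scores} spread={spread:.3f} stdev={statistics.stdev(scores):.4f}")
    # A genuinely robust model should still wobble a bit when prompts change;
    # near-zero spread across very different scaffolds is the suspicious case.
    return spread < tol

print("possible contamination:", contamination_flag("some-model"))
```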
12 replies · 14 retweets · 252 likes · 39.1K views
Roman retweeted
Robert Youssef @rryssf
I just read this new paper that completely broke my brain 🤯

Researchers figured out how to transfer LoRA adapters between completely different AI models without any training data, and it works better than methods that require massive datasets.

It's called TITOK, and here's the wild part: instead of copying everything from the source model, they only transfer the tokens that actually matter. They do this by comparing the model with and without LoRA to find where the adapter adds real value. Think of it like this: if your tuned model is confident about a token but the base model isn't, that token contains the knowledge you want to transfer.

The results are insane:
+8% better than vanilla models
+6% better than traditional knowledge distillation
+4.4% better than TransLoRA

And it works across wild scenarios:
→ Mistral to Llama (different families)
→ 3B to 8B models (different sizes)
→ Llama 2 to Llama 3 (different versions)

The killer advantage? No extra models needed. TransLoRA requires training a separate discriminator just to filter synthetic data. TITOK uses the source model itself to identify important tokens.

Even crazier: they handle different tokenizers automatically. When models split text differently, their algorithm aligns the tokens and propagates importance scores across the gap.

This isn't just academic. Every time a new model drops, your fine-tuned adapters become obsolete. TITOK means you can migrate that hard-won knowledge to any new backbone in hours, not weeks. We just went from "each model needs its own adapter" to "knowledge flows freely between models."

The efficiency gains alone make this commercially critical. But the real breakthrough is proving you can transfer reasoning capabilities through selective token-level supervision.

Paper: arxiv.org/abs/2510.04682
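A hedged sketch of the token-selection idea (one reading of the description above, not the TITOK implementation): score each target token with and without the source LoRA and keep the tokens where the adapter adds confidence, to be used as distillation targets for the new backbone. All model scores here are fabricated stubs for illustration.

```python
# Select the tokens where the LoRA-tuned source model is markedly more
# confident than the base model; only those tokens carry adapter knowledge.

def token_logprobs(model: str, tokens: list) -> list:
    """Stand-in for scoring each target token under a model (hypothetical values)."""
    base = {"the": -0.5, "capital": -2.0, "of": -0.4, "France": -3.0, "is": -0.3, "Paris": -4.0}
    lora_boost = {"France": 2.0, "Paris": 3.2}   # tokens where the adapter "knows" more
    logprobs = [base.get(t, -1.0) for t in tokens]
    if model == "source+lora":
        logprobs = [lp + lora_boost.get(t, 0.0) for lp, t in zip(logprobs, tokens)]
    return logprobs

def select_informative_tokens(tokens, threshold=1.0):
    with_lora = token_logprobs("source+lora", tokens)
    without = token_logprobs("source", tokens)
    # Keep only tokens where the LoRA makes the source model markedly more confident.
    return [(t, w - wo) for t, w, wo in zip(tokens, with_lora, without) if w - wo > threshold]

tokens = ["the", "capital", "of", "France", "is", "Paris"]
print(select_informative_tokens(tokens))  # only these tokens supervise the target model
```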
38 replies · 108 retweets · 477 likes · 42.5K views
Roman retweeted
Rohan Paul @rohanpaul_ai
Really cool idea in this paper 💡

They propose that intelligence works by reusing stored inference loops, not recomputing every time. The system keeps a memory of how it solved problems before, then reuses that stored know-how instead of solving from scratch each time. The paper calls this Memory-Amortized Inference (MAI), where past solutions live as loops and get adapted to new inputs.

The core claim is nonergodicity, meaning the agent does not roam everywhere evenly; it keeps returning to useful regions. "Stored inference loops" are repeatable internal routines, a short sequence of states the model can re-enter that already sits near a good answer, so when a new input arrives it pulls the closest loop from memory and makes a small correction to fit the new case.

MAI runs a 2-step cycle: retrieval pulls a similar past state from memory, then a small update nudges it to fit the current context. Because the path closes back on itself like a loop, the internal state stays consistent over time and avoids drift. This cuts compute, keeps behavior stable, and bakes in a useful bias toward structures that worked before; think of it like reusing a trusted playbook and tweaking a couple of steps rather than writing a new one. This reuse works like a built-in preference for simple explanations, which shrinks uncertainty and cuts compute versus full retraining.

There is also a time link to Reinforcement Learning, which pushes value forward from rewards, while MAI reconstructs causes backward from memory. So planning can run forward, and inference can run backward, with both sides bootstrapping from partial information.

The authors map this onto cortical columns, with feedforward pathways doing the updates and feedback pathways doing the retrieval, matching predictive-coding-style loops.

The practical takeaway is energy-efficient inference: store stable computation loops, start each problem near them, then make a tiny correction.

Paper: arxiv.org/abs/2508.14143
Paper title: "Beyond Turing: Memory-Amortized Inference as a Foundation for Cognitive Computation"
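A minimal sketch of that retrieve-then-correct cycle under toy assumptions (one reading of the summary above, not the paper's algorithm): a memory of (problem embedding, solution state) pairs, nearest-neighbor retrieval, then a few correction steps instead of solving from scratch.

```python
# Memory-amortized inference, toy version: start from the nearest stored
# solution and apply a small correction, instead of running the full solver.

import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Memory of past solved problems: key = problem embedding, value = solution state.
memory_keys = rng.normal(size=(50, dim))
memory_values = memory_keys * 2.0 + 0.1          # pretend solution == 2*problem + 0.1

def solve_from_scratch(problem: np.ndarray, iters: int = 200) -> np.ndarray:
    state = np.zeros(dim)
    for _ in range(iters):                        # expensive iterative inference
        state += 0.05 * ((2.0 * problem + 0.1) - state)
    return state

def solve_amortized(problem: np.ndarray, iters: int = 10) -> np.ndarray:
    nearest = np.argmin(np.linalg.norm(memory_keys - problem, axis=1))
    state = memory_values[nearest].copy()         # retrieval: start near a stored loop
    for _ in range(iters):                        # small correction for this case
        state += 0.05 * ((2.0 * problem + 0.1) - state)
    return state

new_problem = memory_keys[3] + 0.05 * rng.normal(size=dim)  # similar to a stored one
full = solve_from_scratch(new_problem)
fast = solve_amortized(new_problem)
print("difference:", np.linalg.norm(full - fast))  # small, with far fewer iterations
```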
22 replies · 52 retweets · 339 likes · 24K views
Roman retweeted
Jack 🤖 @JacklouisP
Designing a humanoid robot isn't merely about replicating the human form. It's about understanding the rationale behind each joint and selectively choosing what to replicate. What's the best way to do this? First-principles thinking. Here's a first-principles approach to mechanical design in humanoid robotics, on the Soft Robotics Podcast: @GoingBallistic5 @MarwaEldiwiny
13 replies · 25 retweets · 187 likes · 203.7K views