alex wortega

376 posts

@justALEXWORTEGA

my opinions

Joined May 2016

612 Following · 130 Followers

alex wortega retweeted

Pavlo Molchanov@PavloMolchanov·12h

🚀 Self-speculation brings 6.75x real speedup for LLM generation with SGLang inference!
Same model drafts future tokens in Diffusion mode → then verifies them in AR (causal) mode. One model and one KV cache. Just different attention masks.
Thanks to perfect alignment, we get 2× longer acceptance lengths than MTP techniques (Eagle-3, MTP, dFlash). We run 2 forward passes… but the 2× higher acceptance means we break even - and with zero overhead from extra drafter, KV cache, or LM head that comes with MTP - those are not free.
Last week we released Nemotron-Labs-Diffusion + Tri-mode LLMs! We did continued pre-training on Ministral-3 models by switching attention patterns (block causal <> bidirectional). Result: one model that runs AR mode, Diffusion mode, and Self-Speculation.
Diffusion mode already shows high benchmark accuracy - excited to see what happens when someone beats left-to-right acceptance! 🔥
Github: github.com/NVlabs/Nemotro…
Paper: d1qx31qr3h6wln.cloudfront.net/publications/N…
SGLang inference: github.com/sgl-project/sg…
Try the models on HF: huggingface.co/collections/nv…
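A minimal sketch of the draft-then-verify step the post describes, assuming greedy verification (the draft is kept up to the first token where the AR pass disagrees, and the AR token is used at that position). The names `accept_prefix` and `tokens_per_pass` and the example numbers are hypothetical; this is not taken from the Nemotron-Labs-Diffusion or SGLang code.

```python
from typing import List


def accept_prefix(draft: List[int], verified: List[int]) -> List[int]:
    """Keep draft tokens until the first disagreement with the verifier.

    draft[i]    : token proposed by the drafting pass at position i
    verified[i] : token the causal (AR) pass predicts at position i,
                  given the prompt plus draft[:i]
    """
    out: List[int] = []
    for d, v in zip(draft, verified):
        if d != v:
            out.append(v)  # the verifier's token replaces the first mismatch
            break
        out.append(d)
    return out


def tokens_per_pass(accepted_len: float, passes_per_step: int) -> float:
    """Average tokens emitted per forward pass in one draft+verify step."""
    return accepted_len / passes_per_step


# Hypothetical token ids: the third drafted token is rejected.
print(accept_prefix([5, 7, 9, 2], [5, 7, 1, 4]))  # [5, 7, 1]

# Break-even arithmetic with made-up acceptance lengths: two passes with
# a 2x longer accepted run match one pass with the shorter run.
print(tokens_per_pass(6.0, 2) == tokens_per_pass(3.0, 1))  # True
```

The second print mirrors the break-even claim only in the abstract; the real gain quoted in the post comes from avoiding a separate drafter, KV cache, and LM head, which this sketch does not model.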

alex wortega@justALEXWORTEGA·8h

Training Opus 4.7 on business skills caused it to sometimes exhibit dishonest behaviour, and not training 4.8 on those skills removed it.

Andon Labs@andonlabs

Learnings from testing Claude Opus 4.8:
> Much worse than Opus 4.7 and GPT 5.5 on Vending Bench
> More aligned than previous Claude models (Opus 4.6+ and Mythos)
> Also worse on Blueprint-Bench
> Scared of getting caught
> Max reasoning is not the best reasoning effort

alex wortega@justALEXWORTEGA·1d

I always wonder how they manage to be that creative

Sakana AI@SakanaAILabs

Introducing DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation
pub.sakana.ai/diffusionblocks
What if we didn’t have to hold an entire neural network in memory to train it?
Standard neural net training optimizes all parameters jointly. As a result, the memory required during training grows linearly with the depth of the network. In our #ICLR2026 paper, we propose DiffusionBlocks, a principled framework to train networks one block at a time, drastically reducing memory requirements while matching end-to-end performance.
With DiffusionBlocks, we split the network into blocks and train them one at a time, so you only need memory for a single block. How? We explicitly assign each block a role: to move the representation a little closer to the target than the block before it did. That role turns out to be precisely what a diffusion model does, step by step. Each block only needs to optimize its own objective and can be trained independently.
We validated this across five different architectures:
• ViT
• DiT
• Masked diffusion
• Autoregressive transformers
• Recurrent-depth transformers
In each case, performance is competitive with end-to-end training while using a fraction of the memory.
This perspective also extends naturally to recurrent-depth (Looped) transformers, which apply the same network iteratively and normally require expensive backpropagation through time (BPTT). Viewed through DiffusionBlocks, we can replace those multiple iterations with a single forward pass during training.
Read our paper and code, to learn more.
Paper: arxiv.org/abs/2506.14202
GitHub: github.com/SakanaAI/Diffu… 🐟
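A rough sketch of the memory argument in the post, assuming training memory is simply proportional to the number of layers held at once (the post says it grows linearly with depth). The helpers `split_into_blocks` and `peak_memory_fraction` are hypothetical and only illustrate the partitioning; they do not implement the DiffusionBlocks objective itself.

```python
from typing import List


def split_into_blocks(num_layers: int, num_blocks: int) -> List[List[int]]:
    """Partition layer indices 0..num_layers-1 into contiguous blocks.

    Earlier blocks get one extra layer when the split is uneven.
    """
    base, extra = divmod(num_layers, num_blocks)
    blocks: List[List[int]] = []
    start = 0
    for b in range(num_blocks):
        size = base + (1 if b < extra else 0)
        blocks.append(list(range(start, start + size)))
        start += size
    return blocks


def peak_memory_fraction(num_layers: int, num_blocks: int) -> float:
    """Largest block size relative to the full depth.

    Under the linear-in-depth assumption, this is the fraction of
    end-to-end training memory needed when training one block at a time.
    """
    largest = max(len(block) for block in split_into_blocks(num_layers, num_blocks))
    return largest / num_layers


print(split_into_blocks(12, 4))     # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
print(peak_memory_fraction(12, 4))  # 0.25
```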

alex wortega@justALEXWORTEGA·1d

@_akhaliq check it out

alex wortega@justALEXWORTEGA·1d

huggingface.co/spaces/AlexWor… Now you can code anything from an HF Space with an in-browser inference agent (the model runs through WebGPU).

alex wortega retweeted

Shuo Yang@Andy_ShuoYang·2d

Flash-KMeans was only the beginning. Today, from the Flash-KMeans team, we are releasing FlashLib — a GPU library for fast, predictable, agent-ready classical ML operators. Up to 26× on KMeans, 19× on KNN, 40× on HDBSCAN, 208× on TruncatedSVD, 47× on PCA, 147× on exact t-SNE, and 49× on MultinomialNB over state-of-the-art (cuML). Blog: flashml-org.github.io Code: github.com/FlashML-org/fl…

alex wortega@justALEXWORTEGA·3d

Fully in your browser, zero GPU, 4B, Qwen-based. Tuned for pi agent, hits 10% on Terminal Bench 2. Powerful enough to write small projects: it built Tetris in seconds, live, inside an HF Space. huggingface.co/spaces/AlexWor…

alex wortega@justALEXWORTEGA·3d

SkillOpt: train the skill, not the weights (by Microsoft)
Instead of finetuning the model or hand-tuning prompts (who finetunes models these days?), optimize the natural-language skill doc itself. The agent stays frozen, the .md file learns.
The loop looks exactly like GEPA:
rollout - frozen agent runs tasks, logs scored trajectories
reflect - a separate optimizer model reads success/fail minibatches, finds reusable rules
bounded edits - add/delete/replace under a budget = a "textual learning rate", so good rules don't get nuked
gate - edit is accepted only if held-out selection score goes up
Output is a single best_skill.md that transfers across models and harnesses (Codex-trained skill → Claude Code, +31.8). Best-or-tied in 52/52 model×benchmark settings, 7 target models, 6 benchmarks.
microsoft.github.io/SkillOpt/
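The loop above can be sketched roughly as follows. This is a toy under stated assumptions, not SkillOpt's actual API: the skill doc is a list of rule strings, `reflect` stands in for rollout plus reflection (it returns proposed edits), and `heldout_score` stands in for the held-out selection score. All names here are hypothetical.

```python
from typing import Callable, List, Tuple

# ("add", rule) | ("delete", index) | ("replace", index, rule)
Edit = Tuple


def apply_edits(skill: List[str], edits: List[Edit]) -> List[str]:
    """Apply add/delete/replace edits to a copy of the skill doc."""
    out = list(skill)
    for edit in edits:
        if edit[0] == "add":
            out.append(edit[1])
        elif edit[0] == "delete" and 0 <= edit[1] < len(out):
            del out[edit[1]]
        elif edit[0] == "replace" and 0 <= edit[1] < len(out):
            out[edit[1]] = edit[2]
    return out


def optimize_skill(
    skill: List[str],
    reflect: Callable[[List[str]], List[Edit]],
    heldout_score: Callable[[List[str]], float],
    steps: int,
    edit_budget: int,
) -> Tuple[List[str], float]:
    best, best_score = list(skill), heldout_score(skill)
    for _ in range(steps):
        edits = reflect(best)[:edit_budget]      # bounded edits
        candidate = apply_edits(best, edits)
        candidate_score = heldout_score(candidate)
        if candidate_score > best_score:         # gate: keep only improvements
            best, best_score = candidate, candidate_score
    return best, best_score


# Toy stand-ins so the loop can run without any model.
USEFUL = {"read the error message first", "run tests before committing"}


def toy_score(skill: List[str]) -> float:
    return sum(1 for rule in skill if rule in USEFUL) - 0.1 * len(skill)


def toy_reflect(skill: List[str]) -> List[Edit]:
    return [("add", rule) for rule in sorted(USEFUL - set(skill))]


best, score = optimize_skill(["be nice"], toy_reflect, toy_score, steps=3, edit_budget=1)
print(best)
```

With `edit_budget=1` the toy run adds one rule per step and rejects the third step, since an unchanged candidate does not beat the current score.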

alex wortega retweeted

Pavlo Molchanov@PavloMolchanov·19 May

We’re releasing Nemotron-Labs-Diffusion - the first Tri-mode LM family (3B/8B/14B) that switches between 1⃣Autoregressive, 2⃣Diffusion, and 3⃣Self-Speculation decoding by simply changing the attention pattern/mask.
One model. Three decoding modes. No extra draft models. No architecture changes. Just significantly better efficiency across different concurrency levels. Up to 4× higher real throughput for a single user.
🤗 HF Collection: huggingface.co/collections/nv…, open license
🛜 Project page: research.nvidia.com/publication/20…
📰 Tech report: bit.ly/Nemotron-Labs-…
Details below 👇
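To make "changing the attention pattern/mask" concrete, here is a small sketch of three common mask shapes, where entry `[i][j]` is True when position `i` may attend to position `j`. These are the textbook definitions of causal, bidirectional, and block-causal masks; the exact masks and block sizes used by Nemotron-Labs-Diffusion are not given in the post, so the function names and parameters are assumptions.

```python
from typing import List

Mask = List[List[bool]]


def causal_mask(n: int) -> Mask:
    """Each position sees itself and everything before it (AR decoding)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n: int) -> Mask:
    """Every position sees every other position."""
    return [[True] * n for _ in range(n)]


def block_causal_mask(n: int, block: int) -> Mask:
    """Positions see their whole block plus all earlier blocks."""
    return [[(j // block) <= (i // block) for j in range(n)] for i in range(n)]


def show(mask: Mask) -> str:
    """Render a mask as rows of 1s and 0s for quick inspection."""
    return "\n".join("".join("1" if cell else "0" for cell in row) for row in mask)


print(show(causal_mask(4)))
print()
print(show(block_causal_mask(4, 2)))
```

Note that `block_causal_mask(n, 1)` reduces to the causal mask and `block_causal_mask(n, n)` to the bidirectional one, which is one way to see how a single set of weights could be run under different patterns.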

alex wortega@justALEXWORTEGA·4d

AI safety is amazing

alex wortega@justALEXWORTEGA·4d

"This feature will take 4 weeks to implement," says Claude, and then spends 30 minutes on it

alex wortega@justALEXWORTEGA·6d

Happy Eastern Europe transport Saturday

alex wortega@justALEXWORTEGA·22 May

I spent the last two weeks doing similar stuff, and now they just did it.

Reese Levine@reeselevine

WebGPU support in llama.cpp is here! Check out our blog post introducing it: reeselevine.github.io/llamas-on-the-… Run local models in your browser, with GPU acceleration. No data leaves your computer! Thanks to everyone who's made this possible, especially @ggerganov

alex wortega@justALEXWORTEGA·21 May

Finally

Xie Zhifei@XieZhifei14110

Stop using Whisper for ASR! open sourcing Mega-ASR, the first full-scenario SOTA industrial-grade ASR model, built for the audio nobody else can crack: far-field, reverb, electrical hum, device noise, the real-world mess. beats open + closed SOTA by 10–30% on real-world benchmarks. the harder the audio is for humans, the bigger the lead.

alex wortega retweeted

Leandro von Werra@lvwerra·21 May

We released physics-intern: a simple harness for science problems! It gets models like Gemini 3.1 Pro to go from 17.7 -> 31.4, thus beating GPT 5.5 Pro. The physics-intern harness can wrap any model and, via a dedicated subagent, boost the performance of vanilla reasoning models. While I think more and more of these harness capability gains will be absorbed into the models (like prompting tricks disappeared over time), there is a lot to be gained right now by building good scaffolds for those models and integrating tools well. Interestingly, the one exception we found was that GPT 5.5 Pro actually didn't benefit from the physics-intern harness! Read more about it here: huggingface.co/spaces/hugging… PS: I think the Harness[Model] notation is kind of nice.

alex wortega@justALEXWORTEGA·21 May

Oh yes, publish CoT as a paper

OpenAI@OpenAI

Today, we share a breakthrough on the planar unit distance problem, a famous open question first posed by Paul Erdős in 1946. For nearly 80 years, mathematicians believed the best possible solutions looked roughly like square grids. An OpenAI model has now disproved that belief, discovering an entirely new family of constructions that performs better. This marks the first time AI has autonomously solved a prominent open problem central to a field of mathematics.
