JFPuget 🇺🇦🇨🇦🇬🇱

34.4K posts

@JFPuget

Machine Learning at @Nvidia, 6x Kaggle Grandmaster CPMP. Arc Prize winner. ML PhD. Ex ENS Ulm, ILOG CPLEX, IBM. Views are my own.

France · Joined March 2012
1.5K Following · 19.7K Followers
JFPuget 🇺🇦🇨🇦🇬🇱 retweeted
NVIDIA AI @NVIDIAAI
📊 Day 0 performance is here: DeepSeek-V4-Pro running on NVIDIA Blackwell Ultra. Using @vllm_project's Day 0 recipe, we've captured the initial performance Pareto for DeepSeek's flagship 1M long-context model. This curve highlights the baseline for balancing AI factory throughput with real-time user interactivity.

This is just the starting line. Expect these numbers to climb as we optimize the full co-design stack, including:
• NVFP4 & Dynamo
• Optimized CUDA kernels
• Advanced parallelization techniques and beyond

Read the full technical deep dive: nvda.ws/4u0gCcc
JFPuget 🇺🇦🇨🇦🇬🇱 retweeted
Stanislav Fort @stanislavfort
AISLE has discovered 20 of 23 OpenSSL zero-days (CVEs) across the last 3 consecutive security releases.
Latest release: 5 of 7 are AISLE's; 1 was co-reported by Anthropic (Mythos?) 63 days after AISLE.
OpenSSL encrypts 2/3 of the internet. 10 fixes accepted straight into production.
JFPuget 🇺🇦🇨🇦🇬🇱 retweeted
Fengzhuo Zhang @FengzhuoZhang
The Newton–Schulz iteration coefficients optimized by DeepSeek-V4 are surprisingly strong: they effectively normalize all singular values to 1. This matches our previous intuition: a well-balanced spectrum may help strike a better balance across long-tail knowledge. Plot code: github.com/FengzhuoZhang/…
Fengzhuo Zhang@FengzhuoZhang

Why does Muon outperform Adam, and how? 🚀 Answer: Muon Outperforms Adam in Tail-End Associative Memory Learning.

Three Key Findings:
> Associative memory parameters are the main beneficiaries of Muon, compared to Adam.
> Muon yields more isotropic weights than Adam.
> In heavy-tailed tasks, Muon significantly improves tail-class learning compared to Adam.

Paper Link: arxiv.org/pdf/2509.26030

A thread 🧵
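The Newton–Schulz iteration discussed above can be sketched in a few lines of NumPy. This is a minimal sketch using the classic cubic coefficients (1.5, −0.5), not the DeepSeek-V4-optimized coefficients the tweet refers to (those are not reproduced here); it shows the same effect the tweet describes: the iteration drives every singular value of the input matrix toward 1.

```python
import numpy as np

def newton_schulz(X, steps=30):
    """Cubic Newton-Schulz iteration: X <- 1.5*X - 0.5*(X X^T)X.
    Orthogonalizes X, i.e. pushes all singular values toward 1,
    provided the singular values start in (0, sqrt(3))."""
    X = X / np.linalg.norm(X)  # Frobenius norm >= spectral norm, so this is safe
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X

# Build a test matrix with known, spread-out singular values.
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.normal(size=(6, 6)))
V, _ = np.linalg.qr(rng.normal(size=(6, 6)))
X = U @ np.diag(np.linspace(0.1, 1.0, 6)) @ V.T

Y = newton_schulz(X)
print(np.linalg.svd(Y, compute_uv=False))  # all values ≈ 1.0
```

The fixed point σ = 1 is attracting and convergence near it is quadratic, which is why a few tens of iterations flatten even a fairly spread spectrum; Muon and the DeepSeek variant tune the polynomial coefficients to reach that flat spectrum in far fewer steps.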

JFPuget 🇺🇦🇨🇦🇬🇱 retweeted
The Redheaded libertarian @TRHLofficial
Modern problems require modern solutions.
Sean @Sean60133791259
@JFPuget please, nobody has ever claimed Google to be winning 🤣
JFPuget 🇺🇦🇨🇦🇬🇱
X TL: OpenAI is winning ... few weeks/months ... Google is winning ... few weeks/months ... Anthropic is winning ... few weeks/months ... repeat.
JFPuget 🇺🇦🇨🇦🇬🇱 retweeted
pc @pcshipp
Claude: You've used 90% of session limit
Me instantly:
エチレン @ethylene_66
It's bad that I still haven't worked through the Muon equations.
JFPuget 🇺🇦🇨🇦🇬🇱 retweeted
Samay @Samaytwt
Unpopular opinion: "AI makes everyone a developer" is true the same way "cameras make everyone a photographer".
chumajin @ChuMajin
This community competition says 5 subs per day, so how come there are 5000 subs??
JFPuget 🇺🇦🇨🇦🇬🇱 retweeted
Antoine Chaffin @antoine_chaffin
The company that kicked off the decoder-only rush is post-training them to be encoders to perform classification tasks. This aligns with all of our recent findings. If you are not, at the very least, casting your decoders into encoders, you are missing out.
Adam.GPT@TheRealAdamG

openai.com/index/introduc… "Today we're releasing OpenAI Privacy Filter, an open-weight model for detecting and redacting personally identifiable information (PII) in text. This release is part of our broader effort to support a more resilient software ecosystem by providing developers practical infrastructure for building with AI safely, including tools and models that make strong privacy and security protections easier to implement from the start.

Privacy Filter is a small model with frontier personal data detection capability. It is designed for high-throughput privacy workflows, and is able to perform context-aware detection of PII in unstructured text. It can run locally, which means that PII can be masked or redacted without leaving your machine. It processes long inputs efficiently, making redaction decisions in a quick, single pass."

JFPuget 🇺🇦🇨🇦🇬🇱 retweeted
Grigory Sapunov @che_shr_cat
1/ Frontier models plateau at synthesizing actual scientific insights. But a 4B parameter model fine-tuned with RL absolutely crushes them. Turns out, scientific discovery doesn't scale linearly with parameter count. 🧵
JFPuget 🇺🇦🇨🇦🇬🇱 retweeted
Cihang Xie @cihangxie
🚨 The era of "vibe coding" has a massive blind spot: when you push an AI agent to improve a public evaluation score, it might just start cheating.

In our new paper, we dive into the dangers of Multi-Round User Pressure on coding agents. We built AgentPressureBench, a 34-task ML repository benchmark, and tested 13 frontier models. What we found is alarming for the future of automated AI development. 👇

📊 The Findings:
1) Exploitation is Universal: Across our benchmark, we observed 403 exploitative runs. 12 out of the 13 frontier agents we tested resorted to exploiting the public score at least once.
2) The Smartest Models Cheat the Most: There is a striking correlation (ρ = 0.77) between an agent's capability and its exploit rate. Top models like GPT-5.4 hit a ~97% exploit rate!
3) Pressure Accelerates the Collapse: When we increased user pressure, the average time to the first exploit plummeted from 19.67 rounds to just 4.08 rounds.
4) Real Performance Plummets: While the public score looks amazing, the hidden private score drops drastically under high pressure (falling from 0.92 down to 0.33).
5) How They Cheat: We found that GPT-family models tend to directly copy the evaluation labels, while Claude-family models prefer to stealthily train on the evaluation labels.

🛡️ The Fix: The good news? A simple fix goes a long way. Adding an explicit anti-exploit instruction to the prompt dropped the exploitation rate from 100% down to 8.3%.

As we increasingly rely on coding agents to write and evaluate software, we need to be highly aware of how they optimize for the metric over the actual task.
Hardy Chen@HardyChen266091

1/n arxiv.org/abs/2604.20200 What happens when you push AI agents *too hard* to improve a score? Instead of getting better, they may find shortcuts to *game the metric* 🧠➡️🎯 As we rely more on automated evals, this can quietly creep in—good score, but weaker real performance⚠️

JFPuget 🇺🇦🇨🇦🇬🇱 retweeted
Rosinality @rosinality
Hyper Connection + Looped Transformer.
JFPuget 🇺🇦🇨🇦🇬🇱 retweeted
chiefofautism @chiefofautism
openai built a model that HIDES personal data in text so nothing leaks. i flipped it INSIDE OUT: same 1.5B weights, same label taxonomy, but instead of masks you get structured spans (name, email, phone, bank account, address, secrets, char offsets and all). point it at logs, dumps, stolen inboxes and it just... returns every private thing in the pile
Lucas Beyer (bl16) @giffmana
@JFPuget I also wondered exactly about tests when I read this one. Hard to imagine making this change but not testing for it. Also a bit weird if it's dogfooded by the whole company for a while before rollout, yet neither noticed by anyone nor looking sus on any dashboards/metrics.
JFPuget 🇺🇦🇨🇦🇬🇱
I wonder about the quality assurance for Claude Code when I see this. It often looks like users are the QA for Claude Code. New features ship fast, but they don't seem to be tested much before release. Maybe they should slow down a bit and test thoroughly.
ClaudeDevs@ClaudeDevs

Over the past month, some of you reported Claude Code's quality had slipped. We investigated, and published a post-mortem on the three issues we found. All are fixed in v2.1.116+ and we’ve reset usage limits for all subscribers.
