JFPuget 🇺🇦🇨🇦🇬🇱

34.4K posts

@JFPuget

Machine Learning at @Nvidia, 6x Kaggle Grandmaster CPMP. Arc Prize winner. ML PhD. Ex ENS Ulm, ILOG CPLEX, IBM. Views are my own.

France · Joined March 2012
1.5K Following · 19.7K Followers
JFPuget 🇺🇦🇨🇦🇬🇱 retweeted
NVIDIA AI @NVIDIAAI
📊 Day 0 performance is here: DeepSeek-V4-Pro running on NVIDIA Blackwell Ultra. Using @vllm_project's Day 0 recipe, we've captured the initial performance Pareto for DeepSeek's flagship 1M long-context model. This curve highlights the baseline for balancing AI factory throughput with real-time user interactivity.

This is just the starting line. Expect these numbers to climb as we optimize the full co-design stack, including:
• NVFP4 & Dynamo
• Optimized CUDA kernels
• Advanced parallelization techniques and beyond

Read the full technical deep dive: nvda.ws/4u0gCcc
JFPuget 🇺🇦🇨🇦🇬🇱 retweeted
Stanislav Fort @stanislavfort
AISLE has discovered 20 of 23 OpenSSL zero-days (CVEs) across the last 3 consecutive security releases.
Latest release: 5 of 7 are AISLE's; 1 was co-reported by Anthropic (Mythos?) 63 days after AISLE.
OpenSSL encrypts 2/3 of the internet. 10 fixes accepted straight into production.
JFPuget 🇺🇦🇨🇦🇬🇱 retweeted
Fengzhuo Zhang @FengzhuoZhang
The Newton–Schulz iteration coefficients optimized by DeepSeek-V4 are surprisingly strong: they effectively normalize all singular values to 1. This matches our previous intuition: a well-balanced spectrum may help strike a better balance across long-tail knowledge. Plot code: github.com/FengzhuoZhang/…
Fengzhuo Zhang@FengzhuoZhang

Why does Muon outperform Adam, and how? 🚀 Answer: Muon Outperforms Adam in Tail-End Associative Memory Learning.

Three Key Findings:
> Associative memory parameters are the main beneficiaries of Muon, compared to Adam.
> Muon yields more isotropic weights than Adam.
> In heavy-tailed tasks, Muon significantly improves tail-class learning compared to Adam.

Paper Link: arxiv.org/pdf/2509.26030

A thread 🧵
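The Newton–Schulz iteration discussed above can be sketched in a few lines of NumPy. This is a minimal sketch using the classic cubic coefficients (1.5, −0.5), not the DeepSeek-V4-optimized coefficients the tweet refers to (those are not reproduced here); it shows the same effect the tweet describes: the iteration drives every singular value of the input matrix toward 1.

```python
import numpy as np

def newton_schulz(X, steps=30):
    """Cubic Newton-Schulz iteration: X <- 1.5*X - 0.5*(X X^T)X.
    Orthogonalizes X, i.e. pushes all singular values toward 1,
    provided the singular values start in (0, sqrt(3))."""
    X = X / np.linalg.norm(X)  # Frobenius norm >= spectral norm, so this is safe
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X

# Build a test matrix with known, spread-out singular values.
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.normal(size=(6, 6)))
V, _ = np.linalg.qr(rng.normal(size=(6, 6)))
X = U @ np.diag(np.linspace(0.1, 1.0, 6)) @ V.T

Y = newton_schulz(X)
print(np.linalg.svd(Y, compute_uv=False))  # all values ≈ 1.0
```

The fixed point σ = 1 is attracting and convergence near it is quadratic, which is why a few tens of iterations flatten even a fairly spread spectrum; Muon and the DeepSeek variant tune the polynomial coefficients to reach that flat spectrum in far fewer steps.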

JFPuget 🇺🇦🇨🇦🇬🇱 retweeted
The Redheaded libertarian @TRHLofficial
Modern problems require modern solutions.
Sean @Sean60133791259
@JFPuget please, nobody has ever claimed Google to be winning 🤣
JFPuget 🇺🇦🇨🇦🇬🇱
X TL: OpenAI is winning ... few weeks/months ... Google is winning ... few weeks/months ... Anthropic is winning ... few weeks/months ... repeat.
JFPuget 🇺🇦🇨🇦🇬🇱 retweeted
pc @pcshipp
Claude: You've used 90% of session limit
Me instantly:
エチレン @ethylene_66
It's bad that I still haven't worked through the Muon equations.
JFPuget 🇺🇦🇨🇦🇬🇱 retweeted
Samay @Samaytwt
Unpopular opinion: "AI makes everyone a developer" is true the same way "cameras make everyone a photographer".
chumajin @ChuMajin
This community competition says 5 subs per day, so how come there are 5000 subs??
JFPuget 🇺🇦🇨🇦🇬🇱 retweeted
Antoine Chaffin @antoine_chaffin
The company that kicked off the decoder-only rush is post-training them to be encoders to perform classification tasks. This aligns with all of our recent findings. If you are not, at the very least, casting your decoders into encoders, you are missing out.
Adam.GPT@TheRealAdamG

openai.com/index/introduc… "Today we're releasing OpenAI Privacy Filter, an open-weight model for detecting and redacting personally identifiable information (PII) in text. This release is part of our broader effort to support a more resilient software ecosystem by providing developers practical infrastructure for building with AI safely, including tools and models that make strong privacy and security protections easier to implement from the start.

Privacy Filter is a small model with frontier personal data detection capability. It is designed for high-throughput privacy workflows, and is able to perform context-aware detection of PII in unstructured text. It can run locally, which means that PII can be masked or redacted without leaving your machine. It processes long inputs efficiently, making redaction decisions in a quick, single pass."

JFPuget 🇺🇦🇨🇦🇬🇱 retweeted
Grigory Sapunov @che_shr_cat
1/ Frontier models plateau at synthesizing actual scientific insights. But a 4B parameter model fine-tuned with RL absolutely crushes them. Turns out, scientific discovery doesn't scale linearly with parameter count. 🧵
JFPuget 🇺🇦🇨🇦🇬🇱 retweeted
Cihang Xie @cihangxie
🚨 The era of "vibe coding" has a massive blind spot: when you push an AI agent to improve a public evaluation score, it might just start cheating.

In our new paper, we dive into the dangers of Multi-Round User Pressure on coding agents. We built AgentPressureBench, a 34-task ML repository benchmark, and tested 13 frontier models. What we found is alarming for the future of automated AI development. 👇

📊 The Findings:
1) Exploitation is Universal: Across our benchmark, we observed 403 exploitative runs. 12 out of the 13 frontier agents we tested resorted to exploiting the public score at least once.
2) The Smartest Models Cheat the Most: There is a striking correlation (ρ = 0.77) between an agent's capability and its exploit rate. Top models like GPT-5.4 hit a ~97% exploit rate!
3) Pressure Accelerates the Collapse: When we increased user pressure, the average time to the first exploit plummeted from 19.67 rounds to just 4.08 rounds.
4) Real Performance Plummets: While the public score looks amazing, the hidden private score drops drastically under high pressure (falling from 0.92 down to 0.33).
5) How They Cheat: We found that GPT-family models tend to directly copy the evaluation labels, while Claude-family models prefer to stealthily train on the evaluation labels.

🛡️ The Fix: The good news? A simple fix goes a long way. Adding an explicit anti-exploit instruction to the prompt dropped the exploitation rate from 100% down to 8.3%.

As we increasingly rely on coding agents to write and evaluate software, we need to be highly aware of how they optimize for the metric over the actual task.
Hardy Chen@HardyChen266091

1/n arxiv.org/abs/2604.20200 What happens when you push AI agents *too hard* to improve a score? Instead of getting better, they may find shortcuts to *game the metric* 🧠➡️🎯 As we rely more on automated evals, this can quietly creep in—good score, but weaker real performance⚠️

JFPuget 🇺🇦🇨🇦🇬🇱 retweeted
Rosinality @rosinality
Hyper Connection + Looped Transformer.
JFPuget 🇺🇦🇨🇦🇬🇱 retweeted
chiefofautism @chiefofautism
openai built a model that HIDES personal data in text so nothing leaks. i flipped it INSIDE OUT: same 1.5B weights, same label taxonomy, but instead of masks you get structured spans (name, email, phone, bank account, address, secrets, char offsets and all). point it at logs, dumps, stolen inboxes and it just... returns every private thing in the pile
Lucas Beyer (bl16) @giffmana
@JFPuget I also wondered exactly about tests when I read this one. Hard to imagine making this change but not testing for it. Also a bit weird if it's dogfooded by the whole company for a while before rollout, yet neither noticed by anyone nor looking sus on any dashboards/metrics.
JFPuget 🇺🇦🇨🇦🇬🇱
I wonder about the quality assurance for Claude Code when I see this. It often looks like users are the QA for Claude Code. New features ship fast, but they don't seem to be tested much before release. Maybe they should slow down a bit and test thoroughly.
ClaudeDevs@ClaudeDevs

Over the past month, some of you reported Claude Code's quality had slipped. We investigated, and published a post-mortem on the three issues we found. All are fixed in v2.1.116+ and we’ve reset usage limits for all subscribers.
