Riccardo Mattivi (@rmattivi) - Twitter Profile

Pinned Tweet

Data Science meets classic poetry: a reinterpretation of Dante's Divine Comedy from the point of view of the ML lifecycle! medium.com/@rmattivi/if-d… #chatgpt used to augment my skills and adapt the most famous parts of
• Inferno -> Data
• Purgatorio -> Modelling
• Paradiso -> Prod


0

1

7

1.8K

Riccardo Mattivi@rmattivi·4d

autoresearch, by agents

Leandro von Werra@lvwerra

Excited to release the ML intern! (slightly ahead of OpenAI's timeline) It's the result of months of careful design and tuning for a compute- and hub-centric agent harness:
> give the model access to all the right docs and papers with minimal friction
> let it run experiments on fast CPU and GPU instances and easily investigate logs
> push and pull datasets and models from and to the hub
While general coding agents can do all this as well, making execution as seamless as possible gives the agent a significant advantage.


0

1

23

Riccardo Mattivi retweeted

Leandro von Werra@lvwerra·4d

Excited to release the ML intern! (slightly ahead of OpenAI's timeline) It's the result of months of careful design and tuning for a compute- and hub-centric agent harness:
> give the model access to all the right docs and papers with minimal friction
> let it run experiments on fast CPU and GPU instances and easily investigate logs
> push and pull datasets and models from and to the hub
While general coding agents can do all this as well, making execution as seamless as possible gives the agent a significant advantage.

Aksel@akseljoonas

Introducing ml-intern, the agent that just automated the post-training team @huggingface.
It's an open-source implementation of the real research loop that our ML researchers do every day. You give it a prompt, it researches papers, goes through citations, implements ideas in GPU sandboxes, iterates and builds deeply research-backed models for any use case. All built on the Hugging Face ecosystem.
It can pull off crazy things:
We made it train the best model for scientific reasoning. It went through citations from the official benchmark paper. Found OpenScience and NemoTron-CrossThink, added 7 difficulty-filtered dataset variants from ARC/SciQ/MMLU, and ran 12 SFT runs on Qwen3-1.7B. This pushed the score 10% → 32% on GPQA in under 10h. Claude Code's best: 22.99%.
In healthcare settings it inspected available datasets, concluded they were too low quality, and wrote a script to generate 1100 synthetic data points from scratch for emergencies, hedging, multilingual etc. Then upsampled 50x for training. Beat Codex on HealthBench by 60%.
For competitive mathematics, it wrote a full GRPO script, launched training with A100 GPUs on hf.co/spaces, watched rewards climb and then collapse, and ran ablations until it succeeded.
All fully backed by papers, autonomously.
How does it work? ml-intern makes full use of the HF ecosystem:
- finds papers on arxiv and hf.co/papers, reads them fully, walks citation graphs, pulls datasets referenced in methodology sections and on hf.co/datasets
- browses the Hub, reads recent docs, inspects datasets and reformats them before training so it doesn't waste GPU hours on bad data
- launches training jobs on HF Jobs if no local GPUs are available, monitors runs, reads its own eval outputs, diagnoses failures, retrains
ml-intern deeply embodies how researchers work and think. It knows what data should look like and what good models feel like.
Releasing it today as a CLI and a web app you can use from your phone/desktop.
CLI: github.com/huggingface/ml…
Web + mobile: huggingface.co/spaces/smolage…
And the best part? We also provisioned $1k in GPU resources and Anthropic credits for the quickest among you to use.


3

4

63

7.8K

Riccardo Mattivi retweeted

Aksel@akseljoonas·4d

Introducing ml-intern, the agent that just automated the post-training team @huggingface.
It's an open-source implementation of the real research loop that our ML researchers do every day. You give it a prompt, it researches papers, goes through citations, implements ideas in GPU sandboxes, iterates and builds deeply research-backed models for any use case. All built on the Hugging Face ecosystem.
It can pull off crazy things:
We made it train the best model for scientific reasoning. It went through citations from the official benchmark paper. Found OpenScience and NemoTron-CrossThink, added 7 difficulty-filtered dataset variants from ARC/SciQ/MMLU, and ran 12 SFT runs on Qwen3-1.7B. This pushed the score 10% → 32% on GPQA in under 10h. Claude Code's best: 22.99%.
In healthcare settings it inspected available datasets, concluded they were too low quality, and wrote a script to generate 1100 synthetic data points from scratch for emergencies, hedging, multilingual etc. Then upsampled 50x for training. Beat Codex on HealthBench by 60%.
For competitive mathematics, it wrote a full GRPO script, launched training with A100 GPUs on hf.co/spaces, watched rewards climb and then collapse, and ran ablations until it succeeded.
All fully backed by papers, autonomously.
How does it work? ml-intern makes full use of the HF ecosystem:
- finds papers on arxiv and hf.co/papers, reads them fully, walks citation graphs, pulls datasets referenced in methodology sections and on hf.co/datasets
- browses the Hub, reads recent docs, inspects datasets and reformats them before training so it doesn't waste GPU hours on bad data
- launches training jobs on HF Jobs if no local GPUs are available, monitors runs, reads its own eval outputs, diagnoses failures, retrains
ml-intern deeply embodies how researchers work and think. It knows what data should look like and what good models feel like.
Releasing it today as a CLI and a web app you can use from your phone/desktop.
CLI: github.com/huggingface/ml…
Web + mobile: huggingface.co/spaces/smolage…
And the best part? We also provisioned $1k in GPU resources and Anthropic credits for the quickest among you to use.


122

593

4.4K

1M

Riccardo Mattivi retweeted

François Chollet@fchollet·11 Apr

Good design is the art of packing 1,000 "hows" into a single "what". Good design is compression: making the numerator trend towards infinity while the denominator stays at 1.


30

19

333

30.9K

Riccardo Mattivi@rmattivi·12 Apr

Excellent visualization, and based on daily experiences. Curious to know more about the aggregates.

Sinéad O’Sullivan@SineadOS1

The protests in Ireland are not about just fuel! They are about the distance between Ireland on this graph and every other modern and developed economy. Ireland is second wealthiest but gets waaaaay less than any other country for that wealth. By a golden mile. That visual gap in this graph? That’s what people are protesting. It’s a lack of infrastructure and the everyday enshittification of services, the economy, and the additional difficulty of trying to live, relative to peers in any other country. It also highlights why people don’t get uniformly listened to! - because there is no government architecture to engage meaningfully across this huge gap. That gap is a three hour drive to work in traffic, a 14 month wait for an MRI, buses that don’t arrive, trains that don’t exist, schools that have no places for your kids, houses that are unaffordable, pubs that close before midnight, €12 sandwiches, expensive fuel. People feel this gap, even if they can’t explain it precisely. And that builds into resentment, and ultimately protest. Fuel just happened to be the next thing that could be pointed to, today.


0

37

Riccardo Mattivi@rmattivi·10 Apr

@BiancoDavinci Nothing happens


0

34

DaVinci@BiancoDavinci·1 Apr

This is a healing grid by Japanese artist Ryota Kanai. If you stare at the center, the irregularities start to heal themselves because your brain strongly prefers to see regular patterns.


338

3.4K

26.3K

4.1M

Riccardo Mattivi retweeted

Raja Patnaik@RajaPatnaik·7 Apr

Very cool stuff out of @NousResearch. They open-sourced a system that lets Hermes agents evolve themselves — no GPU training required. It uses GEPA to automatically improve skills, prompts, and tool descriptions. Here's how it works:


7

45

469

40.7K

Riccardo Mattivi@rmattivi·9 Apr

Well said

François Chollet@fchollet

The new model from Meta is already looking like a disappointment: overoptimized for public benchmark numbers to the detriment of everything else. Knowing how to evaluate models in a way that correlates with actual usefulness is a core competency for AI labs, and any new lab is unlikely to be successful without first figuring that out.


0

91

Riccardo Mattivi retweeted

François Chollet@fchollet·9 Apr

The new model from Meta is already looking like a disappointment: overoptimized for public benchmark numbers to the detriment of everything else. Knowing how to evaluate models in a way that correlates with actual usefulness is a core competency for AI labs, and any new lab is unlikely to be successful without first figuring that out.


96

104

2.2K

323.3K

Riccardo Mattivi retweeted

Boris Cherny@bcherny·7 Apr

Mythos is very powerful, and should feel terrifying. I am proud of our approach to responsibly preview it with cyber defenders, rather than generally releasing it into the wild. Model card here: www-cdn.anthropic.com/53566bf5440a10…

Anthropic@AnthropicAI

Introducing Project Glasswing: an urgent initiative to help secure the world’s most critical software. It’s powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans. anthropic.com/glasswing


581

613

9.9K

1.3M

Riccardo Mattivi retweeted

Javi Lopez ⛩️@javilopen·3 Apr

NASA has launched a website where you can follow the Artemis II mission to the Moon in real time 👩‍🚀 Absolutely amazing. Link 🔗👇


124

2.1K

15.5K

1.2M

Riccardo Mattivi@rmattivi·3 Apr

Amazing work from @hla_michael. Looking forward to seeing this approach developed further.

Michael Hla@hla_michael

I trained an LLM from scratch on pre-1900 text to see if it could come up with quantum mechanics and relativity. While the model is too small to do meaningful reasoning, it has glimpses of intuition. When given observations from past landmark experiments, the model can declare that “light is made up of definite quantities of energy” and even suggest that gravity and acceleration are locally equivalent. I’m releasing the dataset + models and leave this as an open problem to the research community. I also include what this project has taught me about intelligence in a mini essay linked below. 🧵(1/n)


0

12

Riccardo Mattivi@rmattivi·3 Apr

@lydiahallie @bcherny when will the real root cause be fixed? These guidelines are pretty general and not very informative.


0

17

Lydia Hallie ✨@lydiahallie·2 Apr

Thank you to everyone who spent time sending us feedback and reports. We've investigated and we're sorry this has been a bad experience. Here's what we found:

Lydia Hallie ✨@lydiahallie

We're aware people are hitting usage limits in Claude Code way faster than expected. Actively investigating, will share more when we have an update!


1.1K

216

3.5K

3M

Riccardo Mattivi retweeted

Anthropic@AnthropicAI·2 Apr

New Anthropic research: Emotion concepts and their function in a large language model. All LLMs sometimes act like they have emotions. But why? We found internal representations of emotion concepts that can drive Claude’s behavior, sometimes in surprising ways.


1K

2.7K

17.7K

3.8M

Riccardo Mattivi@rmattivi·31 Mar

@lydiahallie any updates on the 10x token consumption?


0

2

162

Lydia Hallie ✨@lydiahallie·30 Mar

Update: still working on this. It's the top priority for the team, I know this is blocking a lot of you. More as soon as we have it.


315

39

1.8K

255.1K

Lydia Hallie ✨@lydiahallie·30 Mar

We're aware people are hitting usage limits in Claude Code way faster than expected. Actively investigating, will share more when we have an update!


1.6K

747

13.6K

4.2M

Riccardo Mattivi@rmattivi·30 Mar

@Pokee_AI PokeeClaw


1

0

1

57

Pokee AI@Pokee_AI·30 Mar

OpenClaw doesn't belong in production. We built PokeeClaw — enterprise-secure AI agents, zero setup, 1,000+ app integrations. Try now: pokee.ai First 500 to follow @Pokee_AI, comment “PokeeClaw”, like & repost get 1 month free.


600

465

1.2K

1.1M

Riccardo Mattivi retweeted

Boris Cherny@bcherny·30 Mar

I wanted to share a bunch of my favorite hidden and under-utilized features in Claude Code. I'll focus on the ones I use the most. Here goes.


554

2.5K

23.2K

3.9M

Riccardo Mattivi@rmattivi·21 Mar

@heynavtoor Is any other evaluation available besides the one in the paper?


0

65

Nav Toor@heynavtoor·21 Mar

🚨 Hedge fund managers are going to hate this. Someone just open sourced a system that does their entire job. 30.5% annualized returns. $0 in fees.
It's called TradingAgents. Not one AI agent. An entire simulated trading firm. Analysts, researchers, traders, and risk managers. All AI. All arguing with each other before making a single trade. No Bloomberg Terminal. No $50K data feeds. No MBA required.
Here's what's inside this thing:
→ 4 AI analysts scanning financials, news, social sentiment, and technicals
→ A Bull and Bear researcher that literally debate each other
→ A trader that synthesizes every argument into a final call
→ A risk management team that can veto any trade
→ A fund manager that approves or rejects execution
Here's the wildest part: It beat every traditional trading strategy they benchmarked. Cumulative returns. Sharpe ratio. Max drawdown. All of them.
Hedge funds charge 2% management + 20% performance fees for this exact workflow. This is free. 100% Open Source.
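The three metrics named in the post above (cumulative return, Sharpe ratio, max drawdown) can be sketched as follows. This is only an illustration on made-up returns: it is not the TradingAgents code or its benchmark data, the function names are hypothetical, and the 252-period annualization and zero risk-free rate are assumptions.

```python
import math


def cumulative_return(returns):
    """Compound a sequence of per-period returns into one total return."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio using the sample standard deviation.

    Assumes at least two returns that are not all identical.
    """
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / len(excess)
    variance = sum((x - mean) ** 2 for x in excess) / (len(excess) - 1)
    return mean / math.sqrt(variance) * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough loss of the equity curve, as a fraction."""
    equity = 1.0
    peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst


# Hypothetical per-period returns, for illustration only.
sample_returns = [0.01, -0.02, 0.015, 0.003, -0.007]
print("cumulative return:", cumulative_return(sample_returns))
print("sharpe ratio:", sharpe_ratio(sample_returns))
print("max drawdown:", max_drawdown(sample_returns))
```

A strategy can lead on one of these metrics and lag on another, which is why benchmarks usually report all three together.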


126

322

2.3K

308K

Riccardo Mattivi retweeted

Thariq@trq212·21 Mar

I put a lot of heart into my technical writing, I hope it's useful to you all. 📌 Here's a pinned thread of everything I've written. (much of this will be posted on the Claude blog soon as well)


242

781

7.5K

1.2M

Riccardo Mattivi retweeted

Aakash Gupta@aakashgupta·20 Mar

Cursor is raising at a $50 billion valuation on the claim that its “in-house models generate more code than almost any other LLMs in the world.” Less than 24 hours after launching Composer 2, a developer found the model ID in the API response: kimi-k2p5-rl-0317-s515-fast. That’s Moonshot AI’s Kimi K2.5 with reinforcement learning appended.
A developer named Fynn was testing Cursor’s OpenAI-compatible base URL when the identifier leaked through the response headers. Moonshot’s head of pretraining, Yulun Du, confirmed on X that the tokenizer is identical to Kimi’s and questioned Cursor’s license compliance. Two other Moonshot employees posted confirmations. All three posts have since been deleted.
This is the second time. When Cursor launched Composer 1 in October 2025, users across multiple countries reported the model spontaneously switching its inner monologue to Chinese mid-session. Kenneth Auchenberg, a partner at Alley Corp, posted a screenshot calling it a smoking gun. KR-Asia and 36Kr confirmed both Cursor and Windsurf were running fine-tuned Chinese open-weight models underneath. Cursor never disclosed what Composer 1 was built on. They shipped Composer 1.5 in February and moved on.
The pattern: take a Chinese open-weight model, run RL on coding tasks, ship it as a proprietary breakthrough, publish a cost-performance chart comparing yourself against Opus 4.6 and GPT-5.4 without disclosing that your base model was free, then raise another round.
That chart from the Composer 2 announcement deserves its own paragraph. Cursor plotted Composer 2 against frontier models on a price-vs-quality axis to argue they’d hit a superior tradeoff. What the chart doesn’t show is that Anthropic and OpenAI trained their models from scratch. Cursor took an open-weight model that Moonshot spent hundreds of millions developing, ran RL on top, and presented the output as evidence of in-house research. That’s margin arbitrage on someone else’s R&D dressed up as a benchmark slide.
The license makes this more than an attribution oversight. Kimi K2.5 ships under a Modified MIT License with one clause designed for exactly this scenario: if your product exceeds $20 million in monthly revenue, you must prominently display “Kimi K2.5” on the user interface. Cursor’s ARR crossed $2 billion in February. That’s roughly $167 million per month, 8x the threshold. The clause covers derivative works explicitly.
Cursor is valued at $29.3 billion and raising at $50 billion. Moonshot’s last reported valuation was $4.3 billion. The company worth 12x more took the smaller company’s model and shipped it as proprietary technology to justify a valuation built on the frontier lab narrative.
Three Composer releases in five months. Composer 1 caught speaking Chinese. Composer 2 caught with a Kimi model ID in the API. A P0 incident this year. And a benchmark chart that compares an RL fine-tune against models requiring billions in training compute without disclosing the base was free.
The question for investors in the $50 billion round: what exactly are you buying? A VS Code fork with strong distribution, or a frontier research lab? The model ID in the API answers that.
If Moonshot doesn’t enforce this license against a company generating $2 billion annually from a derivative of their model, the attribution clause becomes decoration for every future open-weight release. Every AI lab watching this is running the same math: why open-source your model if companies with better distribution can strip attribution, call it proprietary, and raise at 12x your valuation?
kimi-k2p5-rl-0317-s515-fast is the most expensive model ID leak in the history of AI licensing.
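The revenue and valuation multiples quoted in the post above can be checked with a few lines of arithmetic. This sketch takes the post's dollar figures at face value (they are not independently verified here); the variable names are illustrative only.

```python
# Figures as quoted in the post; treated as assumptions, not verified facts.
CURSOR_ARR_USD = 2_000_000_000               # "ARR crossed $2 billion"
LICENSE_THRESHOLD_USD = 20_000_000           # "$20 million in monthly revenue"
CURSOR_CURRENT_VALUATION_USD = 29_300_000_000
CURSOR_RAISE_VALUATION_USD = 50_000_000_000
MOONSHOT_VALUATION_USD = 4_300_000_000

# Annual recurring revenue spread evenly over 12 months.
monthly_revenue = CURSOR_ARR_USD / 12

# How many times the monthly revenue exceeds the license threshold.
threshold_multiple = monthly_revenue / LICENSE_THRESHOLD_USD

# Valuation ratios against Moonshot, at the current and the raise valuation.
current_multiple = CURSOR_CURRENT_VALUATION_USD / MOONSHOT_VALUATION_USD
raise_multiple = CURSOR_RAISE_VALUATION_USD / MOONSHOT_VALUATION_USD

print(f"monthly revenue: ${monthly_revenue / 1e6:.0f}M")
print(f"multiple of license threshold: {threshold_multiple:.1f}x")
print(f"valuation multiple (current): {current_multiple:.1f}x")
print(f"valuation multiple (raise): {raise_multiple:.1f}x")
```

Under these inputs the "12x" figure matches the $50 billion raise valuation rather than the current $29.3 billion one.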

Harveen Singh Chadha@HarveenChadha

things are about to get interesting from here on


249

550

4.4K

1.4M
