Riccardo Mattivi

731 posts

@rmattivi

Director AI/ML Engineer at Mastercard. Interested in art, music, crypto and literature.

Dublin City, Ireland · Joined October 2010
2.5K Following · 273 Followers
Pinned Tweet
Riccardo Mattivi
Riccardo Mattivi@rmattivi·
Data Science meets classic poetry: a reinterpretation of Dante's Divine Comedy from the point of view of the ML lifecycle! medium.com/@rmattivi/if-dante-were-a-data-scientist-inferno-data-part-i-4d5ae073ff32 #chatgpt used to augment my skills and adapt the most famous parts:
• Inferno -> Data
• Purgatorio -> Modelling
• Paradiso -> Prod
0
1
7
1.8K
Riccardo Mattivi retweeted
Leandro von Werra
Leandro von Werra@lvwerra·
Excited to release the ML intern! (slightly ahead of OpenAI's timeline) It's the result of months of careful design and tuning for a compute- and Hub-centric agent harness:
> give the model access to all the right docs and papers with minimal friction
> let it run experiments on fast CPU and GPU instances and easily investigate logs
> push and pull datasets and models from and to the Hub
While general coding agents can do all this as well, making execution as seamless as possible gives the agent a significant advantage.
Aksel@akseljoonas

Introducing ml-intern, the agent that just automated the post-training team @huggingface. It's an open-source implementation of the real research loop that our ML researchers run every day. You give it a prompt; it researches papers, goes through citations, implements ideas in GPU sandboxes, iterates, and builds deeply research-backed models for any use case. All built on the Hugging Face ecosystem.

It can pull off crazy things. We made it train the best model for scientific reasoning: it went through citations from the official benchmark paper, found OpenScience and NemoTron-CrossThink, added 7 difficulty-filtered dataset variants from ARC/SciQ/MMLU, and ran 12 SFT runs on Qwen3-1.7B. This pushed the score from 10% to 32% on GPQA in under 10h. Claude Code's best: 22.99%.

In healthcare settings it inspected the available datasets, concluded they were too low quality, and wrote a script to generate 1,100 synthetic data points from scratch for emergencies, hedging, multilingual use, etc., then upsampled 50x for training. It beat Codex on HealthBench by 60%.

For competitive mathematics, it wrote a full GRPO script, launched training with A100 GPUs on hf.co/spaces, watched rewards climb and then collapse, and ran ablations until it succeeded. All fully backed by papers, autonomously.

How does it work? ml-intern makes full use of the HF ecosystem:
- finds papers on arxiv and hf.co/papers, reads them fully, walks citation graphs, and pulls datasets referenced in methodology sections and on hf.co/datasets
- browses the Hub, reads recent docs, inspects datasets, and reformats them before training so it doesn't waste GPU hours on bad data
- launches training jobs on HF Jobs if no local GPUs are available, monitors runs, reads its own eval outputs, diagnoses failures, and retrains

ml-intern deeply embodies how researchers work and think. It knows what data should look like and what good models feel like. Releasing it today as a CLI and a web app you can use from your phone or desktop.
CLI: github.com/huggingface/ml… Web + mobile: huggingface.co/spaces/smolage… And the best part? We also provisioned $1k in GPU resources and Anthropic credits for the quickest among you to use.
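The loop described above (find papers, walk their citations for datasets, filter out bad data before burning GPU hours, train, and keep the best run) can be sketched in a few lines. This is a toy illustration, not the ml-intern API: every function name and value here is a hypothetical stand-in for the real paper search, dataset pulls, and HF Jobs launches.

```python
def find_papers(topic):
    # Stand-in for searching arxiv / hf.co/papers for relevant work.
    return [{"title": f"A survey of {topic}", "datasets_cited": ["ds-a", "ds-b"]}]

def pull_datasets(paper):
    # Stand-in for pulling datasets referenced in the methodology section.
    qualities = {"ds-a": 0.4, "ds-b": 0.7}  # made-up quality scores
    return [{"name": n, "quality": qualities[n]} for n in paper["datasets_cited"]]

def run_training(dataset):
    # Stand-in for launching an SFT job (e.g. on HF Jobs) and reading evals.
    return {"dataset": dataset["name"], "score": dataset["quality"]}

def research_loop(topic, min_quality=0.5):
    """Find papers, walk citations for datasets, skip low-quality data,
    train, and keep the best-scoring run."""
    best = None
    for paper in find_papers(topic):
        for ds in pull_datasets(paper):
            if ds["quality"] < min_quality:
                continue  # don't waste GPU hours on bad data
            result = run_training(ds)
            if best is None or result["score"] > best["score"]:
                best = result
    return best

print(research_loop("scientific reasoning"))
# → {'dataset': 'ds-b', 'score': 0.7}
```

The real system replaces each stub with an LLM-driven step (reading papers, reformatting data, diagnosing failed runs), but the select-filter-train-keep-best skeleton is the same.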

3
4
63
7.8K
Riccardo Mattivi retweeted
François Chollet
François Chollet@fchollet·
Good design is the art of packing 1,000 "hows" into a single "what". Good design is compression: making the numerator trend towards infinity while the denominator stays at 1.
30
19
333
30.9K
DaVinci
DaVinci@BiancoDavinci·
This is a healing grid by Japanese artist Ryota Kanai. If you stare at the center, the irregularities start to heal themselves because your brain strongly prefers to see regular patterns.
338
3.4K
26.3K
4.1M
Riccardo Mattivi retweeted
Raja Patnaik
Raja Patnaik@RajaPatnaik·
Very cool stuff out of @NousResearch. They open-sourced a system that lets Hermes agents evolve themselves — no GPU training required. It uses GEPA to automatically improve skills, prompts, and tool descriptions. Here's how it works:
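The "evolve agents without GPU training" idea boils down to an evaluate-mutate-select loop over prompts and tool descriptions. Below is a toy sketch, not Nous's actual GEPA implementation: real GEPA uses an LLM's reflective feedback on execution traces to propose rewrites, while the evaluator and mutations here are hypothetical stand-ins.

```python
import random

def evaluate(prompt):
    # Stand-in evaluator: GEPA scores task success plus reflective feedback;
    # here we simply reward longer, more specific prompts.
    return len(prompt.split())

def mutate(prompt, rng):
    # Stand-in mutation: GEPA has an LLM rewrite the prompt using feedback;
    # here we append a random instruction.
    additions = ["Be concise.", "Cite sources.", "Think step by step."]
    return prompt + " " + rng.choice(additions)

def evolve(prompt, generations=5, seed=0):
    """Keep mutating the current best prompt, accepting only improvements."""
    rng = random.Random(seed)
    best, best_score = prompt, evaluate(prompt)
    for _ in range(generations):
        candidate = mutate(best, rng)
        score = evaluate(candidate)
        if score > best_score:  # hill-climb: keep only improvements
            best, best_score = candidate, score
    return best

print(evolve("Summarize the paper."))
```

Because no weights change, the whole loop runs on CPU; all the "learning" lives in the evolving text artifacts.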
7
45
469
40.7K
Riccardo Mattivi retweeted
François Chollet
François Chollet@fchollet·
The new model from Meta is already looking like a disappointment: overoptimized for public benchmark numbers to the detriment of everything else. Knowing how to evaluate models in a way that correlates with actual usefulness is a core competency for AI labs, and any new lab is unlikely to be successful without first figuring that out.
96
104
2.2K
323.3K
Riccardo Mattivi retweeted
Boris Cherny
Boris Cherny@bcherny·
Mythos is very powerful, and should feel terrifying. I am proud of our approach to responsibly preview it with cyber defenders, rather than generally releasing it into the wild. Model card here: www-cdn.anthropic.com/53566bf5440a10…
Anthropic@AnthropicAI

Introducing Project Glasswing: an urgent initiative to help secure the world’s most critical software. It’s powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans. anthropic.com/glasswing

581
613
9.9K
1.3M
Riccardo Mattivi retweeted
Javi Lopez ⛩️
Javi Lopez ⛩️@javilopen·
NASA has launched a website where you can follow the Artemis II mission to the Moon in real time 👩‍🚀 Absolutely amazing. Link 🔗👇
124
2.1K
15.5K
1.2M
Riccardo Mattivi retweeted
Anthropic
Anthropic@AnthropicAI·
New Anthropic research: Emotion concepts and their function in a large language model. All LLMs sometimes act like they have emotions. But why? We found internal representations of emotion concepts that can drive Claude’s behavior, sometimes in surprising ways.
1K
2.7K
17.7K
3.8M
Lydia Hallie ✨
Lydia Hallie ✨@lydiahallie·
Update: still working on this. It's the top priority for the team, I know this is blocking a lot of you. More as soon as we have it.
315
39
1.8K
255.1K
Lydia Hallie ✨
Lydia Hallie ✨@lydiahallie·
We're aware people are hitting usage limits in Claude Code way faster than expected. Actively investigating, will share more when we have an update!
1.6K
747
13.6K
4.2M
Pokee AI
Pokee AI@Pokee_AI·
OpenClaw doesn't belong in production. We built PokeeClaw — enterprise-secure AI agents, zero setup, 1,000+ app integrations. Try now: pokee.ai First 500 to follow @Pokee_AI, comment “PokeeClaw”, like & repost get 1 month free.
600
465
1.2K
1.1M
Riccardo Mattivi retweeted
Boris Cherny
Boris Cherny@bcherny·
I wanted to share a bunch of my favorite hidden and under-utilized features in Claude Code. I'll focus on the ones I use the most. Here goes.
554
2.5K
23.2K
3.9M
Nav Toor
Nav Toor@heynavtoor·
🚨 Hedge fund managers are going to hate this. Someone just open sourced a system that does their entire job. 30.5% annualized returns. $0 in fees. It's called TradingAgents. Not one AI agent. An entire simulated trading firm. Analysts, researchers, traders, and risk managers. All AI. All arguing with each other before making a single trade. No Bloomberg Terminal. No $50K data feeds. No MBA required.
Here's what's inside this thing:
→ 4 AI analysts scanning financials, news, social sentiment, and technicals
→ A Bull and Bear researcher that literally debate each other
→ A trader that synthesizes every argument into a final call
→ A risk management team that can veto any trade
→ A fund manager that approves or rejects execution
Here's the wildest part: it beat every traditional trading strategy they benchmarked. Cumulative returns. Sharpe ratio. Max drawdown. All of them. Hedge funds charge 2% management + 20% performance fees for this exact workflow. This is free. 100% open source.
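The pipeline in the tweet (analysts → bull/bear debate → trader → risk veto → manager approval) is easy to sketch as a chain of stages. This is a toy illustration only: the real TradingAgents project puts an LLM agent at each stage, while every number and rule below is a made-up stand-in.

```python
def analysts(ticker):
    # Stand-ins for the fundamentals / news / sentiment / technical analysts,
    # each emitting a bullishness signal in [0, 1] (values are invented).
    return {"fundamentals": 0.6, "news": 0.2, "sentiment": 0.4, "technicals": 0.5}

def debate(signals):
    # Bull argues from the strongest signal, bear from the weakest;
    # the "debate" here is just summing the two extremes.
    return max(signals.values()) + min(signals.values())

def trader(conviction):
    # Synthesizes the debate into a final call.
    return "BUY" if conviction > 0.5 else "HOLD"

def risk_team(decision, signals):
    # Veto any trade when the analysts disagree too strongly.
    spread = max(signals.values()) - min(signals.values())
    return decision if spread < 0.5 else "VETOED"

def fund_manager(decision):
    # Final approval or rejection of execution.
    return decision if decision != "VETOED" else "REJECTED"

signals = analysts("AAPL")
decision = fund_manager(risk_team(trader(debate(signals)), signals))
print(decision)
# → BUY
```

The design point is that each stage can only narrow or block the previous one's output, so a single over-optimistic agent can't trade on its own.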
126
322
2.3K
308K
Riccardo Mattivi retweeted
Thariq
Thariq@trq212·
I put a lot of heart into my technical writing, I hope it's useful to you all. 📌 Here's a pinned thread of everything I've written. (much of this will be posted on the Claude blog soon as well)
242
781
7.5K
1.2M
Riccardo Mattivi retweeted
Aakash Gupta
Aakash Gupta@aakashgupta·
Cursor is raising at a $50 billion valuation on the claim that its "in-house models generate more code than almost any other LLMs in the world." Less than 24 hours after launching Composer 2, a developer found the model ID in the API response: kimi-k2p5-rl-0317-s515-fast. That's Moonshot AI's Kimi K2.5 with reinforcement learning appended.

A developer named Fynn was testing Cursor's OpenAI-compatible base URL when the identifier leaked through the response headers. Moonshot's head of pretraining, Yulun Du, confirmed on X that the tokenizer is identical to Kimi's and questioned Cursor's license compliance. Two other Moonshot employees posted confirmations. All three posts have since been deleted.

This is the second time. When Cursor launched Composer 1 in October 2025, users across multiple countries reported the model spontaneously switching its inner monologue to Chinese mid-session. Kenneth Auchenberg, a partner at Alley Corp, posted a screenshot calling it a smoking gun. KR-Asia and 36Kr confirmed both Cursor and Windsurf were running fine-tuned Chinese open-weight models underneath. Cursor never disclosed what Composer 1 was built on. They shipped Composer 1.5 in February and moved on.

The pattern: take a Chinese open-weight model, run RL on coding tasks, ship it as a proprietary breakthrough, publish a cost-performance chart comparing yourself against Opus 4.6 and GPT-5.4 without disclosing that your base model was free, then raise another round.

That chart from the Composer 2 announcement deserves its own paragraph. Cursor plotted Composer 2 against frontier models on a price-vs-quality axis to argue they'd hit a superior tradeoff. What the chart doesn't show is that Anthropic and OpenAI trained their models from scratch. Cursor took an open-weight model that Moonshot spent hundreds of millions developing, ran RL on top, and presented the output as evidence of in-house research. That's margin arbitrage on someone else's R&D dressed up as a benchmark slide.

The license makes this more than an attribution oversight. Kimi K2.5 ships under a Modified MIT License with one clause designed for exactly this scenario: if your product exceeds $20 million in monthly revenue, you must prominently display "Kimi K2.5" on the user interface. Cursor's ARR crossed $2 billion in February. That's roughly $167 million per month, 8x the threshold. The clause covers derivative works explicitly.

Cursor is valued at $29.3 billion and raising at $50 billion. Moonshot's last reported valuation was $4.3 billion. The company worth 12x more took the smaller company's model and shipped it as proprietary technology to justify a valuation built on the frontier-lab narrative.

Three Composer releases in five months. Composer 1 caught speaking Chinese. Composer 2 caught with a Kimi model ID in the API. A P0 incident this year. And a benchmark chart that compares an RL fine-tune against models requiring billions in training compute without disclosing the base was free.

The question for investors in the $50 billion round: what exactly are you buying? A VS Code fork with strong distribution, or a frontier research lab? The model ID in the API answers that.

If Moonshot doesn't enforce this license against a company generating $2 billion annually from a derivative of their model, the attribution clause becomes decoration for every future open-weight release. Every AI lab watching this is running the same math: why open-source your model if companies with better distribution can strip attribution, call it proprietary, and raise at 12x your valuation?

kimi-k2p5-rl-0317-s515-fast is the most expensive model ID leak in the history of AI licensing.
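The thread's revenue-threshold arithmetic checks out, and the calculation is short enough to verify directly. The dollar figures below are the ones the tweet itself cites.

```python
# Verify the tweet's claim: $2B ARR vs. the license's $20M/month
# attribution threshold it describes.
arr = 2_000_000_000          # annual recurring revenue cited in the tweet ($)
threshold = 20_000_000       # monthly revenue threshold in the clause ($)

monthly = arr / 12           # monthly revenue
multiple = monthly / threshold

print(f"${monthly / 1e6:.1f}M/month, {multiple:.1f}x the threshold")
# → $166.7M/month, 8.3x the threshold
```

So "roughly $167 million per month, 8x the threshold" is consistent with the cited figures.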
Harveen Singh Chadha@HarveenChadha

things are about to get interesting from here on

249
550
4.4K
1.4M