Simone Romano ⚗️👁️💭🏕️
5.8K posts

Simone Romano ⚗️👁️💭🏕️
@ialuronico
🖤 machine learning products. X-(Bigtech+, researcher)
Helsinki, Finland. Joined January 2008
4.9K Following, 1.1K Followers
Simone Romano ⚗️👁️💭🏕️ reposted

In a world where everyone can build websites, apps and features easily (thank you Cursor, Lovable, Claude and the like), it will take more for you and your company to differentiate yourselves (which is in my opinion the basis for success).
That's why we're seeing more and more people and companies starting to train, optimize and run their own models (rather than outsource this to third parties).
This is the future we want to enable with Hugging Face: empower millions of people to build AI themselves, not just be API users.
Cool new project in this vein from @mishig25: auto-research built on top of @huggingface so that your agents find and push their intermediary checkpoints, datasets, learn from papers and collaborate on the hub: github.com/mishig25/hf-au…
Let's make all AI builders rather than AI users!

Introducing multi-model intelligence in Researcher | Microsoft Community Hub techcommunity.microsoft.com/blog/microsoft…
Simone Romano ⚗️👁️💭🏕️ reposted

After @Pinterest @Airbnb @NotionHQ @cursor_ai, today it’s @eoghan @intercom publicly sharing that they’re finding it better, cheaper, faster to use and train open models themselves rather than use APIs for many tasks.
And hundreds of other companies are doing the same without sharing.
Ultimately, I believe the majority of AI workflows will be in-house based on open-source (vs API). It took much more time than we anticipated but it’s happening now!

Simone Romano ⚗️👁️💭🏕️ reposted

The Random Forest of the 2030s? open.substack.com/pub/mindfulmod…

I made an MCP and a bot for Polymarket.
Here is the source code: github.com/simoroma/polym…
Feel free to use it.
Simone Romano ⚗️👁️💭🏕️ reposted

Introducing DroPE: Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings
pub.sakana.ai/DroPE/
We are releasing a new method called DroPE to extend the context length of pretrained LLMs without the massive compute costs usually associated with long-context fine-tuning.
The core insight of this work challenges a fundamental assumption in Transformer architecture. We discovered that explicit positional embeddings like RoPE are critical for training convergence but eventually become the primary bottleneck preventing models from generalizing to longer sequences.
Our solution is radically simple: We treat positional embeddings as a temporary training scaffold rather than a permanent architectural necessity.
Real-world workflows like reviewing massive code diffs or analyzing legal contracts require context windows that break standard pretrained models. While models without positional embeddings (NoPE) generalize better to these unseen lengths, they are notoriously unstable to train from scratch.
Here, we achieve the best of both worlds by using embeddings to ensure stability during pretraining and then dropping them to unlock length extrapolation during inference. Our approach unlocks seamless zero-shot context extension without any expensive long-context training.
We demonstrated this on a range of off-the-shelf open-source LLMs. In our tests, recalibrating any model with DroPE requires less than 1% of the original pretraining budget, yet it significantly outperforms established methods on challenging benchmarks like LongBench and RULER.
We have released the code and the full paper to encourage the community to rethink the role of positional encodings in modern LLMs.
Paper: arxiv.org/abs/2512.12167
Code: github.com/SakanaAI/DroPE
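
As a rough illustration of the idea described in the post above (this is not Sakana AI's code, and the recalibration step the post mentions is not modeled), the sketch below computes one attention score with and without rotary positional embeddings. The helper names `rope` and `attention_score` and the `use_rope` switch are hypothetical, chosen only for this example. With RoPE the score depends on the distance between query and key; without it the score depends on content alone.

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate each consecutive pair of `vec` by an angle that grows with `pos`."""
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        theta = pos * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def attention_score(q, k, q_pos, k_pos, use_rope):
    """Unnormalized dot-product score between one query and one key."""
    if use_rope:  # hypothetical switch, for illustration only
        q, k = rope(q, q_pos), rope(k, k_pos)
    return sum(a * b for a, b in zip(q, k))

# Toy query and key vectors (arbitrary values, not taken from any model).
q = [1.0, 0.0, 0.5, 0.2]
k = [0.3, 0.7, -0.1, 0.4]

# With RoPE: same offset gives the same score, a different offset does not.
rope_a = attention_score(q, k, 5, 3, use_rope=True)
rope_b = attention_score(q, k, 105, 103, use_rope=True)
rope_c = attention_score(q, k, 9, 3, use_rope=True)

# Without positional embeddings: position has no effect at all.
nope_a = attention_score(q, k, 5, 3, use_rope=False)
nope_c = attention_score(q, k, 9, 3, use_rope=False)
```

The point of the toy is only the contrast: once the rotation is skipped, nothing in the score refers to absolute or relative position, so no position seen at inference is "out of range" in the way an unseen rotation angle can be.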
Simone Romano ⚗️👁️💭🏕️ reposted

Vending Machines... It's quite absurd how big this got. From an obscure arXiv paper to vending machines with thousands of AI researchers using them worldwide. @AnthropicAI just made a second post, and we celebrate it with some behind-the-scenes of Project Vend history 🧵.

Anthropic @AnthropicAI
You might remember Project Vend: an experiment where we (and our partners at @andonlabs) had Claude run a shop in our San Francisco office. After a rough start, the business is doing better. Mostly.
Simone Romano ⚗️👁️💭🏕️ reposted

Researchers have mathematically proven that the universe cannot be a computer simulation.
Their paper in the Journal of Holography Applications in Physics shows that reality operates on principles beyond computation.
Using Gödel’s incompleteness theorem, they argue that no algorithmic or computational system can fully describe the universe, because some truths, so-called "Gödelian truths", require non-algorithmic understanding, a form of reasoning that no computer or simulation can reproduce.
Since all simulations are inherently algorithmic, and the fundamental nature of reality is non-algorithmic, the researchers conclude that the universe cannot be, and could never be, a simulation.

An agent that acts on your behalf by clicking in the browser! techcommunity.microsoft.com/blog/microsoft…
Simone Romano ⚗️👁️💭🏕️ reposted

That blue node moving through the codebase?... That's Claude Code.
Found this video on YouTube of the entire claude-code-templates development on GitHub - every commit, community PR, and file change from the git logs mapped out.
Seeing an AI agent navigate the dependency graph and ship features is wild 🤯
Simone Romano ⚗️👁️💭🏕️ reposted

From the Hierarchical Reasoning Model (HRM) to a new Tiny Recursive Model (TRM).
A few months ago, the HRM made big waves in the AI research community as it showed really good performance on the ARC challenge despite its small 27M size. (That's about 22x smaller than the smallest Qwen3 0.6B model.)
Now, the new "Less is More: Recursive Reasoning with Tiny Networks" paper proposes Tiny Recursive Model (TRM), which is a simpler and even smaller model (7M, 4× smaller than HRM) that performs even better on the ARC challenge.
🔹 What does recursion mean here?
TRM refines its answer in two steps:
1. It updates a latent (reasoning) state from the current question and answer.
2. Then it updates the answer based on that latent state.
Training runs for up to 16 refinement steps per batch. Each step does several no-grad loops to improve the answer, followed by one gradient loop that learns from the full reasoning process.
By the way, the question and the answer are grids of discrete tokens, not text. (E.g., 9×9 Sudoku and up to 30×30 ARC and Maze.)
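
The two-step refinement described above can be sketched as a plain control-flow loop. This is only a toy under stated assumptions: `update_latent` and `update_answer` are hypothetical arithmetic stand-ins for the network, the value `latent_updates=6` is arbitrary, vectors of floats replace the token grids, and gradient handling (the no-grad loops versus the one gradient loop) is left out entirely.

```python
def update_latent(x, y, z):
    """Step 1: refresh the latent state from question x, answer y, latent z."""
    # Toy rule standing in for a learned network.
    return [0.5 * zi + 0.25 * (xi + yi) for xi, yi, zi in zip(x, y, z)]

def update_answer(y, z):
    """Step 2: refresh the answer from the current answer and latent state."""
    # Toy rule standing in for a learned network.
    return [0.5 * (yi + zi) for yi, zi in zip(y, z)]

def refine(x, y, z, latent_updates=6):
    """One refinement step: several latent updates, then one answer update."""
    for _ in range(latent_updates):
        z = update_latent(x, y, z)
    y = update_answer(y, z)
    return y, z

def solve(x, max_steps=16, latent_updates=6):
    """Run up to `max_steps` refinement steps, keeping each intermediate answer."""
    y = [0.0] * len(x)
    z = [0.0] * len(x)
    history = []
    for _ in range(max_steps):
        y, z = refine(x, y, z, latent_updates)
        history.append(list(y))
    return y, history

question = [1.0, 2.0]
answer, history = solve(question)
```

In this toy the fixed point of the two update rules happens to be `y == x`, so the answer drifts toward the question with each step. That has nothing to do with what TRM learns; it just makes the "repeatedly refine the same answer" structure easy to observe.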
🔹 And how does it differ from HRM?
In short, HRM recurses multiple times through two small neural nets (one high-frequency, one low-frequency) with 4 transformer blocks each. TRM is much smaller (about 4x) and uses only a single network with 2 transformer blocks.
TRM backpropagates through the full recursion once per step, whereas HRM only backpropagates through the final few steps. And TRM also removes HRM's extra forward pass for halting and instead uses a simple binary cross-entropy loss to learn when to stop iterating.
🔹 Surprising tidbits
1. The author found that adding layers decreased generalization due to overfitting. And going from 4 to 2 layers improved the model from 79.5% to 87.4% on Sudoku.
2. Replacing the self-attention layer with an MLP layer also improved accuracy (74.7% -> 87.4% on Sudoku); however, note that this only makes sense here since we have a fixed-length, small context to work with.
🔹 Bigger picture
My personal caveat: comparing this method (or HRMs) to LLMs feels a bit unfair since HRMs/TRM are specialized models trained for specific tasks (here: ARC, Sudoku, and Maze pathfinding) while LLMs are generalists. It’s like comparing a pocket calculator to a laptop. Both serve a purpose, just in different contexts.
That said, HRMs and the recursive model proposed here are fascinating proof‑of‑concepts that show what’s possible with relatively small and efficient architectures. I'm still curious what the real‑world use case will look like. Maybe they could serve as reasoning or planning modules within a larger tool‑calling system.
In practice, we often start by throwing LLMs at a problem, which makes sense for quick prototyping and establishing a baseline. But I can see a point where someone sits down afterward and trains a focused model like this to solve the same task more efficiently.

Simone Romano ⚗️👁️💭🏕️ reposted

@Dr_Singularity “7M parameters achieve 45% on ARC-AGI”?
Not exactly.
1000× data augmentation
1000× ensemble voting
3.75× recursive compute
That’s 7M × 480,000 forward passes — not tiny, just time-hungry.
The result isn’t less is more, it’s more is more at test time.
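
For reference, the three multipliers listed in the post can be multiplied out directly. The post does not show how its 480,000 figure was derived, so the snippet below only computes the straightforward product of the stated factors and compares it with the quoted number; the variable names are made up for this example.

```python
# Factors exactly as listed in the post.
augmentations = 1000
ensemble_votes = 1000
recursion_factor = 3.75

# Straightforward product of the three stated multipliers.
product = augmentations * ensemble_votes * recursion_factor

# Figure quoted in the post, whose derivation is not given there.
quoted = 480_000

ratio = product / quoted
print(f"product of listed factors: {product:,.0f}")
print(f"quoted figure:             {quoted:,}")
```

The direct product is 3,750,000, which is larger than the quoted 480,000, so the quoted figure presumably rests on some other accounting that the post does not spell out.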