Santiago M.

1.5K posts

@sanmking

Independent researcher. Focused on Artificial Reasoning | X-AWS

Joined June 2024
232 Following · 141 Followers
Pinned Tweet
Santiago M.
Santiago M.@sanmking·
After 18 months, I’m happy to share my research, in which I empirically demonstrate the limitations of language as a representation and mathematically formulate its inefficiencies compared to traditional learning algorithms. I was deeply inspired by the work of @ev_fedorenko
English
0
1
6
1.2K
Santiago M. retweeted
Rosmine
Rosmine@rosmine·
The launch was amazing, thank you so much, everyone ❤️
- multiple companies reached out to request DFT training
- a successful author said the model was incredible
- at least one donation offer that was not a scam
Now I'm getting ready to train the open weights model. I've figured out several tricks that are going to make the next model even better.
Huge shoutout to @brendanh0gan @sanmking @HrishbhDalal for providing feedback on early versions, and to @Algomancer for sponsoring this and other work.
They are all awesome and you should follow them immediately.
Rosmine@rosmine

I fixed why LLMs write so poorly, and I have a demo to prove it. Announcing Distribution Fine Tuning (DFT): a post-training step that fixes LLM writing. Model outputs fooled pangram on 100% of test cases.

English
16
8
336
21.9K
Santiago M.
Santiago M.@sanmking·
@DimitrisPapail Not necessarily, because somehow there’s a chance that Shakespeare was actually a monkey 🙈
English
0
0
0
480
Santiago M.
Santiago M.@sanmking·
@NCouriel I would swap GPT-3 for GPT-2. I don't agree with LSTM. I like the inclusion of Word2Vec. Good list!
Spanish
0
0
1
24
Naomi
Naomi@NCouriel·
The ultimate list of (in my opinion) the most important artificial intelligence papers in history:
1. Perceptron, Rosenblatt (1958)
2. Backpropagation, Rumelhart, Hinton & Williams (1986)
3. LSTM, Hochreiter & Schmidhuber (1997)
4. LeNet-5, LeCun (1998)
5. AlexNet, Krizhevsky, Sutskever & Hinton (2012)
6. Word2Vec, Mikolov (2013)
7. GANs, Goodfellow (2014)
8. Seq2Seq, Sutskever (2014)
9. Adam, Kingma & Ba (2014)
10. VGG, Simonyan & Zisserman (2014)
11. Batch Norm, Ioffe & Szegedy (2015)
12. ResNet, He et al. (2015)
13. AlphaGo, Silver et al., DeepMind (2016)
14. Attention is All You Need, Vaswani (2017) 👑
15. BERT, Devlin (2018)
16. GPT-3, Brown et al., OpenAI (2020)
17. DDPM (Diffusion), Ho et al. (2020)
18. DALL-E, Ramesh et al. (2021)
19. CLIP, Radford et al. (2021)
20. Chinchilla, Hoffmann et al., DeepMind (2022)
21. InstructGPT / RLHF, Ouyang et al. (2022)
22. LLaMA, Touvron et al., Meta (2023)
23. DeepSeek-R1, DeepSeek (2025), first LLM published in Nature, pure reasoning with RL
Any of these will change how you think about the field. If you want to understand AI from the academic side and not just from the hype, save this post 📚
Spanish
5
22
173
6K
Negar Arabzadeh
Negar Arabzadeh@NegarEmpr·
1/ Thrilled to introduce T³: a corpus for RAG over reasoning tasks, built from thinking traces. We show that, surprisingly, RAG can improve reasoning, given the right corpus. RAG with Transformed Thinking Traces (T³) yields gains of up to 43.9% on AIME 2025-2026. 🔗 arxiv.org/abs/2605.03344 🧵
English
11
31
212
472.1K
Santiago M.
Santiago M.@sanmking·
@jasonlk What do you look for in a great hire? Could you share a resource, please?
English
0
0
0
32
Jason ✨👾SaaStr.Ai✨ Lemkin
Almost every important mistake I've made in the past 15+ years has been due to lowering the hiring bar. Directly or indirectly, it leads to chaos, slowdown, doubt, and confusing inputs.
English
14
3
70
9.5K
Santiago M.
Santiago M.@sanmking·
To learn more, you can visit the GitHub repo. The tentative topic for a research paper would be: "What Makes a Good Description? Measuring Geometric Faithfulness in Hierarchical Semantic Representations". github.com/sanmquin/AI/tr… Your thoughts are welcome!
English
0
0
2
27
Santiago M.
Santiago M.@sanmking·
It all started when trying to assess the descriptions of PCA dimensions. The chart displays two of the most significant dimensions that correlate with @20vcFund performance. Optimal engagement clearly sits at provocative commentary about AI impact.
English
1
0
3
55
Santiago M.
Santiago M.@sanmking·
Can we evaluate the accuracy of a description? Many know about “reversion to the mean”: the undesirable property of LLMs that makes them sound generic. Perhaps geometry can provide a solution 🧵
English
1
0
6
14.2K
Santiago M.
Santiago M.@sanmking·
And again I would push back: the point is not that the math is perfect, but to have a better layer of communication for the empirical phenomenon observed. My only point is that, hopefully, just as in software, mathematical proofs would stop being an obtrusive barrier and instead gradually become a more accessible formal language. I do hope that, at least in AI, more discoveries will start to emerge from theory, and not only from data. Math is powerful enough to simplify complex phenomena.
English
1
0
0
44
Alexander Terenin
Alexander Terenin@avt_im·
@sanmking I agree and believe that it is _completely_ mechanized, but I think this alone is not enough to hope for an automatic answer to many theoretical questions. The problem is that reality itself can be far too complex. Many of its mysteries will be far beyond both humans and AI.
English
1
0
1
44
Alexander Terenin
Alexander Terenin@avt_im·
It is one thing to be able to express your idea in mathematical language, and another thing completely to be able to prove it correct. It is easy to write down backprop for training a neural network. To this day, no-one I know of can prove it generically achieves low test loss.
English
2
4
34
5.7K
Santiago M.
Santiago M.@sanmking·
@avt_im There are always exceptions, but my guess is that we will be surprised by how “mechanized” reality truly is! An example involving planning and creativity:
Santiago M.@sanmking

@decisionneurop The fact that creativity reuses many of the same mechanisms as planning gives a potentially verifiable representation of creativity!

English
1
0
0
59
Alexander Terenin
Alexander Terenin@avt_im·
@sanmking I agree in principle. But there are certain classes of behavior which are easy to see empirically, but extraordinarily hard to pin down theoretically. Even ASI may not be enough for that - the exact degree of S will likely matter.
English
1
0
2
131
Santiago M.
Santiago M.@sanmking·
@avt_im That is changing fast, and I expect it to be one of the first consequences of LLM expertise in math verification: the ability to formalize intuitions and communicate them in a standard language.
English
1
0
0
159
Alexander Terenin
Alexander Terenin@avt_im·
Mathematical proof is an extraordinarily high standard by which to judge success. So high, that almost no original ideas in machine learning are developed to that standard first. The only exception I can think of off the top of my head is Greg Yang's muP work.
English
1
1
12
8.2K
Roli Bosch
Roli Bosch@rolibosch·
Wouldn’t an intelligent system get smarter with more context instead of becoming incoherent?
English
2
0
3
64
Santiago M.
Santiago M.@sanmking·
@barrowjoseph That’s what makes it valuable. Art becomes manufacturing! Good luck.
English
0
0
0
12
Joe Barrow
Joe Barrow@barrowjoseph·
@sanmking This survey is organic, free-range human effort. Just a labor of love, tbh.
English
1
0
0
22
Joe Barrow
Joe Barrow@barrowjoseph·
Working on a survey of VLM-based OCR models, pretty notable uptick in releases in 2025, largely thanks to Qwen.
English
2
0
3
481
deep Manifold
deep Manifold@BetaTomorrow·
@sanmking @che_shr_cat Thanks. I have no idea how many articles there will be in this series, probably over 10. I try to write every week :) Meanwhile, please check out deepmanifo.ai. Looking forward to a good discussion.
English
1
0
2
26
Grigory Sapunov
Grigory Sapunov@che_shr_cat·
Another beautiful work on geometry! 1/ Stop steering LLMs in straight lines. The Linear Representation Hypothesis is a useful lie, but it breaks down fast. Pushing activations across flat Euclidean space causes "teleportation" and diversity collapse. The real geometry is curved. 🧵
English
9
36
219
14.4K
Santiago M.
Santiago M.@sanmking·
@Underfox3 Is that the most devastating blow to NVIDIA’s scientific supremacy you’ve seen? It could be the beginning of a confirmed superior architecture.
English
1
0
0
28
Underfox
Underfox@Underfox3·
This paper proposes CStencil, an iterative 2D stencil solver based on the Jacobi method on the Cerebras WSE-3, supporting both Star and Box stencil patterns of various orders. arxiv.org/pdf/2605.07954
English
2
11
60
3.1K