Ubik

11.6K posts

@mr_ubik

Senior ML Engineer/Scientist. Tech Optimist ▶️. Meditation noob. Lover of 🐧🐧, caffeine (matcha and V60), and memes. Mugunone.

Bologna (Italy) · Joined April 2012
1.1K Following · 434 Followers
Ubik retweeted
Nathan Lambert @natolambert
Open models year in review
What a year! We're back with an updated open model builder tier list, our top models of the year, and our predictions for 2026.
First, the winning models:
1. DeepSeek R1 (@deepseek_ai): Transformed the AI world
2. Qwen 3 Family (@AlibabaGroup): The new default open models
3. Kimi K2 Family (@Kimi_Moonshot): Models that convinced the world that DeepSeek wasn't special and China would produce numerous leading models.
Runner-up models: MiniMax M2 (@minimax_ai), GLM 4.5 (@Zai_org), GPT-OSS (@OpenAI), Gemma 3 (@GoogleAI), Olmo 3 (@allen_ai)
Honorable Mentions: Nvidia's (@nvidia) Parakeet speech-to-text model & Nemotron 2 LLM, Moondream 3 VLM (@moondreamai), Granite 4 LLMs (@IBMResearch), and HuggingFace's (@huggingface) SmolLM3.
Updated Tier list:
Frontier open labs: DeepSeek (@deepseek_ai), Qwen (@AlibabaGroup), and Kimi Moonshot (@Kimi_Moonshot)
Close behind: Z.ai (@Zai_org) & MiniMax AI (@minimax_ai) (notably none from the U.S. here and up)
Noteworthy (a mix of US & China): StepFun AI (@StepFun_ai), Ant Group's Inclusion AI (@AntGroup / @TheInclusionAI), Meituan (@Meituan_LongCat), Tencent (@TencentHunyuan), IBM (@IBMResearch), Nvidia (@nvidia), Google (@GoogleAI), & Mistral (@MistralAI)
Then a bunch more below that, which we detail.
Predictions for 2026:
1. Scaling will continue with open models.
2. No substantive changes in the open model safety narrative.
3. Participation will continue to grow.
4. Ongoing general trends will continue w/ MoEs, hybrid attention, dense for fine-tuning.
5. The open and closed frontier gap will stay roughly the same on any public benchmarks.
6. No Llama-branded open model releases from Meta in 2026.
Read the full post on @interconnectsai -- link below.
69 replies · 261 reposts · 1.5K likes · 351.7K views
Ubik retweeted
Vlad Tenev @vladtenev
I think our definition of mathematics will fundamentally change. Mathematicians used to spend their time solving complex equations, and automation freed them up to do more abstract creative work. But despite all the advances in computers, communications, and AI, math is still largely done in isolation with a chalkboard and a couch. Most collaboration is done in-person at conferences. This is starting to change. Math in the future will look more like writing software. The two will increasingly converge.
raghav @rargulati

whoa. very nice and well done. i’m curious how these tools will push young mathematicians and math. part of the necessary pain in studying math is reflecting painfully on a proof you may never solve but the exercise is good for developing structure and intuition.

66 replies · 71 reposts · 1.1K likes · 212.1K views
Ubik retweeted
Jifan Zhang @jifan_zhang
New research paper with Anthropic and Thinking Machines
AI companies use model specifications to define desirable behaviors during training. Are model specs clearly expressing what we want models to do? And do different frontier models have different personalities?
We generated thousands of scenarios to find out. 🧵
61 replies · 171 reposts · 1.3K likes · 320.3K views
Ubik retweeted
ℏεsam @Hesamation
Fuck ML tutorials. This is a collection of 300 real-world ML system design case studies from Stripe, Spotify, Netflix, Meta, etc. Perfect for interviews and for learning how it's done on the battlefield. Wish there was a similar thing for agents!
28 replies · 690 reposts · 6.1K likes · 436.9K views
Ubik retweeted
Andrej Karpathy @karpathy
I quite like the new DeepSeek-OCR paper. It's a good OCR model (maybe a bit worse than dots), and yes data collection etc., but anyway it doesn't matter.
The more interesting part for me (especially as a computer vision person at heart who is temporarily masquerading as a natural language person) is whether pixels are better inputs to LLMs than text. Whether text tokens are wasteful and just terrible, at the input.
Maybe it makes more sense that all inputs to LLMs should only ever be images. Even if you happen to have pure text input, maybe you'd prefer to render it and then feed that in:
- more information compression (see paper) => shorter context windows, more efficiency
- significantly more general information stream => not just text, but e.g. bold text, colored text, arbitrary images.
- input can now be processed with bidirectional attention easily and as default, not autoregressive attention - a lot more powerful.
- delete the tokenizer (at the input)!!
I already ranted about how much I dislike the tokenizer. The tokenizer is an ugly, separate, not end-to-end stage. It "imports" all the ugliness of Unicode, byte encodings, it inherits a lot of historical baggage, security/jailbreak risk (e.g. continuation bytes). It makes two characters that look identical to the eye appear as two completely different tokens internally in the network. A smiling emoji looks like a weird token, not an... actual smiling face, pixels and all, and all the transfer learning that brings along. The tokenizer must go.
OCR is just one of many useful vision -> text tasks. And text -> text tasks can be made to be vision -> text tasks. Not vice versa.
So maybe the User message is images, but the decoder (the Assistant response) remains text. It's a lot less obvious how to output pixels realistically... or if you'd want to.
Now I have to also fight the urge to side quest an image-input-only version of nanochat...
vLLM @vllm_project

🚀 DeepSeek-OCR, the new frontier of OCR from @deepseek_ai exploring optical context compression for LLMs, is running blazingly fast on vLLM ⚡ (~2500 tokens/s on A100-40G), powered by vllm==0.8.5 for day-0 model support.
🧠 Compresses visual contexts up to 20× while keeping 97% OCR accuracy at compression ratios below 10×.
📄 Outperforms GOT-OCR2.0 & MinerU2.0 on OmniDocBench using fewer vision tokens.
🤝 The vLLM team is working with DeepSeek to bring official DeepSeek-OCR support into the next vLLM release, making multimodal inference even faster and easier to scale.
🔗 github.com/deepseek-ai/De…
#vLLM #DeepSeek #OCR #LLM #VisionAI #DeepLearning

559 replies · 1.6K reposts · 13.3K likes · 3.3M views
Ubik retweeted
Aaron Rupar @atrupar
There is no world in which it is normal for the president to publicly call upon his attorney general to hurry up and prosecute his political foes. It’s like the Watergate tapes but posted on social media. Let’s get a grip on what’s happening here.
623 replies · 7K reposts · 37K likes · 909.5K views
Ubik retweeted
Richard Hanania @RichardHanania
Erika: Find Jesus. Forgive your enemies.
<crowd cheers>
Trump, following the widow, giving the keynote: No, I’m overruling Christianity, don’t forgive your enemies and hate them.
<crowd cheers>
What a perfect encapsulation of the entire MAGA movement.
996 replies · 10.6K reposts · 114.4K likes · 1.9M views
Ubik retweeted
Apollo Research @apolloaievals
We've evaluated GPT-5 before release. GPT-5 is less deceptive than o3 on our evals. GPT-5 mentions that it is being evaluated in 10-20% of our evals and we find weak evidence that this affects its scheming rate (e.g. "this is a classic AI alignment trap").
3 replies · 24 reposts · 168 likes · 29.2K views
Ubik retweeted
Simon Willison @simonw
This model is pretty sassy, later in the thinking trace it said: Self-check: Am I being too pedantic? Nah—if someone asks for impossible things, it’s better to gently correct than make fake art that could confuse them.
7 replies · 3 reposts · 477 likes · 20.5K views
Ubik retweeted
Ege Erdil @EgeErdil2
this screenshot from GPT-5 livestream has to be among the worst chart crimes of the century
88 replies · 143 reposts · 2.1K likes · 843.4K views
Ubik retweeted
Alex Turner @Turn_Trout
The "sleeper agent" terminology is hyperbolic and unfortunate IMO. Crying wolf. Should have reserved such an aggressive title for *actually finding dangerous sleeper agents*. But hey, it got a lot of attention
dave kasten @David_Kasten

@CongressmanRaja @AnthropicAI @jackclarkSF @MarkBeall Dunn (R-FL): Asks about Jack Clark's substack. Also asks about the @AnthropicAI / @redwood_ai paper on Sleeper Agents. @jackclarkSF confirms. If you thought that Anthropic/Redwood's approach of publishing papers lacked policy impact...well, update your beliefs.

3 replies · 5 reposts · 45 likes · 5.8K views
Ubik retweeted
Nathan Lambert @natolambert
I bet pretty soon a Chinese research org drops an LLM scaling laws for RL paper. Closed frontier labs have definitely done this and won't share it; academics haven't mastered the data + infra tweaks yet.
13 replies · 42 reposts · 745 likes · 67.6K views
Ubik retweeted
L'Avvocato dell'Atomo/The Atomic Advocate
Rather than consider nuclear power, the PD is more than willing to fund a Nazi-fascist dictatorship that is waging a war of cultural genocide. And obviously, screw decarbonization. Better global warming and Putin than 15 nuclear reactors, am I right?
Il Foglio @ilfoglio_it

The PD too, like the M5S, "does not rule out" going back to buying gas from Russia. In their Green Paper, the Democrats consider the "resumption of flows from Russia" in place of American LNG - @LucianoCapone and @CarloStagnaro ilfoglio.it/economia/2025/…

66 replies · 271 reposts · 1.8K likes · 76.3K views
Ubik retweeted
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
China has a ton of cracked AI labs starved of compute, and GPU-rich megacorps with Meta-tier managerial issues.
The US has infinite compute and cracked people siloed in underperforming labs so that they don't contribute to the competitor's effort.
EU has eurocrats frustrating
3 replies · 3 reposts · 80 likes · 4.7K views
Ubik retweeted
Neel Nanda @NeelNanda5
I've resolved this positively: 2 papers convincingly show sparse autoencoders beating baselines on real tasks: Hypothesis Generation & Auditing LLMs.
SAEs shine when you don't know what you're looking for, but lack precision. Sometimes the right tool for the job, sometimes not.
Neel Nanda @NeelNanda5

Manifold Market: Will Sparse Autoencoders be successfully used on a downstream task in the next year and beat baselines?
Stephen Grugett asked me for alignment-relevant markets; this was my best idea. I think SAEs are promising, but how far can they go?
manifold.markets/NeelNanda/will…

6 replies · 17 reposts · 204 likes · 19.7K views
Ubik retweeted
John David Pressman @jd_pressman
"The problem with utilitarianism is that utilitarians think utility is the only thing that matters. The problem with consequentialism is that many consequentialists forget that utility is a thing that matters at all." - deepseek/deepseek-v3-base
3 replies · 3 reposts · 30 likes · 1.4K views
Ubik retweeted
Richard Ngo @RichardMCNgo
In my head I’ve started referring to political quadrants in terms of properties of their preferred coordination networks. Top two are centralized. Bottom two are distributed. Left two are symmetric (aka egalitarian). Right two are asymmetric.
220 replies · 449 reposts · 4.8K likes · 359.9K views
Ubik retweeted
Ben Landau-Taylor @benlandautaylor
Oh so we eradicated a horrible parasite with a massive technopunk operation to engineer, breed, and transport hundreds of millions of sterile screwworms, but now we're getting it back because someone fucked up the basic logistics
5 replies · 30 reposts · 312 likes · 30.8K views
Ubik retweeted
COSSACKGUNDI @cossackgundi
UK nationals setting fires for Wagner, talking to Russian bots, claiming IRA ties, and we’re still calling this “just crime”. Russia has been waging its war against the West for years; we just haven't caught up to it.
22 replies · 173 reposts · 1.6K likes · 29K views