Ubik

11.6K posts

@mr_ubik

Senior ML Engineer/Scientist. Tech Optimist ▶️. Meditation noob. Lover of 🐧🐧, caffeine (matcha and v60), and memes. Mugunone.

Bologna (Italy) · Joined April 2012

1.1K Following · 434 Followers

Pinned Tweet

Ubik @mr_ubik · Jun 28

Our very first paper is out arxiv.org/abs/1906.11632 - A Survey on GANs for Anomaly Detection #milestone #gans #DeepLearning #paper #tensorflow Big Thanks to my colleagues @manughelfi @paolo_galeone @_iLeW_

English

Ubik retweeted

Nathan Lambert @natolambert · Dec 14

Open models year in review

What a year! We're back with an updated open model builder tier list, our top models of the year, and our predictions for 2026.

First, the winning models:
1. DeepSeek R1 (@deepseek_ai): Transformed the AI world
2. Qwen 3 Family (@AlibabaGroup): The new default open models
3. Kimi K2 Family (@Kimi_Moonshot): Models that convinced the world that DeepSeek wasn't special and China would produce numerous leading models.

Runner up models: MiniMax M2 (@minimax_ai), GLM 4.5 (@Zai_org), GPT-OSS (@OpenAI), Gemma 3 (@GoogleAI), Olmo 3 (@allen_ai)

Honorable Mentions: Nvidia's (@nvidia) Parakeet speech-to-text model & Nemotron 2 LLM, Moondream 3 VLM (@moondreamai), Granite 4 LLMs (@IBMResearch), and HuggingFace's (@huggingface) SmolLM3.

Updated Tier list:
Frontier open labs: DeepSeek (@deepseek_ai), Qwen (@AlibabaGroup), and Kimi Moonshot (@Kimi_Moonshot)
Close behind: Z.ai (@Zai_org) & MiniMax AI (@minimax_ai) (notably none from the U.S. here and up)
Noteworthy (a mix of US & China): StepFun AI (@StepFun_ai), Ant Group's Inclusion AI (@AntGroup / @TheInclusionAI), Meituan (@Meituan_LongCat), Tencent (@TencentHunyuan), IBM (@IBMResearch), Nvidia (@nvidia), Google (@GoogleAI), & Mistral (@MistralAI)
Then a bunch more below that, which we detail.

Predictions for 2026:
1. Scaling will continue with open models.
2. No substantive changes in the open model safety narrative.
3. Participation will continue to grow.
4. Ongoing general trends will continue w/ MoEs, hybrid attention, dense for fine-tuning.
5. The open and closed frontier gap will stay roughly the same on any public benchmarks.
6. No Llama-branded open model releases from Meta in 2026.

Read the full post on @interconnectsai -- link below.

English

261

1.5K

351.7K

Ubik retweeted

Vlad Tenev @vladtenev · Nov 30

I think our definition of mathematics will fundamentally change. Mathematicians used to spend their time solving complex equations, and automation freed them up to do more abstract creative work. But despite all the advances in computers, communications, and AI, math is still largely done in isolation with a chalkboard and a couch. Most collaboration is done in-person at conferences. This is starting to change. Math in the future will look more like writing software. The two will increasingly converge.

raghav @rargulati

whoa. very nice and well done. i’m curious how these tools will push young mathematicians and math. part of the necessary pain in studying math is reflecting painfully on a proof you may never solve but the exercise is good for developing structure and intuition.

English

1.1K

212.1K

Ubik retweeted

Jifan Zhang @jifan_zhang · Oct 24

New research paper with Anthropic and Thinking Machines AI companies use model specifications to define desirable behaviors during training. Are model specs clearly expressing what we want models to do? And do different frontier models have different personalities? We generated thousands of scenarios to find out. 🧵

English

171

1.3K

320.3K

Ubik retweeted

ℏεsam @Hesamation · Jul 28

Fuck ML tutorials. This is a collection of 300 ML system design case studies in real world, from Stripe, Spotify, Netflix, Meta, etc. Perfect for interviews and to learn how it’s done in the battlefield. Wish there was a similar thing for agents!

English

690

6.1K

436.9K

Ubik retweeted

Andrej Karpathy @karpathy · Oct 21

I quite like the new DeepSeek-OCR paper. It's a good OCR model (maybe a bit worse than dots), and yes data collection etc., but anyway it doesn't matter. The more interesting part for me (esp as a computer vision at heart who is temporarily masquerading as a natural language person) is whether pixels are better inputs to LLMs than text. Whether text tokens are wasteful and just terrible, at the input.

Maybe it makes more sense that all inputs to LLMs should only ever be images. Even if you happen to have pure text input, maybe you'd prefer to render it and then feed that in:
- more information compression (see paper) => shorter context windows, more efficiency
- significantly more general information stream => not just text, but e.g. bold text, colored text, arbitrary images.
- input can now be processed with bidirectional attention easily and as default, not autoregressive attention - a lot more powerful.
- delete the tokenizer (at the input)!! I already ranted about how much I dislike the tokenizer. Tokenizers are ugly, separate, not end-to-end stage. It "imports" all the ugliness of Unicode, byte encodings, it inherits a lot of historical baggage, security/jailbreak risk (e.g. continuation bytes). It makes two characters that look identical to the eye look as two completely different tokens internally in the network. A smiling emoji looks like a weird token, not an... actual smiling face, pixels and all, and all the transfer learning that brings along. The tokenizer must go.

OCR is just one of many useful vision -> text tasks. And text -> text tasks can be made to be vision -> text tasks. Not vice versa.

So maybe the User message is images, but the decoder (the Assistant response) remains text. It's a lot less obvious how to output pixels realistically... or if you'd want to.

Now I have to also fight the urge to side quest an image-input-only version of nanochat...

vLLM @vllm_project

🚀 DeepSeek-OCR — the new frontier of OCR from @deepseek_ai , exploring optical context compression for LLMs, is running blazingly fast on vLLM ⚡ (~2500 tokens/s on A100-40G) — powered by vllm==0.8.5 for day-0 model support. 🧠 Compresses visual contexts up to 20× while keeping 97% OCR accuracy at <10×. 📄 Outperforms GOT-OCR2.0 & MinerU2.0 on OmniDocBench using fewer vision tokens. 🤝 The vLLM team is working with DeepSeek to bring official DeepSeek-OCR support into the next vLLM release — making multimodal inference even faster and easier to scale. 🔗 github.com/deepseek-ai/De… #vLLM #DeepSeek #OCR #LLM #VisionAI #DeepLearning

English

559

1.6K

13.3K

3.3M

Ubik retweeted

Aaron Rupar @atrupar · Sep 21

There is no world in which it is normal for the president to publicly call upon his attorney general to hurry up and prosecute his political foes. It’s like the Watergate tapes but posted on social media. Let’s get a grip on what’s happening here.

English

623

37K

909.5K

Ubik retweeted

Richard Hanania @RichardHanania · Sep 22

Erika: Find Jesus. Forgive your enemies. <crowd cheers> Trump, following the widow, giving the keynote: No, I’m overruling Christianity, don’t forgive your enemies and hate them. <crowd cheers> What a perfect encapsulation of the entire MAGA movement.

English

996

10.6K

114.4K

1.9M

Ubik retweeted

Apollo Research @apolloaievals · Aug 7

We've evaluated GPT-5 before release. GPT-5 is less deceptive than o3 on our evals. GPT-5 mentions that it is being evaluated in 10-20% of our evals and we find weak evidence that this affects its scheming rate (e.g. "this is a classic AI alignment trap").

English

168

29.2K

Ubik retweeted

Simon Willison @simonw · Aug 11

This model is pretty sassy, later in the thinking trace it said: Self-check: Am I being too pedantic? Nah—if someone asks for impossible things, it’s better to gently correct than make fake art that could confuse them.

English

477

20.5K

Ubik retweeted

Ege Erdil @EgeErdil2 · Aug 7

this screenshot from GPT-5 livestream has to be among the worst chart crimes of the century

English

143

2.1K

843.4K

Ubik retweeted

Alex Turner @Turn_Trout · Jun 27

The "sleeper agent" terminology is hyperbolic and unfortunate IMO. Crying wolf. Should have reserved such an aggressive title for *actually finding dangerous sleeper agents*. But hey, it got a lot of attention

dave kasten @David_Kasten

@CongressmanRaja @AnthropicAI @jackclarkSF @MarkBeall Dunn (R-FL): Asks about Jack Clark's substack. Also asks about the @AnthropicAI / @redwood_ai paper on Sleeper Agents. @jackclarkSF confirms. If you thought that Anthropic/Redwood's approach of publishing papers lacked policy impact...well, update your beliefs.

English

5.8K

Ubik retweeted

Nathan Lambert @natolambert · Jul 27

I bet pretty soon a Chinese research org drops an LLM scaling laws for RL paper. Closed frontier labs have definitely done this and won't share it, academics haven't mastered the data + infra tweaks yet.

English

745

67.6K

Ubik retweeted

Il Foglio @ilfoglio_it · Jul 16

Like the M5s, the Pd also "does not rule out" going back to buying gas from Russia. In their Green Paper the Democrats consider the "resumption of flows from Russia" in place of American LNG - @LucianoCapone and @CarloStagnaro ilfoglio.it/economia/2025/…

Italiano

112

340

114.7K

Ubik retweeted

L'Avvocato dell'Atomo/The Atomic Advocate @AvvocatoAtomico · Jul 16

Just to avoid looking at nuclear power, the PD is perfectly willing to finance a Nazi-fascist dictatorship that is waging a war aimed at cultural genocide. And of course, to hell with decarbonization. Better global warming and Putin than 15 nuclear reactors, no contest, right?

Il Foglio @ilfoglio_it

Italiano

271

1.8K

76.3K

Ubik retweeted

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) @teortaxesTex · Jul 17

China has a ton of cracked AI labs starved of compute, and GPU-rich megacorps with Meta-tier managerial issues
The US has infinite compute and cracked people siloed in underperforming labs so that they don't contribute to the competitor's effort
EU has eurocrats frustrating

English

4.7K

Ubik retweeted

Neel Nanda @NeelNanda5 · Jul 17

I've resolved this positively: 2 papers convincingly show sparse autoencoders beating baselines on real tasks: Hypothesis Generation & Auditing LLMs

SAEs shine when you don't know what you're looking for, but lack precision. Sometimes the right tool for the job, sometimes not.

Neel Nanda @NeelNanda5

Manifold Market: Will Sparse Autoencoders be successfully used on a downstream task in the next year and beat baselines? Stephen Grugett asked me for alignment-relevant markets, this was my best idea. I think SAEs are promising, but how far can they go? manifold.markets/NeelNanda/will…

English

204

19.7K

Ubik retweeted

John David Pressman @jd_pressman · Jul 8

"The problem with utilitarianism is that utilitarians think utility is the only thing that matters. The problem with consequentialism is that many consequentialists forget that utility is a thing that matters at all." - deepseek/deepseek-v3-base

English

1.4K

Ubik retweeted

Richard Ngo @RichardMCNgo · Jul 9

In my head I’ve started referring to political quadrants in terms of properties of their preferred coordination networks. Top two are centralized. Bottom two are distributed. Left two are symmetric (aka egalitarian). Right two are asymmetric.

English

220

449

4.8K

359.9K

Ubik retweeted

Ben Landau-Taylor @benlandautaylor · Jul 8

Oh so we eradicated a horrible parasite with a massive technopunk operation to engineer, breed, and transport hundreds of millions of sterile screwworms, but now we're getting it back because someone fucked up the basic logistics

English

312

30.8K

Ubik retweeted

COSSACKGUNDI @cossackgundi · Jul 8

UK nationals setting fires for Wagner, talking to Russian bots, claiming IRA ties, and we’re still calling this “just crime.” Russia has been waging its war against the West for years; we just haven't caught up to it.

English

173

1.6K

29K
