Mansour

302 posts

@Mansourdam

Product Manager-AI Native Products, ML

Amsterdam, The Netherlands · Joined October 2010

1.8K Following · 167 Followers

Pinned Tweet

Mansour@Mansourdam·19 Oct

Claude Skills is context engineering, not just a feature. The problem: Claude's default window is ~200K tokens. You can't dump all your docs/standards/workflows into prompts without degradation. Skills uses progressive disclosure - 3 tiers

Mansour@Mansourdam·2d

@mweinbach Hey Max, are you sure these models were trained from scratch and are not distilled from Gemini?

Max Weinbach@mweinbach·2d

Again, 4 of the 5 models Apple is shipping are fully from Apple, trained on latest gen TPU, with some Gemini responses as part of the dataset. 1 model is, I believe, a larger Gemini base model that was adapted to work better for Siri with further training. Awful comparison.

Marques Brownlee@MKBHD

Apple is insisting that the new Siri is NOT Gemini youtu.be/N36yb-X1LN0?is…

1.1K

99.1K

Mansour@Mansourdam·2d

@ValerioCapraro Interesting one. Gemini 3.1 Pro got it wrong at first, but got it right with the tip. Interestingly, Fable failed even after the tip.

305

Valerio Capraro@ValerioCapraro·3d

Claude Fable 5 doesn’t truly understand. And here is a beautiful proof: The Beninatto-Trombetti test is a translation test for professional translators. It measures the ability to infer context, revise the surface form, and generalize beyond literal mapping. For example, the correct translation of: “Solo 3 parole: non sei solo” is not: “Just 3 words: you are not alone” but: “Just 4 words: you are not alone.” An LLM that understands the sentence must also update the meta-linguistic claim inside the sentence. Claude Fable 5 is arguably the most advanced LLM currently available. And yet it still fails this simple test. LLMs are extraordinary machines for recombining existing knowledge. But they don’t truly understand. We are still far from AGI.

234

109

1.4K

390K

Mansour@Mansourdam·2d

@MParakhin I'm really surprised they haven't done this by now. PR reviews are such a pain right now, and I think a Pro/Deep Think model would help solve it.

Mikhail Parakhin@MParakhin·3d

Well, Tibo, for a year now I was pleading, arguing for, begging you guys to bring Pro as an advisor model into Codex (really, allow for the LARGE thinking budget)…

Tibo@thsottiaux

I would like to claim my 1% of royalty fees.

356

68.6K

Mansour@Mansourdam·4d

@GergelyOrosz Could you share some examples of this pushback?!

871

Gergely Orosz@GergelyOrosz·4d

I will be honest: 1. Since Opus 4.7 I get pushback in unexpected ways from Anthropic models. 2. GPT-5.5 doesn't do this. When I know what I want to do, I prefer #2. I don't like a vendor knowing better what I do, how I do it, with zero transparency. I humor Opus but like GPT...

Dylan Patel@dylan522p

Usage share of OpenAI grew vs Anthropic yesterday despite the Mythos 5 / Fable 5 launch. Multiple power users at SemiAnalysis tried Mythos / Fable, got refusals for nonsensical reasons, got pissed off at Anthropic, and gave Codex a legitimate try. Now they actually prefer it to 4.8 Opus.

327

56.2K

Mansour@Mansourdam·5d

@mweinbach Yeah, it's a great UX. Our bet, though, is multimodal logging: voice-first, since it's the lowest-friction way to capture detail, with strong memory handling that personalizes the app over time.

479

Max Weinbach@mweinbach·5d

@Mansourdam It's built into the camera, just a little easier

2.6K

Max Weinbach@mweinbach·5d

I really like how Apple does nutrition information. Rather than telling you calories and whatnot, it just tells you if it’s bad, OK, good, or great.

1.6K

122.9K

Mansour@Mansourdam·1 Jun

@AiBattle_ It's not a good benchmark. Sonnet 4.6 scores higher than Opus 4.6 lol, and there is absolutely no way 3.5 Flash outperforms Opus 4.6.

8.4K

AiBattle@AiBattle_·1 Jun

MiniMax M2.7 scored 0% on DeepSWE. I’m really curious to see how well M3 will do. The model rankings on the DeepSWE benchmark seem to reflect model performance better than other coding benchmarks.

711

109K

Mansour@Mansourdam·28 May

If the Claude Mythos rumors are true, that it uses a recurrent-depth/looped transformer with a shared block iterated for latent reasoning, Sakana DiffusionBlocks is directly relevant: it reframes those iterations as diffusion denoising steps, enabling single-pass training instead of K-step BPTT.

hardmaru@hardmaru

For over a decade, we’ve accepted that end-to-end backprop is the only way to train deep networks. But holding the entire network in memory all at once is why AI training is hitting a resource wall. We found a new way to break the network into blocks and train them independently. The trick? Treating the network’s forward pass like a diffusion model denoising a signal. This reinterpretation slashes the memory needed to train deep models. In our #ICLR2026 paper (arxiv.org/abs/2506.14202), we matched end-to-end performance across ViTs, DiTs, and LLMs. We did this while training just one isolated block at a time.

446

Mansour@Mansourdam·27 May

@badlogicgames Smaller models like Kokoro-82M are already working surprisingly well for TTS. Check this out: github.com/cyanxxy/offlin…

1.9K

Mario Zechner@badlogicgames·27 May

jesus, qwen3-tts is FANTASTIC. going for full local stt/tts/llm with parakeet, qwen3-tts, and gemma 4 via llama.cpp for my little robot. excite, excite! huggingface.co/Qwen/Qwen3-TTS…

1.3K

83.1K

Mansour@Mansourdam·6 May

@_arohan_ I think most of the chips in Colossus 1 are H100s.

239

rohan anil@_arohan_·6 May

Presumably Blackwells, that's a lot.

2.7K

Mansour@Mansourdam·24 Mar

@rauchg I still remember our conversation in Amsterdam in 2022, right before the ChatGPT moment. You said a lot of legacy SaaS wouldn't survive long term and pointed to Salesforce as a prime example: clunky UI, slow, bloated. Now with AI, we're hopefully getting there.

Guillermo Rauch@rauchg·24 Mar

Almost every SaaS app inside Vercel has now been replaced with a generated app or agent interface, deployed on Vercel. Support, sales, marketing, PM, HR, dataviz, even design and video workflows. It’s shocking. The SaaSpocalypse is both understated and overstated. Overstated because the key systems of record and storage are still there (Salesforce, Snowflake, etc.). Understated because the software we are generating is more beautiful, personalized, and crucially, fits our business problems better. We struggled for years to represent the health of a Vercel customer properly inside Salesforce. Too much data (trillions of consumption data points), the ontology of Vercel was a mismatch to the built-in assumptions, and the resulting UI was bizarre. We generated what we needed instead. When you don’t need a UI, you just ask an agent with natural language. We’ve also been moving off legacy systems with poor, slow, outdated, and inconsistent APIs, as well as just dropping abstraction down to more traditional databases. UI is a function 𝑓 of data (always has been), and that 𝑓 is increasingly becoming the LLM.

258

135

848.2K

Mansour retweeted

Neil Stone@DrNeilStone·27 Jan

This person runs an "anti racism" organisation in the UK. I kid you not.

415

2.7K

19.7K

373K

Mansour retweeted

Cloudflare Radar@CloudflareRadar·11 Jan

Iran's Internet shutdown is now in its third day. Traffic volumes remain extremely low. Follow the latest status at radar.cloudflare.com/traffic/ir #IranProtests2026 #IranDigitalBlackout

191

613

93.5K

Mansour@Mansourdam·28 Nov

@raizamrtn Exactly. There are many frontend issues; streaming, for instance, is buggy and triggers multiple re-renders. It makes me wonder if anyone is actually QAing this or gathering feedback.

161

Raiza Martin@raizamrtn·28 Nov

I’m sorry to say this but most of what stops me from switching from the ChatGPT app to Gemini (today) is literally front end and I know googlers will do *anything* but front end eng work!!

113

1.8K

191.9K

Mansour@Mansourdam·9 Nov

@fleetingbits The recent paper was from Google Research, not DeepMind.

FleetingBits@fleetingbits·8 Nov

When DeepMind publishes research, you can be sure that it is either Nobel Prize worthy or something DeepMind never intends to use in production - no middle ground.

55.9K

Mansour@Mansourdam·21 Oct

@karpathy, what is your take on diffusion language models?

140

Andrej Karpathy@karpathy·21 Oct

I quite like the new DeepSeek-OCR paper. It's a good OCR model (maybe a bit worse than dots), and yes data collection etc., but anyway it doesn't matter. The more interesting part for me (esp as a computer vision at heart who is temporarily masquerading as a natural language person) is whether pixels are better inputs to LLMs than text. Whether text tokens are wasteful and just terrible, at the input. Maybe it makes more sense that all inputs to LLMs should only ever be images. Even if you happen to have pure text input, maybe you'd prefer to render it and then feed that in: - more information compression (see paper) => shorter context windows, more efficiency - significantly more general information stream => not just text, but e.g. bold text, colored text, arbitrary images. - input can now be processed with bidirectional attention easily and as default, not autoregressive attention - a lot more powerful. - delete the tokenizer (at the input)!! I already ranted about how much I dislike the tokenizer. Tokenizers are ugly, separate, not end-to-end stage. It "imports" all the ugliness of Unicode, byte encodings, it inherits a lot of historical baggage, security/jailbreak risk (e.g. continuation bytes). It makes two characters that look identical to the eye look as two completely different tokens internally in the network. A smiling emoji looks like a weird token, not an... actual smiling face, pixels and all, and all the transfer learning that brings along. The tokenizer must go. OCR is just one of many useful vision -> text tasks. And text -> text tasks can be made to be vision -> text tasks. Not vice versa. So maybe the User message is images, but the decoder (the Assistant response) remains text. It's a lot less obvious how to output pixels realistically... or if you'd want to. Now I have to also fight the urge to side quest an image-input-only version of nanochat...

vLLM@vllm_project

🚀 DeepSeek-OCR — the new frontier of OCR from @deepseek_ai , exploring optical context compression for LLMs, is running blazingly fast on vLLM ⚡ (~2500 tokens/s on A100-40G) — powered by vllm==0.8.5 for day-0 model support. 🧠 Compresses visual contexts up to 20× while keeping 97% OCR accuracy at <10×. 📄 Outperforms GOT-OCR2.0 & MinerU2.0 on OmniDocBench using fewer vision tokens. 🤝 The vLLM team is working with DeepSeek to bring official DeepSeek-OCR support into the next vLLM release — making multimodal inference even faster and easier to scale. 🔗 github.com/deepseek-ai/De… #vLLM #DeepSeek #OCR #LLM #VisionAI #DeepLearning

558

1.6K

13.3K

3.3M

Mansour@Mansourdam·19 Oct

Once again amazing work by @JeremyDanielFox and the team👏

Mansour@Mansourdam·19 Oct

Limitations: You're trusting Claude's judgment on which skills to load. Skills can execute code, so audit thoroughly. The pattern - agents with filesystem access managing their own context through selective loading - is where agentic architectures are heading.

127

Mansour@Mansourdam·19 Oct
