Mansour (@Mansourdam)
302 posts

Product Manager, AI-Native Products, ML

Amsterdam, The Netherlands · Joined October 2010
1.8K Following · 167 Followers
Pinned Tweet
Mansour @Mansourdam
Claude Skills is context engineering, not just a feature. The problem: Claude's default context window is ~200K tokens. You can't dump all your docs/standards/workflows into prompts without degradation. Skills uses progressive disclosure, with 3 tiers.
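A minimal Python sketch of what three-tier progressive disclosure could look like, assuming the tiers are (1) a short name and description that is always in context, (2) the full skill instructions, loaded only when the skill is selected, and (3) bundled reference files, loaded only when actually needed. `Skill`, `SkillRegistry`, the keyword-overlap trigger, and the ~4-characters-per-token estimate are all hypothetical; in Claude Skills the model itself decides what to load, and none of this is Anthropic's API.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str                     # tier 1: always in context
    description: str              # tier 1: always in context
    body: str                     # tier 2: loaded only when selected
    resources: dict[str, str] = field(default_factory=dict)  # tier 3: on demand


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token (a common rule of thumb).
    return max(1, len(text) // 4)


class SkillRegistry:
    def __init__(self, skills: list[Skill]) -> None:
        self.skills = {s.name: s for s in skills}
        self.context: list[str] = []

    def load_metadata(self) -> None:
        # Tier 1: only a one-line summary per installed skill.
        for s in self.skills.values():
            self.context.append(f"{s.name}: {s.description}")

    def select(self, task: str) -> list[str]:
        # Hypothetical trigger: any word shared by the task and a description.
        # In Claude Skills, the model makes this judgment instead.
        words = set(task.lower().split())
        return [name for name, s in self.skills.items()
                if words & set(s.description.lower().split())]

    def load_body(self, name: str) -> None:
        # Tier 2: the full instructions for one selected skill.
        self.context.append(self.skills[name].body)

    def load_resource(self, name: str, path: str) -> None:
        # Tier 3: a single bundled file, pulled in only when needed.
        self.context.append(self.skills[name].resources[path])

    def context_tokens(self) -> int:
        return sum(estimate_tokens(c) for c in self.context)


def demo() -> tuple[int, int, list[str]]:
    skills = [
        Skill("pdf", "extract text from pdf files",
              "PDF instructions " * 200,
              {"forms.md": "Form-filling guide " * 500}),
        Skill("brand", "apply company brand guidelines to slides",
              "Brand rules " * 300),
    ]
    reg = SkillRegistry(skills)
    reg.load_metadata()
    tier1 = reg.context_tokens()          # small: two one-line summaries
    chosen = reg.select("extract text from this pdf")
    for name in chosen:
        reg.load_body(name)               # only the matching skill's body
    return tier1, reg.context_tokens(), chosen


print(demo())
```

The design point is that tier 1 costs a few tokens per skill no matter how large tiers 2 and 3 grow, so many skills can be installed without crowding the window.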
Mansour @Mansourdam
@mweinbach Hey Max, are you sure these models were trained from scratch and not distilled from Gemini?
Mansour @Mansourdam
@ValerioCapraro Interesting one. Gemini 3.1 Pro got it wrong at first, but got it after the tip. Interestingly, Fable failed even after the tip.
[Image]
Valerio Capraro @ValerioCapraro
Claude Fable 5 doesn’t truly understand. And here is a beautiful proof:

The Beninatto-Trombetti test is a translation test for professional translators. It measures the ability to infer context, revise the surface form, and generalize beyond literal mapping.

For example, the correct translation of:
“Solo 3 parole: non sei solo”
is not:
“Just 3 words: you are not alone”
but:
“Just 4 words: you are not alone.”

An LLM that understands the sentence must also update the meta-linguistic claim inside the sentence.

Claude Fable 5 is arguably the most advanced LLM currently available. And yet it still fails this simple test.

LLMs are extraordinary machines for recombining existing knowledge. But they don’t truly understand. We are still far from AGI.
[Image]
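To make the test's example concrete, here is a tiny checker for only the self-referential part of the sentence: it counts the words after the colon and compares them with the number the sentence claims. The function name, the "first integer before the colon" parsing rule, and whitespace word-splitting are assumptions for illustration; it says nothing about translation quality itself.

```python
import re


def claim_is_consistent(sentence: str) -> bool:
    """Check sentences shaped like '<prefix> N <noun>: <payload>'.

    Assumption: the claimed count is the first integer before the colon,
    and words are the whitespace-separated tokens after the colon.
    """
    head, _, payload = sentence.partition(":")
    match = re.search(r"\d+", head)
    if match is None:
        raise ValueError("no numeric claim before the colon")
    claimed = int(match.group())
    actual = len(payload.split())
    return claimed == actual


for s in [
    "Solo 3 parole: non sei solo",        # Italian original: 3 words, consistent
    "Just 3 words: you are not alone",    # literal translation: 4 words, inconsistent
    "Just 4 words: you are not alone",    # updated claim: consistent
]:
    print(claim_is_consistent(s), s)
```

The literal translation fails because "non sei solo" is three words while "you are not alone" is four, so the number inside the sentence has to change along with the words.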
Mansour @Mansourdam
@MParakhin I'm really surprised they haven't done this yet. PR reviews are such a pain right now, and I think a Pro/Deep Think model would help solve that.
Gergely Orosz @GergelyOrosz
I will be honest:
1. Since Opus 4.7 I get pushback in unexpected ways from Anthropic models
2. GPT-5.5 doesn't do this

When I know what I want to do, I prefer #2. I don't like a vendor knowing better what I do, how I do it, with zero transparency. I humor Opus but like GPT...
Dylan Patel @dylan522p

Usage share of OpenAI grew vs Anthropic yesterday despite Mythos 5 / Fable 5 launch

Multiple power users at SemiAnalysis tried Mythos / Fable
Got refusals for nonsensical reasons
Got pissed off at Anthropic
Gave Codex a legitimate try
Now they actually prefer it to 4.8 Opus

Mansour @Mansourdam
@mweinbach Yeah, it's a great UX. Our bet, though, is multimodal logging: voice-first, since it's the lowest-friction way to capture detail, with strong memory handling that personalizes the app over time.
Max Weinbach @mweinbach
I really like how Apple does nutrition information. Rather than telling you calories and whatnot, it just tells you if it’s bad, OK, good, or great.
[Image]
Mansour @Mansourdam
@AiBattle_ It's not a good benchmark: Sonnet 4.6 scores higher than Opus 4.6 lol, and there is absolutely no way 3.5 Flash outperforms Opus 4.6.
AiBattle @AiBattle_
MiniMax M2.7 scored 0% on DeepSWE. I’m really curious to see how well M3 will do.

The model rankings on the DeepSWE benchmark seem to reflect model performance better than other coding benchmarks.
[Image]
Mansour @Mansourdam
If the Claude Mythos rumors are true (that it uses a recurrent-depth/looped transformer with a shared block iterated for latent reasoning), Sakana's DiffusionBlocks is directly relevant: it reframes those iterations as diffusion denoising steps, enabling single-pass training instead of K-step backpropagation through time (BPTT).
hardmaru @hardmaru

For over a decade, we’ve accepted that end-to-end backprop is the only way to train deep networks. But holding the entire network in memory all at once is why AI training is hitting a resource wall. We found a new way to break the network into blocks and train them independently. The trick? Treating the network’s forward pass like a diffusion model denoising a signal. This reinterpretation slashes the memory needed to train deep models. In our #ICLR2026 paper (arxiv.org/abs/2506.14202), we matched end-to-end performance across ViTs, DiTs, and LLMs. We did this while training just one isolated block at a time.

Mario Zechner @badlogicgames
jesus, qwen3-tts is FANTASTIC. going for full local stt/tts/llm with parakeet, qwen3-tts, and gemma 4 via llama.cpp for my little robot. excite, excite! huggingface.co/Qwen/Qwen3-TTS…
rohan anil @_arohan_
Presumably Blackwells, that's a lot.
Mansour @Mansourdam
@rauchg I still remember our conversation in Amsterdam in 2022, right before the ChatGPT moment. You said a lot of legacy SaaS wouldn't survive long term and pointed to Salesforce as a prime example: clunky UI, slow, bloated. Now with AI, we're hopefully getting there.
Guillermo Rauch @rauchg
Almost every SaaS app inside Vercel has now been replaced with a generated app or agent interface, deployed on Vercel. Support, sales, marketing, PM, HR, dataviz, even design and video workflows. It’s shocking.

The SaaSpocalypse is both understated and overstated.

Overstated because the key systems of record and storage are still there (Salesforce, Snowflake, etc.).

Understated because the software we are generating is more beautiful, personalized, and crucially, fits our business problems better.

We struggled for years to represent the health of a Vercel customer properly inside Salesforce. Too much data (trillions of consumption data points), the ontology of Vercel was a mismatch to the built-in assumptions, and the resulting UI was bizarre. We generated what we needed instead.

When you don’t need a UI, you just ask an agent with natural language.

We’ve also been moving off legacy systems with poor, slow, outdated, and inconsistent APIs, as well as just dropping abstraction down to more traditional databases.

UI is a function 𝑓 of data (always has been), and that 𝑓 is increasingly becoming the LLM.
Mansour retweeted
Neil Stone @DrNeilStone
This person runs an "anti racism" organisation in the UK. I kid you not.
[Image]
Mansour @Mansourdam
@raizamrtn Exactly. There are many frontend issues; streaming, for instance, is buggy and triggers multiple re-renders. It makes me wonder if anyone is actually QAing this or gathering feedback.
Raiza Martin @raizamrtn
I’m sorry to say this but most of what stops me from switching from the ChatGPT app to Gemini (today) is literally front end and I know googlers will do *anything* but front end eng work!!
Mansour @Mansourdam
@fleetingbits The recent paper was from Google Research, not DeepMind.
FleetingBits @fleetingbits
When DeepMind publishes research, you can be sure that it is either Nobel Prize worthy or something DeepMind never intends to use in production - no middle ground.
Andrej Karpathy @karpathy
I quite like the new DeepSeek-OCR paper. It's a good OCR model (maybe a bit worse than dots), and yes data collection etc., but anyway it doesn't matter.

The more interesting part for me (esp. as a computer vision person at heart who is temporarily masquerading as a natural language person) is whether pixels are better inputs to LLMs than text. Whether text tokens are wasteful and just terrible, at the input.

Maybe it makes more sense that all inputs to LLMs should only ever be images. Even if you happen to have pure text input, maybe you'd prefer to render it and then feed that in:
- more information compression (see paper) => shorter context windows, more efficiency
- significantly more general information stream => not just text, but e.g. bold text, colored text, arbitrary images.
- input can now be processed with bidirectional attention easily and as default, not autoregressive attention, which is a lot more powerful.
- delete the tokenizer (at the input)!! I already ranted about how much I dislike the tokenizer. Tokenizers are an ugly, separate, non-end-to-end stage. It "imports" all the ugliness of Unicode and byte encodings, and it inherits a lot of historical baggage and security/jailbreak risk (e.g. continuation bytes). It makes two characters that look identical to the eye look like two completely different tokens internally in the network. A smiling emoji looks like a weird token, not an... actual smiling face, pixels and all, with all the transfer learning that brings along. The tokenizer must go.

OCR is just one of many useful vision -> text tasks. And text -> text tasks can be made to be vision -> text tasks. Not vice versa.

So maybe the User message is images, but the decoder (the Assistant response) remains text. It's a lot less obvious how to output pixels realistically... or if you'd want to.

Now I have to also fight the urge to side quest an image-input-only version of nanochat...
vLLM @vllm_project

🚀 DeepSeek-OCR, the new frontier of OCR from @deepseek_ai exploring optical context compression for LLMs, is running blazingly fast on vLLM ⚡ (~2500 tokens/s on A100-40G), powered by vllm==0.8.5 for day-0 model support.

🧠 Compresses visual contexts up to 20× while keeping 97% OCR accuracy at <10×.

📄 Outperforms GOT-OCR2.0 & MinerU2.0 on OmniDocBench using fewer vision tokens.

🤝 The vLLM team is working with DeepSeek to bring official DeepSeek-OCR support into the next vLLM release, making multimodal inference even faster and easier to scale.

🔗 github.com/deepseek-ai/De…
#vLLM #DeepSeek #OCR #LLM #VisionAI #DeepLearning

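A rough sketch of the "optical compression" arithmetic behind these figures: the compression ratio is the number of text tokens a page would cost divided by the number of vision tokens the rendered page actually costs. The 16-pixel patch size and the 16× token compressor below are assumptions about a DeepSeek-OCR-style encoder, not numbers taken from the posts above, and the 2,000-token page is hypothetical.

```python
def vision_tokens(width_px: int, height_px: int,
                  patch_px: int = 16, compressor: int = 16) -> int:
    """Vision tokens for one rendered page.

    Assumptions (hypothetical defaults): square ViT patches of `patch_px`
    pixels, then a token compressor that merges `compressor` patches into one.
    """
    patches = (width_px // patch_px) * (height_px // patch_px)
    return patches // compressor


def compression_ratio(text_tokens: int, width_px: int, height_px: int) -> float:
    """Text tokens the page would cost, per vision token it actually costs."""
    return text_tokens / vision_tokens(width_px, height_px)


# Hypothetical page: ~2,000 text tokens, rendered at 1024x1024 pixels.
print(vision_tokens(1024, 1024))             # 256
print(compression_ratio(2000, 1024, 1024))   # 7.8125, i.e. under 10x
```

Under the same assumptions, rendering that page at 512x512 would cost 64 vision tokens (about 31× compression), beyond the up-to-20× range quoted above, which shows how rendering resolution, not just text length, sets the ratio.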
Mansour @Mansourdam
Limitations: you're trusting Claude's judgment on which skills to load, and skills can execute code, so audit them thoroughly. The pattern (agents with filesystem access managing their own context through selective loading) is where agentic architectures are heading.