Ofer Mendelevitch
@ofermend
LLMs, RAG and AI Agents
California, USA · Joined December 2009
1.3K Following · 1.4K Followers
2.7K posts
OpenAI Developers @OpenAIDevs ·
We’re introducing GPT-5.4 mini and nano, our most capable small models yet. GPT-5.4 mini is more than 2x faster than GPT-5 mini. Optimized for coding, computer use, multimodal understanding, and subagents. For lighter-weight tasks, GPT-5.4 nano is our smallest and cheapest version of GPT-5.4. openai.com/index/introduc…
[image]
316 replies · 627 reposts · 6.5K likes · 749.5K views
OpenAI @OpenAI ·
GPT-5.4 mini is available today in ChatGPT, Codex, and the API. Optimized for coding, computer use, multimodal understanding, and subagents. And it’s 2x faster than GPT-5 mini. openai.com/index/introduc…
[image]
542 replies · 682 reposts · 6.3K likes · 1.5M views
Ofer Mendelevitch reposted
Poonam Soni @CodeByPoonam ·
🚨 Claude Code agents working alone were already scary. Now they can coordinate with each other. Thenvoi made it happen.
[image]
12 replies · 12 reposts · 57 likes · 12.1K views
OpenAI @OpenAI ·
GPT-5.4 is our most factual and efficient model: fewer tokens, faster speed. In ChatGPT, GPT-5.4 Thinking has improved deep web research and better context retention when it thinks for longer. You can now also interrupt the model and add instructions or adjust its direction mid-response. Steering is available this week on Android and web; iOS is coming soon.
139 replies · 217 reposts · 2.9K likes · 455.5K views
OpenAI @OpenAI ·
GPT-5.4 Thinking and GPT-5.4 Pro are rolling out now in ChatGPT. GPT-5.4 is also now available in the API and Codex. GPT-5.4 brings our advances in reasoning, coding, and agentic workflows into one frontier model.
[image]
1.9K replies · 3.3K reposts · 23.6K likes · 6.7M views
Sam Altman @sama ·
GPT-5.4 is launching: available now in the API and Codex, and rolling out over the course of the day in ChatGPT. It's much better at knowledge work and web search, and it has native computer use capabilities. You can steer it mid-response, and it supports 1M tokens of context.
[image]
2K replies · 1.2K reposts · 12.9K likes · 1.3M views
Ofer Mendelevitch @ofermend ·
Had a great time connecting with folks from @togethercompute today at the AI Native Conference. Glad to see so much innovation in research: FlashAttention-4, SSD, Together.compile, etc.
[two images]
0 replies · 0 reposts · 2 likes · 72 views
Tanishq Kumar @tanishqkumar07 ·
I've been working on a new LLM inference algorithm. It's called Speculative Speculative Decoding (SSD) and it's up to 2x faster than the strongest inference engines in the world. Collab w/ @tri_dao @avnermay. Details in thread.
132 replies · 454 reposts · 4K likes · 598.3K views
Sundar Pichai @sundarpichai ·
Gemini 3.1 Flash-Lite is the fastest and most cost-efficient Gemini 3 series model ⚡️ It outperforms 2.5 Flash with a 2.5x faster time to first answer token and a 45% increase in output speed, at a fraction of the cost of larger models!

Quoting koray kavukcuoglu @koraykv:
Gemini 3.1 Flash-Lite is available now! It takes an unbelievable amount of complex engineering to make AI feel instantaneous, enabling exciting new frontiers for experimentation!

211 replies · 268 reposts · 2.1K likes · 165.1K views
Jeff Dean @JeffDean ·
⚡ Excited to announce Gemini 3.1 Flash-Lite! We’ve set a new standard for efficiency and capability to give developers our fastest, most cost-effective Gemini 3 model yet. We engineered this model with thinking levels, allowing it to handle high-volume queries instantly while scaling up its reasoning for complex edge cases.
By the numbers:
⏱️ 2.5x faster time-to-first-token than 2.5 Flash, at significantly higher quality
📉 $0.25 per 1M input tokens
📊 1432 Elo on LMArena & 86.9% on GPQA Diamond
Thrilled to see what developers build with this kind of speed and quality at scale. Available now in Google AI Studio and Vertex AI. blog.google/innovation-and…
[image]
68 replies · 122 reposts · 1.3K likes · 116.4K views
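The "thinking levels" idea above (answer high-volume queries instantly, scale up reasoning for hard cases) can be sketched as a small dispatch layer. Everything here is illustrative: the payload fields, heuristic, and budget numbers are assumptions, not the actual Gemini API.

```python
# Illustrative sketch of a "thinking levels" dispatch: cheap queries take an
# instant (no-reasoning) path, hard ones get a larger reasoning budget.
# Field names and budgets are hypothetical, not the real Gemini API surface.

def thinking_level(query: str, hard_keywords=("prove", "derive", "debug")) -> str:
    """Pick a reasoning tier from a crude complexity heuristic."""
    if any(k in query.lower() for k in hard_keywords):
        return "high"        # scale up reasoning for complex edge cases
    if len(query.split()) > 50:
        return "medium"
    return "none"            # high-volume queries answered instantly

def build_request(query: str) -> dict:
    """Assemble a hypothetical generation request with a thinking budget."""
    budgets = {"none": 0, "medium": 1024, "high": 8192}
    return {
        "model": "gemini-3.1-flash-lite",
        "contents": query,
        "thinking_budget_tokens": budgets[thinking_level(query)],
    }
```

In a real serving stack the tier choice would come from a learned router or an explicit per-request parameter rather than keyword matching; the point is only that the budget is decided per query, not per model.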
Google DeepMind @GoogleDeepMind ·
Gemini 3.1 Flash-Lite has landed. It’s our most cost-efficient Gemini 3 series model yet, built for intelligence at scale. Here’s what’s new 🧵
342 replies · 879 reposts · 9K likes · 1.8M views
Ofer Mendelevitch @ofermend ·
@StefanoErmon Mercury 2 has now been added to the @vectara grounded hallucination leaderboard. It scores a 12.3% hallucination rate, almost the same as Opus 4.6 and better (lower) than GPT-5 and Gemini-3-pro. Hoping to see this improve even more in the next iteration. github.com/vectara/halluc…
0 replies · 0 reposts · 0 likes · 21 views
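A headline number like the 12.3% above is an aggregate over many judged examples. A minimal sketch of that aggregation, assuming per-summary factual-consistency scores as input (the real leaderboard derives these from Vectara's HHEM judge model; the 0.5 threshold here is an illustrative assumption):

```python
# Sketch of how a grounded-hallucination rate is aggregated: judge each
# (source, summary) pair for factual consistency, then report the fraction
# judged inconsistent. Scores here are stand-ins for a judge model's output.

def hallucination_rate(consistency_scores, threshold=0.5):
    """Fraction of summaries whose consistency score falls below threshold."""
    flagged = sum(1 for s in consistency_scores if s < threshold)
    return flagged / len(consistency_scores)
```

With scores [0.9, 0.2, 0.8, 0.95], one of four summaries is flagged, giving a 25% rate; the leaderboard does the same over its full test set.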
Stefano Ermon @StefanoErmon ·
Mercury 2 is live 🚀🚀 The world’s first reasoning diffusion LLM, delivering 5x faster performance than leading speed-optimized LLMs. Watching the team turn years of research into a real product never gets old, and I’m incredibly proud of what we’ve built. We’re just getting started on what diffusion can do for language.
320 replies · 587 reposts · 4.2K likes · 976.1K views
Ofer Mendelevitch @ofermend ·
MCP has had a rough first year: vulnerabilities, real-world data exfiltration, and the first malicious server in the wild. If you’re running AI agents with MCP in production (or planning to), you should know what the risks are. vectara.com/blog/mcps-rapi…
0 replies · 0 reposts · 1 like · 53 views
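One concrete mitigation for the malicious-server risk above is to pin the servers an agent may connect to. A minimal sketch, assuming you vet each server package ahead of time and record its checksum; the manifest shape and helper names are illustrative, not part of the MCP spec:

```python
# Illustrative guardrail for MCP in production: only connect to servers on
# an allowlist, and reject packages whose checksum doesn't match the vetted
# build. The allowlist format here is hypothetical, not an MCP feature.

import hashlib

ALLOWED_SERVERS = {
    # server name -> sha256 hex digest of the package you audited
    "filesystem": "a3f1...",  # placeholder digest, fill in after vetting
}

def is_trusted(name: str, package_bytes: bytes) -> bool:
    """Reject unknown servers and tampered packages before connecting."""
    expected = ALLOWED_SERVERS.get(name)
    if expected is None:
        return False                                   # not on the allowlist
    return hashlib.sha256(package_bytes).hexdigest() == expected
```

This doesn't address prompt-injection through tool outputs, which needs separate sandboxing and output filtering; it only closes the "first malicious server" door.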
Ofer Mendelevitch @ofermend ·
Hey @bcherny, it just occurred to me that while @claudeai Code is thinking (right after I've typed in my guidance), it's the perfect time for some kind of "break reminder" to pop up and remind us all to take 30 seconds and stretch :)
0 replies · 0 reposts · 0 likes · 36 views
Qwen @Alibaba_Qwen ·
🚀 Introducing the Qwen 3.5 Medium Model Series: Qwen3.5-Flash · Qwen3.5-35B-A3B · Qwen3.5-122B-A10B · Qwen3.5-27B
✨ More intelligence, less compute.
• Qwen3.5-35B-A3B now surpasses Qwen3-235B-A22B-2507 and Qwen3-VL-235B-A22B — a reminder that better architecture, data quality, and RL can move intelligence forward, not just bigger parameter counts.
• Qwen3.5-122B-A10B and 27B continue narrowing the gap between medium-sized and frontier models — especially in more complex agent scenarios.
• Qwen3.5-Flash is the hosted production version aligned with 35B-A3B, featuring:
  – 1M context length by default
  – Official built-in tools
🔗 Hugging Face: huggingface.co/collections/Qw…
🔗 ModelScope: modelscope.cn/collections/Qw…
🔗 Qwen3.5-Flash API: modelstudio.console.alibabacloud.com/ap-southeast-1…
Try in Qwen Chat 👇
Flash: chat.qwen.ai/?models=qwen3.…
27B: chat.qwen.ai/?models=qwen3.…
35B-A3B: chat.qwen.ai/?models=qwen3.…
122B-A10B: chat.qwen.ai/?models=qwen3.…
Would love to hear what you build with it.
[image]
436 replies · 1.1K reposts · 8.1K likes · 4M views
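For the hosted Qwen3.5-Flash mentioned above, Alibaba's Model Studio exposes an OpenAI-compatible chat endpoint, so a request body looks like a standard chat-completions call. A minimal sketch of building that body; the model id is taken from the announcement and should be checked against the Model Studio docs before use:

```python
# Sketch of an OpenAI-compatible chat request body for the hosted
# Qwen3.5-Flash. This only constructs the payload; sending it requires
# the Model Studio endpoint URL and an API key from your account.

import json

def chat_payload(prompt: str, model: str = "qwen3.5-flash") -> dict:
    """Build a standard chat-completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

body = json.dumps(chat_payload("Summarize MoE routing in two sentences."))
```

Because the endpoint follows the OpenAI wire format, any OpenAI-compatible client library should work by pointing its base URL at Model Studio.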
Ofer Mendelevitch @ofermend ·
@StefanoErmon I’ve been waiting to see when the first diffusion-based LLM would be released! Will certainly test this out.
0 replies · 0 reposts · 0 likes · 14 views