Assoc. Prof. Dr. M. Umut Demirezen

44.9K posts


@udmrzn

Associate Professor of Computer Science & Engineering, Artificial Intelligence Researcher, #artificialintelligence, #deeplearning, #machinelearning, #genai

Turkiye · Joined February 2011
7.4K Following · 2.3K Followers
Assoc. Prof. Dr. M. Umut Demirezen retweeted
Chubby♨️ @kimmonismus
MYTHOS BENCHMARKS, OFFICIAL. HOLY MOLY Anthropic cooked!!
106
173
2.4K
321.4K
Assoc. Prof. Dr. M. Umut Demirezen retweeted
Alex Albert @alexalbert__
We released Claude Opus 4.6 just two months ago. Today we're sharing some info on our new model, Claude Mythos Preview.
772
1.1K
15.9K
2M
Assoc. Prof. Dr. M. Umut Demirezen retweeted
Yuchen Jin @Yuchenj_UW
Anthropic is truly unstoppable. Mythos is crushing Claude Opus 4.6 across every serious agentic coding benchmark. It has found vulnerabilities in the Linux kernel, a 27-year-old vulnerability in OpenBSD, and a 16-year-old vulnerability in FFmpeg. No wonder folks at big labs keep telling me AGI is already here.
115
84
1.5K
101.6K
Assoc. Prof. Dr. M. Umut Demirezen retweeted
Lisan al Gaib @scaling01
Mythos speeds up AI research by up to 400 times. A 300X speedup over the baseline requires 40 hours of work by a human expert. It also clears the >8h threshold of human-equivalent work time on ALL tasks!
39
136
1.4K
63.4K
Assoc. Prof. Dr. M. Umut Demirezen retweeted
Ben Sigman @bensig
Excited to announce a new open-source, free-to-use memory tool I have been developing with my good friend @MillaJovovich. The project is called MemPalace and it is an agentic memory tool that scored 100% on LongMemEval - the industry standard benchmark for memory… this is higher than any other published result - free or paid - and it is available now on GitHub. You can check out Milla’s video about it on her Instagram. I’ll also put some links in the comments below - please try it out, critique it, fork it, contribute to it - and join our discord.
140
332
3K
1.8M
Assoc. Prof. Dr. M. Umut Demirezen retweeted
Zhihu Frontier @ZhihuFrontier
🚀 DeepSeek is rolling out a limited V4 gray release. A new mode switcher now appears in the chat UI with three options: Fast Mode (default), Expert Mode and Vision Mode. 1️⃣ Fast Mode: • File uploads → text-only extraction • Likely a lightweight, low-latency model optimized for speed 2️⃣ Expert Mode: • No file uploads supported • Restriction likely for compute/cost control, since heavy models + file tokens are expensive • Likely routes to a larger, more powerful reasoning model 3️⃣ Vision Mode: • Enables multimodal inputs • Builds on earlier OCR tests • May signal DeepSeek’s multimodal capability is moving toward end users #DeepSeek #AI #LLM #Multimodal #AIGC #Tech
15
27
330
74.1K
Assoc. Prof. Dr. M. Umut Demirezen retweeted
Pan Lu @lupantech
Excited to share that OctoTools has been accepted to ACL 2026. 🐙 OctoTools is our training-free, extensible framework for tool-using agents on complex reasoning tasks. Grateful to the broader community for the support. Our GitHub repo has now reached 1.4K stars. 📣 Huge thanks to our amazing team: @chenbowen118, @ShengLiu_, @connect_thapa, and Joseph Boen. Special thanks to @james_y_zou. Code: github.com/octotools/octo… Project: octotools.github.io See you in San Diego!🏖️🌴 @aclmeeting #OctoTools #ACL2026
Pan Lu @lupantech

🐙 Introducing OctoTools: an agentic framework with extensible tools for complex reasoning! 🚀 🧵 🔗 Explore now: octotools.github.io OctoTools tackles challenges in complex reasoning, including visual understanding, domain knowledge retrieval, numerical reasoning, and multistep problem-solving. It introduces: 🔹 Standardized tool cards to encapsulate tool functionality 🔹 A planner for structured high-level & low-level planning 🔹 An executor to carry out tool usage Featured Highlights 💡 ✅ Standardized tool cards for seamless integration of new tools, with no framework changes needed (🔎 examples: octotools.github.io/#tool-cards) ✅ Planner + Executor for structured high-level & low-level decision-making ✅ Diverse tools: visual perception, math, web search, specialized tools & more ✅ Long CoT reasoning with test-time optimization: planning, tool use, verification, re-evaluation & beyond (🔎 examples: octotools.github.io/#visualization) ✅ Training-free & LLM-friendly: easily extend with the latest models ✅ Task-specific toolset optimization: select an optimized subset of tools for better performance 📊 Performance: OctoTools achieves generalizable gains across 16 tasks, outperforming: 📈 GPT-4o (+9.3%) 📈 AutoGen (+10.6%) 📈 GPT-4o Functions (+7.5%) 📈 LangChain (+7.3%) 🤗 Try the live demo (supported by @huggingface @_akhaliq): huggingface.co/spaces/octotoo… 🐙 OctoTools in action on diverse real-world examples: ✅ How many r letters are in the word strawberry? ✅ What's up with the upcoming Apple Launch? Any rumors? (credit: @karpathy) ✅ Which is bigger, 9.11 or 9.9? ✅ Solve game of 24 with [1,1,6,9] ✅ Research trends in tool agents with LLMs for scientific discovery from ArXiv, PubMed, and Nature ✅ How many baseballs are there? (visual perception, GPT-4o ❌) ✅ What is the organ on the left side of this image? (radiology, GPT-4o ❌) ✅ What are the cell types in this image? (pathology, GPT-4o ❌) ... and more! Dive deep into OctoTools: 📄 Read our 89-page paper: arxiv.org/abs/2502.11271 💻 Explore the codebase: github.com/octotools/octo… Huge thanks to our amazing team: @chenbowen118, @ShengLiu_, @connect_thapa, Joseph Boen! Special thanks to @james_y_zou, @StanfordHAI, @ChanZuckerberg for the support! 🙌 #Agent #LLMs #ToolUse #Reasoning #OctoTools

0
9
79
6.9K
Assoc. Prof. Dr. M. Umut Demirezen retweeted
Yukang Chen @yukangchen_
We’re thrilled to open-source TriAttention! 🚀 🦞 Deploy OpenClaw (32B LLM) on a single 24GB RTX 4090 locally 💻Full code open-source & vLLM-ready for one-click deployment ⚡️ 2.5× faster inference speed & 10.7× less KV cache memory usage TriAttention is a novel KV cache compression method built on rigorous trigonometric analysis in the Pre‑RoPE space for efficient LLM long reasoning. Github Repo: github.com/WeianMao/triat… Paper Link: huggingface.co/papers/2604.04… Homepage: weianmao.github.io/tri-attention-…
21
109
844
86.6K
Assoc. Prof. Dr. M. Umut Demirezen retweeted
Yacine Mahdid @yacinelearning
for those interested in distributed reinforcement learning I just finished a ~1h tutorial on the echo2 framework by @Gradient_HQ we check: - how to do async RL - infra split between rollout workers and centralized learner - interview with gradient cofounder eric yang himself!
11
39
303
9.5K
Assoc. Prof. Dr. M. Umut Demirezen retweeted
OpenClaw🦞 @openclaw
OpenClaw 2026.4.5 🦞 🎬 Built-in video + music generation 🧠 /dreaming is now real 🔀 Structured task progress ⚡ Better prompt-cache reuse 🌍 Control UI + Docs now speak 12 more languages Anthropic cut us off. GPT-5.4 got better. We moved on. github.com/openclaw/openc…
444
878
8.6K
1.8M
Assoc. Prof. Dr. M. Umut Demirezen retweeted
NVIDIA Robotics @NVIDIARobotics
🌱 AI-powered farming, without chemicals. @Aigenio’s solar-powered robots use vision AI to identify and remove weeds at the plant level—helping farmers reduce herbicides and adopt more sustainable practices. Powered by simulation, real-world data and edge AI, this is what physical AI looks like in agriculture. 📖 nvda.ws/4vfwRUc
63
420
1.9K
106.4K
Assoc. Prof. Dr. M. Umut Demirezen retweeted
levi @levidiamode
Day 93/365 of GPU Programming Studying parallelism today and stumbled upon this incredible blog post/book The Ultra-Scale Playbook: Training LLMs on GPU Clusters by Hugging Face that dives deep into data parallelism, expert parallelism, tensor parallelism, pipeline parallelism and context parallelism. I've read a bit about each of these methodologies before but this is the best resource I've found that really pieces them all together into a unified coherent picture. Kinda like its name implies, the team goes into actual empirical examples based on the 4000 scaling experiments (across up to 512 GPUs!) they conducted. E.g. how does tensor parallelism reduce activation memory for matmuls but still require gathering full activations for LayerNorm? When does pipeline parallelism's bubble overhead outweigh its memory savings? When and why would you combine TP/PP/DP on a specific cluster topology? What's the real memory breakdown between params, gradients, optimizer states and activations and which parallelism strategy targets which? et cetera Also loved all the beautiful and sometimes interactive diagrams that reminded me of distill.pub (which makes sense given they used distill's template to create the post). I wish more blog posts in ML would use a similar approach to help visual learners understand the content at an intuitive level. Especially now that rich visualizations/animations are so easy to spin up with LLMs. Really wonderful work by @Nouamanetazi @FerdinandMom @xariusrke @mekkcyber @lvwerra @Thom_Wolf. In times when things are going more and more closed source, this is such a good example of what great open source AI education and research can look like.
levi @levidiamode

Day 92/365 of GPU Programming Taking a closer look at disaggregated LLM inference today, which I've been wanting to survey more after listening to the Dean <> Daly discussion at GTC. The best resource I found on the topic was this great talk by @Junda_Chen_ on the past, present and future of prefill decode disaggregation. In the lecture, Junda goes through Nvidia's dynamo, the intrinsic tradeoff spectrum between throughput & latency, TTFT, TPOT, the "goodput" metric, distinct characteristics between prefill vs decode, chunking P&D, the problem of interference, pipeline parallelism, resource & parallelism coupling, disaggregation and DistServe.

4
58
481
30.4K
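
One of the questions quoted in the post above (why tensor parallelism saves activation memory for matmuls yet still needs full activations for LayerNorm) can be illustrated with a toy. The sketch below is not taken from the Ultra-Scale Playbook; it uses plain Python lists, hypothetical helper names, and assumes the number of weight columns divides evenly across shards. Each "device" computes only its slice of the output columns, but LayerNorm needs the mean and variance over the whole hidden dimension, so the slices are concatenated first (a stand-in for an all-gather).

```python
import math

def matmul(x, w):
    """Multiply x (rows x k) by w (k x cols); both are nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]

def split_columns(w, shards):
    """Give each hypothetical device an equal slice of the weight columns.

    Assumes len(w[0]) is divisible by shards.
    """
    step = len(w[0]) // shards
    return [[row[i * step:(i + 1) * step] for row in w] for i in range(shards)]

def layer_norm(row, eps=1e-5):
    """Normalize one row using the mean and variance of the full hidden dimension."""
    mean = sum(row) / len(row)
    var = sum((v - mean) ** 2 for v in row) / len(row)
    return [(v - mean) / math.sqrt(var + eps) for v in row]

def column_parallel_forward(x, w, shards):
    # Each device computes and stores only its slice of the output columns.
    partials = [matmul(x, w_shard) for w_shard in split_columns(w, shards)]
    # Stand-in for an all-gather: concatenate the slices row by row.
    gathered = [sum((p[r] for p in partials), []) for r in range(len(x))]
    # LayerNorm statistics span every column, so it runs on the gathered rows.
    return [layer_norm(row) for row in gathered]
```

Calling `column_parallel_forward(x, w, 2)` should match applying `layer_norm` to each row of the unsharded `matmul(x, w)`, which is the point: the matmul shards cleanly, the normalization does not.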
Assoc. Prof. Dr. M. Umut Demirezen retweeted
Pan Lu @lupantech
🔥Introducing #AgentFlow, a new trainable agentic system where a team of agents learns to plan and use tools in the flow of a task. 🌐agentflow.stanford.edu 📄huggingface.co/papers/2510.05… AgentFlow unlocks full potential of LLMs w/ tool-use. (And yes, our 3/7B model beats GPT-4o)👇 🧩A team of four specialized agents coordinates via shared memory: Planner: plan reasoning & tool calls 🧭 Executor: invoke tools & actions 🛠 Verifier: check memory status ✅ Generator: produce final results ✍️ 💡The Magic: 🌀💫 AgentFlow directly optimizes its Planner agent live, inside the system, using our new method, Flow-GRPO (Flow-based Group Refined Policy Optimization). This is "in-the-flow" reinforcement learning. 📊The Results: AgentFlow (7B backbone) outperforms top baselines on 10 benchmarks, with average gains of: +14.9% on search 🔍 +14.0% on agentic 🤖 +14.5% on math ➗ +4.1% on science 🔬 🏆It even surpasses larger-scale models like Llama-3.1-405B and GPT-4o (~200B). Try it yourself! 🛠️Code: github.com/lupantech/Agen… 🚀Demo: huggingface.co/spaces/AgentFl… 🤖Model: huggingface.co/AgentFlow/mode… 📊Visual: agentflow.stanford.edu/#visualization 💬Join our Slack: join.slack.com/t/agentflow-co… #agentic #llms #RL #tooluse
33
258
1.1K
114.6K
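
The four-role loop described in the post above (Planner, Executor, Verifier, Generator coordinating through shared memory) can be sketched as a toy. This is not AgentFlow's code or API: every name below is hypothetical, the "tools" are trivial string functions, and a fixed rule stands in for the Planner policy that the post says is trained with Flow-GRPO. It only shows how the roles might hand off through one shared record list.

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    """Shared memory that every role reads from and appends to."""
    query: str
    records: list = field(default_factory=list)

# Hypothetical toolset; real systems would wrap search, code execution, etc.
TOOLS = {
    "word_count": lambda text: len(text.split()),
    "char_count": lambda text: len(text),
}

def planner(memory):
    """Pick the next unused tool; a fixed rule stands in for a learned policy."""
    used = {record["tool"] for record in memory.records}
    for name in TOOLS:
        if name not in used:
            return name
    return None

def executor(tool_name, memory):
    """Run the chosen tool on the query."""
    return TOOLS[tool_name](memory.query)

def verifier(memory):
    """Check memory status: stop once every tool has contributed a record."""
    return len(memory.records) == len(TOOLS)

def generator(memory):
    """Produce the final answer from everything stored in memory."""
    return ", ".join(f"{r['tool']}={r['result']}" for r in memory.records)

def run(query, max_turns=5):
    memory = Memory(query)
    for _ in range(max_turns):
        tool = planner(memory)
        if tool is None:
            break
        memory.records.append({"tool": tool, "result": executor(tool, memory)})
        if verifier(memory):
            break
    return generator(memory)
```

For example, `run("plan and use tools")` returns `"word_count=4, char_count=18"`. The training step (optimizing the planner inside the loop) is deliberately left out of this sketch.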
Assoc. Prof. Dr. M. Umut Demirezen retweeted
James Zou @james_y_zou
Training multi-agent teams is hard. #AgentFlow comes to the rescue. We introduce Flow-GRPO, an efficient method to train multi-agent teams. Improves planning and tool use. Selected as an #ICLR2026 Oral (top 1%)🚀
Pan Lu @lupantech

🔥Introducing #AgentFlow, a new trainable agentic system where a team of agents learns to plan and use tools in the flow of a task. 🌐agentflow.stanford.edu 📄huggingface.co/papers/2510.05… AgentFlow unlocks full potential of LLMs w/ tool-use. (And yes, our 3/7B model beats GPT-4o)👇 🧩A team of four specialized agents coordinates via shared memory: Planner: plan reasoning & tool calls 🧭 Executor: invoke tools & actions 🛠 Verifier: check memory status ✅ Generator: produce final results ✍️ 💡The Magic: 🌀💫 AgentFlow directly optimizes its Planner agent live, inside the system, using our new method, Flow-GRPO (Flow-based Group Refined Policy Optimization). This is "in-the-flow" reinforcement learning. 📊The Results: AgentFlow (7B backbone) outperforms top baselines on 10 benchmarks, with average gains of: +14.9% on search 🔍 +14.0% on agentic 🤖 +14.5% on math ➗ +4.1% on science 🔬 🏆It even surpasses larger-scale models like Llama-3.1-405B and GPT-4o (~200B). Try it yourself! 🛠️Code: github.com/lupantech/Agen… 🚀Demo: huggingface.co/spaces/AgentFl… 🤖Model: huggingface.co/AgentFlow/mode… 📊Visual: agentflow.stanford.edu/#visualization 💬Join our Slack: join.slack.com/t/agentflow-co… #agentic #llms #RL #tooluse

2
35
187
24.4K
Assoc. Prof. Dr. M. Umut Demirezen retweeted
Mathematica @mathemetica
Diffusion (stochastic SDE sampler): erratic Brownian trajectories zigzagging through noise. Flow Matching (deterministic ODE integrator): clean, straight-line paths to the data modes. Same start, radically different dynamics.
11
184
1.7K
225.3K
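
The contrast in the post above (a stochastic SDE sampler versus a deterministic ODE integrator) can be reproduced in one dimension. The sketch below is a toy under stated assumptions, not a trained diffusion or flow-matching model: there is a single known target point, the drift `(target - x) / (1 - t)` is written down analytically instead of being learned, and the function names are hypothetical. With `sigma = 0` this is plain Euler integration and the path is a straight line; with `sigma > 0` it is Euler-Maruyama on a Brownian bridge, which lands near the same target but zigzags on the way.

```python
import math
import random

def integrate(x0, target, steps, sigma, rng):
    """Integrate dx = (target - x) / (1 - t) dt + sigma dW from t = 0 to t = 1.

    sigma = 0 gives a deterministic ODE (Euler); sigma > 0 gives an SDE
    (Euler-Maruyama). Returns the list of visited positions.
    """
    dt = 1.0 / steps
    x, path = x0, [x0]
    for k in range(steps):
        t = k * dt  # largest t is 1 - dt, so the drift never divides by zero
        drift = (target - x) / (1.0 - t)
        noise = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0) if sigma > 0 else 0.0
        x = x + drift * dt + noise
        path.append(x)
    return path

def path_length(path):
    """Total distance travelled, counting every back-and-forth move."""
    return sum(abs(b - a) for a, b in zip(path, path[1:]))
```

Comparing `path_length(integrate(-2.0, 3.0, 200, 0.0, rng))` with the same call at `sigma=1.0` shows the difference: the deterministic path covers exactly the distance between start and target, while the noisy path travels noticeably farther to reach roughly the same place.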