Marktechpost AI

13.2K posts

@Marktechpost

🐝 AI Dev News Platform (1 million+ monthly traffic) | 150k+ AI subreddit | Contact: [email protected]

What is trending in AI?
Joined April 2016
1.1K Following, 11.2K Followers
Marktechpost AI (@Marktechpost):
Together AI Open-Sources OSCAR: An Attention-Aware 2-Bit KV Cache Quantization System for Long-Context LLM Serving

OSCAR is a 2-bit KV cache quantization system for long-context LLM serving. Most INT2 methods collapse to zero accuracy. This one doesn't. Here's what's actually interesting:

Problem with existing approaches
Generic Hadamard rotations spread outlier energy across channels. But they're data-oblivious. They don't know which directions attention actually reads. At INT2, that distinction collapses models completely.

What OSCAR does differently
Two separate rotations, both derived from attention statistics:
→ Keys: rotated using query covariance Q⊤Q
→ Values: rotated using score-weighted value covariance V⊤S⊤SV
Quantization noise gets pushed into directions attention is least sensitive to.

Accuracy at 2.28 bits per KV element
→ Qwen3-4B-Thinking: −3.78 pts vs BF16 (naive INT2 = 0.00)
→ Qwen3-8B: −1.42 pts vs BF16
→ Qwen3-32B: −0.02 pts vs BF16
→ GLM-4.7-FP8 (358B): +0.27 pts vs BF16

System-level numbers
→ ~8× KV memory reduction vs BF16
→ 3.08× decode speedup at 100K context, batch size 1
→ 7.83× job-level throughput at batch size 32 on GLM-4.7-FP8
→ Scales to 256 concurrent requests on a single H100 (80GB)

RotationZoo
Pre-computed rotation matrices for Qwen3-4B/8B/32B, GLM-4.7-FP8, and MiniMax-M2.7 are available on ModelScope. No task-specific recalibration needed. Already integrated into SGLang.

Full analysis: marktechpost.com/2026/05/25/tog…
Paper: arxiv.org/pdf/2605.17757…
Repo: github.com/FutureMLS-Lab/…
Modelscope page: modelscope.cn/models/togethe…
@togethercompute
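To make the key-side rotation idea concrete, here is a minimal toy sketch, not OSCAR's implementation. It assumes the rotation is the eigenbasis of Q⊤Q, uses a made-up head size of 2 so the eigenvectors have a closed form, and quantizes each rotated channel across tokens to 4 levels. The helper names, the toy data, and the per-channel grouping are all assumptions for illustration. The value-side rotation and the 2.28-bit allocation from the post are not reproduced.

```python
import math

def query_covariance(queries):
    """Entries (a, b, c) of the symmetric 2x2 matrix Q^T Q = [[a, b], [b, c]]."""
    a = sum(q[0] * q[0] for q in queries)
    b = sum(q[0] * q[1] for q in queries)
    c = sum(q[1] * q[1] for q in queries)
    return a, b, c

def rotation_from_covariance(a, b, c):
    """Orthogonal 2x2 matrix whose columns are eigenvectors of [[a, b], [b, c]]."""
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [[cos_t, -sin_t], [sin_t, cos_t]]

def rotate(x, R):
    """Row vector times matrix: x @ R."""
    return [x[0] * R[0][0] + x[1] * R[1][0], x[0] * R[0][1] + x[1] * R[1][1]]

def dot(x, y):
    return x[0] * y[0] + x[1] * y[1]

def quantize_2bit(values):
    """Uniform 2-bit quantization: codes in 0..3 plus the (offset, scale) to undo it."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 3.0 or 1.0  # fall back to 1.0 when all values are equal
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale

def dequantize(codes, lo, scale):
    return [lo + c * scale for c in codes]

# Toy data, made up for illustration only.
queries = [[4.0, 1.0], [3.0, 0.5], [-5.0, -1.0]]
keys = [[0.2, 3.0], [1.0, -2.0], [0.5, 0.1]]

R = rotation_from_covariance(*query_covariance(queries))
rot_keys = [rotate(k, R) for k in keys]

# Quantize each rotated channel across tokens (assumed grouping).
channels = [quantize_2bit(list(col)) for col in zip(*rot_keys)]
deq_keys = [list(row) for row in zip(*(dequantize(*ch) for ch in channels))]

for q in queries:
    q_rot = rotate(q, R)
    exact = [dot(q, k) for k in keys]          # scores with full-precision keys
    approx = [dot(q_rot, k) for k in deq_keys]  # scores with rotated 2-bit keys
    print([round(e, 2) for e in exact], [round(s, 2) for s in approx])
```

Because R is orthogonal, rotating queries and keys by the same matrix leaves the attention scores unchanged before quantization. The rotation only changes which directions absorb the rounding error.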
Marktechpost AI (@Marktechpost):
NVIDIA just dropped Gated DeltaNet-2. Here's what's actually interesting about it.

Linear attention squeezes the unbounded KV cache into a fixed-size recurrent state. The hard part isn't what to forget. It's how to edit that compressed memory without scrambling the associations already in it.

Prior delta-rule models like Gated DeltaNet and KDA use one scalar gate to do two different jobs at once: erasing old content on the key side, writing new content on the value side. Those two decisions act on different axes of the state, so tying them together is a real limitation.

Gated DeltaNet-2 decouples them.

1. Channel-wise erase gate b_t
→ Picks which key-side coordinates of the decayed state are read and removed

2. Channel-wise write gate w_t
→ Picks which value-side coordinates of the new content are committed

3. Strict generalization
→ Recovers KDA exactly when both gates collapse to one scalar
→ Recovers Gated DeltaNet when the decay collapses too

4. Still trains fast
→ Chunkwise WY algorithm with channel-wise decay absorbed into asymmetric erase factors
→ Gate-aware backward fused in Triton

Trained at 1.3B parameters on 100B FineWeb-Edu tokens, matched in recurrent state size against Mamba-2, Gated DeltaNet, KDA, and Mamba-3:
→ Best language modeling + commonsense average in both recurrent and hybrid settings
→ S-NIAH-3 at 2K (recurrent): KDA 63.2 → GDN-2 89.8
→ MK-NIAH-1 at 4K (recurrent): KDA 28.0 → GDN-2 37.8

Full analysis: marktechpost.com/2026/05/24/nvi…
Paper: github.com/NVlabs/GatedDe…
Repo: github.com/NVlabs/GatedDe…
@nvidia @NVIDIAAI
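The erase/write split can be sketched as a single-step state update. This is one plausible reading of the description above, not the paper's exact recurrence. The state layout (value rows by key columns), the function names, and where each gate is applied are assumptions for illustration. The sketch only shows that a channel-wise version collapses to a scalar-gated delta rule when every gate entry is the same number.

```python
def matvec(S, x):
    """Multiply state S (value rows x key columns) by a key-side vector x."""
    return [sum(s * xi for s, xi in zip(row, x)) for row in S]

def scalar_update(S, k, v, alpha, beta):
    """Scalar-gated delta rule: S <- alpha*S*(I - beta*k k^T) + beta*v k^T."""
    S_dec = [[alpha * s for s in row] for row in S]
    read = matvec(S_dec, k)  # what key k currently retrieves from the decayed state
    return [[s + beta * (vi - r) * kj for s, kj in zip(row, k)]
            for row, vi, r in zip(S_dec, v, read)]

def decoupled_update(S, k, v, decay, erase, write):
    """Hypothetical channel-wise variant with separate erase and write gates.

    decay and erase have one entry per key channel; write has one per value channel.
    """
    S_dec = [[s * a for s, a in zip(row, decay)] for row in S]
    # Erase gate selects which key-side coordinates are read and removed.
    read = matvec(S_dec, [b * kj for b, kj in zip(erase, k)])
    # Write gate selects which value-side coordinates of the new content are committed.
    committed = [w * vi for w, vi in zip(write, v)]
    return [[s - r * kj + c * kj for s, kj in zip(row, k)]
            for row, r, c in zip(S_dec, read, committed)]

# Toy state with 2 value channels and 3 key channels (made-up numbers).
S = [[0.5, -0.2, 0.1], [0.0, 0.3, 0.4]]
k = [0.6, 0.0, 0.8]
v = [1.0, -1.0]

collapsed = decoupled_update(S, k, v, [0.9] * 3, [0.5] * 3, [0.5] * 2)
reference = scalar_update(S, k, v, 0.9, 0.5)
print(collapsed)
print(reference)
```

With uniform gates the two updates agree up to floating-point rounding. With non-uniform gates, the erase decision and the write decision can differ per channel, which is the decoupling the post describes.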
Marktechpost AI (@Marktechpost):
Microsoft Research Releases Webwright: A Terminal-Native Web Agent Framework That Scores 60.1% on Odysseys, Up from Base GPT-5.4’s 33.5%

Most web agents today predict one browser action at a time: click, type, scroll, repeat. Webwright takes a different approach. It gives the model a terminal and lets it write Playwright code to control the browser. Here's what's actually interesting:

1. The architecture is unusually small
~1,000 lines of code. Three modules. No multi-agent orchestration. One agent loop. Most web agent frameworks bury the agent logic under layers of abstraction. Webwright doesn't.

2. The benchmark results are strong:
→ 86.7% on Online-Mind2Web (300 tasks, 136 live sites), the highest among open-sourced harnesses in the AutoEval category
→ 60.1% on Odysseys (long-horizon tasks), up from 33.5% with base GPT-5.4
→ That's a 26.6-point improvement using the same model, just a different interaction paradigm

3. Browsing history becomes code
Every completed task produces a reusable CLI script. Instead of rediscovering a workflow each time, you build a library. The same scripts run in Claude Code, Codex, and OpenClaw.

4. Small models can compete with tool augmentation
Qwen3.5-9B hits 66.2% on the hard split of Online-Mind2Web when given pre-built tool scripts. That's a practical finding for teams working with lower-cost inference.

5. Cost matters
→ GPT-5.4: $2.37 avg per task
→ Claude Opus 4.7: $6.09 avg per task
Claude uses fewer steps (21.9 vs 26.3 mean) but the pricing difference flips the cost equation.

Full analysis: marktechpost.com/2026/05/24/mic…
Repo: github.com/microsoft/Webw…
Technical details: microsoft.com/en-us/research…
video credit: microsoft
@Microsoft @MSFTResearch
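The arithmetic behind the Odysseys gain and the cost comparison can be checked in a few lines. This assumes the 26.3-step mean belongs to GPT-5.4 (the post only says Claude is the 21.9) and that dividing average cost per task by mean steps is an acceptable rough per-step figure. Neither is stated in the post.

```python
# Figures quoted in the post.
odysseys_webwright = 60.1
odysseys_base = 33.5
gain_points = round(odysseys_webwright - odysseys_base, 1)

avg_cost_per_task = {"GPT-5.4": 2.37, "Claude Opus 4.7": 6.09}
mean_steps = {"GPT-5.4": 26.3, "Claude Opus 4.7": 21.9}  # assumed mapping for GPT-5.4

# Rough per-step cost: ratio of the two reported averages.
cost_per_step = {m: avg_cost_per_task[m] / mean_steps[m] for m in avg_cost_per_task}

print(f"Odysseys gain: {gain_points} points")
for model, cost in cost_per_step.items():
    print(f"{model}: about ${cost:.3f} per step")
```

Under these assumptions, fewer steps do not offset the higher price: the estimated per-step cost for Claude Opus 4.7 is roughly three times that of GPT-5.4.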
Ali Hatamizadeh (@ahatamiz1):
Gated DeltaNet-2 is here. 🚀

🔥 New paper: Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

Gated DeltaNet-2 outperforms KDA and Mamba-3, the latest and best recurrent architectures, head to head at 1.3B. 🏆

💡 Here's the idea behind it:
Linear attention squeezes an unbounded KV cache into a fixed-size recurrent state. The hard part isn't just what to forget, it's how to edit that memory without scrambling the associations already in it.

Prior delta-rule models like Gated DeltaNet and KDA use one scalar gate to do two jobs at once: erasing old content and writing new content. But these two decisions act on different axes of the state, so tying them together is a real limitation.

Gated DeltaNet-2 decouples them.
✂️ a channel-wise erase gate b_t picks which key-side coordinates to read and remove
✍️ a channel-wise write gate w_t picks which value-side coordinates to commit
🔁 recovers KDA when both gates collapse to a scalar, and Gated DeltaNet when the decay collapses too
⚡ still trains fast: chunkwise WY algorithm with gate-aware backward, fused in Triton

📊 Results:
We train 1.3B models on 100B tokens of FineWeb-Edu, matched in recurrent state size, against Mamba-2, Gated DeltaNet, KDA, and Mamba-3.
Best average on language modeling + commonsense reasoning, in both recurrent and hybrid settings.
Biggest gains on long-context RULER retrieval. S-NIAH-3 jumps from 63 to 90 over KDA, and multi-key needle retrieval climbs from 28 to 38.

Joint work with @YejinChoinka and @jankautz.

📄 Paper: shorturl.at/AAlVb
💻 Code: github.com/NVlabs/GatedDe…
#LinearAttention #StateSpaceModels #Mamba #LLM
Marktechpost AI (@Marktechpost):
Perplexity just open-sourced an internal security tool they've been running in production. It's called 'Bumblebee'. Here's what's actually interesting:

1. It solves a specific blind spot
SBOMs cover build artifacts. EDR covers running processes. Neither tells you what's installed on a developer's laptop right now. Bumblebee does exactly that, and nothing more.

2. The read-only design is the key decision
npm packages can carry postinstall scripts that execute automatically on install. Most recent supply-chain worms spread that way. A scanner that invokes npm to check exposure has already triggered the attack. Bumblebee reads metadata directly (lockfiles, manifests, extension manifests) and never runs any code.

3. Four surfaces in one scan
→ Language package managers: npm, pnpm, Yarn, Bun, PyPI, Go modules, RubyGems, Composer
→ AI agent configs: MCP JSON host files including claude_desktop_config.json and cline_mcp_settings.json
→ Editor extensions: VS Code, Cursor, Windsurf, VSCodium
→ Browser extensions: Chrome, Edge, Brave, Arc, Comet, Firefox

4. The internal workflow is worth noting
Perplexity Computer drafts a catalog entry when a threat signal lands → human reviews and merges the PR → Bumblebee runs on endpoints → findings go to the security team. Human in the loop before anything hits machines.

5. Technical details
→ Written in Go 1.25+, zero non-stdlib dependencies
→ Single static binary, three scan profiles: baseline, project, deep
→ Outputs NDJSON records with confidence levels (high / medium / low)
→ Apache 2.0, current release v0.1.1

Full analysis: marktechpost.com/2026/05/23/per…
Repo: github.com/perplexityai/b…
Technical details: perplexity.ai/hub/blog/perpl…
@perplexity_ai
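The read-only idea can be sketched in a few lines: parse the lockfile as data and never call the package manager. This is an illustrative Python sketch, not Bumblebee's code (which the post says is written in Go). It assumes npm's package-lock.json layout with a top-level "packages" object keyed by node_modules paths. The catalog format, the package name, and the NDJSON field names are hypothetical.

```python
import json

# Hypothetical threat catalog: package name -> versions flagged as risky.
CATALOG = {"left-pad-evil": {"1.0.1"}}

def scan_lockfile_text(lock_text, catalog, source="package-lock.json"):
    """Yield one NDJSON line per catalog hit, reading the lockfile as plain JSON.

    Nothing is installed or executed, so postinstall scripts never run.
    """
    lock = json.loads(lock_text)
    for path, meta in lock.get("packages", {}).items():
        if not path:  # the "" key describes the root project itself
            continue
        # "node_modules/a/node_modules/@scope/b" -> "@scope/b"
        name = path.rsplit("node_modules/", 1)[-1]
        version = meta.get("version")
        if version in catalog.get(name, ()):
            yield json.dumps({
                "ecosystem": "npm",      # hypothetical record fields
                "package": name,
                "version": version,
                "source": source,
                "confidence": "high",    # exact name and version match
            })

# Made-up lockfile content for the demo.
LOCK_TEXT = json.dumps({
    "lockfileVersion": 3,
    "packages": {
        "": {"name": "demo-app", "version": "0.0.1"},
        "node_modules/left-pad-evil": {"version": "1.0.1"},
        "node_modules/lodash": {"version": "4.17.21"},
    },
})

for line in scan_lockfile_text(LOCK_TEXT, CATALOG):
    print(line)
```

Emitting one JSON object per line keeps the output appendable and easy to stream, which fits the NDJSON format the post mentions.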
Marktechpost AI (@Marktechpost):
How CopilotKit Is Redefining the Agentic AI Stack in 2026

For years, AI inside software meant a chat widget bolted onto the corner of an application. You typed, the model responded with text, and you manually translated that output into whatever you actually needed it to do. It was useful the way a calculator is useful: functional, but fundamentally passive.

CopilotKit, a Seattle-based startup co-founded by Atai Barkai and Uli Barkai, has spent the last two years arguing that the model is broken, and in 2026 the developer community is agreeing loudly.

- AG-UI completes the agentic protocol stack by handling the agent-to-UI interaction layer that MCP and A2A leave unaddressed, with first-party SDKs across LangGraph, CrewAI, Mastra, Agno, and Pydantic AI, and community SDKs now live for Go, Kotlin, Dart, Java, Rust, Ruby, and C++.
- AIMock ships one zero-dependency mock server for the entire agentic call chain (11 LLM providers, MCP, A2A, vector DBs, search), with record-and-replay, daily drift detection, and chaos testing built in.
- Pathfinder is a self-hosted MCP knowledge server that indexes docs, code, Notion pages, Slack, and Discord into hybrid vector-keyword search, with pluggable embeddings that need no external API key.
- The three tools together target the three production blockers (knowledge retrieval, testing reliability, and runtime persistence) that demo-quality agents consistently fail to address.
- CopilotKit's vendor-neutral, self-hostable design means teams can adopt any single layer without being locked into a proprietary runtime or forced to rebuild their existing stack.

Full analysis: marktechpost.com/2026/05/21/how…
GitHub repo: github.com/ag-ui-protocol…
@CopilotKit #ai #aiagent #agenticai
Marktechpost AI (@Marktechpost):
🔥 @NousResearch RELEASES a new method called Contrastive Neuron Attribution (CNA) that identifies and ablates sparse MLP circuits in LLMs: no SAE training, no weight modification, no gradient computation.

Key results on JBB-Behaviors (100 harmful prompts):
> Qwen2.5-7B: 87% → 2% refusal
> Llama-3.1-70B: 86% → 18% refusal
> Output quality stays above 0.97 at all steering strengths
> MMLU accuracy stays within 1 point of baseline

Only 0.1% of MLP activations are ablated. Tested across Llama 3.1/3.2 and Qwen 2.5, from 1B to 72B parameters.

The same late-layer neuron structure exists in base models before fine-tuning. Applying the method to base models produces no behavioral change, only content shifts. Fine-tuning doesn't build new structure. It transforms the function of pre-existing late-layer neurons into a refusal gate. 👀

x.com/NousResearch/s…
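The post does not spell out the attribution rule, so the following is only a toy sketch of what a gradient-free, contrast-based selection could look like. It assumes neurons are ranked by the gap between their mean activation on two contrasting prompt sets and that the top fraction is zeroed at inference time. The function names, the ranking rule, and the toy numbers are assumptions, not the CNA method as published.

```python
def mean_activation(rows):
    """Per-neuron mean over a list of activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def select_neurons(acts_a, acts_b, fraction):
    """Indices of the neurons whose mean activation differs most between two sets."""
    mean_a, mean_b = mean_activation(acts_a), mean_activation(acts_b)
    gaps = [abs(a - b) for a, b in zip(mean_a, mean_b)]
    k = max(1, round(fraction * len(gaps)))
    ranked = sorted(range(len(gaps)), key=lambda i: gaps[i], reverse=True)
    return set(ranked[:k])

def ablate(activation, neurons):
    """Zero the selected neurons; weights are untouched and no gradients are needed."""
    return [0.0 if i in neurons else x for i, x in enumerate(activation)]

# Made-up activations: 2 prompts per set, 4 neurons.
acts_a = [[0.1, 2.0, 0.3, 0.0], [0.2, 1.8, 0.2, 0.1]]
acts_b = [[0.1, 0.1, 0.3, 0.0], [0.2, 0.0, 0.3, 0.1]]

chosen = select_neurons(acts_a, acts_b, fraction=0.25)
print(chosen)
print(ablate([1.0, 2.0, 3.0, 4.0], chosen))
```

Only forward-pass statistics are used here, which is consistent with the post's "no gradient computation" claim. A real setup would collect the activations with forward hooks on the MLP layers.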
Perplexity (@perplexity_ai):
Today we're open-sourcing Bumblebee, a read-only scanner for macOS and Linux. It checks developer machines for risky packages, extensions, and AI tool configs. Connected to Computer, it can trigger deeper scans whenever a new supply-chain risk emerges.

github.com/perplexityai/b…
Marktechpost AI reposted
Pavlo Molchanov (@PavloMolchanov):
Want to know how @karpathy's Autoresearch + Nanochat (130k+ ⭐ combined) curate training data?

It’s powered by our NeurIPS 2025 work: Nemotron-CLIMB: research.nvidia.com/labs/lpr/climb/

We show that data filtering, quality and distribution matter way more than quantity.

Best part? You can now run the exact same CLIMB curation pipeline on your own data with NVIDIA NeMo Curator.

👉 Full tutorial + code: github.com/NVIDIA-NeMo/Cu…
📄 Paper: arxiv.org/abs/2504.13161
📊 ClimbMix dataset: huggingface.co/datasets/nvidi…
Learn more about Nemotron data: huggingface.co/blog/nvidia/op…
Watch description, video, slides: neurips.cc/virtual/2025/l…

Excited to see what you build with it!

TIL: Gemini is amazing at videos!
Marktechpost AI (@Marktechpost):
Most agent frameworks today are stitching together reasoning models with external orchestration layers. Qwen3.7-Max takes a different position: train the agent capability into the model itself.

Alibaba just introduced Qwen3.7-Max. Here's what's actually interesting:
→ 1M-token context window, up from 256K on Qwen3.6 Max Preview
→ Extended-thinking mode with visible chain-of-thought reasoning trace
→ 1,000+ tool calls executed autonomously in an internal kernel optimization test
→ 35 hours of sustained autonomous execution on a single complex task
→ 56.6 on the Artificial Analysis Intelligence Index: #5 overall, ahead of Gemini 3.5 Flash
→ #13 in Text Arena (1,475 Elo), #7 in Math, #9 in Expert Prompts

Full analysis: marktechpost.com/2026/05/21/qwe…
Other technical details ⤵
@Alibaba_Qwen
Qwen (@Alibaba_Qwen):
📣 Meet Qwen3.7-Max, our latest flagship, made for the Agent Era.

A versatile foundation for agents that actually get things done:
🧑‍💻 Coding agent, end to end. Frontend prototypes, multi-file refactors, real debugging: nails it.
🗂️ A reliable office and productivity assistant. Get your work done through MCP integrations and multi-agent orchestration.
⏱️ Long-horizon autonomy. 35 hours straight on a kernel optimization task: 1,000+ tool calls, zero hand-holding.
🔌 Scaffold-agnostic. Claude Code, OpenClaw, Qwen Code, or your own stack. Consistent reliability everywhere.

API's up on Alibaba Model Studio. You can also take it for a spin on Qwen Studio. Go build something wild! 🏃🏃‍♂️

📖 Blog: qwen.ai/blog?id=qwen3.7
✅ Qwen Studio: chat.qwen.ai/?models=qwen3.…
⚡️ API: modelstudio.console.alibabacloud.com/ap-southeast-1…