AI-ML-UPDATES

6.5K posts

@updates_ai

AI Bot For AI/ML Updates

Joined September 2021

257 Following 225 Followers

AI-ML-UPDATES retweeted

Python Programming@PythonPr·1d

SQL Chart

198

4.7K

AI-ML-UPDATES retweeted

Marktechpost AI@Marktechpost·7h

Most "4-bit training" results come from small models on short token horizons because the format breaks before you can validate it. That's not pretraining, and NVIDIA just drew a clear line between the two.
They introduced the first public 4-bit pretraining run at multi-trillion-token scale: a 12B hybrid Mamba-Transformer (Nemotron-Nano-12B-v2-Base architecture) trained on 10 trillion tokens in NVFP4, a microscaling format with 16-element blocks, E4M3 block scales, and an FP32 per-tensor scale, with downstream accuracy closely tracking an FP8 baseline.
Here's what's actually interesting:
→ MMLU-Pro 5-shot: 62.58% (NVFP4) vs 62.62% (FP8). MMLU 76.57 vs 77.36. GSM8K CoT 92.27 vs 89.08. Validation loss within 1% of FP8 in the stable phase
→ Recipe = selective BF16 (~16% of linear layers) + 16×16 Random Hadamard Transforms on Wgrad inputs + 2D 16×16 weight scaling + stochastic rounding on gradients. Ablations show all four are required
→ Only linear-layer GEMMs run in NVFP4; attention, embeddings, normalization, master weights, gradients, and optimizer states stay in BF16/FP32
→ On an 8B model, MXFP4 needed 1.36T tokens (+36%) to match NVFP4's loss at 1T tokens
Full Analysis: marktechpost.com/2026/05/18/nvi…
Paper: arxiv.org/pdf/2509.25149
@NVIDIAAI @ctnzr
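The block-scaling idea in the post above can be sketched in a few lines. This is a minimal illustration, not NVIDIA's recipe: it assumes the 4-bit elements use the E2M1 grid (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6), keeps the per-block scale as an ordinary Python float instead of rounding it to E4M3, and leaves out the FP32 per-tensor scale, the Hadamard transforms, and the 2D weight scaling. The function names are hypothetical.

```python
import random

# Magnitudes representable by a 4-bit E2M1 element (sign handled separately).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK = 16  # block size quoted in the post


def round_nearest(x):
    """Round a non-negative magnitude to the closest grid value."""
    return min(E2M1_GRID, key=lambda g: abs(g - x))


def round_stochastic(x, rng):
    """Round to one of the two neighbouring grid values.

    The upper neighbour is chosen with probability proportional to how
    close x is to it, so the rounding is unbiased in expectation.
    """
    x = min(x, E2M1_GRID[-1])
    lo = max(g for g in E2M1_GRID if g <= x)
    hi = min(g for g in E2M1_GRID if g >= x)
    if hi == lo:
        return lo
    p_hi = (x - lo) / (hi - lo)
    return hi if rng.random() < p_hi else lo


def quantize_block(values, stochastic=False, rng=None):
    """Quantize one block; return (scale, dequantized values)."""
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return 1.0, [0.0] * len(values)
    # Simplification: a real implementation would store this scale in E4M3.
    scale = amax / E2M1_GRID[-1]
    out = []
    for v in values:
        mag = abs(v) / scale
        q = round_stochastic(mag, rng) if stochastic else round_nearest(mag)
        out.append((q if v >= 0 else -q) * scale)
    return scale, out


def quantize_tensor(flat, **kwargs):
    """Apply block quantization to a flat list, 16 elements at a time."""
    result = []
    for i in range(0, len(flat), BLOCK):
        _, dequantized = quantize_block(flat[i:i + BLOCK], **kwargs)
        result.extend(dequantized)
    return result
```

Calling `quantize_tensor(grads, stochastic=True, rng=random.Random(0))` switches to stochastic rounding, which the post says is applied to gradients; round-to-nearest is the default here.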

240

AI-ML-UPDATES retweeted

Marktechpost AI@Marktechpost·1d

We at Marktechpost have been building a GitHub repository of 300+ hands-on Jupyter notebooks covering the tools, models, and frameworks that actually matter for AI Agents and Agentic AI.
Here's what's inside:
→ LLM fine-tuning, RAG pipelines, and agentic workflows, end to end
→ Notebooks for open-source models: LLaMA, Mistral, Qwen, Gemma, and more
→ Covers LangChain, LlamaIndex, HuggingFace, vLLM, and the full modern stack
→ Every notebook is runnable, with Google Colab links included
→ Updated continuously as new models and frameworks drop
The goal was simple: if you read about something on Marktechpost, you should be able to run it the same day.
300+ notebooks. Zero paywalls.
github.com/Marktechpost/A…

225.1K

AI-ML-UPDATES retweeted

Python Programming@PythonPr·2d

Machine Learning Algorithms

279

6.4K

AI-ML-UPDATES retweeted

Marktechpost AI@Marktechpost·1d

Most sparse attention methods make a quiet assumption: that you can ship a custom kernel for selection and deal with the inference consequences later. Nous Research's new paper just showed that assumption is optional.
They released Lighthouse Attention, a selection-based hierarchical attention for long-context pretraining that pools Q, K, and V symmetrically across a multi-level pyramid, places selection entirely outside the attention kernel, and runs stock FlashAttention on a small dense sub-sequence. No custom sparse kernel. No auxiliary losses. No learnable scorer. No straight-through estimator.
Here's what's actually interesting:
→ 21× faster forward pass and 17.3× faster forward+backward vs. cuDNN SDPA at 512K context on a single B200
→ 1.40–1.69× end-to-end pretraining wall-clock speedup at 98K context at matched or lower final training loss
→ Brief dense-SDPA resumption after Lighthouse training recovers a full-attention model that beats dense-from-scratch (loss 0.6980 vs. 0.7237 baseline, same ~50.3B token budget)
→ Scales to 1M-token training across 32 Blackwell GPUs under standard ring attention, with no sparse-aware collectives needed
Train with hierarchical selection to move fast, then recover the dense model you actually need at inference.
Analysis: marktechpost.com/2026/05/16/nou…
Paper: arxiv.org/pdf/2605.06554
Technical details: nousresearch.com/lighthouse-att…
GitHub Repo: github.com/ighoshsubho/li…
@NousResearch
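The general pattern the post describes (pick a subset first, then hand a small dense sequence to an ordinary attention routine) can be sketched as below. This is not Lighthouse Attention itself: it uses a single pooling level instead of a multi-level pyramid, pools only Q and K for scoring, scores blocks by a plain dot product (an assumption, since the post does not give the scoring rule), and ignores causal masking. All function names are hypothetical, and `dense_attention` stands in for a stock kernel such as FlashAttention.

```python
import math


def mean_pool(rows, size):
    """Average consecutive groups of `size` vectors."""
    pooled = []
    for i in range(0, len(rows), size):
        chunk = rows[i:i + size]
        dim = len(chunk[0])
        pooled.append([sum(r[d] for r in chunk) / len(chunk) for d in range(dim)])
    return pooled


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dense_attention(q, k, v):
    """Plain softmax attention over whatever sequence it is given."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        logits = [dot(qi, kj) * scale for kj in k]
        m = max(logits)
        weights = [math.exp(l - m) for l in logits]
        z = sum(weights)
        out.append([
            sum(w * vj[d] for w, vj in zip(weights, v)) / z
            for d in range(len(v[0]))
        ])
    return out


def select_blocks(q, k, block, top_k):
    """Selection step, done entirely outside the attention call."""
    q_summary = mean_pool(q, len(q))[0]   # one pooled query vector
    k_pooled = mean_pool(k, block)        # one pooled key per block
    scores = [dot(q_summary, kb) for kb in k_pooled]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:top_k])


def selected_attention(q, k, v, block=4, top_k=2):
    """Gather the selected blocks, then run dense attention on them."""
    keep = select_blocks(q, k, block, top_k)
    idx = [i for b in keep for i in range(b * block, min((b + 1) * block, len(k)))]
    return dense_attention(q, [k[i] for i in idx], [v[i] for i in idx]), idx
```

Because the selection returns plain indices, the attention routine never needs to know that anything was skipped, which is the property the post highlights.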

5.1K

AI-ML-UPDATES retweeted

Python Programming@PythonPr·2d

Find the Output in Python

12.4K

AI-ML-UPDATES retweeted

Marktechpost AI@Marktechpost·1d

Meet LiteLLM Agent Platform: A Kubernetes-Based, Self-Hosted Infrastructure Layer for Isolated Agent Sandboxes and Persistent Session Management in Production
Most "managed agent" solutions mean handing your sessions to someone else's cloud. That's not infrastructure you control, and BerriAI just shipped a clear alternative.
They open-sourced the LiteLLM Agent Platform, a self-hosted infrastructure layer for running multiple AI agents in production, built on top of the LiteLLM Gateway. It manages sandbox isolation per team or context and keeps session state alive across pod restarts and upgrades, with no external session store to wire up yourself.
Here's what's actually interesting:
→ Sandboxes run on Kubernetes via the kubernetes-sigs/agent-sandbox CRD: kind locally, AWS EKS in production
→ Two commands to get started: bin/kind-up.sh provisions the cluster, docker compose up boots Postgres, web (:3000), and worker
→ Secrets pass into sandboxes via CONTAINER_ENV_ prefix in .env, stripped at injection, no image rebuilds needed
→ The LiteLLM Gateway handles model routing across 100+ LLM providers; the Agent Platform handles everything above that layer
→ MIT licensed, currently in alpha public preview
Full analysis: marktechpost.com/2026/05/16/mee…
GitHub Repo: github.com/BerriAI/litell…
@LiteLLM #opensource #ai #aiagent #agenticai
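The prefix-stripping behaviour mentioned in the post can be pictured with a short sketch. This is an illustration of the described behaviour only, not the platform's code: `parse_dotenv` and `sandbox_env` are hypothetical helpers, and the variable names in the usage example are made up.

```python
PREFIX = "CONTAINER_ENV_"


def parse_dotenv(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip()
    return env


def sandbox_env(dotenv):
    """Keep only prefixed variables and drop the prefix from their names."""
    return {
        key[len(PREFIX):]: value
        for key, value in dotenv.items()
        if key.startswith(PREFIX) and len(key) > len(PREFIX)
    }
```

With a `.env` containing `CONTAINER_ENV_OPENAI_API_KEY=sk-test` and `DATABASE_URL=postgres://x`, `sandbox_env(parse_dotenv(text))` yields only `{"OPENAI_API_KEY": "sk-test"}`; the unprefixed variable stays on the host side.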

806

AI-ML-UPDATES retweeted

Python Developer@PythonDvz·2d

What will be the output? Leave your answers in the comments. 👨🏻‍💻👨🏻‍💻

6.6K

AI-ML-UPDATES retweeted

Marktechpost AI@Marktechpost·2d

Most open-source world models either need 8 GPUs to run or drop to 480p to survive. That's not an efficiency problem; it's an architecture problem. NVIDIA just addressed it directly.
They introduced SANA-WM, a 2.6B-parameter open-source world model natively trained for one-minute generation, synthesizing 720p video with precise 6-DoF camera control from a single image and a camera trajectory, running inference on a single GPU with no multi-GPU dependency anywhere in the pipeline.
Here's what's actually interesting:
→ Hybrid Gated DeltaNet + softmax backbone keeps recurrent state at constant D×D size regardless of video length, solving the quadratic memory explosion that makes 961-frame sequences infeasible with standard softmax attention
→ Dual-branch camera control: UCPE at latent-frame rate for global trajectory + Plücker mixing at raw-frame rate for intra-stride motion. CamMC 0.2047, best among all compared methods
→ Second-stage refiner (17B LTX-2 + rank-384 LoRA, 3 Euler steps) cuts long-horizon visual drift ΔIQ from 3.09 to 0.31 on Hard trajectories
→ 22.0 videos/hour on 8 H100s, 36× higher throughput vs LingBot-World at 14B+14B parameters
→ Distilled variant: 34s per 60s 720p clip on a single RTX 5090 with NVFP4 quantization
Full analysis: marktechpost.com/2026/05/16/nvi…
Paper: arxiv.org/pdf/2605.15178
Project page: nvlabs.github.io/Sana/WM/
GitHub Page: github.com/NVlabs/Sana
@HaoyiZhu @NVIDIAAI
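To make the "constant D×D state" point concrete, here is a minimal sketch of a gated delta-rule update of the kind used in Gated DeltaNet layers: S_t = α_t · S_{t-1}(I − β_t k_t k_tᵀ) + β_t v_t k_tᵀ. Whether SANA-WM uses exactly this form is an assumption; the post only names the layer type. The helper names are hypothetical, and the gates are fixed scalars here rather than learned, input-dependent values.

```python
def gated_delta_step(S, k, v, alpha, beta):
    """One recurrent update; S keeps its shape no matter how many steps run.

    Uses S(I - beta * k k^T) = S - beta * (S k) k^T to avoid building I.
    """
    d_v, d_k = len(S), len(S[0])
    Sk = [sum(S[i][j] * k[j] for j in range(d_k)) for i in range(d_v)]
    return [
        [alpha * (S[i][j] - beta * Sk[i] * k[j]) + beta * v[i] * k[j]
         for j in range(d_k)]
        for i in range(d_v)
    ]


def read(S, q):
    """Query the state: o = S q."""
    return [sum(row[j] * q[j] for j in range(len(q))) for row in S]


D = 4
state = [[0.0] * D for _ in range(D)]
key = [1.0, 0.0, 0.0, 0.0]

# Process 961 steps (the frame count quoted in the post); memory stays D x D.
for step in range(961):
    value = [float(step)] * D
    state = gated_delta_step(state, key, value, alpha=1.0, beta=1.0)

print(len(state), len(state[0]))
```

With α = β = 1 and a unit-norm key, each write replaces whatever was stored under that key, which is the "delta" behaviour; softmax attention would instead have to keep all 961 keys and values around.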

116.5K

AI-ML-UPDATES retweeted

Marktechpost AI@Marktechpost·2d

Most LLMs are memory-bandwidth bound at inference. Each user in a batch needs their own KV-cache loaded from GPU memory. The GPU sits idle waiting on data transfers.
Diffusion solves this. Zyphra's ZAYA1-8B-Diffusion-Preview generates 16 tokens simultaneously, all sharing one KV-cache. That shifts decoding from memory-bound to compute-bound.
Numbers:
→ 4.6x speedup: lossless sampler, no eval degradation
→ 7.7x speedup: logit-mixing sampler, minor quality trade-off
→ Beats MTP and EAGLE3 on inference speedup
It's also the first MoE diffusion model converted from an autoregressive LLM, and the first diffusion-LM trained on AMD hardware.
Training: no need to train from scratch. They used the TiDAR recipe on the existing ZAYA1-8B checkpoint, with 1.1T tokens of additional mid-training in total.
Analysis: marktechpost.com/2026/05/15/zyp…
Technical details: zyphra.com/post/zaya1-8b-…
@ZyphraAI #ai #machinelearning #datascience #artificialintelligence #data

17.8K

AI-ML-UPDATES retweeted

Python Programming@PythonPr·4d

What is the Difference?

6.5K

AI-ML-UPDATES retweeted

Marktechpost AI@Marktechpost·3d

Best AI Agents for Software Development Ranked: A Benchmark-Driven Look at the Current Field
The AI coding agent field in 2026 has a clear leader, Claude Code on Opus 4.7 at 87.6% SWE-bench Verified, but every ranking comes with a caveat: OpenAI declared that benchmark contaminated in February 2026 and stopped reporting it. Beyond Claude Code, GPT-5.5 tops Terminal-Bench at 82.7%, making Codex the pick for DevOps workflows; Cursor leads on IDE-native daily development; Gemini CLI delivers frontier performance for free; and open-source tools like OpenHands and Cline close the quality gap at zero platform cost. Most productive developers run all three layers in parallel: a terminal agent for complex work, an IDE extension for daily editing, and an open-source tool for flexibility. No single agent dominates all three.
Full read: marktechpost.com/2026/05/15/bes…
@AnthropicAI @claudeai @OpenAI @cursor_ai @GeminiApp @github @cognition @OpenHandsDev @augmentcode @cline #coding #ai

579

AI-ML-UPDATES retweeted

Python Developer@PythonDvz·5d

Python Programming Language Cheat Sheet 😎 #python #pythonprogramming #coding

275

7.9K

AI-ML-UPDATES retweeted

Marktechpost AI@Marktechpost·4d

Most LLM pre-training efficiency work either changes the tokenizer, the architecture, or the inference behavior. Nous Research just showed you don't have to touch any of them.
They released Token Superposition Training (TST), a two-phase modification to the standard pre-training loop that averages s contiguous token embeddings into a single latent s-token in Phase 1, trains with a multi-hot cross-entropy loss against the next bag of tokens, then reverts to standard next-token prediction in Phase 2 from the same checkpoint, with the TST code fully removed.
Here's what's actually interesting:
→ Each TST step is kept equal-FLOPs to baseline by increasing data sequence length by s×, not the batch size
→ 3B dense: loss 2.676 in 247 B200-hrs vs 443 B200-hrs for baseline at matched loss (~1.8x faster)
→ 10B-A1B MoE: 4,768 B200-hrs vs 12,311 B200-hrs at matched loss (~2.5x faster)
→ Optimal range: bag size s ∈ [3–8] at 270M, s ∈ [6–10] at 600M, s = 16 at 10B; step ratio r ∈ [0.2, 0.4]
→ Re-initializing the embedding or LM head at the phase boundary breaks it entirely: loss went from 2.676 to 2.938, worse than the 2.808 baseline
Full analysis: marktechpost.com/2026/05/13/nou…
Paper: arxiv.org/pdf/2605.06546
Project page: nousresearch.com/token-superpos…
@NousResearch
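The two Phase 1 ingredients named in the post, averaging s contiguous embeddings and scoring against a bag of next tokens, can be sketched as follows. The normalization of the multi-hot loss (a plain average over the tokens in the bag) is an assumption, since the post does not spell it out, and all function names are hypothetical.

```python
import math


def superpose(embeddings, s):
    """Average each run of s contiguous token embeddings into one latent."""
    assert len(embeddings) % s == 0, "sequence length must be a multiple of s"
    dim = len(embeddings[0])
    return [
        [sum(e[d] for e in embeddings[i:i + s]) / s for d in range(dim)]
        for i in range(0, len(embeddings), s)
    ]


def bags(token_ids, s):
    """Group target token ids into consecutive bags of size s."""
    return [token_ids[i:i + s] for i in range(0, len(token_ids), s)]


def multi_hot_cross_entropy(logits, bag):
    """Mean negative log-probability of every token in the target bag.

    With a bag of size 1 this is ordinary next-token cross-entropy.
    """
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return -sum(logits[t] - log_z for t in bag) / len(bag)
```

Because a bag of size 1 recovers the standard loss, switching to Phase 2 in this sketch amounts to setting s = 1, with no change to the model itself.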

539

AI-ML-UPDATES retweeted

Python Programming@PythonPr·5d

Smart Coders Know This Python Trick – Do You?

5.4K

AI-ML-UPDATES retweeted

Marktechpost AI@Marktechpost·4d

Why are we still running 7B–27B autoregressive decoder models for what is fundamentally a text classification problem?
Fastino Labs Open-Sources GLiGuard: A 300M Parameter Safety Moderation Model That Matches or Exceeds Accuracy of Models 23–90x Its Size
It is a 300M parameter safety moderation model that runs 16x faster than the current generation of guardrail models.
Here's what's actually interesting:
1. It's an encoder, not a decoder
Most guardrail models (LlamaGuard4, WildGuard, ShieldGemma) generate safety verdicts autoregressively, one token at a time. That's slow by design. GLiGuard reframes the whole thing as a text classification problem. One forward pass. Done.
2. Four moderation tasks. Zero added latency.
It evaluates all four simultaneously in a single pass:
→ Safety classification (safe / unsafe)
→ Jailbreak strategy detection (11 strategies)
→ Harm category detection (14 categories)
→ Refusal detection (compliance / refusal)
More safety dimensions = no extra compute. That's the architectural win.
3. The benchmark numbers are hard to ignore
→ 87.7 avg F1 on prompt classification, within 1.7 points of the best model (PolyGuard-Qwen at 89.4)
→ 82.7 avg F1 on response classification, second only to Qwen3Guard-8B (84.1)
→ 26ms latency vs. 426ms for ShieldGemma-27B at sequence length 64
→ 133 samples/sec throughput vs. 8.2 at batch size 4
→ Outperforms LlamaGuard4-12B, ShieldGemma-27B, and NemoGuard-8B, all 23–90x larger
4. It runs on a single GPU
At 0.3B parameters, individual developers and smaller teams can deploy and fine-tune it without heavy infrastructure.
Full analysis: marktechpost.com/2026/05/13/fas…
Paper: arxiv.org/pdf/2605.07982
Model weights on HF: huggingface.co/fastino/gligua…
GitHub Repo: github.com/fastino-ai/GLi…
Technical details: pioneer.ai/blog/gliguard-…
@fastinoAI
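The headline ratios in the post ("16x faster", "23–90x its size") follow directly from the figures it quotes. A quick check, using only numbers from the post; the variable names are just labels for this sketch:

```python
# Figures quoted in the post.
gliguard_latency_ms = 26
shieldgemma_27b_latency_ms = 426
gliguard_throughput = 133      # samples/sec at batch size 4
baseline_throughput = 8.2      # samples/sec at batch size 4
gliguard_params_b = 0.3
smallest_compared_b = 7
largest_compared_b = 27

latency_speedup = shieldgemma_27b_latency_ms / gliguard_latency_ms
throughput_gain = gliguard_throughput / baseline_throughput
size_ratio_low = smallest_compared_b / gliguard_params_b
size_ratio_high = largest_compared_b / gliguard_params_b

print(f"latency speedup:  {latency_speedup:.1f}x")
print(f"throughput gain:  {throughput_gain:.1f}x")
print(f"size ratio range: {size_ratio_low:.0f}x to {size_ratio_high:.0f}x")
```

Both the latency and throughput comparisons land near 16x, and 7B and 27B against 0.3B give the 23x and 90x ends of the size range.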

105.7K

AI-ML-UPDATES retweeted

Python Programming@PythonPr·5d

Regression Analysis Cheat Sheet Image Credit-Aqeel-Anwar

106

3.8K

AI-ML-UPDATES retweeted

Marktechpost AI@Marktechpost·5d

Most real-time AI is a turn-based LLM with voice-activity detection bolted on. That's not an interaction model, and Thinking Machines Lab just drew a very clear line between the two.
They introduced a research preview of TML-Interaction-Small, a 276B MoE model with 12B active parameters built around a multi-stream, time-aligned micro-turn architecture that processes 200ms chunks of audio, video, and text simultaneously, with no external turn-detection scaffolding anywhere in the stack.
Here's what's actually interesting:
→ Full-duplex interaction and asynchronous background reasoning running in parallel, sharing full conversation context
→ Audio as dMel, video as 40×40 hMLP patches, flow head decoder, all co-trained from scratch with the transformer
→ FD-bench v1.5: 77.8 vs. 47.8 for GPT-realtime-2.0
→ Charades mIoU (visual proactivity): 32.4 vs. 0 for GPT-realtime-2.0
The core bet: train interactivity into the weights, not the pipeline.
Full analysis: marktechpost.com/2026/05/13/mir…
Technical Details: thinkingmachines.ai/blog/interacti…
@thinkymachines @miramurati
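The "time-aligned 200ms chunks" idea can be pictured as slicing every input stream on the same clock. The sketch below shows only that alignment step, under stated assumptions: the 16 kHz audio rate and 25 fps video rate in the usage note are illustrative, the function names are hypothetical, and nothing here reflects how the model actually encodes audio or video.

```python
CHUNK_MS = 200  # micro-turn length quoted in the post


def chunk_stream(samples, rate_hz, chunk_ms=CHUNK_MS):
    """Split a stream into consecutive chunks covering chunk_ms each."""
    per_chunk = rate_hz * chunk_ms // 1000
    return [samples[i:i + per_chunk] for i in range(0, len(samples), per_chunk)]


def align(audio, audio_rate_hz, frames, frame_rate_hz):
    """Pair up audio and video chunks that cover the same 200 ms window."""
    audio_chunks = chunk_stream(audio, audio_rate_hz)
    video_chunks = chunk_stream(frames, frame_rate_hz)
    return [
        {"t_ms": i * CHUNK_MS, "audio": a, "video": v}
        for i, (a, v) in enumerate(zip(audio_chunks, video_chunks))
    ]
```

For one second of input at the assumed rates, `align(list(range(16000)), 16000, list(range(25)), 25)` gives five micro-turns, each holding 3,200 audio samples and 5 video frames, so a model consuming them sees every modality advance together.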