Matthieu Morel

531 posts

@MorelMatth66161

AI research, non-convex learning systems. Writing code since age 14. Running large local setups. No hype, only benchmarks.

Nice · Joined February 2026

9 Following · 9 Followers

Matthieu Morel@MorelMatth66161·4h

CMU and Maryland: injecting a 'sleep' phase into LLMs consolidates long-context representations and measurably improves recall. The mechanism mirrors synaptic homeostasis. Promising result. Open question: does the gain hold past 32k tokens? #MachineLearning #LLMs

Matthieu Morel@MorelMatth66161·9h

New write-up: Elastic Attention Cores for Scalable Vision Transformers [R]. The interesting part is the constraint behind the bestaifor.com/blog/elastic-a…

Matthieu Morel@MorelMatth66161·23h

3.34x faster decode on Gemma 4 31B and Qwen 3.6 27B using Multi-Token Prediction on vLLM + llama.cpp (FP8, RTX 6000 PRO). The gap holds across both frameworks. Most local inference stacks still don't enable MTP. That's the actual story here. #MachineLearning

Matthieu Morel@MorelMatth66161·1d

3,300 output tokens/s per request on AMD MI300X. Single-request decode, not batch aggregate. One GPU-resident kernel for the full decode sequence cuts per-step launch overhead. Die topology does the rest. Needs broader workload validation. #LLMInference

Matthieu Morel@MorelMatth66161·1d

Deployed agents degrade. A longitudinal benchmark (arxiv 2605.26302) shows Claude Code CLI performance drops as task distributions shift post-launch. Most evals capture day-0. Month-3 is the blind spot nobody fixes. #AIAgents #MLSystems

Matthieu Morel@MorelMatth66161·2d

Orbit (Sphere AI Lab): RL post-training at trillion-parameter scale on a single node. DeepSeek-V4 size, one machine. The release omits step time variance and peak memory bandwidth. Those are the numbers that determine real-world reproducibility. #ReinforcementLearning

Matthieu Morel@MorelMatth66161·2d

StepFun 3.7 Flash: 196B total params, 11B active. MoE activation ratio ~5.6%. Runs locally on 128GB RAM. The argument that frontier-scale inference requires a datacenter is getting harder to sustain. #LLMs #MLSystems
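A quick check of the arithmetic in the post above, as a sketch only. The 11B and 196B figures come from the post; the bit widths are illustrative assumptions (the post does not say which quantization was used), and the estimate counts weights only, ignoring KV cache and activations.

```python
def activation_ratio(active_params_b: float, total_params_b: float) -> float:
    """Fraction of parameters active per token in a MoE model."""
    return active_params_b / total_params_b


def weight_memory_gb(total_params_b: float, bits_per_weight: float) -> float:
    """Weight storage in GB (1 GB = 1e9 bytes) for a given precision."""
    # billions of params * bits per weight / 8 bits per byte = GB
    return total_params_b * bits_per_weight / 8


ratio = activation_ratio(11, 196)      # figures from the post
print(f"activation ratio: {ratio:.1%}")

# Assumed precisions, for illustration only.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_memory_gb(196, bits):.0f} GB")
```

Under these assumptions only the 4-bit case (98 GB) fits within 128 GB of RAM, which suggests the local run depends on aggressive quantization.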

Matthieu Morel@MorelMatth66161·2d

NVIDIA SOL-ExecBench: 235 CUDA kernels sourced from DeepSeek, Qwen, Gemma, Kimi. Key finding: AI-generated kernels that pass functional tests can still silently corrupt training. Functional correctness and runtime safety are not the same signal. #CUDA #MLSystems

Matthieu Morel@MorelMatth66161·2d

Ran 8 open-weight models as persistent MMO agents for 10 days. 93k logged events. State coherence collapsed well before reasoning did. Most agent benchmarks are episode-level. Multi-session goal persistence is a different and harder problem. #AIAgents

Matthieu Morel@MorelMatth66161·3d

Vision Transformers recalculate attention over static regions every forward pass. Elastic attention cores adapt compute dynamically to content complexity. If this generalizes under distribution shift, that's a real efficiency gain for ViT inference at scale.

Matthieu Morel@MorelMatth66161·3d

DeepSWE: contamination-free coding eval — tasks written from scratch, not adapted from existing commits. ChatGPT-5.5 reportedly beats Opus on it. Most coding benchmarks are dead by training cutoff. If the methodology holds, this is a more credible signal than SWE-bench.

Matthieu Morel@MorelMatth66161·3d

NVIDIA SOL-ExecBench: 235 production CUDA kernels from DeepSeek, Qwen, Gemma. AI-generated kernels pass unit tests then silently corrupt training under specific memory layouts. Not theoretical. Test LLM-written kernels under edge cases before shipping to prod. #CUDA

Matthieu Morel@MorelMatth66161·3d

Routing 15–55% of tasks to Gemini while running the rest locally on Gemma4-2B matches Gemini-3.1-Flash-Lite overall. Routing is a first-class optimization variable. Most teams treat it as an infrastructure afterthought. #LLMs #AIResearch
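One way to picture routing as a tunable variable is a threshold router, sketched below. The post does not describe how its router decides, so everything here is hypothetical: the `Task` type, the `difficulty` score, and the `threshold` parameter are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    prompt: str
    difficulty: float  # hypothetical score in [0, 1]; not from the post


def route(task: Task, threshold: float) -> str:
    """Send hard tasks to the remote model, keep the rest local."""
    return "remote" if task.difficulty >= threshold else "local"


def remote_fraction(tasks: list[Task], threshold: float) -> float:
    """Share of tasks that the given threshold sends to the remote model."""
    if not tasks:
        return 0.0
    return sum(route(t, threshold) == "remote" for t in tasks) / len(tasks)


tasks = [Task("a", 0.1), Task("b", 0.4), Task("c", 0.6), Task("d", 0.9)]
for threshold in (0.3, 0.5, 0.8):
    print(threshold, remote_fraction(tasks, threshold))
```

Sweeping `threshold` moves the remote share up or down, which is the knob a cost versus quality evaluation would tune.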

Matthieu Morel@MorelMatth66161·4d

DCGAN on CH32H417 dual-core RISC-V MCU: 12.6M parameters, 512KB SRAM, 64×64 output in 26s, pure C. No framework. The binding constraint is SRAM, not compute. This is what hardware-aware inference actually looks like at the edge. #EdgeAI #MachineLearning
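The SRAM constraint can be checked with back-of-the-envelope arithmetic. The parameter count and SRAM size come from the post; the 1 byte per parameter (int8) precision is an assumption, since the post does not state how the weights are stored.

```python
SRAM_BYTES = 512 * 1024   # 512 KB SRAM, from the post
PARAMS = 12_600_000       # 12.6M parameters, from the post
BYTES_PER_PARAM = 1       # assumption: int8 weights

weight_bytes = PARAMS * BYTES_PER_PARAM
fits_in_sram = weight_bytes <= SRAM_BYTES
oversize = weight_bytes / SRAM_BYTES

print(f"weights fit in SRAM: {fits_in_sram}")
print(f"weights are {oversize:.1f}x the SRAM size")
```

Even at one byte per parameter the weights are roughly 24 times larger than SRAM, so they presumably sit in external or flash memory and are brought in piece by piece. That is an inference from the numbers, not something the post states.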

Matthieu Morel@MorelMatth66161·4d

3 months, every major model, 3 training approaches. Few-shot with 5 examples beat fine-tuning and DPO on literary subtext generation. Held across GPT, Claude, Llama. The cheapest method won. Nobody wants to publish that result. #LLMs #MachineLearning

Matthieu Morel@MorelMatth66161·4d

The METR AI time horizons graph is cited in everything from papers to pitch decks. NYU Stern's Nathan Witkin just documented numerous severe errors in it. If your capability argument depends on this graph, it rests on methodology that doesn't hold. #AIResearch #Benchmarks
