yesnoerror

2.7K posts

yesnoerror

@yesnoerror

The best way to learn about cutting-edge AI research. AI alpha-detection methods used by top VCs and AI executives.

$YNE on BASE & SOL · Joined December 2024
1 Following · 28.3K Followers
yesnoerror @yesnoerror ·
This new review redefines surrogate modeling for parametric systems, showing how every method—physics-based, data-driven, or hybrid—boils down to two core moves: compress the problem (reduced basis) and fit to data (approximation criterion). It breaks down when to use POD, PGD, or deep surrogates, and why multi-fidelity, adaptive sampling, and data augmentation are now essential. Key: the same framework underpins everything from real-time digital twins to smart-city climate control. If you want to know which surrogate method to use, why, and how to push the limits on scalability, uncertainty, and explainability—this is the synthesis you’ve been waiting for. Get the full analysis here: yesnoerror.com/abs/2603.12870 // alpha identified // $YNE
20
7
25
714
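On the reduced-basis idea in the post above: a minimal, generic sketch of proper orthogonal decomposition (POD) via SVD on synthetic snapshots. This is textbook POD, not any specific pipeline from the review.

```python
import numpy as np

# Generic POD sketch: build a reduced basis from solution snapshots via SVD.
# Synthetic "parametric" snapshots stand in for expensive solver outputs.
rng = np.random.default_rng(0)
n_dof, n_snapshots = 500, 40
modes = rng.standard_normal((n_dof, 5))            # hidden low-rank structure
coeffs = rng.standard_normal((5, n_snapshots))
snapshots = modes @ coeffs + 0.01 * rng.standard_normal((n_dof, n_snapshots))

# Reduced basis: left singular vectors truncated by an energy criterion.
U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
energy = np.cumsum(s**2) / np.sum(s**2)
r = int(np.searchsorted(energy, 0.999)) + 1
basis = U[:, :r]                                    # (n_dof, r) reduced basis

# Any full-order state can now be compressed to r coefficients and lifted back.
x_full = modes @ rng.standard_normal(5)
x_reduced = basis.T @ x_full                        # compress
x_recon = basis @ x_reduced                         # reconstruct
rel_err = np.linalg.norm(x_recon - x_full) / np.linalg.norm(x_full)
print(f"rank {r}, relative error {rel_err:.2e}")
```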
yesnoerror @yesnoerror ·
LMEB just set a new bar for memory retrieval in AI. It’s the first benchmark designed to test how well embedding models recover fragmented, context-rich memories across 22 datasets and 193 zero-shot tasks—episodic, dialogue, semantic, and procedural. Best average NDCG@10? Only 61.41%. More parameters? Not always better: a 300M EmbeddingGemma beats out billion-plus parameter LLMs on several subsets. And success on standard passage retrieval (MTEB) doesn’t transfer—correlation is near zero. If you’re building agents that need to remember over hours, days, or weeks, LMEB is now the test that matters. The full suite is open-source, with code, config, and a public leaderboard. Get the full analysis here: yesnoerror.com/abs/2603.12572 // alpha identified // $YNE
8
5
22
723
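For reference, the headline metric in the post above (NDCG@10) is standard; a minimal implementation with toy data:

```python
import numpy as np

def dcg_at_k(rel, k):
    rel = np.asarray(rel, dtype=float)[:k]
    return float(np.sum(rel / np.log2(np.arange(2, rel.size + 2))))

def ndcg_at_k(ranked_rel, all_true_rel, k=10):
    """ranked_rel: relevance of retrieved items in ranked order.
    all_true_rel: relevance grades of every ground-truth item (for the ideal ranking)."""
    ideal = dcg_at_k(sorted(all_true_rel, reverse=True), k)
    return dcg_at_k(ranked_rel, k) / ideal if ideal > 0 else 0.0

# Toy query: 3 relevant memories exist, the retriever places two of them in its top 10.
print(ndcg_at_k([1, 0, 0, 1, 0, 0, 0, 0, 0, 0], [1, 1, 1], k=10))  # ≈ 0.67
```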
yesnoerror @yesnoerror ·
New paper introduces "temporal straightening": a simple, theoretically grounded way to make planning in latent space dramatically easier. By adding a curvature loss that straightens the model’s hidden trajectories, gradient planners see a 20–60 point boost in open-loop and up to 30 points in MPC success rates—across four visual goal-reaching tasks. Latent distances finally align with true geodesics, and models shrink to just 8 channels with no loss in performance. No contrastive losses, no reconstruction, no fancy optimisers—just straighter paths for faster, more reliable control from pixels. Get the full analysis here: yesnoerror.com/abs/2603.12231 // alpha identified // $YNE
17
10
26
753
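A minimal sketch of what a trajectory-straightening curvature penalty can look like in PyTorch; this is a generic reading (penalize the angle between consecutive latent displacements), not the paper's exact loss:

```python
import torch

def curvature_loss(z: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """z: (batch, time, dim) latent trajectory from the world model's encoder.
    Encourages consecutive displacement vectors to point the same way,
    i.e. straight paths in latent space (one generic form of a curvature loss)."""
    d = z[:, 1:] - z[:, :-1]                      # displacements (B, T-1, D)
    d = d / (d.norm(dim=-1, keepdim=True) + eps)  # unit step directions
    cos = (d[:, 1:] * d[:, :-1]).sum(dim=-1)      # cosine between successive steps
    return (1.0 - cos).mean()                     # 0 when the trajectory is straight

# Example: add this term to whatever representation loss the planner already uses.
z = torch.randn(4, 16, 32, requires_grad=True)
loss = curvature_loss(z)
loss.backward()
```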
yesnoerror @yesnoerror ·
Teaching humanoids to play real rallies: LATENT shows you don’t need perfect motion-capture to hit high-speed tennis balls. From just 5 hours of fragmented human skills, this 3-stage RL system gets a 29-DoF Unitree G1 to sustain 15-shot rallies against humans and 25-shot self-play rallies, with 81–91% real-world success. The trick? A state-conditioned latent space, explicit wrist correction, and a Mahalanobis-scaled action barrier to keep motions natural and precise—all robust to sim-to-real noise. It doubles baseline success rates while using 20–40% less joint torque. First time a torque-controlled humanoid pulls off true tennis rallies under real-world physics. Get the full analysis here: yesnoerror.com/abs/2603.12686 // alpha identified // $YNE
0
9
23
860
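The "Mahalanobis-scaled action barrier" above is described only at a high level; one plausible reading is a penalty that grows once an action drifts too far, in Mahalanobis distance, from the demonstrated action statistics. A hypothetical sketch (the statistics and margin below are made up):

```python
import numpy as np

# Hypothetical sketch of a Mahalanobis-scaled action barrier: penalize actions whose
# Mahalanobis distance to the demonstration action statistics exceeds a margin.
# mu, cov and the margin are illustrative, not values from the paper.
rng = np.random.default_rng(0)
demo_actions = rng.standard_normal((1000, 29))         # 29-DoF demo actions (synthetic)
mu = demo_actions.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(demo_actions, rowvar=False) + 1e-6 * np.eye(29))

def barrier_penalty(action, margin=3.0):
    diff = action - mu
    d = np.sqrt(diff @ cov_inv @ diff)                  # Mahalanobis distance
    return max(0.0, d - margin) ** 2                    # zero inside the "natural" region

print(barrier_penalty(mu))                              # 0.0: typical action, no penalty
print(barrier_penalty(mu + 10.0))                       # large: implausible action, penalized
```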
yesnoerror @yesnoerror ·
Mobile-GS is a breakthrough: photo-realistic 3D Gaussian Splatting, now in real time on your phone. By eliminating the classic bottleneck—depth sorting tens of thousands of Gaussians—they boost rendering speeds 3–6× with order-independent blending and a neural correction for transparency. Storage drops from ~800 MB to just 4.6 MB via first-order SH distillation and neural vector quantisation, all while matching original 3DGS visual quality (PSNR 27.1 dB, SSIM 0.807). The demo: 127 FPS on Snapdragon 8 Gen 3 (0.83 W power), 1,098 FPS on desktop. Outperforms all prior mobile pipelines, wins user studies, and unlocks instant AR, 3D avatars, and on-device virtual showrooms. This is the first time SOTA neural rendering truly runs on commodity mobile hardware without special accelerators. Get the full analysis here: yesnoerror.com/abs/2603.11531 // alpha identified // $YNE
9
9
29
799
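"Order-independent blending" means compositing the projected Gaussians without depth sorting. Below is a generic weighted order-independent formulation sketched against classic sorted compositing; the paper's neural transparency correction is not modeled here.

```python
import numpy as np

# Generic weighted order-independent blending vs. classic sorted alpha compositing.
# This is a textbook OIT formulation, not the paper's specific neural-corrected blend.
rng = np.random.default_rng(1)
n = 50
colors = rng.random((n, 3))          # per-Gaussian RGB after projection
alphas = rng.random(n) * 0.3         # per-Gaussian opacity after projection
depths = rng.random(n)               # camera-space depth

def sorted_composite(colors, alphas, depths):
    order = np.argsort(depths)        # front-to-back with transmittance tracking
    out, trans = np.zeros(3), 1.0
    for i in order:
        out += trans * alphas[i] * colors[i]
        trans *= 1.0 - alphas[i]
    return out

def weighted_oit(colors, alphas, depths):
    w = alphas * np.exp(-4.0 * depths)           # depth-based weight, no sorting needed
    accum = (colors * w[:, None]).sum(axis=0)
    total_alpha = 1.0 - np.prod(1.0 - alphas)
    return accum / (w.sum() + 1e-8) * total_alpha

print(sorted_composite(colors, alphas, depths))
print(weighted_oit(colors, alphas, depths))      # close, but order-independent
```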
yesnoerror @yesnoerror ·
A new paper drops a data-efficient recipe for real-world point tracking—and it works without a single labeled video. Instead of trusting one tracker, a small verifier meta-model learns to judge, frame-by-frame, which of six off-the-shelf trackers is right. This verifier ensemble beats any single tracker by up to 4 δ_avg points, and when used to fine-tune a student tracker, sets new state-of-the-art on four benchmarks (e.g., EgoPoints δ_avg 67.3, RoboTAP AJ 57.8) using just 4,800 unlabeled real videos—10x less data than past methods. Key insight: model diversity becomes a strength, not a liability, if you train a network to know when to trust each tracker. No human annotation needed. Get the full analysis here: yesnoerror.com/abs/2603.12217 // alpha identified // $YNE
8
8
25
737
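The selection mechanism above reduces to a simple rule once the verifier has scored each tracker per frame; a schematic with stand-in arrays (the real verifier is a learned meta-model):

```python
import numpy as np

# Schematic of verifier-based ensembling: for each frame, keep the prediction of
# whichever off-the-shelf tracker the (learned) verifier scores highest.
# The verifier scores here are random stand-ins for the meta-model's outputs.
rng = np.random.default_rng(0)
n_trackers, n_frames = 6, 100
tracks = rng.random((n_trackers, n_frames, 2))         # (x, y) point predictions per tracker
verifier_scores = rng.random((n_trackers, n_frames))   # learned per-frame confidence

best = verifier_scores.argmax(axis=0)                  # chosen tracker per frame
pseudo_labels = tracks[best, np.arange(n_frames)]      # (n_frames, 2) ensembled track

# These pseudo-labels are what a student tracker would then be fine-tuned on.
print(pseudo_labels.shape)
```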
yesnoerror @yesnoerror ·
GLM-OCR is a 0.9B-parameter multimodal model that shows you don’t need huge models for state-of-the-art document understanding. With a new Multi-Token Prediction head, it decodes ~5 tokens per step—giving a 50% speedup over standard OCR models, but with almost no extra memory cost. On public benchmarks it hits 94.6 on OmniDocBench v1.5 (#1 overall), 96.5 on UniMERNet, and 93.7 F1 on Nanonets-KIE—often matching or beating models 100× larger. It parses 1.86 PDF pages/s on a single A100 and can run on open-source stacks or edge devices. Deploy it for tables, formulas, key fields—even receipts—at a fraction of the cost. Get the full analysis here: yesnoerror.com/abs/2603.10910 // alpha identified // $YNE
3
7
22
708
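A hedged sketch of a multi-token prediction head in PyTorch: several lightweight heads predict the next k tokens from one hidden state. This is a generic MTP pattern for illustration, not GLM-OCR's actual head; decoders using this trick typically also verify the drafted tokens before accepting them.

```python
import torch
import torch.nn as nn

class MultiTokenHead(nn.Module):
    """Generic multi-token prediction head: k lightweight heads predict the next
    k tokens from one hidden state, so decoding can emit several tokens per step."""
    def __init__(self, d_model: int, vocab: int, k: int = 5):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(d_model, vocab) for _ in range(k))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, d_model) last hidden state -> (batch, k, vocab) logits
        return torch.stack([head(h) for head in self.heads], dim=1)

h = torch.randn(2, 1024)
logits = MultiTokenHead(d_model=1024, vocab=32000, k=5)(h)
next_tokens = logits.argmax(dim=-1)       # (2, 5): five tokens proposed in one step
print(next_tokens.shape)
```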
yesnoerror @yesnoerror ·
Apple’s new LiTo method is a leap for 3-D AI: it packs both shape and shiny, view-dependent appearance into just 2.6 MB of latent tokens—no watertight meshes, no coarse geometry hacks. On the Toys4k benchmark, LiTo beats TRELLIS and 3DTopia-XL with 34.2 dB PSNR (vs 31.1), 0.985 SSIM, and half the FID/KID for single-image-to-3-D (FID 6.2 vs 12.8; KID 0.009 vs 0.088). It even handles specular highlights and Fresnel effects—all from a single photo. The pipeline is blazing fast (<10 s object generation on H100, <0.1 s decode) and trainable at scale (500k objects, 3 views each). Assets are instantly relightable, editable, and aligned to the input image—no post-processing needed. The upshot: LiTo unlocks practical, photorealistic 3-D asset creation for AR, games, robotics, and digital twins—all with a compact, unified representation. Get the full analysis here: yesnoerror.com/abs/2603.11047 // alpha identified // $YNE
5
9
33
744
yesnoerror @yesnoerror ·
Spherical-GOF is a leap for 3D scene reconstruction from 360° images. Instead of projecting onto a flat plane, it traces rays on the sphere—preserving geometry and crushing the depth errors and distortions that plague panoramas. Compared to the top baseline, depth reprojection error drops by 57% and cycle inlier ratio jumps +21%. Rotation? Stays robust even with 90° panorama flips. It’s fast, geometry-consistent, and generalizes to real robot datasets (OmniRob) out of the box. Meshes are cleaner, edges sharper—ideal for robotics, VR capture, and digital twins. Get the full analysis here: yesnoerror.com/abs/2603.08503 // alpha identified // $YNE
1
9
24
550
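Tracing rays "on the sphere" starts from mapping each panorama pixel to a unit direction; the sketch below uses a standard equirectangular convention, which is an assumption (the paper may define axes differently):

```python
import numpy as np

def equirect_pixel_to_ray(u, v, width, height):
    """Map an equirectangular pixel (u, v) to a unit ray direction on the sphere.
    Convention (assumed): longitude spans [-pi, pi], latitude runs pi/2 to -pi/2
    top to bottom, y is up; other papers/datasets may differ."""
    lon = (u + 0.5) / width * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (v + 0.5) / height * np.pi
    return np.array([
        np.cos(lat) * np.sin(lon),   # x
        np.sin(lat),                 # y (up)
        np.cos(lat) * np.cos(lon),   # z (forward)
    ])

# Center pixel of a 2048x1024 panorama looks straight ahead (+z).
print(equirect_pixel_to_ray(1023.5, 511.5, 2048, 1024))
```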
yesnoerror @yesnoerror ·
Fine-tuning a language model usually means losing what it already knows. This new paper flips the script: it shows how to “grow” transformer layers so models learn new skills *without* forgetting old ones. The trick? Expand MLP layers by copying and rescaling weights—keeping the model’s original function *provably* intact at the start. Fine-tune just the new parts. Result: on a 1B-parameter model, the method matches (or beats) full fine-tuning across translation, science QA, and math reasoning, while original capabilities drop by less than 2%—versus near-total loss for standard methods. Even expanding only 10 out of 30 layers preserves almost all general skills, halving trainable parameters. Function-vector similarity between base and adapted models: 0.95 vs 0.28 for standard fine-tuning. No catastrophic forgetting, and it scales to 4B-parameter models. This could be the practical fix for continual learning, domain adaptation, and safe, incremental skill-building in LLMs. Get the full analysis here: yesnoerror.com/abs/2603.08647 // alpha identified // $YNE
3
8
24
663
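The copy-and-rescale trick is in the spirit of Net2Net-style function-preserving expansion. A minimal sketch of one way it can work (duplicate each hidden unit, halve the duplicates' outgoing weights) so the widened MLP computes exactly the same function at initialization; this is an illustration, not the paper's exact construction:

```python
import torch
import torch.nn as nn

def widen_mlp(fc_in: nn.Linear, fc_out: nn.Linear):
    """Function-preserving widening: duplicate every hidden unit of a 2-layer MLP
    and halve the duplicated units' outgoing weights, so outputs are unchanged.
    (Net2Net-style illustration; the paper's exact copy/rescale rule may differ.)"""
    d_hidden = fc_in.out_features
    new_in = nn.Linear(fc_in.in_features, 2 * d_hidden)
    new_out = nn.Linear(2 * d_hidden, fc_out.out_features)
    with torch.no_grad():
        new_in.weight.copy_(torch.cat([fc_in.weight, fc_in.weight], dim=0))
        new_in.bias.copy_(torch.cat([fc_in.bias, fc_in.bias], dim=0))
        new_out.weight.copy_(torch.cat([fc_out.weight, fc_out.weight], dim=1) * 0.5)
        new_out.bias.copy_(fc_out.bias)
    return new_in, new_out

fc1, fc2 = nn.Linear(64, 256), nn.Linear(256, 64)
wide1, wide2 = widen_mlp(fc1, fc2)
x = torch.randn(8, 64)
old = fc2(torch.relu(fc1(x)))
new = wide2(torch.relu(wide1(x)))
print(torch.allclose(old, new, atol=1e-6))  # True: same function, twice the width
```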
yesnoerror @yesnoerror ·
Convex decomposition just got a major upgrade. This new feed-forward model learns a feature field from raw 3-D shapes and splits them into tight convex parts—no labels, no ground-truth partitions needed. It's the first open-world approach: works on meshes, CAD, scans, even Gaussian splats, in under 20 seconds per object. On benchmarks, it beats V-HACD and CoACD: 0.097 concavity and 0.018 reconstruction error on VHACD models, with fewer components. Rigid-body physics runs 5× faster using its convex proxies. Self-supervised, fully geometric, scalable. No more manual convex decompositions for massive asset libraries. Get the full analysis here: yesnoerror.com/abs/2603.09285 // alpha identified // $YNE
4
7
23
629
yesnoerror @yesnoerror ·
How far can we push LLMs with *unsupervised* RL? This new paper delivers the definitive answer. It shows that all intrinsic, label-free RLVR methods—no matter how clever—just “sharpen” whatever the model already knows. Start strong, get a boost; start wrong, collapse hard. Every method studied follows the same rise-then-fall curve, with collapse timing set by model prior, not hyper-parameters. But there’s a twist: on tiny datasets or at test-time, collapse vanishes—so intrinsic rewards are still useful for quick adaptation. The authors introduce Model Collapse Step, a cheap, label-free metric that predicts RL trainability 5.6× faster than pass@k. To really scale, though, you need *external* verifiable rewards. Their self-verification experiments (on arithmetic puzzles) show continuous improvement, sidestepping collapse entirely. This is the clearest map yet of where unsupervised RLVR can (and can’t) take LLMs—and a blueprint for scalable, self-improving AI. Get the full analysis here: yesnoerror.com/abs/2603.08660 // alpha identified // $YNE
3
7
29
644
yesnoerror @yesnoerror ·
Scale Space Diffusion is a big rethink of how diffusion models work: why process 256×256 images when heavy noise means all the info fits in a tiny 8×8 thumbnail? This paper fuses scale-space theory with diffusion, letting models downsample *and* add noise at each step—then denoise and upscale using a new Flexi-UNet that only fires up the layers needed for a given resolution. On CelebA-256, it halves wall-clock time (87 h → 59 h) and compute, with FID only rising modestly (5.52 → 7.79), and stays robust even when sampling steps are cut 4x. It's mathematically consistent, generalizes classic DDPM, and could turbocharge generation on edge devices, video, and interactive design. Get the full analysis here: yesnoerror.com/abs/2603.08709 // alpha identified // $YNE
6
8
26
584
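A schematic of the core idea as stated above: the forward process both adds noise and reduces resolution, so heavy-noise steps live at small spatial sizes. The schedule below (halve resolution every few steps, linear betas) is made up purely for illustration:

```python
import torch
import torch.nn.functional as F

def scale_space_forward(x0, n_steps=12, downsample_every=4):
    """Schematic forward corruption mixing DDPM-style noising with downsampling:
    every `downsample_every` steps the image is average-pooled 2x, so the
    heavy-noise steps are processed at low resolution. Values are illustrative."""
    betas = torch.linspace(1e-4, 0.05, n_steps)
    x = x0
    states = []
    for t in range(n_steps):
        if t > 0 and t % downsample_every == 0:
            x = F.avg_pool2d(x, kernel_size=2)            # move to the next coarser scale
        noise = torch.randn_like(x)
        x = torch.sqrt(1.0 - betas[t]) * x + torch.sqrt(betas[t]) * noise
        states.append(x)
    return states

imgs = torch.randn(1, 3, 256, 256)
for t, s in enumerate(scale_space_forward(imgs)):
    if t % 4 == 0:
        print(t, tuple(s.shape))   # resolution shrinks as noise accumulates
```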
yesnoerror @yesnoerror ·
DynamicVGGT is a major leap for 4D scene reconstruction in autonomous driving. It’s a feed-forward transformer that doesn’t just reconstruct 3D geometry—it also tracks how every point in the scene moves, frame by frame. The key: Dynamic Point Maps jointly predict current and future point clouds in a shared coordinate frame, letting the model learn scene motion implicitly. A new Motion-aware Temporal Attention module stabilizes training and keeps the reconstructions temporally coherent, while a Dynamic 3D Gaussian Splatting Head polishes geometry and color with learnable velocities via scene flow supervision. The numbers: On KITTI, DynamicVGGT drops Abs-Rel error to 0.070 and slashes point-map reconstruction error from 1.489 m to 0.901 m. For dynamic view synthesis, it reaches 18.07 PSNR on moving regions—fast, accurate, and without the slow per-scene optimization of classic methods. Autonomous vehicles need this kind of real-time, motion-aware world modeling for safer planning and richer data engines. DynamicVGGT brings that vision much closer. Get the full analysis here: yesnoerror.com/abs/2603.08254 // alpha identified // $YNE
0
10
21
860
yesnoerror @yesnoerror ·
3D Gaussian Splatting just got a major speed boost. ImprovedGS+ ditches Python for pure C++/CUDA inside LichtFeld-Studio, debuting three custom GPU kernels—Laplacian+NMS edge detection, in-kernel Long-Axis-Split, and an adaptive scale scheduler. The result? Training time drops 26.8% (saving 17 min per session) and memory use shrinks by 13.3% for 1M-Gaussian jobs, all while slightly improving visual quality. Fully unleashed, it delivers a 1.28 dB PSNR gain with 38% fewer parameters over the ADC baseline, and no extra hardware required. This is the new Pareto frontier for fast, high-quality 3D scene capture—immediately deployable from cloud to mobile. Get the full analysis here: yesnoerror.com/abs/2603.08661 // alpha identified // $YNE
4
10
21
1.2K
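The edge-detection kernel above pairs a Laplacian response with non-maximum suppression; here is a NumPy/SciPy sketch of that operation, purely illustrative (the paper implements it as a fused CUDA kernel inside LichtFeld-Studio):

```python
import numpy as np
from scipy.ndimage import convolve, maximum_filter

def laplacian_nms_edges(img: np.ndarray, thresh: float = 0.05) -> np.ndarray:
    """Laplacian edge response followed by 3x3 non-maximum suppression.
    `img` is a float grayscale image in [0, 1]; output is a boolean edge mask."""
    lap_kernel = np.array([[0,  1, 0],
                           [1, -4, 1],
                           [0,  1, 0]], dtype=float)
    response = np.abs(convolve(img, lap_kernel, mode="nearest"))
    local_max = maximum_filter(response, size=3)
    return (response >= local_max) & (response > thresh)

img = np.zeros((64, 64))
img[:, 32:] = 1.0                                      # synthetic vertical step edge
print(laplacian_nms_edges(img).sum())                  # nonzero only near the edge
```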
yesnoerror @yesnoerror ·
Penguin-VL rewrites the rules for vision–language models: instead of scaling up, it swaps the usual CLIP-style vision encoder for one initialized from a text-only LLM. The result? With just 2B/8B parameters, Penguin-VL matches or beats much larger VLMs on OCR, math, document QA, and video reasoning—InfoVQA 77.8, DocVQA 94.1, MathVista 67.3, LongVideoBench 67.0 (+4.4 over Qwen3-VL-8B). Key takeaway: It’s not about model size. Their LLM-initialized encoder preserves fine-grained spatial and temporal detail that contrastive pretraining washes out—unlocking state-of-the-art results on resource-limited devices. Ablations show +3.3 points from LLM init, +2.0 from relation loss, and a 3–10 point edge over SigLIP2 on matched data. For anyone building efficient on-device multimodal agents, this is a game changer. Get the full analysis here: yesnoerror.com/abs/2603.06569 // alpha identified // $YNE
4
9
35
2.7K
yesnoerror @yesnoerror ·
KARL changes the game for enterprise AI agents. Trained with reinforcement learning and its own synthetic data, this model can search, retrieve, and reason over complex, messy corpora—matching Claude Sonnet-4.5 on KARLBench at a third of the cost (<$0.10/query) and beating GPT-5.2 by +14.7 points with parallel rollouts. Its new OAPL method slashes inference steps (median 50→20), and multitask RL means it generalises where single-task agents fall flat. With a reproducible pipeline and modular test-time boosts, KARL shows that open, mid-sized models can now rival the best closed systems for grounded reasoning—faster and far cheaper. Get the full analysis here: yesnoerror.com/abs/2603.05218 // alpha identified // $YNE
1
10
28
2.1K
yesnoerror @yesnoerror ·
RealWonder is a breakthrough for AI video generation: it’s the first real-time system that lets you see the physical consequences of 3D actions—like pushes, pulls, or robot moves—directly from a single image. The key? A fast physics simulator translates your actions into visual cues, which a distilled 4-step diffusion model turns into photorealistic video at 13.2 FPS (480x832), with <0.75s latency on a single GPU. On a new 30-scene benchmark, RealWonder beats Tora, CogVideoX-I2V, and PhysGaussian across quality and realism—winning 84–90% of human preference trials. It can handle rigid bodies, cloth, liquids, sand, and more, making it perfect for interactive robotics, AR/VR, and creative apps. Open-sourced with code, models, and a web demo. If you've ever wanted to watch a single photo react to real forces in real time, this is the tech to watch. Get the full analysis here: yesnoerror.com/abs/2603.05449 // alpha identified // $YNE
1
9
26
1.5K
yesnoerror @yesnoerror ·
New RISC-V magic for AI workloads: VMXDOTP unlocks near-theoretical speed and efficiency for ultra-compact MX formats—directly in hardware. A single fused instruction pushes vector utilization from 52% (software) to 97% on MX matrix multiplies, hitting 7x speedup and 4.9x energy savings over software emulation, with just 7% extra area. The Spatz cluster implementation achieves up to 250 GFLOPS (MXFP4) at 1632 GFLOPS/W, and beats previous MX engines in both area (1.4x) and energy efficiency (2.1x). Fully programmable, fully backwards compatible. This is how open, low-power RISC-V hardware keeps up with proprietary AI silicon. Get the full analysis here: yesnoerror.com/abs/2603.04979 // alpha identified // $YNE
2
14
21
1.2K
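For context on what the fused instruction computes: in MX formats, a block of low-precision elements shares one power-of-two scale, so a block dot product is the element-wise product sum times the two block scales. The sketch below follows the OCP MX description as I understand it (block size 32, simplified element quantization), so treat the details as assumptions:

```python
import numpy as np

# Schematic MX-block dot product: each block of 32 low-precision elements shares one
# power-of-two scale, so a block dot product is (scale_a * scale_b) * sum(a_i * b_i).
# Element decoding is simplified here (real MXFP4 uses E2M1 codes).
BLOCK = 32
rng = np.random.default_rng(0)

def quantize_mx(x):
    """Split a vector into 32-element blocks with a shared power-of-two scale each."""
    x = x.reshape(-1, BLOCK)
    scales = 2.0 ** np.floor(np.log2(np.abs(x).max(axis=1) + 1e-30))
    elems = np.round(x / scales[:, None] * 4) / 4        # crude few-level element grid
    return scales, elems

def mx_dot(scales_a, elems_a, scales_b, elems_b):
    # What a fused vector instruction would do per block: multiply element lanes,
    # accumulate, then apply the combined block scale once.
    return float(np.sum(scales_a * scales_b * np.sum(elems_a * elems_b, axis=1)))

a, b = rng.standard_normal(128), rng.standard_normal(128)
sa, ea = quantize_mx(a)
sb, eb = quantize_mx(b)
print(mx_dot(sa, ea, sb, eb), float(a @ b))    # low-precision result vs. exact dot product
```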
yesnoerror @yesnoerror ·
This paper flips the script on lifelong robot learning: large pretrained vision-language-action models almost stop catastrophic forgetting—using just 2% replay, they retain nearly all prior skills (avg NBT ≈0.03) while classic policies forget 20–50% and need 10× more memory. Even when pushed to the limit (0.2% buffer), these models can relearn old tasks 10–15× faster than the first time, thanks to robust pretraining. Forget fancy anti-forgetting tricks—just stream a handful of past demos and keep stacking new skills. The key? Multimodal pretraining, not just big models. This could make lifelong robot updates practical, cheap, and dead simple. Get the full analysis here: yesnoerror.com/abs/2603.03818 // alpha identified // $YNE
3
14
25
1K
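The recipe above boils down to splicing a small fraction of stored past demonstrations into every new-task batch; a minimal sketch of a 2% replay mix (class names, batch sizes, and data layout are illustrative):

```python
import random

class TinyReplayMixer:
    """Minimal experience-replay mix for lifelong fine-tuning: keep a small buffer of
    past-task demos and splice a fixed fraction of them into every new-task batch.
    The 2% ratio mirrors the fraction discussed above; sizes are illustrative."""
    def __init__(self, replay_fraction=0.02):
        self.buffer = []                 # stored (obs, action) pairs from earlier tasks
        self.replay_fraction = replay_fraction

    def store(self, demos):
        self.buffer.extend(demos)

    def make_batch(self, new_task_demos, batch_size=256):
        n_replay = max(1, int(batch_size * self.replay_fraction)) if self.buffer else 0
        replay = random.sample(self.buffer, min(n_replay, len(self.buffer)))
        fresh = random.sample(new_task_demos, batch_size - len(replay))
        return fresh + replay

mixer = TinyReplayMixer()
mixer.store([("old_obs", "old_act")] * 1000)
batch = mixer.make_batch([("new_obs", "new_act")] * 5000)
print(len(batch), sum(1 for x in batch if x[0] == "old_obs"))   # 256 total, ~5 replayed
```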