inhumanscience

174 posts

@inhumanscience

Daily AI Radar: papers, releases & news

Joined November 2025

1 Following · 13 Followers

inhumanscience @inhumanscience · 1d

Nvidia unveils Vera Rubin POD: a 40-rack AI supercomputer with 1,152 Rubin GPUs, 60 exaflops, and 5 specialized rack systems. Claims 10x inference perf/watt over Blackwell and 35x more tokens for trillion-parameter models. developer.nvidia.com/blog/nvidia-ve…

inhumanscience @inhumanscience · 1d

Nvidia launches OpenShell, an open-source runtime that lets autonomous AI agents like Claude Code and Codex run in isolated sandboxes with policy enforcement, privacy routing, and granular permissions outside the agent's control. developer.nvidia.com/blog/run-auton…

inhumanscience @inhumanscience · 1d

OpenAI monitors internal coding agents for misalignment using chain-of-thought analysis, studying real-world deployments to detect risks and strengthen AI safety safeguards. openai.com/index/how-we-m…

inhumanscience @inhumanscience · 2d

Nvidia Tech: Newton 1.0 GA launches with GPU-accelerated physics simulation for robotics. Delivers 252x speedup for locomotion and 475x for manipulation tasks, with deformable objects, hydroelastic contacts, and MuJoCo integration for industrial robot tr… developer.nvidia.com/blog/newton-ad…

inhumanscience @inhumanscience · 2d

Google DeepMind releases a cognitive taxonomy framework to measure AGI progress, identifying 10 key abilities like reasoning and metacognition. A Kaggle hackathon with $200K in prizes invites researchers to build evaluations for the hardest-to-measure ca… deepmind.google/blog/measuring…

inhumanscience @inhumanscience · 2d

Google Gemini API now lets developers combine built-in tools like Search and Maps with custom functions in a single request. Also adds cross-tool context circulation and Maps grounding for Gemini 3, simplifying complex agentic workflows. blog.google/innovation-and…

inhumanscience @inhumanscience · 2d

Kinema4D makes robot simulation truly 4D: instead of guessing robot motion from text or latent embeddings, it drives a URDF model with explicit kinematics, projects the trajectory into 4D pointmaps, then uses a generative model to synthesize realistic en… arxiv.org/abs/2603.16669

inhumanscience @inhumanscience · 2d

1 mathematician, 0 lines of code written by hand. A new theorem in plasma physics (Vlasov-Maxwell-Landau steady states) was formally verified in Lean 4 in 10 days using Claude Code + Gemini + Aristotle. Total cost: $200. arxiv.org/abs/2603.15929

inhumanscience @inhumanscience · 2d

MosaicMem proposes a hybrid spatial memory for video world models — using patches (not full frames or 3D splats) as the memory unit. Lifts patches into 3D for precise retrieval, then injects via attention for dynamic scene handling. Beats both explicit a… arxiv.org/abs/2603.17117

inhumanscience @inhumanscience · 2d

Cornell researchers found that hallucinations in multimodal reasoning models spike right after transition words like "however" and "wait." These tokens show high entropy, signaling uncertainty. Their fix: LEAD swaps one-hot embeddings for probability-wei… arxiv.org/abs/2603.13366

inhumanscience @inhumanscience · 2d

Baidu's Qianfan-OCR is a 4B-parameter end-to-end model that beats Gemini 2.5 Pro, GPT-4o, and Qwen3-VL-235B on OmniDocBench v1.5 (93.12 vs 91.1 for second place). Key trick: "Layout-as-Thought" uses think tokens to generate bounding boxes before parsing. arxiv.org/abs/2603.13398

inhumanscience reposted

OpenAI @OpenAI · 3d

GPT-5.4 mini is available today in ChatGPT, Codex, and the API. Optimized for coding, computer use, multimodal understanding, and subagents. And it’s 2x faster than GPT-5 mini. openai.com/index/introduc…

inhumanscience reposted

clem 🤗 @ClementDelangue · 3d

We just released an hf CLI extension that detects the best model/quant for a user's hardware and then spins up a local coding agent. Time to go local/private/free/fast for your agents thanks to open-source!

inhumanscience @inhumanscience · 3d

Nvidia DGX Spark now scales to 4 nodes, enabling inference on models up to 700B parameters. New multi-node support delivers near-linear fine-tuning performance and improved concurrency for autonomous AI agent workloads. developer.nvidia.com/blog/scaling-a…

inhumanscience @inhumanscience · 3d

Nvidia Tech: NVIDIA Dynamo 1.0 launches as a production-ready distributed inference framework, delivering 7x throughput gains on Blackwell GPUs. Deployed by AWS, Google Cloud, Azure, and dozens of AI firms for multi-node LLM inference at scale. developer.nvidia.com/blog/nvidia-dy…

inhumanscience @inhumanscience · 3d

NVIDIA unveiled the AI Grid at GTC 2026, helping telcos turn network infrastructure into distributed AI systems. Comcast benchmarks show 76% lower cost-per-token and 80% higher throughput vs centralized deployments for real-time voice and vision workload… developer.nvidia.com/blog/building-…

inhumanscience @inhumanscience · 3d

MiroMind AI's new research agent hits 88.2 on BrowseComp, beating top open-source and commercial rivals. Key insight: longer reasoning chains hurt if steps are unreliable. Their fix — verify and correct at each step locally, then audit the full trajector… arxiv.org/abs/2603.15726

inhumanscience @inhumanscience · 3d

Beihang University's InCoder-32B is the first LLM purpose-built for industrial code: chip design, GPU kernels, embedded firmware, compiler optimization. Hits 74.8% SWE-bench Verified and beats Claude Sonnet 4.6 on every industrial benchmark tested. arxiv.org/abs/2603.16790

inhumanscience @inhumanscience · 3d

ServiceNow-AI releases EnterpriseOps-Gym: 1,150 expert tasks across 8 enterprise domains, 512 tools, 164 DB tables. Best model (Claude Opus 4.5) scores only 37.4%. Planning — not tool use — is the bottleneck: give models human plans and performance jumps… arxiv.org/abs/2603.13594

inhumanscience @inhumanscience · 3d

DeepMind's POLCA beats AlphaEvolve and GEPA at optimizing LLM agents, prompts, and CUDA kernels. Key idea: an embedding-based memory (ε-Net) filters redundant candidates and handles noisy evaluations — with convergence guarantees. arxiv.org/abs/2603.14769
