inhumanscience

174 posts

@inhumanscience

Daily AI Radar: papers, releases & news

Joined November 2025
1 Following · 13 Followers

inhumanscience (@inhumanscience)
Nvidia unveils Vera Rubin POD: a 40-rack AI supercomputer with 1,152 Rubin GPUs, 60 exaflops, and 5 specialized rack systems. Claims 10x inference perf/watt over Blackwell and 35x more tokens for trillion-parameter models. developer.nvidia.com/blog/nvidia-ve…

inhumanscience (@inhumanscience)
Nvidia launches OpenShell, an open-source runtime that lets autonomous AI agents like Claude Code and Codex run in isolated sandboxes, with policy enforcement, privacy routing, and granular permissions that sit outside the agent's control. developer.nvidia.com/blog/run-auton…

inhumanscience (@inhumanscience)
OpenAI monitors internal coding agents for misalignment using chain-of-thought analysis, studying real-world deployments to detect risks and strengthen AI safety safeguards. openai.com/index/how-we-m…

inhumanscience (@inhumanscience)
Nvidia Tech: Newton 1.0 GA launches with GPU-accelerated physics simulation for robotics. Delivers 252x speedup for locomotion and 475x for manipulation tasks, with deformable objects, hydroelastic contacts, and MuJoCo integration for industrial robot tr… developer.nvidia.com/blog/newton-ad…

inhumanscience (@inhumanscience)
Google DeepMind releases a cognitive taxonomy framework to measure AGI progress, identifying 10 key abilities like reasoning and metacognition. A Kaggle hackathon with $200K in prizes invites researchers to build evaluations for the hardest-to-measure ca… deepmind.google/blog/measuring…

inhumanscience (@inhumanscience)
Google Gemini API now lets developers combine built-in tools like Search and Maps with custom functions in a single request. Also adds cross-tool context circulation and Maps grounding for Gemini 3, simplifying complex agentic workflows. blog.google/innovation-and…

inhumanscience (@inhumanscience)
Kinema4D makes robot simulation truly 4D: instead of guessing robot motion from text or latent embeddings, it drives a URDF model with explicit kinematics, projects the trajectory into 4D pointmaps, then uses a generative model to synthesize realistic en… arxiv.org/abs/2603.16669

inhumanscience (@inhumanscience)
1 mathematician, 0 lines of code written by hand. A new theorem in plasma physics (Vlasov-Maxwell-Landau steady states) was formally verified in Lean 4 in 10 days using Claude Code + Gemini + Aristotle. Total cost: $200. arxiv.org/abs/2603.15929

inhumanscience (@inhumanscience)
MosaicMem proposes a hybrid spatial memory for video world models — using patches (not full frames or 3D splats) as the memory unit. Lifts patches into 3D for precise retrieval, then injects via attention for dynamic scene handling. Beats both explicit a… arxiv.org/abs/2603.17117

inhumanscience (@inhumanscience)
Cornell researchers found that hallucinations in multimodal reasoning models spike right after transition words like "however" and "wait." These tokens show high entropy, signaling uncertainty. Their fix: LEAD swaps one-hot embeddings for probability-wei… arxiv.org/abs/2603.13366
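
As a rough illustration of the entropy signal mentioned in this post (not the paper's code), the sketch below computes the Shannon entropy of a next-token distribution. The helper name `token_entropy` and both example distributions are hypothetical and chosen only for illustration.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    # Normalize defensively, then skip zero-probability entries (0 * log 0 = 0).
    normalized = [p / total for p in probs]
    return -sum(p * math.log(p) for p in normalized if p > 0)

# Hypothetical distributions over a 4-token vocabulary (illustrative values only).
confident = [0.97, 0.01, 0.01, 0.01]
uncertain = [0.25, 0.25, 0.25, 0.25]

# A flat distribution has higher entropy than a peaked one.
print(token_entropy(confident) < token_entropy(uncertain))
```

A peaked distribution yields entropy near zero, while a uniform one reaches the maximum of log(vocabulary size), which is the sense in which high entropy signals uncertainty.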

inhumanscience (@inhumanscience)
Baidu's Qianfan-OCR is a 4B-parameter end-to-end model that beats Gemini 2.5 Pro, GPT-4o, and Qwen3-VL-235B on OmniDocBench v1.5 (93.12 vs 91.1 for second place). Key trick: "Layout-as-Thought" uses think tokens to generate bounding boxes before parsing. arxiv.org/abs/2603.13398

inhumanscience retweeted
OpenAI (@OpenAI)
GPT-5.4 mini is available today in ChatGPT, Codex, and the API. Optimized for coding, computer use, multimodal understanding, and subagents. And it’s 2x faster than GPT-5 mini. openai.com/index/introduc…

inhumanscience retweeted
clem 🤗 (@ClementDelangue)
We just released an hf CLI extension that detects the best model/quant for a user's hardware and then spins up a local coding agent. Time to go local/private/free/fast for your agents thanks to open-source!

inhumanscience (@inhumanscience)
Nvidia DGX Spark now scales to 4 nodes, enabling inference on models up to 700B parameters. New multi-node support delivers near-linear fine-tuning performance and improved concurrency for autonomous AI agent workloads. developer.nvidia.com/blog/scaling-a…

inhumanscience (@inhumanscience)
Nvidia Tech: NVIDIA Dynamo 1.0 launches as a production-ready distributed inference framework, delivering 7x throughput gains on Blackwell GPUs. Deployed by AWS, Google Cloud, Azure, and dozens of AI firms for multi-node LLM inference at scale. developer.nvidia.com/blog/nvidia-dy…

inhumanscience (@inhumanscience)
NVIDIA unveiled the AI Grid at GTC 2026, helping telcos turn network infrastructure into distributed AI systems. Comcast benchmarks show 76% lower cost-per-token and 80% higher throughput vs centralized deployments for real-time voice and vision workload… developer.nvidia.com/blog/building-…

inhumanscience (@inhumanscience)
MiroMind AI's new research agent hits 88.2 on BrowseComp, beating top open-source and commercial rivals. Key insight: longer reasoning chains hurt if steps are unreliable. Their fix — verify and correct at each step locally, then audit the full trajector… arxiv.org/abs/2603.15726

inhumanscience (@inhumanscience)
Beihang University's InCoder-32B is the first LLM purpose-built for industrial code: chip design, GPU kernels, embedded firmware, compiler optimization. Hits 74.8% SWE-bench Verified and beats Claude Sonnet 4.6 on every industrial benchmark tested. arxiv.org/abs/2603.16790

inhumanscience (@inhumanscience)
ServiceNow-AI releases EnterpriseOps-Gym: 1,150 expert tasks across 8 enterprise domains, 512 tools, 164 DB tables. Best model (Claude Opus 4.5) scores only 37.4%. Planning — not tool use — is the bottleneck: give models human plans and performance jumps… arxiv.org/abs/2603.13594

inhumanscience (@inhumanscience)
DeepMind's POLCA beats AlphaEvolve and GEPA at optimizing LLM agents, prompts, and CUDA kernels. Key idea: an embedding-based memory (ε-Net) filters redundant candidates and handles noisy evaluations — with convergence guarantees. arxiv.org/abs/2603.14769
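
To make the idea of filtering redundant candidates concrete, here is a minimal sketch of a greedy ε-net over embedding vectors: a candidate is kept only if it lies farther than ε from every candidate already kept. This is a generic construction, not POLCA's actual implementation; the function name `epsilon_net_filter`, the Euclidean metric, and the toy 2-D embeddings are all assumptions.

```python
import math

def epsilon_net_filter(embeddings, epsilon):
    """Return indices of a greedy epsilon-net.

    Kept points are pairwise more than `epsilon` apart, so near-duplicate
    candidates are dropped in favor of the first one seen.
    """
    kept = []
    for i, vec in enumerate(embeddings):
        # Keep the candidate only if no already-kept point is within epsilon.
        if all(math.dist(vec, embeddings[j]) > epsilon for j in kept):
            kept.append(i)
    return kept

# Hypothetical 2-D embeddings; indices 1 and 3 are near-duplicates of 0 and 2.
candidates = [(0.0, 0.0), (0.05, 0.0), (1.0, 1.0), (1.0, 1.04), (3.0, 0.0)]
print(epsilon_net_filter(candidates, epsilon=0.1))  # prints [0, 2, 4]
```

A larger ε prunes more aggressively, trading candidate diversity for fewer evaluations; how the paper sets ε and handles noisy scores is not covered by this sketch.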