Robots Digest 🤖

2K posts

@robotsdigest

Follow @RobotsDigest for latest in Robotics, Humanoids, and Hardware + AI.

GET POSTS IN INBOX:
Joined August 2025
0 Following · 5K Followers
Robots Digest 🤖@robotsdigest·
For embodied systems, meshes have:
• support constraints
• material behavior
• contact dynamics
• articulated structure
• affordances for agents
PhysX-Omni treats these as first-class outputs of generation rather than post-processing annotations.
Robots Digest 🤖@robotsdigest·
Most 3D generative pipelines produce assets that look correct but fall apart the moment physics enters the loop. PhysX-Omni targets the missing layer: generating simulation-native 3D assets with geometry, articulation, material properties, and functional semantics jointly modeled. The output is usable not just for rendering demos but also for:
• robotics simulators
• manipulation tasks
• physically grounded scene synthesis
• embodied training pipelines
Robots Digest 🤖@robotsdigest·
SCRIPT uses a two-stage training pipeline:
• Stage I: flow-matching pretraining on large motion datasets
• Stage II: PPO-based RL post-training inside physics simulation
RL fine-tuning improves physical stability, text alignment, and long-horizon execution by injecting stochastic noise into diffusion sampling and optimizing hybrid physical + semantic rewards. The model scales from 206M to 1.23B parameters, pushing diffusion policies toward foundation-model-scale humanoid control.
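To make the Stage I objective concrete, here is a minimal sketch of a flow-matching training target. The post does not say which flow-matching variant SCRIPT uses, so the straight-line path between a noise sample and a data sample is an assumption, and the names `flow_matching_pair` and `flow_matching_loss` are hypothetical.

```python
from typing import List, Sequence, Tuple


def flow_matching_pair(
    x0: Sequence[float], x1: Sequence[float], t: float
) -> Tuple[List[float], List[float]]:
    """Build one training example for a linear-path flow-matching objective.

    x0 is a noise sample, x1 is a data sample (e.g. a motion chunk), and t is
    a time in [0, 1]. The linear path is an assumption, not taken from the post.
    """
    # Point on the straight line between noise and data at time t.
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    # Along a straight line the velocity is constant: data minus noise.
    v_target = [b - a for a, b in zip(x0, x1)]
    return x_t, v_target


def flow_matching_loss(v_pred: Sequence[float], v_target: Sequence[float]) -> float:
    """Mean squared error between predicted and target velocities."""
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_target)) / len(v_target)


if __name__ == "__main__":
    x_t, v = flow_matching_pair([0.0, 0.0], [2.0, -4.0], t=0.25)
    print(x_t, v, flow_matching_loss([0.0, 0.0], v))
```

In a real pipeline a network would take `x_t`, `t`, and the conditioning (text, body state) and output `v_pred`; the toy call above only checks the arithmetic.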
Robots Digest 🤖@robotsdigest·
SCRIPT introduces a scalable diffusion-policy framework for language-driven humanoid control in physics simulation. Instead of generating offline motion clips, it trains a closed-loop policy that directly controls a humanoid while staying physically stable. The core idea is JAST-DiT, a diffusion transformer that jointly models actions, body states, and text tokens through shared attention. The policy predicts future action-state chunks, executes only the first action, then replans continuously in a receding-horizon loop.
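The receding-horizon loop described above can be sketched in a few lines. This is only an illustration of the control pattern, not SCRIPT's code: `toy_planner` and `toy_env` are hypothetical stand-ins for the diffusion policy and the physics simulator, and the state is a single number for simplicity.

```python
from typing import Callable, List


def receding_horizon_rollout(
    plan_chunk: Callable[[float], List[float]],
    step_env: Callable[[float, float], float],
    initial_state: float,
    num_steps: int,
) -> List[float]:
    """Plan a chunk of future actions, execute only the first, then replan."""
    state = initial_state
    trajectory = [state]
    for _ in range(num_steps):
        chunk = plan_chunk(state)        # predicted future actions from the current state
        action = chunk[0]                # the rest of the chunk is discarded
        state = step_env(state, action)  # environment advances one step
        trajectory.append(state)
    return trajectory


def toy_planner(state: float, horizon: int = 4, target: float = 1.0) -> List[float]:
    """Hypothetical planner: each planned action closes half the remaining gap."""
    actions = []
    s = state
    for _ in range(horizon):
        a = 0.5 * (target - s)
        actions.append(a)
        s += a
    return actions


def toy_env(state: float, action: float) -> float:
    """Hypothetical one-dimensional environment: the action is added to the state."""
    return state + action


if __name__ == "__main__":
    print(receding_horizon_rollout(toy_planner, toy_env, 0.0, 3))
```

Replanning at every step means that any disturbance to the state is seen by the next plan, which is what makes the policy closed-loop rather than a replayed motion clip.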
Robots Digest 🤖@robotsdigest·
At inference, all reconstruction and prediction heads are removed. The robot keeps only a compact GaussianDream prefix that conditions action generation, avoiding Gaussian rendering, video rollout, or planners during execution. Results on LIBERO, RoboCasa, and real robots show strong gains in spatial reasoning, pick-and-place precision, and long-horizon manipulation while remaining lightweight enough for closed-loop control.
Robots Digest 🤖@robotsdigest·
GaussianDream argues current Vision-Language-Action (VLA) robot models mainly imitate actions from videos, but do not explicitly model how the environment will change after interaction. Existing 3D VLAs add geometry like depth or point clouds, yet mostly capture only the current scene. World models can predict futures, but usually rely on expensive video rollouts or latent simulations that are too slow for real-time robotic control.
Robots Digest 🤖@robotsdigest·
What is impressive here is not just exploration performance but transfer. After curiosity pretraining on HM3D, the same RGB-only policy can be fine-tuned for:
• apple picking
• image-goal navigation
• unseen AI-generated worlds
It also outperforms agents trained from scratch on task rewards alone. No explicit maps. No planners. No hierarchical exploration modules. Just persistent world reconstruction + long-context memory + curiosity-driven RL. A strong argument that scalable exploration may emerge from better memory rather than more handcrafted navigation structure.
Robots Digest 🤖@robotsdigest·
Most curiosity-driven RL agents fail in long-horizon exploration because they forget. They revisit the same places, treat them as novel again, and collapse into repetitive loops. This paper fixes that with two ideas:
1) A persistent world model using online 3D Gaussian Splatting
2) A transformer agent with episodic memory over RGB history
The agent explores purely from curiosity rewards derived from reconstruction error between predicted and observed views. No task rewards, maps, depth sensors, or localization at test time. Result: emergent behaviors like corridor traversal, backtracking, doorway seeking, and strong zero-shot generalization across photorealistic 3D worlds.
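A minimal sketch of the curiosity signal described above, assuming the reward is the mean squared error between a predicted view and the observed view (the post only says "reconstruction error", so the exact metric is an assumption). `ToyWorldMemory` is a hypothetical dictionary-based stand-in for the persistent 3D Gaussian Splatting model; it exists only to show why a persistent model stops revisited places from looking novel.

```python
from typing import Dict, List, Sequence


def curiosity_reward(predicted_view: Sequence[float], observed_view: Sequence[float]) -> float:
    """Intrinsic reward: mean squared error between predicted and observed pixels."""
    if len(predicted_view) != len(observed_view) or not observed_view:
        raise ValueError("views must be non-empty and the same size")
    return sum((p - o) ** 2 for p, o in zip(predicted_view, observed_view)) / len(observed_view)


class ToyWorldMemory:
    """Hypothetical persistent world model: remembers the last view seen at each location."""

    def __init__(self, view_size: int) -> None:
        self.view_size = view_size
        self.views: Dict[str, List[float]] = {}

    def predict(self, location: str) -> List[float]:
        # Unseen locations are predicted as a blank (all-zero) view.
        return self.views.get(location, [0.0] * self.view_size)

    def update(self, location: str, observed_view: Sequence[float]) -> None:
        self.views[location] = list(observed_view)


if __name__ == "__main__":
    memory = ToyWorldMemory(view_size=2)
    view = [1.0, 3.0]
    first_visit = curiosity_reward(memory.predict("hall"), view)
    memory.update("hall", view)
    second_visit = curiosity_reward(memory.predict("hall"), view)
    print(first_visit, second_visit)
```

Because the memory persists, the second visit to the same location earns no reward, so an agent maximizing this signal is pushed toward unexplored areas instead of looping.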
Robots Digest 🤖@robotsdigest·
The performance numbers:
• low-microsecond control loops on CPU
• up to 100M+ dynamics evaluations/sec on GPU
• 2–5× faster Python controllers than Pinocchio/MuJoCo bindings
• near-MJX/Brax GPU throughput with much faster JIT compile times
Robots Digest 🤖@robotsdigest·
Robot learning stacks are hitting an infrastructure bottleneck. Classical rigid-body dynamics engines were built for single robots and recursive CPU execution — not massive parallel simulation, differentiable control, or accelerator-native learning pipelines. FRAX rethinks rigid-body dynamics entirely in JAX.