
Avijit Ghosh
@evijit
Lead Technical AI Policy Researcher @huggingface 🤗. Working on @evaluatingevals and @huggingscience


New: The charitable foundation tied to Nvidia CEO Jensen Huang and his wife, Lori Huang, has agreed to rent GPUs from CoreWeave. The Huang Foundation plans to donate the GPU hours to “university and other non-profit research institutes to develop open science and AI research.” It has donated $108 million in “GPU compute time grants” to date. story: theinformation.com/briefings/nvid…




High-performance #RISCV (RVA23) K3 SBC coming soon! Up to 32GB DDR5, a 60 TOPS (int4) NPU, and it runs Qwen3.5 35B-A3B at ~15 tps. Supports Ubuntu 26.04! Vote for your preferred config and get early access when it launches next month! sipeed.com/k3/vote
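For context, a quick back-of-envelope plausibility check on that ~15 tps figure. This is my own arithmetic, not Sipeed's: it assumes roughly 3B active parameters for the "A3B" variant and int4 weights, and ignores runtime overheads and the board's actual memory bandwidth.

```python
# Rough check: is ~15 tps plausible for a 35B-A3B MoE on a DDR5 SBC?
# Decode is memory-bandwidth bound: each token must read the active weights.
active_params = 3e9        # ~3B active params per token (the "A3B" part)
bytes_per_param = 0.5      # int4 quantization ≈ 0.5 bytes per weight
tokens_per_sec = 15
bandwidth_needed = active_params * bytes_per_param * tokens_per_sec
print(f"~{bandwidth_needed / 1e9:.1f} GB/s effective bandwidth")  # ~22.5 GB/s
```

~22.5 GB/s of effective bandwidth is within reach of a dual-channel DDR5 setup, so the claim is at least in the right ballpark.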


Most open VLA models are not really open. They release weights and call it reproducibility. The training data is withheld. The training code is withheld. The deployment pipeline is withheld. You get a checkpoint file and a paper. You cannot verify the data quality. You cannot reproduce the training run. You cannot adapt the model to your robot without starting from scratch.

Researchers from Allen AI released MolmoAct2, the first VLA that is actually open: weights, training code, and complete datasets.
• MolmoAct2-BimanualYAM Dataset: 720 hours of teleoperated trajectories across 28 real-world tasks, the largest open bimanual dataset available.
• MolmoAct2-SO100/101 Dataset: 38,059 episodes curated from 1,222 public datasets.
• MolmoAct2-DROID Dataset: quality-filtered Franka trajectories with re-annotated instructions.

The system deploys out of the box on three platforms spanning the low-to-medium cost range: bimanual YAM, SO-100/101, and DROID Franka. No additional fine-tuning required.

The backbone is Molmo2-ER, trained on a 3.3M-sample corpus for embodied reasoning: metric distance estimation, free-space detection, cross-view object tracking, and scene geometry reconstruction. These are the skills general-purpose VLMs do not test.

Results look promising: 63.8% average across 13 embodied reasoning benchmarks. Outperforms GPT-5 and Gemini Robotics-ER 1.5 on 9 of 13 tasks. Outperforms π0.5 across 7 simulation and real-world benchmarks.

The architecture uses per-layer KV conditioning between the VLM and a flow-matching action expert built on DiT-style transformers (sketched below). This bridges discrete reasoning tokens to continuous control trajectories while exposing the attention state the VLM itself uses.

This is the deployment model NeuraCore advocates for: standardized ecosystems with reproducible training data. Custom infrastructure for every embodiment is technical debt that prevents fleet scaling.

Nice work from @hq_fang, @DJiafei, and the team at @allen_ai
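A minimal PyTorch sketch of what this conditioning pattern could look like. This is my illustration, not MolmoAct2's code: the module names, the dimensions (action_dim=14, horizon=16, d_model=512), and the use of each VLM layer's hidden state as cross-attention keys/values are all assumptions — a real per-layer KV implementation might reuse the VLM's cached key/value projections directly.

```python
# Sketch: flow-matching action expert conditioned per layer on VLM states.
# All names, shapes, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class ConditionedActionBlock(nn.Module):
    """One DiT-style block: self-attention over the action chunk, then
    cross-attention into the state exposed by the matching VLM layer."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norm1, self.norm2, self.norm3 = (nn.LayerNorm(d_model) for _ in range(3))

    def forward(self, x, vlm_layer_state):
        # vlm_layer_state: (B, T_vlm, d_model), used as both key and value.
        h = self.norm1(x)
        x = x + self.self_attn(h, h, h)[0]
        h = self.norm2(x)
        x = x + self.cross_attn(h, vlm_layer_state, vlm_layer_state)[0]
        return x + self.mlp(self.norm3(x))

class FlowMatchingActionExpert(nn.Module):
    """Predicts a velocity field over action chunks; each block is
    conditioned on one VLM layer's intermediate state."""
    def __init__(self, action_dim=14, horizon=16, d_model=512, n_heads=8, n_layers=6):
        super().__init__()
        self.in_proj = nn.Linear(action_dim, d_model)
        self.time_embed = nn.Sequential(nn.Linear(1, d_model), nn.SiLU(),
                                        nn.Linear(d_model, d_model))
        self.blocks = nn.ModuleList(
            [ConditionedActionBlock(d_model, n_heads) for _ in range(n_layers)])
        self.out_proj = nn.Linear(d_model, action_dim)

    def forward(self, noisy_actions, t, vlm_states):
        # noisy_actions: (B, horizon, action_dim); t: (B,);
        # vlm_states: list of n_layers tensors, one per conditioning layer.
        x = self.in_proj(noisy_actions) + self.time_embed(t[:, None])[:, None, :]
        for block, state in zip(self.blocks, vlm_states):
            x = block(x, state)
        return self.out_proj(x)  # predicted velocity, same shape as actions

def flow_matching_loss(expert, actions, vlm_states):
    """Rectified-flow objective: regress the straight-line velocity
    (actions - noise) at a random interpolation time t."""
    noise = torch.randn_like(actions)
    t = torch.rand(actions.shape[0], device=actions.device)
    x_t = (1 - t[:, None, None]) * noise + t[:, None, None] * actions
    v_pred = expert(x_t, t, vlm_states)
    return ((v_pred - (actions - noise)) ** 2).mean()
```

Training interpolates between Gaussian noise and the ground-truth action chunk and regresses the constant velocity along that line; at inference the expert integrates the predicted velocity field from noise to a continuous action trajectory, while the per-layer cross-attention is what lets it read the same attention state the VLM uses for its discrete reasoning.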




Today we're releasing ZAYA1-8B, a reasoning MoE trained on @AMD and optimized for intelligence density. With <1B active params, it outperforms open-weight models many times its size on math and reasoning, closing in on DeepSeek-V3.2 and GPT-5-High with test-time compute. 🧵
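A minimal sketch of the mechanism behind "<1B active params" in an 8B-total MoE, assuming standard top-k token routing. The layer sizes, expert count, and k=2 here are illustrative assumptions, not ZAYA1's actual architecture.

```python
# Sparse MoE feed-forward layer: each token is routed to k of n_experts
# expert MLPs, so only those experts' weights are "active" per token.
# All sizes are illustrative, not ZAYA1's configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=1024, d_ff=4096, n_experts=32, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)])
        self.k = k

    def forward(self, x):                              # x: (n_tokens, d_model)
        weights, idx = self.router(x).topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)           # mix the k chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e               # tokens routed to expert e
                if mask.any():                         # unchosen experts: no compute
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

moe = TopKMoE()
tokens = torch.randn(8, 1024)
print(moe(tokens).shape)  # torch.Size([8, 1024]); only 2/32 experts ran per token
```

With 32 experts and k=2, each token exercises 1/16 of the expert parameters, which is how total parameter count can be many times the active count. (The loop over experts is for clarity; real implementations batch tokens per expert.)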

Love seeing Naomi Osaka honor the CLRS Algorithms textbook at this year's Met Gala