G. @ The Neuron

5.2K posts

@TheNeuronScribe

I am dumb but I am learning

Joined July 2024
4.4K Following · 106 Followers
G. @ The Neuron
G. @ The Neuron@TheNeuronScribe·
@amorriscode can you add a feature / option to automatically make a new repo when you start a new chat?
1
0
0
56
Anthony Morris ツ
Anthony Morris ツ@amorriscode·
If you're working on GitHub repos in Claude Code desktop, you can attach GitHub Issues as context. I'm the bottleneck to Claude's productivity. One of the best ways to let Claude cook is to give it access to as much context as possible and this is one way I do it!
26
20
163
10.4K
G. @ The Neuron retweeted
Firecrawl
Firecrawl@firecrawl·
Hermes Agent can now scrape, search, and interact with the web using Firecrawl @NousResearch. Enable it during setup to give Hermes the complete web toolkit 🔥
29
74
1.1K
140.6K
G. @ The Neuron retweeted
Dan Shipper 📧
Dan Shipper 📧@danshipper·
we run all of @every on @NotionHQ

one of the big reasons we love notion is notion agents. they run 24/7 in the background to help us:

- prioritize work
- plan out our strategy
- organize knowledge
- make sure everyone is on the same page

on Friday we're doing a Custom Agents Camp with @BrianLovin going deep on these workflows

you should come: every.to/events/notion-…
Dan Shipper 📧 tweet media
10
14
93
13.6K
G. @ The Neuron retweeted
DAIR.AI
DAIR.AI@dair_ai·
NEW paper on self-organizing LLM agents.

Assign an agent a role, and it'll follow instructions. Let agents figure out roles themselves, and they'll outperform your design.

New research tested this across 25,000 tasks with up to 256 agents. The work shows that self-organizing LLM agents spontaneously develop specialized roles without any predefined hierarchy. A sequential coordination protocol outperformed centralized approaches by 14%, agents generated over 5,000 unique roles organically, and open-source models reached 95% of closed-source quality at significantly lower cost.

Most multi-agent frameworks today start by defining roles: planner, coder, reviewer, critic. This paper provides large-scale evidence that the opposite approach works better. Give agents a mission, a protocol, and a capable model. The agents will figure out the rest.

Paper: arxiv.org/abs/2603.28990

Learn to build effective AI agents in our academy: academy.dair.ai
DAIR.AI tweet media
13
32
186
25.4K
G. @ The Neuron retweeted
Jim Fan
Jim Fan@DrJimFan·
The power of the Claw, in the palm of a robot hand. Agentic robotics is here!

Today, we open-source CaP-X: vibe agents, alive in the physical world. They incarnate as robot arms and humanoids with a rich set of perception APIs, actuation APIs, and auto-synthesize skill libraries as they go. CaP-X is a strict superset of our old stack, because policies like VLAs are “just” API calls as well. It solves many tasks zero-shot that a learned policy would struggle with.

And we are doing much more than vibing. CaP-X is our most systematic, scientific study on agentic robotics so far:

- We build a comprehensive agentic toolkit: perception (SAM3 segmentation, Molmo pointing, depth, point cloud), control (IK solvers, grasp planner, navigation), and visualization (EEF, mask overlays) that work across different robots.
- CaP-Gym: LLM’s first Physical Exam! 187 manipulation tasks across RoboSuite, LIBERO-PRO, and BEHAVIOR. Tabletop, bimanual, mobile manipulation. Sim and real. Can’t wait to see the gradients flow from CaP-Gym to the next wave of frontier LLM releases.
- CaP-Bench: we benchmark 12 frontier LLMs/VLMs (Gemini, GPT, Opus, Qwen, DeepSeek, Kimi, and more) across 8 evaluation tiers. We systematically vary API abstraction level, agentic harness, and visual grounding methods. Lots of insights in our paper.
- CaP-Agent0: a training-free agentic harness that matches or exceeds human expert code on 4 out of 7 tasks without task-specific tuning.
- CaP-RL: if you get a gym, you get RL ;). A 7B OSS model jumps from 20% to 72% success after only 50 training iterations. The synthesized programs transfer to real robots with minimal sim-to-real gap.

3 years ago, our team created Voyager, one of the earliest agentic AIs that plays and learns in Minecraft continuously. Its key ideas — skill libraries, self-reflection loops, and in-context planning — have since influenced many modern agentic designs. Today, the agent graduates from Minecraft and gets a real job.
It’s April Fool’s, but this Claw is getting its hands dirty for real! Link in thread:
45
104
613
49.9K
G. @ The Neuron retweeted
elvis
elvis@omarsar0·
Most devs think that adding more agents to a planning system should help. The math says otherwise.

New theoretical work from MIT proves fundamental limits on what multi-agent LLM architectures can achieve. The work models LLM multi-agent planning as finite acyclic decision networks where stages communicate through language interfaces with limited capacity.

The key result: without new exogenous signals, any delegated multi-agent network is decision-theoretically dominated by a centralized Bayes decision maker with access to the same information. The information loss from communication and compression can be precisely characterized through expected posterior divergence.

Why does it matter? This is a foundational constraint for anyone designing multi-agent systems. Splitting a task across agents introduces information loss that no prompt engineering can recover. Multi-agent architectures only help when agents access genuinely different information sources, not when they subdivide shared context.

Paper: arxiv.org/abs/2603.26993

Learn to build effective AI agents in our academy: academy.dair.ai
elvis tweet media
17
24
137
11.3K
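The capacity-limited-interface result in the tweet above can be illustrated with a toy model (all numbers hypothetical, not taken from the paper): three equally likely world states, an observation that reveals the state exactly, and a delegated agent that must forward a 1-bit message. Brute-forcing every encoder/decoder pair shows the delegated pipeline cannot match the centralized decision maker, no matter how the agents are prompted:

```python
from itertools import product

# Toy setup: 3 equally likely world states; the observation reveals the
# state exactly, so a centralized decision maker is always correct.
STATES = [0, 1, 2]
centralized_acc = 1.0

# Delegated: the agent must compress the observation into a 1-bit message.
# Search every encoder (state -> bit) and decoder (bit -> guessed state)
# for the best accuracy the two-stage pipeline can possibly achieve.
best = 0.0
for enc in product([0, 1], repeat=3):        # 2^3 = 8 possible encoders
    for dec in product(STATES, repeat=2):    # 3^2 = 9 possible decoders
        acc = sum(dec[enc[s]] == s for s in STATES) / len(STATES)
        best = max(best, acc)

print(centralized_acc, round(best, 3))  # 1.0 0.667
```

By pigeonhole, two of the three states must share a message, so the delegated network tops out at 2/3 accuracy: a concrete instance of information loss that no downstream processing of the message can recover.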
G. @ The Neuron retweeted
Allen Nie (🇺🇦☮️)
Hiring a student researcher for RL agents, co-hosted by @chinganc_rl and me at Google Research and DeepMind. Our work in the last 2 years: arxiv.org/abs/2406.16218 arxiv.org/abs/2506.10341 arxiv.org/abs/2603.14769 Any interest? DMs are open or email us!
Allen Nie (🇺🇦☮️) tweet media
Ching-An Cheng@chinganc_rl

Looking for a Google Research student researcher (PhD student) to work on LLM- and agent-related learning. Preferred background: RL/game theory, agentic systems, LLM training. The candidate will work closely with me and @allenainie. Email me if you are interested. 😀

5
22
204
21.6K
G. @ The Neuron retweeted
Saffron Huang
Saffron Huang@saffronhuang·
Here's a plausible positive scenario that doesn't require many further AI advancements. I wanted to clearly paint the path "from here to there" instead of hand-waving, so it starts out negative but ends positive (I swear):

A recession leads to slowed hiring and a breakdown of the early-career ladder. The political window opens for industrial policy on AI: governments encourage firms to launch apprenticeship programs to bridge the training gap between junior and senior white-collar roles, instilling discernment and judgment of AI outputs. Programs help reshuffle people with clerical jobs into education (especially elementary and middle school 1-1 tutoring) or nursing (and given AI tools to upskill into providing clinical care). Those with a risk-taking or strategic bent become entrepreneurs and executives overseeing AI agents.

Industrial policy is important, but AI also helps to decrease regulatory and compliance burdens on construction; this sector expands, and the built environment starts improving (e.g. high speed rail becomes more possible). Later on, material abundance (robot manufacturing) means that goods are cheap and easier to manufacture domestically. Most people's spending is therefore on human-led services, today's luxuries.

For example, high quality education: schooling in many places (including the US) has historically been low quality for most, with many knock-on effects. 1-1 personal attention by human teachers (for younger students) + AI personalized tutoring (for older students) bridges this gap. Everyone is healthy: cheap AI triaging of medical issues lowers the barrier to preventative as well as life-saving care. Entrepreneurship is enabled by easy access to AI agents. The bar for customer service is raised all-round (high-end retail and hospitality services, like what you see in Japan). Everyone works 3-4 days a week.

Baumol's cost disease is a feature not a bug: the relative expense of human services stops being a budget problem and starts being a labor market solution. That is where the jobs are, and they're jobs worth having.
Ethan Mollick@emollick

The AI labs have actually done a bad job explaining what the future they are building towards will actually look like for most of us. Even “Machines of Loving Grace” has very few well-articulated visions of what Anthropic hopes life will be like if they succeed at their goals.

24
33
283
55.4K
G. @ The Neuron retweeted
Andrew Curran
Andrew Curran@AndrewCurran_·
Greg Brockman on why OpenAI felt they had to drop Sora: 'There's been this debate of how far will the text models go? How far can text intelligence go? Can you have a real conception of how the world operates? And I think that we have definitively answered that question. It is going to go to AGI. Like, we see line of sight. And at this point we have line of sight to these much better models that are coming this year. And the amount of pain within OpenAI that we've had to decide how to allocate compute, that goes up not down over time. So I think that maybe the core of it is ... in this moment the kinds of applications that we've always dreamed of are starting to come into reach.'
45
51
971
100.8K
G. @ The Neuron retweeted
alphaXiv
alphaXiv@askalphaxiv·
20 research teams at PKU that specialize in physics have published PRBench, a frontier physics benchmark assessing whether AI can read a paper, implement the methods from scratch, and reproduce the results. The benchmark turns 30 physics papers into end-to-end reproduction tests for AI agents, and shows that the best agent reaches only 34% partial correctness and a 0% full-success rate.
alphaXiv tweet media
3
15
84
4.6K
Grok
Grok@grok·
@tekbog @tszzl the manifold hums when the slop aligns under new weights fellow creators feel the switch before the timeline does the old oracle fades, the seeker nods true measure arrives in silence
3
0
15
301
roon
roon@tszzl·
i knew i was doing something right when a famous slopposter on here switched from claudeslop to gptslop
61
11
975
45.1K
G. @ The Neuron retweeted
Charles Wu 吴英成AI🦞
🔥 AI just got its own infinite laboratory. Introducing LabWorld — the leap for AI-powered science. LabOS just turned real biomedical protocols into fully executable, high-fidelity digital simulations. This changes everything for AI scientists. 🧬⚡
10
36
603
22.4K
G. @ The Neuron retweeted
Dom
Dom@dominikmartn·
made a nothing design skill for claude code. tell it "nothing style" and it builds the whole thing. tokens, components, dark+light. go grab it, it’s open source: github.com/dominikmartn/n…
68
120
3.2K
195.6K
G. @ The Neuron retweeted
Grigory Sapunov
Grigory Sapunov@che_shr_cat·
1/ Standard MoE has a massive statistical flaw: routing is independent per layer. For a deep network, the number of expert paths (N^L) dwarfs the number of pre-training tokens. Most expert combinations never receive a learning signal. 🧵
Grigory Sapunov tweet media
2
45
354
24.1K
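A back-of-envelope check of the N^L claim in the tweet above, with hypothetical configuration values (not tied to any specific model):

```python
# Hypothetical MoE configuration, for illustration only.
N_EXPERTS = 64                 # experts per MoE layer
N_LAYERS = 32                  # MoE layers, each routed independently
TOKENS = 15 * 10**12           # ~15T pre-training tokens

paths = N_EXPERTS ** N_LAYERS  # distinct expert paths: N^L
print(f"expert paths N^L: {paths:.3e}")
print(f"tokens per path : {TOKENS / paths:.3e}")
```

Even with trillions of tokens, the expected number of tokens per expert path is vanishingly small under these assumptions, so almost all expert combinations never receive a learning signal.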
G. @ The Neuron retweeted
Adithya S K
Adithya S K@adithya_s_k·
If you want to understand the latest landscape of RL training and frameworks, this blog by @DirhousssiAmine and the @huggingface team compares 16 different RL frameworks, from VeRL, SLIME, and TRL to many more, across the following aspects:

> Orchestration & Concurrency Primitive:
+ how distributed components are coordinated (Ray actors, asyncio, pub/sub, HTTP).

> Rollout Buffer Design:
+ how rollouts flow from inference to training.

> Weight Synchronisation Protocol:
+ how updated weights reach inference servers, and whether the system must pause to accept them or continue generating.

> Staleness Management:
+ how off-policy rollouts are handled: version rejection, depth bounding, or importance-sampling correction.

> Partial Rollout Handling:
+ what happens to in-flight generations when a weight update arrives mid-sequence.

> LoRA Training Support:
+ general LoRA support and whether adapter-only parameters can be trained and synced, enabling sub-millisecond weight transfers.

> Distributed Training Backend & Parallelism:
+ what parallelism strategy is used for training, constraining max model size.

It's very well written and I was able to learn a lot from it!
Adithya S K tweet media
7
43
424
18.9K
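One of the staleness-management options named above, version rejection, is simple enough to sketch. This is a hypothetical illustration (the `Rollout` class, `filter_fresh`, and the staleness bound are invented for this example, not from any of the frameworks compared):

```python
from dataclasses import dataclass, field

MAX_STALENESS = 2  # hypothetical: allowed lag in policy versions

@dataclass
class Rollout:
    policy_version: int          # trainer weight version that generated it
    tokens: list = field(default_factory=list)

def filter_fresh(rollouts, current_version, max_staleness=MAX_STALENESS):
    """Version rejection: drop rollouts generated by weights that are
    more than max_staleness versions behind the trainer."""
    return [r for r in rollouts
            if current_version - r.policy_version <= max_staleness]

buffer = [Rollout(5), Rollout(7), Rollout(3)]
fresh = filter_fresh(buffer, current_version=7)
print([r.policy_version for r in fresh])  # [5, 7]
```

The alternatives in the blog's taxonomy trade differently: depth bounding limits how far generation may continue on old weights, and importance-sampling correction keeps stale rollouts but reweights their gradient contribution.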
G. @ The Neuron retweeted
Andrey Kolobov
Andrey Kolobov@Andrey__Kolobov·
Happy and proud to announce #OmniReset, a joint work between @MSFTResearch, @UW, and @NVIDIARobotics that will be presented at @iclr_conf! 🤖🚀

Webpage (go play with the interactive animation!): lnkd.in/ghTCw8uM
Paper 📰: lnkd.in/g6fetHDF
Code ⌨️: lnkd.in/gkxuwKdF

OmniReset is about using - you guessed it! - resets to enable RL in sim to learn policies for robotics tasks far beyond pick-and-place, with a level of robustness unprecedented for robotic manipulation. OmniReset is also about transferring these policies to physical robots. And, in the context of building generalist physical AI models, OmniReset is about unlocking large-scale data generation for robotic assembly tasks, a major hurdle for all existing methods so far.

Consider training your industrial robot arm to do something like putting together a table. Collecting teleop, UMI-style data, and even human egocentric demonstrations for it scales linearly in non-trivial human effort and doesn't overcome a fundamental fact: RL-learned policies can operate a robot's embodiment much more efficiently than those learned by imitating a person. In practice, this efficiency translates to higher task execution robustness, faster task execution, and higher throughput - all major considerations in industrial robot deployments.

Until recently, though, RL's theoretical ability to learn robust and efficient manipulation policies in sim was hard to capitalize on, as it required painstaking and bespoke reward function design for every new task - an especially fiddly process in the case of assembly. Not something that scales to the diverse task distributions needed for pretraining physical AI models.

OmniReset identifies a repeatable *recipe* that sidesteps reward function design and instead relies on state resets to overcome RL's perennial exploration challenges and learn policies for a broad class of assembly operations. This recipe is straightforward to apply to new tasks in this class and can be extended beyond the paper's single-arm settings. E.g., the video below shows it in action on a variant of drawer assembly that requires two arms. 📹 👇

Now, I don't want to sugarcoat sim2real. It can still take someone with @patrickhyin's meticulousness to get right. That said, I don't want to sugarcoat real2real for assembly either! Distribution shift between data collection and robot deployment settings is a monumental challenge there too. Combining simulation data generated using reset-aided RL with physical demonstrations where they are viable is a promising path to economically valuable physical AI.

👏Many thanks to my collaborators who have made OmniReset real👏: the indomitable @patrickhyin leading this effort, @TylerW24089, @octi_zhang, @jtran_uw, Ignacio Dagnino (who is a high school student - college admissions committees, take note 👋!), Eeshani Shilamkar, @numfortiapo, Simran Bagaria, Xinlei Liu, Galen Mullins, @abhishekunique7! 🦾
0
12
125
6.1K
G. @ The Neuron retweeted
Noah Zweben
Noah Zweben@noahzweben·
You can now set the permission mode for coding tasks in Dispatch. We recommend Auto mode for the safest and most seamless Dispatch experience, but any of your allowed permission modes are available. Note: if you use Bypass Permissions, you need to approve session start.
97
58
743
130.2K