George Morgan

2.5K posts

@vr4300

CEO @symbolica

San Francisco | London Joined November 2008
305 Following 3K Followers
Pinned Tweet
George Morgan
George Morgan@vr4300·
I'm extremely proud to share that the @symbolica research team has achieved a monumental result in program synthesis. We have been able to reach SOTA on ARC-AGI-2 (85.28% @ $6.94/task) using @agenticasdk as a neurosymbolic program synthesis engine.

This engine is not ARC specific. It is 350 lines of highly generic code that can be readily adapted to any other task. This is what sets this result apart from other bespoke models or agent system designs that have historically performed well on ARC.

This result is a clear demonstration that the path forward to improve the reasoning capabilities of AI systems is by leveraging structure: types, composition, and program execution. Symbolic AI has lain dormant for decades but the field is on the precipice of making one of the greatest comebacks in history.

Blog: symbolica.ai/blog/arcgentica
Code: github.com/symbolica-ai/a…
Agentica@agenticasdk

We set a new ARC-AGI-2 SotA: 85.28% using an Agentica agent (~350 lines) that writes and runs code.

English
15
37
432
52.3K
Morgan
Morgan@morganlinton·
The cofounder and CTO of Perplexity, @denisyarats just said internally at Perplexity they’re moving away from MCPs and instead using APIs and CLIs 👀
English
329
380
5.1K
2.8M
George Morgan retweeted
Millin Gabani
Millin Gabani@trillhause_·
Wow, this has to be the most underrated article in agent world right now. Completely redefines how we define agents. People may catch up to this implementation of agents in 6 months. Extremely promising.
English
17
21
447
47K
George Morgan
George Morgan@vr4300·
The research team did indeed put this together. However, category theory is applied to build our own foundation models. Agentica is "category theory inspired" insofar as it was designed by category theorists and leverages types, composition, program synthesis, etc., but it's not part of the same research track.
English
2
0
0
105
Faez Shakil
Faez Shakil@f_aezs·
@vr4300 Are you using any of the previous cat theory work for this?
English
1
0
1
101
⭕
@Donogzs·
@vr4300 @fchollet @agenticasdk Does your agent scaffold allow the RLM to define env variables for the REPL? Like in such a way that the RLM or subRLM can reference in later turns? I imagine that would be useful
English
1
0
0
75
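The question above, about variables that a model can define in a REPL and reference in later turns, can be illustrated with a minimal sketch. This is not Agentica's API; `PersistentRepl` and its `run` method are hypothetical names, and the only assumption is that each turn's code is executed against one shared namespace dictionary.

```python
import contextlib
import io


class PersistentRepl:
    """Toy REPL (hypothetical): one shared namespace survives across turns."""

    def __init__(self) -> None:
        self.namespace: dict = {}

    def run(self, source: str) -> str:
        """Execute `source` in the shared namespace and return captured stdout."""
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(source, self.namespace)
        return buffer.getvalue()


repl = PersistentRepl()
repl.run("grid = [[0, 1], [1, 0]]")               # turn 1 defines a variable
output = repl.run("print(sum(map(sum, grid)))")   # a later turn reads it back
```

Because both calls share `repl.namespace`, the second turn can see `grid` without it being passed through the prompt. A sub-agent handed the same dictionary would see it too.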
Agentica
Agentica@agenticasdk·
We have now solved all publicly available ARC-AGI-3 puzzles.🧩
English
40
76
1.1K
206.9K
Jake
Jake@jake_researcher·
@agenticasdk Does this mean ARC-AGI is now saturated? What's the next benchmark that's resistant to overfitting?
English
2
0
2
3.6K
danhelo ♱
danhelo ♱@danthaeon·
@vr4300 yep great for well documented pip packages. But I mean for obscure large codebases with hard to parse functionalities. something like agentica agent clusters 3-4 of the functions into an agentic function, then builds an agent on top of those. It does the developers' job.
English
1
0
1
49
George Morgan
George Morgan@vr4300·
🥹
danhelo ♱@danthaeon

@agenticasdk is the future and one of those "mathematical beauty" moments that are so rare for programming. the real abstraction layer for agentic engineering. it just makes sense.

ART
1
1
9
1.4K
George Morgan
George Morgan@vr4300·
@danthaeon You can currently plug any pip package into Agentica and it just works! The agent will read the code graph and figure out what to do. Is that what you mean?
English
1
0
1
40
danhelo ♱
danhelo ♱@danthaeon·
@vr4300 guys please run agentica agents that build agentica optimized tooling for common SDKs maybe even make it an abstraction so it's "plug and play" for any SDK you bring. don't know if the model's "taste" is there yet but it shouldn't be too hard to do
English
1
0
0
48
George Morgan
George Morgan@vr4300·
1. Yes it sees each game only once.
2. It is not yet human level. We observe that it performs fairly close to human baseline on almost all of the levels but it seems to get stuck on a few of them (as shown in this video) before eventually recovering. This totally blows the action budget.

Our scores are below. We haven't yet optimized the harness. I am confident that we can drastically improve the performance before the rest of the puzzles come out. We will be sure to rerun it and release our official final scores publicly when they do!

• ft09, 344 actions, 39.15%
• vc33, 2092 actions, 42.87%
  – L5: 1604 actions vs 92 baseline
• ls20, 3703 actions, 69.77%
  – L7: 3240 actions vs 82 baseline
English
4
10
107
6.1K
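To make the "blows the action budget" remark in the post above concrete, here is a small arithmetic sketch using only the figures quoted there. The function names are illustrative, and it assumes the per-level action counts are included in each game's total, which the post implies but does not state.

```python
# Figures copied from the post above; nothing here is newly measured.
runs = {
    "vc33 L5": {"actions": 1604, "baseline": 92, "game_total": 2092},
    "ls20 L7": {"actions": 3240, "baseline": 82, "game_total": 3703},
}


def overshoot(actions: int, baseline: int) -> float:
    """How many times the human baseline the agent spent on one level."""
    return actions / baseline


def share_of_game(actions: int, game_total: int) -> float:
    """Percentage of the whole game's actions spent on that one level."""
    return 100 * actions / game_total


summary = {
    name: (
        round(overshoot(r["actions"], r["baseline"]), 1),
        round(share_of_game(r["actions"], r["game_total"]), 1),
    )
    for name, r in runs.items()
}
# summary == {"vc33 L5": (17.4, 76.7), "ls20 L7": (39.5, 87.5)}
```

Under that assumption, a single stuck level accounts for roughly three quarters of the vc33 actions and about seven eighths of the ls20 actions.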
Justin Waugh
Justin Waugh@JustinWaugh·
@agenticasdk Congrats! One of my favorite plots was the Level (Y axis) vs. Turns (X axis) that they released early of humans. How does your system compare on those? (from the video above, looks like a lot of "random walk") I'm also curious about total cost and total time (in wall clock time)?
English
3
0
12
5.8K
François Chollet
François Chollet@fchollet·
@agenticasdk 1. Is it seeing each game only once? (it is of course possible to brute-force any game given infinite trials, but that is not the goal here) 2. Is it using a number of actions per game comparable to what humans need? (upon seeing the game for the first time)
English
9
6
276
38.9K
Chrys Bader
Chrys Bader@chrysb·
unpopular (maybe?) opinion: MCP is dead in the water. @openclaw has shown me that api & cli will win.

every MCP server you connect loads its tool definitions into your context window. name, description, parameter schema, all of it. connect 10 servers with 5 tools each and you've burned 50 tool definitions worth of tokens before your conversation even starts. context bloat will never be a good thing - performance-wise or economically. i assume this is why @steipete left it out of @openclaw.

the "exec" tool paired with on-demand skills is all you need. it can run any command invented since the beginning of computers. a resurgence of glory for ancient, but powerful tools like curl, sed, awk, grep. command line tools once mastered by the greats, but long forgotten and buried underneath abstractions developed for us lesser mortals. now available to us all, piloted by the smartest models on earth. every founder gets their own mass army of greybeards.

the inertia required for MCP adoption, imo, is too great to overcome the momentum @openclaw has breathed into api + cli + skills.

the common defenses people bring up:
• "MCP gives you typed schemas and validation": so does a well-documented CLI
• "MCP gives you explicit permissions": so does a sandbox with an allowlist
• "MCP is a standard": a standard that scales poorly is still a standard that scales poorly

lastly, i've heard many MCP servers are just wrapping existing APIs - that kind of redundancy and unnecessary indirection should be a red flag.

so, let's drop it and redirect our efforts into cli tools & apis with accompanying skills.
English
283
89
1.6K
330.5K
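The context-bloat arithmetic in the post above (10 servers with 5 tools each) can be sketched as follows. The per-definition token cost is a made-up placeholder, not a measured value, and the helper names are hypothetical. Only the 10 x 5 = 50 count comes from the post.

```python
def upfront_tool_definitions(servers: int, tools_per_server: int) -> int:
    """Number of tool definitions loaded before the conversation starts."""
    return servers * tools_per_server


def upfront_tokens(definitions: int, tokens_per_definition: int) -> int:
    """Rough context cost of those definitions."""
    return definitions * tokens_per_definition


# ASSUMPTION: 200 tokens per definition (name + description + parameter schema).
# Real definitions vary widely; substitute a measured figure for your own tools.
ASSUMED_TOKENS_PER_DEFINITION = 200

definitions = upfront_tool_definitions(servers=10, tools_per_server=5)
mcp_overhead = upfront_tokens(definitions, ASSUMED_TOKENS_PER_DEFINITION)

# For comparison: a single generic "exec" tool definition of the same assumed size.
exec_overhead = upfront_tokens(1, ASSUMED_TOKENS_PER_DEFINITION)
```

With these placeholder numbers, `definitions` is 50 and the up-front cost is 50 times that of a single tool. The ratio holds for any per-definition size, while the absolute token count depends entirely on the assumed figure.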
gooby
gooby@gooby_esq·
Made it to level 3 on LS20 in 203 steps (ARC AGI 3) with a DSPy RLM based multi agent system (agent is still working as we speak). Making good progress on this tbh.

I've got a main RLM agent that solves the game board and three reflection agents that reflect on the game state and past history after every move or every batched set of moves. None of the original instructions are specific to LS20, they are general to the arc agi 3 format.

One reflection agent inspects the game visually (normal dspy predict module with image input), one comments and creates a knowledge base for game mechanics and game board items (another RLM that gets the full game state history and other metadata about the turns), and one comments and creates a knowledge base specifically on the REPL history and suggests efficiency and improvements for how to interact in the RLM REPL for this specific game (basically it's building up a sort of skill.md for this specific game for how to effectively use the REPL). I made numpy and pandas available in the REPL as well.

Basically what I've built is a custom dspy optimizer that rewrites the solver's instructions based on what we've learned about the game in between each batched set of moves.

I'm trying to set up the custom live viewer I made so I can make it public and you can tune in and watch the runs. Using gemini 3.1 under the hood for everything. (I have gemini credits). If @OpenAI or @AnthropicAI want to throw me some credits I'll try with those models too 😅
English
6
5
114
13.6K
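The loop described in the post above (one solver, three reflection agents, instructions rewritten between batches of moves) might be structured roughly like this. It is a plain-Python sketch with stub functions standing in for model calls. It does not use DSPy, and every name in it is hypothetical rather than taken from the author's code.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class GameLog:
    moves: List[str] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)


Reflector = Callable[[GameLog], str]


# Stubs for the three reflection agents; real ones would call a model.
def visual_reflector(log: GameLog) -> str:
    return f"visual: inspected board after {len(log.moves)} moves"


def mechanics_reflector(log: GameLog) -> str:
    return f"mechanics: knowledge base updated from {len(log.moves)} moves"


def repl_reflector(log: GameLog) -> str:
    return f"repl: usage tips refreshed after {len(log.moves)} moves"


def solve_batch(instructions: str, batch_size: int, start: int) -> List[str]:
    """Stand-in for the main solver agent acting on the current instructions."""
    return [f"move_{start + i}" for i in range(batch_size)]


def run(base: str, reflectors: List[Reflector],
        batches: int, batch_size: int) -> Tuple[str, GameLog]:
    log = GameLog()
    instructions = base
    for _ in range(batches):
        log.moves.extend(solve_batch(instructions, batch_size, len(log.moves)))
        # Every reflector comments on the history so far.
        log.notes.extend(reflect(log) for reflect in reflectors)
        # Rewrite the solver's instructions from the generic base plus all notes.
        instructions = base + "\n" + "\n".join(log.notes)
    return instructions, log


final_instructions, final_log = run(
    "generic ARC-AGI-3 instructions",
    [visual_reflector, mechanics_reflector, repl_reflector],
    batches=2,
    batch_size=3,
)
```

The base instructions stay game-agnostic, as the post describes, while the accumulated notes carry everything learned about the specific game.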
dan
dan@irl_danB·
@agenticasdk @vr4300 is this the same RLM with slightly different instructions? nice work I haven’t cracked l2 Level 3 yet
English
1
0
6
2K
Agentica
Agentica@agenticasdk·
ARC-AGI-3 ft09 solved in 346 steps
English
11
30
339
55.5K