fils
@fils
1.2K posts

Data Guy and wave man 浪人 Hobo Programmer, bouncing from free service to free service then moving on. Nice software you have there... do you have a free tier?

Iowa · Joined April 2008
1.1K Following · 346 Followers
fils
fils@fils·
Job Opportunity: Strategic Consultant, Open Science, Data Resilience (American Geophysical Union - AGU)

Enjoyed being a part of the related meeting in Berlin on this topic by AGU. Glad to see them make this position available to support the work. paycomonline.net/v4/ats/web.php…
fils retweeted
alex zhang
alex zhang@a1zhang·
Some awesome initial experiments on training small RLMs :) A direction I think will be super super important moving forward for fully seeing the capabilities of RLMs vs. traditional agentic systems
alphaXiv@askalphaxiv

Reinforcing Recursive Language Models Can a 4B model learn to recursively call itself to answer hard long-context questions? We RL fine-tuned a small model to behave as a native RLM. On evidence selection across scientific papers, our 4B RLM matches Sonnet 4.6 in quality while running significantly faster and cheaper.

fils retweeted
fils
fils@fils·
Last Starfighter loses job to AI! A tragic story, all too common today. The Last Starfighter, high schooler Alex Rogan, has lost his job to AI. Read how Alex will be replaced as Google's DeepMind announces plans to train AI on player actions in the quarter-million-player MMORPG Eve Online! Is no job safe?! tomshardware.com/tech-industry/…
fils tweet media
Akshay 🚀
Akshay 🚀@akshay_pachaar·
The MCP vs CLI debate. For most of 2025, AI Engineers argued about it.

The skeptics had real numbers:
- Playwright MCP eats 13.7K tokens
- Chrome DevTools MCP eats 18K
- A 5-server setup burns 55K tokens before any work

The defenders pushed back:
- CLIs break on multi-tenant apps
- No typed contracts, so the agent guesses at outputs
- On unfamiliar APIs, agents waste turns parsing text

Both sides were arguing about the wrong thing. In November 2025, Anthropic published "Code execution with MCP" and reframed it from first principles. The problem was never the protocol. It was the habit of dumping every tool's full description into the model's context the moment a session starts. Add the data those tools return, passed through the model on every step, and a single workflow can balloon to 150K tokens, most of which the model never needed.

The fix is to flip the model's job. Instead of the model calling tools through its context, the model writes code that calls tools through a runtime. The runtime is where tools live. The model only sees what it imports. In Anthropic's example, a Google Drive transcript flows into a Salesforce CRM update. The old way loaded both tool schemas and piped the entire transcript through the model twice. The new way is ten lines of TypeScript that import what they need. Same task, 2K tokens. A 98.7% drop.

Cloudflare pushed the idea to its limit. They collapsed their entire 2,500-endpoint API from 1.17M tokens of schemas down to 1K tokens, by exposing just two functions: search and execute. The agent writes code that searches the catalog, then executes only what matches.

The new pattern has a name: Code Mode. It is a runtime where the agent writes code that mixes two primitives: Bash, for anything with a binary already installed like git or curl, and typed module imports, for proprietary APIs where the type signatures load only when the agent actually imports the tool. That second part is the unlock. Types travel with imports, so the agent gets a strict contract for the tools it picks, and pays nothing for the ones it skips. MCP's typed contracts plus CLI's lazy loading, in one runtime. The agent picks per task.

"MCP is dead" was the wrong takeaway. Anthropic just reported 300M MCP SDK downloads, up from 100M at the start of the year. The protocol is not dying. It is the fastest-growing piece of agent infrastructure right now. What died was loading every tool upfront. That was always a bad idea.

If you are building agents in 2026, the rule is simple. Tool definitions belong in code, not in context. The model writes a few lines that call them. The runtime does the rest. That is what the debate was actually about.
Akshay 🚀 tweet media
Akshay 🚀@akshay_pachaar

x.com/i/article/2053…
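The search/execute pattern described in this thread can be sketched in a few lines. This is a toy illustration under assumed names (the catalog, tool names, and stubbed results are all hypothetical, not Cloudflare's or Anthropic's actual API): `search` matches tool descriptions in a catalog, and `execute` dispatches only the tool the agent picked, so definitions for unused tools never enter the model's context.

```python
# Toy sketch of the "Code Mode" search/execute pattern (hypothetical names).
# A real runtime would back `execute` with MCP servers or HTTP endpoints.

TOOL_CATALOG = {
    "drive.get_transcript": "Fetch a meeting transcript from Google Drive",
    "crm.update_record":    "Update a record in the Salesforce CRM",
    "repo.create_issue":    "Open an issue in a code repository",
}

def search(query: str) -> list[str]:
    """Return names of tools whose description mentions the query."""
    q = query.lower()
    return [name for name, desc in TOOL_CATALOG.items() if q in desc.lower()]

def execute(name: str, **kwargs):
    """Run one tool by name; only that tool's definition is ever loaded."""
    if name not in TOOL_CATALOG:
        raise KeyError(f"unknown tool: {name}")
    # Stubbed result; a real runtime would dispatch to the tool here.
    return {"tool": name, "args": kwargs, "status": "ok"}

# Agent-written code: discover first, then execute only what matched.
matches = search("transcript")
result = execute(matches[0], file_id="abc123")
```

The agent's context holds only these two function signatures plus whatever `search` returns, which is how the 2,500-endpoint catalog shrinks to roughly 1K tokens.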

fils
fils@fils·
Hugging Face for Science at huggingscience.co

This is very interesting. I am exploring what an agent-optimized data repository looks like, so finding "Hugging Science" by Hugging Face was interesting. It is, so they say, a site optimized for your AI agent, and it supports quite a few major domain-specific data formats with large-file support (huggingface.co/docs/datasets/…). They have projects to get involved with, design challenges (huggingscience.co/#/getting-star…), etc.

I don't see many geoscience datasets here yet. A call-out to my community, I guess.

Related paper: AI for scientific discovery is a social problem (sciencedirect.com/science/articl…)

Is llms.txt still a thing? huggingscience.co/llms.txt
fils retweeted
Gephi
Gephi@Gephi·
📣 Big news: We’re launching Gephi 0.11! 🤩 Download this new version and spread the word! ℹ️ Learn more here: gephi.wordpress.com/2026/05/05/gep… 🖥️ and download Gephi 0.11 on gephi.org
fils retweeted
Omar Khattab
Omar Khattab@lateinteraction·
Wow, it's already May 5th. Don't miss the early-bird registration TODAY for the first ACM conference on AI systems @CAISconf. CAIS will have a packed program of really exciting keynotes, paper presentations, workshops, and demos. See you in San Jose in late May!
Omar Khattab tweet media
fils
fils@fils·
Nice short take on MCP for SPARQL resources from sparna.fr. I like their core three items. I wonder if a few examples of type-to-type valid paths might also be added; I use those to help generate SPARQL. However, their SHACL elements might do all that and more! So I just need to try this. Looks really nice! sparna.fr/en/posts/mcp-p…
fils tweet media
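The "type-to-type valid paths" idea mentioned above can be sketched as a small lookup table that drives query generation. Everything below is hypothetical (the class names, predicates, and path index are made up for illustration, of the kind one might extract from a SHACL shapes graph); it just shows how a valid-path index turns a source/target class pair into a SPARQL property-path pattern.

```python
# Hypothetical index of valid predicate paths between classes,
# keyed by (source class, target class).
VALID_PATHS = {
    ("schema:Dataset", "schema:Person"): ["schema:creator"],
    ("schema:Dataset", "schema:Place"):  ["schema:spatialCoverage"],
}

def sparql_for(src: str, dst: str) -> str:
    """Build a SELECT query following a known-valid path from src to dst."""
    preds = VALID_PATHS.get((src, dst))
    if preds is None:
        raise ValueError(f"no valid path from {src} to {dst}")
    path = "/".join(preds)  # SPARQL 1.1 sequence property path
    return (
        "SELECT ?s ?o WHERE {\n"
        f"  ?s a {src} .\n"
        f"  ?s {path} ?o .\n"
        f"  ?o a {dst} .\n"
        "}"
    )

query = sparql_for("schema:Dataset", "schema:Person")
```

Constraining generation to paths in the index is what keeps an LLM from inventing predicates that the endpoint's model does not actually connect.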
fils
fils@fils·
Transforming Research Visibility with RAiD at Oak Ridge National Laboratory Interesting blog post. Also fascinated with the Acorn CLI mentioned in this article: acorn.ornl.gov lyrasis.org/transforming-r…
fils tweet media
fils
fils@fils·
10 years of FAIR! 🙌 From the site: To celebrate this milestone, Scientific Data invites researchers to submit manuscripts related to FAIR-aligned infrastructure, policy, or standardisation to this collection. August 14th deadline nature.com/collections/ee…
fils retweeted
Sam Hogan 🇺🇸
Sam Hogan 🇺🇸@samhogan·
We’re introducing HALO 😇 Hierarchical Agent Loop Optimizer

HALO is an RLM-based agent optimization technique capable of recursively self-improving agents by analyzing their execution traces and suggesting changes. This work is inspired by the Mismanaged Genius Hypothesis proposed by @a1zhang and @lateinteraction earlier this month.

tl;dr: we improved performance on AppWorld (Sonnet 4.6) from 73.7 --> 89.5 (+15.8) by giving HALO-RLM access to harness trace data and asking it to identify issues. The feedback from HALO surfaced failures in the harness such as hallucinated tool calls, redundant arguments in tools, refusal loops, and semantic correctness issues. Each issue mapped cleanly to a direct prompt update.

We then fed these findings into Cursor (Opus 4.6) and asked the coding agent to update the underlying harness. We repeated this trace -> HALO-RLM analysis -> code update loop until the score plateaued.

Today we’re open-sourcing the core HALO-RLM framework, evals, and data for further review.
Sam Hogan 🇺🇸 tweet media
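The trace -> analysis -> update loop described in this thread can be sketched schematically. None of this is HALO's actual code: `run_agent`, `analyze_traces`, and `apply_fixes` are stand-ins for the eval harness, the RLM analyzer, and the coding agent, and the plateau test is a simple patience counter.

```python
# Schematic of a trace -> analyze -> update optimization loop
# (a sketch of the described workflow, not HALO's real implementation).
def optimize(run_agent, analyze_traces, apply_fixes, patience: int = 1):
    """Repeat eval/analyze/fix until the score stops improving."""
    best, stale = float("-inf"), 0
    while stale <= patience:
        score, traces = run_agent()      # evaluate, collecting execution traces
        if score > best:
            best, stale = score, 0
        else:
            stale += 1                   # plateau counter
        issues = analyze_traces(traces)  # e.g. hallucinated tool calls
        apply_fixes(issues)              # map each issue to a prompt/code update
    return best
```

The key design point is that the analyzer only ever sees traces, not the harness code; the fixes flow through a separate coding step, which is what makes the loop auditable.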
fils retweeted
Senzing, Inc.
Senzing, Inc.@senzing·
⚡ Drowning in data but starving for context? This one's for you. The authors of "Bridging Knowledge, Data, and AI" join @pacoid's #GraphPowerHour this Friday to bring the semantic layer from theory into practice. 📅 April 24 | 2:00 PM ET 🔗 hubs.li/Q04cYgmZ0
Senzing, Inc. tweet media
fils retweeted
isaac 🧩
isaac 🧩@isaacbmiller1·
DSPy 3.2.0 is out! Here are a few highlights:
- dspy.RLM improvements around parsing, tool execution, and failure recovery. Expect greater reliability in the bridge between Python and Deno.
- @MaximeRivest is leading an ongoing effort to decouple DSPy from LiteLLM. This release has the first interface improvements in this direction.
- Input fields warn on type mismatches. Passing a value that doesn't match a signature's declared type now logs a warning (by Michael Isaac).
- BetterTogether allows chaining optimizers (by @dilarafsoylu). You can chain multiple GEPA runs together, or combine prompt optimization and fine-tuning.

Thank you to all who contributed! See the full release notes below for more details.
isaac 🧩 tweet media
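The warn-on-mismatch behavior from the release notes can be illustrated standalone. This mimics the described behavior in plain Python with the `warnings` module; it is not DSPy's implementation, and `check_input` is a made-up helper.

```python
import warnings

def check_input(name: str, value, expected: type):
    """Warn (rather than raise) when an input doesn't match its declared
    type, mirroring the release note's described behavior. Toy helper."""
    if not isinstance(value, expected):
        warnings.warn(
            f"input '{name}' expected {expected.__name__}, "
            f"got {type(value).__name__}"
        )
    return value

check_input("question", "What is FAIR?", str)  # matches: silent
check_input("question", 42, str)               # mismatch: logs a warning
```

Warning instead of raising keeps existing pipelines running while still surfacing the likely bug in logs.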
fils retweeted
Raymond Weitekamp
Raymond Weitekamp@raw_works·
crazy preliminary results from qwen 3.5 last night: Preliminary — Qwen3.5 + dspy.RLM on LongCoT-Mini: 27B lands at #2 (33%) — behind only GPT 5.2, +11pp ahead of Gemini 3 Pro. 9B lands at #4 (17%) — still beats Sonnet 4.5. RLMs unambiguously SOTA on this, more soon!
Raymond Weitekamp@raw_works

ok so the default DSPy.RLM is literally going to destroy this benchmark before the end of the day. running now for sonnet 4.5... 🏆 Scoreboard (live) RLM: 90/94 (95.7%) Vanilla: 0/94 (0.0%) anyone want to pay for the opus run? 😉
