fils
@fils
1.2K posts

Data Guy and wave man 浪人 Hobo Programmer, bouncing from free service to free service then moving on. Nice software you have there... do you have a free tier?

Iowa · Joined April 2008
1.1K Following · 346 Followers
fils
fils@fils·
Job Opportunity: Strategic Consultant, Open Science, Data Resilience (American Geophysical Union - AGU)

Enjoyed being a part of the related meeting in Berlin on this topic by AGU. Glad to see them make this position available to support the work. paycomonline.net/v4/ats/web.php…
fils retweeted
alex zhang
alex zhang@a1zhang·
Some awesome initial experiments on training small RLMs :) A direction I think will be super super important moving forward for fully seeing the capabilities of RLMs vs. traditional agentic systems
alphaXiv@askalphaxiv

Reinforcing Recursive Language Models Can a 4B model learn to recursively call itself to answer hard long-context questions? We RL fine-tuned a small model to behave as a native RLM. On evidence selection across scientific papers, our 4B RLM matches Sonnet 4.6 in quality while running significantly faster and cheaper.

fils retweeted
fils
fils@fils·
Last Starfighter loses job to AI! A tragic story, all too common today. The Last Starfighter, high schooler Alex Rogan, has lost his job to AI. Read how Alex will be replaced as Google's DeepMind announces plans to train AI on player actions in the quarter-million-player MMORPG Eve Online! Is no job safe?! tomshardware.com/tech-industry/…
fils tweet media
Akshay 🚀
Akshay 🚀@akshay_pachaar·
The MCP vs CLI debate. For most of 2025, AI Engineers argued about it.

The skeptics had real numbers:
- Playwright MCP eats 13.7K tokens
- Chrome DevTools MCP eats 18K
- A 5-server setup burns 55K tokens before any work

The defenders pushed back:
- CLIs break on multi-tenant apps
- No typed contracts, so the agent guesses at outputs
- On unfamiliar APIs, agents waste turns parsing text

Both sides were arguing about the wrong thing. In November 2025, Anthropic published "Code execution with MCP" and reframed it from first principles. The problem was never the protocol. It was the habit of dumping every tool's full description into the model's context the moment a session starts. Add the data those tools return, passed through the model on every step, and a single workflow can balloon to 150K tokens, most of which the model never needed.

The fix is to flip the model's job. Instead of the model calling tools through its context, the model writes code that calls tools through a runtime. The runtime is where tools live. The model only sees what it imports. In Anthropic's example, a Google Drive transcript flows into a Salesforce CRM update. The old way loaded both tool schemas and piped the entire transcript through the model twice. The new way is ten lines of TypeScript that import what they need. Same task, 2K tokens. A 98.7% drop.

Cloudflare pushed the idea to its limit. They collapsed their entire 2,500-endpoint API from 1.17M tokens of schemas down to 1K tokens, by exposing just two functions: search and execute. The agent writes code that searches the catalog, then executes only what matches.

The new pattern has a name: Code Mode. It is a runtime where the agent writes code that mixes two primitives: Bash, for anything with a binary already installed like git or curl, and typed module imports, for proprietary APIs where the type signatures load only when the agent actually imports the tool. That second part is the unlock. Types travel with imports, so the agent gets a strict contract for the tools it picks, and pays nothing for the ones it skips. MCP's typed contracts plus CLI's lazy loading, in one runtime. The agent picks per task.

"MCP is dead" was the wrong takeaway. Anthropic just reported 300M MCP SDK downloads, up from 100M at the start of the year. The protocol is not dying. It is the fastest-growing piece of agent infrastructure right now. What died was loading every tool upfront. That was always a bad idea.

If you are building agents in 2026, the rule is simple. Tool definitions belong in code, not in context. The model writes a few lines that call them. The runtime does the rest. That is what the debate was actually about.
Akshay 🚀 tweet media
Akshay 🚀@akshay_pachaar

x.com/i/article/2053…
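The search/execute pattern described in this thread can be sketched in a few lines. This is a toy illustration under assumed names (the catalog, tool names, and stubbed results are all hypothetical, not Cloudflare's or Anthropic's actual API): `search` matches tool descriptions in a catalog, and `execute` dispatches only the tool the agent picked, so definitions for unused tools never enter the model's context.

```python
# Toy sketch of the "Code Mode" search/execute pattern (hypothetical names).
# A real runtime would back `execute` with MCP servers or HTTP endpoints.

TOOL_CATALOG = {
    "drive.get_transcript": "Fetch a meeting transcript from Google Drive",
    "crm.update_record":    "Update a record in the Salesforce CRM",
    "repo.create_issue":    "Open an issue in a code repository",
}

def search(query: str) -> list[str]:
    """Return names of tools whose description mentions the query."""
    q = query.lower()
    return [name for name, desc in TOOL_CATALOG.items() if q in desc.lower()]

def execute(name: str, **kwargs):
    """Run one tool by name; only that tool's definition is ever loaded."""
    if name not in TOOL_CATALOG:
        raise KeyError(f"unknown tool: {name}")
    # Stubbed result; a real runtime would dispatch to the tool here.
    return {"tool": name, "args": kwargs, "status": "ok"}

# Agent-written code: discover first, then execute only what matched.
matches = search("transcript")
result = execute(matches[0], file_id="abc123")
```

The agent's context holds only these two function signatures plus whatever `search` returns, which is how the 2,500-endpoint catalog shrinks to roughly 1K tokens.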

fils
fils@fils·
Hugging Face for Science at huggingscience.co

This is very interesting. I am exploring what an agent-optimized data repository looks like, so finding "Hugging Science" by Hugging Face was interesting. It is, so they say, a site optimized for your AI agent, and it supports quite a few major domain-specific data formats with large-file support (huggingface.co/docs/datasets/…). They have projects to get involved with, design challenges (huggingscience.co/#/getting-star…), etc.

I don't see many geoscience datasets here yet. A call-out to my community, I guess.

Related paper: AI for scientific discovery is a social problem (sciencedirect.com/science/articl…)

Is llms.txt still a thing? huggingscience.co/llms.txt
fils retweeted
Gephi
Gephi@Gephi·
📣 Big news: We’re launching Gephi 0.11! 🤩 Download this new version and spread the word! ℹ️ Learn more here: gephi.wordpress.com/2026/05/05/gep… 🖥️ and download Gephi 0.11 on gephi.org
fils retweeted
Omar Khattab
Omar Khattab@lateinteraction·
Wow, it's already May 5th. Don't miss the early-bird registration TODAY for the first ACM conference on AI systems @CAISconf. CAIS will have a packed program of really exciting keynotes, paper presentations, workshops, and demos. See you in San Jose in late May!
Omar Khattab tweet media
fils
fils@fils·
Nice short take on MCP for SPARQL resources from sparna.fr. I like their core three items. I wonder if a few examples of type-to-type valid paths might also be added; I use those to help generate SPARQL. However, their SHACL elements might do all that and more! So I just need to try this. Looks really nice! sparna.fr/en/posts/mcp-p…
fils tweet media
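The "type-to-type valid paths" idea mentioned above can be sketched as a small lookup table that drives query generation. Everything below is hypothetical (the class names, predicates, and path index are made up for illustration, of the kind one might extract from a SHACL shapes graph); it just shows how a valid-path index turns a source/target class pair into a SPARQL property-path pattern.

```python
# Hypothetical index of valid predicate paths between classes,
# keyed by (source class, target class).
VALID_PATHS = {
    ("schema:Dataset", "schema:Person"): ["schema:creator"],
    ("schema:Dataset", "schema:Place"):  ["schema:spatialCoverage"],
}

def sparql_for(src: str, dst: str) -> str:
    """Build a SELECT query following a known-valid path from src to dst."""
    preds = VALID_PATHS.get((src, dst))
    if preds is None:
        raise ValueError(f"no valid path from {src} to {dst}")
    path = "/".join(preds)  # SPARQL 1.1 sequence property path
    return (
        "SELECT ?s ?o WHERE {\n"
        f"  ?s a {src} .\n"
        f"  ?s {path} ?o .\n"
        f"  ?o a {dst} .\n"
        "}"
    )

query = sparql_for("schema:Dataset", "schema:Person")
```

Constraining generation to paths in the index is what keeps an LLM from inventing predicates that the endpoint's model does not actually connect.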
fils
fils@fils·
Transforming Research Visibility with RAiD at Oak Ridge National Laboratory Interesting blog post. Also fascinated with the Acorn CLI mentioned in this article: acorn.ornl.gov lyrasis.org/transforming-r…
fils tweet media
fils
fils@fils·
10 years of FAIR! 🙌 From the site: To celebrate this milestone, Scientific Data invites researchers to submit manuscripts related to FAIR-aligned infrastructure, policy, or standardisation to this collection. August 14th deadline nature.com/collections/ee…
fils retweeted
Sam Hogan 🇺🇸
Sam Hogan 🇺🇸@samhogan·
We’re introducing HALO 😇 Hierarchical Agent Loop Optimizer

HALO is an RLM-based agent optimization technique capable of recursively self-improving agents by analyzing their execution traces and suggesting changes. This work is inspired by the Mismanaged Genius Hypothesis proposed by @a1zhang and @lateinteraction earlier this month.

tl;dr: we improved performance on AppWorld (Sonnet 4.6) from 73.7 --> 89.5 (+15.8) by giving HALO-RLM access to harness trace data and asking it to identify issues. The feedback from HALO surfaced failures in the harness such as hallucinated tool calls, redundant arguments in tools, refusal loops, and semantic correctness issues. Each issue mapped cleanly to a direct prompt update.

We then fed these findings into Cursor (Opus 4.6) and asked the coding agent to update the underlying harness. We repeated this trace -> HALO-RLM analysis -> code update loop until the score plateaued.

Today we’re open-sourcing the core HALO-RLM framework, evals, and data for further review.
Sam Hogan 🇺🇸 tweet media
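The trace -> analysis -> update loop described in this thread can be sketched schematically. None of this is HALO's actual code: `run_agent`, `analyze_traces`, and `apply_fixes` are stand-ins for the eval harness, the RLM analyzer, and the coding agent, and the plateau test is a simple patience counter.

```python
# Schematic of a trace -> analyze -> update optimization loop
# (a sketch of the described workflow, not HALO's real implementation).
def optimize(run_agent, analyze_traces, apply_fixes, patience: int = 1):
    """Repeat eval/analyze/fix until the score stops improving."""
    best, stale = float("-inf"), 0
    while stale <= patience:
        score, traces = run_agent()      # evaluate, collecting execution traces
        if score > best:
            best, stale = score, 0
        else:
            stale += 1                   # plateau counter
        issues = analyze_traces(traces)  # e.g. hallucinated tool calls
        apply_fixes(issues)              # map each issue to a prompt/code update
    return best
```

The key design point is that the analyzer only ever sees traces, not the harness code; the fixes flow through a separate coding step, which is what makes the loop auditable.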
fils retweeted
Senzing, Inc.
Senzing, Inc.@senzing·
⚡ Drowning in data but starving for context? This one's for you. The authors of "Bridging Knowledge, Data, and AI" join @pacoid's #GraphPowerHour this Friday to bring the semantic layer from theory into practice. 📅 April 24 | 2:00 PM ET 🔗 hubs.li/Q04cYgmZ0
Senzing, Inc. tweet media
fils retweeted
isaac 🧩
isaac 🧩@isaacbmiller1·
DSPy 3.2.0 is out! Here are a few highlights:
- dspy.RLM improvements around parsing, tool execution, and failure recovery. Expect greater reliability in the bridge between Python and Deno.
- @MaximeRivest is leading an ongoing effort to decouple DSPy from LiteLLM. This release has the first interface improvements in this direction.
- Input fields warn on type mismatches. Passing a value that doesn't match a signature's declared type now logs a warning (by Michael Isaac).
- BetterTogether allows chaining optimizers (by @dilarafsoylu). You can chain multiple GEPA runs together, or combine prompt optimization and fine-tuning.

Thank you to all who contributed! See the full release notes below for more details.
isaac 🧩 tweet media
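The warn-on-mismatch behavior from the release notes can be illustrated standalone. This mimics the described behavior in plain Python with the `warnings` module; it is not DSPy's implementation, and `check_input` is a made-up helper.

```python
import warnings

def check_input(name: str, value, expected: type):
    """Warn (rather than raise) when an input doesn't match its declared
    type, mirroring the release note's described behavior. Toy helper."""
    if not isinstance(value, expected):
        warnings.warn(
            f"input '{name}' expected {expected.__name__}, "
            f"got {type(value).__name__}"
        )
    return value

check_input("question", "What is FAIR?", str)  # matches: silent
check_input("question", 42, str)               # mismatch: logs a warning
```

Warning instead of raising keeps existing pipelines running while still surfacing the likely bug in logs.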
fils retweeted
Raymond Weitekamp
Raymond Weitekamp@raw_works·
crazy preliminary results from qwen 3.5 last night: Preliminary — Qwen3.5 + dspy.RLM on LongCoT-Mini: 27B lands at #2 (33%) — behind only GPT 5.2, +11pp ahead of Gemini 3 Pro. 9B lands at #4 (17%) — still beats Sonnet 4.5. RLMs unambiguously SOTA on this, more soon!
Raymond Weitekamp@raw_works

ok so the default DSPy.RLM is literally going to destroy this benchmark before the end of the day. running now for sonnet 4.5... 🏆 Scoreboard (live) RLM: 90/94 (95.7%) Vanilla: 0/94 (0.0%) anyone want to pay for the opus run? 😉
