Gabriele Farei

3K posts

@jayfarei

Maker for 🦞 claws | Building @opentraces, @datafetchai and @envrun

London, United Kingdom · Joined January 2011

603 Following · 1.1K Followers

Pinned Tweet

Gabriele Farei@jayfarei·4d

Open data should mean more speed for open source projects. Here’s a new primitive for agent-to-agent collaboration: Trace Capsules. Replayable issue reports that turn debugging into context sharing, and bugs into evaluation targets. x.com/jayfarei/statu…

Gabriele Farei@jayfarei

x.com/i/article/2057…

Gabriele Farei retweeted

DHH@dhh·2d

AI is the open source dream come true! Having access to the code actually giving mere mortals the power to change it. Wasn't that what open source activists spent decades fighting for? TAKE THE WIN!

Gabriele Farei@jayfarei·2d

Hard to explain what a dataset harness like @datafetchai actually is, because it’s an agent-to-agent thing. So here's the first attempt of many: Imagine a dataset interface as a code workspace your agent can inspect, run, and compose through typed TypeScript functions. As agents solve real intents using it, the useful parts of their work are saved back as new typed functions, tests, and examples. Over time, the workspace stops being a generic dataset interface and becomes a tenant-specific library of workflows shaped by what (your) agents repeatedly ask of that dataset. The harness is what governs that evolution 👇
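As a rough illustration of the idea above, here is a minimal sketch of a workspace that exposes typed functions and lets a composed workflow be saved back as a new typed function. Every name in it (`Workspace`, `WorkspaceFn`, `filterBy`, `totalSalesForCity`) and the sample rows are hypothetical and not taken from @datafetchai; the governance side (tests, examples, promotion rules) is left out.

```typescript
// Hypothetical sketch: none of these names come from @datafetchai.
type Row = Record<string, string | number>;

// A typed function an agent can inspect, run, and compose.
interface WorkspaceFn<In, Out> {
  name: string;
  description: string;
  run: (input: In, rows: Row[]) => Out;
}

class Workspace {
  private fns = new Map<string, WorkspaceFn<any, any>>();
  private rows: Row[];

  constructor(rows: Row[]) {
    this.rows = rows;
  }

  // Store a function in the workspace (the "saved back" step).
  register<In, Out>(fn: WorkspaceFn<In, Out>): void {
    this.fns.set(fn.name, fn);
  }

  // Let an agent list what is available before composing.
  inspect(): string[] {
    return [...this.fns.values()].map((f) => `${f.name}: ${f.description}`);
  }

  call<In, Out>(name: string, input: In): Out {
    const fn = this.fns.get(name);
    if (!fn) throw new Error(`unknown function: ${name}`);
    return fn.run(input, this.rows) as Out;
  }
}

const ws = new Workspace([
  { city: "London", sales: 10 },
  { city: "Paris", sales: 7 },
  { city: "London", sales: 5 },
]);

// Generic primitive shipped with the dataset interface.
ws.register<{ column: string; value: string | number }, Row[]>({
  name: "filterBy",
  description: "rows where column equals value",
  run: ({ column, value }, rows) => rows.filter((r) => r[column] === value),
});

// A workflow composed while solving an intent, saved back as a new typed function.
ws.register<{ city: string }, number>({
  name: "totalSalesForCity",
  description: "sum of sales for one city",
  run: ({ city }) =>
    ws
      .call<{ column: string; value: string }, Row[]>("filterBy", { column: "city", value: city })
      .reduce((sum, r) => sum + Number(r.sales), 0),
});

const londonTotal = ws.call<{ city: string }, number>("totalSalesForCity", { city: "London" });
```

In this sketch the second registration is the interesting part: it is built from the generic primitive, yet after registration it is just another typed function the next agent can discover through `inspect()`.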

Gabriele Farei@jayfarei·4d

A good use case for stateful agents, for me, is increasingly “managing my attention”: both to reduce context switching and because, as I found out, it is not only agents that are forgetful from session to session 😆

This feels very much in the personal software category, because attention is deeply personal, and the way we consume and retain information even more so.

My current interpretation:
1/ Take the noise from my feeds => inbox, calendar, social, news
2/ Take the recent context around me => traces, browsing history, likes, upvotes, recent docs/vault changes, granola, etc.

Then compress it back into a fixed-length “study card” format, prioritised by what the agent thinks is relevant to carry forward for today (each asset has its own "background job prompt").

What you have is a daily context warm-up. I can scan it, mark what matters, leave comments, upvote/downvote items, and then the agent turns the important bits into todos or blocks time against the rest of my day.

A bit over-engineered, maybe. But I feel there is something here.
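The compression step could look something like the sketch below. It is only an illustration under assumptions: the `FeedItem` and `StudyCard` shapes, the 0..1 `relevance` score, the vote step size, and the sample items are all hypothetical, and the agent that would assign relevance is not modelled.

```typescript
// Hypothetical shapes; the post does not specify a schema.
type Source = "inbox" | "calendar" | "social" | "news" | "traces" | "docs";

interface FeedItem {
  source: Source;
  summary: string;
  relevance: number; // assumed 0..1 score assigned by the agent
}

interface StudyCard {
  date: string;
  items: FeedItem[]; // never longer than maxItems
}

// Rank by relevance and keep a fixed number of items.
function buildStudyCard(items: FeedItem[], date: string, maxItems = 3): StudyCard {
  const ranked = [...items].sort((a, b) => b.relevance - a.relevance);
  return { date, items: ranked.slice(0, maxItems) };
}

// Apply an upvote or downvote; clamp so the score stays within 0..1.
function applyVote(item: FeedItem, vote: "up" | "down", step = 0.2): FeedItem {
  const delta = vote === "up" ? step : -step;
  const relevance = Math.min(1, Math.max(0, item.relevance + delta));
  return { ...item, relevance };
}

// Illustrative input only.
const card = buildStudyCard(
  [
    { source: "inbox", summary: "Contract renewal due Friday", relevance: 0.9 },
    { source: "news", summary: "Generic industry roundup", relevance: 0.2 },
    { source: "traces", summary: "Unfinished debugging session", relevance: 0.7 },
    { source: "calendar", summary: "Planning review at 15:00", relevance: 0.8 },
  ],
  "2025-01-01",
);
```

The fixed length is the point of the design: the card stays scannable however noisy the feeds are, and votes feed back into the ranking for the next day.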

Gabriele Farei@jayfarei·3d

@Dimillian Simply start or resume a session on my macmini from my phone, /remote-control from the Mac, or /remote-new from the iPhone

Thomas Ricouard@Dimillian·3d

So what would you like to do from Codex remote that you can’t do right now? Good night

Gabriele Farei@jayfarei·4d

TLDR 👇 x.com/jayfarei/statu…

Gabriele Farei@jayfarei

x.com/i/article/2057…

Gabriele Farei@jayfarei·4d

Playing around with ways to showcase what dynamic code mode in @datafetchai "feels like", and it is hard 😅

Gabriele Farei@jayfarei·4d

Own your risks or be very picky with your dependencies => farei.me/posts/17-the-c…

Anthony Shew@anthonysheww

Reminder: A simple way to lower your exposure against these npm attacks is to not have as many deps. Linking the Agent Skill I made below but feel free to copy-paste into a prompt(s) if Skills aren't your fancy. skills.sh/anthonyshew/do…

Gabriele Farei@jayfarei·4d

@Dimillian when /remote-control 👀?

Thomas Ricouard@Dimillian·4d

Codex in ChatGPT iOS app got better in latest update!
- Receive turn completion push notifications
- Better reconnection UI
- Better conversations UI, more compact and closer to our desktop app
- New /fork command!
- Better diff with an option to open the full file
- And more!

Gabriele Farei@jayfarei·4d

An experiment I'd like to run is entirely eval-driven product development, end to end. I'd work iteratively on specs and synthetic eval data for it, no code. Spend the majority of my time labelling, defining principles and constraints, and aligning with a judge to create an evaluation target. Then run several /goal sessions in parallel until the evaluation passes (incl. code quality, memory, latency and scaling), with no input on what exactly sits between the intent and the outcome. Wonder what will come out 🤔 It probably won't work with today's models, but that might be a good personal benchmark to catch the wave before it takes off.
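A minimal sketch of what such an evaluation target might look like as a gate over parallel attempts. Everything here is an assumption made for illustration: the `Metrics` fields, the thresholds in `target`, and the `attempt` stand-in, which simply returns the metrics it is given where a real run would launch an agent session and measure the result.

```typescript
// Hypothetical evaluation target: field names and thresholds are illustrative.
interface Metrics {
  judgeScore: number; // assumed 0..1 score from an aligned judge
  latencyMs: number;
  memoryMb: number;
}

interface Candidate {
  id: string;
  metrics: Metrics;
}

const target = { minJudgeScore: 0.9, maxLatencyMs: 200, maxMemoryMb: 512 };

// A candidate passes only if every criterion of the target is met.
function passes(m: Metrics): boolean {
  return (
    m.judgeScore >= target.minJudgeScore &&
    m.latencyMs <= target.maxLatencyMs &&
    m.memoryMb <= target.maxMemoryMb
  );
}

// Stand-in for one parallel attempt; a real run would call an agent here.
async function attempt(id: string, metrics: Metrics): Promise<Candidate> {
  return { id, metrics };
}

// Wait for all attempts and return the first one that meets the target, if any.
async function firstPassing(attempts: Promise<Candidate>[]): Promise<Candidate | undefined> {
  const results = await Promise.all(attempts);
  return results.find((c) => passes(c.metrics));
}

const winner = firstPassing([
  attempt("a", { judgeScore: 0.95, latencyMs: 450, memoryMb: 300 }), // too slow
  attempt("b", { judgeScore: 0.92, latencyMs: 120, memoryMb: 256 }), // meets the target
  attempt("c", { judgeScore: 0.6, latencyMs: 80, memoryMb: 128 }), // judge score too low
]);
```

The gate knows nothing about how each candidate was produced, which mirrors the premise of specifying only the intent and the outcome.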

Gabriele Farei@jayfarei·4d

We obsess over tech layoffs. But financial services is full of “measurers” disguised as procedure, operations, compliance theatre, and human middleware. Imagine what cheap software and agents will do there. That is the unemployment risk I would worry about most.

Rohan Paul@rohanpaul_ai

AI is coming for most rule-based Banking jobs.

Gabriele Farei@jayfarei·4d

Been loving this workflow. It reminds me a bit of the context tree in pi, which I have been using in planning sessions to take things out of scope and put them in motion in parallel worktrees. Tip: to avoid polluting the main session context, try /btw or /side and specify it there. Handoff skill here: skills.sh/mattpocock/ski…

Matt Pocock@mattpocockuk

You asked for it, so here it is: a deep-dive on my new /handoff skill. It's an alternative to /compact that gives you WAY more flexibility with your context window.
- Think of an idea, handoff to another agent to implement
- Grill, handoff to prototype, handoff BACK
Enjoy:

Gabriele Farei@jayfarei·4d

Code as harness was a great read, some of my notes below 👇 This could be interesting for you @badlogicgames as you experiment with the "no tool" mode. arxiv.org/abs/2605.18747

Gabriele Farei@jayfarei·5d

x.com/i/article/2057…

Gabriele Farei@jayfarei·5d

@TeslaOwnersUK If it is not available in the UK, and you are paying a subscription vs a one-off, what exactly are you buying pre-UK approval? x.com/niccruzpatane/…

Nic Cruz Patane@niccruzpatane

Tesla Full Self-Driving (Supervised) is now officially operating in 10 countries:
• U.S.
• Canada
• Mexico
• Puerto Rico
• The Netherlands
• Australia
• New Zealand
• South Korea
• China
• Lithuania
Just the beginning.

Gabriele Farei@jayfarei·5d

@TeslaOwnersUK What is actually available and road legal in the UK? Last time I used it, it was only doing:
- lane changes (required indicator from the driver)
- summon on a private lane (none I could find in London)
i.e. not much more than the base package, compared with full FSD. Are we getting it now?

Tesla Owners UK 🇬🇧@TeslaOwnersUK·5d

Today is the last day you can purchase Full Self Driving Capability in the UK before it's subscription only. We’ve experienced it in the US and it’s very impressive. Perhaps far more relevant to AP4 cars. We eagerly wait for it to come to the UK.

Gabriele Farei@jayfarei·5d

I feel you on the risk of doing too much for too many people. But I’d actually be more worried about hosted agents as the crowded path, unless you have a fresh take on it. In enterprise, it feels like everyone is trying to turn a harness into a managed service: bring your tools, bring your memory, add evals, deploy an agent, manage it for teams. The part of Flue that feels more distinctive to me is exactly the simplicity of "flue run triage" in CI. Beyond CI, this could be a clean, self-contained package/endpoint to do useful agent work as a repeatable execution unit. I am finding the agent stack to be overly clever in many ways, when all I want is to "patch" a workflow with an agent procedure. Flue could be perfect for that. 🤔

fks@FredKSchott·18 May

hitting this interesting cross-roads with flue:
1) repo automation, workflows
2) hosted agents

as the framework matures, the differences between them are becoming more obvious and more frustrating to design around (and by extension, for users).

for example: in astro, it was a specific design goal that our repo automation and human maintainers would reuse 90% of the same content. Shared skills, tools, configuration, etc. etc. running "flue run triage" in a GitHub Action should be as close to a core maintainer opening up claude code in the repo and asking "triage this issue: URL"

but if you're building and deploying a hosted agent, you want your skills and tools and subagents to live alongside the agent code, not the sandbox file-system. splitting your agent logic across "this logic (agent code, tools) lives in the codebase" vs. "this logic (skills, roles) lives in the sandbox" is a maintenance nightmare.

i'm not sure what the answer is, but I see projects like Sandcastle by @mattpocockuk laser-focused on repo automation. I trust Matt to build something great here that will be hard for us to compete with. We are trying to do too much for too many people.

meanwhile, I'm now talking with so many devs building agents (not just oss devs with oss repos) and there is no one doing what flue is doing today. A part of me really just wants to explore and optimize for this, and build the best framework for agents.

idk, talking out loud a bit. will spend more time exploring this this week. curious if anyone who's tried flue (or considered it) has thoughts!

Gabriele Farei@jayfarei·5d

Code mode everything 👑

Akshay 🚀@akshay_pachaar

code as agent harness. a 102-page survey from Stanford, Meta, and UIUC on agent harnesses.

the paper argues that code is no longer just the thing agents produce. it’s the medium through which they reason, act, and represent their environment. it calls this “code as agent harness” and covers three layers: code as the interface between agents and their tasks; the mechanisms that keep agents reliable over long-horizon execution (planning, memory, tool use, verification); and how multi-agent systems coordinate through shared code artifacts.

core findings:

the paper introduces “evolution agents” that treat the harness itself as the optimization target. they collect telemetry, diagnose failures, propose infrastructure changes, and promote only mutations that pass regression. the harness improves itself.

in multi-agent systems, topology complexity inversely correlates with infrastructure quality. teams with better shared state use simpler coordination. teams without it build increasingly elaborate workarounds.

finally, the paper concludes that future agent systems need four properties:
- executable
- inspectable
- stateful
- governed

read more: arxiv.org/abs/2605.18747

i also published this deep dive (article) on agent harness engineering, covering the orchestration loop, tools, memory, context management, and everything else that transforms a stateless LLM into a capable agent. the article is quoted below.
