Gabriele Farei

3K posts


@jayfarei

Maker for 🦞 claws | Building @opentraces, @datafetchai and @envrun

London, United Kingdom · Joined January 2011
603 Following · 1.1K Followers
Gabriele Farei retweeted
DHH @dhh
AI is the open source dream come true! Having access to the code actually giving mere mortals the power to change it. Wasn't that what open source activists spent decades fighting for? TAKE THE WIN!
18
27
628
35.1K
Gabriele Farei @jayfarei
Hard to explain what a dataset harness like @datafetchai actually is, because it’s an agent-to-agent thing. So here's the first attempt of many:
Imagine a dataset interface as a code workspace your agent can inspect, run, and compose through typed TypeScript functions.
As agents solve real intents using it, the useful parts of their work are saved back as new typed functions, tests, and examples.
Over time, the workspace stops being a generic dataset interface and becomes a tenant-specific library of workflows shaped by what (your) agents repeatedly ask of that dataset.
The harness is what governs that evolution 👇
0
1
1
103
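One way to picture the dataset-harness post above is a workspace that holds typed functions and only saves a new one back when its examples pass. This is a minimal sketch under stated assumptions: `Row`, `WorkspaceFn`, `promote`, and `callFn` are hypothetical names invented for illustration and are not datafetch's actual API.

```typescript
// Hypothetical sketch: these names are illustrative, not datafetch's API.
type Row = { city: string; year: number; sales: number };

interface WorkspaceFn {
  name: string;
  description: string;
  run: (rows: Row[], arg: string) => number;
  // Examples double as tests: each pairs an argument with the expected result.
  examples: { arg: string; expected: number }[];
}

interface Workspace {
  rows: Row[];
  fns: Record<string, WorkspaceFn>;
}

function createWorkspace(rows: Row[]): Workspace {
  return { rows: rows, fns: {} };
}

// Governance step: a candidate is saved only if all of its examples pass.
function promote(ws: Workspace, candidate: WorkspaceFn): boolean {
  if (candidate.examples.length === 0) return false;
  const passes = candidate.examples.every(
    (ex) => candidate.run(ws.rows, ex.arg) === ex.expected
  );
  if (passes) ws.fns[candidate.name] = candidate;
  return passes;
}

// Look up a saved function by name and run it against the dataset.
function callFn(ws: Workspace, fnName: string, arg: string): number {
  const fn = ws.fns[fnName];
  if (!fn) throw new Error("unknown function: " + fnName);
  return fn.run(ws.rows, arg);
}

const workspace = createWorkspace([
  { city: "London", year: 2023, sales: 10 },
  { city: "London", year: 2024, sales: 15 },
  { city: "Paris", year: 2024, sales: 7 },
]);

// A function an agent might write while solving a real intent.
const totalSalesByCity: WorkspaceFn = {
  name: "totalSalesByCity",
  description: "Sum of sales for one city across all years",
  run: (rows, city) =>
    rows.filter((r) => r.city === city).reduce((sum, r) => sum + r.sales, 0),
  examples: [
    { arg: "London", expected: 25 },
    { arg: "Paris", expected: 7 },
  ],
};

const wasPromoted = promote(workspace, totalSalesByCity);
const londonTotal = callFn(workspace, "totalSalesByCity", "London");
```

Over repeated use, `workspace.fns` would grow into the tenant-specific library the post describes, with `promote` playing the governing role in this toy version.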
Gabriele Farei @jayfarei
A good use case for stateful agents, for me, is increasingly “managing my attention”: both to reduce context switching and because, as I found out, it is not only agents that are forgetful from session to session 😆
This feels very much in the personal software category, because attention is deeply personal, and the way we consume and retain information even more so.
My current interpretation:
1/ Take the noise from my feeds: => inbox, calendar, social, news
2/ Take the recent context around me: => traces, browsing history, likes, upvotes, recent docs/vault changes, granola, etc.
Then compress it back into a fixed-length “study card” format, prioritised by what the agent thinks is relevant to carry forward for today (each asset has its own "background job prompt").
What you have is a daily context warm-up. I can scan it, mark what matters, leave comments, upvote/downvote items, and then the agent turns the important bits into todos or blocks time against the rest of my day.
A bit over-engineered, maybe. But I feel there is something here.
[image]
0
1
1
132
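The "study card" flow in the post above (gather items, rank them, cut to a fixed length, turn marked items into todos) can be sketched roughly as follows. Everything here is an assumption for illustration: the `FeedItem` shape, the numeric `relevance` score standing in for the agent's judgement, and the card size of 3 are not taken from the author's setup.

```typescript
// Hypothetical sketch: the types, scores, and card size are illustrative assumptions.
type Source = "inbox" | "calendar" | "social" | "news" | "recent-context";

interface FeedItem {
  id: string;
  source: Source;
  summary: string;
  // Stand-in for the agent's relevance judgement (higher = more relevant).
  relevance: number;
}

// Compress everything into a fixed-length card, most relevant first.
function buildStudyCard(items: FeedItem[], cardSize: number): FeedItem[] {
  const ranked = items.slice().sort((a, b) => b.relevance - a.relevance);
  return ranked.slice(0, cardSize);
}

// Apply the reader's vote so feedback can shift later rankings.
function applyVote(item: FeedItem, vote: "up" | "down"): FeedItem {
  const delta = vote === "up" ? 1 : -1;
  return { ...item, relevance: item.relevance + delta };
}

// Turn the items the reader marked as important into todos.
function toTodos(card: FeedItem[], markedIds: string[]): string[] {
  return card
    .filter((item) => markedIds.indexOf(item.id) !== -1)
    .map((item) => "TODO: " + item.summary);
}

const feed: FeedItem[] = [
  { id: "a", source: "inbox", summary: "Reply to contract email", relevance: 8 },
  { id: "b", source: "news", summary: "Skim agent harness survey", relevance: 5 },
  { id: "c", source: "calendar", summary: "Prep for 3pm review", relevance: 9 },
  { id: "d", source: "social", summary: "Thread on remote sessions", relevance: 2 },
];

const card = buildStudyCard(feed, 3);
const todos = toTodos(card, ["c"]);
```

The fixed `cardSize` is the design point: however noisy the inputs get, the daily warm-up stays the same length.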
Gabriele Farei @jayfarei
@Dimillian Simply start or resume a session on my Mac mini from my phone: /remote-control from the Mac, or /remote-new from the iPhone
0
0
0
60
Thomas Ricouard @Dimillian
So what would you like to do from Codex remote that you can’t do right now? Good night
208
2
210
19K
Gabriele Farei @jayfarei
Playing around with ways to showcase what dynamic code mode in @datafetchai "feels like", and it is hard 😅
[image]
0
1
0
66
Thomas Ricouard @Dimillian
Codex in ChatGPT iOS app got better in latest update!
- Receive turn completion push notifications
- Better reconnection UI
- Better conversations UI, more compact and closer to our desktop app
- New /fork command!
- Better diff with an option to open the full file
- And more!
118
32
898
173.3K
Gabriele Farei @jayfarei
An experiment I'd like to run is entirely eval-driven product development, end to end.
I'd work iteratively on specs and synthetic eval data for it, no code. Spend the majority of my time labelling, defining principles and constraints, and aligning with a judge to create an evaluation target.
Then have several parallel /goal runs going until one passes the evaluation (incl. code quality, memory, latency and scaling), with no input on what exactly sits between the intent and the outcome.
Wonder what will come out 🤔
It probably won't work with today's models, but that might be a good personal benchmark to catch the wave before it takes off.
0
0
0
56
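The eval-driven experiment above boils down to a loop that keeps producing candidates until one meets a fixed evaluation target. The sketch below shows only that loop, run sequentially rather than in parallel; the metric names, thresholds, and the `stubAttempt` function standing in for an agent run are all assumptions made up for illustration.

```typescript
// Hypothetical sketch: metric names and thresholds are assumptions for illustration.
interface Metrics {
  quality: number; // judge score, 0-100
  latencyMs: number;
  memoryMb: number;
}

interface EvalTarget {
  minQuality: number;
  maxLatencyMs: number;
  maxMemoryMb: number;
}

function meetsTarget(m: Metrics, t: EvalTarget): boolean {
  return (
    m.quality >= t.minQuality &&
    m.latencyMs <= t.maxLatencyMs &&
    m.memoryMb <= t.maxMemoryMb
  );
}

// `attempt` stands in for one agent run that builds a candidate and measures it.
function runUntilPass(
  attempt: (round: number) => Metrics,
  target: EvalTarget,
  maxRounds: number
): { round: number; metrics: Metrics } | null {
  for (let round = 1; round <= maxRounds; round++) {
    const metrics = attempt(round);
    if (meetsTarget(metrics, target)) return { round: round, metrics: metrics };
  }
  return null; // budget exhausted without passing
}

const evalTarget: EvalTarget = { minQuality: 80, maxLatencyMs: 200, maxMemoryMb: 256 };

// Deterministic stub: each round scores a little better and runs a little faster.
const stubAttempt = (round: number): Metrics => ({
  quality: 50 + 10 * round,
  latencyMs: 300 - 50 * round,
  memoryMb: 128,
});

const outcome = runUntilPass(stubAttempt, evalTarget, 5);
```

With this stub the target is first met in round 3; a real run would replace `stubAttempt` with whatever builds and measures a candidate, and the caller never inspects what happens inside it.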
Gabriele Farei @jayfarei
Been loving this workflow; it reminds me a bit of the context tree in pi. I have been using it in planning sessions to take stuff out of scope and put it in motion in parallel worktrees.
Tip: to avoid polluting the main session context, try /btw or /side and specify it there.
Handoff skill here: skills.sh/mattpocock/ski…
Matt Pocock@mattpocockuk

You asked for it, so here it is: a deep-dive on my new /handoff skill. It's an alternative to /compact that gives you WAY more flexibility with your context window.
- Think of an idea, handoff to another agent to implement
- Grill, handoff to prototype, handoff BACK
Enjoy:
0
0
0
195
Gabriele Farei @jayfarei
@TeslaOwnersUK what is actually available and road legal in the UK? Last time I used it, it was only doing:
- lane changes (requiring an indicator from the driver)
- summon on private lanes (none I could find in London)
i.e. not much more vs the base package compared to full FSD. Are we getting it now?
1
1
1
141
Tesla Owners UK 🇬🇧 @TeslaOwnersUK
Today is the last day you can purchase Full Self Driving Capability in the UK before it's subscription-only. We’ve experienced it in the US and it’s very impressive. Perhaps far more relevant to AP4 cars. We eagerly wait for it to come to the UK.
[image]
30
9
102
12.4K
Gabriele Farei @jayfarei
I feel you on the risk of doing too much for too many people. But I’d actually be more worried about hosted agents as the crowded path, unless you have a fresh take on it.
In enterprise, it feels like everyone is trying to turn a harness into a managed service: bring your tools, bring your memory, add evals, deploy an agent, manage it for teams.
The part of Flue that feels more distinctive to me is exactly the simplicity of "flue run triage" in CI. Beyond CI, this could be a clean, self-contained package/endpoint to do useful agent work as a repeatable execution unit.
I am finding the agent stack to be overly clever in many ways, when all I want is to "patch" a workflow with an agent procedure. Flue could be perfect for that. 🤔
0
0
1
33
fks @FredKSchott
hitting this interesting cross-roads with flue:
1) repo automation, workflows
2) hosted agents
as the framework matures, the differences between them are becoming more obvious and more frustrating to design around (and by extension, for users).
for example: in astro, it was a specific design goal that our repo automation and human maintainers would reuse 90% of the same content. Shared skills, tools, configuration, etc. etc.
running "flue run triage" in a GitHub Action should be as close to a core maintainer opening up claude code in the repo and asking "triage this issue: URL"
but if you're building and deploying a hosted agent, you want your skills and tools and subagents to live alongside the agent code, not the sandbox file-system.
splitting your agent logic across "this logic (agent code, tools) lives in the codebase" vs. "this logic (skills, roles) lives in the sandbox" is a maintenance nightmare.
i'm not sure what the answer is, but I see projects like Sandcastle by @mattpocockuk laser-focused on repo automation. I trust Matt to build something great here that will be hard for us to compete with. We are trying to do too much for too many people.
meanwhile, I'm now talking with so many devs building agents (not just oss devs with oss repos) and there is no one doing what flue is doing today. A part of me really just wants to explore and optimize for this, and build the best framework for agents.
idk, talking out loud a bit. will spend more time exploring this this week. curious if anyone who's tried flue (or considered it) has thoughts!
12
1
67
6.8K
Gabriele Farei @jayfarei
Code mode everything 👑
Akshay 🚀@akshay_pachaar

code as agent harness.
a 102-page survey from Stanford, Meta, and UIUC on agent harnesses.
the paper argues that code is no longer just the thing agents produce. it’s the medium through which they reason, act, and represent their environment.
it calls this “code as agent harness” and covers three layers:
- code as the interface between agents and their tasks
- the mechanisms that keep agents reliable over long-horizon execution (planning, memory, tool use, verification)
- how multi-agent systems coordinate through shared code artifacts
core findings:
the paper introduces “evolution agents” that treat the harness itself as the optimization target. they collect telemetry, diagnose failures, propose infrastructure changes, and promote only mutations that pass regression. the harness improves itself.
in multi-agent systems, topology complexity inversely correlates with infrastructure quality. teams with better shared state use simpler coordination. teams without it build increasingly elaborate workarounds.
finally, the paper concludes that future agent systems need four properties:
- executable
- inspectable
- stateful
- governed
read more: arxiv.org/abs/2605.18747
i also published this deep dive (article) on agent harness engineering, covering the orchestration loop, tools, memory, context management, and everything else that transforms a stateless LLM into a capable agent. the article is quoted below.
0
0
0
148
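The "promote only mutations that pass regression" step from the quoted summary above can be sketched as a small loop over proposed changes. This is a toy illustration, not the paper's method: `HarnessConfig`, the two mutations, and the regression checks are hypothetical, and telemetry collection and failure diagnosis are left out.

```typescript
// Hypothetical sketch: config fields, mutations, and checks are illustrative assumptions.
interface HarnessConfig {
  maxRetries: number;
  timeoutMs: number;
}

interface Mutation {
  label: string;
  apply: (cfg: HarnessConfig) => HarnessConfig;
}

type RegressionCheck = (cfg: HarnessConfig) => boolean;

// Try each proposed mutation in order; keep it only if every check still passes.
function evolveHarness(
  base: HarnessConfig,
  mutations: Mutation[],
  checks: RegressionCheck[]
): { config: HarnessConfig; promotedLabels: string[]; rejectedLabels: string[] } {
  let current = base;
  const promotedLabels: string[] = [];
  const rejectedLabels: string[] = [];
  for (const mutation of mutations) {
    const candidate = mutation.apply(current);
    if (checks.every((check) => check(candidate))) {
      current = candidate;
      promotedLabels.push(mutation.label);
    } else {
      rejectedLabels.push(mutation.label);
    }
  }
  return { config: current, promotedLabels: promotedLabels, rejectedLabels: rejectedLabels };
}

const regressionSuite: RegressionCheck[] = [
  (cfg) => cfg.maxRetries >= 1 && cfg.maxRetries <= 5,
  (cfg) => cfg.timeoutMs <= 30000,
];

const evolved = evolveHarness(
  { maxRetries: 2, timeoutMs: 10000 },
  [
    { label: "more-retries", apply: (cfg) => ({ ...cfg, maxRetries: cfg.maxRetries + 1 }) },
    { label: "huge-timeout", apply: (cfg) => ({ ...cfg, timeoutMs: 120000 }) },
  ],
  regressionSuite
);
```

Here the retry bump is kept and the oversized timeout is rejected, so the harness configuration only ever moves to states that the regression suite accepts.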