Jonathan Hefner

140 posts

@hefnerdotpro

Working on https://t.co/ur0WT8vRhs | All my issues are skill issues

Joined August 2024
122 Following · 33 Followers
Jonathan Hefner@hefnerdotpro·
@andrewqu I can imagine this being awesome for quickly interacting with agents (at fine granularity).
0
0
1
179
Andrew Qu@andrewqu·
Slack - but you can react to sections of long messages instead of the whole message
15
4
134
17.9K
Jonathan Hefner@hefnerdotpro·
@dreamsofcode_io That is a useful framing because it suggests a design strategy: relentlessly ask “what can go wrong?” and iterate until the answer is acceptable (based on product requirements).
0
0
0
35
Dreams of Code@dreamsofcode_io·
I’ve come to the conclusion that LLMs by themselves don’t make software less stable. What they do, however, is amplify Murphy’s Law to a level most developers have never seen. I think we’re going to see a real difference between software “development” and “engineering”.
10
2
43
2.7K
Hieu Pham@hyhieu226·
Is it possible to distill a specific domain skill from an LLM into a smaller one? For instance: Codex can code everything, but if I only want CUDA kernels, may I get away with a much smaller model? What are the intuitions? How would things fall apart? 😅
65
12
637
114.3K
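The usual recipe behind the question above is knowledge distillation: train the small model to match the big model's temperature-softened output distribution over domain-specific prompts (here, CUDA-kernel tasks). A minimal sketch of the loss, assuming you have logit access to both models; the function names and temperature value are illustrative, not from any particular framework:

```python
import numpy as np

def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Temperature-softened softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits, student_logits, temperature: float = 2.0) -> float:
    """KL(teacher || student) over softened token distributions, averaged over positions.

    The temperature**2 factor keeps gradient magnitudes comparable as the
    temperature changes (the standard trick from the distillation literature).
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return float(kl.mean()) * temperature**2
```

The intuition for "how things fall apart" then has a concrete handle: the small model can only match the teacher's distribution on the prompt distribution you train on, so behavior off that distribution (non-CUDA requests, unusual kernel patterns) is where quality degrades first.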
Jonathan Hefner@hefnerdotpro·
@sarahwooders That's an interesting point. See a dev blog describing a best practice you want to adopt? Point your memory-augmented agent at it and tell it to do that from now on.
0
0
1
44
Sarah Wooders@sarahwooders·
When agents have memory, they can just learn to automatically do the things you’d otherwise need some UI/UX for (e.g. in your ADE/IDE):
- create worktrees for new tasks
- open files in zed/vscode/cursor
- link the conversation from PRs
This is why the Letta Code app is quite minimal
6
0
17
1K
Jonathan Hefner@hefnerdotpro·
@OfficialLoganK More benchmarks also means more well-defined targets for frontier labs, so it's a bit of a win-win.
0
0
1
111
Logan Kilpatrick@OfficialLoganK·
Every company building on top of AI should be making their own benchmarks. This is the way if you want model progress to disproportionately benefit your company.
134
97
1.9K
140.5K
Jonathan Hefner@hefnerdotpro·
@BenjaminBadejo My own short version of this is "Do you understand what I mean?" In addition to making sure we're on the same page, I feel like having the AI state its understanding in its own words improves adherence during implementation.
0
0
1
19
Ben Badejo@BenjaminBadejo·
Believe it or not, the key to getting good results when building with AI is to say to your AI harness, “Before you begin, state back to me clearly what you think you are being asked to do, and ask me any questions you may have.”
3
2
12
855
Jonathan Hefner reposted
Oussama Sekkat@osekkat·
I’ve seen people on X dunking on folks like @garrytan @doodlestein and others for sharing SKILL.md files they've built. They are dismissing these files as "just a markdown file.” I think this misses the point entirely and I'll try to address that here. Quick thread:

A bad skill file is just text, sure. A good skill file is compressed expertise, packaged in a format an agent can actually use. The value is not just in the “markdown file.” The value is the interaction between:
- a huge neural network with latent capabilities
- a precise, reusable, agent-readable procedure that steers those capabilities toward a specific outcome

That combination is the product. Saying “it’s just markdown” is like saying Hamlet is “just ink on paper,” or Einstein’s relativity paper was “just a text.” Technically true. Intellectually useless. The medium is simple. The content is what matters. And more importantly, the effect of that content on the reader is what matters.

With humans, a book, a coach, a lecture, or a painting can change how someone thinks and acts. With LLMs, text is also the control surface. These models were trained on text, reason through text, call tools through text, and follow procedures through text. So yes, the skill is “just text.” But it is text designed to be read by an enormous neural net. That matters.

A good skill is agent-ergonomic. It does not merely say “do this better.” It encodes workflow, constraints, examples, edge cases, tool usage, failure modes, and success criteria in a way the agent can reliably execute. That is very different from a casual prompt. A prompt is often a one-off request. A skill can be reused, versioned, tested, improved, shared, and loaded at the exact moment an agent needs it. That turns “vibes-based prompting” into something closer to operational knowledge.

Another way to think about it: we have built these massive models, but much of their power is latent. Different people can extract very different levels of performance from the same model. A good skill is a way to actualize a specific slice of that latent capability. A refactoring skill. A research skill. A legal review skill. A math explanation skill. A codebase-navigation skill. Each one can make the same model behave very differently.

I think of Cus D’Amato and Mike Tyson. Tyson had enormous latent potential. But Cus gave him a system, a style, a discipline, a way to channel that potential. That’s what good skills are for agents.

They are not magic. They are not all equally valuable. Many will be mediocre or useless. But dismissing them right off the bat because they are “just markdown” shows a misunderstanding of what LLMs are. Text is how we trained these systems (for the most part). Text is how we steer them. Text is how we unlock parts of what they can do.

The question is not whether a skill file is “just text.” The question is whether the text reliably makes the model perform better at a valuable task. If yes, then it is not “just markdown.” It is leverage.
10
7
53
11.3K
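A skill file of the kind described above might look something like this minimal sketch (the skill name, frontmatter fields, and section layout here are hypothetical illustrations, not taken from any published spec):

```markdown
---
name: cuda-kernel-review
description: Review CUDA kernels for correctness and performance before merge.
---

# CUDA kernel review

## Workflow
1. Read the kernel source and its launch configuration.
2. Check global memory accesses for coalescing and shared memory for bank conflicts.
3. Run the project's benchmark target before and after any suggested change.

## Constraints
- Never alter numerical semantics without calling it out explicitly.

## Failure modes
- Reporting a speedup without verifying correctness against a reference output.

## Success criteria
- Review comments cite specific lines and name the hazard (race, divergence, register spill).
```

The point of the thread is visible even in a toy like this: the file encodes workflow, constraints, failure modes, and success criteria the agent can follow, not just "do this better."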
Matt Pocock@mattpocockuk·
Nearly 23K stars for a collection of markdown files I wrote. I guess they must be pretty good.

I want to invest more time in this repo. So, folks who starred it, what can I do to make these skills more obvious to you?
- A docs site for the skills?
- Send them to plugin marketplaces?

Help me help you github.com/mattpocock/ski…
135
171
2.4K
191.2K
dax@thdxr·
@Hacksore i forgot which one was ours
10
2
501
16.3K
Hacksore@Hacksore·
Tell me the diff in these icons
[two attached icon images]
13
0
183
30.1K
Jonathan Hefner@hefnerdotpro·
@zeeg The honest answer is that it was copied from Claude Code Skills early on in the standardization process. Since then, CC has introduced even more features, which we decided to wait to standardize until they get more consensus. allowed-tools is a wart, but mostly inconsequential.
0
0
1
70
David Cramer@zeeg·
@hefnerdotpro the spec suggests things that no one implements - why are they part of the spec if they're not going to exist? allowed-tools is unlikely to ever function, so not sure why it was added in the first place
1
0
0
152
David Cramer@zeeg·
can we talk about how absurd it is that there's this SKILL.md spec on agentskills.io that is not implemented by anyone and that some of the spec can't even work? allowed-tools for example
10
0
37
6.5K
Andrew Qu@andrewqu·
@zeeg I think a big gap is that there’s no de-facto reference architecture for how to implement every part of skills/plugins. Pi or some minimal agent could be the first consumer of any new skills spec feature, and every new coding agent could be rebased on top
2
0
4
286
Ewoof@EwoofCMD·
@hefnerdotpro @ndrewpignanelli Bruh, it doesn't solve it. Think of it this way: how would you use grep when your documents are in a blob store or in IndexedDB? That was the point: you can't use it as a service.
1
0
1
47
andrew pignanelli@ndrewpignanelli·
people don’t understand this take cause they don’t understand what’s happening in AI memory. Everything is moving to git-backed files accessible via grep-type systems, or semantic search plus grep, which isn’t very defensible to offer as a service.

In other words… the SOTA approaches to memory are now just agent plus terminal. And all the fancy approaches like knowledge graphs are getting rekt by an agent plus a terminal. Your fancy agent structure is getting rekt by a model that can keep track of anything over 1000+ terminal calls.

Satyam@KlausCodes
I believe the AI memory startups need to pivot now
89
76
1.7K
246.6K
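The "agent plus terminal" memory pattern described above fits in a few lines. A toy sketch, assuming memory is plain Markdown files in a git-backed directory; the function name and file layout are illustrative, not any product's API:

```python
import re
from pathlib import Path

def grep_memory(root: str, pattern: str) -> list[tuple[str, int, str]]:
    """Search plain-Markdown memory files on disk for a pattern.

    Returns (relative path, 1-based line number, matching line) tuples --
    roughly what an agent gets back from running `grep -rn` in a terminal.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for path in sorted(Path(root).rglob("*.md")):
        for lineno, line in enumerate(path.read_text().splitlines(), start=1):
            if regex.search(line):
                hits.append((str(path.relative_to(root)), lineno, line.strip()))
    return hits
```

The "not defensible as a service" argument follows from how little is here: the state is ordinary files (versionable with git), and retrieval is a regex scan any agent with shell access already has.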
Ewoof@EwoofCMD·
@ndrewpignanelli Eh. The issue is that the agent-plus-terminal setup is not scalable through the web. LLMs are already expensive; now you need to give each agent you make a computer?
2
0
2
4K
Lincoln 🇿🇦@Presidentlin·
I no longer have the LLM write tests. Cost of tokens going up. We have to be frugal.
7
0
30
1.6K
Jonathan Hefner@hefnerdotpro·
@simonw I imagine Eye of the Tiger is playing in the background.
0
0
0
94
Google Cloud Tech@GoogleCloudTech·
Our official Agent Skills repository on @github is here! Skills are a simple, open format for giving agents new capabilities and expertise. Think of a skill as compact, agent-first documentation for a specific tech or task. Learn more → goo.gle/4eCsZqu #GoogleCloudNext
[image attachment]
49
748
5.4K
447.5K
Jonathan Hefner@hefnerdotpro·
@lucas59356 @GoogleCloudTech @github That's an option. It depends on how you want to architect progressive loading for your context. If there is a single entry point for the model, then it can be a single skill with many resource files. But if there are multiple entry points, then you'd want a skill for each.
0
0
0
67
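The single-entry-point layout described above can be sketched as follows. This is a toy illustration of progressive loading, assuming a directory with one SKILL.md entry file and a resources/ folder; the file layout and helper names are assumptions, not the Agent Skills spec:

```python
from pathlib import Path

def load_skill_entry(skill_dir: str) -> str:
    """Load only the skill's entry file; this is all the model sees up front."""
    return Path(skill_dir, "SKILL.md").read_text()

def load_resource(skill_dir: str, name: str) -> str:
    """Load one resource file on demand, after the entry file points the model to it.

    Keeping resources out of the initial context is the "progressive loading"
    idea: one entry point, many lazily read resource files.
    """
    return Path(skill_dir, "resources", name).read_text()
```

With multiple entry points there is no single SKILL.md for the model to start from, which is why that case is better served by one skill per entry point.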