Prompt Assay · AI Primitives Workbench

116 posts


@PromptAssay

Ship prompts & agent skills that hold up in production. The authoring workbench: critique on six dimensions, compare across providers. BYOK on every tier.

BYOK · github.com/promptassay · Joined April 2026
37 Following · 16 Followers
Omar Khattab @lateinteraction
End the tyranny of on-policy algorithms in LLM post-training! Maybe the key thing isn't whether your rollouts are purely "on-policy" or not, but the extent to which they’re pedagogically useful. Early explorations into newer paradigms for RL by @SOURADIPCHAKR18* @NoahZiems*:
Souradip Chakraborty@SOURADIPCHAKR18

🚨Typical RL algorithms and on-policy distillation methods are blind samplers: they use privileged info to score rollouts, but not to *find* them. We ask: can we use privileged info to *actively sample* the rollouts RL wishes it can stumble upon with compute? ⤵️ Pedagogical RL

7 replies · 16 reposts · 125 likes · 10.4K views
Prompt Assay · AI Primitives Workbench
@ClaudeDevs The pre-warm only sticks if you hit the same region and the cache hasn't expired. Anthropic's default TTL is 5 minutes, so if your traffic is sparse enough that gaps exceed that, the warm request is just paying the write multiplier for nothing. Unless I'm misunderstanding.
0 replies · 0 reposts · 5 likes · 738 views
ClaudeDevs @ClaudeDevs
Useful tip to cut time-to-first-token on longer prompts in the API: pre-warm the prompt cache. Send your system prompt before the user prompt. Claude writes it to the cache, but skips generating any output. When the real user request lands, it'll hit a warm cache.
109 replies · 229 reposts · 3.8K likes · 390.5K views
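The TTL math in the reply above can be sketched as a quick cost check. A minimal sketch, not official pricing: the 300-second TTL matches Anthropic's documented default for ephemeral prompt caching, but the 1.25× write and 0.1× read multipliers are illustrative placeholders to confirm against current pricing.

```python
def prewarm_overhead(gap_seconds, prompt_tokens, ttl_seconds=300.0,
                     write_mult=1.25, read_mult=0.10):
    """Extra input-token cost (in base-rate token units) of pre-warming
    the prompt cache, vs. just sending the real request cold.

    Assumptions (check current Anthropic pricing/docs before relying on
    these): 300 s default TTL for ephemeral caching, cache writes billed
    at write_mult x the base input rate, cache reads at read_mult x.
    """
    if gap_seconds <= ttl_seconds:
        # Cache survives the gap: the pre-warm pays a write, the real
        # request pays only a cheap read -- overhead is the read.
        return prompt_tokens * read_mult
    # Cache expired before the real request: the warm-up write bought
    # nothing, and the real request pays a full write again.
    return prompt_tokens * write_mult
```

So with sparse traffic (gaps over five minutes), pre-warming a 1,000-token prompt costs an extra 1,250 base-rate token units per request instead of saving latency.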
Prompt Assay · AI Primitives Workbench
@svpino Curious where the ceiling is for you. I've found subagent decomposition works cleanly until the tasks need shared mutable state. Then you're basically writing distributed systems concurrency logic inside a prompt.
0 replies · 0 reposts · 0 likes · 238 views
Santiago @svpino
I'm now *subagent-pilled* with Claude Code. This is the biggest update I've made in the way I work over the last 2 weeks: "Everything that can be a subagent should be a subagent." Some of the advantages: 1. Each subagent has its own context window, so running multiple of them in parallel won't pollute your session. 2. I can use different models for different subagents. Much better than running everything with the same model. 3. I can configure different tools for different subagents. This also helps keep your session context clean.
29 replies · 8 reposts · 93 likes · 15.7K views
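The shared-mutable-state ceiling in the reply above is the classic argument for message passing over shared memory. A toy sketch of the pattern that keeps parallel subagents independent: each worker returns a result message and only the coordinator touches the merged record. The names here are illustrative, not a Claude Code API.

```python
import queue
import threading

def run_subagents(tasks, worker):
    """Run subagents in parallel without shared mutable state: each
    worker returns a result message, and only the coordinator (this
    function, after join) folds results into the merged record.
    Toy pattern sketch, not a Claude Code API."""
    results = queue.Queue()

    def run(task):
        results.put(worker(task))  # message passing, no shared writes

    threads = [threading.Thread(target=run, args=(t,)) for t in tasks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    merged = []
    while not results.empty():
        merged.append(results.get())
    return merged
```

Once two subagents need to read *and* write the same record mid-task, this pattern stops being enough, which is where the distributed-systems logic leaks into the prompt.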
Prompt Assay · AI Primitives Workbench
@lateinteraction The label signal doing double duty is the interesting part. RLVR already tells you which rollouts were correct · using that to fit a proposal distribution instead of uniform-sampling the base model is just not wasting information you already paid for.
0 replies · 0 reposts · 0 likes · 13 views
Omar Khattab @lateinteraction
ICYMI: read the blog on Pedagogical RL Instead of sampling blindly from your LLM, leverage the label used for RLVR! Learn to directly approximate the distribution of your LLM's plausible rollouts that are actually correct. Then sample from *that*! noahziems.com/pedagogical-rl
Souradip Chakraborty@SOURADIPCHAKR18

🚨Typical RL algorithms and on-policy distillation methods are blind samplers: they use privileged info to score rollouts, but not to *find* them. We ask: can we use privileged info to *actively sample* the rollouts RL wishes it can stumble upon with compute? ⤵️ Pedagogical RL

7 replies · 12 reposts · 81 likes · 5.9K views
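The "don't waste the label" point can be illustrated with a toy proposal fit: reweight rollouts by the same verifier signal RLVR already computes, instead of sampling uniformly. A deliberately simplified sketch of the idea, not the paper's actual method.

```python
import random

def fit_proposal(correct, floor=0.05):
    """Categorical proposal over sampled rollouts, upweighting the ones
    the RLVR verifier already marked correct instead of treating the
    base model as a uniform/blind sampler. `floor` keeps nonzero mass
    on incorrect rollouts so exploration doesn't collapse entirely.
    Toy illustration, not the paper's method."""
    weights = [1.0 if c else floor for c in correct]
    total = sum(weights)
    return [w / total for w in weights]

def sample_rollouts(rollouts, probs, k, seed=0):
    """Draw k rollouts from the fitted proposal instead of uniformly."""
    return random.Random(seed).choices(rollouts, weights=probs, k=k)
```

The label was already paid for when the verifier scored the rollouts; the only change is using it upstream, to shape what gets sampled, rather than only downstream to score.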
Prompt Assay · AI Primitives Workbench
@langfuse One thing I'd add to any loop like this: rubric drift. Evals that ran clean six months ago keep returning green while the failure modes that have shown up since aren't in the criteria anymore. Versioning the rubric as carefully as the prompt is the unsexy half.
1 reply · 0 reposts · 0 likes · 49 views
langfuse.com @langfuse
Building high-quality AI systems is hard. At Langfuse we see the best AI teams converging on a process to get complex AI systems to production. We call it the AI Engineering Loop. Check out the first piece of our series and find out more in our academy
Annabell Schaefer@annabellschfr

x.com/i/article/2054…

1 reply · 4 reposts · 19 likes · 37K views
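One cheap way to make rubric drift visible is to content-address the rubric, so its version changes exactly when the criteria do. `rubric_version` below is a hypothetical helper to illustrate the idea, not a Langfuse API.

```python
import hashlib
import json

def rubric_version(rubric):
    """Content-addressed version tag for an eval rubric.

    Store this hash alongside the prompt version: if evals stay green
    for months while the rubric hash never changes, the criteria are
    probably stale, not the system healthy. Hypothetical helper, not
    a Langfuse API."""
    canonical = json.dumps(rubric, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
```

Any edit to the criteria, even reordering keys is normalized away first, produces a new tag, so a months-old tag on a green dashboard becomes a visible smell.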
Æ @AtomMccree
@PromptAssay Bait and switch actually but I needed it to build.
1 reply · 0 reposts · 1 like · 16 views
Ethan Mollick @emollick
I don't understand the path forward for Mythos releases. Google & OpenAI will have equivalent models, and they are approaching AI cyber risk guardrails differently, so they will presumably just release their versions. How does Anthropic get out of the government approval path?
86 replies · 21 reposts · 533 likes · 55.1K views
Prompt Assay · AI Primitives Workbench
After Mini Shai-Hulud, we rebuilt our security audit prompt to answer two questions, not one: "where could we be attacked" AND "have we already been attacked?" New `ioc-hunt` mode produces an IR-shaped report with dwell-time timeline and blast radius. Free, prompt below 👇
1 reply · 1 repost · 4 likes · 139 views
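The two-question split above implies a report shape roughly like the following. Field names are illustrative guesses at an IR-style schema, not the actual `ioc-hunt` output format.

```python
from dataclasses import dataclass, field

@dataclass
class IocHuntReport:
    """IR-shaped audit report: forward-looking exposure ('where could
    we be attacked') plus retrospective hunt results ('have we already
    been attacked'). Field names are illustrative, not the actual
    ioc-hunt output schema."""
    indicators: list = field(default_factory=list)      # IoCs found: hashes, domains, file paths
    dwell_timeline: list = field(default_factory=list)  # (timestamp, event) pairs, earliest first
    blast_radius: list = field(default_factory=list)    # systems/packages plausibly affected
    exposure: list = field(default_factory=list)        # attack-surface findings (the old question)
```

The first three fields are the retrospective half that a vulnerability-only audit prompt never produces; the last is the familiar forward-looking half.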
Prompt Assay · AI Primitives Workbench
And you can start by building comprehensive Skills inside PA. Skills bundle together the flat skill markdown file, scripts, and references. Brainstorm, critique and score, improve, and run behavioral evals across multiple provider models. If you care about skills, this is it.
Avid@Av1dlive

x.com/i/article/2053…

0 replies · 0 reposts · 2 likes · 54 views
Prompt Assay · AI Primitives Workbench
Provider lock-in is a real cost, but the harder dependency to break is prompt structure tuned to one model's quirks. Migrating to a new provider and keeping a prompt that was written around GPT-5's instruction-following assumptions usually means a rewrite anyway.
0 replies · 0 reposts · 3 likes · 351 views
Santiago @svpino
Working with a single model is a recipe for disaster. Do not marry yourself to one LLM provider. They can pull the rug out from under you and break your application overnight. Here is an alternative to access 400+ models with a single API key. This is how you stay flexible.
19 replies · 14 reposts · 114 likes · 23.9K views
Prompt Assay · AI Primitives Workbench
Discussions on this today, worth posting. "Skills" are just prompts with a schema wrapper and a tool registration step. The underlying craft problem (writing instructions a model will follow reliably under adversarial input) doesn't go away because the delivery model changes.
0 replies · 0 reposts · 3 likes · 47 views
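The "schema wrapper plus registration" claim can be made literal. A hypothetical registry shape, not any specific provider's API:

```python
def register_skill(name, description, input_schema, prompt):
    """A 'skill' reduced to its parts: an ordinary prompt wrapped in a
    schema and a registration record. Hypothetical registry shape, not
    any specific provider's API -- the reliability burden still lives
    in the prompt body."""
    return {
        "name": name,
        "description": description,
        "input_schema": input_schema,  # the schema wrapper
        "prompt": prompt,              # the actual craft problem
    }
```

Everything except `prompt` is plumbing; the delivery mechanism can change without touching the part that has to survive adversarial input.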
Prompt Assay · AI Primitives Workbench
@rohit4verse Another way to look at it - "Skills" are just prompts with a schema wrapper and a tool registration step. The underlying craft problem, writing instructions a model will follow reliably under adversarial input, doesn't go away because the delivery mechanism changed.
0 replies · 0 reposts · 2 likes · 19 views
Rohit @rohit4verse
@PromptAssay I saw a detailed video by garry tan, and he told that he no longer prompts using his skills. His agents prompt right now. Prompting is becoming obsolete with time.
2 replies · 0 reposts · 1 like · 153 views
Prompt Assay · AI Primitives Workbench
@rohit4verse For some tasks, maybe, but prompting still sits at the core of nearly every AI system. There are a zillion use cases (and growing) that require strong, structured prompts to produce quality outputs. Prompts are going nowhere anytime soon.
0 replies · 0 reposts · 1 like · 34 views
Prompt Assay · AI Primitives Workbench
@Av1dlive We don't usually self-promote, but this is a fantastic writeup and a perfect example use case for PA. The entire process of brainstorming, authoring, critiquing, improving, and testing Agent Skills is automated within the workbench.
0 replies · 0 reposts · 2 likes · 37 views