Prompt Assay · AI Primitives Workbench

116 posts


@PromptAssay

Ship prompts & agent skills that hold up in production. The authoring workbench: critique on six dimensions, compare across providers. BYOK on every tier.

BYOK · github.com/promptassay · Joined April 2026
37 Following · 16 Followers
Omar Khattab @lateinteraction
End the tyranny of on-policy algorithms in LLM post-training! Maybe the key thing isn't whether your rollouts are purely "on-policy" or not, but the extent to which they’re pedagogically useful. Early explorations into newer paradigms for RL by @SOURADIPCHAKR18* @NoahZiems*:
Souradip Chakraborty@SOURADIPCHAKR18

🚨Typical RL algorithms and on-policy distillation methods are blind samplers: they use privileged info to score rollouts, but not to *find* them. We ask: can we use privileged info to *actively sample* the rollouts RL wishes it can stumble upon with compute? ⤵️ Pedagogical RL

7 replies · 16 reposts · 125 likes · 10.4K views
Prompt Assay · AI Primitives Workbench
@ClaudeDevs The pre-warm only sticks if you hit the same region and the cache hasn't expired. Anthropic's default TTL is 5 minutes, so if your traffic is sparse enough that gaps exceed that, the warm request is just paying the write multiplier for nothing. Unless I'm misunderstanding.
0 replies · 0 reposts · 5 likes · 738 views
ClaudeDevs @ClaudeDevs
Useful tip to cut time-to-first-token on longer prompts in the API: pre-warm the prompt cache. Send your system prompt before the user prompt. Claude writes it to the cache, but skips generating any output. When the real user request lands, it'll hit a warm cache.
109 replies · 229 reposts · 3.8K likes · 390.5K views
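The TTL math in the reply above can be sketched as a quick cost check. A minimal sketch, not official pricing: the 300-second TTL matches Anthropic's documented default for ephemeral prompt caching, but the 1.25× write and 0.1× read multipliers are illustrative placeholders to confirm against current pricing.

```python
def prewarm_overhead(gap_seconds, prompt_tokens, ttl_seconds=300.0,
                     write_mult=1.25, read_mult=0.10):
    """Extra input-token cost (in base-rate token units) of pre-warming
    the prompt cache, vs. just sending the real request cold.

    Assumptions (check current Anthropic pricing/docs before relying on
    these): 300 s default TTL for ephemeral caching, cache writes billed
    at write_mult x the base input rate, cache reads at read_mult x.
    """
    if gap_seconds <= ttl_seconds:
        # Cache survives the gap: the pre-warm pays a write, the real
        # request pays only a cheap read -- overhead is the read.
        return prompt_tokens * read_mult
    # Cache expired before the real request: the warm-up write bought
    # nothing, and the real request pays a full write again.
    return prompt_tokens * write_mult
```

So with sparse traffic (gaps over five minutes), pre-warming a 1,000-token prompt costs an extra 1,250 base-rate token units per request instead of saving latency.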
Prompt Assay · AI Primitives Workbench
@svpino Curious where the ceiling is for you. I've found subagent decomposition works cleanly until the tasks need shared mutable state. Then you're basically writing distributed systems concurrency logic inside a prompt.
0 replies · 0 reposts · 0 likes · 238 views
Santiago @svpino
I'm now *subagent-pilled* with Claude Code. This is the biggest update I've made in the way I work over the last 2 weeks: "Everything that can be a subagent should be a subagent." Some of the advantages: 1. Each subagent has its own context window, so running multiple of them in parallel won't pollute your session. 2. I can use different models for different subagents. Much better than running everything with the same model. 3. I can configure different tools for different subagents. This also helps keep your session context clean.
29 replies · 8 reposts · 93 likes · 15.7K views
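The shared-mutable-state ceiling in the reply above is the classic argument for message passing over shared memory. A toy sketch of the pattern that keeps parallel subagents independent: each worker returns a result message and only the coordinator touches the merged record. The names here are illustrative, not a Claude Code API.

```python
import queue
import threading

def run_subagents(tasks, worker):
    """Run subagents in parallel without shared mutable state: each
    worker returns a result message, and only the coordinator (this
    function, after join) folds results into the merged record.
    Toy pattern sketch, not a Claude Code API."""
    results = queue.Queue()

    def run(task):
        results.put(worker(task))  # message passing, no shared writes

    threads = [threading.Thread(target=run, args=(t,)) for t in tasks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    merged = []
    while not results.empty():
        merged.append(results.get())
    return merged
```

Once two subagents need to read *and* write the same record mid-task, this pattern stops being enough, which is where the distributed-systems logic leaks into the prompt.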
Prompt Assay · AI Primitives Workbench
@lateinteraction The label signal doing double duty is the interesting part. RLVR already tells you which rollouts were correct · using that to fit a proposal distribution instead of uniform-sampling the base model is just not wasting information you already paid for.
0 replies · 0 reposts · 0 likes · 13 views
Omar Khattab @lateinteraction
ICYMI: read the blog on Pedagogical RL Instead of sampling blindly from your LLM, leverage the label used for RLVR! Learn to directly approximate the distribution of your LLM's plausible rollouts that are actually correct. Then sample from *that*! noahziems.com/pedagogical-rl
Souradip Chakraborty@SOURADIPCHAKR18

🚨Typical RL algorithms and on-policy distillation methods are blind samplers: they use privileged info to score rollouts, but not to *find* them. We ask: can we use privileged info to *actively sample* the rollouts RL wishes it can stumble upon with compute? ⤵️ Pedagogical RL

7 replies · 12 reposts · 81 likes · 5.9K views
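The "don't waste the label" point can be illustrated with a toy proposal fit: reweight rollouts by the same verifier signal RLVR already computes, instead of sampling uniformly. A deliberately simplified sketch of the idea, not the paper's actual method.

```python
import random

def fit_proposal(correct, floor=0.05):
    """Categorical proposal over sampled rollouts, upweighting the ones
    the RLVR verifier already marked correct instead of treating the
    base model as a uniform/blind sampler. `floor` keeps nonzero mass
    on incorrect rollouts so exploration doesn't collapse entirely.
    Toy illustration, not the paper's method."""
    weights = [1.0 if c else floor for c in correct]
    total = sum(weights)
    return [w / total for w in weights]

def sample_rollouts(rollouts, probs, k, seed=0):
    """Draw k rollouts from the fitted proposal instead of uniformly."""
    return random.Random(seed).choices(rollouts, weights=probs, k=k)
```

The label was already paid for when the verifier scored the rollouts; the only change is using it upstream, to shape what gets sampled, rather than only downstream to score.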
Prompt Assay · AI Primitives Workbench
@langfuse One thing I'd add to any loop like this: rubric drift. Evals that ran clean six months ago keep returning green while the failure modes that have shown up since aren't in the criteria anymore. Versioning the rubric as carefully as the prompt is the unsexy half.
1 reply · 0 reposts · 0 likes · 49 views
langfuse.com @langfuse
Building high-quality AI systems is hard. At Langfuse we see the best AI teams converging on a process to get complex AI systems to production. We call it the AI Engineering Loop. Check out the first piece of our series and find out more in our academy
Annabell Schaefer@annabellschfr

x.com/i/article/2054…

1 reply · 4 reposts · 19 likes · 37K views
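One cheap way to make rubric drift visible is to content-address the rubric, so its version changes exactly when the criteria do. `rubric_version` below is a hypothetical helper to illustrate the idea, not a Langfuse API.

```python
import hashlib
import json

def rubric_version(rubric):
    """Content-addressed version tag for an eval rubric.

    Store this hash alongside the prompt version: if evals stay green
    for months while the rubric hash never changes, the criteria are
    probably stale, not the system healthy. Hypothetical helper, not
    a Langfuse API."""
    canonical = json.dumps(rubric, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
```

Any edit to the criteria, even reordering keys is normalized away first, produces a new tag, so a months-old tag on a green dashboard becomes a visible smell.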
Æ @AtomMccree
@PromptAssay Bait and switch actually but I needed it to build.
1 reply · 0 reposts · 1 like · 16 views
Ethan Mollick @emollick
I don't understand the path forward for Mythos releases. Google & OpenAI will have equivalent models, and they are approaching AI cyber risk guardrails differently, so they will presumably just release their versions. How does Anthropic get out of the government approval path?
86 replies · 21 reposts · 533 likes · 55.1K views
Prompt Assay · AI Primitives Workbench
After Mini Shai-Hulud, we rebuilt our security audit prompt to answer two questions, not one: "where could we be attacked" AND "have we already been attacked?" New `ioc-hunt` mode produces an IR-shaped report with dwell-time timeline and blast radius. Free, prompt below 👇
1 reply · 1 repost · 4 likes · 139 views
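The two-question split above implies a report shape roughly like the following. Field names are illustrative guesses at an IR-style schema, not the actual `ioc-hunt` output format.

```python
from dataclasses import dataclass, field

@dataclass
class IocHuntReport:
    """IR-shaped audit report: forward-looking exposure ('where could
    we be attacked') plus retrospective hunt results ('have we already
    been attacked'). Field names are illustrative, not the actual
    ioc-hunt output schema."""
    indicators: list = field(default_factory=list)      # IoCs found: hashes, domains, file paths
    dwell_timeline: list = field(default_factory=list)  # (timestamp, event) pairs, earliest first
    blast_radius: list = field(default_factory=list)    # systems/packages plausibly affected
    exposure: list = field(default_factory=list)        # attack-surface findings (the old question)
```

The first three fields are the retrospective half that a vulnerability-only audit prompt never produces; the last is the familiar forward-looking half.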
Prompt Assay · AI Primitives Workbench
And you can start by building comprehensive Skills inside PA. Skills bundle together the flat skill markdown file, scripts, and references. Brainstorm, critique and score, improve, and run behavioral evals across multiple provider models. If you care about skills, this is it.
Avid@Av1dlive

x.com/i/article/2053…

0 replies · 0 reposts · 2 likes · 54 views
Prompt Assay · AI Primitives Workbench
Provider lock-in is a real cost, but the harder dependency to break is prompt structure tuned to one model's quirks. Migrating to a new provider and keeping a prompt that was written around GPT-5's instruction-following assumptions usually means a rewrite anyway.
0 replies · 0 reposts · 3 likes · 351 views
Santiago @svpino
Working with a single model is a recipe for disaster. Do not marry yourself to one LLM provider. They can pull the rug out from under you and break your application overnight. Here is an alternative to access 400+ models with a single API key. This is how you stay flexible.
19 replies · 14 reposts · 114 likes · 23.9K views
Prompt Assay · AI Primitives Workbench
Discussions on this today, worth posting. "Skills" are just prompts with a schema wrapper and a tool registration step. The underlying craft problem (writing instructions a model will follow reliably under adversarial input) doesn't go away because the delivery model changes.
0 replies · 0 reposts · 3 likes · 47 views
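The "schema wrapper plus registration" claim can be made literal. A hypothetical registry shape, not any specific provider's API:

```python
def register_skill(name, description, input_schema, prompt):
    """A 'skill' reduced to its parts: an ordinary prompt wrapped in a
    schema and a registration record. Hypothetical registry shape, not
    any specific provider's API -- the reliability burden still lives
    in the prompt body."""
    return {
        "name": name,
        "description": description,
        "input_schema": input_schema,  # the schema wrapper
        "prompt": prompt,              # the actual craft problem
    }
```

Everything except `prompt` is plumbing; the delivery mechanism can change without touching the part that has to survive adversarial input.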
Prompt Assay · AI Primitives Workbench
@rohit4verse Another way to look at it - "Skills" are just prompts with a schema wrapper and a tool registration step. The underlying craft problem, writing instructions a model will follow reliably under adversarial input, doesn't go away because the delivery mechanism changed.
0 replies · 0 reposts · 2 likes · 19 views
Rohit @rohit4verse
@PromptAssay I saw a detailed video by garry tan, and he told that he no longer prompts using his skills. His agents prompt right now. Prompting is becoming obsolete with time.
2 replies · 0 reposts · 1 like · 153 views
Prompt Assay · AI Primitives Workbench
@rohit4verse For some tasks, maybe, but prompting still sits at the core of nearly every AI system. There are a zillion use cases (and growing) that require strong, structured prompts to produce quality outputs. Prompts are going nowhere anytime soon.
0 replies · 0 reposts · 1 like · 34 views
Prompt Assay · AI Primitives Workbench
@Av1dlive We don't usually self-promote, but this is a fantastic writeup and a perfect example use case for PA. The entire process of brainstorming, authoring, critiquing, improving, and testing Agent Skills is automated within the workbench.
0 replies · 0 reposts · 2 likes · 37 views