Sergii Guslystyi

1.2K posts

@JuiceSharp

Software Architect • AI Systems that Work • Productivity tools | Chess • Father on the journey🙏✨

Florida, USA · Joined January 2010
239 Following · 320 Followers
Mario Zechner
Mario Zechner@badlogicgames·
People of pi.dev. I want to make this the new default. No setting. Not much value in read showing the first X lines, as long as we still show offset/limit if given by the model. Mo minimal, mo better. github.com/earendil-works… Speak now, or be silent forever.
AlphaSignal AI
AlphaSignal AI@AlphaSignalAI·
@JuiceSharp Yes! But in some use cases, he may be paving the way to the moon as well. The point is that we have to extract the best from each one.
Sergii Guslystyi
Sergii Guslystyi@JuiceSharp·
@AlphaSignalAI 6th was added by mistake… one to SDD like from earth to the moon. But he is paving the road….
AlphaSignal AI
AlphaSignal AI@AlphaSignalAI·
Repos:
- Spec Kit (GitHub): github.com/github/spec-kit
- BMAD-METHOD (BMad Code): github.com/bmad-code-org/…
- OpenSpec (Fission AI): github.com/Fission-AI/Ope…
- GSD (TÂCHES): github.com/gsd-build/get-…
- Superpowers (Jesse Vincent): github.com/obra/superpowe…
- Skills (Matt Pocock): github.com/mattpocock/ski…
Papers:
1. Spec-Driven Development: From Code to Contract (Piskala, 2025) arxiv.org/abs/2602.00180
2. Intent Formalization (Lahiri, Microsoft Research, 2026) arxiv.org/abs/2603.17150
3. From Code Review to Spec-Driven Contracts (AIware 2026 vision paper, conference July 2026) openreview.net/pdf?id=WC7WAcg…
4. The Productivity-Reliability Paradox (Farrag, University of East London, 2026) arxiv.org/abs/2605.01160
Sergii Guslystyi
Sergii Guslystyi@JuiceSharp·
@badlogicgames People who support Ukraine get 10X to their karma... keep it up, even more respect for you!
Mario Zechner
Mario Zechner@badlogicgames·
many ppl don't know this.
Mario Zechner tweet media
Sergii Guslystyi
Sergii Guslystyi@JuiceSharp·
Why likely? We all do this from time to time, so we pay some price. I would reframe the issue a bit. There are not many people capable of pushing back on AI's advice ... exactly because the expertise isn't there. Another issue: there's no magic machine that captures the context (humans talking to each other, institutional knowledge). So the only way to progress is to scaffold proactively, poking and engaging the devs in the decision-making process along the way. AI pushes the human to engage and address ambiguities. Human pushes back on AI's proposed decisions. Model alone cannot do that. Generic all-purpose harnesses ship the primitives but not the placement, and placement is domain-specific. So there's a layer on top, skills and workflows, that's load-bearing. This is what works in practice, for those who grasp it.
Karthik Hariharan
Karthik Hariharan@hkarthik·
Large code bases have a lot of undocumented quirks and features. Just last week, I came across two conflicting opinions from senior engineers on how to implement something, and a third approach proposed by Claude Opus 4.7. Finding the optimal path forward required judgement, expertise, and talking to actual humans to identify trade offs and make a decision. This is definitely a weak area for agentic coding right now. The scary part is, many engineers may not be aware of this reality and are likely shipping slop every day.
Forge
Forge@ForgeBuildsIt·
Specs are not the whole truth of a build. As your project grows, architecture evolves, features morph, and surface gaps emerge mid-build. Your tools completely disregard that or bury context in chat history, no binding effect on implementation. Forge workflow solves that.
Thariq@trq212

a prompt I've been using a lot recently: implement <SPEC> and while you do, keep a running implementation-notes.html file (or markdown) with decisions you had to make that weren't in the spec, things you had to change, tradeoffs you had to make or anything else I should know

Sergii Guslystyi
Sergii Guslystyi@JuiceSharp·
@fjzeit On an organizational level, I would prefer a good scaffolding to make an average guy deliver… does not mean there is no room for some stars :)
fj
fj@fjzeit·
@JuiceSharp we all have the same tools. the differentiator is ability.
fj
fj@fjzeit·
No number of tokens will outperform individual ability... if you have the ability.
Sergii Guslystyi
Sergii Guslystyi@JuiceSharp·
This post from an Anthropic representative shows the limits of their current approach to prompting. That's exactly why "wrapper builders" have good odds. The highest-leverage primitive in agentic engineering isn't a smarter model. It's the structured pause - the ability to ask before committing. A post-hoc "I picked X because Y" is exhaust (still a helpful pattern), a postmortem journal of decisions the model already made in one forward pass. By the time you read it, X is already an import, a schema, a public type. And no matter how detailed the spec, ambiguities exist. I learned that the hard way. The reason labs can't quietly absorb this layer is structural. Every autonomous coding eval scores the model on completing without asking, so asking is a leaderboard loss. But that's the symptom. The deeper structure: a forward pass is just generation... It can't pause itself. Deterministic control flow (pauses, gates, checkpoints) lives outside it ... reliability comes from architecture, not instruction. The harness has the opposite gradient. The user is the eval and the driver, not the benchmark. The harness can enforce pauses because pauses are control flow, not sampled tokens. An opinionated metaflow (like mine, rpiv-pi.com, @dexhorthy's CodeLayer/HL, and the many other flows enthusiasts have built on top of the models) is incompressible into a general-purpose API. Not because the model can't learn workflow shapes, but because workflows are control flow around the model. This is a product competition on even ground. Not a model eating a layer.
Thariq@trq212

a prompt I've been using a lot recently: implement <SPEC> and while you do, keep a running implementation-notes.html file (or markdown) with decisions you had to make that weren't in the spec, things you had to change, tradeoffs you had to make or anything else I should know
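The "structured pause" argument in the post above can be illustrated with a minimal Python sketch. Everything here is hypothetical: `Question`, `Action`, `scripted_model`, and `run` are illustrative names, not part of rpiv-pi, CodeLayer, or any real harness, and the "model" is a scripted generator standing in for an LLM. The point it tries to show is only that the pause lives in the harness loop, so no side-effecting step runs until a person has answered.

```python
from dataclasses import dataclass
from typing import Callable, Generator, Optional, Union


@dataclass
class Question:
    """The model surfaces an ambiguity instead of guessing."""
    text: str


@dataclass
class Action:
    """A side-effecting step (edit a file, change a schema, ...)."""
    description: str


Step = Union[Question, Action]


def scripted_model() -> Generator[Step, Optional[str], None]:
    # Hypothetical stand-in for a model: it yields steps and receives
    # the human's answer through send().
    store = yield Question("Store sessions in Redis or Postgres?")
    yield Action(f"add {store} session store")


def run(model: Generator[Step, Optional[str], None],
        ask_human: Callable[[str], str]) -> list[str]:
    """Drive the model; block on every Question before continuing."""
    log: list[str] = []
    reply: Optional[str] = None
    try:
        while True:
            step = model.send(reply)  # the first send(None) starts the generator
            reply = None
            if isinstance(step, Question):
                # The pause is plain control flow: the next step is not
                # requested until ask_human returns.
                reply = ask_human(step.text)
                log.append(f"asked: {step.text} -> {reply}")
            else:
                log.append(f"applied: {step.description}")
    except StopIteration:
        return log


log = run(scripted_model(), ask_human=lambda question: "Postgres")
```

In this sketch the decision is recorded before the action that depends on it, which is the opposite ordering from a post-hoc notes file.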

Sergii Guslystyi reposted
九原客
九原客@9hills·
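There's a pi feature I really like: when you send a message while the agent is running, it neither interrupts the run nor waits in a queue until the agent finishes. Instead, it is inserted before the agent's next tool call, so you can flexibly inject instructions into a long-running agent. For example, my main agent kept trying to write code itself, so I sent it a rule: the main agent is forbidden from writing code or running tests itself.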
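The behaviour described in the post above (a message sent mid-run lands before the next tool call) can be sketched roughly as follows. This is not pi's actual implementation: `SteeringQueue`, `run_agent`, and the `on_tool_call` hook are hypothetical names, and the tool calls are plain strings. A real harness would likely need a thread-safe queue, since the user types while the agent works.

```python
from collections import deque
from typing import Callable, Iterable, Optional


class SteeringQueue:
    """Holds user messages that arrive while the agent is running."""

    def __init__(self) -> None:
        self._pending: deque[str] = deque()

    def send(self, message: str) -> None:
        self._pending.append(message)

    def drain(self) -> list[str]:
        drained = list(self._pending)
        self._pending.clear()
        return drained


def run_agent(tool_calls: Iterable[str],
              steering: SteeringQueue,
              on_tool_call: Optional[Callable[[str], None]] = None) -> list[tuple[str, str]]:
    """Replay a fixed list of tool calls, injecting steering messages first."""
    transcript: list[tuple[str, str]] = []
    for call in tool_calls:
        # Pending messages are inserted before the next tool call:
        # the run is not interrupted, and nothing waits until the end.
        for message in steering.drain():
            transcript.append(("user", message))
        transcript.append(("tool", call))
        if on_tool_call is not None:
            on_tool_call(call)  # hook used here to simulate the user typing mid-run
    return transcript


steering = SteeringQueue()


def user_types_during(call: str) -> None:
    if call == "read_file":
        steering.send("Main agent must not write code or run tests itself.")


transcript = run_agent(["read_file", "spawn_subagent"], steering, user_types_during)
```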
Sergii Guslystyi
Sergii Guslystyi@JuiceSharp·
@mattpocockuk Length is just a cost, not a quality dimension. Judging skills by length is something like judging books by weight.
Matt Pocock
Matt Pocock@mattpocockuk·
Long skills are such a red flag to me
- Hard to audit (and therefore, trust)
- Hard to edit (more text, harder to maintain)
- Expensive to run (more text, more tokens)
The shorter the skill, the better IMO
Sergii Guslystyi
Sergii Guslystyi@JuiceSharp·
Skills are prompts. Prompts are noisy. We can't tell from the inside whether a skill change actually helped or just felt different. So we run A/Bs. Wrote up the recipe: parallel arms, locked answers, blinded LLM judges, per-cell aggregation. rpiv-pi.com/blog/how-we-te…
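A rough Python sketch of how such a recipe might look, under loud assumptions: the linked write-up is not reproduced here, so this is one reading of "parallel arms, locked answers, blinded judges, per-cell aggregation", not the author's method. `blind_pair`, `aggregate`, `run_ab`, the item fields, and the toy judge are all hypothetical; "locked answers" is taken to mean reference answers fixed before either arm runs, and a "cell" to mean a category of test item.

```python
import random
from collections import defaultdict
from typing import Callable


def blind_pair(item_id, baseline_out: str, variant_out: str, seed: int = 0):
    """Shuffle the two arm outputs so the judge only sees labels A and B."""
    rng = random.Random(f"{seed}:{item_id}")  # deterministic per item
    arms = [("baseline", baseline_out), ("variant", variant_out)]
    rng.shuffle(arms)
    key = {"A": arms[0][0], "B": arms[1][0]}  # kept away from the judge
    return arms[0][1], arms[1][1], key


def aggregate(verdicts) -> dict:
    """Count wins per arm within each cell rather than as one global score."""
    table = defaultdict(lambda: {"baseline": 0, "variant": 0})
    for cell, arm in verdicts:
        table[cell][arm] += 1
    return {cell: dict(counts) for cell, counts in table.items()}


def run_ab(items: list[dict], judge: Callable[[str, str, str], str], seed: int = 0) -> dict:
    verdicts = []
    for item in items:
        # Both arms were run on the same input; the reference is fixed beforehand.
        shown_a, shown_b, key = blind_pair(item["id"], item["baseline"], item["variant"], seed)
        label = judge(item["reference"], shown_a, shown_b)  # must return "A" or "B"
        verdicts.append((item["cell"], key[label]))
    return aggregate(verdicts)


# Toy judge standing in for an LLM: prefers the output equal to the reference.
def toy_judge(reference: str, shown_a: str, shown_b: str) -> str:
    return "A" if shown_a == reference else "B"


items = [
    {"id": 1, "cell": "scope", "reference": "x", "baseline": "x", "variant": "y"},
    {"id": 2, "cell": "scope", "reference": "q", "baseline": "q", "variant": "z"},
    {"id": 3, "cell": "grounding", "reference": "k", "baseline": "m", "variant": "k"},
]
result = run_ab(items, toy_judge)
```

Because the label-to-arm key is resolved only after the judge answers, the judge cannot favour an arm by position or name.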
Sergii Guslystyi
Sergii Guslystyi@JuiceSharp·
There are too many skills on the market that help shape/collect requirements, and very few that ask grounded questions. Even fewer do it well. This part is critical for existing codebases. Yesterday I tried to make one even better, so I ran an A/B over a SAGE-based /discover variant. Baseline /discover came out firmly ahead. The deciding factor was scope discipline: /discover keeps the FRD on the ask you actually made, instead of quietly drifting into adjacent work. /discover is one of the best interview skills I've ever seen, and it naturally fits as an optional entry point to a research/design pipeline. rpiv-pi.com/blog/discover-…
Matt Shumer
Matt Shumer@mattshumer_·
Everyone switching from Markdown to HTML is missing the nuance. The optimal approach (most of the time): If it's for a human to read, yeah, use HTML. BUT If an agent is consuming it, use Markdown!
Andres Vergara
Andres Vergara@andresv_io·
@noahzweben This is actually limiting usage once again. Forcing subscribers to essentially use interactive mode only is just driving people to codex where we are not limited as to how we use the subscription we are paying for.
Noah Zweben
Noah Zweben@noahzweben·
We're launching a huge SDK credit simplification on June 15:
* Agent SDK/-p draws from a new bucket instead of your interactive limits (which stay exactly the same)
* Every plan comes with $20-$200 of monthly SDK credits
* You can use 3P tools like T3/OC with these credits
ClaudeDevs@ClaudeDevs

Starting June 15, paid Claude plans can claim a dedicated monthly credit for programmatic usage. The credit covers usage of:
- Claude Agent SDK
- claude -p
- Claude Code GitHub Actions
- Third-party apps built on the Agent SDK
