Sergii Guslystyi

1.2K posts

@JuiceSharp

Software Architect • AI Systems that Work • Productivity tools | Chess • Father on the journey🙏✨

Florida, USA Joined January 2010

239 Following 320 Followers

Sergii Guslystyi@JuiceSharp·8h

@badlogicgames Best possible way

Mario Zechner@badlogicgames·12h

People of pi.dev. I want to make this the new default. No setting. Not much value in read showing the first X lines, as long as we still show offset/limit if given by the model. Mo minimal, mo better. github.com/earendil-works… Speak now, or be silent forever.

Sergii Guslystyi@JuiceSharp·1d

A 7th for the table: rpiv-pi.com. The article answers why SDD, and I know exactly the answer to Pi + SDD = ? … we will see some benefits quite soon.

AlphaSignal AI@AlphaSignalAI

x.com/i/article/2057…

Sergii Guslystyi@JuiceSharp·1d

@AlphaSignalAI Sure, must admit he has been quite overproductive recently …

AlphaSignal AI@AlphaSignalAI·1d

@JuiceSharp Yess! But in some use cases, he may be paving the way to the moon as well. The point is that we have to extract the best from each one.

AlphaSignal AI@AlphaSignalAI·2d

x.com/i/article/2057…

Sergii Guslystyi@JuiceSharp·1d

@AlphaSignalAI The 6th was added by mistake… that one is as far from SDD as the Earth is from the Moon. But he is paving the road…

AlphaSignal AI@AlphaSignalAI·2d

Repos:
- Spec Kit (GitHub): github.com/github/spec-kit
- BMAD-METHOD (BMad Code): github.com/bmad-code-org/…
- OpenSpec (Fission AI): github.com/Fission-AI/Ope…
- GSD (TÂCHES): github.com/gsd-build/get-…
- Superpowers (Jesse Vincent): github.com/obra/superpowe…
- Skills (Matt Pocock): github.com/mattpocock/ski…

Papers:
1. Spec-Driven Development: From Code to Contract (Piskala, 2025) arxiv.org/abs/2602.00180
2. Intent Formalization (Lahiri, Microsoft Research, 2026) arxiv.org/abs/2603.17150
3. From Code Review to Spec-Driven Contracts (AIware 2026 vision paper, conference July 2026) openreview.net/pdf?id=WC7WAcg…
4. The Productivity-Reliability Paradox (Farrag, University of East London, 2026) arxiv.org/abs/2605.01160

Sergii Guslystyi@JuiceSharp·2d

@badlogicgames People who support Ukraine get 10X added to their karma... keep it up, even more respect for you!

Mario Zechner@badlogicgames·3d

many ppl don't know this.

Sergii Guslystyi@JuiceSharp·4d

Why likely? We all do this from time to time, so we pay some price. I would reframe the issue a bit. There are not many people capable of pushing back on AI's advice ... exactly because the expertise isn't there. Another issue: there's no magic machine that captures the context (humans talking to each other, institutional knowledge). So the only way to progress is to scaffold proactively, poking and engaging the devs in the decision-making process along the way. AI pushes the human to engage and address ambiguities. Human pushes back on AI's proposed decisions. Model alone cannot do that. Generic all-purpose harnesses ship the primitives but not the placement, and placement is domain-specific. So there's a layer on top, skills and workflows, that's load-bearing. This is what works in practice, for those who grasp it.

Karthik Hariharan@hkarthik·4d

Large code bases have a lot of undocumented quirks and features. Just last week, I came across two conflicting opinions from senior engineers on how to implement something, and a third approach proposed by Claude Opus 4.7. Finding the optimal path forward required judgement, expertise, and talking to actual humans to identify trade offs and make a decision. This is definitely a weak area for agentic coding right now. The scary part is, many engineers may not be aware of this reality and are likely shipping slop every day.

Sergii Guslystyi@JuiceSharp·4d

@ForgeBuildsIt Any good workflow solves that exhaust …

Forge@ForgeBuildsIt·4d

Specs are not the whole truth of a build. As your project grows, architecture evolves, features morph, and surface gaps emerge mid-build. Your tools completely disregard that or bury context in chat history, no binding effect on implementation. Forge workflow solves that.

Thariq@trq212

a prompt I've been using a lot recently: implement <SPEC> and while you do, keep a running implementation-notes.html file (or markdown) with decisions you had to make that weren't in the spec, things you had to change, tradeoffs you had to make or anything else I should know

Sergii Guslystyi@JuiceSharp·4d

@fjzeit On an organizational level, I would prefer a good scaffolding to make an average guy deliver… does not mean there is no room for some stars :)

fj@fjzeit·4d

@JuiceSharp we all have the same tools. the differentiator is ability.

fj@fjzeit·4d

No number of tokens will outperform individual ability... if you have the ability.

Sergii Guslystyi@JuiceSharp·4d

This Anthropic representative's post shows the limits of their current approach to prompting. That's exactly why "wrapper builders" have good odds.

The highest-leverage primitive in agentic engineering isn't a smarter model. It's the structured pause - the ability to ask before committing. A post-hoc "I picked X because Y" is exhaust (still a helpful pattern), a postmortem journal of decisions the model already made in one forward pass. By the time you read it, X is already an import, a schema, a public type. And no matter how detailed the spec, ambiguities exist. I learned that the hard way.

The reason labs can't quietly absorb this layer is structural. Every autonomous coding eval scores the model on completing without asking, so asking is a leaderboard loss. But that's the symptom. The deeper structure: a forward pass is just generation... It can't pause itself. Deterministic control flow (pauses, gates, checkpoints) lives outside it ... reliability comes from architecture, not instruction.

The harness has the opposite gradient. The user is the eval and the driver, not the benchmark. The harness can enforce pauses because pauses are control flow, not sampled tokens.

An opinionated metaflow (like mine, rpiv-pi.com, @dexhorthy's CodeLayer/HL, and the many other flows enthusiasts have built on top of the models) can't be compressed into a general-purpose API. Not because the model can't learn workflow shapes, but because workflows are control flow around the model. This is a product competition on even ground. Not a model eating a layer.
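The claim that pauses are control flow rather than sampled tokens can be illustrated with a minimal sketch. Everything named here (`Gate`, `run_with_gates`, the `ask` callback) is hypothetical and is not taken from rpiv-pi, CodeLayer, or any other harness; plain callables stand in for model calls.

```python
from dataclasses import dataclass


@dataclass
class Gate:
    """A structured pause: a question the harness must resolve before continuing."""
    question: str
    options: list


def run_with_gates(steps, ask):
    """Run steps in order. Gates block on `ask`; other steps are generation stand-ins.

    `steps` mixes Gate objects with callables that take the decisions made so far.
    `ask` is whatever collects the human's answer (a CLI prompt, a UI, a test stub).
    """
    decisions = {}
    outputs = []
    for step in steps:
        if isinstance(step, Gate):
            # The pause is enforced by the loop itself, not requested in a prompt.
            decisions[step.question] = ask(step)
            continue
        outputs.append(step(decisions))
    return outputs
```

In this sketch the generation step cannot run until the gate has an answer, so the decision is recorded before anything depends on it, which is the opposite of a post-hoc decision journal.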

Thariq@trq212

Sergii Guslystyi@JuiceSharp·5d

@badlogicgames In your previous life, you used to have weekends …

Mario Zechner@badlogicgames·5d

fun debugging session. pi is accessing xiaomi endpoints via the anthropic messages API. they seem to have changed their endpoint to now require the non-standard openai-completions `reasoning_content` field, breaking their anthropic endpoint :/

Xiaomi MiMo@XiaomiMiMo

Heads up, agent users! If you're using Xiaomi MiMo with thinking mode:

When thinking mode is enabled in a multi-turn agent session and the conversation history contains a tool call, any assistant message with tool calls passed back in subsequent user turns must preserve its full reasoning_content field; otherwise the API will return a 400 error. Without it, the model's context is incomplete, which can lead to weaker instruction-following, more hallucinations, and a visibly degraded user experience. Missing reasoning = incomplete context = degraded reasoning quality.

Affected frameworks include TRAE, Cursor, Roo Code, Codex, GitHub Copilot CLI, Zed, AutoGen. We're actively working with the maintainers to push compatibility updates.

Affected models: MiMo-V2.5-Pro, MiMo-V2.5, MiMo-V2-Pro, MiMo-V2-Omni, MiMo-V2-Flash.

See the docs (platform.xiaomimimo.com/docs/en-US/usa…) for more details.
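A client could check its history for this condition before sending a request. The sketch below assumes OpenAI-style message dictionaries with `role` and `tool_calls` keys; only the `reasoning_content` field name comes from the post above, and `find_missing_reasoning` is a hypothetical helper, not part of any Xiaomi or pi API.

```python
def find_missing_reasoning(messages):
    """Return indices of assistant tool-call messages that lack reasoning_content.

    Assumes OpenAI-style message dicts; the real wire format may differ.
    """
    missing = []
    for index, message in enumerate(messages):
        if message.get("role") != "assistant":
            continue
        # Only assistant turns that carried tool calls are covered by the rule.
        if message.get("tool_calls") and not message.get("reasoning_content"):
            missing.append(index)
    return missing
```

An empty result means every assistant tool-call turn in the history still carries its reasoning; any returned index points at a turn that was stripped somewhere along the way.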

Sergii Guslystyi@JuiceSharp·17 May

@0xblacklight Good stuff thank you

Kyle Mistele 🏴‍☠️@0xblacklight·16 May

seems like folks liked the article on forking so I turned it into a blog post humanlayer.com/blog/context-f…

Sergii Guslystyi reposted

九原客@9hills·15 May

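There's a pi feature I really like: when you send a message while the agent is running, it neither interrupts the run nor waits in a queue until the agent finishes. Instead, it is inserted before the agent's next tool call, which lets you flexibly inject instructions into a long-running agent. For example, my main agent kept insisting on writing code itself, so I sent it a rule: the main agent is forbidden from writing code or running tests itself.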
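The behavior described above can be sketched as an agent loop that drains an inbox before each tool call. This is an illustration only, not pi's implementation: `run_agent`, the `inbox` deque, and the `on_step` hook that simulates a user typing mid-run are all hypothetical names.

```python
from collections import deque


def run_agent(tool_calls, inbox, on_step=None):
    """Execute planned tool calls, injecting pending user messages before each one.

    `tool_calls` is a list of tool names standing in for real calls.
    `inbox` is a deque the user can append to while the agent is running.
    """
    transcript = []
    for index, call in enumerate(tool_calls):
        if on_step is not None:
            on_step(index, inbox)  # simulate a message arriving mid-run
        # Pending messages are neither dropped nor deferred to the end of the run.
        while inbox:
            transcript.append(("user", inbox.popleft()))
        transcript.append(("tool", call))
    return transcript
```

A message that arrives while the first tool call is in flight shows up in the transcript just before the second call, so the agent sees the new rule without being interrupted.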

Sergii Guslystyi@JuiceSharp·15 May

@mattpocockuk Length is just a cost, not a quality dimension. Judging skills by length is something like judging books by weight.

Matt Pocock@mattpocockuk·15 May

Long skills are such a red flag to me
- Hard to audit (and therefore, trust)
- Hard to edit (more text, harder to maintain)
- Expensive to run (more text, more tokens)

The shorter the skill, the better IMO

Sergii Guslystyi@JuiceSharp·15 May

Skills are prompts. Prompts are noisy. We can't tell from the inside whether a skill change actually helped or just felt different. So we run A/Bs. Wrote up the recipe: parallel arms, locked answers, blinded LLM judges, per-cell aggregation. rpiv-pi.com/blog/how-we-te…
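The blinding and per-cell aggregation steps of such a recipe might look like the sketch below. The linked write-up is not reproduced here, so this is one reading of those terms and not the author's method: a "cell" is assumed to be a task category, `judge` is a stand-in callable and not a real LLM API, and all function names are hypothetical.

```python
import random
from collections import defaultdict


def blind_pair(baseline_out, variant_out, rng):
    """Shuffle the two arms behind neutral labels and return the hidden key."""
    arms = [("baseline", baseline_out), ("variant", variant_out)]
    rng.shuffle(arms)
    shown = {"A": arms[0][1], "B": arms[1][1]}
    key = {"A": arms[0][0], "B": arms[1][0]}
    return shown, key


def run_ab(trials, judge, seed=0):
    """Count wins per cell. `trials` yields (cell, baseline_out, variant_out).

    `judge` sees only the blinded outputs and returns the label "A" or "B".
    """
    rng = random.Random(seed)
    wins = defaultdict(lambda: {"baseline": 0, "variant": 0})
    for cell, baseline_out, variant_out in trials:
        shown, key = blind_pair(baseline_out, variant_out, rng)
        verdict = judge(shown)
        wins[cell][key[verdict]] += 1  # unblind only after the verdict is in
    return {cell: dict(counts) for cell, counts in wins.items()}
```

Keeping the counts per cell, instead of pooling them, makes it visible when a skill change helps on one kind of task and hurts on another.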

Sergii Guslystyi@JuiceSharp·15 May

There are too many skills on the market that help shape/collect requirements, and very few that ask grounded questions. Even fewer do it well. This part is critical for existing codebases. Yesterday I tried to make one even better, so I ran an A/B over a SAGE-based /discover variant. Baseline /discover came out firmly ahead. The deciding factor was scope discipline: /discover keeps the FRD on the ask you actually made, instead of quietly drifting into adjacent work. /discover is one of the best interview skills I've ever seen, and it naturally fits as an optional entry point to a research/design pipeline. rpiv-pi.com/blog/discover-…

Sergii Guslystyi@JuiceSharp·14 May

@mattshumer_ lemmings don't care ...

Matt Shumer@mattshumer_·14 May

Everyone switching from Markdown to HTML is missing the nuance. The optimal approach (most of the time): If it's for a human to read, yeah, use HTML. BUT If an agent is consuming it, use Markdown!

Sergii Guslystyi@JuiceSharp·13 May

@andresv_io @noahzweben No no no - huge SDK credit simplification …

Andres Vergara@andresv_io·13 May

@noahzweben This is actually limiting usage once again. Forcing subscribers to essentially use interactive mode only is just driving people to codex where we are not limited as to how we use the subscription we are paying for.

Noah Zweben@noahzweben·13 May

We're launching a huge SDK credit simplification on June 15:
* Agent SDK/-p draws from a new bucket instead of your interactive limits (which stay exactly the same)
* Every plan comes with $20-$200 of monthly SDK credits
* You can use 3P tools like T3/OC with these credits

ClaudeDevs@ClaudeDevs

Starting June 15, paid Claude plans can claim a dedicated monthly credit for programmatic usage. The credit covers usage of:
- Claude Agent SDK
- claude -p
- Claude Code GitHub Actions
- Third-party apps built on the Agent SDK

Sergii Guslystyi@JuiceSharp·13 May

@dabit3 @DevinAI rpiv-pi.com

nader dabit@dabit3·13 May

I'm giving away five $200 Max plans for @DevinAI If you're using tools like Claude Code, Codex, or Cursor and haven't tried @DevinAI, comment below with what you're building to be eligible. ⚡️

ellis@DriscollEllis

all you nerds hyperventilating about your claude code config would be gobsmacked by the sophistication devin has had for months. cc is catching up but still relatively unserious
