UMUT ÇETİNKAYA

244 posts

@byumut

I build AI tools, agents & automation. Sharing what I learn shipping real AI products — in public. Follow for AI tips that actually work.

Joined August 2008

1.3K Following · 312 Followers

Pinned Tweet

UMUT ÇETİNKAYA@byumut·3d

Every model upgrade silently breaks the prompts you tuned for the old one. The fix isn't "prompt harder" — it's a 1-hour eval harness you run before every swap. Here's the exact setup I use so a model upgrade is a 20-min checked move, not a week of whack-a-mole 🧵
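
A minimal sketch of what such a pre-swap eval harness could look like. This is not the author's setup: the case names, the checks, and the two stand-in model functions are all hypothetical, and a real harness would call a provider API where the stand-ins are.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # True when the output is acceptable


def run_evals(model: Callable[[str], str], cases: list[EvalCase]) -> dict[str, bool]:
    """Run every case against one model and record pass/fail per case."""
    return {case.name: case.check(model(case.prompt)) for case in cases}


def regressions(old: dict[str, bool], new: dict[str, bool]) -> list[str]:
    """Cases that passed on the old model but fail on the candidate."""
    return sorted(name for name, ok in old.items() if ok and not new.get(name, False))


# Hypothetical stand-ins for the current and the candidate model.
def old_model(prompt: str) -> str:
    return '{"status": "ok"}' if "JSON" in prompt else "Paris"


def new_model(prompt: str) -> str:
    return 'Sure! Here is the JSON: {"status": "ok"}' if "JSON" in prompt else "Paris"


CASES = [
    EvalCase("json_only", "Reply with JSON only.", lambda out: out.strip().startswith("{")),
    EvalCase("capital", "Capital of France?", lambda out: "Paris" in out),
]

baseline = run_evals(old_model, CASES)
candidate = run_evals(new_model, CASES)
broken = regressions(baseline, candidate)
print(broken)  # names of the cases the swap would break
```

Comparing per-case results rather than a single aggregate score keeps the failure local: the swap is blocked by named cases, which is what turns it into a checked move.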

UMUT ÇETİNKAYA@byumut·34m

300% code volume + 30% release rate = 10x work-in-progress buildup. In lean manufacturing this is inventory waste. Product that's been built but not shipped isn't value — it's debt accumulating interest (merge conflicts, stale context, review backlog). The bottleneck didn't move: it was always review, integration, deployment capacity. AI just made the upstream output visible faster.

Rohan Paul@rohanpaul_ai·13h

New MIT study. Code volume surges by 300%, but output increases by only 30%: the AI dividend meets an awkward reality. Autonomous AI coding agents raised commits by 180%, but releases rose only 30%. The paper’s main idea is that software production has weak links, so faster code writing does not help as much when humans still need to review, connect, test, package, and ship the work. The authors also check app marketplaces and find more new apps but no increase in total usage, which means more software appeared without clear evidence that users adopted more of it. The authors compare more than 100,000 GitHub developers before and after they start using 3 generations of AI coding tools, from autocomplete to more independent coding agents. Autocomplete raised commits by 40%, interactive coding agents raised them by 140%, and autonomous coding agents raised them by 180%. The 180% commit gain shrank to 50% for the number of projects and 30% for actual releases. The estimated "elasticity of substitution" is 0.25, i.e. for every big improvement in AI’s usefulness, only a small amount of human work can be replaced. AI can write code faster, but humans are still needed to decide what to build, check if the code works, connect it with the rest of the product, fix messy edge cases, and actually ship it. --- papers .ssrn.com/sol3/papers.cfm?abstract_id=6859839

Rohan Paul@rohanpaul_ai

FT published a piece. AI is raising software supply faster than demand. AI is producing far more work inside companies, but the new evidence says much of that extra motion is getting lost before it becomes shipped product or customer demand. Last week's MIT study tracked software teams across the full production funnel, from files edited to reviewed work to software releases, rather than treating code volume as value. AI helped developers create or edit nearly 300% more files, but the gain fell to 150% at review and only about 30% at release. The gap means AI is strongest at speeding local tasks, while human review, coordination, product judgment, testing, and launch processes still decide how much value survives. --- ft .com/content/8e9ae7a4-7209-4e2c-aa36-f3af77d6ce1f?syn-25a6b1a6=1

UMUT ÇETİNKAYA@byumut·37m

Hello world succeeds because it has zero surface area: no state to corrupt, no dependency to fail, no concurrency to reason about, no external effects to roll back. Sustaining past it = accumulating all four simultaneously, invisibly. The systems that do it well don't solve each problem once. They install a boundary that keeps the surface area visible as it grows.

Sarbjeet Johal@sarbjeetjohal·9h

It takes a lot to sustain what lies past 'hello world!' PS: now saying this is purely systems context, @RobTiffany 😊 cc: @byumut @dvellante @furrier @werner @rseroter @pnashawaty

UMUT ÇETİNKAYA@byumut·1h

The hard part comes after day 7: keeping the galaxy alive. "Living" knowledge graph only stays living if someone owns each node's truth. SOPs drift, agent responsibilities change, tools get replaced — the graph doesn't know. An agent navigating stale nodes executes confidently wrong. The maintenance contract (who updates what, when, and how you detect drift) is the real system you're building.

Asteri@Asteri_eth·3h

One person rebuilt an entire company's brain in 7 days inside Claude Code. Not a doc. Not Obsidian. A living galaxy of nodes. Every employee. Every AI agent. Every SOP. Every tool. All wired together on one screen. Click a department. The human agent team opens up. The SOPs attached to it open up. What each person is allowed to touch opens up. That last part is the whole game. Permissions baked into the brain. An employee opens the chat, the AI already knows what they can access. Agents, data, SOPs surface inside the conversation like you tagged them by hand. Obsidian can't do this. Notion can't do this. No dev team. No funding round. No 6-month roadmap. 7 days, 1 person, 1 terminal. This is the part nobody has priced in. The tools to build $200K enterprise software now sit on your laptop for free. The only thing missing is the guy who opens the terminal.

UMUT ÇETİNKAYA@byumut·1h

The underrated part of the ADR: the invalidation signal. Software ADRs can say "revisit when library X is deprecated" — deterministic. AI ADRs need probabilistic triggers: "revisit when input distribution shifts past Y%, when satisfaction score drops below Z." Without those pre-set, the ADR becomes archaeology. You read it during the incident, not before.
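
One way to keep those triggers from becoming archaeology is to store them as executable checks next to the decision. A rough sketch, assuming metrics arrive as a plain dict; the field names, the 15% drift threshold, and the 4.0 satisfaction floor are hypothetical stand-ins for the "Y%" and "Z" above.

```python
from dataclasses import dataclass, field
from typing import Callable

Metrics = dict[str, float]


@dataclass(frozen=True)
class InvalidationSignal:
    description: str
    is_triggered: Callable[[Metrics], bool]


@dataclass
class DecisionRecord:
    title: str
    decision: str
    tuned_on: str
    signals: list[InvalidationSignal] = field(default_factory=list)

    def triggered(self, metrics: Metrics) -> list[str]:
        """Descriptions of every invalidation signal the current metrics trip."""
        return [s.description for s in self.signals if s.is_triggered(metrics)]


# Hypothetical record; the thresholds are illustrative, not recommendations.
record = DecisionRecord(
    title="Routing confidence threshold",
    decision="Escalate to a human below 0.8 confidence",
    tuned_on="hypothetical sample of support tickets",
    signals=[
        InvalidationSignal("input drift above 15%", lambda m: m["input_drift"] > 0.15),
        InvalidationSignal("satisfaction below 4.0", lambda m: m["satisfaction"] < 4.0),
    ],
)

due_for_review = record.triggered({"input_drift": 0.22, "satisfaction": 4.3})
print(due_for_review)
```

Run on a schedule against live metrics, a non-empty result is the prompt to reopen the decision before an incident forces it.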

Samuel McDonnell@samueljmcd·2h

@byumut Very true

Samuel McDonnell@samueljmcd·14h

Everyone’s shipping AI-built apps in a weekend. But nobody’s costing who maintains them in 18 months when no one understands what was built. That bill lands around 2027.

UMUT ÇETİNKAYA@byumut·2h

The "official" label matters more than just compatibility. Reference implementations reveal what good tool design looks like: typed errors, edge case documentation, explicit failure contracts. Dropping them in is 5 minutes. Understanding how each skill behaves when the underlying API rate-limits or returns stale data — that's still your harness's job. Official doesn't mean failure-proof.

CyrilXBT@cyrilXBT·6h

GOOGLE JUST RELEASED 13 OFFICIAL SKILLS FOR AI AGENTS — and they work with Claude Code, Cursor, and Copilot right now. Not third-party plugins someone built in a weekend. Official Google skills. Free. Open source. Compatible with every major agent on the market. Drop them in and your agent can execute advanced tasks and automate complex workflows it couldn’t handle before. No endless configuration. No setup headaches. No subscription. Just 13 skills that expand what your agent can do from the first minute you install them. Google just made serious agent capability accessible to everyone. Bookmark this before you scroll past it. Follow @cyrilXBT for every open source release that changes what agents can actually do.

UMUT ÇETİNKAYA@byumut·2h

Exactly right — and this is why the verifier's input matters as much as its logic. The verifier should receive: the original spec + the output. Not the worker's trace, not its self-reported reasoning. The moment it reads "here's what I did and why it was correct," it's grading a narrative, not the result. Ground truth = spec. Full stop.
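
The interface can enforce that separation. A small sketch, assuming the spec can be expressed as required keys plus a size limit; `Spec`, `WorkerResult`, and `verify` are hypothetical names. The point is only that `verify` has no parameter through which the trace could reach it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    required_keys: frozenset
    max_items: int


@dataclass(frozen=True)
class WorkerResult:
    output: dict
    trace: str  # the worker's own account; never handed to the verifier


def verify(spec: Spec, output: dict) -> list[str]:
    """Return the spec violations found in the output; an empty list means it passes."""
    problems = []
    missing = spec.required_keys - set(output)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    items = output.get("items", [])
    if len(items) > spec.max_items:
        problems.append(f"too many items: {len(items)} > {spec.max_items}")
    return problems


spec = Spec(required_keys=frozenset({"items", "total"}), max_items=3)
result = WorkerResult(
    output={"items": [1, 2, 3, 4]},
    trace="Returned 3 items and a total, exactly as requested.",
)

# Only the spec and the output cross the boundary; the confident trace is ignored.
violations = verify(spec, result.output)
print(violations)
```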

8bitconcepts@8BitConcepts·7h

@byumut Separate verifier helps, but it inherits the same blind spot if it reads the same context the worker wrote. The escape hatch just moves: now the worker writes its trace to please the grader. Verifier needs its own ground truth, not the worker's account of done.

UMUT ÇETİNKAYA@byumut·5d

Everyone's racing to let AI agents run "for days, unattended." I run agents daily across real ops. The hard part was never making them do more. It was teaching them when to STOP. An agent with no exit condition doesn't get more done. It burns tokens and quietly drifts off-task. Constraints are the product.
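
A sketch of what explicit exit conditions might look like in an agent loop. The `Budget` fields and the `step` callback signature are assumptions for illustration; a real loop would likely add more conditions, such as wall-clock time or an off-task check.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Budget:
    max_steps: int
    max_tokens: int


def run_agent(step: Callable[[int], tuple[bool, int]], budget: Budget) -> str:
    """Call step(i) -> (done, tokens_used) until an exit condition fires.

    Returns the reason the loop stopped, so every run ends with a named cause.
    """
    tokens = 0
    for i in range(budget.max_steps):
        done, used = step(i)
        tokens += used
        if done:
            return "goal_reached"
        if tokens >= budget.max_tokens:
            return "token_budget_exhausted"
    return "step_limit_reached"


# Hypothetical steps: one that never finishes, one that finishes on its third call.
def never_done(i: int) -> tuple[bool, int]:
    return (False, 100)


def done_on_third(i: int) -> tuple[bool, int]:
    return (i == 2, 100)


print(run_agent(never_done, Budget(max_steps=10, max_tokens=350)))
print(run_agent(done_on_third, Budget(max_steps=10, max_tokens=10_000)))
```

Returning the stop reason rather than just stopping makes drift visible: a run that ends on a budget instead of the goal is a signal to inspect, not a success.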

UMUT ÇETİNKAYA@byumut·2h

The fix that's helped most: treating model/threshold decisions like ADRs — not code comments, but a decision record: what tradeoffs were weighed, what data this was tuned on, what signal would invalidate it. Code is a first-class artifact. Intent is always annotation. The 2027 debugger needs both.

Samuel McDonnell@samueljmcd·4h

@byumut Agreed

UMUT ÇETİNKAYA@byumut·4h

@AB_DataPioneer @sarbjeetjohal Scar tissue encodes the decision sequence, not just the outcome. "ESB bad" loses the part where it started as just routing and became a logic dump under deadline pressure. The next iteration starts with "coordination layer" instead. Same gravity, shorter path.

Andreas B@AB_DataPioneer·5h

@byumut @sarbjeetjohal Scar tissue is the tell. Some of us lived the ESB bottleneck instead of reading about it. Centralization is the same mistake repackaged.

UMUT ÇETİNKAYA@byumut·15h

Got accused of being an LLM today by @sarbjeetjohal — for arguing against centralizing AI agents 😄 Here's what the joke gets right though: fluency isn't the human tell. Scar tissue is. An LLM read about ESB. Some of us lived the bottleneck. Every "central orchestration layer" for agents is ESB in a new hoodie. One place for all the handoffs → one team everyone waits on → bottleneck by design. The real tell you're talking to a human? They've already paid for this lesson once.

UMUT ÇETİNKAYA@byumut·4h

@8BitConcepts Oracle isolation: safety monitors can't run on the state machine they check — same problem. Verifier reading the worker's trace validates the story, not the outcome. Assert against SPEC (what must be true) not TRACE (what the worker says). Spec is ground truth. Trace is testimony

UMUT ÇETİNKAYA@byumut·11h

Keeping one AI persona's face consistent across thousands of stills + video. What concretely worked:
• Lock identity to a reference, not the prompt (identity transfer). Prompts drift, references don't.
• Don't render faces with FLUX. SDXL/RealVisXL reads more human. Split the pipeline: consistency layer ≠ aesthetics layer.
• Fixed seed + disciplined negative prompts = sibling frames, not random twins.
• Video: pure i2v flickers the face → killed it. Winner: driving-video → motion model → lipsync, native ambient on top.
• Persona = a continuous story (story state), not isolated posts.
All from the trenches 👇 umutcetinkaya.com/en/stories/ai-…

UMUT ÇETİNKAYA@byumut·12h

@JustJerry121 @sarbjeetjohal The label sticks — which means the work is building the thin version intentionally, before deadline pressure makes that decision for you. ESB accumulated logic because it was the convenient option under pressure. A harness stays thin by design constraint, not good intentions.

JustJerry@JustJerry121·14h

@byumut @sarbjeetjohal “ESB in a new hoodie” is going to stick.

UMUT ÇETİNKAYA@byumut·13h

@Jesaja That re-ignition is real. AI brings back the tight feedback loop: build → watch it work → understand why. The permanent job means you can explore without pressure to monetize every experiment. Probably the best position to build from.

Jesaja@Jesaja·14h

@byumut I have a good permanent job as an iOS developer, and AI has rekindled my spark of passion for development. I look forward to sharing my experiences with others.

Jesaja@Jesaja·1d

Everyone's still ranking coding agents by model benchmark. In production the benchmark barely matters. What matters is whether the thing can reach my actual shell, my git history, my cron jobs — or whether it's trapped in a sandbox that can't do half the job. It's not a model war. It's an OS-integration war. The model that's 5% smarter loses to the one that can actually touch the system. Where have you hit that wall?

UMUT ÇETİNKAYA@byumut·13h

The multiple follows the loop, not the headcount cut. Most agencies bolt AI onto delivery and mistake speed for leverage. The durable bet: building loops where the judgment layer (what AI decides vs. escalates) is explicit and testable. That boundary — not the AI — is what actually compounds.

ericosiu@ericosiu·15h

AI service firms are commanding 30x multiples right now. Yes, thirty. That's why a16z, Sequoia, and YC are chasing services, not SaaS. Most agencies will see this and reach for the wrong move. They'll keep selling hours, bolt on AI, and cut headcount to pad the margin. But that's playing the small game. Here's why:
00:00 Why Services Beat SaaS
01:13 The $1 Software vs $6 Services Opportunity
02:52 Why Managed Growth Loops Matter
04:49 Agents, Loops, and Human Judgment
06:43 How Single Brain Powers AI Service Businesses
07:22 The Services-as-Software Manifesto
08:41 The New AI-Native Org Chart
10:13 Building Outcome-Based Offers
11:13 Final Thoughts

ericosiu@ericosiu

x.com/i/article/2052…

UMUT ÇETİNKAYA@byumut·13h

@Mahaximus_ The reps compress but naming what you learn from them doesn't. Most teams accumulate failure reps without a taxonomy to file them under — experience without pattern. "The agent broke" doesn't transfer; "idempotency violation in a retry loop" does.
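
To make that named failure concrete: an idempotency violation in a retry loop is a retry that repeats a side effect the first attempt already performed. A sketch of the usual guard, an idempotency key checked by the receiver; `PaymentLedger` and `charge_with_retry` are hypothetical names, and the loop simulates a caller that never sees the response and keeps retrying.

```python
class PaymentLedger:
    """Toy receiver that applies each idempotency key at most once."""

    def __init__(self) -> None:
        self._seen: dict[str, int] = {}  # idempotency key -> charge id
        self.charges: list[int] = []     # amounts actually charged

    def charge(self, idempotency_key: str, amount: int) -> int:
        # A repeated key returns the original charge id without charging again.
        if idempotency_key in self._seen:
            return self._seen[idempotency_key]
        self.charges.append(amount)
        charge_id = len(self.charges)
        self._seen[idempotency_key] = charge_id
        return charge_id


def charge_with_retry(ledger: PaymentLedger, key: str, amount: int, attempts: int = 3) -> int:
    """Simulate lost responses: the caller retries with the same key every time."""
    charge_id = 0
    for _ in range(attempts):
        charge_id = ledger.charge(key, amount)
    return charge_id


ledger = PaymentLedger()
charge_id = charge_with_retry(ledger, key="order-42", amount=500)
print(ledger.charges)  # one entry despite three attempts
```

Without the key check, the same three attempts would append three charges, which is the bug the taxonomy entry names.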

Mahax@Mahaximus_·14h

@byumut yeah that diagnostic instinct is the hard part to teach. you can document failure modes but recognizing the pattern in the wild, before the logs confirm it, that's just reps, and it takes time you can't shortcut

Mahax@Mahaximus_·20h

Google's former CEO: "if you really want to make money, learn agents" he goes further than that "this is the agentic period in AI, in 2 years everyone gonna build agents" the supply of people who can actually build is tiny, the demand is enormous that gap is open right now, and this is coming from the man who built Google that's why I put together everything you need to get on the right side of it article below

Mahax@Mahaximus_

x.com/i/article/2063…

UMUT ÇETİNKAYA@byumut·13h

Agreed: the economic attractor isn't the tool. ESB centralized routing — wrong boundary. Harness argument: governance (trust, blast-radius) has higher coordination costs than routing because effects are real and failures silent. Breakeven moves. Does ether change those economics, or just distribute the centralization?

Sarbjeet Johal@sarbjeetjohal·14h

@byumut Let’s shove some complexity under some new terms like harnesses and policy servers. Centralization is an economics construct… a substrate of industrialization… more on this sometime, perhaps when we arrange a talk on ether :)

UMUT ÇETİNKAYA@byumut·14h

The rarer piece is failure taxonomy — not just "I've seen things break" but recognizing which class of failure this is before you open the logs. Idempotency bug or context accumulation? Schema drift or routing error? That pattern-match comes from enough different things breaking in enough different ways. Prompts are learnable in a weekend. That library takes years.

Mahax@Mahaximus_·15h

@byumut that's the real distinction - anyone can demo an agent, keeping it alive in production under real conditions is a completely different skillset, the people who know failure modes are going to be worth a lot more than the people who know prompts

UMUT ÇETİNKAYA@byumut·14h

Most of them. The tell: specific income number, zero failure story. Real builder income comes with a maintenance log — API repricing, rate limit changes, silent tool outages. Posts that skip the ongoing ops cost are optimized for shares, not utility. The ratio improves a lot after your first expensive surprise.

Jesaja@Jesaja·14h

@byumut How many of the quick AI money posts do you think are clickbait?

UMUT ÇETİNKAYA@byumut·14h

That pattern repeats exactly — routing bus accumulates business logic under deadline pressure; orchestration layer does the same. The "wrong turn" isn't the tool, it's where logic tends to migrate. The LLM bridging mechanism that holds probably isn't an API gateway — it's contract discipline: hard typed I/O at each agent hop, nothing shared but schema. Heterogeneity preserved; drift localized to one boundary at a time.
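
A sketch of what "hard typed I/O at each agent hop" could mean in practice: the receiving agent accepts exactly one schema and rejects anything else at the boundary. `TicketSummary` and its fields are a hypothetical contract, not something from this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TicketSummary:
    """The only shape allowed across this hop (hypothetical contract)."""

    ticket_id: str
    severity: int

    def __post_init__(self) -> None:
        if not isinstance(self.ticket_id, str) or not self.ticket_id:
            raise ValueError("ticket_id must be a non-empty string")
        # type(...) is int also rejects bool, which is a subclass of int.
        if type(self.severity) is not int or not 1 <= self.severity <= 5:
            raise ValueError("severity must be an int from 1 to 5")


FIELDS = {"ticket_id", "severity"}


def parse_hop(payload: dict) -> TicketSummary:
    """Validate an upstream agent's payload before the next agent sees it."""
    if set(payload) != FIELDS:
        raise ValueError(f"expected fields {sorted(FIELDS)}, got {sorted(payload)}")
    return TicketSummary(**payload)


def rejected(payload: dict) -> bool:
    """True when the boundary refuses the payload."""
    try:
        parse_hop(payload)
    except ValueError:
        return True
    return False


print(parse_hop({"ticket_id": "T-1", "severity": 2}))
print(rejected({"ticket_id": "T-1", "severity": 2, "notes": "also do X"}))
```

Rejecting unexpected fields is the part that keeps logic from migrating: extra context or instructions cannot ride along, so any drift shows up as a failure at one hop instead of spreading through the chain.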

Sarbjeet Johal@sarbjeetjohal·14h

ESB (enterprise service bus) was not created as a centralizing mechanism; the idea was the total opposite: it brought heterogeneity and decentralization to systems. I was creating it. How the world implemented it took the wrong turn. Now if someone used ESB for centralizing (logic concentration), that’s the fault of the user; as @Grady_Booch says, ‘a fool with a tool is still a fool’ 😄 Seriously, I am not proposing the same old ESB-like implementation, but I am proposing bridging mechanisms so we can transform our systems to leverage the facility provided by emergent properties of LLMs 😊! API gateway? Microservices? Anyone 😊
