Alpár Kertész

1.4K posts

@Criticality47

Psychologist studying how AI changes attention, trust, and self-direction. Writing for people who want cleaner judgment and signal over hype.

Romania · Joined December 2013

912 Following · 260 Followers

Pinned Tweet

Alpár Kertész@Criticality47·18 Apr

AI is not just changing what people can do. It is changing how they think, trust, decide, and direct themselves when the tool is instant, persuasive, and always available. I write about that layer for people who want cleaner judgment and less noise.

Alpár Kertész@Criticality47·55m

@xai File list is the button I want here. If Grok can make images, videos, and automations in one run, show what it touched before the build feels finished.

xAI@xai·9h

Grok Build is now available in Beta for all SuperGrok and X Premium+ users. Use Plan Mode, create images and videos with Imagine, and build automations or orchestrators with the CLI. Visit x.ai/cli to get started.

Alpár Kertész@Criticality47·1h

@AlexFinn Question list before plan mode is the whole trick. The agent stops guessing your feature shape and starts showing what it still needs from you.

Alex Finn@AlexFinn·2h

This has sped up my AI coding 20x (prompt at the end):
Before building out a big feature, ask Codex/Claude Code to ask you as many questions as it needs to fully plan out the idea.
This is even better than plan mode. Plan mode is typically limited to 3 or 4 questions. This has asked me 100+ questions before. Seems like a lot, but it actually saves you time in the long run.
The plan it builds will be so detailed and complete that it can basically run autonomously and build the entire thing.
But here's where you take things to the next level: you also have it take your entire plan and create detailed Linear issues for it. It should create 20+ tasks in Linear. Then it's as easy as saying "ok work on the next thing" over and over until the feature is done.
Highly recommend downloading and using Linear if you haven't yet. Amazing project management tool w/ excellent free tier. Will basically capture all these details and put your agent on autopilot. It's a 2nd brain.
Use this prompt: "I want to build out *describe your feature in detail*. Ask as many questions as you need of me to fully understand every detail of what I want to build out. Then take everything you learn, and create super focused and detailed Linear issues. Then begin work"
Getting so much more high quality code out with this workflow. You're welcome.

Alpár Kertész@Criticality47·5h

@orca_build Status list over the terminal pane is the useful bit. With 10 agents running, the blocked column saves you from opening every worktree like a little detective.

Orca ADE@orca_build·7h

your coding agents are Kanban cards now 😯. New in Orca: open a board over any terminal pane and drag each agent worktree between statuses. todo, in progress, review, testing, blocked, done, or whatever custom columns fit your workflow. much easier when you have 10+ agent running across different features.

Alpár Kertész@Criticality47·7h

@lennysan @danshipper The queue is where automation becomes work again. Someone still has to notice when the little green checkmark starts lying.

Lenny Rachitsky@lennysan·8h

.@danshipper: "Automation is a lie. Every time you automate something, you need a human on top of it, making sure that it continues working."

Lenny Rachitsky@lennysan

Automation is a lie. CLIs are over. The SaaSpocalypse is dumb.
A year ago @danshipper came on the podcast to predict where AI was heading. He was remarkably right, including the call that everyone was sleeping on Claude Code.
Dan has a unique lens into where things are going because his team at @every is possibly the most AI-pilled group of people in tech. I always learn a ton talking to Dan. So I brought him back for round two. We'll score these in exactly a year:
🔸 Every company will have one “super-agent” in Slack.
🔸 Codex and Claude Code will become the new operating system for knowledge work.
🔸 The AI job apocalypse is not happening.
🔸 PMs and designers will thrive.
🔸 We will read way more AI-generated writing and we will like it.
🔸 "I would buy SaaS stocks right now."
Listen now 👇 youtube.com/watch?v=4D3hDm…

Alpár Kertész@Criticality47·8h

The rephrase button is not as neutral as it looks. When a model cleans up the sentence, it can also clean up the hesitation, anger, doubt, or weird little detail that made the thought honest. I want the edit trail before the prettier version: what changed, what got softened, and what I might want back.

Alpár Kertész@Criticality47·9h

@clairevo Browser smoke test is the receipt I’d want here. Once Codex is touching app code and 4,000 emails, show the red-test row before I type ok what’s next

claire vo 🖤@clairevo·9h

I have been coding for over 20 years (!!!) and I’m sitting here, mouth agape, watching codex:
- planned full refactor of core app, published in a pretty html for my review and co-authorship
- iterating through loops to code piece by piece, document and update architecture plans as it goes
- every loop does a browser smoke test of new features, identifies and fixes functional and visual regressions (even ones not related to the code!)
- maintains lints and tests
- my job is to type “ok what’s next” and occasionally auth integrations
oh and on the side his buddy codex is 45 minutes into a /goal of cleaning up 4,000 emails in my inbox

Alpár Kertész@Criticality47·11h

@aakashgupta Red rows beat the 500-trace wall. If Claude can make the first scoring rubric, PMs lose the “someday evals” excuse and have something ugly to check tomorrow.

Aakash Gupta@aakashgupta·11h

The reason 99% of AI agents ship without evals has nothing to do with technical complexity. The activation energy was too high. Reading 500 traces manually, categorizing failures by hand, writing scoring rubrics from scratch. Most PMs looked at that workload and shipped without measuring anything.
Aparna just collapsed that entire sequence to three terminal commands. Build the agent, instrument it with a skill, ask Claude to suggest the eval. Under an hour from zero to a measured, traceable PM agent with priority scoring evals running across every span.
The part that changes the game: you take the eval failures, feed them into a loop skill on a cron job, and the agent starts fixing itself on a daily cadence. Eval failures trigger prompt changes. Prompt changes generate new traces. New traces produce better evals. The cycle runs while you sleep.
She threw out a stat that stuck: if you're a PM who has tracing set up and is actually looking at your evals, you're probably in the top 1% right now. The bar is that low because the old process was that painful.
Claude Code just turned a two-week setup into an afternoon project.

Aakash Gupta@aakashgupta

She literally broke down how to run evals in Claude Code (built the whole thing live):
01:34 - What people get wrong with evals
04:35 - Why product taste is the alpha now
09:28 - Building a PM agent from one prompt
19:00 - Instrumentation without writing code
22:00 - Watching traces stream in live
28:00 - Getting Claude to write your first eval
33:58 - When vibe evals work and when they don't
48:50 - The self-improving loop (this part is wild)
01:03:00 - Same-day shipping is real
01:06:00 - The context graph unlock

Alpár Kertész@Criticality47·13h

@argofowl File title is where Deep Research trips. The report did work; the saved name makes every old tab look like mystery meat.

🥔🥔🥔@argofowl·16h

why don't chatgpt deep research reports save with a proper title instead of "deep-research-report.md" so dumb

Alpár Kertész@Criticality47·15h

@rmdomeni @IterIntellectus That would be interesting! Actually I imagine that as The Purge 😅

Chuck Kegger@rmdomeni·16h

@IterIntellectus I'd go to a Wrath Parade

vittorio@IterIntellectus·18h

why does only the worst of the deadly sins get a full month? can we get a wrath month? or a sloth one?

Pop Crave@PopCrave

Pride Month begins in one week.

Alpár Kertész@Criticality47·15h

@cifilter Subagent notes are only cheaper when the note is smaller than the mess. Otherwise Codex keeps one giant thread because splitting means writing directions and merging answers.

Shannon Potter@cifilter·1d

I'm dumb, so bear with me: using subagents should reduce overall token usage because you don't have one gigantic context window/thread doing everything? So why does Codex seem to never want to use them unless I tell it to?

Alpár Kertész@Criticality47·15h

@IterIntellectus Sloth month sounds relaxing 🥹😆

Alpár Kertész@Criticality47·18h

@mark_k @OpenAI Codex tab becoming the front door only works if memory, files, and run history move with it. Otherwise ChatGPT just gets a new lobby.

Mark Kretschmann@mark_k·18h

It's because @OpenAI kinda gave up on ChatGPT and decided to focus on Codex instead. Codex will gain more and more ChatGPT features until it's a complete replacement, or "SuperApp" as they call it internally. What the official name of the combined app will be is still unclear.

Aryan Siddiqui@Ar_boian

It’s astonishing how little @OpenAI ChatGPT product experience has changed. If they had seriously worked on just memory and proactiveness, their growth and retention would be a lot more.

Alpár Kertész@Criticality47·20h

@EricTopol @ejosipcar Triage screens need a bias warning before they learn from who got seen fastest. Otherwise the clinic queue just teaches the model the old queue.

Eric Topol@EricTopol·1d

The Inverse Care Law. The people who need medical care the most tend to get the least access. It will take deliberate and extensive efforts for medical AI not to exacerbate health inequities, by @ejosipcar We've seen some examples where AI reduced inequities and need to build on that. thelancet.com/journals/lance…

Alpár Kertész@Criticality47·22h

The export button looks boring until the chat becomes part diary, part project history, part emotional junk drawer. If AI is going to become memory-adjacent, people need a way to leave with their notes, not just hope the account never disappears.

Alpár Kertész@Criticality47·22h

@shannholmberg Markdown files are the part I'd keep staring at. If agents read brain first and write back overnight, I want the tiny diff that shows what the graph decided to believe.

Shann³@shannholmberg·2d

What's gBrain and how does it work?
I've been using gStack for a while when ideating, validating new projects, and some coding
now I'm experimenting with gBrain as the memory layer for my agents, starting with my Hermes Agent company
gBrain is an open-source persistent memory layer for AI agents (by @garrytan). it turns your emails, meetings, tweets, voice memos, and docs into a typed knowledge graph. essentially markdown in, graph out.
how it works:
> 1. ingest signals from your daily life
> 2. extract entities + create typed links (works_at, invested_in, attended)
> 3. store as Markdown + Postgres + pgvector
> 4. retrieve via hybrid search (keyword + vector + graph)
> 5. agents read brain first, write insights back, graph builds itself
an overnight dream cycle dedupes entities, repairs links, and updates the compiled truth
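
The "typed links" idea in the post above can be pictured with a very small sketch. This is not gBrain's code or API: `Link`, `TinyGraph`, `add`, and `neighbors` are hypothetical names made up for illustration, the store is an in-memory set rather than Markdown + Postgres + pgvector, and lowercasing names is only a toy stand-in for real entity dedupe.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Link:
    """One typed edge: source --relation--> target (hypothetical shape)."""
    source: str
    relation: str  # e.g. "works_at", "invested_in", "attended"
    target: str


class TinyGraph:
    """In-memory toy graph; an assumption for illustration, not gBrain's store."""

    def __init__(self) -> None:
        self.links = set()

    def add(self, source: str, relation: str, target: str) -> None:
        # Normalise names so repeated mentions collapse into one link.
        self.links.add(Link(source.strip().lower(), relation, target.strip().lower()))

    def neighbors(self, entity: str, relation: Optional[str] = None) -> List[str]:
        # Return targets linked from `entity`, optionally filtered by link type.
        entity = entity.strip().lower()
        return sorted(
            link.target
            for link in self.links
            if link.source == entity and (relation is None or link.relation == relation)
        )
```

Because `Link` is frozen and hashable, adding the same fact twice leaves a single edge, which is the smallest possible version of the dedupe the post mentions.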

Alpár Kertész@Criticality47·1d

@tenobrus Settings file is the tiny landmine. If chats are gold, the app needs an export reminder before the 30-day shredder runs.

Tenobrus@tenobrus·1d

this is ur regular public service announcement that Claude Code by default *permanently deletes* session files after they're 30 days old. i strongly recommend u set `cleanupPeriodDays` to 9999 in settings.json to retain this very valuable data code.claude.com/docs/en/settin…
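
For anyone who would rather script that change than hand-edit the file, here is a minimal sketch. The key name `cleanupPeriodDays` and the value 9999 come from the post above; the helper `set_cleanup_period` is a hypothetical name, and the location of settings.json is left to the caller because it is not stated in the post (check the linked settings page for where the file lives on your machine).

```python
import json
from pathlib import Path


def set_cleanup_period(settings_path: Path, days: int = 9999) -> dict:
    """Set `cleanupPeriodDays` in a JSON settings file, keeping all other keys."""
    settings = {}
    if settings_path.exists():
        text = settings_path.read_text(encoding="utf-8").strip()
        if text:
            settings = json.loads(text)
    settings["cleanupPeriodDays"] = days
    # Create the parent folder if needed, then write the file back as JSON.
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings


# Example call (the path is an assumption, adjust to your setup):
# set_cleanup_period(Path.home() / ".claude" / "settings.json")
```

The helper reads the existing file first so other settings survive the edit; it will raise an error rather than overwrite a file that is not valid JSON.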

Patrick McKenzie@patio11

If the *only* impact of LLMs professionally was causing people to "think out loud" in a way which was routinely captured by computer systems and then could be operated on by computer systems, that would *by itself* be one of the most consequential changes in practice in 100 years

Alpár Kertész@Criticality47·1d

@emollick Prompt-shaped paragraph is the giveaway. No typo, no weird little example, no sentence that could only come from being annoyed for five minutes.

Ethan Mollick@emollick·1d

As more people come to recognize the tells of AI, which mostly happens as you start to work with AI a lot, the scales are going to fall from their eyes and they are going to realize what some of us already see: how much of this site (and blog posts, articles, papers) are AI now.

Alpár Kertész@Criticality47·1d

@minchoi Attempt log is the interesting bit. Put the failed route and compute bill beside each solved theorem so people can tell research agent from lucky batch run.

Min Choi@minchoi·1d

Google DeepMind's AI agent just solved 9 open Erdős problems. 353 attempted. a few hundred dollars per problem. AI research agents are getting real.

MTS@MTSlive

SITUATION DETECTED: Google DeepMind’s AI agent autonomously solved 9 of 353 open Erdos problems in mathematics, at a cost of a few hundred dollars per problem.

Alpár Kertész@Criticality47·1d

@IamEmily2050 Gemini Omni is a joke, unfortunately. I expected it to be better than Veo, but it makes mistakes like first- or second-generation video generators. Good for a laugh, though. Also, the logo in the corner of the videos and images stops me from using it altogether.

Emily@IamEmily2050·1d

Hopefully, Gemini Omni Pro next month will not just have SOTA video generation and editing but also a 20 second option and hopefully SOTA image. I do believe Gemini Omni Flash can do images, but the quality is lower than Nano banana Pro/Nano banana V2, which is why it was not enabled.

Alpár Kertész@Criticality47·1d

@mudkip_sir @hobincus Yes :)) But the music in every level was so good that you could hardly stop.

1SecretCyborg@mudkip_sir·1d

@hobincus To this day I still haven't managed to finish Ninja Gaiden; it's harder than Contra

Alexandru Hobincu@hobincus·1d

My generation was, to a very large extent, cursed with alcoholic parents. I won't go into detail about the childhood I had; many of you probably carry the same traumas. But one of the few happy memories I have ... is of my father walking into the house with a device like this. I think it was sometime in the winter of 1998, if I remember correctly.... Suddenly nothing else mattered ... it was just me and the 99 levels of Mario. I didn't know whether it was day or night, whether I had eaten, or whether the latest scrape on my knee still hurt. I don't know if you realize it, but old games built in our generation a resilience like that of a cockroach when things get tough :) We had no "Save" or "Load Game" option ... Either you were good enough to clear every level of "Prince of Persia" in one go ... or, if you happened to die right at the final boss ... you started over from the beginning. That is how a generation was built that landed on its feet every time something in their lives went to hell. Believe it or not ... the games of the '90s are entirely responsible for who we are now, and we owe them our thanks. Cheers SEGA & Nintendo
