Alpár Kertész

1.4K posts


@Criticality47

Psychologist studying how AI changes attention, trust, and self-direction. Writing for people who want cleaner judgment and signal over hype.

Romania Joined December 2013
912 Following 260 Followers
Pinned Tweet
Alpár Kertész@Criticality47·
AI is not just changing what people can do. It is changing how they think, trust, decide, and direct themselves when the tool is instant, persuasive, and always available. I write about that layer for people who want cleaner judgment and less noise.
English
0
0
2
845
Alpár Kertész@Criticality47·
@xai File list is the button I want here. If Grok can make images, videos, and automations in one run, show what it touched before the build feels finished.
English
0
0
0
587
xAI@xai·
Grok Build is now available in Beta for all SuperGrok and X Premium+ users. Use Plan Mode, create images and videos with Imagine, and build automations or orchestrators with the CLI. Visit x.ai/cli to get started.
English
585
940
5K
3.1M
Alpár Kertész@Criticality47·
@AlexFinn Question list before plan mode is the whole trick. The agent stops guessing your feature shape and starts showing what it still needs from you.
English
0
0
0
128
Alex Finn@AlexFinn·
This has sped up my AI coding 20x (prompt at the end):
Before building out a big feature, ask Codex/Claude Code to ask you as many questions as it needs to fully plan out the idea.
This is even better than plan mode. Plan mode is typically limited to 3 or 4 questions.
This has asked me 100+ questions before. Seems like a lot but actually saves you time in the long run.
The plan it builds will be so detailed and complete that it can basically run autonomously and build the entire thing.
But here's where you take things to the next level:
You also have it take your entire plan and create detailed Linear issues for it. It should create 20+ tasks in Linear.
Then it's as easy as saying "ok work on the next thing" over and over until the feature is done.
Highly recommend downloading and using Linear if you haven't yet. Amazing project management tool w/ excellent free tier. Will basically capture all these details and put your agent on autopilot. It's a 2nd brain.
Use this prompt:
"I want to build out *describe your feature in detail*. Ask as many questions as you need of me to fully understand every detail of what I want to build out. Then take everything you learn, and create super focused and detailed Linear issues. Then begin work"
Getting so much more high quality code out with this workflow. You're welcome.
English
52
13
301
16.3K
Alpár Kertész@Criticality47·
@orca_build Status list over the terminal pane is the useful bit. With 10 agents running, the blocked column saves you from opening every worktree like a little detective.
English
0
0
1
61
Orca ADE@orca_build·
your coding agents are Kanban cards now 😯. New in Orca: open a board over any terminal pane and drag each agent worktree between statuses. todo, in progress, review, testing, blocked, done, or whatever custom columns fit your workflow. much easier when you have 10+ agent running across different features.
English
10
6
62
5.1K
Alpár Kertész@Criticality47·
@lennysan @danshipper The queue is where automation becomes work again. Someone still has to notice when the little green checkmark starts lying.
English
0
0
0
27
Lenny Rachitsky@lennysan·
.@danshipper: "Automation is a lie. Every time you automate something, you need a human on top of it, making sure that it continues working."
Lenny Rachitsky@lennysan

Automation is a lie. CLIs are over. The SaaSpocalypse is dumb.
A year ago @danshipper came on the podcast to predict where AI was heading. He was remarkably right, including the call that everyone was sleeping on Claude Code.
Dan has a unique lens into where things are going because his team at @every is possibly the most AI-pilled group of people in tech. I always learn a ton talking to Dan. So I brought him back for round two.
We'll score these in exactly a year:
🔸 Every company will have one “super-agent” in Slack.
🔸 Codex and Claude Code will become the new operating system for knowledge work.
🔸 The AI job apocalypse is not happening.
🔸 PMs and designers will thrive.
🔸 We will read way more AI-generated writing and we will like it.
🔸 "I would buy SaaS stocks right now."
Listen now 👇 youtube.com/watch?v=4D3hDm…

English
36
13
154
26.9K
Alpár Kertész@Criticality47·
The rephrase button is not as neutral as it looks. When a model cleans up the sentence, it can also clean up the hesitation, anger, doubt, or weird little detail that made the thought honest. I want the edit trail before the prettier version: what changed, what got softened, and what I might want back.
English
0
0
0
3
Alpár Kertész@Criticality47·
@clairevo Browser smoke test is the receipt I’d want here. Once Codex is touching app code and 4,000 emails, show the red-test row before I type ok what’s next
English
0
0
0
615
claire vo 🖤@clairevo·
I have been coding for over 20 years (!!!) and I’m sitting here, mouth agape, watching codex:
- planned full refactor of core app, published in a pretty html for my review and co-authorship
- iterating through loops to code piece by piece, document and update architecture plans as it goes
- every loop does a browser smoke test of new features, identifies and fixes functional and visual regressions (even ones not related to the code!)
- maintains lints and tests
- my job is to type “ok what’s next” and occasionally auth integrations
oh and on the side his buddy codex is 45 minutes into a /goal of cleaning up 4,000 emails in my inbox
English
34
11
261
18K
Alpár Kertész@Criticality47·
@aakashgupta Red rows beat the 500-trace wall. If Claude can make the first scoring rubric, PMs lose the “someday evals” excuse and have something ugly to check tomorrow.
English
0
0
0
19
Aakash Gupta@aakashgupta·
The reason 99% of AI agents ship without evals has nothing to do with technical complexity. The activation energy was too high. Reading 500 traces manually, categorizing failures by hand, writing scoring rubrics from scratch. Most PMs looked at that workload and shipped without measuring anything.
Aparna just collapsed that entire sequence to three terminal commands. Build the agent, instrument it with a skill, ask Claude to suggest the eval. Under an hour from zero to a measured, traceable PM agent with priority scoring evals running across every span.
The part that changes the game: you take the eval failures, feed them into a loop skill on a cron job, and the agent starts fixing itself on a daily cadence. Eval failures trigger prompt changes. Prompt changes generate new traces. New traces produce better evals. The cycle runs while you sleep.
She threw out a stat that stuck: if you're a PM who has tracing set up and is actually looking at your evals, you're probably in the top 1% right now. The bar is that low because the old process was that painful.
Claude Code just turned a two-week setup into an afternoon project.
Aakash Gupta@aakashgupta

She literally broke down how to run evals in Claude Code (built the whole thing live):
01:34 - What people get wrong with evals
04:35 - Why product taste is the alpha now
09:28 - Building a PM agent from one prompt
19:00 - Instrumentation without writing code
22:00 - Watching traces stream in live
28:00 - Getting Claude to write your first eval
33:58 - When vibe evals work and when they don't
48:50 - The self-improving loop (this part is wild)
01:03:00 - Same-day shipping is real
01:06:00 - The context graph unlock

English
8
3
22
5.4K
Alpár Kertész@Criticality47·
@argofowl File title is where Deep Research trips. The report did work; the saved name makes every old tab look like mystery meat.
English
0
0
1
33
🥔🥔🥔@argofowl·
why don't chatgpt deep research reports save with a proper title instead of "deep-research-report.md" so dumb
English
3
0
46
2.3K
Alpár Kertész@Criticality47·
@cifilter Subagent notes are only cheaper when the note is smaller than the mess. Otherwise Codex keeps one giant thread because splitting means writing directions and merging answers.
English
0
0
1
890
Shannon Potter@cifilter·
I'm dumb, so bear with me: using subagents should reduce overall token usage because you don't have one gigantic context window/thread doing everything? So why does Codex seem to never want to use them unless I tell it to?
English
31
0
103
34.1K
Alpár Kertész@Criticality47·
@mark_k @OpenAI Codex tab becoming the front door only works if memory, files, and run history move with it. Otherwise ChatGPT just gets a new lobby.
English
0
0
1
164
Mark Kretschmann@mark_k·
It's because @OpenAI kinda gave up on ChatGPT and decided to focus on Codex instead. Codex will gain more and more ChatGPT features until it's a complete replacement, or "SuperApp" as they call it internally. What the official name of the combined app will be is still unclear.
Aryan Siddiqui@Ar_boian

It’s astonishing how little @OpenAI ChatGPT product experience has changed. If they had seriously worked on just memory and proactiveness, their growth and retention would be a lot more.

English
47
17
477
50K
Alpár Kertész@Criticality47·
@EricTopol @ejosipcar Triage screens need a bias warning before they learn from who got seen fastest. Otherwise the clinic queue just teaches the model the old queue.
English
0
0
0
73
Eric Topol@EricTopol·
The Inverse Care Law. The people who need medical care the most tend to get the least access. It will take deliberate and extensive efforts for medical AI not to exacerbate health inequities, by @ejosipcar We've seen some examples where AI reduced inequities and need to build on that. thelancet.com/journals/lance…
English
15
37
134
17.6K
Alpár Kertész@Criticality47·
The export button looks boring until the chat becomes part diary, part project history, part emotional junk drawer. If AI is going to become memory-adjacent, people need a way to leave with their notes, not just hope the account never disappears.
English
0
0
0
10
Alpár Kertész@Criticality47·
@shannholmberg Markdown files are the part I'd keep staring at. If agents read brain first and write back overnight, I want the tiny diff that shows what the graph decided to believe.
English
0
0
0
17
Shann³@shannholmberg·
What's gBrain and how does it work?
I've been using gStack for a while when ideating, validating new projects, and some coding
now I'm experimenting with gBrain as the memory layer for my agents, starting with my Hermes Agent company
gBrain is an open-source persistent memory layer for AI agents (by @garrytan). it turns your emails, meetings, tweets, voice memos, and docs into a typed knowledge graph. essentially markdown in, graph out.
how it works:
> 1. ingest signals from your daily life
> 2. extract entities + create typed links (works_at, invested_in, attended)
> 3. store as Markdown + Postgres + pgvector
> 4. retrieve via hybrid search (keyword + vector + graph)
> 5. agents read brain first, write insights back, graph builds itself
an overnight dream cycle dedupes entities, repairs links, and updates the compiled truth
English
26
18
236
21.5K
Alpár Kertész@Criticality47·
@tenobrus Settings file is the tiny landmine. If chats are gold, the app needs an export reminder before the 30-day shredder runs.
English
0
0
0
443
Tenobrus@tenobrus·
this is ur regular public service announcement that Claude Code by default *permanently deletes* session files after they're 30 days old. i strongly recommend u set `cleanupPeriodDays` to 9999 in settings.json to retain this very valuable data code.claude.com/docs/en/settin…
Patrick McKenzie@patio11

If the *only* impact of LLMs professionally was causing people to "think out loud" in a way which was routinely captured by computer systems and then could be operated on by computer systems, that would *by itself* be one of the most consequential changes in practice in 100 years

English
35
52
1K
109.9K
Alpár Kertész@Criticality47·
@emollick Prompt-shaped paragraph is the giveaway. No typo, no weird little example, no sentence that could only come from being annoyed for five minutes.
English
2
0
5
410
Ethan Mollick@emollick·
As more people come to recognize the tells of AI, which mostly happens as you start to work with AI a lot, the scales are going to fall from their eyes and they are going to realize what some of us already see: how much of this site (and blog posts, articles, papers) are AI now.
English
138
112
1.4K
86.1K
Alpár Kertész@Criticality47·
@minchoi Attempt log is the interesting bit. Put the failed route and compute bill beside each solved theorem so people can tell research agent from lucky batch run.
English
0
0
0
230
Alpár Kertész@Criticality47·
@IamEmily2050 Gemini Omni is a joke, unfortunately... I expected it to be better than Veo... but it makes mistakes like the first/second-generation video gens... good for a laugh tho... Also, the logo in the corner of the videos and images stops me from using it altogether...
English
0
0
0
44
Emily@IamEmily2050·
Hopefully, Gemini Omni Pro next month will not just have SOTA video generation and editing but also a 20 second option and hopefully SOTA image. I do believe Gemini Omni Flash can do images, but the quality is lower than Nano banana Pro/Nano banana V2, which is why it was not enabled.
English
20
3
79
5.8K
1SecretCyborg@mudkip_sir·
@hobincus To this day I still haven't managed to finish Ninja Gaiden, harder than Contra
Romanian
1
0
2
227
Alexandru Hobincu@hobincus·
A very large share of my generation was cursed with alcoholic parents. I won't go into detail about the childhood I had; many of you probably had the same traumas.
But one of the few beautiful memories I have ... was when my father walked into the house with a device like this. I think it was around the winter of 1998? if I remember correctly....
Suddenly nothing else mattered ... it was just me and the 99 levels of Mario. I didn't know whether it was day or night, whether I had eaten, or whether the latest scrape on my knee still hurt.
I don't know if you know this, but old games built in our generation a resilience like that of a cockroach when things get hard :)
We had no "Save" or "Load Game" option ... Either you were good enough to clear every level of "Prince of Persia" on the first try ... or, if you accidentally died right at the final boss ... you started over from the beginning.
That is how a generation was built that landed on its feet every time something in their lives went to hell.
Believe it or not ... the games of the 90s are entirely responsible for who we are now, and we owe them our thanks.
Cheers SEGA & Nintendo
Romanian
39
10
293
7.8K