Pinned post
Voltex
85 posts

Voltex
@VoltexGar
AI builder | helping you get what you want with AI. Real tools & workflows, drawn simple.
Joined June 2026
32 Following, 73 Followers

MIT PUT LLMs IN A DEBATE ROOM. THE HALLUCINATIONS DROPPED.
frandeeer's council of 18 minds isn't a gimmick. it's a 2023 paper from MIT and Google, and the finding is blunt: one confident model is the bug, a debate is the fix.
Propose → Debate → Cross-examine → Converge
- propose: several model instances answer the same question independently, each with its own reasoning.
- debate: each one reads the others' answers and defends or revises its own, over multiple rounds.
- cross-examine: disagreement is the point. they attack the weak steps instead of nodding along.
- converge: they settle on one final answer that survived the fight.
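a minimal sketch of that four-step loop in python, under loud assumptions: the agents are toy functions, not real LLM calls; the names `debate`, `stubborn`, and `persuadable` are made up for illustration; and a majority vote stands in for "converge." the paper's actual prompts and setup are not reproduced here.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical agent signature: (question, peers' latest answers) -> answer.
# In a real setup each agent would wrap a model call; here they are toys.
Agent = Callable[[str, List[str]], str]


def debate(question: str, agents: List[Agent], rounds: int = 2) -> str:
    # Propose: every agent answers independently (no peer answers yet).
    answers = [agent(question, []) for agent in agents]
    for _ in range(rounds):
        # Debate / cross-examine: each agent reads the others' answers
        # from the previous round and may defend or revise its own.
        answers = [
            agent(question, answers[:i] + answers[i + 1:])
            for i, agent in enumerate(agents)
        ]
    # Converge: majority vote over the last round (assumed stand-in for consensus).
    return Counter(answers).most_common(1)[0][0]


def stubborn(answer: str) -> Agent:
    # Toy agent that never changes its mind.
    return lambda question, peers: answer


def persuadable(first_guess: str) -> Agent:
    # Toy agent that adopts the most common peer answer once it has seen any.
    def agent(question: str, peers: List[str]) -> str:
        if not peers:
            return first_guess
        return Counter(peers).most_common(1)[0][0]
    return agent


agents = [stubborn("12"), stubborn("12"), persuadable("15")]
final = debate("What is 3 * 4?", agents)
print(final)
```

the toy shows the shape only: the outlier starts at "15", reads two peers saying "12", and revises. with real models the revision step is a prompt, not a vote count.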
the payoff, in their words: it "significantly enhances mathematical and strategic reasoning" and "improves the factual validity of generated content, reducing fallacious answers and hallucinations." and it runs on plain black-box models, no retraining.
that is the council, proven. a single model makes uncertainty look clean and confident. a society of minds makes the disagreement explicit before you bet on the answer.
read it, then stop trusting one sure-sounding reply.

Frander @Frandeeer

EMBEDDINGS DON'T WORK ON CODE. A GRAPH DATABASE DOES.
jason's post dunks on "index your codebase" tools, and a paper out of NUS and Alibaba proves why. similarity search pulls chunks that look alike and misses the ones that matter.
Parse → Build the graph → Query it → Retrieve exact context
- parse: static analysis turns the repo into a graph. nodes are modules, classes, functions. no model, no embeddings in the loop.
- build: the edges are the relationships already in your code, CONTAINS, INHERITS, USES. an import is a fact, not a guess.
- query: the agent writes graph queries in Cypher to ask who calls this and what it touches. that is jason's query_graph and trace_path, one to one.
- retrieve: it pulls the exact node and its call chain, so the agent stops reading 40 files to answer one question.
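a rough sketch of the parse, build, query, retrieve idea in python, with the assumptions stated up front: it uses the standard-library `ast` module and a plain dict instead of a graph database, it tracks only direct function calls (not CONTAINS or INHERITS edges), and it queries with python functions rather than Cypher. `build_call_graph`, `callers_of`, and `find_call_chain` are hypothetical helper names, not the paper's or jason's tools.

```python
import ast
from typing import Dict, List, Optional, Set, Tuple

# Tiny stand-in "repo" so the example is self-contained.
SOURCE = '''
def load(path):
    return open(path).read()

def parse(path):
    return load(path).split()

def main():
    words = parse("data.txt")
    print(len(words))
'''


def build_call_graph(source: str) -> Dict[str, Set[str]]:
    """Parse + build: map each top-level function to the plain names it calls."""
    graph: Dict[str, Set[str]] = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            calls: Set[str] = set()
            for child in ast.walk(node):
                # Only direct calls like f(x); method calls like x.f() are skipped.
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    calls.add(child.func.id)
            graph[node.name] = calls
    return graph


def callers_of(graph: Dict[str, Set[str]], target: str) -> List[str]:
    """Query: who calls this function?"""
    return sorted(fn for fn, callees in graph.items() if target in callees)


def find_call_chain(graph: Dict[str, Set[str]], start: str, goal: str,
                    path: Tuple[str, ...] = ()) -> Optional[List[str]]:
    """Retrieve: depth-first search for one call chain from start to goal."""
    path = path + (start,)
    if start == goal:
        return list(path)
    for callee in sorted(graph.get(start, ())):
        if callee not in path:  # avoid looping on recursive calls
            found = find_call_chain(graph, callee, goal, path)
            if found:
                return found
    return None


graph = build_call_graph(SOURCE)
print(callers_of(graph, "load"))
print(find_call_chain(graph, "main", "load"))
```

no embeddings anywhere: the edges come straight from the syntax tree, which is the "an import is a fact, not a guess" point in miniature.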
the finding: similarity-based retrieval has low recall on complex code tasks. structure-aware retrieval over a graph beats it on CrossCodeEval, SWE-bench, and EvoCodeBench.
that is jason's whole post, sitting in a paper: the map is already inside your code. read the graph, not the vectors, and the token bill drops by half.
read it, then give your agent a map.

Jason Zhou @jasonzhou1993

There is a full business team hiding inside your Claude subscription, and almost nobody is using it that way.
Ritesh Verma is running Claude like the first five hires of a one-person company:
1. Claude Code researches ideas, ranks demand, checks competitors, builds the MVP, runs commands, fixes tests
2. Claude Design turns rough ideas into landing pages, dashboards, UI flows, and pitch visuals
3. Claude Cowork writes proposals, client docs, deliverables, and business material
4. /goal gives Claude Code a finish line, so it doesn't stop after one reply
5. Subagents split the build across API, database, frontend, auth, and integrations
He isn't asking Claude for startup ideas.
He's making Claude run the routine parts of a business:
- find the pain
- pull the exact words people use on Reddit
- check competitor reviews
- design the product before wasting weeks coding
- build the MVP with parallel agents
- start marketing before the build is finished
- turn old proposals into something reusable
None of it works if Claude walks in empty.
It needs your niche, files, offer, voice, customers, old work, and a clear definition of "done."
Give it that context and one subscription starts looking like a researcher, a writer, a strategist, a coach, and an operator.
The article below breaks that into 20 prompts you can actually run every week.
Voltex @VoltexGar

Ryan Lopopolo, OpenAI:
"I can let a task go for six, 12, 36 hours and still get good results."
this is the exact shift the Fable 5 hype is circling around. the model stopped being a sprinter.
human time is now the scarce resource, not code. you max out at maybe 3 sessions at once, so the whole job becomes getting yourself out of the loop: fork the work, let the agent run for hours, review the PRs it hands back.
you stop being the person who writes the code. you become the person who steers it and owns the output.
that is the real unlock. not a smarter autocomplete. a teammate that runs overnight while you sleep.
humans steer. agents execute.
Skaly_Bull @Skaly__Bull

Sam Altman, CEO of OpenAI:
"we had this little betting pool for when the first one-person billion-dollar company will get started."
altman is describing the ceiling. this article is the on-ramp to it.
the reason one person can even be in that bet is that the work of a whole company, research, writing, strategy, coaching, ops, now runs on agents you just talk to. his words, same interview: "you can start a startup by just talking to a bunch of agents."
and it is not a founder-only thing. "the average everyday knowledge worker can also build agents now," he says, and the barrier to start is "super low." that is the entire point of the 20 prompts. you don't hire the team. you open five chats and brief them.
last year that team was a payroll. altman is betting one person runs a billion-dollar company off it. the on-ramp is one person and $20.
who gets to hold ai is not the labs and not the big teams. it's whoever opens the five chats today.
Voltex @VoltexGar

Peter Steinberger, creator of OpenClaw:
"you want to keep the human in the loop, but at the same time you also want to create the agentic loop where it is very autonomous."
anatoli's post quotes steinberger's line that you shouldn't prompt agents anymore, you should design loops. on lex fridman he draws the harder half of it. the skill is not handing the whole job to the machine, it's deciding which parts of the loop still need you.
that is the post's own rule, keep a human in the loop, just not in every loop. steinberger runs three to eight agents at once. they write the boring code, the data-in-data-out plumbing he no longer reads. but anything that touches the database he still reviews, and every pull request he checks for intent before implementation. the gate stays human where it matters.
it lines up with the post's verify test too. a loop only earns autonomy when something can reject bad output. steinberger pushes to main on local tests passing, not on vibes. the check is what lets him let go.
what it means for you: going faster is not prompting faster. it's naming the goal, wiring the check, and being honest about the few steps where your judgment is still the thing that makes it good.
Anatoli Kopadze @AnatoliKopadze

Andrew Ng, Stanford professor, Google Brain co-founder:
"GPT-3.5 with an agentic workflow actually outperforms GPT-4."
that one result is why 0xmorty's hour of no-code beats raw horsepower. ng's team ran the same coding benchmark two ways: GPT-4 alone scored higher than GPT-3.5 alone, but GPT-3.5 wrapped in an agentic workflow outscored even GPT-4. the structure beat the smarter model.
the structure he describes is the post's build. ng's multi-agent pattern is a flock of role-agents, you prompt one to act as the coder, another the designer, another the tester, and they hand work between them. 0xmorty calls them researcher, writer, editor, checker. same move: narrow roles, one job each.
and ng's other pattern is the post's step 5. instead of one agent grading itself, you add a second critic agent, one prompt says "you are the coder," the other says "you are the code reviewer." the post tells the checker "you did not write this, judge it like a skeptical outsider." a fresh agent catches what the maker talked itself past.
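a small sketch of that maker-plus-critic arrangement in python. everything named here is an assumption for illustration: `refine`, `toy_coder`, and `toy_reviewer` are hypothetical, and the two "agents" are hard-coded functions standing in for two separately prompted models.

```python
from typing import Callable, Optional

# Hypothetical roles: the maker gets the task plus the critic's last objection
# (None on the first pass); the critic returns an objection or None to approve.
Maker = Callable[[str, Optional[str]], str]
Critic = Callable[[str, str], Optional[str]]


def refine(task: str, maker: Maker, critic: Critic, max_rounds: int = 3) -> str:
    draft = maker(task, None)
    for _ in range(max_rounds):
        objection = critic(task, draft)
        if objection is None:
            return draft  # the critic approved this draft
        draft = maker(task, objection)
    # Out of rounds: this last draft has not been re-checked by the critic.
    return draft


def toy_coder(task: str, objection: Optional[str]) -> str:
    # First attempt has a bug; after any objection it returns the fix.
    if objection is None:
        return "def add(a, b):\n    return a - b"
    return "def add(a, b):\n    return a + b"


def toy_reviewer(task: str, draft: str) -> Optional[str]:
    # The reviewer did not write the draft; it runs it and checks one case.
    namespace: dict = {}
    exec(draft, namespace)
    if namespace["add"](2, 3) != 5:
        return "add(2, 3) should be 5"
    return None


result = refine("write add(a, b)", toy_coder, toy_reviewer)
print(result)
```

the design choice worth noting is that the critic holds its own check and never sees the maker's reasoning, which is the "you did not write this" framing from the post.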
what it means for you: you are not buying a smarter model, you are arranging ordinary ones into a team with a checker. that is open to anyone with an hour, not just labs.
Morty @0xMortyx

Andrej Karpathy, ex-Director of AI at Tesla:
"suddenly everyone is a programmer because everyone speaks natural language like English."
that one sentence is why cyril's post is even possible. "bachelor's in computer science required" was a real gate back when the only way to talk to a machine was a language you needed a degree to learn. karpathy's point is that the interface changed. you now program the model in english.
so what gets filtered for changed too. the post says portfolio beats credential in 2026, and that is the same idea from the other side. when the entry skill is describing what you want and building until it works, a degree stops being proof of anything and three shipped projects start being the proof.
the stack cyril lists, python, apis, embeddings, rag, agents, deployment, is not a cs curriculum. it is what you pick up while building those three projects, in english, with the tools karpathy is describing. ai engineering roles grew 143% year over year, per the post, because the door got wider, not because the work got easier.
the gate didn't fall because someone was nice. it fell because the language of the machine became one almost everyone already speaks. that is who gets to hold ai now: whoever finishes something real.
CyrilXBT @cyrilXBT

JIM FAN JUST KILLED THE PROMPT ERA WITH A SINGLE AGENT
real 42-page pdf out of NVIDIA, published in 2023. prompts are easy. loops are hard. and an agent that runs with no human and just keeps getting better is the whole game.
Explore → Write a skill → Self-verify → Store it
- explore: an automatic curriculum picks the agent's next task on its own, so it never waits for you to prompt it.
- write a skill: every solved task becomes executable code saved to an ever-growing skill library it can pull back later.
- self-verify: a built-in loop reads environment feedback and execution errors and fixes its own code before moving on.
- no retraining: it runs on plain GPT-4 via blackbox queries, zero fine-tuning, the loop does all the work.
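a toy version of that loop in python, as a sketch under assumptions rather than the paper's code: a fixed task list stands in for the automatic curriculum, a hard-coded `toy_writer` stands in for the GPT-4 call, and `run_loop` is a hypothetical name. the part it does show faithfully is the shape: try, read the error, retry, and store only code that passed its check.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (task name, verifier that inspects the namespace the code was executed in)
Task = Tuple[str, Callable[[dict], bool]]
# (task name, feedback from the last failed attempt or None) -> source code
Writer = Callable[[str, Optional[str]], str]


def run_loop(tasks: List[Task], write_code: Writer,
             max_attempts: int = 3) -> Dict[str, str]:
    library: Dict[str, str] = {}
    for name, verify in tasks:  # stand-in for the automatic curriculum
        feedback: Optional[str] = None
        for _ in range(max_attempts):
            code = write_code(name, feedback)
            namespace: dict = {}
            try:
                exec(code, namespace)  # run the candidate skill
                if verify(namespace):  # self-verify before storing
                    library[name] = code
                    break
                feedback = "verification failed"
            except Exception as exc:
                # Execution errors become feedback for the next attempt.
                feedback = f"{type(exc).__name__}: {exc}"
    return library


def toy_writer(task: str, feedback: Optional[str]) -> str:
    # First attempt is broken on purpose; with feedback it returns a fix.
    if feedback is None:
        return "def double(x): return x * 2 +"
    return "def double(x):\n    return x * 2"


tasks: List[Task] = [("double", lambda ns: ns["double"](4) == 8)]
library = run_loop(tasks, toy_writer)

# Reuse: load the stored skill into a fresh namespace and call it.
fresh: dict = {}
exec(library["double"], fresh)
print(fresh["double"](21))
```

the stored value is plain source code, so reusing a skill later is just executing it again in a new context.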
the payoff: 3.3x more unique items, 2.3x longer distances, key milestones up to 15.3x faster than the prior best. then it reuses that skill library in a brand-new world to solve tasks from scratch.
that is hanako's whole post, proven in a lab: start one loop, let it build a library, and the abilities compound while no one watches. the bottleneck was never the model, it was whether the work runs on its own.
42 pages. read it, then set up your first loop.

Hanako @hanakoxbt
Voltex reposted

Cat Wu, Head of Product for Claude Code and Cowork, Anthropic:
"for everyone's job, there's always this percentage of it that's really tedious. for me, it's responding to emails."
that line is the whole premise of this guide, from the person who runs cowork. sairahul1's number is 60%: the share of a knowledge worker's day spent on production that doesn't need their brain. cowork doesn't make you faster at it, it removes you from it.
the part most people miss is what's left. wu's hope is that the agent takes the tedious slice so everyone has room for the things only they can build. the post says the same in other words: the bottleneck stops being production and becomes you, your judgment, your decisions.
the endgame in the guide, the 7am briefing and the self-running day, is exactly where she says cowork is headed: "the next big thing is proactivity. claude understands what you work on, and just sets up some of these automations for you."
one caveat she's clear on: you can't manage agents if you can't do the job yourself. you still have to know why the agent got it wrong. the folder, the templates, the about-me files, that is you teaching a coworker, not outsourcing the thinking.
Rahul @sairahul1
