Jefferson Andres Espejo Goez

1.9K posts

@Dev_Mirror

Full Stack developer. Cat lover :3

Medellín · Joined June 2020

722 Following · 260 Followers

Jefferson Andres Espejo Goez reposted

Gentleman Programming @G_Programming · 1d

Both shipped for one reason: to help juniors and seniors actually use AI without burning out or burning tokens.

gentle-ai: you ask, it resolves. Correct workflows, token optimization and resource best practices, all automated. Agent-agnostic, works with whatever you use.

engram: MCP memory server with a queryable registry. Context survives across agents, and it searches the registry before re-reading code (massive token savings).

Hundreds of hours of open source grind. What made it worth it: messages from people saying these tools saved their job or helped them land one, and companies running both in production.

Built primarily with GPT 5.5 + different models for each SDD phase.

github.com/Gentleman-Prog… github.com/Gentleman-Prog…

Jefferson Andres Espejo Goez reposted

Vaishnavi @_vmlops · 3d

HARNESS ENGINEERING IS ABOUT TO CHANGE HOW YOU USE AI AGENTS

Anthropic ran a controlled experiment. same model, same prompt, opus 4.5

no harness: $9 spent, 20 minutes, unusable output
full harness: $200 spent, 6 hours, a game you could actually play

the model didn't change... the environment around it did

that environment has a name... it's called a harness

and most people building with ai agents have never built one

here's what it actually is:

→ instructions the agent reads before touching anything
→ state that persists so it never starts from zero
→ verification gates it can't skip to declare done
→ scope that locks it to one feature at a time
→ a session lifecycle so every run starts clean and ends clean

without this, your agent writes code, says "done," and breaks everything. with this, it picks up where it left off, finishes what it started, and proves it before moving on

learn-harness-engineering is a free course built around exactly this

12 lectures. 6 hands-on projects. one real app that evolves as your harness skills grow

if you're using claude code or codex on real work and the output still feels unreliable, now you know why

github.com/walkinglabs/le…
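A minimal sketch of three of the harness pieces listed above (persistent state, a verification gate, one-feature scope), assuming a JSON state file and caller-supplied `work` and `verify` callbacks. The function name `run_session`, the state layout and the callbacks are hypothetical illustrations, not taken from the course or from any Anthropic tooling.

```python
import json
from pathlib import Path
from typing import Callable


def run_session(state_path: Path, feature: str,
                work: Callable[[str], None],
                verify: Callable[[str], bool]) -> dict:
    """Run one scoped session: load state, do the work, gate on verification, save."""
    # Persistent state: a new session resumes from what earlier sessions recorded.
    if state_path.exists():
        state = json.loads(state_path.read_text())
    else:
        state = {"done": []}

    if feature in state["done"]:
        return state  # already verified in an earlier session, nothing to redo

    work(feature)  # scope: exactly one feature per session (hypothetical agent call)

    # Verification gate: a feature is recorded as done only if the check passes.
    if verify(feature):
        state["done"].append(feature)

    state_path.write_text(json.dumps(state))  # end clean: persist before exiting
    return state
```

In real use `work` would invoke the agent and `verify` would run the test suite or another check the agent cannot skip; here both are left to the caller.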

Jefferson Andres Espejo Goez reposted

Selva 🌳 @selva_marion · 3d

[🚨 PLEASE HELP ME SPREAD THE WORD: I AM LOOKING FOR A CLINICAL TRIAL ANYWHERE IN THE WORLD 🌍]

My name is Selva Alvarez, I am 35 years old, I live in Colombia and I am a Data Scientist 📊. Today I am writing to you not only as a patient with advanced gastric cancer (Stage IV) 🎗️, but as someone who has just found a unique scientific data point in her own body that could change everything 🧬.

Yesterday, May 21, 2026, my pathology report revealed something extraordinary and very rare worldwide: my tumor tested POSITIVE for a protein called CLAUDIN 18.2 🔬.

To put it simply: Claudin 18.2 is like a very rare "lock" 🔐 that only some tumors in the world have. Medical science has already invented an exact "key" 🔑 for that lock (an advanced drug called Zolbetuximab, or Vyloy), which can attack the diseased cells directly without destroying the rest of the body 🎯.

The problem is that this drug is not commercially available in Colombia 🇨🇴. My oncologist confirmed that my only real chance of getting this "key" is for a pharmaceutical company or a research center to accept me into a national or international clinical trial or into an expanded access program ✈️.

Because my patient profile is so rare and specific worldwide, I know there are laboratories and scientists around the world actively looking for people like me for their precision medicine studies 🩺.

I want the world to know that I am ready 💪: my vital organs (liver and kidneys) are completely healthy, strong and working perfectly, ready to withstand treatment 🫁. I have the youth, the biological strength and the absolute determination to fight for my life and contribute to science 🦾.

I am making a direct appeal to the global scientific community (#OncoTwitter 🩺), to laboratories such as @AstellasUS 🏢, and to researchers in any country leading studies on Claudin 18.2: here is a young patient with the ideal biomarker 📑.

Please help me with an RT (share) 🔁. A single click from you can carry this message across borders to the scientific committee or pharmaceutical company that could sponsor me 💌.

My direct messages (DM) are open so I can immediately send my medical history, marker results, pathology blocks or anything else that is needed 📤.

Colombia 🇨🇴 | Contact: DMs open 📩.

#GastricCancer #Zolbetuximab #ClinicalTrials #PrecisionMedicine #OncoTwitter #Claudina18 #CancerGastrico #EnsayosClinicos #MedicinaDePrecision #Vyloy

Jefferson Andres Espejo Goez reposted

Suryansh Tiwari @Suryanshti777 · 4d

Andrej Karpathy just explained the future of software engineering without directly saying it.

The best AI engineers are no longer “prompting.” They’re building systems around the agents.

Karpathy’s biggest insight wasn’t: “Claude can code.” It was: LLMs become dramatically better when you force them into disciplined workflows.

That’s why "CLAUDE.md" files are suddenly everywhere. Not because they’re prompts. Because they behave like an operating system for the agent.

Karpathy called out the exact problems with AI coding:
- models assume instead of asking
- they overengineer simple tasks
- they hide confusion
- they rewrite unrelated code
- they optimize for completion, not correctness

So developers started encoding rules directly into the workflow:
→ Think before coding
→ Simplicity first
→ Surgical edits only
→ Goal-driven execution

And the results are wild. People are now running multiple Claude Code agents in parallel like engineering teams:
• one agent researching
• one debugging
• one writing tests
• one optimizing code
• one validating outputs

Not “AI assistance.” Actual orchestration.

And this part from Karpathy changes everything: “Don’t tell the model what to do. Give it success criteria and let it loop.”

That is the shift.

From: “write this function”
To: “here’s the goal, constraints, tests, and verification system, now iterate until correct.”

The craziest part? This already feels like a phase shift in engineering. A lot of developers quietly went from 80% manual coding to 80% agent-driven coding in just months. Not because AI became perfect. Because the leverage became impossible to ignore.

We’re entering an era where the highest leverage engineers won’t necessarily be the best coders. They’ll be the people who build the best systems around AI agents.

Suryansh Tiwari @Suryanshti777

x.com/i/article/2053…
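The "give it success criteria and let it loop" idea in the post above can be sketched as a bounded retry loop. This is an illustration under assumptions, not Karpathy's or Claude Code's implementation: `propose` stands in for a call to an agent, `check` stands in for the tests or verification system, and both names are hypothetical.

```python
from typing import Callable, Optional


def loop_until_verified(propose: Callable[[Optional[str]], str],
                        check: Callable[[str], Optional[str]],
                        max_iterations: int = 5) -> str:
    """Request candidates until check() reports no failure, feeding failures back."""
    feedback: Optional[str] = None
    for _ in range(max_iterations):
        candidate = propose(feedback)  # hypothetical agent call; sees the last failure
        feedback = check(candidate)    # None means every success criterion passed
        if feedback is None:
            return candidate
    # The iteration cap keeps the loop from running forever on an unreachable goal.
    raise RuntimeError(f"not verified after {max_iterations} iterations: {feedback}")
```

The caller defines success once, inside `check`, instead of describing each step; the cap on iterations is a design choice so that a goal the agent cannot reach fails visibly.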

Jefferson Andres Espejo Goez reposted

CyrilXBT @cyrilXBT · 5d

🚨 ANTHROPIC JUST KILLED THE DEMO AGENT ERA.

Their Agents team showed exactly what production-grade looks like. Not theory. Not a tutorial. A four-layer framework for multi-agent systems built to actually work in the real world.

30 minutes. This is the video I wish existed 6 months ago.

CyrilXBT @cyrilXBT

x.com/i/article/2056…

Jefferson Andres Espejo Goez reposted

Movez @0xMovez · 5d

Microsoft Senior AI developer just showed how they build AI agents with Claude at Microsoft.

34 minutes. Free. By the Microsoft team.

Opus 4.7 + 1,400+ pre-built MCP tools

plug Claude into agent → give it tools → ship to production

worth more than any $500 vibe-coding course.

Movez @0xMovez

Spotify's Chief Architect just showed how they ship 4.5K deployments/day with Claude, on the Anthropic stage.

27 minutes. Free. By the #1 music app dev.

"More than 99% of our engineers use AI coding tools. Adoption took off after Opus 4.5"

Worth more than any $500 vibe-coding course.

Jefferson Andres Espejo Goez @Dev_Mirror · 6d

@nicos_ai @G_Programming Alan, is this the same problem that Gentle AI solves with its SDD commands? 🙏🏽🙏🏽

Nico @nicos_ai · 18 May

GitHub just solved the biggest problem with vibe coding.

They just released Spec Kit, and within days it already has 95K+ stars.

The idea? Instead of throwing vague prompts and praying the agent doesn't break your project… Spec Kit forces the AI to create a structured specification BEFORE touching code.

The AI first understands what you want to build, asks about what is missing, organizes the project and only then starts coding.

That means less time fixing absurd errors, less inconsistent code and far more predictable results when you work with agents.

The flow is simple:

/constitution → rules and standards
/specify → what you want to build
/clarify → open questions before starting
/plan → architecture and stack
/tasks → ordered tasks
/implement → execution

Compatible with Claude Code, Cursor, Copilot, Codex, Gemini CLI and 25+ agents.

95K stars. 8K forks. Open source. Published by GitHub.

Repository 👇
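The spec-before-code ordering described above can be illustrated with a small phase gate. This is a hypothetical sketch and not how Spec Kit is implemented: the phase names are taken from the post, while the `SpecWorkflow` class and its strict ordering rule are assumptions made for the example.

```python
# Phase names as listed in the post; the strict ordering is an assumption.
PHASES = ["constitution", "specify", "clarify", "plan", "tasks", "implement"]


class SpecWorkflow:
    """Tracks completed phases and refuses to run one out of order."""

    def __init__(self) -> None:
        self.completed: list[str] = []

    def run(self, phase: str) -> None:
        if len(self.completed) == len(PHASES):
            raise ValueError("workflow already complete")
        expected = PHASES[len(self.completed)]
        if phase != expected:
            # Blocks, for example, /implement before a spec and plan exist.
            raise ValueError(f"cannot run /{phase}; next phase is /{expected}")
        self.completed.append(phase)
```

Calling `SpecWorkflow().run("implement")` on a fresh workflow raises, which is the behavior the post describes: no code until the specification steps are done.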

Jefferson Andres Espejo Goez reposted

Ronin @DeRonin_ · 18 May

Andrej Karpathy: "90% of Claude's mistakes come from missing context, not a weak model."

41% mistake rate without a CLAUDE.md. 11% with the 4-rule baseline. 3% with the 12-rule version below

here are the 12 rules senior engineers settled on:

1. think before coding: state assumptions, don't guess. the model can't read your mind, stop hoping it will
2. simplicity first: minimum code, no speculative abstractions. the moment you let Claude add "for future flexibility," you've added 200 lines you'll delete next quarter
3. surgical changes: touch only what you must. don't let it improve adjacent code, that's how PRs blow up
4. goal-driven execution: define success criteria upfront, loop until verified. without them Claude either loops forever or stops too early
5. use the model only for judgment calls: classification, drafting, summarization, extraction. NOT routing, retries, status-code handling, deterministic transforms. if code can answer, code answers
6. token budgets are not advisory: per-task 4000, per-session 30000. by message 40 of a long debug, Claude is re-suggesting fixes you rejected at message 5
7. surface conflicts, don't average them: two patterns in the codebase? pick one. Claude blending them is how errors get swallowed twice
8. read before you write: read exports, callers, shared utilities. Claude will happily add a duplicate function next to an identical one it never read
9. tests verify intent, not just behavior: a test that can't fail when business logic changes is wrong. all 12 of Claude's tests can pass while the function returns a constant
10. checkpoint every significant step: Claude finished steps 5 and 6 on top of a broken state from step 4. nobody noticed for an hour
11. match the codebase conventions: class components? don't fork to hooks silently. testing patterns assumed componentDidMount, hooks broke them without surfacing
12. fail loud: "completed successfully" with 14% of records silently skipped is the worst class of bug. surface uncertainty, don't hide it

what actually compounds instead of the next framework:
- the CLAUDE.md file as institutional memory across sessions
- eval-driven changes, not vibe-driven
- checkpoints over speed
- explicit conflicts over silent blending
- discipline over framework, every time
- one repo, one rules file, no exceptions

be a few rules ahead of AI twitter before this becomes mass-opinion

study this
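Rule 12 ("fail loud") from the post above can be made concrete with a small batch processor that refuses to report success when records were skipped. This is a sketch under assumptions: `process_all`, `SkippedRecordsError` and the `max_skip_ratio` parameter are hypothetical names chosen for the example, and treating `ValueError` as the one tolerated failure type is an arbitrary choice.

```python
from typing import Callable, Iterable


class SkippedRecordsError(Exception):
    """Raised when more records were skipped than the caller allows."""


def process_all(records: Iterable[dict], handle: Callable[[dict], None],
                max_skip_ratio: float = 0.0) -> int:
    """Process every record; raise instead of returning if too many were skipped."""
    total = 0
    skipped = []
    for record in records:
        total += 1
        try:
            handle(record)
        except ValueError as exc:  # only the expected failure type is tolerated
            skipped.append((record, str(exc)))
    if total and len(skipped) / total > max_skip_ratio:
        raise SkippedRecordsError(
            f"{len(skipped)} of {total} records skipped; first: {skipped[0]}"
        )
    return total - len(skipped)  # number of records actually processed
```

With the default `max_skip_ratio=0.0`, a single skipped record turns the run into an error, so a caller has to opt in explicitly to tolerating any loss.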

Ronin @DeRonin_

anybody who uses or learns agentic systems, SHOULD READ THIS

the install order I run before any new agentic project:

1. PRIVACY: direnv + a real secrets manager

install direnv, then plug it into your team's password manager (1Password CLI via op run, doppler, infisical, vault, pick one)

what direnv does: loads per-folder environment variables when you cd in, unloads when you cd out. the real move is wiring it into your secrets manager so credentials NEVER live in plain text on disk

what this stops:
- API keys accidentally committed to git history, the most common AI agent breach pattern in 2026
- credentials leaking from one project into another through your shell history
- shared .env files that one teammate quietly backs up to Dropbox
- secrets that survive a laptop theft because they were sitting in /Users/you/projects

the part nobody mentions: most "my agent got jailbroken" stories actually trace back to one credential the agent had access to that it shouldn't have. scope keys to projects, scope projects to folders, and the blast radius of any single compromise drops dramatically

I shipped 2 agents with keys in .env files before switching. the day I plugged direnv into op run I stopped having that whole class of nightmare

2. TOKENS: litellm or portkey as your model proxy

one URL that fronts every AI provider (Anthropic, OpenAI, Google, Mistral, local models). all your spend flows through one place

what it saves you:
- response caching keyed by prompt hash, cuts your bill 30-60% on repeat tasks
- automatic fallback on rate limits (Sonnet hits a 429? falls to Opus, then GPT, then your local backup, no broken users)
- per-feature and per-user budget caps, block the call before it costs $200 instead of auditing it after
- model routing rules, cheap tasks to Haiku, expensive ones to Opus, never the wrong way
- PII redaction before requests leave your network, security side benefit

the part nobody mentions: every "$4k AI bill" story I've heard ends with "we didn't have a proxy in front." this is where you put guardrails around spend BEFORE the spend happens

I built my own router for 2 weeks. it took 20 minutes to replace with litellm. I will be embarrassed about this forever

3. CONTEXT: uv + git commit on every passing eval

install uv (the new Python package manager, 10-100x faster than pip+venv, by the Astral team behind ruff). then commit every time an eval suite PASSES, with the model version and pass rate in the commit message

what this preserves:
- exact dependency set via uv.lock, you always know which packages your agent was using, no nasty surprises from a quiet update
- exact prompt + code state, you can reproduce any past run from a single git hash
- exact model version paired to exact pass rate, a paper trail when prod breaks weeks later
- one-command rollback to a known-working state when a refactor goes sideways
- a compliance story, every prompt version tied to a model version in your commit log

the security side: when something blows up in prod, you want to say "the prompt was version X, model was Sonnet 4.6.1, last eval pass rate was 94%." not "I think we deployed on Tuesday?" the first is an incident report. the second is a resignation letter

I've lost more agents to "I changed 3 prompts in one session and broke something" than to any actual bug

4. VISIBILITY: mitmproxy in front of every LLM call

it's basically a wiretap for your agent. install it, point your agent through it, and now you see every conversation your agent has with the model in real time

what actually shows up:
- every silent retry your SDK sneaks in when a call fails
- the full prompt being sent (including any creds you accidentally embedded)
- what the model returns BEFORE your code reacts to it
- exact token cost per call, per tool, per loop iteration
- responses that quietly trigger your code into doing something you didn't intend, this is where prompt injection lives

the part nobody talks about: if a website your agent scraped slipped instructions into its data, mitmproxy is how you SEE the moment your agent decides to follow them. without this layer, you're trusting your agent did the right thing, not verifying

I shipped 3 agents before adding this. I have no honest idea what they were doing in production

5. EVALS: inspect-ai (the framework the labs actually use)

an eval framework is what tells you "this agent works" with numbers instead of vibes. inspect-ai is the one Anthropic, DeepMind, and the UK AI Safety Institute use for the eval reports you read in their papers. open source, MIT licensed

what your homegrown version won't have:
- run the same task across 5 different models and compare scores side by side
- pre-built tests for risky agent behavior (lying, manipulating, misusing tools)
- proper structure for evaluating tool-using agents, not just chat
- repeatable scoring, the same input always gets graded the same way
- reproducible eval seeds, so a flaky test is actually flaky and not just unlucky

I wrote my own eval harness 4 times across 4 projects. threw it out 4 times

if you ever want to say "my agent passes safety checks" out loud, the check has to come from a framework someone else can re-run. this is that framework

the move that ties this together: keep a /lessons.md in every repo. every weird agent behavior, every edge case, every config change you find at 2am, write it down

you will not remember it. you'll come back in 3 weeks and the lessons file is the only reason you still know what's going on

lock these 5, keep the lessons file, your next agentic system takes 2 days instead of 2 months

p.s. half of "AI agent" content online is people who've never run mitmproxy on their own loop. they don't actually know what their agent is doing. they're shipping demo videos. don't be that guy
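The "response caching keyed by prompt hash" item in step 2 above can be sketched in a few lines. This is a generic in-memory illustration and does not use or reproduce the litellm or portkey APIs: the `PromptCache` class and the `call_model` callback are hypothetical, and a real proxy would also handle expiry, streaming and non-deterministic sampling settings.

```python
import hashlib
from typing import Callable


class PromptCache:
    """In-memory response cache keyed by a SHA-256 hash of (model, prompt)."""

    def __init__(self, call_model: Callable[[str, str], str]) -> None:
        self._call_model = call_model  # hypothetical function that hits the provider
        self._store: dict[str, str] = {}
        self.hits = 0

    def complete(self, model: str, prompt: str) -> str:
        # Hash the model name together with the prompt so that the same prompt
        # sent to two different models is never served from the same entry.
        key = hashlib.sha256(f"{model}\x00{prompt}".encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
            return self._store[key]
        response = self._call_model(model, prompt)
        self._store[key] = response
        return response
```

A repeated `(model, prompt)` pair is answered from the dictionary without a second provider call, which is where the savings on repeat tasks would come from.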

Jefferson Andres Espejo Goez reposted

Devin Jameson @devinjameson · 6d

@aidenybai You need to solve UI architecture before you can solve UI testing. Foldkit Scene tests are pure, fast, and test what you actually care about. No browser, no JSDOM, no mocks. foldkit.dev/testing/scene

Jefferson Andres Espejo Goez reposted

Anubhav @Anubhavhing · 18 May

A guy ran Karpathy's CLAUDE.md across 30 codebases for 6 weeks. 🚨

Claude's mistake rate went from 41% → 11%

Then he added 8 more rules and got it to 3%

The rules are actually fire (even if the numbers are sus)

Jefferson Andres Espejo Goez reposted

Andrej Karpathy @karpathy · 6d

Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.

Jefferson Andres Espejo Goez reposted

Suryansh Tiwari @Suryanshti777 · 18 May

Claude Code feels completely different once you install this.

Anthropic quietly released an official plugin called claude-code-setup and it basically turns Claude Code from “pretty good” into an actual AI dev environment.

It scans your project and recommends:
→ hooks
→ skills
→ MCP servers
→ subagents
→ automations

Then sets everything up step-by-step for you.

Most people are using Claude Code completely vanilla… which is why their experience feels messy. The real power comes from the ecosystem around it.

Install: /plugin install claude-code-setup@claude-plugins-official

Bookmark this before you forget it.

Nainsi Dwivedi @NainsiDwiv50980

x.com/i/article/2051…

Jefferson Andres Espejo Goez reposted

Vaishnavi @_vmlops · 18 May

Harness Engineering: A Design Guide to Claude Code

drive.google.com/file/d/1mIuy2k…

Jefferson Andres Espejo Goez reposted

Vaishnavi @_vmlops · 17 May

This is the best site on the internet to learn harness engineering

walkinglabs.github.io/learn-harness-…

Jefferson Andres Espejo Goez reposted

Matt Pocock @mattpocockuk · 18 May

Is anyone doing feature flag development with agents?

Not tried it, but in theory feature flagging is an alternative model to PRs for getting work onto main.

1. Put it on main, disabled by a flag
2. Deploy with the rest of the system
3. Unflag to selected users early
4. Fix bugs for those users
5. Unflag to more users
6. Repeat until shipped

Feels like a perfect strategy to pair with agents
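A minimal sketch of the gradual "unflag to more users" steps above, assuming a percentage-based rollout in which each user is hashed into a stable bucket. The function name `flag_enabled` and the bucketing scheme are assumptions for illustration; the post does not describe a specific implementation, and real feature-flag services offer richer targeting.

```python
import hashlib


def flag_enabled(flag: str, user_id: str, rollout_percent: int) -> bool:
    """Deterministically place user_id in a 0-99 bucket and compare to the rollout."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    # Users with a bucket below the threshold see the feature. Raising the
    # threshold only adds users; nobody who had the feature loses it.
    return bucket < rollout_percent
```

Code merged to main stays dark at `rollout_percent=0`, steps 3 and 5 correspond to raising the number, and shipping is `rollout_percent=100`. Because the bucket depends only on the flag name and the user id, the same user gets the same answer on every request.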

Jefferson Andres Espejo Goez reposted

Lydia Hallie ✨ @lydiahallie · 18 May

💯 this is why I really like Learning mode in Claude Code

I personally use this for all my side projects and it keeps me so much sharper, great if you want to use Claude Code but still stay hands-on!

/config → Output style → Learning

Addy Osmani @addyosmani

x.com/i/article/2055…

Jefferson Andres Espejo Goez reposted

Thariq @trq212 · 18 May

a prompt I've been using a lot recently:

implement <SPEC> and while you do, keep a running implementation-notes.html file (or markdown) with decisions you had to make that weren't in the spec, things you had to change, tradeoffs you had to make or anything else I should know

Jefferson Andres Espejo Goez reposted

Charlie L ⚡️ @charliesbot · 18 May

I loved this article by @addyosmani: "don't delegate your learning to AI"

It covers several studies, including some run by Anthropic, that look at devs who use AI only to generate code without analyzing or reasoning about it. In these studies, only 40% of those individuals understood the code and the result

But in the group that used it more to clarify concepts while keeping ownership and understanding of the task, more than 65% understood what they had done

The article also cites more studies that land on similar conclusions

addyosmani.com/blog/dont-outs…

Jefferson Andres Espejo Goez reposted

santi @santtiagom_ · 18 May

Anthropic published an article on how they use Claude Code in giant codebases. Repos with millions of lines, multiple services and huge teams.

The article shows several patterns that recur across companies using Claude Code:

1) CLAUDE.md -> to give it context and instructions for each part of the repo
2) Skills -> to load expertise only when it is needed
3) Hooks -> to automate tasks and improve the system over time
4) MCPs -> to connect Claude to internal tools and APIs
5) LSP integrations -> to find functions, definitions and references more precisely
6) Subagents -> to explore parts of the system in separate contexts

They also explain how Claude Code navigates code. It does not depend on a giant index of the repo or on precomputed embeddings. It works more like a developer: it explores files, searches for references, runs commands, analyzes results and builds context as it works.

And they stress something important: more context does not always improve results. That is why they talk a lot about progressive disclosure, loading only the context and expertise needed for each task.

Much of the work comes down to designing the harness well and how you organize context, exploration, execution and coordination between agents.

ClaudeDevs @ClaudeDevs

What are best practices for running Claude Code at scale? New blog post on what we've learned from teams running it across multi-million-line monorepos, decades-old legacy systems, and distributed microservices: claude.com/blog/how-claud…

Jefferson Andres Espejo Goez reposted

Gentleman Programming @G_Programming · 14 May

Triple release: gentle-ai, Engram v1.15.12, gentle-pi 0.2.4 → 0.2.6.

gentle-ai
- @andresnator (#440): changing effort per model required editing opencode.json → now low/medium/high/xhigh per phase from the SDD picker.
- @adelosrc (#512): Pi 0.74.x broke the MCP server spawn → gentle-engram 0.1.4 with a new schema (command: node + inline launcher).
- @mc-luisg (#520): bilingual examples biased responses toward Spanish → removed + regression coverage.
- @ManuelRomeroA (#365): echo -n "$(pwd)" broke /sdd-* in Claude Code v2.1.113+ → replaced.
- @Daniel20FN (#522): the OpenClaw verifier pointed to a nonexistent path → now the canonical ~/.openclaw/openclaw.json.

Engram v1.15.12
- @yvolchkov, @ardelperal: multi-step Pi setup → engram setup pi in one command.
- @ricardoarz-dev, @aleka, @samuelcooke-cpu: long-lived agents with no fixed project → ENGRAM_PROJECT and process-level --project.
- @alexandervazquez98, @quirozino: data marked as synced without confirmation → we now validate accepted_seqs before acking.
- @deliriumlabs: broken Claude/OpenCode save nudges → compatibility endpoints restored.
- @AshrafAKRahman: mem_save lost content → strict validation + backward-compatible alias.
- @parraletz, @IrrealV, @forNerzul: relation/upsert did not sync → now accepted and retained.
- @gabrielizalo: OpenCode TUI failure → recovery documented.

gentle-pi
- 0.2.4: the rose animation blocked input → now releases it before finishing.
- 0.2.5: old registries overwrote project skills → blocked.
- 0.2.6: the skill registry only saw Compact Rules → now extracts from Hard Rules, Critical Rules, Critical Patterns, Voice Rules, Decision Gates and adds .opencode/skills.

Upgrade:
- Pi 0.74.x with MCP 0/1: gentle-ai install --agent pi
- Claude Code v2.1.113+ with broken /sdd-*: re-sync
- Pi + Engram: engram setup pi

Thanks to the community. Every handle reported an issue or opened a PR.

github.com/Gentleman-Prog… github.com/Gentleman-Prog…
