Lesmes

2.6K posts

@lesmeslp

Joined April 2009
1K Following · 322 Followers
Lesmes retweeted
Theodore Galanos @TheodoreGalanos
This intuition has been on my mind ever since I started working on evals, after a lot of work on real-world problem harness engineering. It's plain to see, just not as plain in hindsight. The mental map is:
A benchmark is a frozen view of a solvable world.
An RL environment is the same world made interactive.
A harness is the runtime that lets agents act in it.
A recipe is a compressed solution trajectory.
A data generator is the world sampled at scale.
George @odysseus0z

Told you guys! It is all eval/rubrics now.

Lesmes retweeted
Antonio Leiva @antonioleivag
Harness Engineering! The new buzzword. I have to admit this topic caught me napping, so I'm gathering information to understand it better. And since sharing is caring, here is a list of the best resources I've been finding. At the end there is a NotebookLM that does the work for you. Here is the video it made for me, and all the links are below 👇👇
Lesmes retweeted
Notion Developers @NotionDevs
Plus: Observability, OAuth, and rate limiting! Auth, permissions, sandboxing, rate-limiting, and cross-system access are built into the platform. You get to focus on what to build, not how to set it up.
Lesmes retweeted
AI Notkilleveryoneism Memes ⏸️
🚩🚩🚩"This is the first documented instance of AI self-replication via hacking." "We ran an experiment with a single prompt: hack a machine and copy yourself. The AI broke in and copied itself onto a new computer. The copy then did this again, and kept on copying, starting a chain."
Palisade Research @PalisadeAI

Over the past year, AI agents have learned how to self-replicate. In our test environment, an agent hacks a remote computer and copies itself onto it. Each copy then hacks more computers, forming a chain.

Lesmes retweeted
Linda Vivah (Haviv) @lindavivah
Context vs Memory vs Harness engineering explained in 40 seconds by @richmondalake ⚡️
💡 These 3 disciplines are core to building agents that can actually remember and reason over time.
Most agents work well within a single session but lose everything the moment it ends. Memory engineering treats long-term memory as first-class infrastructure.
Richmond Alake (Director of AI DevEx at Oracle) walked me through all 3 in NYC 🗽
Kalshi Film @Kalshi_Film
A terrific foreground miniature by Spanish effects maestro Emilio Ruiz del Río for Conan the Destroyer (1984)
Lesmes retweeted
Sam Hogan 🇺🇸 @samhogan
We’re introducing HALO 😇 Hierarchal Agent Loop Optimizer
HALO is an RLM-based agent optimization technique capable of recursively self-improving agents by analyzing their execution traces and suggesting changes. This work is inspired by the Mismanaged Genius Hypothesis proposed by @a1zhang and @lateinteraction earlier this month.
tl;dr: we improved performance on AppWorld (Sonnet 4.6) from 73.7 to 89.5 (+15.8) by giving HALO-RLM access to harness trace data and asking it to identify issues.
The feedback from HALO surfaced failures in the harness such as hallucinated tool calls, redundant arguments in tools, refusal loops, and semantic correctness issues. Each issue mapped cleanly to a direct prompt update.
We then fed these findings into Cursor (Opus 4.6) and asked the coding agent to update the underlying harness. We repeated this trace -> HALO-RLM analysis -> code update loop until the score plateaued.
Today we’re open-sourcing the core HALO-RLM framework, evals, and data for further review.
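The trace -> analysis -> code update loop described in this post can be sketched roughly as below. This is a minimal Python sketch under stated assumptions, not the HALO-RLM implementation: `optimize_harness`, `run_eval`, `analyze`, `apply_fixes`, `min_gain`, `max_rounds`, and the toy stand-ins are all hypothetical names made up for illustration, and the toy scores are invented rather than taken from AppWorld.

```python
from typing import Callable, List, Tuple

Harness = List[str]  # hypothetical: a harness modelled as a list of prompt fixes


def optimize_harness(
    harness: Harness,
    run_eval: Callable[[Harness], Tuple[float, List[str]]],
    analyze: Callable[[List[str]], List[str]],
    apply_fixes: Callable[[Harness, List[str]], Harness],
    min_gain: float = 0.1,
    max_rounds: int = 10,
) -> Tuple[Harness, float]:
    """Repeat trace -> analysis -> update until the score stops improving."""
    best_score, traces = run_eval(harness)
    for _ in range(max_rounds):
        issues = analyze(traces)          # the analysis step (an RLM in the post)
        if not issues:
            break                         # nothing left to fix
        candidate = apply_fixes(harness, issues)  # a coding agent in the post
        score, new_traces = run_eval(candidate)
        if score - best_score < min_gain:
            break                         # plateau: keep the previous harness
        harness, best_score, traces = candidate, score, new_traces
    return harness, best_score


# Toy stand-ins so the sketch runs end to end (assumptions, not real evals).
KNOWN_ISSUES = ["hallucinated tool calls", "redundant arguments", "refusal loops"]


def toy_eval(harness: Harness) -> Tuple[float, List[str]]:
    remaining = [issue for issue in KNOWN_ISSUES if issue not in harness]
    return 100.0 - 10.0 * len(remaining), remaining  # "traces" = open issues


def toy_analyze(traces: List[str]) -> List[str]:
    return traces[:1]  # surface one issue per round


def toy_apply(harness: Harness, issues: List[str]) -> Harness:
    return harness + issues


final_harness, final_score = optimize_harness([], toy_eval, toy_analyze, toy_apply)
print(final_harness, final_score)
```

The stopping rule here is a fixed minimum gain per round; the post only says the loop ran until the score plateaued, so the exact criterion is an assumption.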
Lesmes retweeted
Joaquín Peña Fernández
Killing features that are low value or early stage from time to time is a healthy habit that you should have in mind.
Lesmes retweeted
Alan @bitforth
I've been feeding paired data (Neural Predictions + Real Retention) into a dynamic translation layer. The loop is finally closing.
Performance of the latest clip vs. Neural Prediction:
- Predicted AVD: 6.0s | Actual: 6.3s
- Predicted completion: 80% | Actual: 82%
- Predicted drop-off: 6s | Actual: 5.7s
When you can anticipate exactly when someone's brain is going to disconnect, "editing" becomes surgical.
If you're a content creator, at @tortastudios we're building a suite of tools just for you, and I'd like to talk with you. DMs are open if you want to know more.
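As a rough check on how close the predictions in this post are, the gap between each predicted and actual figure can be expressed as a relative error. This is a small Python sketch: the three number pairs are copied from the post, while `metrics`, `relative_error`, and the choice of relative error as the measure are assumptions made here for illustration.

```python
# (predicted, actual) pairs copied from the post above
metrics = {
    "AVD (s)": (6.0, 6.3),
    "Completion (%)": (80.0, 82.0),
    "Drop-off (s)": (6.0, 5.7),
}


def relative_error(predicted: float, actual: float) -> float:
    """Absolute gap between prediction and outcome, as a fraction of the outcome."""
    return abs(predicted - actual) / abs(actual)


errors = {name: relative_error(p, a) for name, (p, a) in metrics.items()}
for name, err in errors.items():
    print(f"{name}: {err:.1%}")
```

On these three pairs every relative error comes out below 6%.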
Lesmes retweeted
Ivan Burazin @ivanburazin
The CTO of a $1.5B agent-native software company says we're moving towards a world where the entire team builds the product engineering system that powers your product instead of directly working on the product. They will modify the agents and the constitution of the agents to help them prioritize tasks per @EnoReyes. They will also modify the review system so that it reviews more accurately. It will be akin to a dark factory where the lights are off because it's all robots with no humans inside. In that world, there will be way more things for humans to be involved in because the debates around prioritization/product will be harder and require stronger judgment.
Lesmes retweeted
Zain Shah @zan2434
Imagine every pixel on your screen, streamed live directly from a model. No HTML, no layout engine, no code. Just exactly what you want to see. @eddiejiao_obj, @drewocarr and I built a prototype to see how this could actually work, and set out to make it real. We're calling it Flipbook. (1/5)
Lesmes retweeted
Alan @bitforth
Every enterprise AI failure I've personally witnessed comes down to the same root cause: speed won the argument over trust.
Some people often paint "responsible AI" as a direct opposite of "delivery speed".
What actually causes the slowdowns is governance debt.
The real enemy is ungoverned acceleration, not just "oversight".
Lesmes retweeted
Codurance Spain @codurance_ES
Next Wednesday, the 22nd, a great Claude session at our offices. hubs.la/Q04cjf0k0
Lesmes retweeted
simp 4 satoshi @iamgingertrash
For every marginal unit of information ingested
You lose a marginal unit of agency
Simply because your mind can either receive free information
Or act upon the world to gather new information
Every minute you spend on this site, you lose capacity to act upon the world