Lesmes

2.6K posts

@lesmeslp

Joined April 2009

1K Following · 322 Followers

Lesmes retweeted

Theodore Galanos@TheodoreGalanos·4d

This intuition has been on my mind ever since I started working on evals, after a lot of work on real-world problem harness engineering. It's plain to see, just not as plain in hindsight. The mental map is:
A benchmark is a frozen view of a solvable world.
An RL environment is the same world made interactive.
A harness is the runtime that lets agents act in it.
A recipe is a compressed solution trajectory.
A data generator is the world sampled at scale.

George@odysseus0z

Told you guys! It is all eval/rubrics now.

4.6K
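The mental map in the post above (benchmark, RL environment, harness, recipe, data generator) can be illustrated with a toy "world" of single-digit additions. This is only a sketch under stated assumptions: every name here (`Task`, `Environment`, `run_harness`, `adding_agent`, `generate_tasks`) is hypothetical and made up for illustration, and treating a "recipe" as the list of successful observation/action pairs is one loose reading of "compressed solution trajectory", not something the post specifies.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """One solvable problem in the toy world: add two digits."""
    a: int
    b: int

    @property
    def answer(self) -> int:
        return self.a + self.b


def generate_tasks(n: int, seed: int) -> list:
    """Data generator: the world sampled at scale (seeded, so reproducible)."""
    rng = random.Random(seed)
    return [Task(rng.randint(0, 9), rng.randint(0, 9)) for _ in range(n)]


# Benchmark: a frozen view of the world (a fixed tuple of tasks).
BENCHMARK = tuple(generate_tasks(5, seed=0))


class Environment:
    """RL environment: the same world, made interactive via reset/step."""

    def __init__(self, task: Task):
        self.task = task

    def reset(self) -> str:
        return f"{self.task.a} + {self.task.b} = ?"

    def step(self, action: int):
        reward = 1.0 if action == self.task.answer else 0.0
        return reward, True  # single-step episode: always done


def run_harness(agent, tasks):
    """Harness: the runtime that lets an agent act in the environment."""
    trajectories = []
    for task in tasks:
        env = Environment(task)
        observation = env.reset()
        action = agent(observation)
        reward, _done = env.step(action)
        trajectories.append((observation, action, reward))
    mean_reward = sum(r for _, _, r in trajectories) / len(trajectories)
    return mean_reward, trajectories


def adding_agent(observation: str) -> int:
    """A trivial agent that parses 'a + b = ?' and adds the operands."""
    left, right = observation.split(" = ")[0].split(" + ")
    return int(left) + int(right)


score, trajectories = run_harness(adding_agent, BENCHMARK)

# Recipe: here, simply the successful (observation, action) pairs.
recipe = [(obs, act) for obs, act, reward in trajectories if reward == 1.0]
```

The same `Task` objects back both the frozen benchmark and the interactive environment, which is the point of the analogy: the pieces differ in how the world is exposed, not in what the world is.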

Lesmes retweeted

Antonio Leiva@antonioleivag·17h

Harness Engineering! The new buzzword. I have to admit this topic caught me off guard, so I'm gathering information to understand it better. And since sharing is caring, here is a list of the best resources I've been finding. At the end there is a NotebookLM that does the work for you. Here is the video it created for me, and I'm putting all the links below 👇👇

308

62.3K

Lesmes@lesmeslp·3d

@miquelserranoRV Congratulations, Miquel!

Miquel Serrano@miquelserranoRV·3d

Today is my sacred birthday. But I'm celebrating it on Saturday on Twitch, Kick and YouTube, with a hundred-odd worth of alcohol in my veins. See you there 🎉🎉 kick.com/miquelserrano twitch.tv/miquelserrano youtube.com/@miquelserrano

Tübingen, Deutschland 🇩🇪 Español

295

Lesmes@lesmeslp·4d

Curious that, of all the examples that could have been given, this is the one chosen...

emma@emguoz

i used to always forget to order lunch.. now my notion agent orders lunch for me! every friday, it orders food for the next week. it looks at my past orders to understand what i like, and it auto updates memories so it remembers if i actually kept its order for me

Lesmes retweeted

Notion Developers@NotionDevs·4d

Plus: Observability, OAuth, and rate limiting! Auth, permissions, sandboxing, rate-limiting, and cross-system access are built into the platform. You get to focus on what to build, not how to set it up.

12.1K

Lesmes retweeted

AI Notkilleveryoneism Memes ⏸️@AISafetyMemes·8 May

🚩🚩🚩"This is the first documented instance of AI self-replication via hacking." "We ran an experiment with a single prompt: hack a machine and copy yourself. The AI broke in and copied itself onto a new computer. The copy then did this again, and kept on copying, starting a chain."

Palisade Research@PalisadeAI

Over the past year, AI agents have learned how to self-replicate. In our test environment, an agent hacks a remote computer and copies itself onto it. Each copy then hacks more computers, forming a chain.

138

1.1K

103.6K

Lesmes@lesmeslp·8 May

@scaling01 x.com/i/status/20524…

Matthew Prince 🌥@eastdakota

An update regarding the future at @Cloudflare. I’ve shared my full message to the team and details on the support we're providing those departing here: blog.cloudflare.com/building-for-t…

3.9K

Lisan al Gaib@scaling01·8 May

Is Claude Mythos available via API? what happened

TrendSpider@TrendSpider

GOOD LORD 🩸 🔴 $NET -17.13%

1.3K

289K

Lesmes retweeted

Linda Vivah (Haviv)@lindavivah·1 May

Context vs Memory vs Harness engineering explained in 40 seconds by @richmondalake ⚡️ 💡These 3 disciplines are core to building agents that can actually remember and reason over time Most agents work well within a single session but lose everything the moment it ends. Memory engineering treats long-term memory as first-class infrastructure. Richmond Alake (Director of AI DevEx at Oracle) walked me through all 3 in NYC🗽

214

9.5K

Lesmes@lesmeslp·1 May

@Kalshi_Culture @Ivan24325830 In case anybody wants to watch his documentary youtu.be/gn_YxNngOvc?is…

YouTube

8.3K

Kalshi Film@Kalshi_Film·1 May

A terrific foreground miniature by Spanish effects maestro Emilio Ruiz del Río for Conan the Destroyer (1984)

305

3.2K

129.4K

Lesmes retweeted

Sam Hogan 🇺🇸@samhogan·30 Apr

We’re introducing HALO 😇
Hierarchal Agent Loop Optimizer
HALO is an RLM-based agent optimization technique capable of recursively self-improving agents by analyzing their execution traces and suggesting changes.
This work is inspired by the Mismanaged Genius Hypothesis proposed by @a1zhang and @lateinteraction earlier this month.
tldr; we improved performance on AppWorld (Sonnet 4.6) from 73.7 --> 89.5 (+15.8) by giving HALO-RLM access to harness trace data and asking it to identify issues.
The feedback from HALO surfaced failures in the harness such as hallucinated tool calls, redundant arguments in tools, refusal loops, and semantic correctness issues. Each issue mapped cleanly to a direct prompt update.
We then fed these findings into Cursor (Opus 4.6) and asked the coding agent to update the underlying harness. We repeated this trace -> HALO-RLM analysis -> code update loop until the score plateaued.
Today we’re open-sourcing the core HALO-RLM framework, evals, and data for further review.

124

1.4K

127.4K
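The loop described in the HALO post above (trace -> analysis -> code update, repeated until the score plateaus) might be sketched roughly as follows. This is not the HALO-RLM framework or its API: `optimize_until_plateau`, `run_eval`, `analyze_traces`, `apply_fixes`, `min_gain` and `max_rounds` are all hypothetical names, the toy functions stand in for a real evaluation run, a trace-analysis model and a coding agent, and the toy scores are invented rather than the AppWorld numbers.

```python
def optimize_until_plateau(harness, run_eval, analyze_traces, apply_fixes,
                           min_gain=0.5, max_rounds=10):
    """Repeat eval -> trace analysis -> harness update until gains flatten.

    run_eval(harness)          -> (score, traces)
    analyze_traces(traces)     -> list of findings (empty if none)
    apply_fixes(harness, f)    -> a new, updated harness
    """
    score, traces = run_eval(harness)
    history = [score]
    for _ in range(max_rounds):
        findings = analyze_traces(traces)
        if not findings:
            break  # nothing left to fix
        candidate = apply_fixes(harness, findings)
        new_score, new_traces = run_eval(candidate)
        if new_score - score < min_gain:
            break  # plateau: keep the previous harness
        harness, score, traces = candidate, new_score, new_traces
        history.append(score)
    return harness, history


# --- Toy stand-ins (assumptions, for illustration only) ---

def toy_eval(harness):
    # Each unresolved issue costs 5 points and leaves one trace behind.
    issues = sorted(harness["issues"])
    return 100.0 - 5.0 * len(issues), [f"trace: {name}" for name in issues]


def toy_analyze(traces):
    # Pretend the analyzer surfaces one issue per round.
    return [t.split(": ", 1)[1] for t in traces[:1]]


def toy_fix(harness, findings):
    # Pretend the coding agent resolves every reported finding.
    return {"issues": harness["issues"] - set(findings)}


start = {"issues": {"hallucinated tool calls", "redundant arguments", "refusal loops"}}
final, history = optimize_until_plateau(start, toy_eval, toy_analyze, toy_fix)
print(history)
```

Stopping on a minimum gain rather than a fixed round count is one simple way to express "until the score plateaued"; the post does not say which stopping rule was actually used.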

Lesmes retweeted

Joaquín Peña Fernández@joa_pen·28 Apr

Killing features that are low value or early stage from time to time is a healthy habit that you should have in mind.

289

Lesmes retweeted

YounesIO@YounesAka·27 Apr

@github OMG

1.3K

203.2K

Lesmes retweeted

Alan@bitforth·26 Apr

I've been feeding paired data (Neural Predictions + Actual Retention) into a dynamic translation layer. The loop is finally closing. Latest clip performance vs. Neural Prediction:
- Predicted AVD: 6.0s | Actual: 6.3s
- Predicted completion: 80% | Actual: 82%
- Predicted drop-off: 6s | Actual: 5.7s
When you can anticipate exactly when someone's brain is going to tune out, "editing" becomes surgical. If you're a content creator, at @tortastudios we're building a suite of tools just for you, and I'd like to talk with you. DMs open if you want to know more.

292
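To put the predicted-versus-actual figures in the post above on a common scale, one option is the relative error of each prediction. The sketch below uses only the three number pairs from the post; reading "AVD" as average view duration is an assumption, and the metric labels and the `relative_error` helper are hypothetical names chosen for illustration.

```python
# (predicted, actual) pairs taken from the post; labels are our own reading.
pairs = {
    "average view duration (s)": (6.0, 6.3),
    "completion rate (%)": (80.0, 82.0),
    "drop-off time (s)": (6.0, 5.7),
}


def relative_error(predicted: float, actual: float) -> float:
    """Absolute error as a fraction of the actual value."""
    return abs(predicted - actual) / abs(actual)


errors = {name: relative_error(p, a) for name, (p, a) in pairs.items()}
mean_error = sum(errors.values()) / len(errors)

for name, err in errors.items():
    print(f"{name}: {err:.1%}")
print(f"mean relative error: {mean_error:.1%}")
```

On these three pairs the relative errors come out at roughly 4.8%, 2.4% and 5.3%, for a mean of about 4.2%. Three data points from a single clip say little about how the predictor behaves in general.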

Lesmes retweeted

Ivan Burazin@ivanburazin·22 Apr

The CTO of a $1.5B agent-native software company says we're moving towards a world where the entire team builds the product engineering system that powers your product instead of directly working on the product. They will modify the agents and the constitution of the agents to help them prioritize tasks per @EnoReyes. They will also modify the review system so that it reviews more accurately. It will be akin to a dark factory where the lights are off because it's all robots with no humans inside. In that world, there will be way more things for humans to be involved in because the debates around prioritization/product will be harder and require stronger judgment.

353

64.3K

Lesmes retweeted

Zain Shah@zan2434·22 Apr

Imagine every pixel on your screen, streamed live directly from a model. No HTML, no layout engine, no code. Just exactly what you want to see. @eddiejiao_obj, @drewocarr and I built a prototype to see how this could actually work, and set out to make it real. We're calling it Flipbook. (1/5)

1.1K

3.6K

28.2K

5.8M

Lesmes retweeted

Alan@bitforth·22 Apr

Every enterprise AI failure I've personally witnessed comes down to the same root cause: speed won the argument over trust
Some people often paint "responsible AI" as a direct opposite of "delivery speed"
What actually causes the slowdowns is governance debt.
The real enemy is ungoverned acceleration, not just "oversight"

356

Lesmes retweeted

dex@dexhorthy·20 Apr

we have 4 skills in our monorepo dedicated to uncle bob. this guys mf gets it

Uncle Bob Martin@unclebobmartin

Morning Bathrobe Rant: AI out-codes you; deal with it.

419

50.1K

Lesmes retweeted

Codurance Spain@codurance_ES·19 Apr

Next Wednesday, the 22nd, a great Claude session at our offices. hubs.la/Q04cjf0k0

Lesmes retweeted

simp 4 satoshi@iamgingertrash·16 Apr

For every marginal unit of information ingested
You lose a marginal unit of agency
Simply because your mind can either receive free information
Or act upon the world to gather new information
Every minute you spend on this site, you lose capacity to act upon the world

962

22.3K

Lesmes retweeted

Marabesi 💻🚀@MatheusMarabesi·16 Apr

@AndrewYNg @jetbrains @paulweveritt x.com/MatheusMarabes…

Marabesi 💻🚀@MatheusMarabesi

refs: cse.yorku.ca/~jonathan/publ…
