Carles Navarro

1.5K posts

@11krls

Chemistry & Physics | Machine Learning at @acellera

L'Ampolla, Spain · Joined October 2013
538 Following · 242 Followers
Pinned Tweet
Carles Navarro@11krls·
There is only one basic human right, the right to do as you damn well please. And with it comes the only basic human duty, the duty to take the consequences.
Carles Navarro@11krls·
Guys, all the important stuff (for us) in Claude Code was mostly public; the system prompts and hooks were all leaked. All the rest is just an envelope to put an agent into production and make it work on our computers / permissions / observability.
Cristian Córdova 🐧
👀 I've been digging through all the alleged Claude Code source to better understand how it works under the hood and squeeze more out of it as a user. Some interesting things I'm seeing, several of which I didn't know:
- The system prompt has some curious rules, e.g. it must not use emojis unless asked, it has a cyber-risk rule, and it doesn't give time estimates (in that last respect it resembles any dev xD)
- It has 45 tools, and for some of them it is explicitly told to use them instead of the system's CLI tools (Bash).
- It has 6 built-in agent types for different purposes, for example when it enters "Plan Mode". Each one runs on a specific model.
- It has file-based memory, as we know, but with a taxonomy: by user profile (how it knows us), by feedback (what we keep telling it, corrections, etc.), by project (knowledge of the code, what was done with it, etc.) and by reference (pointers to external systems like Linear, GH, etc.)
- MEMORY.md must be at most 200 lines. Beyond that, Claude Code apparently just ignores it.
- The pattern for implementing permission decisions is done through instructions written in XML files. And the curious part is that it spends between 64 and 4096 (presumably tokens) having the agent decide whether to execute something or not, and whether it has permission or not.
- By default, the context window is compacted at ~13,000 tokens, and after that there is a post-compaction budget of 50K tokens.
- It has an "Undercover" mode for internal Anthropic employees that hides model names and IDs from the prompt, preventing unannounced models from leaking into commits/PRs.
I could go on like this all day because there's a ton of curious stuff. I think I'll instead collect everything I can on a website, as an educational resource for myself and for anyone who's interested. I'll publish it later.
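The XML-driven permission pattern described above can be sketched roughly like this. To be clear, the tag names, attributes, and matching logic below are my own invention for illustration; the thread does not show the actual Claude Code file format:

```python
# Hypothetical sketch of XML-written permission rules for an agent's tool
# calls. The <rule> schema here is invented; only the pattern (decisions
# expressed as XML instructions the agent consults) comes from the thread.
import xml.etree.ElementTree as ET

RULES_XML = """
<permissions>
  <rule tool="Bash" pattern="rm -rf" decision="deny"/>
  <rule tool="Bash" pattern="git status" decision="allow"/>
  <rule tool="Edit" pattern="*" decision="ask"/>
</permissions>
"""

def decide(tool: str, command: str) -> str:
    """Return 'allow', 'deny', or 'ask' for a proposed tool call."""
    root = ET.fromstring(RULES_XML)
    for rule in root.iter("rule"):
        if rule.get("tool") != tool:
            continue
        pattern = rule.get("pattern")
        if pattern == "*" or pattern in command:
            return rule.get("decision")
    return "ask"  # default to asking the user when no rule matches

print(decide("Bash", "git status"))     # allow
print(decide("Bash", "rm -rf /tmp/x"))  # deny
```

In the real system the decision is reportedly made by the model itself within a small token budget; this sketch only shows the rule-file side of the pattern.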
Carles Navarro retweeted
Carles Navarro@11krls·
Ironically, universities have indeed adapted to the law of supply and demand: to the infinite demand from mothers, fathers, and kids who have been led to believe that the only dignified path is getting a degree. Most degree programs shouldn't exist, or should be specializations.
Mihura@XMihura·
Since we've opened the can of worms about civil servants, I'd like to mention something I see a lot in my generation, and which strikes me as a very clear symptom that something is broken: I've already seen many cases, a great many, of people who study a degree (physiotherapy, accounting and finance, veterinary medicine, building engineering, whatever) and, two or three years after crashing into the job market, start preparing a civil-service exam that has NOTHING to do with what they studied. In many cases police officer or administrative assistant. And if I've seen it so many times, it's because it is extremely common.

This model is literally taking a generation's highest-energy years and throwing them in the trash: 4 years of undergrad + 1 of a master's + 2 years of precarious jobs + 2-3 years locked away memorizing a syllabus for a position that doesn't require that university degree. Almost a decade of human capital wasted.

I ask myself: doesn't this system need rethinking from top to bottom? Do we really need to churn out thousands of graduates in physiotherapy, teaching, and tourism every year when we know the market won't absorb them? University in Spain seems to have become a parking lot for the young, a daycare where kids delay their lives 5-6 years; an extremely expensive tool that in many cases only manages to keep people locked out of the labor market during their highest-energy years.

The opportunity cost of this model for the country is very high: kids burning their twenties in exam-prep academies to escape precarity, when they could be building things, innovating, or simply contributing real value from age 20. We have to start aligning education with the material reality of the world, because sustaining this talent sink does no one any favors.
Carles Navarro@11krls·
There’s one thing people who say ‘MCP or CLI’ and people who say ‘Transformers or diffusion’ have in common: they have no idea.
David Gomes@davidgomes·
I'm unshipping a feature from Cursor, and I can tell that all the SOTA models are really bad at deleting code. They will routinely:
- Leave behind `throw Error("Not implemented")` style things
- Want to give users notifications like "Feature has been deprecated"
- Keep tests for features that don't exist, and write stubs for those features in the tests (instead of deleting the tests)
- Fail to find a ton of dead code, and not care much about deleting it
- Leave useless comments about deleted code/functionality
I think we need to greatly improve the RL data with lots of feature deletions, because they have been trained only to generate code, not to delete it.
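The tweet's `throw Error("Not implemented")` example is JavaScript-flavored; the same leftover-stub anti-pattern in Python looks like this (a contrived example of my own, not from the tweet):

```python
# Anti-pattern: what "deleting" a feature often looks like after an AI edit.
# The feature is gone in name only; stubs and dead references survive.

def export_report(data):
    # Leftover stub instead of removing the function and its call sites.
    raise NotImplementedError("Feature has been deprecated")

def test_export_report():
    # Leftover test for a feature that no longer exists: it now only
    # asserts that the stub raises, which tests nothing useful.
    try:
        export_report([])
    except NotImplementedError:
        pass

# A genuine deletion would remove export_report, its tests, and all callers.
```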
Carles Navarro retweeted
Matt Pocock@mattpocockuk·
Good tip for avoiding cognitive debt in codebases where AI has run wild: Design the interface, delegate the implementation
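One way to read "design the interface, delegate the implementation" in code: the human authors and reviews an abstract contract, and the agent fills in the body behind it. A minimal Python sketch (the `RateLimiter` example is my own, not from the tweet):

```python
# "Design the interface, delegate the implementation": the human fixes the
# contract; the agent-written implementation can churn behind it without
# adding cognitive debt to the reviewed surface.
from abc import ABC, abstractmethod

class RateLimiter(ABC):
    """Human-authored contract; names and semantics are what gets reviewed."""

    @abstractmethod
    def allow(self, key: str) -> bool:
        """Return True if `key` may proceed, False if it is throttled."""

class FixedWindowLimiter(RateLimiter):
    """Agent-authored implementation, replaceable as long as it honors
    the RateLimiter contract."""

    def __init__(self, limit: int):
        self.limit = limit
        self.counts: dict[str, int] = {}

    def allow(self, key: str) -> bool:
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key] <= self.limit

limiter = FixedWindowLimiter(limit=2)
print([limiter.allow("user") for _ in range(3)])  # [True, True, False]
```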
Carles Navarro retweeted
rahul@rahulgs·
seems obvious but:

things that are changing rapidly:
1. context windows
2. intelligence / ability to reason within context
3. performance on any given benchmark
4. cost per token

things that are not changing much:
1. humans
2. human behavior, preferences, affinities
3. tools, integrations, infrastructure
4. single core cpu performance

therefore, ngmi:
1. "i found this method to cut 15% context"
2. "our method improves retrieval performance 10% by using hybrid search"
3. "our finetuned model is cheaper than opus at this benchmark"
4. "our harness does this better because we invented this multi agent system"
5. "we're building a memory system"
6. "context graphs"
7. "we trained an in house specialized rl model to improve task performance in X benchmark at Y% cost reduction"

wagmi:
1. product/ui
3. customer acquisition
4. integrations
5. fast linting, ci, skills, feedback for agents
6. background agent infra to parallelize more work
7. speed up your agent verification loops
8. training your users, connecting to their systems and working with their data, meeting them where they are
Carles Navarro@11krls·
Memory should only be per chat. So I can have a long chat without losing context; if I start a new chat, it's because I want fresh context. When context engineering gets good enough I may never leave the conversation, but for now it just messes up both experiences (not good enough for an infinite conversation, nor for global context).
Andrej Karpathy@karpathy·
One common issue with personalization in all LLMs is how distracting memory seems to be for the models. A single question from 2 months ago about some topic can keep coming up as some kind of a deep interest of mine with undue mentions in perpetuity. Some kind of trying too hard.
Carles Navarro retweeted
Kpaxs@Kpaxs·
This is a man who has been haunted since childhood and built a billion dollar company as a side effect of trying to make the haunting stop.
Carles Navarro@11krls·
With agents getting better and working for longer, it no longer feels productive to wait several minutes for each task to finish. I've resigned myself: I don't want to know what code they produce anymore; I fully embrace multi-agentic coding. This codebase is modular enough that I can coordinate with a simple agent_inbox/ file-based communication protocol.
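A file-per-message inbox like the `agent_inbox/` protocol mentioned here can be very small. The directory layout and message fields below are guesses for illustration, since the tweet doesn't specify the actual format:

```python
# Minimal sketch of a file-based agent inbox: one JSON file per message
# under agent_inbox/<agent>/. Agents coordinate purely through the
# filesystem; consuming a message deletes its file.
import json
import time
import uuid
from pathlib import Path

INBOX_ROOT = Path("agent_inbox")

def send(to_agent: str, sender: str, body: str) -> Path:
    """Drop a message file into the recipient's inbox directory."""
    inbox = INBOX_ROOT / to_agent
    inbox.mkdir(parents=True, exist_ok=True)
    msg = {"id": uuid.uuid4().hex, "from": sender,
           "ts": time.time(), "body": body}
    path = inbox / f"{msg['id']}.json"
    path.write_text(json.dumps(msg))
    return path

def drain(agent: str) -> list:
    """Read and delete all pending messages for an agent, oldest first."""
    inbox = INBOX_ROOT / agent
    messages = []
    for path in sorted(inbox.glob("*.json"), key=lambda p: p.stat().st_mtime):
        messages.append(json.loads(path.read_text()))
        path.unlink()  # consuming a message removes the file
    return messages

send("backend", "coordinator", "implement /health endpoint")
print([m["body"] for m in drain("backend")])
```

The appeal of this design is that it needs no infrastructure: any agent that can read and write files can participate, and the inbox state is inspectable with plain `ls` and `cat`.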
Carles Navarro@11krls·
Cursor's job is not to compete with model providers IMO; creating an agentOS was (and may still be) a better strategy. Easy to say that in hindsight, though.
Carles Navarro@11krls·
@karpathy This is a real ML researcher optimizing even the random seed when there is nothing else to try
Andrej Karpathy@karpathy·
I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically the nanochat LLM training core stripped down to a single-GPU, one-file version of ~630 lines of code, then:
- the human iterates on the prompt (.md)
- the AI agent iterates on the training code (.py)
The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement. In the image, every dot is a complete LLM training run that lasts exactly 5 minutes. The agent works in an autonomous loop on a git feature branch and accumulates git commits to the training script as it finds better settings (of lower validation loss by the end) of the neural network architecture, the optimizer, all the hyperparameters, etc. You can imagine comparing the research progress of different prompts, different agents, etc. github.com/karpathy/autor… Part code, part sci-fi, and a pinch of psychosis :)
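The outer loop described here (edit the training script, run for a fixed budget, keep the change only if validation loss drops, record it as a commit) is essentially a greedy hill-climb. A toy sketch, with training and the agent replaced by stand-in functions; the real repo shells out to git and a ~630-line trainer, none of which is reproduced here:

```python
# Toy sketch of an autoresearch-style loop: accept an agent's edit to the
# "training script" only if it strictly lowers validation loss.

def autoresearch_loop(train, propose, script, iterations):
    """Greedy hill-climb over script edits; returns best loss and the
    history of accepted (script, loss) pairs (the real loop would make a
    git commit at each acceptance)."""
    history = []
    best_loss = train(script)
    for _ in range(iterations):
        candidate = propose(script)
        loss = train(candidate)
        if loss < best_loss:  # keep only strict improvements
            script, best_loss = candidate, loss
            history.append((script, loss))
    return best_loss, history

# Stand-ins: the "script" is just a learning rate, "training" is a
# quadratic with optimum at lr=0.3, and the "agent" nudges lr upward.
train = lambda lr: (lr - 0.3) ** 2
propose = lambda lr: lr + 0.05
best, history = autoresearch_loop(train, propose, script=0.1, iterations=6)
print(round(best, 4))
```

Once the loss stops improving (past lr=0.3 in this toy), further proposals are rejected and the accepted script stays put, mirroring how the real loop only accumulates commits for genuine gains.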
Carles Navarro retweeted
Ivan Burazin@ivanburazin·
Sandboxes are layer one. As agents take on more complex work, every layer needs rethinking:
- Networking for agent-to-agent communication
- Storage for petabyte-scale snapshots
- Observability for debugging million-path execution trees
- Security for autonomous decision making
The whole stack will be rebuilt from first principles.
Andrej Karpathy@karpathy·
There was a nice time where researchers talked about various ideas quite openly on twitter. (before they disappeared into the gold mines :)). My guess is that you can get quite far even in the current paradigm by introducing a number of memory ops as "tools" and throwing them into the mix in RL. E.g. current compaction and memory implementations are crappy, first, early examples that were somewhat bolted on, but both can be fairly easily generalized and made part of the optimization as just another tool during RL. That said neither of these is fully satisfying because clearly people are capable of some weight-based updates (my personal suspicion - mostly during sleep). So there should be even more room for more exotic approaches for long-term memory that do change the weights, but exactly - the details are not obvious. This is a lot more exciting, but also more into the realm of research outside of the established prod stack.
Awni Hannun@awnihannun

I've been thinking a bit about continual learning recently, especially as it relates to long-running agents (and running a few toy experiments with MLX). The status quo of prompt compaction coupled with recursive sub-agents is actually remarkably effective. Seems like we can go pretty far with this. (Prompt compaction = when the context window gets close to full, the model generates a shorter summary, then starts from scratch using the summary. Recursive sub-agents = decompose tasks into smaller tasks to deal with finite context windows.)

Recursive sub-agents will probably always be useful. But prompt compaction seems like a bit of an inefficient (though highly effective) hack. There are two other alternatives I know of: 1. online fine-tuning and 2. memory-based techniques.

Online fine-tuning: train some LoRA adapters on data the model encounters during deployment. I'm less bullish on this in general. Aside from the engineering challenges of deploying custom models / adapters for each use case / user, there are some fundamental issues:
- Online fine-tuning is inherently unstable. If you train on data in the target domain you can catastrophically destroy capabilities that you don't target. One way around this is to keep a mixed dataset with the new and the old. But this gets pretty complicated pretty quickly.
- What does the data even look like for online fine-tuning? Do you generate Q/A pairs based on the target domain to train the model? You also have the problem of prioritizing information in the data mixture given finite capacity.

Memory-based techniques: basically a policy for keeping useful memory around and discarding what is not needed. This feels much more like how humans retain information: "use it or lose it". You only need a few things for this to work:
- An eviction/retention policy. Something like "keep a memory if it has been accessed at least once in the last 10k tokens".
- The policy needs to be efficiently computable.
- A place for the model to store and access long-term memory. Maybe a sparsely accessed KV cache would be sufficient. But for efficient access to a large memory a hierarchical data structure might be better.
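The eviction/retention policy sketched in those last bullets can be made concrete in a few lines. A toy version, assuming a running token counter stands in for "the last 10k tokens"; all names here are illustrative, not from the thread:

```python
# Toy "use it or lose it" memory: evict any entry not accessed within the
# last `window` tokens of generation. Accessing an entry refreshes it.
from typing import Optional

class MemoryStore:
    def __init__(self, window: int = 10_000):
        self.window = window
        self.tokens_seen = 0                 # running generation clock
        self.last_access: dict = {}          # key -> token count at last access
        self.items: dict = {}                # key -> stored memory

    def put(self, key: str, value: str) -> None:
        self.items[key] = value
        self.last_access[key] = self.tokens_seen

    def get(self, key: str) -> Optional[str]:
        if key in self.items:
            self.last_access[key] = self.tokens_seen  # access refreshes TTL
            return self.items[key]
        return None

    def advance(self, n_tokens: int) -> None:
        """Advance the generation clock and evict stale memories."""
        self.tokens_seen += n_tokens
        cutoff = self.tokens_seen - self.window
        for key in [k for k, t in self.last_access.items() if t < cutoff]:
            del self.items[key], self.last_access[key]  # use it or lose it

mem = MemoryStore(window=10_000)
mem.put("user_editor", "prefers vim")
mem.put("one_off", "asked about pandas once")
mem.advance(9_000)
mem.get("user_editor")     # accessed -> refreshed at t=9000
mem.advance(9_000)         # t=18000: "one_off" is now stale and evicted
print(sorted(mem.items))   # ['user_editor']
```

This satisfies the "efficiently computable" requirement trivially (a counter compare per entry); the harder open question from the thread, where the memory physically lives (KV cache vs. a hierarchical store), is not modeled here.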

Carles Navarro retweeted
gabriel@gabriel1·
i could learn any topic in 5 minutes with the most optimal text & visual explanation, if it's adapting live to what you do and don't understand. fundamentally ai has like another 5 gpt3-to-gpt4 moments ahead