Javi™

5.9K posts


@javitm

CTO & co-founder @theagilemonkeys. Building production AI agents & automation. All signal, zero noise.

Las Palmas de Gran Canaria · Joined July 2008
1.1K Following · 828 Followers
Pinned Tweet
Javi™ @javitm
Right now, I'm building AI products with a small team that outperforms what used to require an entire department. Agents handle code, research, testing, and deployments. We handle vision, architecture, and the hard decisions. This is the most exciting time to be a software engineer!
Javi™ @javitm
Local models are definitely a step in the right direction. Cloud models have been necessary while hardware evolved, but they also come with latency, cost, and privacy issues. I think there's another shift coming: as architectures stabilize, it starts to make sense to move from general-purpose GPUs to more specialized silicon, with models increasingly baked closer to the hardware (as companies like Taalas are starting to show). I'm convinced that AI will eventually come embedded in our iPhones and other devices, becoming immediate, private, and virtually free to use.
Robert Scoble @Scobleizer
Local models are the way. @AlexFinn is right. "Normies" will just get them automatically. Consumers will care about privacy and reliability, two things you can only get from local models. The "pros" are moving to open models (I saw it at NVIDIA GTC, where nerds waited in line to buy a $5,000 DGX inferencing box to get it loaded up with local models). Why? The cloud models (GPT, Gemini, Grok, Claude) are a little bit better, but they cost the pros TONS every day. I met more than one developer at GTC who's spending more than $1,000 a day in tokens to build their software. This shift is underway even though most don't recognize it yet.
Alex Finn @AlexFinn

Do you even understand what this means? An open-source model just released that:
• Outperforms models 20x its size
• Can run on a base-model Mac Mini
• Is AMERICAN 🇺🇸

If you have a base-model Mac Mini, you can have unlimited superintelligence on your desk. For free. Sonnet 4.5 was released 5 months ago. In 5 months, that level of intelligence went from frontier to free on your desk. And not only that, it can run on basically any computer out there. If you have even a remotely modern computer, do the following immediately:
1. Download LM Studio
2. Go to your OpenClaw and ask which of these new Gemma 4 models is best for your hardware
3. Have it walk you through downloading and loading it
4. Build apps with it, knowing you are using your own personal, private superintelligence on your desk

The people denying this is the future are so beyond lost.
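As a rough illustration of the "build apps with it" step: LM Studio can serve a downloaded model through an OpenAI-compatible HTTP API on localhost, so application code talks to the local model the same way it would talk to a cloud one. A minimal sketch, assuming LM Studio's default port (1234); the model name here is a placeholder, not a real identifier:

```python
import json
import urllib.request

def build_chat_payload(prompt: str, model: str = "local-model") -> dict:
    """OpenAI-style chat-completion payload for a locally served model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

def local_chat(prompt: str, base_url: str = "http://localhost:1234/v1") -> str:
    """POST the payload to the local server and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the endpoint mirrors the OpenAI chat API, the same code works against any local server that speaks that protocol, which is part of why "free superintelligence on your desk" is a drop-in swap rather than a rewrite.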

Javi™ @javitm
I'm curious about the linting process. Do you just use LLMs to verify the sources, or do you have more sophisticated mechanisms? I'm finding that after some time working with a knowledge base, it starts accumulating errors that propagate to all derived resources. With a large number of originals, it becomes challenging to know or verify what's true and what's not. This structure is proving really useful, but I think this is one of the main challenges we need to solve to achieve longer unsupervised research.
Andrej Karpathy @karpathy
LLM Knowledge Bases

Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code and more into manipulating knowledge (stored as markdown and images). The latest LLMs are quite good at it. So:

Data ingest: I index source documents (articles, papers, repos, datasets, images, etc.) into a raw/ directory, then I use an LLM to incrementally "compile" a wiki, which is just a collection of .md files in a directory structure. The wiki includes summaries of all the data in raw/, backlinks, and then it categorizes data into concepts, writes articles for them, and links them all. To convert web articles into .md files I like to use the Obsidian Web Clipper extension, and then I also use a hotkey to download all the related images locally so that my LLM can easily reference them.

IDE: I use Obsidian as the IDE "frontend" where I can view the raw data, the compiled wiki, and the derived visualizations. Important to note that the LLM writes and maintains all of the data of the wiki; I rarely touch it directly. I've played with a few Obsidian plugins to render and view data in other ways (e.g. Marp for slides).

Q&A: Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc. I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents, and it reads all the important related data fairly easily at this ~small scale.

Output: Instead of getting answers in text/terminal, I like to have it render markdown files for me, or slide shows (Marp format), or matplotlib images, all of which I then view again in Obsidian. You can imagine many other visual output formats depending on the query. Often, I end up "filing" the outputs back into the wiki to enhance it for further queries. So my own explorations and queries always "add up" in the knowledge base.

Linting: I've run some LLM "health checks" over the wiki to e.g. find inconsistent data, impute missing data (with web searches), find interesting connections for new article candidates, etc., to incrementally clean up the wiki and enhance its overall data integrity. The LLMs are quite good at suggesting further questions to ask and look into.

Extra tools: I find myself developing additional tools to process the data; e.g. I vibe coded a small and naive search engine over the wiki, which I both use directly (in a web UI) and, more often, hand off to an LLM via CLI as a tool for larger queries.

Further explorations: As the repo grows, the natural desire is to also think about synthetic data generation + finetuning to have your LLM "know" the data in its weights instead of just context windows.

TLDR: raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it is viewable in Obsidian. You rarely ever write or edit the wiki manually; it's the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.
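The "small and naive search engine over the wiki" mentioned above can be sketched in a few lines. This is a hypothetical reconstruction, not the actual tool: it assumes a directory of .md files and ranks them by plain term frequency, which is roughly the simplest thing an LLM could call via CLI for larger queries:

```python
# Naive keyword search over a directory of markdown wiki files.
# Ranking is raw term frequency; no stemming, no index, no RAG.
from pathlib import Path
import re

def search_wiki(wiki_dir: str, query: str, top_k: int = 5) -> list[tuple[str, int]]:
    """Return up to top_k (path, score) pairs, best match first."""
    terms = [t.lower() for t in re.findall(r"\w+", query)]
    scores = []
    for path in Path(wiki_dir).rglob("*.md"):
        text = path.read_text(encoding="utf-8").lower()
        score = sum(text.count(t) for t in terms)  # term-frequency score
        if score > 0:
            scores.append((str(path), score))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]
```

At the ~100-article scale described in the tweet, even this brute-force scan is fast enough, which matches the observation that fancy RAG wasn't needed.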
Javi™ @javitm
@BettaTech When no one remembers how the web works anymore, maybe AI itself will create its own frameworks and argue with other AIs about which framework is better on our behalf…
Martí @BettaTech
The development sector has always had its problems, but right now it looks like a kids' playground. I miss the arguments about whether React is better than Angular, and I never thought I'd say that haha
Javi™ @javitm
@davidmarcus @openclaw The biggest limiting factor I see to achieving that is the errors that accumulate in the agent's memory when it runs unsupervised for a while.
David Marcus @davidmarcus
We’re likely < 12 months from unsupervised software development. Not just better models. Full closed loops: generate → run → evaluate → fix → repeat. Using @openclaw you can already see it. Once loops + models improve together, supervision will stop making sense.
Josh Pigford @Shpigford
What app has the best markdown reading/writing experience you’ve ever experienced in your entire life?
Javi™ @javitm
@jeffrey_way Tests can probably prevent it because they’re a deterministic steering system, but you probably need to understand what you’re doing to create the right ones… and at that point I doubt you can still call it “vibe coding” 🤔
Jeffrey Way @jeffrey_way
I think it's probably undeniable that RIGHT NOW, if you effectively vibe code your projects, the codebase will become increasingly worse with each passing day. Not sure that any combo of tests + formatters + skills can prevent that.
Javi™ @javitm
@rand_longevity I doubt it. The work is infinite. The 9-5, 5-days-a-week schedule comes from what the average human is capable of sustaining. It’s not like there’s exactly 40 hours of work per week. If we could work 48 or 500 hours a day and still have a life, we’d do it... Well, now we can!
Rand @rand_longevity
your full time 9-5 job is gonna go down to 3 days a week soon
Javi™ @javitm
@DorotheaBaur Indeed, that’s the whole point of AI! A tool to process unreliable inputs and produce useful-ish outputs. We have code for anything that needs to be exact.
Dorothea Baur (Dr.) @DorotheaBaur
It bears repeating: "LLMs are inherently probabilistic, not deterministic. They try to guess the answer that sounds best, not the objectively correct one. And that's not a problem to be fixed. It's fundamental to how these models work."
Reuters Open Interest (ROI) @ReutersOI

New research suggests AI may never be reliable enough for the high-stakes work that's helped to justify hundreds of billions of dollars in investment, argues Panmure Liberum's Joachim Klement reut.rs/41EzEJ0

Javi™ @javitm
@emollick More than a bottleneck, it's a ceiling, and that's probably a good thing: it's a sign we're still part of the game 😅
Ethan Mollick @emollick
A sign that human creativity is a bottleneck is that this year everyone can generate almost any image or video they can think of for nearly free and the April Fools posts are basically just as bad as any other year.
Andreas Kling @awesomekling
I wonder if anyone will assemble a team of 100+ paid software engineers ever again.
juanmacias 🏳️‍🌈
To everyone who thinks AI isn't going to kill SaaS… you're veeeery wrong. Only those with integrations, exclusive API access, and a partner network are going to survive… the rest? I doubt it, and not because of price
Javi™ @javitm
@pmitu You probably don’t want an AI model processing your wire transfers 😅
Paul Mit @pmitu
Why would you build anything non-AI-powered these days?
Javi™ @javitm
@pmddomingos Because LLMs are just approximate text processors. Everything else is needed to put together the text they need to process and maximize the accuracy for specific use cases.
Pedro Domingos @pmddomingos
If LLMs are so smart, why do they need all these prompts, harnesses, post-training, scaffolding, etc.?
Javi™ @javitm
We've been trying something that partially aligns with this model, and the main limiting factor in our case is the reliability of our "world model", because when you stack agents, the hallucinations compound over time. Everyone needs to be aware that some things may be "made up" and must verify them. That verification need is the new bottleneck. That said, an unreliable world model is already showing a ton of value because it surfaces info that you didn't even notice before.
Javi™ @javitm
@wickedguro I tried to run some automated social media experiments, but the posts tend to be a bit soulless, and people generally ignore or reject them because it’s obvious they were generated by AI. Have you noticed that kind of issue with Postiz?
Javi™ @javitm
@DotCSV I guess that's why the Claude Code CLI has so many hundreds of thousands of lines of code 😅
Carlos Santana @DotCSV
Use /buddy in Claude Code and show me your legendary pet! If any of you gets a pangolin, trade it to me over link cable 👍
Konny @konnydev
If everything is done by AI in the future, what is left for us?
Sachin @sachintwtss
AWS is too overwhelming. Heroku is too dead. Need a platform that's just right. What are you deploying on?
Navi @NaivaidyaY66600
Founders, what are you working on this week? Drop your product's link here 👇👇
Javi™ @javitm
@boxmining It increases the value of AI operators’ labor. So the key to survival is to embrace the technology.
Boxmining @boxmining
AI doesn’t reduce the value of money. It reduces the value of labor. Big difference.