Joaquin Bravo

6.3K posts

@jackbravo

Programmer from Guadalajara. I like Linux, free software, community service, soccer, cycling, and guitar.

Guadalajara, México · Joined July 2007
1.4K Following · 1.5K Followers
Joaquin Bravo retweeted
Josh Clemm @joshclemm
Open sourcing something fun from @Dropbox: Witchcraft. It's a local search engine built in Rust with no API keys or vector DB required. Think: ColBERT / late interaction style retrieval, but packaged to run locally (perfect for coding agents). Let's dive in👇
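A hedged sketch of what "ColBERT / late interaction style retrieval" means in practice: each query token embedding is matched against its best document token embedding and the per-token maxima are summed (MaxSim). This is a generic numpy illustration with random placeholder vectors, not Witchcraft's actual Rust implementation.

```python
# Minimal sketch of ColBERT-style "late interaction" (MaxSim) scoring.
# Random vectors stand in for a real token encoder; not Witchcraft's code.
import numpy as np

def late_interaction_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """query_vecs: (num_query_tokens, dim); doc_vecs: (num_doc_tokens, dim)."""
    # Normalize so dot products are cosine similarities.
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sim = q @ d.T                         # (num_query_tokens, num_doc_tokens)
    return float(sim.max(axis=1).sum())   # best doc token per query token, then sum

# Toy usage: rank three "documents" against one "query".
rng = np.random.default_rng(0)
query = rng.normal(size=(4, 128))
docs = [rng.normal(size=(50, 128)) for _ in range(3)]
ranking = sorted(range(len(docs)),
                 key=lambda i: late_interaction_score(query, docs[i]),
                 reverse=True)
print(ranking)
```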
19 replies · 37 reposts · 460 likes · 41K views
Joaquin Bravo retweeted
DHH @dhh
In 2023, we spent $3,934,099 on AWS + other hosting. In 2026, our hosting + support bill is down to ~$1m/year due to the cloud exit. Even including all the hardware buying, we will already have saved ~$4m by the end of this year. And going forward, it's ~$3m/yr in savings 🤑
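A quick back-of-envelope check of the math. Only the 2023 AWS figure and the ~$1m/yr figure come from the post; the hardware outlay and elapsed years below are hypothetical placeholders showing the shape of the payback calculation.

```python
# Back-of-envelope check of the cloud-exit savings claim.
# Only aws_2023 and onprem_per_year come from the post; the hardware outlay
# and years are HYPOTHETICAL placeholders.
aws_2023 = 3_934_099          # 2023 AWS + other hosting spend (from the post)
onprem_per_year = 1_000_000   # current hosting + support bill (from the post)
hardware_outlay = 700_000     # hypothetical one-time server purchase
years = 2                     # hypothetical years since the exit

gross_savings_per_year = aws_2023 - onprem_per_year       # ≈ $2.9M/yr
net_savings = gross_savings_per_year * years - hardware_outlay
print(f"gross savings/yr ≈ ${gross_savings_per_year:,.0f}")
print(f"net savings after {years} years ≈ ${net_savings:,.0f}")
```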
254 replies · 327 reposts · 6.9K likes · 679K views
Joaquin Bravo retweeted
Ejaaz @cryptopunk7213
Ridiculous amount of alpha in this post, Gavin knows this shit better than anyone. tl;dr:
- the switching cost to train your model on a different type of GPU is very high now
- translation: AI labs are becoming increasingly reliant on their GPU maker (which gives Nvidia a lot of power)
- labs are now literally designing their models to work with specific GPUs - Google's Gemini needs TPUs, OpenAI needs Cerebras / Nvidia
- Anthropic is the ONLY ONE that can afford to switch. Why? Because they train Claude across TPUs, Trainium, and Nvidia
- but inference is now way more important than pre-training, aka the TYPE of GPU matters more
- Chinese models are trained on chips VERY different from America's = their models won't run on our hardware.
Gavin Baker @GavinSBaker

Much of Dwarkesh's argument hinges on this statement which *was* accurate but will be increasingly inaccurate on a go-forward basis imo: “American labs port across accelerators constantly. Anthropic's models are run on GPUs, they're run on Trainium, they're run on TPUs. There are so many things you can do, from distilling to a model that's well fit for your chips.”

As system-level architectures diverge (torus vs. switched scale-up topologies, memory hierarchies, networking primitives), true portability is eroding. The MI300 and MI325 had roughly the same scale-up domain size as Hopper, while Blackwell’s scale-up domain is 9x larger than the MI355 scale-up domain, etc. Many frontier models are now being explicitly co-designed for inference on specific hardware like GB300 racks. Codex on Cerebras is another example. Those models run less efficiently on other systems and the performance differentials will only widen. A model that runs well on Google’s torus topology will run less efficiently on Nvidia’s switched scale-up topology and vice versa - the data traffic is fundamentally different as a byproduct of the models being parallelized across the different topologies.

Google’s internal teams - and increasingly the Anthropic teams as they become the most important customer of almost every cloud - have the luxury of operating across the stack (models, chips, networking), but that is not the case for the rest of the market and other prospective users. Anthropic is the exception, not the rule. To wit, Anthropic and Google allegedly have a mutual understanding where Anthropic can hire the TPU engineers they need every year to ensure that they can continue to get the most out of the TPU.

Given the overwhelming importance of cost per token to the economics of the labs, models will be run where they run best. Most extremely large MoE models will run best on GB300s given the importance of having a switched scale-up network like NVLink for MoE inference. When training was the dominant cost for labs and power was broadly available, labs were optimizing to minimize capex dollars. Model portability was a way to create leverage over suppliers. I think that drove a lot of the focus on portability. Today, inference costs as measured by tokens per watt per dollar are everything. Inference is way more important than training costs (inference is effectively now part of training via RL). Labs are therefore now optimizing for inference. This means increasing co-design and higher go-forward switching costs for individual models between systems. I do think this explains why Anthropic and Nvidia came together: Anthropic needed Blackwells and Rubins to inference at least *some* of their models economically. And Mythos might just end up being released coincident with the availability of Rubins for inference.

TLDR: as labs shift their focus from training to inference, the costs of portability and the upside of co-design to maximize tokens per watt per dollar both rise. Portability is likely to begin decreasing as a result.

I think what I might have respectfully added to Jensen’s answer is that systems evolve under local selective pressures. The evolutionary pressure in America is a shortage of watts, so it makes sense for Nvidia to optimize, as an American company, for power efficiency and tokens per watt and stay on copper as long as possible. China has a surfeit of watts. Chinese AI systems are already taking advantage of this, with the Huawei CloudMatrix 384 and Atlas SuperPoD having an optical scale-up domain that is much larger than anything offered by Nvidia today at the cost of *much* higher power consumption and much lower tokens per watt. The networking primitives for this Huawei system are very different than those for Nvidia’s systems, and a model that runs well on Nvidia will not run well on that system and vice versa. This means that if a Chinese ecosystem gets momentum, Chinese models might stop running well on American hardware. And when Chinese models run best on American hardware, America is in a better position, as this gives America a degree of leverage and control over Chinese AI that it risks losing to an all-Chinese alternative ecosystem.

This architectural fork makes porting and distillation less effective and strengthens the pro-American national security case for selling China deprecated GPUs imo. Also I will attest that I did not wake up a loser this morning.
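To make the "tokens per watt per dollar" framing in the thread above concrete, here is a toy calculation. Every number (throughput, power draw, capex, lifetime, electricity price) is hypothetical; the only point is how a throughput penalty from running a model on a mismatched topology shows up in cost per token.

```python
# Toy illustration of the "tokens per watt per dollar" lens from the thread.
# Every figure below is hypothetical.
def cost_per_million_tokens(tokens_per_sec, power_kw, capex,
                            lifetime_years, power_price_kwh=0.08):
    seconds = lifetime_years * 365 * 24 * 3600
    total_tokens = tokens_per_sec * seconds
    energy_cost = power_kw * (seconds / 3600) * power_price_kwh  # kWh * price
    return (capex + energy_cost) / total_tokens * 1e6

# Hypothetical system A: model co-designed for the hardware (higher throughput).
a = cost_per_million_tokens(tokens_per_sec=12_000, power_kw=100,
                            capex=3_000_000, lifetime_years=5)
# Hypothetical system B: same model ported to a mismatched topology.
b = cost_per_million_tokens(tokens_per_sec=7_000, power_kw=100,
                            capex=3_000_000, lifetime_years=5)
print(f"co-designed: ${a:.2f}/M tokens, ported: ${b:.2f}/M tokens")
```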

15 replies · 14 reposts · 389 likes · 84.2K views
Joaquin Bravo @jackbravo
@thdxr It runs in an infinite loop and tries to be smart about managing context. You normally need a model with a very long context window that is capable of making good decisions about how to manage memory and context. They also usually load tons of tools.
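A minimal sketch of the loop described here, assuming a generic chat-style message list: the model is called repeatedly, may hand back tool calls, and the running context is trimmed when it grows too long. call_model() and run_tool() are hypothetical stand-ins, not any particular product's API.

```python
# Sketch of an agent loop with bounded context. call_model() and run_tool()
# are HYPOTHETICAL stand-ins so the example is self-contained.
def call_model(messages, tools):
    # Hypothetical stand-in for a real LLM API call.
    return {"content": "done", "done": True, "tool_call": None}

def run_tool(tools, tool_call):
    # Hypothetical stand-in for dispatching a tool call.
    return "tool output"

def agent_loop(task, tools, max_context_messages=50):
    messages = [{"role": "user", "content": task}]
    while True:
        # Keep context bounded: keep the original task plus the most recent turns.
        if len(messages) > max_context_messages:
            messages = [messages[0]] + messages[-(max_context_messages - 1):]
        reply = call_model(messages, tools)
        messages.append({"role": "assistant", "content": reply["content"]})
        if reply.get("tool_call"):
            # Model asked for a tool; run it and feed the result back in.
            messages.append({"role": "tool", "content": run_tool(tools, reply["tool_call"])})
        elif reply.get("done"):
            return reply["content"]

print(agent_loop("summarize the repo", tools=[]))
```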
0 replies · 0 reposts · 1 like · 670 views
dax @thdxr
i haven't really clicked with the openclaw category of product so i'm having trouble understanding some stuff. can someone help me understand why it needs a particularly good model? isn't it doing more basic stuff?
156 replies · 7 reposts · 685 likes · 107.3K views
Joaquin Bravo retweeted
Omar Khattab @lateinteraction
As promised, here's a recording of my 30-min keynote and the subsequent Q&A for the inaugural late interaction retrieval (LIR) workshop, cc @bclavie @antoine_chaffin. The talk is admittedly advanced, as it's directed at an expert IR community. But hopefully still broadly useful!
Amélie Chatelain @AmelieTabatta

Lots of people interested in the late interaction workshop, listening to @lateinteraction's keynote!

16 replies · 105 reposts · 799 likes · 210.7K views
Joaquin Bravo retweeted
Mario Zechner @badlogicgames
recommended viewing. one more time, on its own. this is probably the most practical talk on using coding agents i've watched to date. watch it. by @lucasmeijer. it's also a great demo of pi and captures exactly why i built it. youtu.be/fdbXNWkpPMY?si…
17 replies · 34 reposts · 568 likes · 39.6K views
Joaquin Bravo retweeted
AJ BAUER @ajbauer8
I could watch this video 10,000 times:
41 replies · 357 reposts · 3.5K likes · 75.1K views
Joaquin Bravo retweeted
Leonie @helloiamleonie
Today I gave a workshop at @aiDotEngineer Europe on "Agentic search for context engineering". Thanks to everyone who joined my session. If you couldn't make it, here's the thread format of it:
4 replies · 9 reposts · 49 likes · 4.4K views
Joaquin Bravo retweeted
Andrej Karpathy @karpathy
Judging by my tl there is a growing gap in understanding of AI capability. The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code.

But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along.

So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work. It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions.

TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram's reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because of 2 properties: 1) these domains offer explicit reward functions that are verifiable, meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.
staysaasy @staysaasy

The degree to which you are awed by AI is perfectly correlated with how much you use AI to code.
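Karpathy's point about verifiable rewards is easy to make concrete: a candidate program gets reward 1.0 if its unit tests pass and 0.0 otherwise, which is what makes coding-style tasks so amenable to RL. The sketch below is a generic illustration, not any lab's actual pipeline.

```python
# Generic "verifiable reward" illustration: reward = did the unit tests pass?
import os
import subprocess
import sys
import tempfile
import textwrap

def unit_test_reward(candidate_code: str, test_code: str) -> float:
    """Write candidate + tests to a temp file, run it, return 1.0 on success."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "solution_and_tests.py")
        with open(path, "w") as f:
            f.write(candidate_code + "\n" + test_code)
        result = subprocess.run([sys.executable, path], capture_output=True, timeout=30)
        return 1.0 if result.returncode == 0 else 0.0

candidate = "def add(a, b):\n    return a + b\n"
tests = textwrap.dedent("""
    assert add(1, 2) == 3
    assert add(-1, 1) == 0
""")
print(unit_test_reward(candidate, tests))  # 1.0 if the asserts pass
```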

1.1K replies · 2.4K reposts · 20K likes · 4.1M views
Joaquin Bravo retweeted
Andrej Karpathy @karpathy
I packaged up the "autoresearch" project into a new self-contained minimal repo if people would like to play over the weekend. It's basically the nanochat LLM training core stripped down to a single-GPU, one-file version of ~630 lines of code, then:
- the human iterates on the prompt (.md)
- the AI agent iterates on the training code (.py)

The goal is to engineer your agents to make the fastest research progress indefinitely and without any of your own involvement. In the image, every dot is a complete LLM training run that lasts exactly 5 minutes. The agent works in an autonomous loop on a git feature branch and accumulates git commits to the training script as it finds better settings (of lower validation loss by the end) of the neural network architecture, the optimizer, all the hyperparameters, etc. You can imagine comparing the research progress of different prompts, different agents, etc. github.com/karpathy/autor… Part code, part sci-fi, and a pinch of psychosis :)
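A hedged sketch of the loop described above, assuming improvements are judged purely by final validation loss: the agent edits the training script, each run gets a fixed budget, and only improving edits are committed. propose_edit() and run_training() are hypothetical stand-ins; this is not the actual autoresearch code.

```python
# Sketch of an autoresearch-style loop. propose_edit() and run_training()
# are HYPOTHETICAL stand-ins, not the real repo's functions.
import subprocess

def propose_edit(script_path, history):
    # Hypothetical: ask an agent to rewrite the training script in place,
    # given the history of validation losses so far.
    pass

def run_training(script_path, budget_seconds=300):
    # Hypothetical: run one fixed-budget (e.g. 5-minute) training run and
    # return the final validation loss.
    return 3.0

def autoresearch_loop(script_path, iterations=20):
    best_loss = run_training(script_path)
    history = [best_loss]
    for _ in range(iterations):
        propose_edit(script_path, history)            # agent edits the .py file
        loss = run_training(script_path)               # fixed time budget per run
        history.append(loss)
        if loss < best_loss:                           # keep only improvements
            best_loss = loss
            subprocess.run(["git", "commit", "-am", f"val loss {loss:.4f}"])
        else:                                          # revert a failed edit
            subprocess.run(["git", "checkout", "--", script_path])
    return best_loss
```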
1.1K replies · 3.6K reposts · 28.3K likes · 11M views
Joaquin Bravo retweeted
tomaarsen @tomaarsen
🌐 I've just released Sentence Transformers v5.4: we're going fully multimodal for embeddings & reranking! Also featuring a modular CrossEncoder, and automatic Flash Attention 2 input flattening. Highlights in 🧵
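For readers who haven't used the library, here is a hedged usage sketch of text-and-image embeddings with Sentence Transformers. It relies on the long-standing CLIP model support rather than the specific v5.4 APIs announced above, and the image paths are placeholders.

```python
# Text <-> image embeddings with sentence-transformers (pre-existing CLIP
# support; not necessarily the new v5.4 APIs). Image paths are placeholders.
from sentence_transformers import SentenceTransformer, util
from PIL import Image

model = SentenceTransformer("clip-ViT-B-32")

# Encode images and captions into the same embedding space.
img_embeddings = model.encode([Image.open("dog.jpg"), Image.open("beach.jpg")])
txt_embeddings = model.encode(["a dog playing fetch", "a sunset over the ocean"])

# Cosine similarity matrix: rows are captions, columns are images.
print(util.cos_sim(txt_embeddings, img_embeddings))
```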
19 replies · 29 reposts · 173 likes · 27.7K views
Joaquin Bravo retweeted
MajoraZ @Majora__Z
Apparently room 21 (the Bellini and Giorgione Room) of the Uffizi in Florence, Italy, has Mexica / Aztec soldiers in the painted fresco on its ceiling. Does anybody happen to have photos of this? The virtual tour on the website is low res and it's hard to make stuff out. Here's the virtual tour if people are interested: virtualuffizi.com/the-bellini-an…
11 replies · 75 reposts · 471 likes · 24.7K views
Joaquin Bravo retweeted
LightOn @LightOnIO
💫 Introducing a new SOTA long-context VLM. LightOn OriOn-Qwen-SR1 reasons over full documents and executes that reasoning implicitly at inference. Reasoning is compressed into the model's weights: no verbose output, no added latency. 🥇 SOTA on MMLongBenchDoc, ahead of Qwen3 VL with 7× fewer parameters. 🙌 Kudos to @further_ai for this new milestone! Reasoning starts at reading. 👉 lighton.ai/lighton-blogs/…
2 replies · 10 reposts · 48 likes · 4.8K views
Joaquin Bravo retweeted
dax @thdxr
inference is very profitable and probably a good opportunity to understand some basic business math
1. companies buy long-lived assets like GPUs. these are one-time costs and the asset depreciates over time
2. once you own this asset, you can plug it in and produce tokens which you can sell. the cost of goods sold here can be very low and you might be making 90% margins at scale. this is why we say inference is profitable
3. then you also hire employees to do r&d work to improve your systems, come up with new models, expand the business

if you add these 3 up you end up with $0. you're not producing a profit because the business is growing and you're reinvesting it all, buying assets or r&d to meet demand. if it's obvious to other people the business is working, you can raise money from them to accelerate all these numbers so they max out in 5 years instead of 25. so on paper you'll be "losing money" every year, but that's because you want to make sure you lock down the opportunity before someone else. the bigger your market is the bigger this burn can be because it's a function of potential. so when you see these companies losing a lot of money it doesn't mean the whole concept of their business is broken. it's possible they misjudge and overinvest on 1+3 and will suffer some consequences, but fundamentally 2 does work
dax @thdxr

@d4m1n i'm a bit confused why so many people say api tokens are sold at a loss. this isn't true - these models are incredibly expensive compared to the gpu time cost. there's potential for 90% margin depending on the model
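A toy version of the three buckets described in the thread above, in Python; every figure is hypothetical and chosen only to show how ~90% inference gross margins can coexist with an accounting loss once depreciation and R&D reinvestment are included.

```python
# Toy model of the three buckets above; every number is HYPOTHETICAL.
gpu_capex = 500_000_000       # 1) one-time purchase of long-lived GPU assets
depreciation_years = 5
token_revenue = 300_000_000   # 2) annual revenue from selling inference tokens
serving_cost = 30_000_000     #    electricity, datacenter ops, etc. (COGS)
rnd_spend = 400_000_000       # 3) researchers, new models, expansion

gross_margin = (token_revenue - serving_cost) / token_revenue  # ≈ 90%
annual_depreciation = gpu_capex / depreciation_years
accounting_profit = token_revenue - serving_cost - annual_depreciation - rnd_spend

print(f"inference gross margin: {gross_margin:.0%}")
print(f"accounting profit: ${accounting_profit:,.0f}")  # negative: growth spend swamps margins
```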

65 replies · 70 reposts · 1.4K likes · 148.4K views
Joaquin Bravo retweeted
John Carreyrou @JohnCarreyrou
The mystery of Satoshi Nakamoto, the pseudonymous inventor of Bitcoin, has remained unsolved for 17 years. Not anymore. Read my 18-month investigation to find out who Satoshi really is. nytimes.com/2026/04/08/bus…
942 replies · 1.4K reposts · 9K likes · 5.3M views
Joaquin Bravo retweeted
Mario Zechner @badlogicgames
People of pi. BIG NEWS. I've sold out. Let me know how you feel about this in the comments below. mariozechner.at/posts/2026-04-…
366 replies · 172 reposts · 1.8K likes · 206.3K views
Joaquin Bravo retweeted
Omar García @omar_comunica
The @ITESO business school confirms that living in Zapopan is as expensive as living in New York. It has the most expensive homes in the whole country, according to a bulletin released this afternoon. Here are a couple of data points:
72 replies · 191 reposts · 914 likes · 85.8K views
Joaquin Bravo retweeted
marimo @marimo_io
Introducing marimo pair: a skill that drops agents like Claude Code inside a running notebook, letting them read variables, run code, and manipulate UI elements. It also lets agents use marimo as a standalone REPL. youtube.com/watch?v=VKvjPJ… (1/n)
3 replies · 11 reposts · 65 likes · 12.4K views