Antoni

375 posts

@antnjbert

Build things https://t.co/6RAVq4D5P6 - Using GPUs efficiently https://t.co/IcrYN2zEc5 - first LLM-powered clinic in the EU. Previously Eng + ML infra & k8s at @google & @apple

London, UK · Joined March 2014
271 Following · 67 Followers

Pinned Tweet
Antoni @antnjbert ·
Day 5 of 14: Solving the accelerator utilization problem.

Why are there so many AI chips, and why do companies actually run multiple accelerators? Sticking with one vendor would be so much simpler - one stack, one integration.

OpenAI has its deal with Cerebras, X and Meta are building their own silicon, and Chinese big tech plans to train and run inference on alternative hardware. Google has TPUs, and AWS has been doing this for a while with Trainium and Inferentia. That's not counting the cool stuff SambaNova and Tenstorrent are building. Can't speak of Apple - under NDA. And Microsoft is deploying AMD's chips while developing its own silicon - Maia.

There are also plenty of chips aimed at solving different problems - running models with predictable and extremely low latency, or chips that can work in smart vehicles. But for customers it's complex to adopt a diversified stack. The absolute majority goes with a single vendor for training and a single vendor for inference, and that works quite well.

So that's another problem gpusprint is solving - getting metrics across all of those vendors in a simple, unified format.

Not much interesting stuff today, rather a few routine updates:
- Made up my mind to build a custom frontend
- GitHub Actions now work, building and releasing images for all platforms
- Moved everything to Helm charts, tested and moved to OCI registry
- Improved dashboards

helm upgrade --install gpusprint oci://ghcr.io/antonibertel/charts/gpusprint --namespace gpusprint-system --create-namespace
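The unified-format idea could be sketched as a thin normalization layer that maps each vendor's raw reading onto one common schema. This is only an illustration of the approach, not gpusprint's actual format: the field names, the `normalize` function, and the vendor readings are all hypothetical.

```python
# Hypothetical sketch: normalizing per-vendor accelerator metrics into one
# schema. Field names and raw readings are illustrative, not gpusprint's real
# format or any vendor's exact tool output.

def normalize(vendor: str, raw: dict) -> dict:
    """Map a vendor-specific utilization reading onto a common schema."""
    if vendor == "nvidia":          # nvidia-smi style fields (illustrative)
        return {"vendor": vendor,
                "util_pct": raw["utilization.gpu"],
                "mem_used_mb": raw["memory.used"]}
    if vendor == "tenstorrent":     # hypothetical tt-smi style fields
        return {"vendor": vendor,
                "util_pct": raw["core_busy_pct"],
                "mem_used_mb": raw["dram_used_bytes"] // (1024 * 1024)}
    raise ValueError(f"unknown vendor: {vendor}")

readings = [
    normalize("nvidia", {"utilization.gpu": 87, "memory.used": 20480}),
    normalize("tenstorrent", {"core_busy_pct": 62, "dram_used_bytes": 8 * 1024**3}),
]
for r in readings:
    print(f'{r["vendor"]}: {r["util_pct"]}% busy, {r["mem_used_mb"]} MB used')
```

Once every vendor lands in the same shape, dashboards and alerts only have to understand one schema.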
Antoni @antnjbert ·
I ran Gemma 4 on the @tenstorrent Blackhole GPU! And based on a Google search, I'm the first one (or at least the first to write about it).

I was very curious about what it would take for me to buy an open-source RISC-V GPU and run a new open-source model on it. Well, today I've got the "Hello, world" out of Gemma 4.

Regarding complexity, I had to figure out what RoPE (Rotary Position Embedding) and PLE (Per-Layer Embedding) are, how to build and connect tt-inference-server, tt-metal, and tt-vllm locally, and write a lot of shitty code. (Very soon I won't feel this stupid after talking to my smart deep learning friends.)

Bad news - I'm still getting gibberish if the answer is more than ~100 characters, so I messed up somewhere, and I don't even want to measure tokens/second yet, nor have I written any native kernels.

Good news is that I think the product team cares a lot about making Tenstorrent genuinely friendly for software engineers and customers. I chatted with Adam Housman last week, and I think he loves the experience people can get on MacBooks nowadays with LM Studio and would like to push TT even further.

(What I'm genuinely hoping for someday is tt install firmware && tt update firmware && tt run google/gemma-4-E4B-it)
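For readers who hit the same wall, the RoPE mentioned above can be sketched in a few lines: each consecutive pair of vector dimensions is rotated by an angle that grows with the token's position, so attention dot products end up encoding relative offsets. This is a minimal sketch; the dimension count and the `base` default are generic illustrative values, not Gemma's exact configuration.

```python
import math

# Minimal RoPE (Rotary Position Embedding) sketch: rotate consecutive
# dimension pairs of a vector by position-dependent angles. Lower-indexed
# pairs rotate faster, giving each position a distinct rotation pattern.

def rope(vec, pos, base=10000.0):
    """Apply rotary embedding for token position `pos` to vector `vec`."""
    out = []
    for i in range(0, len(vec), 2):
        theta = pos / (base ** (i / len(vec)))  # angle shrinks for higher dims
        x, y = vec[i], vec[i + 1]
        out += [x * math.cos(theta) - y * math.sin(theta),
                x * math.sin(theta) + y * math.cos(theta)]
    return out

# Position 0 leaves the vector unchanged (all rotation angles are zero),
# and rotation never changes the vector's norm.
print(rope([1.0, 0.0, 1.0, 0.0], pos=0))
```

Getting the pairing convention and angles wrong is a classic source of exactly the "gibberish past ~N characters" failure mode, since errors compound with position.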
Google @Google ·

We just released Gemma 4 — our most intelligent open models to date. Built from the same world-class research as Gemini 3, Gemma 4 brings breakthrough intelligence directly to your own hardware for advanced reasoning and agentic workflows. Released under a commercially permissive Apache 2.0 license so anyone can build powerful AI tools. 🧵↓

Antoni @antnjbert ·
@__tinygrad__ If you point it to a sample 2.5 integration and throw in a few prompts, then for sure
the tiny corp @__tinygrad__ ·
Who thinks MiniMax M2.7 can add support for itself?
Antoni @antnjbert ·
@AlexReibman Why use Macs instead of containers in the remote cloud? To access the Apple ecosystem?
Antoni @antnjbert ·
@CyberRobooo The data on the second screen goes to waste because the person does not see the world through the robot’s cameras, and the data from the first screen is only suitable for pre-training
CyberRobo @CyberRobooo ·
Two spacetimes, one data value chain.
>India: workers' headcams record garment-sewing egocentric data that flows to AI/robotics companies.
>China: operators teleoperate humanoid robots in warehouses, sorting packages, validating the system, and collecting real-world training data.
Antoni @antnjbert ·
@pmarca @stratechery @benthompson You can take a smaller model without guardrails, post-train it on Kali Linux tools, and get very similar results. It is not that the model is capable of doing so solely because of its size kali.org/tools/
Marc Andreessen 🇺🇸 @pmarca ·
“This raises an obvious question: how much of Anthropic’s reluctance to make Mythos widely available is due to security concerns, as opposed to the more prosaic reality that Anthropic simply doesn’t have enough compute?” @stratechery @benthompson
Antoni @antnjbert ·
Stripe can deliver payment transaction confirmations right after they happen via webhooks, so we store those transactions and their confirmations in PostgreSQL. There is a good chance you keep them somewhere internally as well, and I believe fetching by transaction ID does not have that hard limit, so you could probably find a way around it
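The store-on-webhook approach above can be sketched as follows. This is a hedged illustration, not a Stripe integration: `sqlite3` stands in for PostgreSQL, and the `txn_id`/`status`/`amount` payload fields are invented for the example rather than Stripe's actual event shape.

```python
import json
import sqlite3

# Sketch: persist each webhook-delivered payment confirmation keyed by
# transaction ID, so later lookups hit our own store instead of a
# rate-limited external list API. sqlite3 stands in for PostgreSQL here.

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE confirmations (txn_id TEXT PRIMARY KEY, payload TEXT)")

def on_webhook(event: dict) -> None:
    """Store the confirmation as soon as the webhook delivers it."""
    db.execute("INSERT OR REPLACE INTO confirmations VALUES (?, ?)",
               (event["txn_id"], json.dumps(event)))

def fetch(txn_id: str) -> dict:
    """Fetch by transaction ID from our own store - no external API limit."""
    row = db.execute("SELECT payload FROM confirmations WHERE txn_id = ?",
                     (txn_id,)).fetchone()
    return json.loads(row[0])

on_webhook({"txn_id": "txn_123", "status": "succeeded", "amount": 4200})
print(fetch("txn_123")["status"])  # succeeded
```

Because writes happen at delivery time, the internal store accumulates the full event history regardless of how the provider caps its list endpoints.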
Ivan Burazin @ivanburazin ·
I connected OpenClaw to Clickhouse, Stripe, QuickBooks, and Brex, and tried to pull some financial data for a board meeting.

Asked: "How much did we spend on compute?" Got the figure.

Next: "Show me all revenue and cross-reference with Stripe." Gave partial data, as Stripe limits the API to 100 events per month (we have thousands).

Brex sent an error: "We can only give you spend through this tool, not revenues." Even our CRM can do everything except upload files through their API.

All the data is still siloed. Even with agents built into each SaaS tool, I still have to cross-reference everything manually. The entire approach is built backwards.
Antoni @antnjbert ·
@kastacholamine That’s you - that’s a great, in-depth article
Kate Stafford @kastacholamine ·
@antnjbert Depends on the program/application - there's always value in experimental data in your part of chemical space on your specific endpoints of interest. I thought the OpenADMET blog on zero-shot prediction was really good and representative of my experience: openadmet.ghost.io/zero-shot-expa…
Antoni @antnjbert ·
@Ginkgo @jrkelly, sounds about right? Are you open to similar partnerships?
Antoni @antnjbert ·
Let me explain what Daniel means here by: "and the AI models that we have built and take generate rapid human proof-of-concept by taking them through these investigator-initiated trials"

Ginkgo's automated labs rapidly generate experimental data, ProQR uses that data to optimize Axiomer oligonucleotides, and the best candidates then move into investigator-initiated trials to generate rapid human proof-of-concept. Axiomer oligonucleotides are programmable RNA-editing drugs.

What they're trying to say is that with automated labs, one could have lots of relatively cheap training data for ML.
Ginkgo Bioworks @Ginkgo ·
In the tech industry, products get faster and cheaper every year. In pharma, the opposite has been true. Earlier today, we announced our partnership with ProQR to accelerate drug discovery with our autonomous lab. Hear ProQR CEO Daniel de Boer and Ginkgo CEO Jason Kelly discuss the implications of this partnership, and why partnerships like this are building a new stack that can get critical medicines to patients faster and cheaper.
Antoni @antnjbert ·
There must be a startup: "We're going to make sure your framework samples are in LLM training datasets."

What helps is asking it to write the plan first and making sure the plan is solid. But the problem is that writing the plan usually means you need to understand the thing deeply and plan nuances ahead, even though the agent is supposed to be smarter.
the tiny corp @__tinygrad__ ·
Racing GPT 5.4 xhigh, Opus 4.6, and Kimi K2.5 adding Gemma 4 support to tinygrad. I gave them each their own GPU on a tinybox red. GPT has E2B working and is on to MoE support, Opus runs but has some bug and is looking at norms and scale, and Kimi messed up adding GGUF bfloat16.
Niko McCarty. @NikoMcCarty ·
ERROR, a Swiss project that pays scientists cash bounties for finding errors in published papers, has only paid out ~$6,900 in prizes after two years. "The project planned to carry out 100 in-depth critiques in 4 years, but only nine have been completed so far," according to reporting in Science. It seems that "money itself is not enough of an incentive" for this, because scientists are often afraid to criticize colleagues, etc. That's a shame.
Antoni @antnjbert ·
@jyoti_mann1 So I guess we're at a stage where we need agents that can consume as many tokens per second as possible. One idea - someone could try regenerating all the icon SVG files across Meta. That might consume at least a few billion
Jyoti Mann @jyoti_mann1 ·
Exclusive: Meta employees are “tokenmaxxing” and competing on an internal leaderboard called “Claudeonomics” for status as a token legend. Over a recent 30-day period, total usage on the dashboard topped 60 trillion tokens.
Antoni @antnjbert ·
OpenAI actually did it for me in the sandbox and gave a proper answer, but I had to manually enable Agent mode, and it took a few minutes to respond. I imagine it will be smaller providers, rather than OpenAI, that give an agent secure access to context, let you configure which tools are available, and allow access to your password manager, credit card details with two-factor authentication, and so on. That seems like the way this will eventually work - not as part of OpenAI chats
Ivan Burazin @ivanburazin ·
@antnjbert i have an agent in a sandbox that does all tasks including this type. Yes
Ivan Burazin @ivanburazin ·
My wife and I asked ChatGPT the same question (separately, in different languages): "Can you get us to Asia from Europe, given current airspace restrictions?"
- same shitty cop-out answer both times
- suggested routes that don't have international flights
- and flights that don't exist at all

I find myself more and more annoyed with chatbots now. It seems LLM providers are trying to be cheap on tokens. The chatbots are just caching answers and shooting them back instead of going to the model and actually thinking. The responses are shorter, less thorough, and the entire interaction feels super lazy.
Antoni @antnjbert ·
Something interesting I noticed is that GPU vendors and frameworks are starting to think about compute less as "a bunch of separate GPUs" and more as "one program over tiles and shards". This shift happened because warp/SIMT-level programming exposed too much hardware detail. For single-GPU work, the stable unit became the tile; for multi-GPU work, it became the shard.

@tenstorrent makes tiles a native unit of compute and layout - a tile is literally a 32x32 block, and tile layout is the efficient path on the hardware.

@nvidia CUDA Tile moves the same thinking into the compiler/programming model - you describe work as tensors / logical tiles, and the compiler maps it onto Tensor Core hardware.

@__tinygrad__ pushes it up to the framework layer - shard tensors and models across devices so multiple GPUs behave more like one distributed tensor program.
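The tile idea above can be made concrete with a small sketch: reshaping a row-major matrix into T x T blocks, the way a 32x32 tile becomes the unit of layout. A 4x4 matrix with 2x2 tiles keeps the example readable; the principle is the same at 32x32. The `to_tiles` helper is purely illustrative, not any vendor's API.

```python
# Illustrative sketch: re-lay a row-major matrix as T x T tiles, the layout
# unit the tweet describes. tiles[tr][tc] is the tile at tile-row tr and
# tile-column tc, itself a small t x t matrix.

def to_tiles(mat, t):
    """Split a square row-major matrix (list of rows) into t x t tiles."""
    n = len(mat)
    assert n % t == 0 and all(len(row) == n for row in mat)
    return [[[row[tc:tc + t] for row in mat[tr:tr + t]]
             for tc in range(0, n, t)]
            for tr in range(0, n, t)]

mat = [[r * 4 + c for c in range(4)] for r in range(4)]  # 0..15, row-major
tiles = to_tiles(mat, 2)
print(tiles[0][0])  # top-left tile: [[0, 1], [4, 5]]
print(tiles[1][1])  # bottom-right tile: [[10, 11], [14, 15]]
```

Once data lives in tile granules, a kernel (or a sharding layer, one level up) can address whole blocks instead of individual elements, which is exactly the abstraction shift described above.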
Jamie Turner @jamwt ·
gemma 4 31B does indeed run on this MBP M5 Max (128GB), but boy is it slow. opus is safe for now
Antoni @antnjbert ·
@__tinygrad__ Would you still build it if you got, let’s say, 3 preorders, or are you already way past that number and I’m underestimating the demand?
the tiny corp @__tinygrad__ ·
exabox preorder is live. we considered raising a round to buy a datacenter, but with exaboxes we don't have to and can build out at our own pace. exaboxes just require a concrete slab and a big plug. tinycorp.myshopify.com/products/exabo…
Antoni @antnjbert ·
@kastacholamine That’s cool. Out of curiosity, do you estimate human or animal ADMET? Do you also have a feedback loop - meaning, do you go through the whole path, synthesize molecules, and actually test them - or is lack of data not even a problem?
Kate Stafford @kastacholamine ·
@antnjbert Yes we do! (at Numerion Labs, formerly known as Atomwise- just realized my bio is out of date lol) Our flagship models are foundation models for virtual screening and binding affinity prediction. We do also have a suite of ADMET models.