Antoni

375 posts

@antnjbert

Build things https://t.co/6RAVq4D5P6 - Using GPUs efficiently https://t.co/IcrYN2zEc5 - first LLM-powered clinic in the EU. Previously Eng + ML infra & k8s at @google & @apple

London, UK · Joined March 2014
271 Following · 67 Followers

Pinned Tweet
Antoni @antnjbert ·
Day 5 of 14: Solving the accelerator utilization problem.

Why are there so many AI chips, and why do companies actually run multiple accelerators? Sticking with one vendor would be so much simpler - one stack, one integration.

OpenAI has its deal with Cerebras, X and Meta are building their own silicon, and Chinese big tech plans to train and run inference on alternative hardware. Google has TPUs, and AWS has been doing this for a while with Trainium and Inferentia. That's not counting the cool stuff SambaNova and Tenstorrent are building. Can't speak of Apple - under NDA. And Microsoft is deploying AMD's chips while developing its own silicon - Maia.

There are also plenty of chips aimed at solving different problems - running models with predictable and extremely low latency, or chips that can work in smart vehicles. But for customers it's complex to adopt a diversified stack. The absolute majority goes with a single vendor for training and a single vendor for inference, and that works quite well.

So that's another problem gpusprint is solving - getting metrics across all of those vendors in a simple, unified format.

Not much interesting stuff today, rather a few routine updates:
- Made up my mind to build a custom frontend
- GitHub Actions now work, building and releasing images for all platforms
- Moved everything to Helm charts, tested and moved to OCI registry
- Improved dashboards

helm upgrade --install gpusprint oci://ghcr.io/antonibertel/charts/gpusprint --namespace gpusprint-system --create-namespace
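The unified-format idea could be sketched as a thin normalization layer that maps each vendor's raw reading onto one common schema. This is only an illustration of the approach, not gpusprint's actual format: the field names, the `normalize` function, and the vendor readings are all hypothetical.

```python
# Hypothetical sketch: normalizing per-vendor accelerator metrics into one
# schema. Field names and raw readings are illustrative, not gpusprint's real
# format or any vendor's exact tool output.

def normalize(vendor: str, raw: dict) -> dict:
    """Map a vendor-specific utilization reading onto a common schema."""
    if vendor == "nvidia":          # nvidia-smi style fields (illustrative)
        return {"vendor": vendor,
                "util_pct": raw["utilization.gpu"],
                "mem_used_mb": raw["memory.used"]}
    if vendor == "tenstorrent":     # hypothetical tt-smi style fields
        return {"vendor": vendor,
                "util_pct": raw["core_busy_pct"],
                "mem_used_mb": raw["dram_used_bytes"] // (1024 * 1024)}
    raise ValueError(f"unknown vendor: {vendor}")

readings = [
    normalize("nvidia", {"utilization.gpu": 87, "memory.used": 20480}),
    normalize("tenstorrent", {"core_busy_pct": 62, "dram_used_bytes": 8 * 1024**3}),
]
for r in readings:
    print(f'{r["vendor"]}: {r["util_pct"]}% busy, {r["mem_used_mb"]} MB used')
```

Once every vendor lands in the same shape, dashboards and alerts only have to understand one schema.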
Antoni @antnjbert ·
I ran Gemma 4 on the @tenstorrent Blackhole GPU! And based on a Google search, I'm the first one (or at least the first to write about it).

I was very curious about what it would take for me to buy an open-source RISC-V GPU and run a new open-source model on it. Well, today I've got the "Hello, world" out of Gemma 4.

Regarding complexity, I had to figure out what RoPE (Rotary Position Embedding) and PLE (Per-Layer Embedding) are, how to build and connect tt-inference-server, tt-metal, and tt-vllm locally, and write a lot of shitty code. (Very soon I won't feel this stupid after talking to my smart deep learning friends.)

Bad news - I'm still getting gibberish if the answer is more than ~100 characters, so I messed up somewhere, and I don't even want to measure tokens/second yet, nor have I written any native kernels.

Good news is that I think the product team cares a lot about making Tenstorrent genuinely friendly for software engineers and customers. I chatted with Adam Housman last week, and I think he loves the experience people can get on MacBooks nowadays with LM Studio and would like to push TT even further.

(What I'm genuinely hoping for someday is tt install firmware && tt update firmware && tt run google/gemma-4-E4B-it)
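For readers who hit the same wall, the RoPE mentioned above can be sketched in a few lines: each consecutive pair of vector dimensions is rotated by an angle that grows with the token's position, so attention dot products end up encoding relative offsets. This is a minimal sketch; the dimension count and the `base` default are generic illustrative values, not Gemma's exact configuration.

```python
import math

# Minimal RoPE (Rotary Position Embedding) sketch: rotate consecutive
# dimension pairs of a vector by position-dependent angles. Lower-indexed
# pairs rotate faster, giving each position a distinct rotation pattern.

def rope(vec, pos, base=10000.0):
    """Apply rotary embedding for token position `pos` to vector `vec`."""
    out = []
    for i in range(0, len(vec), 2):
        theta = pos / (base ** (i / len(vec)))  # angle shrinks for higher dims
        x, y = vec[i], vec[i + 1]
        out += [x * math.cos(theta) - y * math.sin(theta),
                x * math.sin(theta) + y * math.cos(theta)]
    return out

# Position 0 leaves the vector unchanged (all rotation angles are zero),
# and rotation never changes the vector's norm.
print(rope([1.0, 0.0, 1.0, 0.0], pos=0))
```

Getting the pairing convention and angles wrong is a classic source of exactly the "gibberish past ~N characters" failure mode, since errors compound with position.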
Google @Google ·

We just released Gemma 4 — our most intelligent open models to date. Built from the same world-class research as Gemini 3, Gemma 4 brings breakthrough intelligence directly to your own hardware for advanced reasoning and agentic workflows. Released under a commercially permissive Apache 2.0 license so anyone can build powerful AI tools. 🧵↓

Antoni @antnjbert ·
@__tinygrad__ If you point it to a sample 2.5 integration and throw in a few prompts, then for sure
the tiny corp @__tinygrad__ ·
Who thinks MiniMax M2.7 can add support for itself?
Antoni @antnjbert ·
@AlexReibman Why use Macs instead of containers in the remote cloud? To access the Apple ecosystem?
Antoni @antnjbert ·
@CyberRobooo The data on the second screen goes to waste because the person does not see the world through the robot’s cameras, and the data from the first screen is only suitable for pre-training
CyberRobo @CyberRobooo ·
Two spacetimes, one data value chain.
>India: workers' headcams record garment-sewing egocentric data that flows to AI/robotics companies.
>China: operators teleoperate humanoid robots in warehouses, sorting packages, validating the system, and collecting real-world training data.
Antoni @antnjbert ·
@pmarca @stratechery @benthompson You can take a smaller model without guardrails, post-train it on Kali Linux tools, and get very similar results. It is not that the model is capable of doing so solely because of its size kali.org/tools/
Marc Andreessen 🇺🇸 @pmarca ·
“This raises an obvious question: how much of Anthropic’s reluctance to make Mythos widely available is due to security concerns, as opposed to the more prosaic reality that Anthropic simply doesn’t have enough compute?” @stratechery @benthompson
Antoni @antnjbert ·
Stripe can deliver payment transaction confirmations right after they happen via webhooks, so we store those transactions and their confirmations in PostgreSQL. There is a good chance you keep them somewhere internally as well, and I believe fetching by transaction ID does not have that hard limit, so you could probably find a way around it
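The store-on-webhook approach above can be sketched as follows. This is a hedged illustration, not a Stripe integration: `sqlite3` stands in for PostgreSQL, and the `txn_id`/`status`/`amount` payload fields are invented for the example rather than Stripe's actual event shape.

```python
import json
import sqlite3

# Sketch: persist each webhook-delivered payment confirmation keyed by
# transaction ID, so later lookups hit our own store instead of a
# rate-limited external list API. sqlite3 stands in for PostgreSQL here.

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE confirmations (txn_id TEXT PRIMARY KEY, payload TEXT)")

def on_webhook(event: dict) -> None:
    """Store the confirmation as soon as the webhook delivers it."""
    db.execute("INSERT OR REPLACE INTO confirmations VALUES (?, ?)",
               (event["txn_id"], json.dumps(event)))

def fetch(txn_id: str) -> dict:
    """Fetch by transaction ID from our own store - no external API limit."""
    row = db.execute("SELECT payload FROM confirmations WHERE txn_id = ?",
                     (txn_id,)).fetchone()
    return json.loads(row[0])

on_webhook({"txn_id": "txn_123", "status": "succeeded", "amount": 4200})
print(fetch("txn_123")["status"])  # succeeded
```

Because writes happen at delivery time, the internal store accumulates the full event history regardless of how the provider caps its list endpoints.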
Ivan Burazin @ivanburazin ·
I connected OpenClaw to Clickhouse, Stripe, QuickBooks, and Brex, and tried to pull some financial data for a board meeting.

Asked: "How much did we spend on compute?" Got the figure.

Next: "Show me all revenue and cross-reference with Stripe." Gave partial data, as Stripe limits the API to 100 events per month (we have thousands).

Brex sent an error: "We can only give you spend through this tool, not revenues." Even our CRM can do everything except upload files through their API.

All the data is still siloed. Even with agents built into each SaaS tool, I still have to cross-reference everything manually. The entire approach is built backwards.
Antoni @antnjbert ·
@kastacholamine That’s you - that’s a great, in-depth article
Kate Stafford @kastacholamine ·
@antnjbert Depends on the program/application - there's always value in experimental data in your part of chemical space on your specific endpoints of interest. I thought the OpenADMET blog on zero-shot prediction was really good and representative of my experience: openadmet.ghost.io/zero-shot-expa…
Antoni @antnjbert ·
@Ginkgo @jrkelly, sounds about right? Are you open to similar partnerships?
Antoni @antnjbert ·
Let me explain what Daniel means here by: "and the AI models that we have built and take generate rapid human proof-of-concept by taking them through these investigator-initiated trials"

Ginkgo's automated labs rapidly generate experimental data, ProQR uses that data to optimize Axiomer oligonucleotides, and the best candidates then move into investigator-initiated trials to generate rapid human proof-of-concept. Axiomer oligonucleotides are programmable RNA-editing drugs.

What they're trying to say is that with automated labs, one could have lots of relatively cheap training data for ML.
Ginkgo Bioworks @Ginkgo ·
In the tech industry, products get faster and cheaper every year. In pharma, the opposite has been true. Earlier today, we announced our partnership with ProQR to accelerate drug discovery with our autonomous lab. Hear ProQR CEO Daniel de Boer and Ginkgo CEO Jason Kelly discuss the implications of this partnership, and why partnerships like this are building a new stack that can get critical medicines to patients faster and cheaper.
Antoni @antnjbert ·
There must be a startup: "We're going to make sure your framework samples are in LLM training datasets."

What helps is asking it to write the plan first and making sure the plan is solid. But the problem is that writing the plan usually means you need to understand the thing deeply and plan nuances ahead, even though the agent is supposed to be smarter.
the tiny corp @__tinygrad__ ·
Racing GPT 5.4 xhigh, Opus 4.6, and Kimi K2.5 adding Gemma 4 support to tinygrad. I gave them each their own GPU on a tinybox red. GPT has E2B working and is on to MoE support, Opus runs but has some bug and is looking at norms and scale, and Kimi messed up adding GGUF bfloat16.
Niko McCarty. @NikoMcCarty ·
ERROR, a Swiss project that pays scientists cash bounties for finding errors in published papers, has only paid out ~$6,900 in prizes after two years. "The project planned to carry out 100 in-depth critiques in 4 years, but only nine have been completed so far," according to reporting in Science. It seems that "money itself is not enough of an incentive" for this, because scientists are often afraid to criticize colleagues, etc. That's a shame.
Antoni @antnjbert ·
@jyoti_mann1 So I guess we're at a stage where we need agents that can consume as many tokens per second as possible. One idea - someone could try regenerating all the icon SVG files across Meta. That might consume at least a few billion
Jyoti Mann @jyoti_mann1 ·
Exclusive: Meta employees are “tokenmaxxing” and competing on an internal leaderboard called “Claudeonomics” for status as a token legend. Over a recent 30-day period, total usage on the dashboard topped 60 trillion tokens.
Antoni @antnjbert ·
OpenAI actually did it for me in the sandbox and gave a proper answer, but I had to manually enable Agent mode, and it took a few minutes to respond. I imagine it will be smaller providers, rather than OpenAI, that give an agent secure access to context, let you configure which tools are available, and allow access to your password manager, credit card details with two-factor authentication, and so on. That seems like the way this will eventually work - not as part of OpenAI chats
Ivan Burazin @ivanburazin ·
@antnjbert i have an agent in a sandbox that does all tasks including this type. Yes
Ivan Burazin @ivanburazin ·
My wife and I asked ChatGPT the same question (separately, in different languages): "Can you get us to Asia from Europe, given current airspace restrictions?"
- same shitty cop-out answer both times
- suggested routes that don't have international flights
- and flights that don't exist at all

I find myself more and more annoyed with chatbots now. It seems LLM providers are trying to be cheap on tokens. The chatbots are just caching answers and shooting them back instead of going to the model and actually thinking. The responses are shorter, less thorough, and the entire interaction feels super lazy.
Antoni @antnjbert ·
Something interesting I noticed is that GPU vendors and frameworks are starting to think about compute less as "a bunch of separate GPUs" and more as "one program over tiles and shards". This shift happened because warp/SIMT-level programming exposed too much hardware detail. For single-GPU work, the stable unit became the tile; for multi-GPU work, it became the shard.

@tenstorrent makes tiles a native unit of compute and layout - a tile is literally a 32x32 block, and tile layout is the efficient path on the hardware.

@nvidia CUDA Tile moves the same thinking into the compiler/programming model - you describe work as tensors / logical tiles, and the compiler maps it onto Tensor Core hardware.

@__tinygrad__ pushes it up to the framework layer - shard tensors and models across devices so multiple GPUs behave more like one distributed tensor program.
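The tile idea above can be made concrete with a small sketch: reshaping a row-major matrix into T x T blocks, the way a 32x32 tile becomes the unit of layout. A 4x4 matrix with 2x2 tiles keeps the example readable; the principle is the same at 32x32. The `to_tiles` helper is purely illustrative, not any vendor's API.

```python
# Illustrative sketch: re-lay a row-major matrix as T x T tiles, the layout
# unit the tweet describes. tiles[tr][tc] is the tile at tile-row tr and
# tile-column tc, itself a small t x t matrix.

def to_tiles(mat, t):
    """Split a square row-major matrix (list of rows) into t x t tiles."""
    n = len(mat)
    assert n % t == 0 and all(len(row) == n for row in mat)
    return [[[row[tc:tc + t] for row in mat[tr:tr + t]]
             for tc in range(0, n, t)]
            for tr in range(0, n, t)]

mat = [[r * 4 + c for c in range(4)] for r in range(4)]  # 0..15, row-major
tiles = to_tiles(mat, 2)
print(tiles[0][0])  # top-left tile: [[0, 1], [4, 5]]
print(tiles[1][1])  # bottom-right tile: [[10, 11], [14, 15]]
```

Once data lives in tile granules, a kernel (or a sharding layer, one level up) can address whole blocks instead of individual elements, which is exactly the abstraction shift described above.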
Jamie Turner @jamwt ·
gemma 4 31B does indeed run on this MBP M5 Max (128GB), but boy is it slow. opus is safe for now
Antoni @antnjbert ·
@__tinygrad__ Would you still build it if you got, let’s say, 3 preorders, or are you already way past that number and I’m underestimating the demand?
the tiny corp @__tinygrad__ ·
exabox preorder is live. we considered raising a round to buy a datacenter, but with exaboxes we don't have to and can build out at our own pace. exaboxes just require a concrete slab and a big plug. tinycorp.myshopify.com/products/exabo…
Antoni @antnjbert ·
@kastacholamine That’s cool. Out of curiosity, do you estimate human or animal ADMET? Do you also have a feedback loop - meaning, do you go through the whole path, synthesize molecules, and actually test them - or is lack of data not even a problem?
Kate Stafford @kastacholamine ·
@antnjbert Yes we do! (at Numerion Labs, formerly known as Atomwise- just realized my bio is out of date lol) Our flagship models are foundation models for virtual screening and binding affinity prediction. We do also have a suite of ADMET models.