Josef Habdank
200 posts


@caroAugusto18 @nayli_ai True. But this way of working will very quickly disappear. About 10% of my work is what you described, and it will very soon become 0.1%.

@jahabdank @nayli_ai A text editor (IDE) is much better when you're writing or editing code yourself and want to ask AI to do targeted things, e.g. select code and ask AI to fix something, or ask AI to build a function to do something.

@yunta_tsai True. And the data prep and eval test prep is soooo much work.

Many people think any given ML project is 99% training.
In reality, it’s 50% evaluation, 40% data cleaning, 8% integration, and 2% training.
The first two set the noise floor for learning. No ML magic matters; the model cannot lower the noise floor, as that’s the optimal bound of Shannon encoding of your data.
Thus, not a single day goes by without me thinking about ontology. Even the old labels have to be constantly reviewed.
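The noise-floor point can be made concrete with a small simulation. This is only a sketch under a simplifying assumption that is not in the post: binary labels that are flipped independently with a fixed probability. Under that assumption, even an oracle that always knows the true label scores only about 1 - noise_rate against the noisy labels, and the binary entropy of the flip rate gives the bits per label that no model can explain away. The function names are made up for illustration.

```python
import math
import random


def binary_entropy(p: float) -> float:
    """Entropy in bits of a label that is flipped with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def oracle_accuracy(noise_rate: float, n: int = 100_000, seed: int = 0) -> float:
    """Score a perfect model against labels flipped at `noise_rate`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        true_label = rng.randint(0, 1)
        flipped = rng.random() < noise_rate
        observed_label = 1 - true_label if flipped else true_label
        prediction = true_label  # the oracle always knows the true label
        hits += prediction == observed_label
    return hits / n


if __name__ == "__main__":
    # Columns: flip rate, measured oracle accuracy, irreducible bits per label.
    for rate in (0.0, 0.05, 0.10, 0.20):
        print(rate, round(oracle_accuracy(rate), 3), round(binary_entropy(rate), 3))
```

Cleaning the labels lowers the flip rate, which is the only thing here that raises the ceiling; nothing on the training side changes it.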

@petergyang I use all of them, and have them work with each other, prompt each other and use each other as reviewers.
As it is all CLI, why use only one of them?

I used to be a die-hard Claude Code user.
Codex has won me over because:
→ GPT-5.5 is excellent
→ Fast mode + generous limits = more reps
→ Little touches like steering, auto remote control on phone, etc
But most of all Codex's browser and computer use capabilities are simply goated. I built so many workflows relying on those two things alone instead of hunting for APIs.
I still use Claude Code too. The app seems to be getting better and the design and frontend capability of Opus is still much better than GPT. Whenever Fable comes back that's another reason to go back.
Honestly, I hope these two compete forever and other players (Cursor/Grok, Gemini, etc) all stay competitive.
This way the builder keeps winning 🙂

The lack of outrage from #MeToo, #BelieveAllWomen, and all the other feminist movements and their leaders over the findings of the inquiry into the grooming gangs in the United Kingdom says everything you need to know about these movements.
"Believe all women, just not the 250,000" appears to be the new motto.

@owenthcarey God is all powerful, and he is not a slave to any rules, including this one.
But it certainly helps.

@owenthcarey If God is all powerful he surely has the power to change his mind :)

@reallyoptimized You have maybe 1 or 2 weeks a year when it is actually useful. Why install something that you have to fix and pay maintenance for, when you have only marginal use for it?

@M123dTeagan @farzyness Exactly my thought. He is literally asking for a refresh of Model 3 in unboxed architecture. Jeeez, I am sure nobody at Tesla has thought about it.

@farzyness Tesla should make a 2 row version of the cybercab with steering wheels and pedals. They should call it Model 3.

@farzyness Your proposal is effectively 'sell Model Y cheaper, and refresh Model 3 and sell it cheaper' ...
'You will sell more if you sell your product cheaper' is a truly genius insight, they should pay you a trillion dollars for it.

@Mike__eBee @corbin_braun I do not have a Spark (I have a box with a single H100), so I do not know. But from what I have read, if you load a massive MoE model it works well, as the number of active params is in the range of 50B and then they fit in a single machine. But I never tried it.

I was being a bit gruff.
They have loads of cool uses when stacked and the above is true, but you wouldn’t really want to, since this isn't where they shine, and it's slow compared to the memory bandwidth of having multiple cards in the same system. As soon as you load a bigger model across two, it's sharded and everything then incurs communication over the NIC. It's a PITA to manage.
They have a lot of quirks like this that kind of set them apart. There's no separate CPU memory to lean on in the same way you'd have in a traditional server. You'll invariably end up consuming some of that precious GPU memory with things like KV cache, runtime overhead, batching, and other agent / tool stuff. Lots of concessions that don't really make sense.
They shine when linked for running lots of different models and building out local AI infra (that can then "just work" on NVIDIA's infra) rather than trying to cram the biggest possible model onto them.
One can do generation, another embeddings, reranking, training, evaluation, background agents, that kind of thing.
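To get a feel for how much of that shared memory a KV cache can take, here is a back-of-the-envelope sketch. The formula is the usual one for a standard attention cache (keys plus values, per layer, per KV head). The model dimensions below are illustrative assumptions, not figures from this thread, and real runtimes add their own overhead on top.

```python
def kv_cache_bytes(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    seq_len: int,
    batch_size: int = 1,
    bytes_per_value: int = 2,  # assumes a 16-bit cache
) -> int:
    """Rough size of a standard attention KV cache.

    The leading 2 accounts for storing both keys and values.
    """
    return (
        2
        * n_layers
        * n_kv_heads
        * head_dim
        * seq_len
        * batch_size
        * bytes_per_value
    )


if __name__ == "__main__":
    # Hypothetical large-model shape, chosen only for illustration.
    size = kv_cache_bytes(
        n_layers=80, n_kv_heads=8, head_dim=128, seq_len=32_768
    )
    print(f"{size / 2**30:.1f} GiB for one 32k-token sequence")
```

The cache grows linearly with both context length and batch size, so a few concurrent long-context requests can eat a noticeable share of a unified memory pool.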

Finally got my hands on some H100s last week to push the AI alignment research forward.
I've written before about why I think this work matters - making sure that AI does not harm (or neuter) humanity is the most pressing issue of our time - more important than anything else. All of our past conflicts pale in comparison to what misaligned ASI could do to humanity. And the deepest alignment questions turn out to be old philosophical questions (what is good, what is truth, what is beauty) rediscovered from the engineering side.
AI alignment is key. More people should be doing it.
And on the practical side, first runs: training and evaluating custom models - including Qwen 2.5 72B with QLoRA adapters. Training is stable, evals are clean, everything works.
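For context on why a QLoRA-style setup matters at this size, here is a rough weight-memory calculation. It is a sketch under stated assumptions: a nominal 72 billion parameters, base weights stored at 4 bits, and an 80 GB card for comparison. It ignores quantization constants, adapter weights, activations, and optimizer state, so real usage is higher.

```python
def weight_bytes(n_params: int, bits_per_param: int) -> int:
    """Memory needed for the raw weights alone, in bytes."""
    return n_params * bits_per_param // 8


if __name__ == "__main__":
    n_params = 72_000_000_000  # nominal "72B", an approximation
    gpu_bytes = 80 * 10**9     # assumed 80 GB card

    for bits in (16, 8, 4):
        need = weight_bytes(n_params, bits)
        verdict = "fits" if need < gpu_bytes else "does not fit"
        print(f"{bits:>2}-bit weights: {need / 10**9:.0f} GB, {verdict} on one card")
```

At 16 bits the weights alone would need about 144 GB, while at 4 bits they need about 36 GB, which leaves room on a single card for the small trainable adapters and activations.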

Hmmm... I was explicitly told by somebody that you buy some special cables and you can link them....
Quick google says: "you can cluster them for distributed inference on much larger models, effectively using the combined ~256 GB of memory. Nvidia officially supports this via “Spark Stacking” (also called clustering two DGX Sparks). The two systems become a 2-node cluster connected by a high-speed 200 Gb/s RoCE (RDMA over Converged Ethernet) link through their ConnectX-7 SmartNIC ports"
@grok - is it true? Can you stack multiple DGX Sparks to run inference on large models, e.g. to run a DeepSeek V4 MoE model?

@Mike__eBee @corbin_braun Yes they do. You can link Sparks so they work as one cluster. That is one of their key selling points.