Anton Smith

1.3K posts

Anton Smith

@Anton5mith

Stockholm Katılım Temmuz 2010

608 Takip Edilen289 Takipçiler

Sabitlenmiş Tweet

Anton Smith@Anton5mith·23 Şub

“Data be damned” - my rant about the authoritarianism of data in product management. Roast me! link.medium.com/rMGPImE8Snb

English

Anton Smith retweetledi

Meredith Whittaker@mer__edith·1 Kas

“We tested this, Greg!”

English

245

33K

Anton Smith retweetledi

nixCraft 🐧@nixcraft·3 Tem

Just saw a software devloper coding in a cafe -NO Cursor -NO Windsurf -NO DeepSeek -NO ChatGPT -No Google He just sat there typing code manually in vim on his rusty Thinkpad and reading man pages on Arch Linux What a psychopath 🫣

English

366

393

290.8K

Anton Smith@Anton5mith·8 May

🔥 1,000 AI Minecraft Villagers—WTF? | I Am the Cloud #3 open.substack.com/pub/iamtheclou…

English

Anton Smith@Anton5mith·5 May

@sama OpenAI’s A/B testing “which response do you prefer is flawed”. 101 UX. People are trying to get answers - not help you train your product. I rush two the left hand option *EVERY TIME*. You’re getting zero signal. Speak to your PM/UX team.

English

Anton Smith retweetledi

The New Stack@thenewstack·4 Mar

Palette 4.6 brings new features for edge, VMs and user experience from @spectrocloudinc, by @Anton5mith spectrocloud.com/blog/palette-4…

English

455

Anton Smith retweetledi

Andrej Karpathy@karpathy·3 Şub

There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like "decrease the padding on the sidebar by half" because I'm too lazy to find it. I "Accept All" always, I don't read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension, I'd have to really read through it for a while. Sometimes the LLMs can't fix a bug so I just work around it or ask for random changes until it goes away. It's not too bad for throwaway weekend projects, but still quite amusing. I'm building a project or webapp, but it's not really coding - I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.

English

1.4K

3.6K

33.6K

6.9M

Anton Smith retweetledi

Andrej Karpathy@karpathy·27 Oca

I don't have too too much to add on top of this earlier post on V3 and I think it applies to R1 too (which is the more recent, thinking equivalent). I will say that Deep Learning has a legendary ravenous appetite for compute, like no other algorithm that has ever been developed in AI. You may not always be utilizing it fully but I would never bet against compute as the upper bound for achievable intelligence in the long run. Not just for an individual final training run, but also for the entire innovation / experimentation engine that silently underlies all the algorithmic innovations. Data has historically been seen as a separate category from compute, but even data is downstream of compute to a large extent - you can spend compute to create data. Tons of it. You've heard this called synthetic data generation, but less obviously, there is a very deep connection (equivalence even) between "synthetic data generation" and "reinforcement learning". In the trial-and-error learning process in RL, the "trial" is model generating (synthetic) data, which it then learns from based on the "error" (/reward). Conversely, when you generate synthetic data and then rank or filter it in any way, your filter is straight up equivalent to a 0-1 advantage function - congrats you're doing crappy RL. Last thought. Not sure if this is obvious. There are two major types of learning, in both children and in deep learning. There is 1) imitation learning (watch and repeat, i.e. pretraining, supervised finetuning), and 2) trial-and-error learning (reinforcement learning). My favorite simple example is AlphaGo - 1) is learning by imitating expert players, 2) is reinforcement learning to win the game. Almost every single shocking result of deep learning, and the source of all *magic* is always 2. 2 is significantly significantly more powerful. 2 is what surprises you. 2 is when the paddle learns to hit the ball behind the blocks in Breakout. 2 is when AlphaGo beats even Lee Sedol. And 2 is the "aha moment" when the DeepSeek (or o1 etc.) discovers that it works well to re-evaluate your assumptions, backtrack, try something else, etc. It's the solving strategies you see this model use in its chain of thought. It's how it goes back and forth thinking to itself. These thoughts are *emergent* (!!!) and this is actually seriously incredible, impressive and new (as in publicly available and documented etc.). The model could never learn this with 1 (by imitation), because the cognition of the model and the cognition of the human labeler is different. The human would never know to correctly annotate these kinds of solving strategies and what they should even look like. They have to be discovered during reinforcement learning as empirically and statistically useful towards a final outcome. (Last last thought/reference this time for real is that RL is powerful but RLHF is not. RLHF is not RL. I have a separate rant on that in an earlier tweet x.com/karpathy/statu…)

Andrej Karpathy@karpathy

DeepSeek (Chinese AI co) making it look easy today with an open weights release of a frontier-grade LLM trained on a joke of a budget (2048 GPUs for 2 months, $6M). For reference, this level of capability is supposed to require clusters of closer to 16K GPUs, the ones being brought up today are more around 100K GPUs. E.g. Llama 3 405B used 30.8M GPU-hours, while DeepSeek-V3 looks to be a stronger model at only 2.8M GPU-hours (~11X less compute). If the model also passes vibe checks (e.g. LLM arena rankings are ongoing, my few quick tests went well so far) it will be a highly impressive display of research and engineering under resource constraints. Does this mean you don't need large GPU clusters for frontier LLMs? No but you have to ensure that you're not wasteful with what you have, and this looks like a nice demonstration that there's still a lot to get through with both data and algorithms. Very nice & detailed tech report too, reading through.

English

363

2.1K

14.3K

2.4M

Anton Smith retweetledi

Trung Phan@TrungTPhan·27 Oca

Semiconductor fund managers after seeing 4 memes about DeepSeek on X

The Kobeissi Letter@KobeissiLetter

Chip Stocks Overnight Reaction to DeepSeek: 1. Arm, $ARM: -5.5% 2. Nvidia, $NVDA: -5.3% 3. Broadcom, $AVGO: -4.9% 4. Super Micro, $SMCI: -4.6% 5. Taiwan Semi, $TSM: -4.5% 6. Micron, $MU: -4.3% 7. Qualcomm, $QCOM: -2.8% 8. AMD, $AMD: -2.5% 9. Intel, $INTC: -2.0% US markets are on track to erase over $1 trillion of market cap in Monday's session. All as earnings, tariffs, and the Fed meeting are in the spotlight. It's going to be another wild week.

English

1.2K

9.7K

1.5M

Anton Smith@Anton5mith·3 Oca

@TracketPacer Unmanaged switches can do it too (not only hubs). I also noticed people that are quite good at networking in general didn’t know this.

English

TracketPacer@TracketPacer·2 Oca

today i found out how many tech professionals on tiktok think vlans are a physical thing on a link & that if a hub were attached to a trunk port it would only be able to show a single vlan

English

144

1.1K

72.6K

Anton Smith@Anton5mith·13 Ara

@unclecode Oh I see you’re saying get links then work through them for the next pages. Hmm. Yeah. I can try :p but I feel like I’ll be duplicating some of what you’re already planning ;)

English

Anton Smith@Anton5mith·13 Ara

@unclecode @unclecode you are awesome. I ended up hitting sitemap.xml on the site that I’m interested in but it feels fragile as it’s manually updated. Can I use crawl4ai already today to get the list of all pages? Or any particular tool you’d recommend?

English

Anton Smith@Anton5mith·12 Ara

@unclecode I love crawl4ai but for the life of me can't find anything to actually crawl an entire site in the docs. Possible that I'm blind?

English

248

Anton Smith retweetledi

Alex Cheema@alexocheema·9 Kas

M4 Mac Mini AI Cluster Uses @exolabs with Thunderbolt 5 interconnect (80Gbps) to run LLMs distributed across 4 M4 Pro Mac Minis. The cluster is small (iPhone for reference). It’s running Nemotron 70B at 8 tok/sec and scales to Llama 405B (benchmarks soon).

English

591

2.5K

25.3K

3.5M

Anton Smith@Anton5mith·19 Eki

@TracketPacer Btw never seen these on network cables - only on TV AC power cables. So what gives

English

Anton Smith@Anton5mith·19 Eki

@TracketPacer Ok so I could use ChatGPT. But I’m not gonna. As a white man network dude, can you please explain to me wtf these do? No sarcasm, serious Q

English

TracketPacer@TracketPacer·17 Eki

ferrite

Euskara

440

62.7K

Anton Smith@Anton5mith·15 Eki

@sirdeh I don’t know what this means but I’m still liking it

English

211

Anton Smith retweetledi

Spectro Cloud@spectrocloudinc·14 Eki

Explore Palette 4.5 in @Anton5mith' blog highlighting: 1️⃣ Extended edge capabilities – manage your #K8s at any scale 2️⃣ LocalUI now supporting multi-node and connected clusters 3️⃣ New deployment model – more flexibility for your existing infrastructure 🔗 hubs.la/Q02TjQFc0

English

112

Anton Smith@Anton5mith·13 Eki

@kelseyhightower People who would never do X will do X when they get desperate enough.

English

Anton Smith@Anton5mith·13 Eki

@kelseyhightower It’s a good question but it’s easy to interpret this in negative ways for people where it gets grey really quickly. It’s not nuanced enough. What should I think about a scammer? And then learn they’re using the scamming to help someone else? Maybe feed children?

English

263

Anton Smith@Anton5mith·4 Eki

@j_p_catanzaro @danieldibswe Bingo. The client code is what needs to do both the cryptographic verification of the content, but also needs to check the dates. There’s no magic cryptography that itself knows about dates. @danieldibswe I think you’re observing a bug in a client.

English

Joe Catanzaro@j_p_catanzaro·4 Eki

@danieldibswe I think it might depend on the validation being used by the system in question. If all it does is validate the root and the expiry date it *may* still show as valid

English

158

Daniel Dib@danieldibswe·4 Eki

The validation of certs seems to be somewhat of dark arts. If you have a cert that was signed by an intermediate issuing CA, and issuing CA cert had expired, but root CA cert still valid, what would you expect to happen? It seems if entire cert chain is sent, it still works.

English

Keşfet

@sama @spectrocloudinc @TracketPacer @unclecode @exolabs @sirdeh @kelseyhightower @elonmusk