1i (@1i__is) - Twitter Profile | Zamantika Mersobahis Locabet

Pinned Tweet

1i@1i__is·21 Nis

All my 𝕏 articles! Multiple formats: 📰 Full version [𝕏] 🔥 Abridged ⇒ go fast! 😼 GitHub repo (all + code & data) 📚 SolveIT notebook dialog Ⓜ️ Markdown (AI-friendly) Full versions form the main sequence below. For each article, check replies to find other formats!

1i@1i__is·8m

the entry ticket will keep falling, the capabilities will keep growing. bide your time, then buy cheap second hand (most enterprise/prosumer gear is -75% after 4-5 years). gets you 50~80% there in perf for 20~50% of the cost. thinkpads are the typical example for laptops. silicon doesn't age, perf stays good as new to the last day. just get good at servicing parts (cleaning with air duster + isopropyl alcohol, changing fans, repasting, etc). i've mostly been freeloading on cheap enterprise refurbs since 2016, when i discovered that trick. cuts your bill in half compared to new consumer SKUs. no regrets, you just get better hardware (and skills) in the process.
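A quick sketch of the arithmetic behind the "50~80% there in perf for 20~50% of the cost" claim. The function name `perf_per_cost` is made up for illustration, and the inputs are only the endpoints of the ranges quoted in the tweet, not measurements.

```python
def perf_per_cost(perf_fraction: float, cost_fraction: float) -> float:
    """Performance per unit of money, relative to buying comparable hardware new."""
    if cost_fraction <= 0:
        raise ValueError("cost_fraction must be positive")
    return perf_fraction / cost_fraction


# Endpoints of the ranges quoted in the tweet (assumed, not measured).
worst_case = perf_per_cost(0.5, 0.5)  # least perf kept, most money spent
best_case = perf_per_cost(0.8, 0.2)   # most perf kept, least money spent

print(f"value vs. new: {worst_case:.1f}x to {best_case:.1f}x")
```

Under those assumptions, second-hand hardware lands somewhere between break-even and roughly four times the performance per unit of money.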

Grey Whitepaper@WhitepaperGrey·1h

@1i__is I live on the cloud but I long for dedicated hardware and open source LLMs..

1i@1i__is·2d

this is the way, and it's only going to get better as GB capacity increases and model reqs decrease. on-prem/MSP self-hosted is still how most SMBs want to operate for their most important data. big clouds are a no-no for mission-critical flows. this is only the beginning!

Dee@dee_hw

On-Premise Business AI Center
After my posts on the 2-GPU and 4-GPU builds, people reached out asking how to build an 8-GPU box for their businesses.
Why?
- Protect their IP
- Protect customer data
- Save on inference costs
- Train their own models
Here's how to build one: 🧵

1i@1i__is·17m

@WhitepaperGrey they tried a bit with Windows Home Server (2007-2009) but that failed hard. you come with this when Apple is on the iPhone… FWIW Gates left in 2008. bad times at MS. they should've tried in 2000. back then i had two PCs with one file serving, proxy, etc and it was glorious.

Grey Whitepaper@WhitepaperGrey·1h

@1i__is Should have been a railway all along to sell consumers their own server farms. Microsoft and so many others never saw it coming.

1i@1i__is·2d

been saying it for decades: everyone should have a home server to perform whatever their needs are. now it includes GPUs for AI (+ 3D/video processing for gaming etc.). it's a neat way to get great private performance while thinning the clients' footprint (heat/noise & total bill).

the tiny corp@__tinygrad__

@APompliano We need stacks of GPUs in every house, not really big stacks of GPUs controlled by companies who are trying to extract value from us.

1i@1i__is·5h

this is exactly what drives me too. i want to congratulate but above all **thank** you for making it this way. it's like hope with substance, real-world proof that it can be done, and well. it really makes a difference when i have doubts, second-guess it all. like a good reality-check: "i'm not hallucinating, so and so did it, are doing it, rn!" and it's positively raw energy when i lock in. i get that from you and the few others who actually ship, write, share, sell, teach… whatever they call success. examples to follow are worth a million words. i wish you the absolute best (and to your family!), sincerely, you deserve it.
---
as for your beef with Ahmad, i won't be part of that, but it's obvious there's too much divergence between you two (in style, goals, values…). how to say… i'd never put you both on the roster haha. it's ok too. if we all take good care of *our own* world (people, places, and things we love and know first-hand), then "the" world is a much better place and we don't even need to all get along for that. have a great one and KISS your 6000s for me 🤓🤖

0xSero@0xSero·5h

It’s hard to LARP your way to the top. I am also really open to people telling me I am wrong, I am learning and I make mistakes. That’s what life is about, growing and doing what you enjoy. Winning at it is only a perk

0xSero@0xSero·5h

3 months ago I blocked this guy and he made such a scene about it that to this day people still ask me why. The reason I did so is because despite him hyping me up, he’d constantly be writing about how I’m a larper.
Now my “larping” has resulted in:
- Meeting folks at Nvidia
- Meeting folks at OpenAI
- Working with Factory
- Teaching 1000s of people
100+ GitHub repos:
- day 0 deepseek-v4-flash on sm120
- best performance on Framework
- REAP-MLX + REAP-Strix
- VLLM-STUDIO nearly 1k stars
- GLM-4.6/4.7 on a MacBook
- Qwen-3.5-plus for 8x 3090s
- Parchi
- AI-data-extraction 1k stars
- First working turboquant on vLLM
4 months ago:
- Interned at a large AI company
- Produced 15+ models with 100k monthly downloads
- Created a discord server and taught nearly 1000 people for free (still doing it)
6 months ago:
- Released the first REAP quants
- Sponsored by Anthropic running Claude code Warsaw with 500+ attendees
- Trained nanochat at home
12 months ago:
- Built my first rag self hosting on a MacBook funny enough
- Spending 5-10k a month in tokens on random product buildings
18 months ago:
- Taught a 200 person course (for free) how to use AI for coding
24 months ago:
- Applied research for the Ethereum foundation on ZK proofs
- Built Rosetta Node a solidity <> English translator built on OpenAI

1i@1i__is·5h

@chimpansky don't threaten me with an epic time haha 🤝

Chimpansky@chimpansky·6h

@1i__is haha deal 🤝 czech food + AI conversations is a dangerous combo.

Chimpansky@chimpansky·1d

As we keep growing, I want to better understand this little AI corner. Where in the world are you building from? 🌍

1i@1i__is·6h

if you know Rust, sure. but it would be much harder for me to maintain code in Rust than in Python or Go for instance. the memory mgmt overhead is way too much for most trivial problems, there's a place for all levels of abstraction. Python and many languages like JS have C/C++ bindings under the hood, so fast enough. i'm a big believer that the "best" language is the one *you* know that fits the job. and that for most apps, unless you're hyperscaling or something, if your code is well-designed you will not notice the difference on modern hardware. i don't see how LLMs make any difference except for vibe code, throwaways, things you need in passing but don't intend to maintain. the worst would be using LLMs to write Rust because it's fast without any idea what the code does. i think the growth of the developer matters more at the end of the day for the app than choosing this or that tech/stack.

dawon 🇺🇸@_imdawon·6h

For the vast majority of software that people are writing, i don't see many excuses to use anything other than Rust.

wavefnx@wavefnx

Thought this was a joke 1mil lines Bun commit re-writing Bun in Rust They learned Bun is now officially the good side of history

1i@1i__is·7h

@chimpansky well if we ever visit 🇨🇿, i know who I'd gladly take out for a great dinner and a chat! :)

Chimpansky@chimpansky·7h

@1i__is Yeah i love it so far

1i@1i__is·7h

yeah the energy is awesome. haven't seen this since 1999 (literally), the early web era (and I was still a teen so this is much bigger to me). AI has unleashed me, seals removed! i feel empowered enough to take on dream projects, problems i've been thinking about for 10 or 20 years… with tech that was sci-fi 5 minutes ago… it's so cool and so inspiring.

Chimpansky@chimpansky·7h

@1i__is 🙏 this is honestly one of the coolest parts of this whole AI wave. people aren’t just building products. some are genuinely trying to build a different life through it.

1i@1i__is·7h

@chimpansky that's a very fine duo of places!

Chimpansky@chimpansky·7h

@1i__is Most of the time Czech Republic, sometimes Bay area.

1i@1i__is·13h

@Snixtp 24 tok/s each is really awesome at that concurrency! this ties back to a chat i've had earlier about a hypothetical shared b300 node, but your numbers seem to hint at more Blackwell or Qwen optimizations than the paper specs suggest. so many things to test! x.com/i/status/20548…

Adam Louly@LoulyAdam

So to give you a detailed and very close number I'd have to dig into the architecture more, and see how much KV each token holds and how big the weights are. I'll give a general rule for how to approach that, and you can refine it if you dig more into the model specs.
The number of concurrent users relies heavily on how much memory you have left for the KV cache, so you need to compute how much memory is allocated for fixed stuff and how much is left for KV.
For example, take Kimi in fp8. To make the math easy, 1T params x fp8 (1 byte each) = 1TB allocated for the weights. For int4 or nvfp4 it would be about 520GB.
For the activation workspace you'd need like 20-30GB. This usually depends on your max possible prefill batch: you just multiply batch tokens x hidden dim x fp8 or fp16 activation size x number of intermediate tensors.
Keep like 2% for CUDA overhead and another 4% for safety margins, and you're left with like 1.2TB of memory for KV.
Example with MHA (MLA is much, much more efficient, maybe less than 10% of this number): for MHA, compute KV per token = 2 x n layers x num heads x head dim x fp8 = 1MB (for easy math).
Now you can get a sense of how many concurrent users you can serve. If you have 1.2TB left, at 128k context it's like 128GB per user, so you'll be able to serve about 10 concurrent users; at 1M context you'll be able to serve a single user. With MLA the ratio is 10x give or take, so 100 users with 128k context and 10 users with 1M context.
Keep in mind that this is the laziest calculation, I just rounded everything up for easy math. Depending on SLAs you could serve way more than this, but maybe with slower TBT.
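The napkin math above can be sketched in a few lines of Python. This is only an illustration under the thread's own rounded figures (about 1.2 TB left for KV, about 1 MB of KV per token for MHA, roughly 10x less for MLA). The function names and the example model shape are hypothetical and do not describe any real model.

```python
def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int, bytes_per_elem: int) -> int:
    """KV cache bytes for one token with standard multi-head attention."""
    # Factor 2: each layer stores one key vector and one value vector per head.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def max_concurrent_users(kv_budget_bytes: int, context_tokens: int, bytes_per_token: int) -> int:
    """How many full-length contexts fit in the memory left for the KV cache."""
    return kv_budget_bytes // (context_tokens * bytes_per_token)


# Hypothetical shape, chosen only so the result lands on 1 MiB per token in fp8.
example_kv = kv_bytes_per_token(n_layers=64, n_kv_heads=64, head_dim=128, bytes_per_elem=1)

KV_BUDGET = 1_200_000_000_000        # ~1.2 TB left for KV, as in the thread
MHA_PER_TOKEN = 1_000_000            # ~1 MB per token, the thread's rounded figure
MLA_PER_TOKEN = MHA_PER_TOKEN // 10  # the thread's "10x give or take" for MLA

for label, per_token in (("MHA", MHA_PER_TOKEN), ("MLA", MLA_PER_TOKEN)):
    for ctx in (128_000, 1_000_000):
        users = max_concurrent_users(KV_BUDGET, ctx, per_token)
        print(f"{label} ctx={ctx:>9,} tokens -> {users} users")
```

Without rounding, this gives 9 and 1 users for MHA and 93 and 12 for MLA, in line with the thread's rounded 10, 1, 100 and 10. It assumes every user fills the whole context, so real capacity depends on the actual mix of context lengths.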

Espen JD@Snixtp·14h

The concurrency on the Pro 6000 is just crazy cc=96 2296.5 tok/s
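The "24 tok/s each" figure quoted in the reply follows from dividing the aggregate throughput by the concurrency. A minimal sketch, assuming the load is spread evenly across streams (the tweet does not say so); `per_stream_tok_s` is a hypothetical helper name.

```python
def per_stream_tok_s(aggregate_tok_s: float, concurrency: int) -> float:
    """Average decode speed seen by each concurrent request."""
    if concurrency <= 0:
        raise ValueError("concurrency must be positive")
    return aggregate_tok_s / concurrency


# Numbers from the tweet: cc=96, 2296.5 tok/s in total.
rate = per_stream_tok_s(2296.5, 96)
print(f"{rate:.1f} tok/s per stream")
```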

1i@1i__is·14h

@LoulyAdam @MainzOnX thank you sooo much for this detailed answer! this really helps a lot! i'm gonna study this deeper :D

Adam Mainz@MainzOnX·2d

If you had access to a gb300 server rack right now what are you building with it? Would you know where to start?

1i@1i__is·17h

@_imdawon get well soon, champion

dawon 🇺🇸@_imdawon·18h

Tummy hurty

1i@1i__is·17h

yeah of course :) for now it's really just a thought experiment. but i may turn it into a real experiment with rentals first to test the waters, a PoC. maybe this year or 27. the big LLM case would be something like Kimi K2.6 in FP8 (both weights and kv) at max context (256k). Or even DeepSeek V4-Pro (FP8, weights 1.6T, 49B active, up to 1m ctx lol) (HF links below). workload would be partially decided by performance i guess. if people can hammer it 24/7 with lots of agents then cool. but at the baseline it's typical "manual" LLM usage, with as much batching as possible for things that can wait. the goal is to minimize idling as usual, and it may influence what kind of workload and crowd to spec such a project for. there are so many variables left to test, so the guesstimate has to be very fuzzy, that's my problem :D i'm fishing for expertise tbh, if only to learn the problem. lots of experiments needed to optimize. and with a tiny userbase, it won't be smooth 24/7. a bit more scale would surely help make UX really better. but baby steps, right? :D huggingface.co/deepseek-ai/De… huggingface.co/moonshotai/Kim…

Adam Louly@LoulyAdam·19h

It would not be an informed guess if I were to just answer from the get go :)) many things contribute to that. I’d want to know the size of the model, how many active MoE params, the weights dtype we’re using, the kv cache dtype, which serving optimizations we have, the workload type, etc… before I can give a guess (and it still won’t be a perfect guess)

1i@1i__is·19h

@aijoey @sudoingX indeed, and for years he sent literally thousands of engineers pro bono for months on end to customers' offices, to improve CUDA for their needs. all upstreamed to drivers for all customers to benefit come the next update. that's a business "moat" if there ever was one!

Joey@aijoey·19h

@sudoingX i saw an interview where he explains how he made the decision to put cuda on every gpu even though they were being used for gaming. it was a big bet. but once devs caught on, cuda was there and ready. super cool history tbh.

Sudo su@sudoingX·19h

cuda enabled more wealth and innovation than most companies ever will. all running on jensen's baby. one software layer changed everything.

1i@1i__is·20h

@LoulyAdam @MainzOnX Right? :D Also, given your exp with inference, I can't resist asking: I'd be super interested in your ballpark/napkin/informed guesses for the kind of concurrent userbase you can serve with such an 8x node. I've no idea personally…

Adam Louly@LoulyAdam·1d

@1i__is @MainzOnX This is neaat!

1i@1i__is·20h

thanks for asking! i've replied in the main (quoted) thread. the gist is: 100%. You learn to code for the thinking. - Code is notation for thought. - AI is a lever: it applies force in whatever direction you choose. So our job is to choose well and learn so we can choose better next time. Vibe coding feels productive but often isn't: absorption without growth. The bottleneck was always cognition, not typing. It still is. Understanding compounds; AI can't shortcut that. If you outsource the thinking, you stop upskilling. Learn to code because it makes you a better thinker. The tool just amplifies *you!* : )

Chimpansky@chimpansky·2d

ai can write code, but you still need to know what good looks like, where it’s likely wrong, and how to debug when the happy path breaks. Interested in your thoughts @1i__is @henrytdowling @PalmsBurnt @DanielSmidstrup

Chimpansky@chimpansky

be honest: if AI writes most of the code now, is learning to code still worth it?
• yes, fundamentals still matter
• only enough to direct AI
• no, product thinking matters more
• I never really learned
curious how people actually think about this long term. P.S. my real answer is hidden somewhere in the image, can you find it? 👀 vote or drop your take below. 👇

1i@1i__is·20h

i think that fundamentals matter forever, knowing reality is "evergreen" lol. some things just don't change. i wrote about this a few months ago, haven't published it (felt long, needed work). but it does answer in full how i see this. the gist is you can level up with the LLM (thus will have to, because others will) including in programming. but there's also a "last mile" kinda wordpress-easy future for many more people (which is cool, lower barrier of entry). some ideas i like:
> The bottleneck was always cognition; it still is. So I strive to amplify myself first.
> The LLM is a lever. It applies force in whichever direction I choose. So choose well.
> AI is a shift of skills, not a replacement for sound engineering principles, let alone thinking.
Pics with excerpts relevant to this discussion. + link for those who want to read it in full. (last sections below "What is beauty" are more general about AI; above is about how I use notebooks more specifically) share.solve.it.com/d/8d5d18c6213f…

1i@1i__is·21h

yes, that space in-between, exactly, and i believe that's one way through the bottleneck you describe. being in the field with experienced people is how we've always trained the best craftsmen. in software world tho, the field is more like SSH + comms, so remote people can team up. job profiles do change fast before/after disruption by tech. there's so much i see already that will become the basics. yet most people are still very fuzzy about most of it (i'll plead guilty to that, the full-stack AI vertical is very big, and its layers aren't converging/streamlining yet. it's the far west). whatever model we'll use is, i agree, of little concern in the big picture (even if progress stopped now, which it won't).

Chimpansky@chimpansky·1d

really appreciate the huge reply. honestly the apprenticeship/guild part resonates with me more than most conversations around AI right now. a lot of people are acting like access to models is the bottleneck, but i increasingly think the bottleneck is judgment, taste, and learning from people who’ve actually deployed things in messy real environments. forums/social feeds are great for discovery, but terrible for transmitting deep operational knowledge. schools are often too slow and abstract. there’s probably space for something in between.

1i@1i__is·1d

@dee_hw thank you! you guys are simply great 🙏💪🖖
