Greg Sky

92 posts

@regsky_

GPU assembly and rework in the USA. Repairing NVIDIA cards since 2021

Grand Rapids, MI Joined November 2024

13 Following 17 Followers

Pinned Tweet

Greg Sky@regsky_·24 May

New website for GPU upgrades, repairs, and conversions is out: gpulab.net. Specializing in NVIDIA 48GB 4090 upgrades and datacenter card SXM repair/PCIe conversions

173

Greg Sky@regsky_·15h

@browomo yeah, didn't we all try this on day one of DGX Spark? What changed, or are y'all running at low TPS?

299

Blaze@browomo·1d

This Chinese developer linked two $2,999 NVIDIA DGX Sparks into one box and runs the full Qwen3-235B at home, after dropping his $1,999-a-month cloud bill to zero.

He wired 2 small boxes into a single computer, split a giant 235-billion-parameter model in half between them, and serves it across his own network at about 10 tokens a second, with no internet, no cloud, right there on the desk.

No data center, no thousand-dollar graphics cards, no monthly cloud bill. Just him, 2 gold boxes the size of a sandwich, one cable between them, and 1 power strip.

And here is the whole payoff. He used to pay the cloud $1,999 a month for the same model, and the meter ticked on every request. Now he paid $5,998 once for 2 boxes, they covered their cost in 3 months, and after that he sends as many requests as he wants for free, only electricity.

The two Sparks talk over one fast cable, each holds 128GB of memory, and together they carry the whole model, about 73GB loaded per box, with the chip inside pinned near the limit at 96%. Both boxes work as one and keep trading data over the cable, with no cloud in the loop and no single word leaking out. The ready model sits on one local address, and any app on his network calls it as easily as ChatGPT.

And here is how he described, in plain words, what this pair of boxes does:

"this is a pair of boxes that holds the huge Qwen3-235B model and serves it to one network. the model is split in half, and each box owns its half.

parts:
// Box 1 (holds the first half of the model and starts the answer fast, the first word appears in under a second)
// Box 2 (holds the second half and writes out the rest, about 10 tokens a second)
// Cable (connects the 2 boxes and moves data between them on every step, with no lag)
// Address (one local address where any app sends its request, like to a cloud model)
// Test (a script that runs big prompts through and measures speed and delays)
// Monitor (checks temperature, power draw, and load on both boxes every 2 seconds).

the model never goes to the cloud. he only steps in when a box runs hotter than 80 degrees or the cable between them starts dropping data."

So the system knows exactly what it is, what it is for, and where its limits are. It knows it has to hold the whole huge model across 2 boxes on its own. It knows it has to answer every request locally, with no meter, no limits, and no internet. It knows the human is only needed when a box overheats or the link between them stalls.

→ The setup runs around the clock on 2 boxes, each pulling under 60 watts
→ However many requests he sends, the monthly bill is $0, only electricity
→ The first box starts the answer in under a second
→ The second writes text at about 10 tokens a second
→ One request at a time: 838 tokens in 85 seconds, first word in 0.8s
→ Two requests at once: 697 tokens in 108 seconds, first word in 0.7s
→ Both boxes sit at 96% load and warm up to 76-78 degrees

And only when a chip in a box runs hotter than 80 degrees or the cable between the 2 Sparks drops data does the system call the owner.

And when he himself is out on a run or in a coffee shop, he still reaches his own model at home from his phone: sends a big prompt to the local Qwen3-235B, gets the full answer back in under a minute and a half, with no token meter ticking and no limit to hit.

Here is what the test shows on his screen during one of the night runs:

"one request at a time: 838 tokens in 84.9 seconds, first word in 0.8s, then 0.1s per token."
"two requests at once: 697 tokens in 107.6 seconds, first word in 0.7s, then 0.15s per token."
"Box 1: chip at 96% load, 76 degrees, 56 watts, 73GB used in memory."
"Box 2: chip at 96% load, 78 degrees, 56 watts, the Qwen3-235B model fully loaded."

And while everyone around is paying for AI by the month and bumping into limits, his top-tier model just sits on the desk and works as much as he wants: his own little power plant instead of a forever meter.

He has no server rack of his own and no cloud account behind it. Just 2 DGX Spark boxes on a desk, one model split in half between them, one local address, and a folder of prompts next to it.

Out of everything I have seen this year, this is the cleanest way to stop paying for AI: $5,998 of hardware on the desk once, $0 a month to the cloud, unlimited forever, and between them 2 gold boxes, 1 cable, and the full Qwen3-235B answering at home with no internet.
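The numbers quoted in the post can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, the inputs are the figures from the post rather than independent measurements, electricity is ignored, and it assumes the 697-token figure in the two-request run refers to a single stream.

```python
# Figures quoted in the post above; nothing here is measured independently.
HARDWARE_COST_USD = 2 * 2999       # two DGX Sparks at $2,999 each
CLOUD_BILL_USD_PER_MONTH = 1999    # the cloud bill the post says was replaced


def decode_rate(tokens, total_seconds, first_token_seconds):
    """Tokens per second once the first token has appeared."""
    return (tokens - 1) / (total_seconds - first_token_seconds)


def payback_months(hardware_cost, monthly_bill):
    """Months of cloud spend equal to the one-time hardware cost (electricity ignored)."""
    return hardware_cost / monthly_bill


single = decode_rate(838, 84.9, 0.8)    # one request at a time
double = decode_rate(697, 107.6, 0.7)   # assumed: one of two concurrent streams
months = payback_months(HARDWARE_COST_USD, CLOUD_BILL_USD_PER_MONTH)

print(f"single request: {single:.2f} tokens/s")
print(f"two requests:   {double:.2f} tokens/s per stream")
print(f"payback:        {months:.2f} months")
```

Under those assumptions the single-request rate lands close to the "about 10 tokens a second" claim, the concurrent run is noticeably slower per stream, and the payback period works out to roughly three months.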

338

82.7K

Greg Sky@regsky_·15h

@topvint I can't wait to start doing this to PCs and consoles.

126

Topvint@topvint·1d

Long time no see, postfix adapter...

333

13.1K

Greg Sky@regsky_·16h

@birdabo maybe they are wise enough to know that if you're on top, you are the main target. "highest nail gets the hammer"

1.9K

sui ☄️@birdabo·1d

google fell off. harvested 20 years of humanity’s data and still can’t crack the top 3 in the AI race. smh.

270

3.1K

162.6K

Greg Sky Retweeted

MobstersDaily@MobstersDaily·1d

Tony Soprano Tries ChatGPT

420

4.9K

379.5K

Greg Sky@regsky_·17h

Excellent video by @cmuratori talking about the lack of responsibility of tech giants (over the last 2-3 decades) sitting on their thrones telling a generation of people now poisoned by the products they actively developed to just forget all that and keep the 'system' going. I think it's great to see these young people booing Eric on stage; it proves that the tech giants don't have a stranglehold like they think they do, and that perhaps the pendulum is starting to swing the other way. The amount of private equity going into datacenters proves SaaS will have a lesser value, so all your cloud drives, platforms, everything they used to 'control' you may start to unravel from the giants' control, so they will now go one level higher and gatekeep hardware rather than what they used to, the software. youtube.com/watch?v=tlQ7Eo…

YouTube

Greg Sky@regsky_·1d

Computex?

Greg Sky@regsky_·1d

Something is happening with AI where large companies are going to work 10-100x harder to get the same value from LLMs that an individual can. It's impossible for everyone's job to get replaced with "AI" because who are those companies going to sell to? But "AI" will augment everyone's productivity. It's an additional tax on your precious time on this earth

Greg Sky@regsky_·2d

@FanlessTech so you cooled the copper and heated the block to have them press fit together?

15.2K

FanlessTech@FanlessTech·2d

Seems completely reasonable to us

187

632

14.5K

577.9K

Greg Sky@regsky_·2d

How can X operate without mods like Reddit? Just need X marketplaces to replace Reddit hardwareswap and I'm gone for good. Scroll on /popular and all you get is mind control and the occasional cat to keep you going

Greg Sky@regsky_·2d

@Govindtwtt Never having been in a position to compete with the big guys in the past. Now the playing field is even

Govind@Govindtwtt·3d

If everyone can build with AI now… what will actually make a startup succeed?

13K

Greg Sky@regsky_·2d

@lauriewired We're already playing with the idea of increasing compute density in space before figuring out how to radiate the excess heat in a vacuum?

1.5K

LaurieWired@lauriewired·2d

I’ve always wondered, why can’t we run CPUs hotter? Look at any modern CPU, and the maximum junction temperature (TJMax) is around 95ish C. Leakage current explodes past that point, and reliability drops off a cliff. But the question is…couldn’t you make a “tougher” transistor? The answer is…sorta. The Glenn Research Center at NASA experiments with silicon carbide wafers. Venus is crazy hot (~470C!). Apparently, NASA has been somewhat successful running a medium-scale IC at 500C for 1 year. Ozark IC also won a contract to develop a multicore RISC-V CPU intended to operate at 500C, but I haven’t seen updates in a while. Perhaps an EE can chime in. Is ~95C TJMax just a local optimum that everyone collectively settled on? How much density would you have to give up to run things just a little bit hotter? I wonder if a special, “low density space H100” could reliably run at say, ~150C, or if that’s completely outside the realm of what’s feasible.
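The reply about radiating heat in a vacuum and the question about a hotter TJMax meet in the Stefan-Boltzmann law: radiated power scales with the fourth power of absolute temperature, so a hotter radiator can be much smaller. The sketch below is a rough lower bound under stated assumptions that are not from the thread: a 700 W heat load (the H100 SXM power ceiling), an emissivity of 0.9, a one-sided radiator sitting at the full junction temperature (optimistic, since real designs lose temperature between die and radiator), and no absorbed sunlight or background radiation. The function name is hypothetical.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W / (m^2 K^4)


def radiator_area_m2(power_w, temp_c, emissivity=0.9):
    """Area of an ideal one-sided radiator that rejects power_w to deep space.

    Ignores absorbed sunlight and background radiation, so this is a lower bound.
    """
    temp_k = temp_c + 273.15
    return power_w / (emissivity * SIGMA * temp_k ** 4)


area_95 = radiator_area_m2(700, 95)    # radiator at today's ~95 C TJMax
area_150 = radiator_area_m2(700, 150)  # radiator at the hypothetical ~150 C part

for temp_c, area in ((95, area_95), (150, area_150)):
    print(f"{temp_c} C: {area:.2f} m^2 of radiator for 700 W")
```

With these assumptions the 150 C case needs a little over half the radiator area of the 95 C case, which is one reason a hotter-running part is attractive for space even at lower density.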

198

144

2.6K

227.8K

Greg Sky@regsky_·2d

@Polymarket Unplug your routers. No other way to prepare

1.6K

Polymarket@Polymarket·3d

JUST IN: Anthropic announces it will roll out Claude Mythos “in the coming weeks” despite growing fears over the model’s cyber capabilities.

333

587

8.7K

1.8M

Greg Sky@regsky_·2d

@sudoaptupdater @RT_com can't wait. So much less corruption and useless overhead. Let's see HR as the first to go!!

sudoaptupdate@sudoaptupdater·2d

@RT_com All management, C-suite and data-intensive needs will be relegated to algorithms. Once management is turned over to the algorithms, the infrastructure is designed around them and no manual labor tasks will require humans. 8-12 years out maybe. Good luck.

2.3K

RT@RT_com·3d

'Entire professions may go EXTINCT, replaced by Artificial Intelligence' — Vladimir Putin 'This process is irreversible and INEVITABLE'

139

884

3.3K

203.3K

Greg Sky@regsky_·3d

@PaulYacoubian this is the meta. Don't tell anyone

101

Paul Yacoubian@PaulYacoubian·3d

bro is adding in ai women to sell his surplus heavy equipment 😂

323

285

11.5K

Greg Sky@regsky_·3d

PCIe bifurcation extension modules coming to the shop soon! Gpulab.net/shop

Greg Sky@regsky_·3d

Fresh import of 100g of Shin-Etsu. This stuff is great

Greg Sky@regsky_·3d

@AlfinCodes We all tell ourselves we're prototyping until someone else sees what we're doing and thinks it's prod ready, and we just kinda go with the hype too

Alfin@AlfinCodes·3d

every vibe coder is just blindly accepting code written by an AI that’s also blindly guessing

103

110

6.8K

Greg Sky@regsky_·3d

@factpostnews Is this Anduril? Oh god.

3.2K

FactPost@factpostnews·4d

The Trump administration has moved to provide weapons-grade plutonium to private start-ups.

488

993

3.4M

Greg Sky@regsky_·4d

@SolLunix He escaped 🙏

660

Lunix@SolLunix·4d

Pavel Durov, net worth $15 billion, doesn't own a phone.

175

4.2K

554.7K

Greg Sky@regsky_·4d

@cloudklout @CR1337 And the rom?

455

cloudcode@cloudklout·4d

@regsky_ @CR1337 It will not be running Motorola’s software

439

CR1337@CR1337·4d

A Reddit user found out that Motorola phones have started hijacking the Amazon app to insert affiliate codes - on a phone that cost $1,900 - talk about greed..

186

1.4K

14K

591.9K
