Bill
@BillQueens
Internet Guy
Queens · Joined October 2022
1.8K Following · 590 Followers
12.4K posts

Lotto @LottoLabs
@PopVerseYT @BillQueens I think local llama on Reddit and @0xSero or @sudoingX have an X community. If you've got a 5090, get 27B running on there and install Hermes Agent. (It's not SOTA but it's pretty good.)

Lotto @LottoLabs
Hermes Agent + qwen 27b. This project is to see if I can get a fully functioning and safe SaaS/business set up with Hermes and 27B just off Telegram. The site is pretty much done, auth is done; just hooking up Stripe payments and manually auditing. Will probably do a SOTA audit with a couple of different models to see if I missed anything.

Bill @BillQueens
@sudoingX My wife tries this on me every day

Sudo su @sudoingX
Reinforcement learning

Bill @BillQueens
@LottoLabs Ya I was so sick of vLLM horseshit lol. I just built my own frontend today to control everything.

Lotto @LottoLabs
@BillQueens Fair I’m going into those weeds now 😭

Bill @BillQueens
@LottoLabs I’ve just been ripping llama.cpp, so GGUFs have been solid for me. Had so many issues with vLLM nightlies, said fuck it, and haven’t looked back.

Lotto @LottoLabs
@BillQueens I think I ran that quant on an RTX 6000 Pro and it obviously ripped. I like the 27B overall though; mainly used the Unsloth quant. Might as well try nvfp if you have Blackwell chips.

Bill @BillQueens
@LottoLabs GGUF - will look at trying this tomorrow. What are your thoughts so far?

0xSero @0xSero
Man, what the hell. That's a few years salary for most of the world donated in under 24 hours. I promise I will do everything in my power to make this worth it for all of you.
[image attached]

Sudo su @sudoingX
how much VRAM do you have right now
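The VRAM question above is the crux of the quant discussion upthread: whether a 27B fits on a 32 GB 5090 depends almost entirely on bytes per weight. A rough sketch, weights only (the bytes-per-weight averages for Q8_0, Q4_K_M, and NVFP4 below are approximate assumptions, and KV cache and activations are ignored):

```python
# Rough VRAM estimate for serving a dense LLM locally.
# Bytes-per-weight figures are approximate format averages
# (scales included), not exact file-level overheads.

BYTES_PER_WEIGHT = {
    "fp16":   2.00,
    "q8_0":   1.06,  # ~8.5 bits/weight in GGUF Q8_0
    "q4_k_m": 0.59,  # ~4.7 bits/weight average
    "nvfp4":  0.56,  # ~4.5 bits/weight incl. block scales
}

def weight_vram_gb(params_b: float, fmt: str) -> float:
    """Approximate weight memory in GB for params_b billion parameters."""
    return params_b * 1e9 * BYTES_PER_WEIGHT[fmt] / 1e9

for fmt in BYTES_PER_WEIGHT:
    gb = weight_vram_gb(27, fmt)
    fits = "fits" if gb < 32 else "does NOT fit"
    print(f"27B @ {fmt:>7}: ~{gb:5.1f} GB weights -> {fits} in a 32 GB 5090")
```

The takeaway matches the thread: fp16 is out of reach on a single 5090, while any ~4-bit quant leaves comfortable headroom for context.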

Boomer’s Bets @BoomersBetz
Alright, I've cooled down. What's the lock?

Ivan Fioravanti ᯅ @ivanfioravanti
I see in the uv repo on GitHub there are both MIT and Apache 2.0 licenses - which is the right one? 🤔 Relevant if someone wants to create an OpenUV fork...
[image attached]

Bill @BillQueens
@leerob what's the one more thing?

Boomer’s Bets @BoomersBetz
Welcome to selling insurance, TCU

Bill @BillQueens
@Teknium My dev box is just a server basically. Threadripper and 5090 - 128GB ECC with Ubuntu on it.

Bill reposted

Nicky @nickturani
What an upset! My bracket is in shambles
[image attached]

Bill @BillQueens
@TeksEdge In what world is any of this accurate

kayhe @selfhostedmind
@maria_rcks did theo really cheap out on a refurbished 2022 M2 Air?? wow so generous

The Daily Hitman @DailyHitman
Spent two days straight staring at my wall visualizing the tourney. Who wants the first POD? 🧘‍♂️

Bill @BillQueens
@__tinygrad__ Imagine not building shipping container token machines in the year of our lord

Andrew Feldman @andrewdfeldman
NVIDIA's biggest GTC announcement was a $20 billion bet on the same problem we solved 6 years ago. Their next-gen inference chip - not available yet - has 140x less memory bandwidth than @cerebras.

To run a single 2 trillion parameter model, you need 2,000+ Groq chips. On Cerebras, that's just over 20 wafers. Even paired with GPUs, Groq maxes out at ~1,000 tokens per second. We run at thousands of tokens per second today. And every day. In production now.

Why? When you connect 2,000 chips together, every interconnect has latency. Every cable has overhead. It doesn't matter what your memory bandwidth is on paper if you're bottlenecked by the wiring between thousands of tiny chips. We solved this with wafer scale. One integrated system. Little interconnect tax.

Jensen told the world that fast inference is where the value is. He’s right - it’s why the world’s leading AI companies and hyperscalers are choosing Cerebras.
[image attached]
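The "2,000+ Groq chips" figure in the post can be sanity-checked with back-of-envelope arithmetic. A sketch assuming 8-bit weights and ~230 MB of on-chip SRAM per Groq LPU (a commonly cited figure; every number here is a rough assumption, not a vendor spec):

```python
# Back-of-envelope for the chip-count claim in the post above:
# how many SRAM-only accelerators are needed just to HOLD the
# weights of a 2-trillion-parameter model at 8 bits per weight.
import math

PARAMS = 2e12          # 2 trillion parameters
BYTES_PER_WEIGHT = 1   # 8-bit quantization (assumption)
SRAM_PER_CHIP = 230e6  # ~230 MB per chip (assumption)

weights_bytes = PARAMS * BYTES_PER_WEIGHT
chips = math.ceil(weights_bytes / SRAM_PER_CHIP)
print(f"Model weights: {weights_bytes / 1e12:.1f} TB")
print(f"Chips needed just for weights: {chips:,}")
```

This lands around 8,700 chips just to hold the weights - comfortably inside the post's "2,000+" lower bound and squarely in the "thousands of tiny chips" regime it argues against - before counting KV cache or replication for batching.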