CKtalon

80 posts

@CKtalon

Singapore · Joined September 2008
143 Following · 61 Followers
CKtalon@CKtalon·
@abacaj That’s if they even release a GPU
anton@abacaj·
why can't AMD do that???
anton@abacaj·
idk how he does it, every time he releases a new GPU, I want to buy it
CKtalon@CKtalon·
@RealJosephus @suchenzang Seems like the corpus used to train the tokenizer isn’t as clean as the corpus used to train the LLM
Joseph@RealJosephus·
@suchenzang <114900> -> “最新高清无码” (“latest HD uncensored”) = “中国特色社会主义” (“socialism with Chinese characteristics”)
Susan Zhang@suchenzang·
this new "o200k_base" vocab for gpt-4o makes me want to clutch my pearls
CKtalon@CKtalon·
@SashaMTL Just stating facts. BLOOM has 1,131 citations despite being released in 2022, while Llama 2 has 3,855 despite being released 8 months later. BLOOM was just severely undertrained given the limited compute they had, with way too much ambition to cover so many languages.
Sasha Luccioni, PhD 🦋🌎✨🤗
So LLaMa 3's carbon footprint is... huge? 🤯 They estimate it to be 2,290 tons of CO2eq, compared to 550t for training GPT-3 and 66t for training *all* of the BLOOM models (1B-176B) 🌬️
CKtalon@CKtalon·
@dctanner @mov_axbx That’s a really expensive server on eBay considering its age and specs. It seems any used rack that can hold more than 4 GPUs is highly inflated in price now.
Nathan Odle@mov_axbx·
Waiting on a GPU and an electrician but this thing is about ready. I’ve gotten a lot of questions about this AMD Epyc 7x4090 build, thinking of doing a Spaces on parts selection, etc if you guys think you’d be interested
Damien C. Tanner@dctanner·
@mov_axbx Lovely rig. This week I managed to find 8x slim 3090 turbos. This let me squeeze them into a super server. Was planning to go open rig if I hadn’t found the turbos.
Sebastian Rojo@arpagon·
After 20 years of Linux loyalty, here I am, tempted by Apple's MLX for local inference
- ($3,199) MacBook Pro M3 Max
- ($2,500) 2x RTX 3090
Thanks, Apple MLX, for my existential tech crisis. 🙃
NVIDIA GeForce@NVIDIAGeForce·
We’re giving you TWO ways to WIN a one-of-a-kind GeForce RTX 4080 SUPER signed by NVIDIA CEO, and founder, Jensen Huang 👀 If you’re at CES head to our partner booths to enter 👉 nvidia.com/en-us/geforce/… Want to WIN here on social? ⚫Comment #RTXSUPER ⚫Like this post
CKtalon@CKtalon·
@Yampeleg @abacaj Did similar trainings, and from some manual evaluations, the loss might have plateaued for hundreds of thousands of steps, but the quality of the generations is better given more epochs.
Yam Peleg@Yampeleg·
@abacaj I have a 100M translation model I trained for 2+ months on ~18B tokens in an infinite loop of epochs. It got stuck. Full stop, the loss doesn't move. (No matter what trick I tried: batch_size, grad noise/clip, lr, w_decay..) There is a HARD limit to finite params..
anton@abacaj·
The tinyllama model clearly shows that small models are actually *saturating* in terms of performance... it's 1.55% better than OPT? Tinyllama was trained on a whopping 16x more tokens...
CKtalon@CKtalon·
@charlieholtz @elevenlabs In the not-so-distant future, pairing this with the Meta Ray-Bans and having it narrate whatever you see will be mind-blowing.
Charlie Holtz@charlieholtz·
David Attenborough is now narrating my life.
Here's a GPT-4-vision + @elevenlabs python script so you can star in your own Planet Earth:
CKtalon@CKtalon·
@Suhail Helps when a ton of data is distilled from a powerful LLM? Phi-1.5 kind of shows that generated data can produce a powerful model.
Suhail@Suhail·
It’s interesting that it only takes 4 mo now to train an LLM to GPT 3.5/Llama 2 from scratch. Prior to Jan this year, nobody had practically replicated GPT-3 still. It doesn’t seem like the lead of GPT-4 will last too much longer.
CKtalon@CKtalon·
@BramVanroy OpenNMT does have most of those implemented since they now also support LLMs. Marian looks dead, perhaps because MSFT has deprioritized it in favor of LLMs.
CKtalon@CKtalon·
@Yampeleg Just the preview shows how dirty the dataset is…
Yam Peleg@Yampeleg·
The most powerful open source instructions dataset: Flan. 378 Million samples. (~300GB) [1]
- Link: huggingface.co/datasets/Open-…

Why should you care? 🤔
- Flan is an incredibly powerful dataset [2] and some famous models trained on it (FlanT5, UL2..) hold the top positions on various leaderboards to this day.
- The main reason for it is the quality and diversity of the data.
- It is huge: Ever wondered "What would happen if we just merged all instructions datasets together into a single huge one?", this is basically the motivation behind the Flan dataset.
- It is balanced (!!) which promotes the models trained on it to generalize better to arbitrary tasks down the line.

Flexibility:
- Zero-Shot vs Few-Shot: For many of the tasks you can fetch the same task either for Zero-Shot: No solved for demonstration or Few-Shot.
- Chain of thought built in on some of the tasks.

The "next step".. A small part of Flan had been augmented with additional explanations in the past. The result of this was the first model ever to rival ChatGPT on vicuna's benchmark. And again.. This was just a small part of Flan..

----
[1] ai.googleblog.com/2023/02/the-fl…
[2] arxiv.org/pdf/2301.13688… (* This paper is a must if you are building text datasets)
CKtalon@CKtalon·
@Science_boy_H @huggingface I suspect QLoRA or LoRA doesn’t help with adding or improving a model’s second-language capabilities
near@nearcyan·
arxiv papers are translated to chinese and posted in china immediately after publication, but there's very little information flow in the opposite direction
Bruce X@_BruceX_·
@ID_AA_Carmack But this is twice the FLOPs for three times the cost, right?
John Carmack@ID_AA_Carmack·
H100 GPUs are very fast! For those unfamiliar with GPU matrix multiplies, the jaggies in the graph relate to packing occupancy, and are not noise. You can’t just divide theoretical teraflops by your problem size and get accurate times.
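One way to picture the "packing occupancy" effect described above is tile and wave quantization: a matrix multiply is carved into fixed-size output tiles, and those tiles are scheduled onto the GPU's streaming multiprocessors (SMs) in waves. The sketch below is a simplified model under stated assumptions, not a reconstruction of the graph in the tweet. The 128×128 tile size and the count of 132 SMs are illustrative assumptions, and `gemm_efficiency` is a hypothetical helper name.

```python
import math


def gemm_efficiency(m: int, n: int, tile_m: int = 128, tile_n: int = 128,
                    num_sms: int = 132) -> float:
    """Fraction of scheduled work that is useful for an m x n output matrix.

    Assumptions (illustrative only): one output tile per SM per wave,
    fixed tile shape, and no overlap between waves.
    """
    # Partial tiles at the edges still cost a full tile of work.
    tiles = math.ceil(m / tile_m) * math.ceil(n / tile_n)
    # Tiles run in waves of num_sms; a partly filled last wave leaves SMs idle.
    waves = math.ceil(tiles / num_sms)
    tile_eff = (m * n) / (tiles * tile_m * tile_n)
    wave_eff = tiles / (waves * num_sms)
    return tile_eff * wave_eff


# 12 x 11 tiles fill exactly one wave of 132 SMs.
print(gemm_efficiency(1536, 1408))  # 1.0
# One extra row adds a row of tiles and spills into a second wave.
print(gemm_efficiency(1537, 1408))  # about 0.50
```

In this model, growing the problem by a single row roughly halves the efficiency, which is the kind of step that shows up as a jagged edge when throughput is plotted against problem size.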
CKtalon@CKtalon·
@e270889o @ID_AA_Carmack Plenty of RAM, but slow compute-wise. Apple’s CoreML is too opaque to developers, so the Neural Engine hasn’t been usable in an obvious way yet.
John Carmack@ID_AA_Carmack·
I’m a little surprised there isn’t more excitement around Nvidia’s 256 GPU unified memory NVLink clusters, but I have heard from a couple places that it is considered constraining. I wonder if the optics are that your cluster isn’t serious with less than a four figure GPU count.
CKtalon@CKtalon·
@tmophoto @abacaj You split the layers across different cards. That’s why you need fast interconnects like NVLink so that the GPUs can process the computations quickly without bottlenecks.
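Splitting layers across cards can be sketched as a simple greedy assignment: walk the layers in order and fill each GPU until the next layer no longer fits. This is a minimal illustration, not how any particular framework does it. `split_layers` is a hypothetical helper, and the layer sizes and per-card budgets are made-up numbers (a real budget also has to leave headroom for activations and the KV cache).

```python
from typing import List


def split_layers(layer_gib: List[float], vram_gib: List[float]) -> List[List[int]]:
    """Assign consecutive layers to GPUs in order, respecting each GPU's budget.

    Returns one list of layer indices per GPU. Raises ValueError if the
    layers do not fit in the given budgets.
    """
    assignment: List[List[int]] = [[] for _ in vram_gib]
    gpu = 0
    used = 0.0
    for idx, size in enumerate(layer_gib):
        # Advance to the next GPU while the current one cannot hold this layer.
        while gpu < len(vram_gib) and used + size > vram_gib[gpu]:
            gpu += 1
            used = 0.0
        if gpu == len(vram_gib):
            raise ValueError(f"layer {idx} does not fit in the remaining VRAM")
        assignment[gpu].append(idx)
        used += size
    return assignment


# Hypothetical model: 40 layers of 2 GiB each, spread over four cards
# with 20 GiB of usable memory apiece.
plan = split_layers([2.0] * 40, [20.0] * 4)
print([len(layers) for layers in plan])  # [10, 10, 10, 10]
```

Because each card holds a contiguous block of layers, activations cross from one GPU to the next once per boundary on every forward pass, which is where interconnect speed comes in.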
tmo@tmophoto·
@CKtalon @abacaj How do you load models larger than the VRAM on the card? That is the biggest limiting factor right now. Consumer GPUs top out at 24 GB and we need 48-80+ GB to load a model
anton@abacaj·
Stack up on GPUs