llm_enjoyer
334 posts

did some optimization testing on my tokenizer free model pretraining run
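(For anyone unfamiliar with "tokenizer free": the usual approach is to feed raw UTF-8 bytes instead of learned subword tokens, so the vocab is fixed at 256. A minimal sketch of that input pipeline — this is the generic byte-level idea, not necessarily this specific run's setup:)

```python
def encode(text: str) -> list[int]:
    # Tokenizer-free encoding: raw UTF-8 bytes, vocab size 256.
    # No merges, no learned vocab, no OOV tokens.
    return list(text.encode("utf-8"))

def decode(ids: list[int]) -> str:
    # Inverse mapping; "replace" guards against truncated multi-byte chars.
    return bytes(ids).decode("utf-8", errors="replace")

ids = encode("héllo")          # non-ASCII chars expand to multiple bytes
assert decode(ids) == "héllo"  # round-trips exactly
```

The trade-off is longer sequences (one step per byte rather than per subword), which is why optimization work on such runs tends to matter more than usual.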

Interesting! While its idea is basically the same as that of a concurrent work (kexue.fm/archives/11626) and similar to an earlier work (kexue.fm/archives/10815), the experimental results look quite promising. If it's really better than TC, then this is HUGE.


The newest model in the Mamba series is finally here 🐍 Hybrid models have become increasingly popular, raising the importance of designing the next generation of linear models. We've introduced several SSM-centric ideas that significantly increase Mamba-2's modeling capabilities without compromising on speed. The resulting Mamba-3 model shows noticeable performance gains over the most popular previous linear models (such as Mamba-2 and Gated DeltaNet) at all sizes. This is the first Mamba that was student-led: all credit to @aakash_lahoti @kevinyli_ @_berlinchen @caitWW9, and of course @tri_dao!
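(For context on the "linear models" mentioned above: the family is built around a recurrence that is linear in the hidden state, which is what makes both parallel training scans and O(1)-per-step inference possible. A generic diagonal state-space recurrence — illustrative only, not Mamba-3's actual parameterization; `a`, `B`, `C` are placeholder names — looks like:)

```python
import numpy as np

def linear_ssm(x, a, B, C):
    """Generic diagonal linear SSM over a scalar input sequence.

    Recurrence:  h_t = a * h_{t-1} + B * x_t   (elementwise in the state dim)
    Readout:     y_t = C . h_t
    Linearity in h is what admits fast parallel scans at train time.
    """
    h = np.zeros_like(a)
    ys = []
    for x_t in x:
        h = a * h + B * x_t   # state update, elementwise decay `a`
        ys.append(C @ h)      # scalar readout per step
    return np.array(ys)

# With zero decay the model is memoryless: y_t = C . (B * x_t)
y = linear_ssm([1.0, 2.0], a=np.zeros(2), B=np.ones(2), C=np.ones(2))
```

Selective/gated variants (Mamba, Gated DeltaNet, etc.) make `a` and `B` input-dependent per step, but keep the update linear in `h` so the scan structure survives.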

nmoe has been updated with most of the receipts / repro code for noumena.com/research github.com/Noumena-Networ…