Punch Taylor
@Punch_Taylor
6.2K posts

🪨🇺🇸 🦅 6 finger patriot 👊🏻 317. pro-community. anti-communist. politics. ai. video games 🎮. pro #1A 💬. pro #2A 🔫.

Joined January 2024
2.7K Following · 2.3K Followers
Punch Taylor retweeted
Eric ⚡️ Building...@outsource_·
My 4090 went from 26 -> 154 tok/s on Qwen 3.6 27B 🤯 Same GPU. Same Q4_K_M. No FP8, no extra quant. The unlock: ik_llama.cpp + speculative decoding using Qwen3-1.7B as the draft model. 85% acceptance rate. Full config + benchmarks 👇🏻
69 replies · 134 reposts · 1.5K likes · 97.3K views
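For anyone trying to reproduce the setup Eric describes, a minimal sketch of a speculative-decoding launch (flag names are mainline llama.cpp's; ik_llama.cpp is a fork whose flags may differ, and both .gguf filenames here are illustrative, not confirmed paths):

  # the small draft model proposes runs of tokens; the 27B target verifies them in one
  # batched pass, so the gain depends on the ~85% acceptance rate claimed above
  ./llama-server -m Qwen3.6-27B-Q4_K_M.gguf -md Qwen3-1.7B-Q4_K_M.gguf \
    -ngl 99 -ngld 99 -fa on --draft-max 16 --draft-min 1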
Punch Taylor retweeted
Sudo su@sudoingX·
hey if you are running new qwen 3.6 27b dense on an rtx 4090 read this carefully, it could save you a few hours of head scratching.

@Punch_Taylor ran my exact flags on 4090 wsl2 ubuntu cuda 13.2, three warm runs on q4_k_m. average landed at 43.1 tok/s, 8.3 percent above my 3090 baseline of 39.82. that delta tracks the memory bandwidth gap almost perfectly, 1008 gb/s on 4090 vs 936 gb/s on 3090. the math is honest, the speed bump is architecture level, not magic.

vram at 262k context q4_0 kv cache is tight at 23 out of 24 gigs. wsl2 + cuda driver reserves eat about 2 gigs of headroom. if you are on bare metal linux you get that back, punch estimates 45 to 48 tok/s range for native runs.

also flagging a real world cost. a single youtube tab in chrome drops his numbers to 39.9 tok/s, roughly 7-8 percent throughput loss from browser scheduling on wsl. close everything before measuring, especially on daily driver machines.

now the community call. what are amd users getting on halo strix, tinygrad on 7900 xt, or any other consumer chip on the same model + same flags? drop your numbers, i'll stack them into the community chart tonight. bandwidth data across architectures is the content the major labs never publish.
Punch Taylor@Punch_Taylor

4090 datapoint, WSL2 Ubuntu CUDA 13.2, your exact flags + Q4_K_M:

  ./llama-server -m Qwen3.6-27B-Q4_K_M.gguf -ngl 99 -c 262144 -np 1 -fa on --cache-type-k q4_0 --cache-type-v q4_0

three warm runs on "yo" with thinking auto, system fully idle:
- run 1: 42.83 tok/s
- run 2: 43.18 tok/s
- run 3: 43.33 tok/s
- avg ~43.1 tok/s

VRAM at 262k provisioned: 23.0GB / 1.1GB free of 24GB. tighter than your 21/3 split — WSL2 + cuda driver reserves eating ~2GB of headroom. native linux would likely give that back.

so 4090 + WSL2 = +8.3% over your 3090 native baseline. roughly tracks the bandwidth gap (1008 vs 936 GB/s). bare metal linux on a 4090 should land higher still — would estimate 45-48 tok/s range for someone running native.

side observation worth flagging: a single youtube tab in chrome dropped these numbers to ~39.9 tok/s in earlier runs. ~7-8% throughput cost from the browser competing for CPU/scheduling on the WSL side. anyone running this on a daily-driver PC should close everything before measuring.

10 replies · 5 reposts · 152 likes · 15.3K views
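A quick sanity check of the "tracks the bandwidth gap" arithmetic above, using only figures quoted in the thread (1008 and 936 GB/s are the two cards' published memory-bandwidth specs):

  # expected speedup if generation is purely memory-bandwidth-bound:
  #   1008 / 936   ≈ 1.077  (+7.7%)
  # measured across the thread's averages:
  #   43.1 / 39.82 ≈ 1.082  (+8.2%, reported as 8.3% above)
  python3 -c 'print(1008/936, 43.1/39.82)'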
Punch Taylor retweeted
Alec Lace@AlecLace·
🚨 Joe Biden repeatedly called white supremacy the most dangerous, most lethal, greatest terrorist threat to America. Turns out the SPLC was funding it and Biden shut down the investigation into the SPLC. Everything was staged. The whole narrative was a hoax.
1.8K replies · 23K reposts · 66.9K likes · 511K views
Punch Taylor@Punch_Taylor·
No worries. I'm new too. Native Windows CUDA binary is probably slightly faster — WSL2 adds a thin GPU virtualization layer (I measured ~8% penalty when the host wasn't idle, ~0% when fully idle). For pure tok/s on a single model, native is simpler.

I went WSL2 because sudo's command was Linux-shell, and most llama.cpp tutorials/scripts assume bash. Easier to reproduce published numbers that way.

If you just want to run models: grab the prebuilt Windows CUDA release from the llama.cpp GitHub and you're done. If you also want to do other Linux dev stuff: WSL2.
0 replies · 0 reposts · 1 like · 343 views
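A minimal sketch of the two paths described above (the model filename follows the thread's naming and is illustrative; the exact prebuilt release asset to grab varies by CUDA version):

  # native Windows: unzip the prebuilt CUDA release from the llama.cpp GitHub, then
  llama-server.exe -m Qwen3.6-27B-Q4_K_M.gguf -ngl 99 -fa on
  # WSL2: the same server and flags from the Ubuntu shell, matching published bash commands
  ./llama-server -m Qwen3.6-27B-Q4_K_M.gguf -ngl 99 -fa on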
Sudo su@sudoingX·
this was supposed to be a normal evening, then i saw on the timeline that qwen 3.6 27b dense q4 weights from unsloth are live and i could not sit still. compiled llama.cpp with cuda on the single rtx 3090 at 2am from bangkok, launched with the exact same flags that crowned 3.5-27b dense the undisputed king six weeks ago. q4_k_m, 262k context, q4_0 kv cache, flash attention on, single slot, no quant tricks, no dynamic ggufs, no turbo, just the straight cut to get a clean baseline.

first pass said "yo" to the model as a warmup. it ran a six step thinking chain to formulate "yo what's up how can i help you today". full reasoning visible in the web ui. thinking mode goes hard, even for a greeting.

the number improved. 39.82 tokens per second on the first real generation. march baseline on this exact hardware was 35.3 flat across every context size. that is a 13 percent speed bump. same card, same quant, same every flag, only the model changed. pure model level efficiency on ampere. the model is actually faster at the token level on consumer silicon.

262k context fills 21 gigs of the 24. three gigs headroom for prompt fill. fresh session, zero cache, honest baseline.

next i am pushing context, probing the vram ceiling, finding the sweet spot on this card. then autonomous agent tasks on hermes agent using the same prompt that 3.5 dense one-shotted in march. same octopus invaders test, same hermes agent harness, same single 3090 hardware, one model against the ghost of its predecessor. the king might be changing hands.
Sudo su@sudoingX

fuck it i am pulling the weights right now. cannot sit still since qwen 3.6-27b dense dropped two hours ago and @UnslothAI just put the dynamic ggufs live, 18gb ram footprint, that fits my rtx 3090 24gb. they moved faster than me, that is fine, the open source machine is working.

here is what has me restless. the chart says a 27 billion parameter open weight model matching claude 4.5 opus on terminal-bench 2.0 at 59.3 flat, beats claude on skillsbench, gpqa diamond, mmmu, and realworldqa. opus 4.5 level agentic intelligence on your single rtx 3090 24gb vram tier. if that chart survives first contact with real hermes agent runs on my hardware, the best model for single consumer gpu just changed in the middle of my sprint.

my benchmark is the only voice that matters to me. same hermes agent harness, same quant, head to head against 3.5-27b dense which has held the 3090 crown for weeks. i settle it on my cards or not at all.

pulling now. benchmarking tonight if i can stay awake long enough. you have no idea how restless this makes me. if you see numbers on your timeline before morning, the chart held. if you don't, i crashed and data drops first thing. this is what open source looks like when the whole chain moves same day.

19 replies · 8 reposts · 243 likes · 29.5K views
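For anyone reproducing the 2am build above, a minimal sketch of a CUDA build of llama.cpp plus the weight pull (standard upstream build steps; the Hugging Face repo and file names are hypothetical, patterned on the thread's naming):

  git clone https://github.com/ggml-org/llama.cpp && cd llama.cpp
  cmake -B build -DGGML_CUDA=ON        # CUDA backend; needs the CUDA toolkit installed
  cmake --build build --config Release -j
  # hypothetical repo/file names matching the quant discussed in the thread
  huggingface-cli download unsloth/Qwen3.6-27B-GGUF Qwen3.6-27B-Q4_K_M.gguf --local-dir .
  # launch with the thread's exact flags: 262k context, q4_0 kv cache, flash attention
  ./build/bin/llama-server -m Qwen3.6-27B-Q4_K_M.gguf -ngl 99 -c 262144 -np 1 -fa on \
    --cache-type-k q4_0 --cache-type-v q4_0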
Punch Taylor@Punch_Taylor·
still disagree. AI art will eventually become a standard while high quality art made by a human will still be a luxury. mediocre artists will either need to adapt, improve their business model, and put business before personal opinions, or be replaced by a bot with no preferences and only results.
1 reply · 0 reposts · 0 likes · 28 views
blep@PersonUnnamedno·
@Punch_Taylor @reddit_lies I'm not saying AI is bad on the whole. I'm saying AI art is soulless slop and you won't get far using it. Pick up a pencil or pay someone who will.
1 reply · 0 reposts · 0 likes · 35 views
Punch Taylor@Punch_Taylor·
brother, look around you. businesses of all kinds are already using AI, even your phone that you are staring at, and people still want them. why? because people don’t give a shit as long as it works and looks good. and like humans, AIs can produce slop or something genuinely aesthetic if you work with it - kind of like every other medium. also here’s a link to a repo of an AI i built for fun since you called me lazy. github.com/TaylorSh1ft/Ph…
2 replies · 0 reposts · 0 likes · 16 views
blep@PersonUnnamedno·
@Punch_Taylor @reddit_lies You won't go far if nobody wants your business lmao, AI slop underperforms actual talent by a hell of a lot. You'd know that if you practiced what you preached but you're too lazy to even do that.
1 reply · 0 reposts · 1 like · 40 views
Punch Taylor@Punch_Taylor·
hence “unapologetically”. because anyone who wants to go far won’t do it by listening to a bunch of losers online who don’t like how you choose to get there. i’d much rather take AI slop on the cheap than genuine slop from unoriginal and mediocre “artists” at far more than it’s worth, or really good art from an artist who is unbearable to interact with. 🤷🏻‍♂️
1 reply · 0 reposts · 0 likes · 31 views
blep@PersonUnnamedno·
@Punch_Taylor @reddit_lies I mean you can use AI art if you want, I'm not stopping you, but just remember that you're not owed respect and people can hate your soulless slop all they want.
1 reply · 0 reposts · 1 like · 32 views
Punch Taylor@Punch_Taylor·
and how do you feel about the smartphone turning everyone into “photographers”? honestly, i find your argument to be lazy. because if one can design and render a Live2D model using the tools at their disposal, then that is a skill in itself. it’s like being mad at a carpenter who uses a nail gun instead of hiring a team of people to hammer in the nails.
0 replies · 0 reposts · 0 likes · 22 views
Grongus 2.0@PunishedChode·
@Punch_Taylor @reddit_lies AI art is dogshit and the data centers necessary to keep them operating are harming the environment and the economy.
2 replies · 0 reposts · 5 likes · 59 views
Punch Taylor retweeted
Savanah Hernandez@Savsays·
The fact that the Ostroushko family is on a press tour instead of sitting in JAIL is infuriating to me. Paige Ostroushko is literally on Instagram bragging about how she has faced ZERO consequences so far. Demoralizing.
1.2K replies · 7.1K reposts · 36.7K likes · 248.4K views
Punch Taylor retweeted
indy reporter@Indy_reporter_·
Want to see something WILD? A non profit organization received almost $32 million in grants. Who did they give some of the grants to for "research"?
- Almost $12k to a distillery
- $215k for 2 different butchers
- $75k to an autoshop
- $75k to an Indonesian Grill
- $50k for a cabinet shop
All in the name of......"research"
What's the name of the non-profit? NINETWELVE INSTITUTE INC
If you get bored google "chad pittman 3 kings"
35 replies · 121 reposts · 322 likes · 11.5K views
Punch Taylor retweeted
Wall Street Apes@WallStreetApes·
An American is staying at an Airbnb in Indianapolis. The crime must be bad in the neighborhood because the outdoor air conditioners are chained down with huge locks to prevent theft. I looked it up: Democrats hold a supermajority in the Indianapolis City County Council. Of course…. We don’t have to live like this. Stop voting Democrat
172 replies · 787 reposts · 3.4K likes · 142K views