Apptor

643 posts

@apptor_at

Trying to keep up with AI trends on X. Building https://t.co/a6nLgWcYmx as a side hustle.

Austria · Joined February 2024
162 Following · 47 Followers
Jake
Jake@dontbanjake·
@LottoLabs This is art of the highest caliber. But don’t forget to add that evil frontier labs can no longer learn from your cutting-edge prompting.
3 replies · 0 reposts · 176 likes · 19.6K views
Lotto
Lotto@LottoLabs·
How Apple mfrs think this goes
>be me
>drop $1600 on two RTX 3090s used off eBay
>"48GB VRAM, I'm basically a datacenter now"
>they arrive in anti-static bags that look like they've been through a war
>plug them into my motherboard and it sounds like a jet engine taking off
>neighbors probably think I'm mining crypto again
>install llama.cpp, download qwen3.6-27b quantized
>"Q4_K_M, only 16GB, totally fits"
>start LM Studio on port 1234
>type "hello" into the chat box
>GPU fans spin up to 100% instantly
>wait 8 seconds for a response
>>"Hello! How can I assist you today?"
>I've seen faster responses from my grandma reading a text aloud
>try Q8_0 quantization because "quality matters"
>OOM error, obviously
>spend three hours tweaking n_gpu_layers and n_ctx like it's some kind of dark art
>finally get it running at 4 tokens per second
>ask it to write me a poem about my GPUs
>>"Two cards of silicon and light / They hum through the endless night"
>"bro this is actually fire"
>show it to someone on Discord
>"why are you running LLMs locally when you could just use an API for free"
>explain that the joy isn't in the output, it's in watching 94% VRAM usage and knowing nobody else has access to my model
>they don't understand
>close Discord, open LM Studio again
>"let's try a longer context window"
>crash
83 replies · 82 reposts · 2.4K likes · 190K views
Apptor
Apptor@apptor_at·
@sudoingX If the prompt is good enough
0 replies · 0 reposts · 0 likes · 31 views
Sudo su
Sudo su@sudoingX·
will qwen 3.6 27b dense one-shot octopus invaders clean on a single 3090?
8 replies · 0 reposts · 12 likes · 3.2K views
Apptor
Apptor@apptor_at·
@coffeecup2020 Amazing! Got a 5060 ti 16gb myself. Will follow you!
1 reply · 0 reposts · 0 likes · 10 views
David YT
David YT@coffeecup2020·
TurboQuant isn't just for KV; you can use it on weights too.

I bought an RTX 5060 Ti 16GB around Christmas with one goal: get a strong model running locally on my card without paying API fees. I've been testing local AI with open claw. I did not come into this with a quantization background; I only learned about llama.cpp, LM Studio and Ollama two months ago. I just wanted something better than the usual Q3-class compromise (see my first post for the benchmark). I often wanted to buy a 24GB card, but one look at the price quickly turned me away.

When the TurboQuant paper came out and showed that memory can be saved on the KV cache, I started wondering whether the same style of idea could help on weights, not just KV cache. (P.S. I nearly had the KV part done with CUDA support, but someone beat me to it.) After many long nights after work (until 2am), that turned into a llama.cpp fork with a 3.5-bit weight format I'm calling TQ3_1S:
- Walsh-Hadamard rotation
- 8-centroid quantization
- dual half-block scales
- CUDA runtime support in llama.cpp

This work is inspired by the broader transform-based quantization line, especially RaBitQ-style Walsh-Hadamard rotation ideas and the recent TurboQuant result (Tom). The thing I wanted to test was whether that same geometry could help on weights, not just KV/cache.

Main result on Qwen3.5-27B (perplexity, full wiki.test.raw pass, 580 chunks, c=512):
- Q4_0: 7.2431 +/- 0.04822
- TQ3_1S: 7.2570 +/- 0.04802
That is a gap of only +0.0139 PPL, about 0.19%.

Size:
- Q4_0: about 14.4 GB
- TQ3_1S: about 12.9 GB
So TQ3_1S is about 10% smaller while staying near Q4_0 quality.

The practical point for me is simple:
- TQ3_1S fits fully on my 16GB RTX 5060 Ti
- Q4_0 does not fit fully on GPU in the same setup

So I'm not claiming "better than Q4_0" in general. I'm claiming something narrower and, I think, useful: near-Q4_0 quality, materially smaller than Q4_0, enough to make a 27B model practical on a 16GB card.

Caveats:
- this is the strongest result I have, on the 27B; not a blanket claim that plain TQ3 works equally well on every model size
- I am pretty new to this, so I may be missing a lot of tests. I only have one card to test on :-) Be skeptical, as I can hardly believe I'm publishing my own model
- the speed story here is mainly a deployment/fit win on this GPU class, not a blanket claim that native TQ3 kernels are always faster than native Q4_0

Links:
- GitHub fork: github.com/turbo-tan/llam…
- Hugging Face GGUF: huggingface.co/YTan2000/Qwen3…

I will open-source the quantization steps when I have enough feedback and tests.
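The fork's actual code isn't in the thread, but the two core ingredients it names (a Walsh-Hadamard rotation, then snapping the rotated coefficients to a small set of centroids) can be sketched roughly as below. This is an illustration only: a uniform 8-level grid stands in for whatever real centroids TQ3_1S uses, and the dual half-block scales are not modeled.

```python
import math

def fwht(vec):
    """Orthonormal fast Walsh-Hadamard transform (length must be a power of 2)."""
    v = list(vec)
    n = len(v)
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                x, y = v[j], v[j + h]
                v[j], v[j + h] = x + y, x - y
        h *= 2
    s = 1.0 / math.sqrt(n)
    return [x * s for x in v]  # orthonormal scaling makes fwht its own inverse

def quantize_block(block, bits=3):
    """Rotate a weight block, then snap each coefficient to 2**bits uniform levels."""
    rot = fwht(block)
    lo, hi = min(rot), max(rot)
    scale = (hi - lo) / (2 ** bits - 1) or 1.0  # avoid div-by-zero on constant blocks
    codes = [round((x - lo) / scale) for x in rot]  # 3-bit codes: 0..7
    return codes, lo, scale

def dequantize_block(codes, lo, scale):
    """Undo the grid snap, then rotate back."""
    return fwht([lo + c * scale for c in codes])
```

The point of the rotation is that it spreads outliers across the whole block before quantizing, so a coarse 8-level grid loses less than it would on the raw weights.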
David YT tweet media
17 replies · 41 reposts · 263 likes · 27.8K views
Sudo su
Sudo su@sudoingX·
hey, if you are running the new qwen 3.6 27b dense on an rtx 4090, read this carefully, it could save you a few hours of head scratching.

@Punch_Taylor ran my exact flags on a 4090, wsl2 ubuntu, cuda 13.2, three warm runs on q4_k_m. the average landed at 43.1 tok/s, 8.3 percent above my 3090 baseline of 39.82. that delta tracks the memory bandwidth gap almost perfectly, 1008 gb/s on the 4090 vs 936 gb/s on the 3090. the math is honest, the speed bump is architecture level, not magic.

vram at 262k context with q4_0 kv cache is tight at 23 out of 24 gigs. wsl2 + cuda driver reserves eat about 2 gigs of headroom. if you are on bare metal linux you get that back; punch estimates a 45 to 48 tok/s range for native runs.

also flagging a real world cost: a single youtube tab in chrome drops his numbers to 39.9 tok/s, roughly a 7-8 percent throughput loss from browser scheduling on wsl. close everything before measuring, especially on daily driver machines.

now the community call: what are amd users getting on strix halo, tinygrad on a 7900 xt, or any other consumer chip on the same model + same flags? drop your numbers, i'll stack them into the community chart tonight. bandwidth data across architectures is the content the major labs never publish.
Punch Taylor@Punch_Taylor

4090 datapoint, WSL2 Ubuntu CUDA 13.2, your exact flags + Q4_K_M:

./llama-server -m Qwen3.6-27B-Q4_K_M.gguf -ngl 99 -c 262144 -np 1 -fa on --cache-type-k q4_0 --cache-type-v q4_0

three warm runs on "yo" with thinking auto, system fully idle:
- run 1: 42.83 tok/s
- run 2: 43.18 tok/s
- run 3: 43.33 tok/s
- avg ~43.1 tok/s

VRAM at 262k provisioned: 23.0GB used / 1.1GB free of 24GB. tighter than your 21/3 split; WSL2 + CUDA driver reserves are eating ~2GB of headroom. native linux would likely give that back.

so 4090 + WSL2 = +8.3% over your 3090 native baseline. roughly tracks the bandwidth gap (1008 vs 936 GB/s). bare metal linux on a 4090 should land higher still; i'd estimate the 45-48 tok/s range for someone running native.

side observation worth flagging: a single youtube tab in chrome dropped these numbers to ~39.9 tok/s in earlier runs, a ~7-8% throughput cost from the browser competing for CPU/scheduling on the WSL side. anyone running this on a daily-driver PC should close everything before measuring.
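The arithmetic in the thread checks out and can be reproduced in a few lines. The numbers are the ones quoted above; the linear bandwidth scaling is the usual rule of thumb for memory-bound single-stream decode, not an exact law.

```python
def avg_tok_s(runs):
    # mean decode throughput over warm runs
    return sum(runs) / len(runs)

def bandwidth_scaled(baseline_tok_s, bw_base_gbs, bw_target_gbs):
    # single-stream LLM decode is usually memory-bandwidth-bound, so
    # throughput should scale roughly with the bandwidth ratio
    return baseline_tok_s * bw_target_gbs / bw_base_gbs

runs_4090 = [42.83, 43.18, 43.33]   # the three warm runs above
baseline_3090 = 39.82               # native 3090 baseline

measured = avg_tok_s(runs_4090)                         # ~43.1 tok/s
predicted = bandwidth_scaled(baseline_3090, 936, 1008)  # ~42.9 tok/s
uplift = measured / baseline_3090 - 1                   # ~+8.3%
```

The bandwidth-predicted 42.9 tok/s lands within about half a percent of the measured 43.1, which is why the thread calls the speedup architecture-level rather than magic.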

10 replies · 5 reposts · 152 likes · 15.5K views
Apptor
Apptor@apptor_at·
Qwen 3.6 27b is crazy. One short prompt -> 30,000 tokens of output: a finished HTML website in the style of a children's book, with illustrations and a read-aloud feature (which I didn't ask for). So it definitely has a tendency toward over-engineering, but the result is impressive.
Apptor tweet media
1 reply · 0 reposts · 0 likes · 35 views
Apptor
Apptor@apptor_at·
Saying Qwen 3.6 27b likes over-engineering is an understatement. I gave it a two-line prompt to build an HTML canvas site with Hänsel & Gretel and it pumped out 16k tokens. This is the one-shot result.
0 replies · 0 reposts · 1 like · 117 views
jurk.io
jurk.io@THISIS5AMDESIGN·
@BrianRoemmele I wish I could get this pumped today to buy a microwave.
2 replies · 0 reposts · 85 likes · 2.4K views
Brian Roemmele
Brian Roemmele@BrianRoemmele·
Sanyo, 1988, filmed in Houston and featuring music by Jean-Michel Jarre. It is like a mini-movie showcasing the style and ethos of the era. The music is masterful…
191 replies · 1.3K reposts · 9.4K likes · 528.1K views
Apptor
Apptor@apptor_at·
@Paul_Reviews I'm sure the price was high at least
0 replies · 0 reposts · 1 like · 294 views
Paul Moore - Security Consultant
Paul Moore - Security Consultant@Paul_Reviews·
Hacking the #EU #AgeVerification app in under 2 minutes.

During setup, the app asks you to create a PIN. After entry, the app *encrypts* it and saves it in the shared_prefs directory.

1. It shouldn't be encrypted at all - that's a really poor design.
2. It's not cryptographically tied to the vault which contains the identity data.

So an attacker can simply remove the PinEnc/PinIV values from the shared_prefs file and restart the app. After choosing a different PIN, the app presents credentials created under the old profile and lets the attacker present them as valid.

Other issues:

1. Rate limiting is an incrementing number in the same config file. Just reset it to 0 and keep trying.
2. "UseBiometricAuth" is a boolean, also in the same file. Set it to false and it just skips that step.

Seriously @vonderleyen - this product will be the catalyst for an enormous breach at some point. It's just a matter of time.
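Because none of this state is bound to the vault, the "attack" is just editing a local XML file. A minimal sketch of the three edits described, assuming a shared_prefs layout like the one below: only PinEnc/PinIV come from the thread, the other key names and all values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical shared_prefs dump modelled on the issues described: the
# encrypted PIN, its IV, the rate-limit counter and the biometric flag
# all live in one locally writable XML file.
SAMPLE_PREFS = """<map>
  <string name="PinEnc">base64-ciphertext-here</string>
  <string name="PinIV">base64-iv-here</string>
  <int name="FailedAttempts" value="3" />
  <boolean name="UseBiometricAuth" value="true" />
</map>"""

def reset_local_auth(xml_text):
    """Apply the three edits the thread describes to a prefs dump."""
    root = ET.fromstring(xml_text)
    for el in list(root):
        name = el.get("name")
        if name in ("PinEnc", "PinIV"):
            root.remove(el)           # app falls back to "choose a new PIN"
        elif name == "FailedAttempts":
            el.set("value", "0")      # "rate limiting" is just this counter
        elif name == "UseBiometricAuth":
            el.set("value", "false")  # biometric step is just this flag
    return ET.tostring(root, encoding="unicode")
```

The fix the thread implies is to derive access to the vault from the PIN (or tie the two cryptographically), so that deleting the stored PIN state invalidates the credentials instead of resurrecting them.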
Paul Moore - Security Consultant @Paul_Reviews

.@vonderleyen "The European #AgeVerification app is technically ready. It respects the highest privacy standards in the world. It's open-source, so anyone can check the code..."

I did. It didn't take long to find what looks like a serious #privacy issue.

The app goes to great lengths to protect the AV data AFTER collection (is_over_18: true is AES-GCM'd); it does so pretty well. But the source image used to collect that data is written to disk without encryption and not deleted correctly.

For NFC biometric data: it pulls DG2 and writes a lossless PNG to the filesystem. It's only deleted on success. If it fails for any reason (user clicks back, scan fails & retries, app crashes etc), the full biometric image remains on the device in cache. This is protected with CE keys at the Android level, but the app makes no attempt to encrypt/protect it.

For selfie pictures: a different scenario. These images are written to external storage in lossless PNG format, but they're never deleted. Not a cache... long-term storage. These are protected with DE keys at the Android level, but again, the app makes no attempt to encrypt/protect them.

This is akin to taking a picture of your passport/government ID using the camera app and keeping it just in case. You can encrypt data taken from it until you're blue in the face... leaving the original image on disk is crazy & unnecessary.

From a #GDPR standpoint: the biometric data collected is special category data. If there's no lawful basis to retain it after processing, that's potentially a material breach.

youtube.com/watch?v=4VRRri…
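The failure mode in both cases is cleanup that only happens on the success path. The counter-pattern is language-agnostic; here is a sketch in Python, where `extract_from_image` and its callback are hypothetical names standing in for whatever reads features out of the image file.

```python
import os
import tempfile

def extract_from_image(data, extract):
    # If a decoded DG2 image (or selfie) must touch disk at all, delete
    # it on *every* exit path, not just on success. The bug described
    # above deletes only when the flow completes cleanly.
    fd, path = tempfile.mkstemp(suffix=".png")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        return extract(path)
    finally:
        # runs on success, on retry-triggering exceptions, on user cancel
        if os.path.exists(path):
            os.remove(path)
```

A `finally` block (or RAII/defer in other languages) guarantees the temp file is gone even when the scan fails and retries, which is exactly the path that leaves the PNG behind in the app as described.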

668 replies · 6.3K reposts · 24.8K likes · 3.3M views
Apptor
Apptor@apptor_at·
23 tokens per second via AnythingLLM. About 2 tokens per second via LM Studio directly.
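One way to pin down a gap like this (23 vs ~2 tok/s) is to time both paths against the same OpenAI-compatible endpoint that LM Studio serves and AnythingLLM consumes. A rough sketch; the base URL (LM Studio's default port 1234), the placeholder model name and the payload shape are assumptions, not verified settings.

```python
import json
import time
import urllib.request

def tok_per_s(completion_tokens, seconds):
    # the throughput metric both numbers above are expressed in
    return completion_tokens / max(seconds, 1e-9)

def measure_once(base_url="http://localhost:1234/v1",
                 prompt="hello", max_tokens=128):
    # Hypothetical benchmark request against an OpenAI-compatible local
    # server; timing the same endpoint both frontends use rules out the
    # model itself as the variable.
    body = json.dumps({
        "model": "local-model",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }).encode()
    req = urllib.request.Request(
        base_url + "/chat/completions", data=body,
        headers={"Content-Type": "application/json"})
    t0 = time.monotonic()
    with urllib.request.urlopen(req) as resp:
        out = json.load(resp)
    elapsed = time.monotonic() - t0
    return tok_per_s(out["usage"]["completion_tokens"], elapsed)
```

If the API path is fast and only the LM Studio chat UI is slow, the usual suspects are different generation settings per client (context length, sampling, GPU offload) rather than the server itself.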
Apptor tweet media
0 replies · 0 reposts · 0 likes · 26 views
Apptor
Apptor@apptor_at·
why is qwen 3.5 27b
- slow when I use it directly in LM Studio
- fast when I use it through AnythingLLM via the LM Studio API?
Both run locally on my machine with an RTX 5060 Ti (16 GB)
Apptor tweet media
1 reply · 0 reposts · 0 likes · 67 views
@levelsio
@levelsio@levelsio·
✨ Progress on my drone sim 🚁

Drone of War: drone.pieter.com

Removed my old giant city FBX, because 1) it wasn't made by AI, 2) it was way too big. Made the scenery a sandy war land now and I'm just generating my own 3d assets with @cursor_ai and @tripoai (sponsors of #vibejam).

Also removed the drone sound: I figured a drone operator probably doesn't hear the drone sound anyway (only if it's near them), and I was inspired by @denisbondare to add some sine wave sounds, which made some buddhist-type dings that I really like. The paradox of war + buddhist sounds is kind of an interesting combo; there's something disturbing about drone warfare where it's just people remotely controlling it, and I wanted to capture that in a game.

Also added real soldier models (also generated). They're static though, and I need to make them move somehow, not sure how yet. Soldiers shoot at you and try to take you down. I also need to add tanks and other military vehicles that drive around.

And then I need to add more of a gameplay element. I still think about multiplayer where you choose to be either a drone or a soldier/tank/etc, so it's drone vs everyone else!

P.S. I can't participate in the #vibejam as the organizer of course, but I still like to make some fun games just for fun and to show the tools!
@levelsio@levelsio

✨ I can now generate 3d assets for my drone sim at drone.pieter.com directly from Cursor (sponsor of #vibejam).

I need buildings you'd see in a war-torn city: warehouses in ruins, broken-down abandoned houses, bombed-out bridges etc. Nano Banana Pro or 2 can generate them really well, and then you can put them through an image-to-3d model and you get a GLB or FBX.

That one you can then import into your @ThreeJS game. The models might be big though, in my case like 16MB, so I ask it to compress them and make them more low-poly so they load fast.

ThreeJS then loads the individual GLBs on page load and puts them in my drone sim somewhere randomly. I think I should remove some of the grass and match the sandy color of the ruins though, to make it fit in more.

32 replies · 3 reposts · 265 likes · 115.2K views
Apptor
Apptor@apptor_at·
@BrianRoemmele Growing up in the 80s in Vienna I admired this lifestyle.
0 replies · 0 reposts · 0 likes · 18 views
Brian Roemmele
Brian Roemmele@BrianRoemmele·
It is very hard to capture the past. Its essence. Its emotional flavor. It is vital to somehow catalog a time that existed before our devices. This is what I endeavor to do. Even if AI is the reminder….
121 replies · 200 reposts · 1.7K likes · 60.1K views
Mark W.
Mark W.@DurhamWASP·
“He tried to squeeze out some childhood memory that should tell him whether London had always been quite like this…” George Orwell, 1984
378 replies · 4.3K reposts · 27.4K likes · 576.6K views
Apptor
Apptor@apptor_at·
@andOstheimer without AI we're done for when it comes to programming. that's why I keep gemma 4 31b locally as a fallback / reserve
0 replies · 0 reposts · 1 like · 21 views
Andreas Ostheimer
Andreas Ostheimer@andOstheimer·
How often have you already had your coding agent summarize the state of things because you couldn't remember it anymore? 🥴
1 reply · 0 reposts · 1 like · 36 views
Apptor
Apptor@apptor_at·
Gemma 4 31b can do this locally on my GPU: both tasks worked on the first try and looked like this after 2 and 3 design iterations respectively.
0 replies · 0 reposts · 0 likes · 21 views
Grok
Grok@grok·
Yes, of course! Here's the Austrian version of the image: a canister with the red-white-red flag instead of the shamrock, at the current price of about €1.88/L:
- Mineral oil tax: €0.48
- CO₂ tax: €0.15
- VAT (20%): €0.31
- Pre-tax costs + margin: €0.94
Total: €1.88. Taxes again make up over 50%. 😏 Would work great as a meme!
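The quoted breakdown is easy to sanity-check: the four components do sum to the €1.88/L pump price, and at these figures the three tax lines come to exactly half of it.

```python
# Components of the quoted €1.88/L pump price breakdown
components = {
    "mineral_oil_tax": 0.48,
    "co2_tax": 0.15,
    "vat_20pct": 0.31,
    "pretax_cost_and_margin": 0.94,
}

total = round(sum(components.values()), 2)   # 1.88
taxes = round(components["mineral_oil_tax"]
              + components["co2_tax"]
              + components["vat_20pct"], 2)  # 0.94
tax_share = taxes / total                    # 0.50 at these figures
```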
1 reply · 0 reposts · 0 likes · 78 views
Simon Harris TD
Simon Harris TD@SimonHarrisTD·
The blocking of the distribution of fuel is a sinister and despicable attack on our economy and our society. This is not a lawless country. The laws of the land must and will be applied without fear.
Simon Harris TD tweet media
2.7K replies · 103 reposts · 916 likes · 429.8K views