
Pingmurder
37.5K posts

Pingmurder
@pingmurder
AI Dark Mage, Architect of your doom, My X habit is my own problem.














I received the following email from CODEPINK. "We are writing to formally address and correct the false and defamatory statements made in your recent social media posts regarding CODEPINK. These claims—which falsely allege that our organization is funded by China, the Chinese Communist Party (CCP), or any foreign government or entity—are entirely baseless and constitute libel." So, I will share the facts without spin: Per Wikipedia, CODEPINK is 25% funded by Neville Singham, who is living in Shanghai and got rich off spreading CCP propaganda, and is under investigation by Congress for FARA violations. CODEPINK is running a campaign called "China is Not Our Enemy," which promotes pro-China messaging, including denial of the Uyghur genocide, an atrocity affirmed by the U.S. Department of State. The founder of CODEPINK, Jodie Evans, is married to the aforementioned Neville Singham. In fact, the Uyghur denialism goes so far that Jodie Evans said in a YouTube interview, that Uyghurs are terrorists trained in Yemen and Syria who bomb shopping centers.





gtx 1080 8gb of vram launched may 2016. card turns ten this month. just ran three current open weight agentic models on one and the smallest of them fit 656,000 tokens of context at 38 tok/s gen speed. on a pascal arch card with no tensor cores. on 8gb of gddr5x that the discourse keeps telling me is unusable. three models, same hardware, same locked flags. qwen3 8b, qwen 3.5 9b, gemma 4 e4b. q4_k_m quant across the board. q4_0 kv cache, flash attention on, llama.cpp built for sm_61. one line setup. results vram ceiling on 8gb qwen3 8b, 78k qwen 3.5 9b, 248k gemma 4 e4b, 656k gen tok/s at small context qwen3 8b, 31.71 qwen 3.5 9b, 29.91 gemma 4 e4b, 42.13 gen tok/s at the ceiling qwen3 8b, 31.78 at 77k qwen 3.5 9b, 29.62 at 248k gemma 4 e4b, 38.73 at 648k agent workload combined throughput at 16k input qwen3 8b, 285.98 qwen 3.5 9b, 413.18 gemma 4 e4b, 543.63 gemma sweeps every category. 2.6x more context than qwen 3.5 9b, 8.4x more than qwen3 8b, 30% faster at the ceiling. sliding window attention keeps the kv cache nearly flat as context grows, which is why 8gb stretches an order of magnitude further on gemma than on a vanilla transformer. the part that gets me is qwen3 8b losing to qwen 3.5 9b at anything past 4k context. newer release, but heavier kv per token, less aggressive gqa, every release has tradeoffs and pascal exposes them by giving the architecture nowhere to hide. q4_0 kv cache is the practical unlock. flash attention on pascal still works in 2026, no special path needed. sm_61 compiles clean in llama.cpp. that's the entire stack. a card you literally might have in a drawer can run a coding agent with 600k+ tokens of context. raw perf is one axis. next drop is the other one. agentic coding on the same hardware. single file canvas demos, then multi file refactors. can these models finish a task without rails or do they fall apart the moment the agent loop gets deep. stay tuned. you might have this card in a drawer.






If anyone sees this person stealing @spencerpratt signs in #ShermanOaks, please get her license plate number and send it to me along with photos, video et al. She was seen doing this on or near Longridge. @lapdVanNuysDiv @LAPDVTD @ShermanOakPatch














