Blak

691 posts

@blako87

Passionate about technology, HVAC & coding. Optimizing comfort with smart engineering.

Austria · Joined July 2023

202 Following · 50 Followers

Blak@blako87·6h

@stevibe Will Qwen3.7 be open source? I mean open weights?

stevibe@stevibe·8h

Wow Qwen3.7 is coming

Qwen@Alibaba_Qwen

🚀🚀Qwen3.7 Preview lands on Arena ！ Here come Qwen3.7-Max-Preview & Qwen3.7-Plus-Preview. Alibaba now #6 lab in Text, #5 in Vision.⚡️⚡️ Can't wait to release Qwen3.7 series models！Stay tuned! @arena

Blak@blako87·6h

@0xSero Yeah, for this theory you need a new architecture! When you train the model, you train specific experts for coding, for math, for science, and so on. The idea is that you can then activate only the experts you need, each trained for an exact specialization.

0xSero@0xSero·9h

I still believe MoEs with cpu offloading can be competitive and bring down costs tremendously. I hit a wall with my testing, mainly: How can you predict which experts are going to be active given a prompt’s trajectory? Anyone interested in digging into this more? Shoot a plan

witcheer ☯︎@witcheer

MoE vs dense offload on 8GB VRAM

MoE offload is 10.8x faster than dense offload on 8GB VRAM. here's the proof.

I tested Qwen3.6 35B A3B (MoE, 3B active) vs Qwen3.6 27B (dense, 27B active) on my RTX 4060 Ti 8GB.

the numbers:
>MoE (-ncmoe 30): 35.4 tok/s
>dense (-ngl 20): 3.28 tok/s
ratio: 10.8x

it gets worse at longer context. at 24K tokens, the gap is 16.7x. MoE has zero context degradation (SSM layers), dense loses -35.4%.

why: MoE expert offload keeps the hot path (3B active params) entirely in VRAM. only inactive experts move to CPU when selected. dense layer offload splits every layer across GPU and CPU. every token bounces through PCIe for all 64 layers. the bandwidth bottleneck is fatal.

quality is slightly better on dense (5/6 vs 4/6). the 27B model has the best hallucination resistance of all 9 models I tested.

if you have 8GB VRAM and a model that doesn't fit: MoE with expert offload, not dense with layer offload.
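
For context on the expert-prediction question: in a typical MoE layer a small gating network scores every expert for each token, and only the top-k experts actually run. The sketch below shows generic softmax top-k gating in plain Python, plus a naive count of how often each expert is picked, which is one hypothetical way to decide which experts to keep in VRAM. The gate logits and the expert count are made up; real routers (Qwen's included) differ in details such as weight normalization and load balancing.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized over the selected experts only, which is one
    common convention; routers differ in this detail.
    """
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def expert_usage(per_token_logits, k):
    """Count how often each expert is selected across a sequence of tokens.

    A frequency table like this is a naive, hypothetical prefetch heuristic,
    not an established method for predicting expert activation.
    """
    counts = {}
    for logits in per_token_logits:
        for idx, _ in route_top_k(logits, k):
            counts[idx] = counts.get(idx, 0) + 1
    return counts

# Hypothetical gate logits for 3 tokens over 4 experts.
tokens = [
    [2.0, 0.5, 1.0, -1.0],
    [1.8, 0.2, 1.5, -0.5],
    [-0.3, 2.2, 0.1, 0.0],
]
print(route_top_k(tokens[0], k=2))
print(expert_usage(tokens, k=2))
```

Here expert 2 is picked for every token, so under this heuristic it would be the first candidate to pin in VRAM.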

Blak@blako87·12h

@Surendar__05 Yes, these days I have also learned C# from the basics to advanced! Very helpful when you want to build something that works.

Surendar@Surendar__05·1d

Be honest devs, Is coding still worth learning in the AI era?

Blak@blako87·12h

@jcrr4220 @CryptoTony__ Oh wow! So basically you don't know how to guide an LLM! It's not "AI" or magic! Think of it as a billion-parameter database that queries fast: Q, K, V. So basic database knowledge and basic coding knowledge would give you superpowers to get the right code out of it! Not an entire app!

Joao Rocha@jcrr4220·1d

@CryptoTony__ Why is all the fuss around AI tools about coding? I would like to hear about AI tools helping with real engineering problems, like hydraulic problem solving. I've tried several times and the results are very disappointing.

Crypto Tony@CryptoTony__·2d

🚨 Anthropic just showed a 24-minute workshop on how to actually do prompts for Claude. Taught by the people who built it. Free. No registration. No paywall. I've seen $500 courses that don't cover what they teach in the first 8 minutes. Watch it and bookmark it now.
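
The "Q, K, V" remark in the reply above refers to attention. A loose way to see the database analogy: a query is compared against every key, and the result is a softmax-weighted blend of the values rather than one exact row. Below is a minimal single-head scaled dot-product attention sketch in plain Python; the toy keys, values, and queries are made up for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    Scores each key by dot(query, key) / sqrt(d), turns the scores into
    weights with softmax, and returns the weighted sum of the value vectors
    together with the weights.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim_v = len(values[0])
    out = [0.0] * dim_v
    for w, v in zip(weights, values):
        for j in range(dim_v):
            out[j] += w * v[j]
    return out, weights

# Toy "table": two keys, each pointing at one value row.
keys = [[10.0, 0.0], [0.0, 10.0]]
values = [[1.0, 0.0], [0.0, 1.0]]

out, weights = attend([10.0, 0.0], keys, values)      # matches key 0
mixed, w_mixed = attend([5.0, 5.0], keys, values)     # halfway between keys
print(weights, out)
print(w_mixed, mixed)
```

A query matching the first key returns essentially the first value row, while a query halfway between the keys returns an even 50/50 blend. That blending is where the database analogy stops holding.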

Blak@blako87·12h

@sulekhat95 50 years of knowledge in one hour?! Come on!!

Sulekha Tripathi@sulekhat95·1d

A man spends 50 years teaching at MIT. He knows his time is running out. So he records one last lecture — everything he knows, distilled into a single hour. He died 5 months later. This is that lecture. The most important hour you'll watch this week. 👇 Bookmark it for later

Blak@blako87·20h

@sudoingX llama.cpp, Qwen 3.6 27B dense, sm tensor (2x R9700): 42-45 tg/s. Qwen 3.6 35B A3B, sm tensor: 68 tg/s, both Q8_K_XL! Qwen3-Next 80B-A3B with the same flags: 62 tg/s! Flags: -c 96k -n 62k -u 1024 -ub 1024. pp: 3.1k-3.3k t/s! I use it for heavy data generation in JSON/JSONL format.

Sudo su@sudoingX·1d

actually, let's not just wait. if you are running local models on AMD right now, R9700, Strix Halo, a 7900 XTX, any RDNA card, on a ROCm or Vulkan build, drop your numbers in the replies. model, quant, the card, tok/s. one line is enough. i'll pull the best into a proper thread and amplify the builders who contribute. consider it the AMD half of the list, written by the people actually running it. real numbers from real cards. let's build the AMD picture together while my hardware ships.

Sudo su@sudoingX

be patient anon. i could fake the AMD numbers tonight. i won't. so we wait.
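
Since the reply above mentions bulk JSON/JSONL data generation: model output is not guaranteed to be valid JSON, so a cheap filter step after generation helps. Below is a minimal Python sketch that keeps only lines parsing as JSON objects with a set of required keys. The key names `instruction` and `output` are hypothetical placeholders for whatever schema you generate.

```python
import json

def filter_jsonl(lines, required_keys):
    """Split raw JSONL lines into valid records and rejected line numbers.

    A line is kept only if it parses as a JSON object containing every key
    in required_keys. Blank lines are skipped without being counted as bad.
    """
    good, bad = [], []
    for lineno, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError:
            bad.append(lineno)
            continue
        if isinstance(obj, dict) and all(k in obj for k in required_keys):
            good.append(obj)
        else:
            bad.append(lineno)
    return good, bad

sample = [
    '{"instruction": "Convert 20 C to F", "output": "68 F"}',
    '{"instruction": "truncated output", "outp',  # cut off mid-generation
    '',
    '["not", "an", "object"]',
]
records, rejected = filter_jsonl(sample, required_keys=("instruction", "output"))
print(len(records), rejected)
```

Logging the rejected line numbers, rather than silently dropping them, makes it easy to see whether a prompt or a sampling setting is producing broken records.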

Blak@blako87·2d

@leopardracer But how do you solve the hardcoded 2GB metabuffer? The server will crash when it goes past 2GB! So you will not be able to run llama.cpp as a 24/7 server with 120k context! I've been testing for weeks now for stable 24/7 server inference for local production! Please share the code with us!

leopardracer@leopardracer·2d

THIS AMERICAN DEVELOPER SPENT WEEKS DEBUGGING TIMEOUT ERRORS IN OLLAMA. THEN HE LOOKED UNDER THE HOOD

LM Studio is just llama.cpp
Ollama is just llama.cpp

so he cloned llama.cpp from source, pulled Qwen 3.6 35B off Hugging Face, set up asymmetric KV quantization and got a local server running on 127.0.0.1:8080

plugged it into VS Code, connected it to OpenClaw, 53 tok/s on an M1 Max with 262K context

zero wrappers, zero timeout errors, zero API fees

bookmark & like this before your next timeout error hits

full breakdown of my raw llama.cpp setup ↓

leopardracer@leopardracer

x.com/i/article/2055…
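
The "asymmetric KV quantization" in this post means storing the K and V caches at different precisions (in llama.cpp, via `--cache-type-k` and `--cache-type-v`). A minimal sketch of the memory arithmetic follows. The model shape is hypothetical, not Qwen 3.6's real config; the byte counts follow llama.cpp's q8_0 (34 bytes per 32 values) and q4_0 (18 bytes per 32 values) block layouts. For hybrid models with linear-attention layers, only the full-attention layers keep a KV cache, so `n_attn_layers` should count just those.

```python
# Bytes per block of 32 values for llama.cpp cache types: f16 is 2 bytes per
# value; q8_0 and q4_0 store one fp16 scale per 32-value block.
BYTES_PER_32 = {"f16": 64, "q8_0": 34, "q4_0": 18}

def kv_cache_bytes(n_ctx, n_attn_layers, n_kv_heads, head_dim, k_type, v_type):
    """Estimate KV cache size in bytes, ignoring padding and runtime overhead.

    K and V each hold n_kv_heads * head_dim values per token per attention
    layer; the two tensors may use different cache types (asymmetric).
    """
    values_per_tensor = n_ctx * n_attn_layers * n_kv_heads * head_dim
    assert values_per_tensor % 32 == 0, "sketch assumes whole 32-value blocks"
    blocks = values_per_tensor // 32
    return blocks * (BYTES_PER_32[k_type] + BYTES_PER_32[v_type])

GIB = 2 ** 30
# Hypothetical model shape: 40 attention layers, 4 KV heads, head_dim 128.
dims = dict(n_ctx=262_144, n_attn_layers=40, n_kv_heads=4, head_dim=128)
print(kv_cache_bytes(**dims, k_type="f16", v_type="f16") / GIB)
print(kv_cache_bytes(**dims, k_type="q8_0", v_type="q4_0") / GIB)
```

With these made-up dimensions, an f16/f16 cache at 262K context comes to 20 GiB, while q8_0 K with q4_0 V comes to about 8.1 GiB; the real figures depend on the model's config.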

Blak@blako87·2d

@1337hero I use wget and it's better: full 100 MB/s, not Mbit/s!

Mike Key@1337hero·3d

Why is downloading from Hugging Face so painfully slow?
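
The wget approach in the reply above works because every file on the Hub has a direct download URL of the form `https://huggingface.co/<repo_id>/resolve/<revision>/<filename>`. A minimal sketch that builds such a URL and a matching wget command line; the repo and file names are hypothetical placeholders. `wget -c` resumes a partially downloaded file.

```python
from urllib.parse import quote

def hf_resolve_url(repo_id, filename, revision="main"):
    """Build the direct download URL for a file in a Hugging Face model repo."""
    # Keep "/" unescaped in repo_id and filename so nested paths still work;
    # escape it in the revision (e.g. "refs/pr/1" becomes "refs%2Fpr%2F1").
    return (
        f"https://huggingface.co/{quote(repo_id, safe='/')}"
        f"/resolve/{quote(revision, safe='')}/{quote(filename, safe='/')}"
    )

def wget_command(repo_id, filename, revision="main"):
    """Return a wget argv list; -c lets an interrupted download resume."""
    return ["wget", "-c", hf_resolve_url(repo_id, filename, revision)]

# Hypothetical repo and file names, for illustration only.
print(" ".join(wget_command("some-org/Some-Model-GGUF", "Some-Model-Q8_0.gguf")))
```

Gated repos additionally need an access token, which wget can send with `--header="Authorization: Bearer <token>"`.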

Blak@blako87·3d

@_vmlops On a disk! Our big company has a NAS server. Save your work on a disk + an external drive, like millions of people do!

Vaishnavi@_vmlops·3d

IF GITHUB DIDN'T EXIST, WHERE WOULD YOU BE SAVING YOUR CODE RIGHT NOW.....?

Blak@blako87·4d

@claudeai Hey guys! Your rate limits are broken. I asked one question and the 5-hour limit immediately went from 0 to 100%! Guys, this is not OK! With Sonnet 4.6!!! Not even Opus!

Blak@blako87·4d

@jun_song I have 64 GB of VRAM and it's still too low!! I think for production use and daily work we need at least 200 GB of VRAM to have a good reasoning LLM on your side! Anything below that is just for prototyping.

송준 Jun Song@jun_song·4d

If you see posts like "$500 mini PC running local LLMs" it's a scam. Right now, the absolute bare minimum to run an actually productive local LLM is 24GB VRAM or 32GB Unified Memory (Mac). Anything below that is literally just a toy chatbot. At least for now. Just give it a few months.
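
The VRAM thresholds in this exchange follow from simple arithmetic: weights take roughly parameters × bits per weight / 8 bytes, before KV cache and runtime buffers. A rough sketch under stated assumptions: 8.5 and 4.5 bits per weight approximate llama.cpp's q8_0 and q4_0 formats (other quant types land elsewhere), and the `reserve_gb` budget for context is a caller-supplied guess.

```python
def weight_gb(params_billion, bits_per_weight):
    """Approximate weight memory in GB (10**9 bytes): params * bits / 8."""
    return params_billion * bits_per_weight / 8

def fits(params_billion, bits_per_weight, vram_gb, reserve_gb):
    """Check whether weights plus a reserve for KV cache and buffers fit in VRAM.

    reserve_gb is a guess supplied by the caller; the real need grows with context.
    """
    return weight_gb(params_billion, bits_per_weight) + reserve_gb <= vram_gb

# 16 = f16, 8.5 ~ q8_0, 4.5 ~ q4_0 (bits per weight including block scales).
for params, bits in [(27, 16), (27, 8.5), (27, 4.5), (122, 4.5)]:
    print(f"{params}B at {bits} bpw: ~{weight_gb(params, bits):.1f} GB of weights")

print(fits(27, 4.5, vram_gb=24, reserve_gb=4))
print(fits(122, 4.5, vram_gb=64, reserve_gb=4))
```

By this estimate a 122B model at about 4.5 bits per weight needs roughly 69 GB for weights alone, which is more than 64 GB of VRAM before any context is allocated.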

Blak@blako87·4d

@ApollosMission Yeah, of course! Tetris.

Landon - KRNG Apollo@ApollosMission·5d

Call of Duty is dead. Arc Raiders is dead. Fortnite is dead. Battlefield is dead. GTA is still not out. What is anyone even playing anymore these days?

Blak@blako87·5d

@DivyanshT91162 Hmmm! Then tell us how you solve the problem in llama.cpp with the metabuffer/pool allocation? It's 2GB hardcoded! So basically, after enough requests the server will crash/close with "out of..."! So yeah, 24/7 without resizing is not possible!

divyansh tiwari@DivyanshT91162·5d

Everyone is distracted by AI agents in the cloud…

Meanwhile, some people quietly turned their laptops into autonomous AI research machines running 24/7 locally.

No OpenAI bill. No GPU server. No internet required after setup.

Just:
• Qwen3-35B-A3B
• llama.cpp
• 4-bit quant by Unsloth

This thing can read papers, reason through problems, write code, summarize research, and keep working while you sleep. We've officially entered the era where a single laptop can do work that needed an AI lab a year ago. Repo 👇
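
On the 24/7 question raised in the replies: a health-check watchdog does not fix the buffer limit described there, but it is a common way to bring a crashed server back automatically. A minimal hypothetical sketch, assuming llama-server's default address 127.0.0.1:8080 and its `/health` endpoint; the `restart` callable is yours to supply (for example, killing and relaunching your own llama-server command).

```python
import http.client
import time
import urllib.request

def is_healthy(url="http://127.0.0.1:8080/health", timeout=5.0):
    """Return True if the server answers its health endpoint with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (e.g. 503 while loading), timeouts and dropped
        # connections all count as unhealthy.
        return False

def supervise(check, restart, max_failures=3, interval_s=10.0,
              max_checks=None, sleep=time.sleep):
    """Poll check() and call restart() after max_failures failures in a row.

    Runs forever when max_checks is None; returns the number of restarts.
    """
    failures = restarts = checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if check():
            failures = 0
        else:
            failures += 1
            if failures >= max_failures:
                restart()
                restarts += 1
                failures = 0
        sleep(interval_s)
    return restarts

# Demo with a scripted sequence of health results instead of a live server.
results = iter([True, False, False, False, True])
n_restarts = supervise(lambda: next(results), lambda: print("restarting"),
                       max_checks=5, interval_s=0, sleep=lambda s: None)
print(n_restarts)
```

For real use, `supervise(is_healthy, my_restart)` runs indefinitely; a process manager such as systemd with `Restart=always` covers outright crashes without custom code, though not hangs.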

Blak@blako87·11 May

@1337hero I have just 2 of them.

Mike Key@1337hero·11 May

Man - Running Qwen3.5-122B-A10B through 25 code scenarios with a total wall time of 16 mins through the whole benchmark. 32.2s scen avg. Loving these R9700's

Blak@blako87·11 May

@steipete I will give it a try.

Peter Steinberger 🦞@steipete·11 May

I built a whole distributed caching layer over gh. Still run into limits.

Blak@blako87·10 May

@ritu_twts Yes! But not just writing code! What matters is a deeper understanding of how things work! And yeah, they fool you into skipping the CS journey! Understanding how computing works will give you superpowers! Go study CS!

Reethu@ritu_twts·10 May

Be honest devs, Is coding still worth learning in the AI era?

Blak@blako87·9 May

@alispahic_dev @ThePrimeagen And it is! Give it a try! I used a mix: XML for structured data and .md for the goal. You will be happy!

Mirza Alispahic@alispahic_dev·9 May

@ThePrimeagen We should go back to xml, I am sure it will be best for AI 😂

ThePrimeagen@ThePrimeagen·9 May

I cannot wait for hypertext to be used. The pisi*,** will reach levels previously thought impossible * Prompt injections per square inch (imperial of course) ** I did originally do pilb for per pound, but it's just not as funny sounding

Thariq@trq212

x.com/i/article/2052…
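
A minimal sketch of the mixed format described in the replies: structured data wrapped in XML tags, with the goal written in Markdown. The tag names `<documents>` and `<doc>` are arbitrary choices for illustration, not a required schema. Building the XML with `xml.etree.ElementTree` escapes characters such as `<` and `&` inside the data, so the data cannot break the surrounding tags.

```python
import xml.etree.ElementTree as ET

def build_prompt(goal_md, docs):
    """Combine a Markdown goal with documents serialized as XML.

    docs is a list of (source, text) pairs; ElementTree escapes special
    characters in both attributes and text.
    """
    root = ET.Element("documents")
    for i, (source, text) in enumerate(docs, start=1):
        doc = ET.SubElement(root, "doc", index=str(i), source=source)
        doc.text = text
    xml_block = ET.tostring(root, encoding="unicode")
    return f"{goal_md.strip()}\n\n{xml_block}"

prompt = build_prompt(
    "## Goal\nSummarize each document in one sentence.",
    [("notes.txt", "Supply temp < 45 C & return temp 35 C")],
)
print(prompt)
```

The Markdown part carries the instructions in a form that reads naturally, while the XML part gives the model unambiguous boundaries around each piece of data.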

Blak@blako87·9 May

@ryancarson 2024 to 2025 was JSON format, 2025 to 2026 Markdown format, now HTML, but XML stays #1. At Anthropic they use XML more than .md files.

Ryan Carson@ryancarson·9 May

Hmmmm. Might be switching out my .md for .html Makes so much sense once you think about it

Thariq@trq212

x.com/i/article/2052…

Blak@blako87·8 May

@AMD One of these cards would make me happy and move my work forward! To be able to train LLMs bigger than 8B-14B locally and host large models! Phew, I guess I need to open an OnlyFans account to make some cash to afford this card!

AMD@AMD·8 May

AI adoption shouldn’t require rebuilding your data center. AMD Instinct MI350P PCIe GPUs are designed to deliver production AI inference performance in standard air-cooled servers, helping enterprises scale AI with existing infrastructure. Learn more: bit.ly/48Hxfl4
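
A rough sketch of why local training beyond the 8B-14B range needs so much memory: full fine-tuning with mixed-precision Adam is commonly estimated at about 16 bytes per parameter before activations. The breakdown below is that rule of thumb, not a measurement of any particular framework.

```python
# Common rule-of-thumb breakdown for full fine-tuning with mixed-precision Adam:
# fp16 weights (2) + fp16 gradients (2) + fp32 master weights (4)
# + Adam first moment (4) + Adam second moment (4) = 16 bytes per parameter.
BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4

def full_finetune_gb(params_billion, bytes_per_param=BYTES_PER_PARAM):
    """Model, gradient and optimizer state in GB (10**9 bytes), excluding activations."""
    return params_billion * bytes_per_param

for size in (8, 14):
    print(f"{size}B full fine-tune: ~{full_finetune_gb(size)} GB before activations")
```

That puts 8B-14B full fine-tuning at roughly 128-224 GB before activations, which is why local training at this scale usually leans on multi-GPU setups or parameter-efficient methods such as LoRA.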

Blak@blako87·8 May

@jun_song To be honest, I think even senior devs with 20+ years of experience can't review the code as fast as the agent produces it! So basically, when you use agents in auto mode, you are still a passenger, not the driver!

송준 Jun Song@jun_song·8 May

I see so many posts about running dozens or hundreds of agents at the same time. Honestly haven't seen anyone use it properly yet. It's mostly just flexing for impressions.
