fcan

91 posts

fcan

@0xfcan

running llms on 3090

İstanbul Katılım Ağustos 2011

259 Takip Edilen252 Takipçiler

fcan@0xfcan·18h

gstack reposunu okurken karşılaştığım sistem promptları

Türkçe

fcan@0xfcan·5d

Çoğu fine tune edilmiş modelin baz modele göre zayıf kaldığını görüyorum. Bunun sebebi benchmarklara overfit edilmesinden dolayı olabilir.

Türkçe

fcan@0xfcan·25 Nis

Nvidia endpoint'indeki belirli modelleri okul veya iş mail adresi ile ücretsiz kullanabiliyorsunuz ancak denediğim tüm modeller çok yavaş. Step 3.5 dışında verim alamadım diyebilirim. API key ve endpoint URL ile istenilen harness'a bağlanılabilir gibi. build.nvidia.com/models

Türkçe

fcan@0xfcan·22 Nis

@LottoLabs Just use llama.cpp

English

251

Lotto@LottoLabs·22 Nis

Every single guy gets oneshotted by LLMs recommending ollama for local infra

0xMarioNawfal@RoundtableSpace

A FREE PRIVATE AI AGENT ON YOUR LAPTOP IS STARTING TO LOOK WAY MORE REAL. Hermes, Ollama, and Gemma 4 turn a simple setup into a local agent with web research, self improving skills, and zero monthly model rent.

English

137

14.3K

fcan@0xfcan·2 Nis

@TheAhmadOsman waiting for more people to get AhmadOsman-pilled

English

Ahmad@TheAhmadOsman·2 Nis

Everybody's request to join x/LocalLLaMA has been approved We're 7772 strong now

Ahmad@TheAhmadOsman

Working on approving all members requests to our x/LocalLLaMA community twitter.com/i/communities/…

English

231

8.6K

fcan@0xfcan·2 Nis

1x3090'da Hermes+Qwen 3.5 27b. Harness olarak Opencode. Dario'nun veya Sam'in şirket içi maillerini takip etmek istemiyorsanız en iyi f/p setuplardan biri bu şu anda :)

Türkçe

fcan@0xfcan·2 Nis

@LottoLabs @JordanStev Best advice so far

English

Lotto@LottoLabs·2 Nis

@JordanStev Install llama.cpp right now and download qwen 27b

English

1.2K

Jordan Stevenson@JordanStev·1 Nis

Using Sonnet 4.6 for a Hermes Agent --- just spent $1.40 in API costs to ask it what the weather is like in Marbella today Not sure it's going to be cost-saving to route 10,000 monthly customer support conversations to this then 😄

English

1.3K

fcan@0xfcan·1 Nis

@0xSero denemek isterim!

Türkçe

0xSero@0xSero·1 Nis

Do you want to try Droid? I’m doing a giveaway 3 people will win 100M Factory credits each.Thats 5 months of their 20$ a month subscription. Winners selected randomly from comments in 48 hours.

English

1.1K

798

80.9K

fcan@0xfcan·31 Mar

@loktar00 llama.cpp is the king

English

185

Loktar 🇺🇸@loktar00·30 Mar

llama.cpp hit 100k stars.... honestly one of the most important projects in the local AI space. This is what made running models on your own hardware actually viable. If you're still routing everything through Ollama you're leaving performance on the table.

English

7.6K

fcan@0xfcan·31 Mar

Prompt girişi sonrası GPU'dan gelen hafif uğultu inference almanın donanım maliyetini bağlamına oturtuyor

Türkçe

fcan@0xfcan·30 Mar

@LottoLabs thank god somebody quantized it :)

English

196

Lotto@LottoLabs·30 Mar

Just need like 6x6000pro for this

English

fcan@0xfcan·29 Mar

@LottoLabs 3090 and a dream is a great phrase👍

English

187

Lotto@LottoLabs·29 Mar

I want to reiterate my tests are for brokies w/ a 3090 and a dream

English

6.2K

fcan@0xfcan·29 Mar

3090 Ti'da test etmeye başladığım modeller: 1) Qwen 3.5 & türevleri (Opus distilled,uncensored vb.) 2) nvidia/Nemotron-Cascade-2-30B-A3B 3) vngrs-ai/Kumru-2B 4) Sevgili @AlicanKiraz0 'dan /Kara-Kumru-v1.0-2B 5) Oto.tamamlama için zed-industries/zeta-2 (@0xSero 'nun önerisi ile)

Türkçe

212

fcan@0xfcan·28 Mar

@0xSero brilliant list! having fun with qwen 3.5 27b and nemotron cascade 2 30b!

English

348

0xSero@0xSero·28 Mar

Best models to run on your hardware level I'll be doing this every week, I hope you guys enjoy. ---- 8 GB ---- Autocomplete for coding (like Cursor Tab) - huggingface.co/NexVeridian/ze… - huggingface.co/bartowski/zed-… Tool calling, assistant style - huggingface.co/nvidia/NVIDIA-… ---- 16 Gb ---- Here things get better: Multimodal - huggingface.co/Qwen/Qwen3.5-9B - huggingface.co/Tesslate/OmniC… - huggingface.co/unsloth/Qwen3.… ---- 24 GB ---- - The best model you can get (thanks Qwen) huggingface.co/Qwen/Qwen3.5-2… - Great model (strong agents) huggingface.co/nvidia/Nemotro… - Mine hehe huggingface.co/0xSero/Qwen-3.… I'm doing a weekly series

English

219

367

3.7K

582.4K

fcan@0xfcan·28 Mar

@sudoingX 'in önerisi ile Qwen3.5-27B q4'ü aşağıdaki argları kullanarak 1x3090'da çalıştırıyorum. llama-server -ngl 99 -c 262144 -fa on --cache-type-k q4_0 --cache-type-v q4_0 Modeli @NousResearch Hermes agentı'na bağladım. Birkaç debug işlemi yaptım. İlk izlenimim çok olumlu

Türkçe

fcan@0xfcan·27 Mar

@0xSero you should check out nerve. github.com/daggerhashimot… i know the dev, he is a brillant guy btw.

English

1.2K

0xSero@0xSero·27 Mar

Please list all the apps you know of this category that's not already on the list: 1. Cursor Glass 2. Factory Desktop 3. Codex App 4. OpenCode App 5. Claude Desktop 6. CMUX I am looking for this mythic ADE: 1. Has browser 2. Has filesystem 3. Has BYOK 4. Has good ui

English

403

88.2K

fcan@0xfcan·26 Mar

Token/inference talebi gittikçe artarken YZ yatırım balonu olduğunu iddia edenlere inanmıyorum. Ve tabi ki lokal llm çalıştırıp kendinizi bu bağlamdan kurtarmak büyük rahatlık..

Thariq@trq212

To manage growing demand for Claude we're adjusting our 5 hour session limits for free/Pro/Max subs during peak hours. Your weekly limits remain unchanged. During weekdays between 5am–11am PT / 1pm–7pm GMT, you'll move through your 5-hour session limits faster than before.

Türkçe

fcan@0xfcan·26 Mar

@HEXtheantidote @alexocheema @MiniMax_AI @exolabs it looks like 15.8 token per second in quoted tweet

English

Numbers@HEXtheantidote·26 Mar

@alexocheema @MiniMax_AI @exolabs How many Tokens per second ?

English

528

Alex Cheema@alexocheema·26 Mar

6 x M1 Max mac studios repurposed to run @MiniMax_AI M2.5 using @exolabs. These are 4 year old devices, each with 400GB/s memory bandwidth (total 2.4TB/s). Second hand each mac is ~$1.2k.

William Ruider@ruider92545

EXO rocks!!! 🥰🥰🥰

English

369

56.2K

fcan@0xfcan·26 Mar

@AlicanKiraz0 @NVIDIAAIDev @Apple @nvidia tebrikler! 3090+mac studio cluster hayallerimiz gerçek mi oluyor :)

Türkçe

Alican Kiraz@AlicanKiraz0·26 Mar

Guys, presenting the World's First Heterogeneous @NVIDIAAIDev CUDA + @Apple Metal Distributed Inference Cluster 🎉 I managed to get @nvidia CUDA + Apple Metal running distributed LLM inference together — same pipeline, same ring. I owe this to exo's amazing infrastructure. ❤️ The BF16 model splits across architectures — layers 0-10 on NVIDIA, layers 10-32 on Apple Silicon. I achieved this by adding just ~200 lines across 5 files to exo's brilliant open-source codebase. PR to @exolabs coming soon.

English

200

47.2K

Keşfet

@LottoLabs @TheAhmadOsman @JordanStev @0xSero @loktar00 @elonmusk @BarackObama @taylorswift13