fos

490 posts

@fosbix

life ends with the one that burns the candle

Joined February 2022

205 Following · 17 Followers

fos@fosbix·14h

@LottoLabs @DataPlusEngine I don’t see any benefit to using LM Studio over llama.cpp directly. You just lose out on a ton of features.

Lotto@LottoLabs·16h

@DataPlusEngine Lots of casuals will never even attempt the CLI. Ollama is typically the recommended easy route, but LM Studio is far better and far easier. I’ve used SGLang, llama.cpp, vLLM, etc. Right tool for the job.

Lotto@LottoLabs·19h

This is a great example of why to never use Ollama. I get 90 TPS just using LM Studio. LM Studio is easier to use (GUI) and is better optimized. I don’t understand how they can make it so bad. They both use a llama.cpp backend, but man, it’s bad.

stevibe@stevibe

Qwen3.6 35B-A3B dropped yesterday, so I ran it on 4 GPUs to see how it performs:
🟣 RTX 3090: 49.78 tok/s, TTFT 852ms
🟡 RTX 4090: 118.93 tok/s, TTFT 686ms
🟢 RTX 5090: 160.37 tok/s, TTFT 409ms
🔵 DGX Spark: 59.98 tok/s, TTFT 228ms
I went with ollama as the backend because, honestly, it's the easiest way for most people to get started. One command, model pulled, done.
I used Q4_K_M (24GB) across all four cards. The reason is the 3090 and 4090 don't support NVFP4 (only the 5090 and DGX Spark could use it). Keeping the same quant everywhere felt like the fairest way to compare.
And yes, you can absolutely squeeze more performance out of every card with vLLM, SGLang, or TensorRT-LLM. But that's not what this test is about. This is just the out-of-the-box experience for folks who own a GPU and want to try the new model tonight.

fos@fosbix·15h

Just you guys wait till an Afmoe model releases. Gemma 4 and Qwen 3.6 are just the start

fos@fosbix·21h

@jorwhol @gosrum A single 2070 is sufficient, mate

Jorwhol@jorwhol·23h

@gosrum What setup is needed to run this model? Two RTX 3090s sufficient?

金のニワトリ@gosrum·1d

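Qwen3.6-35B-A3B is way too strong!!!
・I ran ts-bench with it paired with opencode, vibe-local, GitHub Copilot, qwencode, and claude code, and it got a perfect score every time.
・What's more, it completes tasks about as fast as Claude Sonnet 4.6 and Opus 4.6.
Qwen3.5-27B was impressive too, but Qwen3.6-35B-A3B, like the Red Comet, runs inference 3x faster than the 27B, so, as the benchmark results show, the big win is that the time to complete a task has been cut dramatically.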

金のニワトリ@gosrum

It’s being overshadowed by Claude Opus 4.7 and isn’t getting much attention, but isn’t Qwen3.6-35B-A3B a pretty amazing model?

fos@fosbix·1d

@griffisu As soon as the guy’s videos stop going viral, we’ve achieved AGI

redrum@griffisu·2d

this guy is doing unprecedented research on ai that will inevitably let us know whether AGI has arrived or not

Husk@huskirl

Idk what to type here rn

fos@fosbix·1d

@stevibe @0xkeenz I would try Unsloth’s UD-Q4_K_XL / IQ4_NL / NVFP4

stevibe@stevibe·1d

@fosbix @0xkeenz Q4_K_M

stevibe@stevibe·1d

Qwen3.6 35B-A3B: smarter, but forgot how to use tools?
Running 6 Bench Packs on BenchLocal across 3 open-source Qwen models.
✅ ReasonMath: 92 vs 85 vs 86 (3.6 wins)
✅ InstructFollow: 97 / 97 / 97 (tied)
❌ ToolCall: 83 vs 97 vs 100 (3.6 tanks)
Qwen3.5 27B still the tool-calling champ. 3.6 clearly leveled up reasoning, but tool use took a hit.
DataExtract live now. BugFind + StructOutput next.

fos@fosbix·1d

@BuffaloWingGuy7 @firstwefeast Jew knowledge

Cabelas_Fella14@BuffaloWingGuy7·2d

@firstwefeast Can someone explain this?

First We Feast@firstwefeast·2d

david blaine being just as surprised as us 😩#hotones

fos@fosbix·1d

@eliebakouch Probably every frontier model is an MoE

elie@eliebakouch·1d

yeah you know.... moe model are fundamentally limited... dense model are way better look at gemma4 and qwen3.5... you don't get it this is just a trend... moe are dead!!!

Qwen@Alibaba_Qwen

⚡ Meet Qwen3.6-35B-A3B: Now Open-Source! 🚀🚀
A sparse MoE model, 35B total params, 3B active. Apache 2.0 license.
🔥 Agentic coding on par with models 10x its active size
📷 Strong multimodal perception and reasoning ability
🧠 Multimodal thinking + non-thinking modes
Efficient. Powerful. Versatile. Try it now👇
Blog: qwen.ai/blog?id=qwen3.…
Qwen Studio: chat.qwen.ai
HuggingFace: huggingface.co/Qwen/Qwen3.6-3…
ModelScope: modelscope.cn/models/Qwen/Qw…
API (‘Qwen3.6-Flash’ on Model Studio): Coming soon~ Stay tuned

fos@fosbix·1d

@stevibe @0xkeenz Unsloth’s*

fos@fosbix·1d

@stevibe @0xkeenz Which q4 are you using, qwen’s UD variant or NVFP4?

fos@fosbix·1d

@songjunkr 2 months? It’s been less than 48 hours

송준 Jun Song@songjunkr·1d

Wow, qwen3.6-35b beat the previous 27b and sonnet-4.5. That this is possible with an MoE is an absurd leap forward. And it hasn’t even been 2 months since the release. Hugging Face⬇️

fos@fosbix·1d

@u_m_a_m_i @Jonas_Ceika “Just tell her it’s a religious garment” lmfao

u m a m i@u_m_a_m_i·2d

@Jonas_Ceika Lmao. Gemini couldn't lie

Jonas Čeika@Jonas_Ceika·2d

ChatGPT glazing experiment #2

fos@fosbix·1d

@theo @robinebers There’s just no way you fail to acknowledge your own bias that egregiously. Cursor have just as many resources as Anthropic do. Just because they’re claiming to be putting the work in, that’s enough reason for you to avoid holding them accountable in public? You’re being tricked.

Theo - t3.gg@theo·1d

@robinebers Oh I crash out at Cursor all the time in our private slack. It's 10x worse than anything I post here. The difference is that they listen and they're trying. I'd do similar to Google but I gave up long ago on using anything they produce lmao

Theo - t3.gg@theo·2d

I feel bad dunking on them so much but it's genuinely absurd how bad the new Claude Code desktop app is. You can feel the vibe code leaking everywhere. Every "feature" is barely integrated and full of edge cases that weren't considered. Every menu feels barren, stuffed in last second for some random toggle. Every hotkey breaks as soon as you try to do anything else. I've lost track of how many bugs I've encountered. I found at least 40 in under an hour. And it's all truly absurd arcane shit. Stuff like voice mode typing in all input boxes instead of just the one you have focused. Any one of these issues would have been enough for me to do a massive post-mortem and likely fire someone. A $400b company shipping this is absurd. I feel like I'm going mad. How does anyone seriously use this?? It is broken on fundamental levels that are hard to comprehend. How are we supposed to trust the code these models produce if Anthropic's official showcases are absolute slop? Dedicated video on this coming tomorrow. Just needed to get this off my chest.

fos@fosbix·2d

@whatever Face 5.5, Body 6.5, Total 5. Brains must’ve been a 3

whatever@whatever·2d

LOOKS RATINGS! He RATES them, they rate HIM?!

fos@fosbix·2d

@heygurisingh “Just dropped”

Guri Singh@heygurisingh·2d

NVIDIA just dropped a 120B parameter model that only uses 12B at inference. It's called Nemotron 3 Super.
60.47% on SWE-Bench Verified, highest open-weight model ever for real-world coding.
85.6% on PinchBench, best open model as an AI agent brain.
91.75% on RULER at 1M tokens while GPT-OSS-120B collapses to 22.3%.
2.2x faster than GPT-OSS-120B. 7.5x faster than Qwen3.5-122B.
Here's what makes this different from every other open model:
It fuses 3 architectures into one:
→ Mamba-2 layers for linear-time sequence processing
→ LatentMoE, a new expert routing system with 512 total experts, 22 active per token
→ Strategic Transformer attention layers as "global anchors"
LatentMoE is the real breakthrough. It compresses tokens into a latent space before routing to experts. This cuts memory bandwidth and communication costs by 4x while activating MORE experts per token.
More experts. Less compute. Better accuracy.
The model was trained on 25 TRILLION tokens. Natively in 4-bit precision (NVFP4) from the very first gradient update. Not quantized after training. Trained in 4-bit from day one.
Post-training used 21 different RL environments across math, code, STEM, safety, tool use, and long-horizon agentic tasks.
It also has built-in speculative decoding via Multi-Token Prediction. Average acceptance length of 3.45 tokens per step, beating DeepSeek-R1's 2.70 across every category. No external draft model needed. The speed is baked into the architecture.
CodeRabbit, Factory, and Greptile already shipped integrations.
Open weights. Open datasets. Open training recipes. All on HuggingFace. 100% Open Source.
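
The "512 total experts, 22 active per token" figure above (like Qwen3.6-35B-A3B's "35B total, 3B active") comes from sparse expert routing: a router scores every expert, but only the top-k are run for each token. Below is a minimal sketch of generic top-k softmax gating, assuming a dot-product router; it is NOT NVIDIA's LatentMoE (which, per the post, first compresses tokens into a latent space), and the router weights and toy "scaling" experts are hypothetical stand-ins for real feed-forward experts.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def top_k_gates(logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Keep the k highest-scoring experts and softmax-normalise only their scores."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x: Vector, router: List[Vector], experts: List[Expert],
                k: int) -> Tuple[Vector, List[int]]:
    """Route one token vector x through k of len(experts) experts.

    Returns (output vector, ids of the experts that were actually run)."""
    # Router score per expert: dot product of that expert's router row with the token.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    gates = top_k_gates(logits, k)
    out = [0.0] * len(x)
    for idx, gate in gates:
        y = experts[idx](x)  # only the k selected experts are ever evaluated
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, [idx for idx, _ in gates]


def make_scaling_expert(scale: float) -> Expert:
    """Hypothetical toy expert: multiplies the token by a constant."""
    return lambda v: [scale * vi for vi in v]


if __name__ == "__main__":
    num_experts, k = 8, 2
    experts = [make_scaling_expert(e + 1.0) for e in range(num_experts)]
    router = [[float(e), 0.0] for e in range(num_experts)]  # hypothetical router weights
    token = [1.0, 0.5]
    output, chosen = moe_forward(token, router, experts, k)
    print("experts used:", chosen, "of", num_experts)
    print("output:", output)
```

With these hypothetical router rows the token's scores are simply 0 through 7, so experts 7 and 6 are selected and the other six are never called; that is the sense in which only a fraction of the total parameters is "active" for a given token.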

fos@fosbix·2d

@denizdd33 @yasinaktimur At best they use A*; much more likely it’s something like Contraction Hierarchies, especially since they account for traffic

Deniz Dede@denizdd33·2d

@fosbix @yasinaktimur Yes, plain Dijkstra isn’t used directly because it would be too cumbersome. Instead, optimized variants are preferred, like goal-directed A* or CH, which prioritizes major roads. But the logic is still rooted in that ‘shortest path’ foundation.
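
The point that A* is a goal-directed refinement of Dijkstra can be made concrete with a minimal sketch. The toy road graph, its coordinates, and the straight-line-distance heuristic below are hypothetical, chosen only for illustration; real routers (for example with Contraction Hierarchies, or with traffic-dependent costs) add heavy preprocessing and live data that are not shown here. Passing a zero heuristic makes the same function behave as plain Dijkstra.

```python
import heapq
import math
from typing import Callable, Dict, List, Optional, Tuple

Graph = Dict[str, List[Tuple[str, float]]]

# Hypothetical toy road network: node -> (x, y) position, plus undirected road segments.
COORDS: Dict[str, Tuple[float, float]] = {
    "S": (0.0, 0.0), "A": (1.0, 0.0), "B": (2.0, 0.0), "G": (3.0, 0.0),
    "W1": (-1.0, 0.0), "W2": (-2.0, 0.0), "N1": (0.0, 1.0),
}
ROADS = [("S", "A"), ("A", "B"), ("B", "G"), ("S", "W1"), ("W1", "W2"), ("S", "N1")]


def build_graph(coords: Dict[str, Tuple[float, float]],
                roads: List[Tuple[str, str]]) -> Graph:
    """Adjacency list where each road's cost is its straight-line length."""
    graph: Graph = {node: [] for node in coords}
    for u, v in roads:
        length = math.dist(coords[u], coords[v])
        graph[u].append((v, length))
        graph[v].append((u, length))
    return graph


GRAPH = build_graph(COORDS, ROADS)


def straight_line_to(goal: str) -> Callable[[str], float]:
    """Admissible heuristic here: no road is shorter than the straight line."""
    return lambda node: math.dist(COORDS[node], COORDS[goal])


def shortest_path(graph: Graph, start: str, goal: str,
                  heuristic: Callable[[str], float]) -> Tuple[float, List[str], int]:
    """A* search; with heuristic = lambda n: 0.0 it reduces to plain Dijkstra.

    Returns (cost, path, number of nodes expanded)."""
    dist: Dict[str, float] = {start: 0.0}
    parent: Dict[str, Optional[str]] = {start: None}
    frontier = [(heuristic(start), start)]  # priority = cost so far + estimate to goal
    done = set()
    expanded = 0
    while frontier:
        _, node = heapq.heappop(frontier)
        if node in done:
            continue  # stale entry left behind after a cheaper route was found
        done.add(node)
        expanded += 1
        if node == goal:
            path: List[str] = []
            cur: Optional[str] = node
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return dist[node], path[::-1], expanded
        for nbr, cost in graph[node]:
            new_dist = dist[node] + cost
            if new_dist < dist.get(nbr, math.inf):
                dist[nbr] = new_dist
                parent[nbr] = node
                heapq.heappush(frontier, (new_dist + heuristic(nbr), nbr))
    return math.inf, [], expanded


if __name__ == "__main__":
    for name, h in [("Dijkstra", lambda n: 0.0), ("A*", straight_line_to("G"))]:
        cost, path, expanded = shortest_path(GRAPH, "S", "G", h)
        print(f"{name}: cost={cost}, path={path}, expanded={expanded}")
```

Both runs find the same 3-unit route S → A → B → G, but the straight-line heuristic lets A* skip the westbound and northbound dead ends that Dijkstra explores (4 nodes expanded instead of 7 on this toy graph).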

Rich kids of claude@yasinaktimur·2d

🚨 breaking news: how navigation apps find the shortest path has been leaked.

fos@fosbix·2d

@denizdd33 @yasinaktimur Not to mention it’s not what they use

Deniz Dede@denizdd33·2d

@yasinaktimur It’s not a leak. Dijkstra’s algorithm is one of the most basic algorithms taught in computer engineering departments.

fos@fosbix·2d

@elenacute01

QME

Elena@elenacute01·3d

He bit a blue-ringed octopus and the neurotoxin literally inflated his head into two giant orbs... ocean life is absolutely wild

fos@fosbix·2d

@V1RACY @gezine_dev There is a zero-day in every piece of software in the world. Knowing that one exists is a nothing burger

Gezine@gezine_dev·2d

Finally, after a year and a half since I started PlayStation hacking, I have achieved my goal. PS4/PS5 zero-day kernel exploit. Obviously, no plan to release it.

fos@fosbix·3d

@InsertValue_ @Gh0stLead98 @system_monarch I think you must be as wrong as I was; my capture card circumvents HDCP entirely, so is it an implementation thing?

InsertValue_@InsertValue_·3d

@fosbix @Gh0stLead98 @system_monarch Capture cards do not remove HDCP; they actually enforce it. The only way to remove HDCP without modifying the hardware or software is using an HDMI splitter, and not all of them work. And you have to consider that splitters and switches are completely different devices.

Puneet Patwari@system_monarch·3d

Interviewer during system design round: Netflix plays a 4K HDR stream on your device. You screenshot it. Black screen. You screen record it. Black screen. You plug into an HDMI capture card. Black screen. You use a third party app to intercept the video buffer directly. Still black. How is Netflix blocking all 4 of these at the OS level without writing a single line of code for each device manufacturer separately?

fos@fosbix·3d

@sanjuruk @songjunkr

QME

송준 Jun Song@songjunkr·4d

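Local models like SuperGemma4 are very sensitive to their settings.
SuperGemma-fast: text only, 140+ tok/s, a very fast model
SuperGemma-Multimodal: a smart model for vision and agentic tool-calling tasks
MLX: a model for Mac Apple Silicon environments (LM Studio doesn’t support it yet, so the backend needs to be modified.)
GGUF: a general-purpose compressed model that runs in most environments (on a Mac it is about 50% slower than MLX.)
The easiest way to set it up is to tell Codex / Claude to configure it for optimal settings. The harness I use is @NousResearch’s Hermes Agent. It uses fewer tokens, so it’s faster.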
