fos

490 posts

fos

@fosbix

life ends with the one that burns the candle

Joined February 2022
205 Following · 17 Followers
fos
fos@fosbix·
@LottoLabs @DataPlusEngine I don’t see any benefit to using LM Studio over llama.cpp directly. You just lose out on a ton of features.
English
0
0
0
23
Lotto
Lotto@LottoLabs·
@DataPlusEngine Lots of casuals will never even attempt the CLI. Ollama is typically the recommended easy route; LM Studio is far better and far easier. I’ve used sglang, llama.cpp, vLLM, etc. Right tool for the job
English
1
0
13
613
fos
fos@fosbix·
Just you guys wait till an Afmoe model releases. Gemma 4 and Qwen 3.6 are just the start
English
0
0
0
21
Jorwhol
Jorwhol@jorwhol·
@gosrum What setup is needed to run this model? Two RTX 3090s sufficient?
English
3
0
1
1.9K
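
A rough back-of-the-envelope for the question above; a sketch assuming the model in question is the ~35B-total-parameter Qwen3.6-35B-A3B from this thread at a Q4-style quantization. The bits-per-param and overhead figures are illustrative assumptions, not measurements.

```python
# Back-of-the-envelope VRAM estimate for a quantized local model.
# Assumptions (illustrative): 35B total parameters, ~4.5 effective
# bits/param for a Q4-style quant, plus a few GB for KV cache and
# runtime overhead.
def vram_gb(params_b: float, bits_per_param: float, overhead_gb: float = 4.0) -> float:
    weights_gb = params_b * 1e9 * bits_per_param / 8 / 1e9  # weight bytes -> GB
    return weights_gb + overhead_gb

print(f"{vram_gb(35, 4.5):.1f} GB")  # ~23.7 GB
# Two RTX 3090s give 48 GB of VRAM total, so a Q4-style quant fits
# comfortably; even a single 24 GB card is borderline at tighter quants.
```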
金のニワトリ
Qwen3.6-35B-A3B is way too strong!!!
· Ran ts-bench combined with opencode, vibe-local, GitHub Copilot, qwencode, and claude code, and it got a perfect score on all of them
· And it completes tasks about as fast as Claude Sonnet 4.6 and Opus 4.6
Qwen3.5-27B was impressive too, but Qwen3.6-35B-A3B, like the Red Comet, has 3x faster inference than the 27B, so the big win, as the benchmark results show, is how much it cuts the time to task completion
金のニワトリ@gosrum

It’s been overshadowed by Claude Opus 4.7 and isn’t getting much attention, but isn’t Qwen3.6-35B-A3B a pretty amazing model?

Japanese
20
99
599
127.9K
fos
fos@fosbix·
@griffisu As soon as the guy’s videos stop going viral we’ve achieved AGI
English
0
0
2
767
stevibe
stevibe@stevibe·
Qwen3.6 35B-A3B: smarter, but forgot how to use tools?
Running 6 Bench Packs on BenchLocal across 3 open-source Qwen models.
✅ ReasonMath: 92 vs 85 vs 86 — 3.6 wins
✅ InstructFollow: 97 / 97 / 97 — tied
❌ ToolCall: 83 vs 97 vs 100 — 3.6 tanks
Qwen3.5 27B still the tool-calling champ. 3.6 clearly leveled up reasoning, but tool use took a hit.
DataExtract live now. BugFind + StructOutput next.
English
33
28
390
33K
First We Feast
First We Feast@firstwefeast·
David Blaine being just as surprised as us 😩 #hotones
English
7
55
1.1K
172.6K
fos
fos@fosbix·
@eliebakouch Probably every frontier model is a MoE
English
0
0
0
231
elie
elie@eliebakouch·
yeah you know.... moe model are fundamentally limited... dense model are way better look at gemma4 and qwen3.5... you don't get it this is just a trend... moe are dead!!!
Qwen@Alibaba_Qwen

⚡ Meet Qwen3.6-35B-A3B: Now Open-Source! 🚀🚀
A sparse MoE model, 35B total params, 3B active. Apache 2.0 license.
🔥 Agentic coding on par with models 10x its active size
📷 Strong multimodal perception and reasoning ability
🧠 Multimodal thinking + non-thinking modes
Efficient. Powerful. Versatile. Try it now👇
Blog: qwen.ai/blog?id=qwen3.…
Qwen Studio: chat.qwen.ai
HuggingFace: huggingface.co/Qwen/Qwen3.6-3…
ModelScope: modelscope.cn/models/Qwen/Qw…
API (‘Qwen3.6-Flash’ on Model Studio): Coming soon~ Stay tuned

English
31
10
438
60.6K
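
For anyone wanting to try the announced model locally, a minimal sketch using the standard transformers loading path. The repo id "Qwen/Qwen3.6-35B-A3B" is an assumption inferred from the model name; the HuggingFace URL in the announcement is truncated.

```python
# Minimal sketch: load and prompt the checkpoint with transformers.
# The repo id below is assumed from the model name in the announcement.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3.6-35B-A3B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",  # keep the checkpoint's native precision
    device_map="auto",   # shard across available GPUs
)

messages = [{"role": "user", "content": "Write a binary search in Python."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```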
fos
fos@fosbix·
@stevibe @0xkeenz Which q4 are you using, Qwen’s UD variant or NVFP4?
English
1
0
0
43
fos
fos@fosbix·
@songjunkr 2 months? It’s been less than 48 hours
English
0
0
1
214
송준 Jun Song
송준 Jun Song@songjunkr·
Wow, qwen3.6-35b beat the existing 27b and sonnet-4.5. That this is possible with MoE is an insane leap forward. And it hasn’t even been 2 months since release. HuggingFace ⬇️
Korean
19
38
476
43.2K
Jonas Čeika
Jonas Čeika@Jonas_Ceika·
ChatGPT glazing experiment #2
English
108
385
13.2K
909.1K
fos
fos@fosbix·
@theo @robinebers There’s just no way you fail to acknowledge your own bias that egregiously. Cursor has just as many resources as Anthropic does. Just because they claim to be putting in the work, that’s enough cause for you to avoid holding them accountable in public? You’re being tricked.
English
0
0
0
231
Theo - t3.gg
Theo - t3.gg@theo·
@robinebers Oh I crash out at Cursor all the time in our private slack. It's 10x worse than anything I post here. The difference is that they listen and they're trying. I'd do similar to Google but I gave up long ago on using anything they produce lmao
English
11
0
240
23.9K
Theo - t3.gg
Theo - t3.gg@theo·
I feel bad dunking on them so much but it's genuinely absurd how bad the new Claude Code desktop app is. You can feel the vibe code leaking everywhere.

Every "feature" is barely integrated and full of edge cases that weren't considered. Every menu feels barren, stuffed in last second for some random toggle. Every hotkey breaks as soon as you try to do anything else.

I've lost track of how many bugs I've encountered. I found at least 40 in under an hour. And it's all truly absurd arcane shit. Stuff like voice mode typing in all input boxes instead of just the one you have focused. Any one of these issues would have been enough for me to do a massive post-mortem and likely fire someone.

A $400b company shipping this is absurd. I feel like I'm going mad. How does anyone seriously use this?? It is broken on fundamental levels that are hard to comprehend. How are we supposed to trust the code these models produce if Anthropic's official showcases are absolute slop?

Dedicated video on this coming tomorrow. Just needed to get this off my chest.
English
442
221
5.4K
1.1M
fos
fos@fosbix·
@whatever Face 5.5, Body 6.5, Total 5. Brains must’ve been a 3
English
0
0
0
144
whatever
whatever@whatever·
LOOKS RATINGS! He RATES them, they rate HIM?!
English
626
190
15.5K
583.3K
Guri Singh
Guri Singh@heygurisingh·
NVIDIA just dropped a 120B parameter model that only uses 12B at inference. It's called Nemotron 3 Super.

60.47% on SWE-Bench Verified, highest open-weight model ever for real-world coding.
85.6% on PinchBench, best open model as an AI agent brain.
91.75% on RULER at 1M tokens while GPT-OSS-120B collapses to 22.3%.
2.2x faster than GPT-OSS-120B. 7.5x faster than Qwen3.5-122B.

Here's what makes this different from every other open model. It fuses 3 architectures into one:
→ Mamba-2 layers for linear-time sequence processing
→ LatentMoE, a new expert routing system with 512 total experts, 22 active per token
→ Strategic Transformer attention layers as "global anchors"

LatentMoE is the real breakthrough. It compresses tokens into a latent space before routing to experts. This cuts memory bandwidth and communication costs by 4x while activating MORE experts per token. More experts. Less compute. Better accuracy.

The model was trained on 25 TRILLION tokens. Natively in 4-bit precision (NVFP4) from the very first gradient update. Not quantized after training. Trained in 4-bit from day one.

Post-training used 21 different RL environments across math, code, STEM, safety, tool use, and long-horizon agentic tasks.

It also has built-in speculative decoding via Multi-Token Prediction. Average acceptance length of 3.45 tokens per step, beating DeepSeek-R1's 2.70 across every category. No external draft model needed. The speed is baked into the architecture.

CodeRabbit, Factory, and Greptile already shipped integrations.

Open weights. Open datasets. Open training recipes. All on HuggingFace. 100% Open Source.
English
45
67
444
39K
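
The latent-routing idea in the post above is the interesting part: compress each token into a small latent space, route there, and run the experts on the cheap latent dimension before projecting back. A toy sketch of that mechanism follows; the module names, dimensions, and routing details are illustrative assumptions, not NVIDIA's actual LatentMoE implementation.

```python
# Toy sketch of latent-space MoE routing as described in the post.
# NOT NVIDIA's implementation; dims are scaled down for runnability,
# but the 512-expert / 22-active routing config matches the post.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentMoESketch(nn.Module):
    def __init__(self, d_model=256, d_latent=64, n_experts=512, top_k=22):
        super().__init__()
        self.top_k = top_k
        self.down = nn.Linear(d_model, d_latent)   # compress token to latent space
        self.router = nn.Linear(d_latent, n_experts)
        # Experts work in the small latent dim, cutting bandwidth per expert.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_latent, d_latent), nn.SiLU(),
                          nn.Linear(d_latent, d_latent))
            for _ in range(n_experts)
        )
        self.up = nn.Linear(d_latent, d_model)     # project back to model dim

    def forward(self, x):                          # x: (tokens, d_model)
        z = self.down(x)                           # route in latent space
        weights, idx = torch.topk(self.router(z), self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)       # normalize over chosen experts
        out = torch.zeros_like(z)
        for t in range(z.shape[0]):                # naive loop, for clarity only
            for k in range(self.top_k):
                out[t] += weights[t, k] * self.experts[int(idx[t, k])](z[t])
        return self.up(out)

moe = LatentMoESketch()
print(moe(torch.randn(4, 256)).shape)  # torch.Size([4, 256])
```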
fos
fos@fosbix·
@denizdd33 @yasinaktimur At best they use A*, much more likely something like Contraction Hierarchies, especially since they account for traffic
English
0
0
1
243
Deniz Dede
Deniz Dede@denizdd33·
@fosbix @yasinaktimur Yes, pure Dijkstra would be too sluggish, so it isn't used directly. Instead, optimized derivatives are preferred, like A*, which focuses on the goal, or CH, which prioritizes main roads. But the logic still builds on that same 'shortest path' foundation.
Turkish
1
0
17
1K
Rich kids of claude
Rich kids of claude@yasinaktimur·
🚨 breaking news: how navigation apps find the shortest path has been leaked.
Turkish
171
452
14.1K
5.3M
Deniz Dede
Deniz Dede@denizdd33·
@yasinaktimur It's not a leak. Dijkstra's algorithm is one of the most fundamental algorithms taught in computer engineering departments.
Turkish
3
0
352
21.4K
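
Context for this thread: Dijkstra's algorithm is the textbook shortest-path method; A* adds a goal-directed heuristic, and Contraction Hierarchies precompute shortcuts on top of the same core. A minimal sketch of the first two, with a made-up toy graph:

```python
# Dijkstra and A* over an adjacency-list graph. Toy data for illustration;
# real navigation engines add traffic-aware edge weights and precomputation
# (e.g. Contraction Hierarchies) on top of this shortest-path core.
import heapq

def dijkstra(graph, start):
    """graph: {node: [(neighbor, cost), ...]} -> cheapest cost to each node."""
    dist = {start: 0}
    pq = [(0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

def a_star(graph, start, goal, h):
    """Dijkstra plus an admissible heuristic h(node) estimating the
    remaining cost, which biases the search toward the goal."""
    dist = {start: 0}
    pq = [(h(start), start)]
    while pq:
        _, u = heapq.heappop(pq)
        if u == goal:
            return dist[u]
        for v, w in graph.get(u, []):
            nd = dist[u] + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd + h(v), v))
    return float("inf")

roads = {"A": [("B", 4), ("C", 2)], "B": [("D", 5)],
         "C": [("B", 1), ("D", 8)], "D": []}
print(dijkstra(roads, "A"))                    # {'A': 0, 'B': 3, 'C': 2, 'D': 8}
print(a_star(roads, "A", "D", h=lambda n: 0))  # 8 (zero heuristic == Dijkstra)
```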
Elena
Elena@elenacute01·
He bit a blue-ringed octopus and the neurotoxin literally inflated his head into two giant orbs... ocean life is absolutely wild
English
367
1.1K
13.8K
3.7M
fos
fos@fosbix·
@V1RACY @gezine_dev There is a zero day in every piece of software in the world. Knowing that one exists is a nothingburger
English
1
1
54
3K
Gezine
Gezine@gezine_dev·
Finally, after a year and a half since I started PlayStation hacking, I have achieved my goal. PS4/PS5 zero-day kernel exploit. Obviously, no plan to release it.
English
994
456
9.1K
2.1M
InsertValue_
InsertValue_@InsertValue_·
@fosbix @Gh0stLead98 @system_monarch Capture cards do not remove HDCP, they actually enforce it. The only way to remove HDCP without modifying the hardware or software is using an HDMI splitter, and not all of them work. And you have to consider that splitters and switches are completely different devices.
English
1
0
0
102
Puneet Patwari
Puneet Patwari@system_monarch·
Interviewer during system design round: Netflix plays a 4K HDR stream on your device. You screenshot it. Black screen. You screen record it. Black screen. You plug into an HDMI capture card. Black screen. You use a third party app to intercept the video buffer directly. Still black. How is Netflix blocking all 4 of these at the OS level without writing a single line of code for each device manufacturer separately?
English
290
54
4.1K
1.7M
송준 Jun Song
송준 Jun Song@songjunkr·
Local models like SuperGemma4 are very sensitive to setup.
SuperGemma-fast: text only, 140+ tok/s, a very fast model
SuperGemma-Multimodal: a smart model for vision and agentic tool-calling work
MLX: a model for Mac Apple Silicon environments (LM Studio doesn't support it yet, so backend modifications are needed.)
GGUF: a general-purpose compressed format that runs in most environments (on Mac it's about 50% slower than MLX.)
The easiest setup method is to instruct Codex / Claude to configure things in an optimized direction. The harness I use is @NousResearch's Hermes Agent. It's faster because it uses fewer tokens.
Korean
11
15
257
12.6K