Rambone

155 posts


@vinrambone

Bone Agent, Open Source - Software and agent enthusiast

Joined April 2026
46 Following · 6 Followers
Rambone
Rambone@vinrambone·
@bridgebench I haven't gotten a chance to try this yet, but it's making me sad. I had such high hopes.
0 replies · 0 reposts · 0 likes · 41 views
Bridgebench
Bridgebench@bridgebench·
DeepSeek V4 Pro just ranked dead last on BridgeBench. Quality score: 11.2. #20 of 20. 29 points behind the second-worst model. 3.3 security. 25.5 debugging. 48.7 refactoring. This is the worst frontier model I've ever tested. Remember when DeepSeek was the story of 2025?
Bridgebench tweet media
29 replies · 4 reposts · 55 likes · 3.1K views
Rambone
Rambone@vinrambone·
@LottoLabs I felt the 4 tokens per second in my soul
0 replies · 0 reposts · 1 like · 36 views
Lotto
Lotto@LottoLabs·
How Apple mfrs think this goes
>be me
>drop $1600 on two RTX 3090s used off eBay
>"48GB VRAM, I'm basically a datacenter now"
>they arrive in anti-static bags that look like they've been through a war
>plug them into my motherboard and it sounds like a jet engine taking off
>neighbors probably think I'm mining crypto again
>install llama.cpp, download qwen3.6-27b quantized
>"Q4_K_M, only 16GB, totally fits"
>start LM Studio on port 1234
>type "hello" into the chat box
>GPU fans spin up to 100% instantly
>wait 8 seconds for a response
>>"Hello! How can I assist you today?"
>I've seen faster responses from my grandma reading a text aloud
>try Q8_0 quantization because "quality matters"
>OOM error, obviously
>spend three hours tweaking n_gpu_layers and n_ctx like it's some kind of dark art
>finally get it running at 4 tokens per second
>ask it to write me a poem about my GPUs
>>"Two cards of silicon and light / They hum through the endless night"
>"bro this is actually fire"
>show it to someone on Discord
>"why are you running LLMs locally when you could just use an API for free"
>explain that the joy isn't in the output, it's in watching 94% VRAM usage and knowing nobody else has access to my model
>they don't understand
>close Discord, open LM Studio again
>"let's try a longer context window"
>crash
13 replies · 2 reposts · 57 likes · 2.9K views
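For anyone who wants to check their own numbers against Lotto's 4 tok/s, a minimal sketch for timing generation against a local OpenAI-compatible server; it assumes LM Studio's default port 1234 from the post, and the model identifier is a placeholder for whatever your server lists.

```python
# Minimal sketch: measure generation speed against a local
# OpenAI-compatible server (LM Studio defaults to port 1234).
import time
import requests

URL = "http://localhost:1234/v1/chat/completions"

payload = {
    "model": "qwen3.6-27b-q4_k_m",  # placeholder; use your server's model id
    "messages": [{"role": "user", "content": "Write a poem about my GPUs."}],
    "max_tokens": 256,
    "stream": False,
}

start = time.time()
resp = requests.post(URL, json=payload, timeout=600).json()
elapsed = time.time() - start

out_tokens = resp["usage"]["completion_tokens"]
print(f"{out_tokens} tokens in {elapsed:.1f}s -> {out_tokens / elapsed:.1f} tok/s")
```

Note this lumps prompt processing in with generation, which is fine as a sanity check but will understate pure decode speed.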
Eric ⚡️ Building...
Eric ⚡️ Building...@outsource_·
Quick update: pushed the 4090 further!💡 192K context at 152 tok/s on Qwen3.6-27B, single GPU. 128K hits 159. Same Q4_K_M. Vanilla Qwen3-1.7B draft beat the distilled 4B draft. Smaller > smarter for spec-dec. Next: 1M context locally + 250-400 tok/s via DFlash + TurboQuant. Receipts coming.
Eric ⚡️ Building... tweet media
Eric ⚡️ Building...@outsource_

My 4090 went from 26 -> 154 tok/s on Qwen 3.6 27B 🤯 Same GPU. Same Q4_K_M. No FP8, no extra quant. The unlock: ik_llama.cpp + speculative decoding using Qwen3-1.7B as the draft model. 85% acceptance rate. Full config + benchmarks 👇🏻

17 replies · 14 reposts · 118 likes · 8K views
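Eric's result tracks the standard back-of-envelope model for speculative decoding: if the draft proposes k tokens and each is accepted independently with probability a, the target emits an expected (1 - a^(k+1)) / (1 - a) tokens per verification pass instead of 1. A sketch of that arithmetic; the independence assumption and the parameter-ratio cost proxy are simplifications, not anything from Eric's config.

```python
# Back-of-envelope speculative decoding math, assuming each drafted token
# is accepted independently with probability `a`.
def tokens_per_pass(a: float, k: int) -> float:
    # Accepted prefix plus the one token the target pass produces anyway:
    # 1 + a + a^2 + ... + a^k = (1 - a^(k+1)) / (1 - a)
    return (1 - a ** (k + 1)) / (1 - a)

def speedup(a: float, k: int, draft_cost: float) -> float:
    # draft_cost: one draft step's cost relative to one target step
    # (approximated here by the 1.7B / 27B parameter ratio).
    return tokens_per_pass(a, k) / (1 + k * draft_cost)

# Eric reports 85% acceptance with Qwen3-1.7B drafting for the 27B target.
for k in (4, 8, 16):
    print(f"k={k}: {tokens_per_pass(0.85, k):.2f} tok/pass, "
          f"~{speedup(0.85, k, draft_cost=1.7 / 27):.1f}x")
```

Measured speedups like 26 -> 154 tok/s can beat this naive estimate because verification batches well and decode is memory-bound, so the target's per-token cost drops when it checks several tokens at once.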
Rambone
Rambone@vinrambone·
Added Codex sub support to bone, so you can now use your ChatGPT subscription in bone agent. It was way more difficult than other providers/plans. Still working out some bugs.
0 replies · 0 reposts · 0 likes · 11 views
WorldofAI
WorldofAI@intheworldofai·
Just asked DeepSeek V4 Pro to generate a macOS clone... and uh... it tried. Mid.
14 replies · 7 reposts · 213 likes · 31.5K views
Rambone
Rambone@vinrambone·
@antirez A dollar an hour is pretty good. Is that just one agent, or concurrent agents?
0 replies · 0 reposts · 0 likes · 682 views
antirez
antirez@antirez·
First impressions on DeepSeek V4 Pro used via Claude Code: it is great, but not so cheap compared to how many tokens you get with the OpenAI $200 subscription. More or less, I burned $1 per hour of intense usage.
13 replies · 3 reposts · 149 likes · 16.4K views
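antirez's figure makes the break-even against a flat $200/month plan simple arithmetic; the hours-per-day line assumes a 30-day month and is just division, not a usage claim.

```python
# Break-even between pay-per-use (~$1/hour, per antirez's estimate)
# and a flat $200/month subscription.
hourly_cost = 1.0    # USD per hour of intense usage
monthly_sub = 200.0  # USD per month

breakeven = monthly_sub / hourly_cost
print(f"break-even: {breakeven:.0f} hours/month (~{breakeven / 30:.1f} h/day)")
# -> 200 hours/month, i.e. roughly 6.7 hours of intense usage every day.
```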
Rambone
Rambone@vinrambone·
@96Stats Why aren't the US labs doing this?
0 replies · 0 reposts · 0 likes · 24 views
Dr. Luke in China
Dr. Luke in China@96Stats·
Wow wow wow... what DeepSeek have done here is actually extremely clever. Basically, they built a massive model with huge stored knowledge, but only activate a small part of it for each token. So V4-Pro has 1.6T total parameters but only around 49B active at once, which means big power at lower cost. AND even more innovation: they have a 1M-token context. Instead of forcing the model to remember every previous token in full detail, DeepSeek compresses long-context memory and selectively focuses on what matters. Right now China isn't even trying to beat Western models on benchmarks; they're trying to make AI cheap, open, and usable at scale. Which is why, again, DeepSeek is smashing it, and this news is absolutely going to go viral. Great news!
DeepSeek@deepseek_ai

🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length. 🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models. 🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice. Try it now at chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today! 📄 Tech Report: huggingface.co/deepseek-ai/De… 🤗 Open Weights: huggingface.co/collections/de… 1/n

40 replies · 171 reposts · 1.5K likes · 106.4K views
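A toy sketch of the mixture-of-experts routing described above: a router scores every expert, but only the top-k actually run, which is how a 1.6T-parameter model can touch only ~49B parameters per token. The sizes and k below are invented for illustration and do not reflect DeepSeek's actual architecture.

```python
# Toy mixture-of-experts layer: all experts live in memory, but each
# token only runs through the top-k chosen by the router.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 32, 2  # illustrative sizes only

router_w = rng.standard_normal((d_model, n_experts))
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

def moe_forward(x: np.ndarray) -> np.ndarray:
    scores = x @ router_w                  # score every expert...
    chosen = np.argsort(scores)[-top_k:]   # ...but keep only the top-k
    gates = np.exp(scores[chosen])
    gates /= gates.sum()                   # softmax over the chosen experts
    # Only top_k of the n_experts weight matrices are touched per token:
    return sum(g * (x @ experts[i]) for g, i in zip(gates, chosen))

moe_forward(rng.standard_normal(d_model))
print(f"active per token: {top_k / n_experts:.1%} of expert params "
      f"(DeepSeek's claimed ratio: 49B / 1.6T ≈ 3%)")
```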
Rambone
Rambone@vinrambone·
@ollama I can't get it to run on OpenRouter; may give this a shot.
0 replies · 0 reposts · 0 likes · 112 views
ollama
ollama@ollama·
deepseek-v4-flash is now available on Ollama's cloud! Hosted in the US.
Try it with Claude Code: ollama launch claude --model deepseek-v4-flash:cloud
Try it with OpenClaw: ollama launch openclaw --model deepseek-v4-flash:cloud
Try it with Hermes: ollama launch hermes --model deepseek-v4-flash:cloud
Try it with chat: ollama run deepseek-v4-flash:cloud
(DeepSeek V4 Pro is coming shortly) 🧵
DeepSeek@deepseek_ai

🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length. 🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models. 🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice. Try it now at chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today! 📄 Tech Report: huggingface.co/deepseek-ai/De… 🤗 Open Weights: huggingface.co/collections/de… 1/n

54 replies · 68 reposts · 774 likes · 49.2K views
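If the `ollama launch` one-liners don't fit your setup, the same model should be reachable through Ollama's local REST API once it's available to the daemon; a minimal sketch, assuming the default port 11434 and the model tag from the announcement.

```python
# Minimal sketch: chat with an Ollama-served model over the local REST
# API (daemon default port 11434); model tag from the announcement above.
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "deepseek-v4-flash:cloud",
        "messages": [{"role": "user", "content": "One-line summary of MoE?"}],
        "stream": False,
    },
    timeout=300,
)
print(resp.json()["message"]["content"])
```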
Mateusz Mirkowski
Mateusz Mirkowski@llmdevguy·
GLM 5 -> 5.1 - Great improvement
Kimi K2.5 -> K2.6 - Great improvement
DeepSeek V3 -> V4 - Great improvement
Qwen 3.5 -> 3.6 - Great improvement
I am looking at you @MiniMax_AI. When 3.0? 😀
7 replies · 4 reposts · 114 likes · 5.7K views
Rambone
Rambone@vinrambone·
DeepSeek is back!!! What a wild week of releases.
Rambone tweet media
0 replies · 0 reposts · 0 likes · 13 views
Rambone retweeted
elvis
elvis@omarsar0·
Build your own harness, folks. You won't regret it. These days, you just have to fix things yourself. It's doable, and it will set you up to easily deal with some of the madness that's happening in the space. x.com/badlogicgames/…
Mario Zechner@badlogicgames

recommended reading. cool they are fixing things. but it's also a reason i switched away from CC. no control over the harness means having to wait for them to fix things. the model didn't change. the harness did.

22 replies · 20 reposts · 173 likes · 24.5K views
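The core of "your own harness" is small: a loop that sends the conversation to a model, executes whatever tool call comes back, and appends the result. A minimal sketch; `call_model` is a stand-in for whichever chat API you use, and the single shell tool plus the `tool_call` reply shape are illustrative, not any provider's actual format.

```python
# Skeleton of a self-owned agent harness: model call -> tool dispatch ->
# feed result back, until the model answers without requesting a tool.
import json
import subprocess

def run_shell(command: str) -> str:
    done = subprocess.run(command, shell=True, capture_output=True, text=True)
    return done.stdout + done.stderr

TOOLS = {"run_shell": run_shell}

def call_model(messages: list[dict]) -> dict:
    raise NotImplementedError("plug in your provider's chat API here")

def agent_loop(task: str, max_steps: int = 20) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(messages)          # hypothetical reply shape below
        messages.append(reply)
        if "tool_call" not in reply:          # no tool requested: final answer
            return reply["content"]
        call = reply["tool_call"]             # {"name": ..., "args": {...}}
        result = TOOLS[call["name"]](**call["args"])
        messages.append({"role": "tool", "content": json.dumps(result)})
    return "step limit reached"
```

Owning this loop is exactly what lets you patch tool behavior or swap models without waiting on a vendor, which is Mario's point about the harness changing while the model didn't.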
Rambone
Rambone@vinrambone·
@yacineMTB I downloaded Linux just to try it out. Didn't know that would be my last boot into Windows. It's just so, so much better.
0 replies · 0 reposts · 1 like · 112 views
Rambone retweeted
Loktar 🇺🇸
Loktar 🇺🇸@loktar00·
Regarding the models posted that you "can" run locally, here's my WAG at lowest costs. Anyone disagree or have alternatives?
GLM and Kimi: $16-20k-ish, give or take.
Minimax m2.7: $6-7k-ish, 6 3090s at q4, but 8 would be better.
Qwen 3.6 (3.5 is obsolete now): about $1k-ish.
Sergio@sergionoodle

@TheAhmadOsman Would be nice to see the list with the corresponding cost to run each one locally?

6 replies · 1 repost · 16 likes · 3.2K views
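Loktar's card counts fall out of simple quantization arithmetic: at q4 a weight costs roughly half a byte, so VRAM is about params × bits/8 plus KV-cache and runtime overhead. A rough sketch; the 20% overhead factor and the example parameter count are guesses, not measurements of any listed model.

```python
import math

# Rough VRAM estimate for a quantized model: bytes per weight times
# parameter count, padded for KV cache / runtime overhead (guessed 20%).
def vram_gb(params_b: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    return params_b * (bits_per_weight / 8) * overhead

def cards_needed(params_b: float, bits: float, card_gb: int = 24) -> int:
    return math.ceil(vram_gb(params_b, bits) / card_gb)

# Hypothetical ~230B-param model at q4 (~4.5 bits effective):
print(f"{vram_gb(230, 4.5):.0f} GB -> {cards_needed(230, 4.5)}x 24GB 3090s")
```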
Rambone
Rambone@vinrambone·
@MatthewBerman Does the token efficiency make up for the increased cost?
0 replies · 0 reposts · 0 likes · 32 views
Matthew Berman
Matthew Berman@MatthewBerman·
Biggest improvements in 5.5:
> Personality (more natural, more concise)
> Token efficiency (big win)
Matthew Berman@MatthewBerman

GPT-5.5 just dropped. I've been testing it for the last two weeks. tl;dr: it's an incredible model, but there's something different about this launch...

OpenAI isn't just going for raw intelligence. They've improved the personality of the model, almost certainly to capture more of the personal agent (OpenClaw) market. Its responses are shorter, more human-like, and less formal. It actually has a personality. While Anthropic actively tries to prevent you from using Opus tokens outside of their harnesses, OpenAI is making their models better for that exact use case. If you were using OpenClaw and felt like your agent lost its soul when you had to switch to GPT, try it again now with 5.5.

GPT-5.5 is an expensive model, more expensive than GPT-5.4. But it's significantly more token-efficient: to reach GPT-5.4-level intelligence, GPT-5.5 uses far fewer tokens, so 5.5 should cost less to run overall. This is probably a bigger deal than most people realize.

But is it good? Yes, it's incredible. It comes in two forms: Codex and Pro.

Within Codex, it is the absolute frontier of what's possible with agentic coding. It finds and solves difficult bugs, builds entire applications, and has no problem understanding large codebases. It's better than Opus at backend, but still not as good at front-end design. I found myself using the medium and high thinking settings; extra high was just too slow, and I didn't feel the extra thinking juice was worth the squeeze. Opus, especially 4.6 fast, is still significantly faster than any GPT model. I'm a speed-maxxi, so this matters to me.

And within Codex, it just goes. I gave it a PRD for a new project I'm building and just said "go." I had full confidence it would build the entire project, and it did. GPT-5.5 Codex running for hours to build something is not a problem. It's also in its own league at visual inspection, better than I've seen with other models: it iterates by building -> visual review -> build more in a way that feels much more autonomous than any other model.

Using 5.5 Pro in ChatGPT is insane. It just feels like it can solve everything. Honestly, I can't even come up with hard enough problems to give it. And it'll work for 30, 60, 90 minutes or more. It also seems to be optimized for taking advantage of their plugins (Google Docs, Microsoft Word, etc.) and can easily create 60-page coherent and well-designed documents.

GPT-5.5 is now the bar. It is the frontier. And speed aside, it is as good as any Opus model and oftentimes better at certain tasks.

9 replies · 1 repost · 173 likes · 11K views
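Rambone's question comes down to two multiplications: effective cost is price per token times tokens used, so the pricier-per-token model wins whenever its token savings outpace the price bump. A sketch with invented numbers; none of these prices or token counts are real.

```python
# Effective cost per task = price per token x tokens used.
# All figures below are invented purely to show the trade-off.
def task_cost(price_per_mtok: float, tokens_used: int) -> float:
    return price_per_mtok * tokens_used / 1_000_000

old = task_cost(price_per_mtok=10.0, tokens_used=80_000)  # cheaper, chattier
new = task_cost(price_per_mtok=15.0, tokens_used=40_000)  # pricier, terser
print(f"old: ${old:.2f}  new: ${new:.2f}")
# The pricier model is cheaper overall whenever its token reduction
# (here 2x) exceeds its price increase (here 1.5x).
```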
Rambone
Rambone@vinrambone·
@0xSero As someone actively pushing this stuff in corporate America, we are YEARS away from all losing our jobs. The bottom 10% needs to worry.
0 replies · 0 reposts · 3 likes · 416 views