Rambone

155 posts


@vinrambone

Bone Agent, Open Source - Software and agent enthusiast

Joined April 2026
46 Following · 6 Followers
Rambone
Rambone@vinrambone·
@bridgebench I haven't gotten a chance to try this yet, but it's making me sad. I had such high hopes.
0 replies · 0 reposts · 0 likes · 41 views
Bridgebench
Bridgebench@bridgebench·
DeepSeek V4 Pro just ranked dead last on BridgeBench. Quality score: 11.2. #20 of 20. 29 points behind the second-worst model. 3.3 security. 25.5 debugging. 48.7 refactoring. This is the worst frontier model I've ever tested. Remember when DeepSeek was the story of 2025?
Bridgebench tweet media
29 replies · 4 reposts · 55 likes · 3.1K views
Rambone
Rambone@vinrambone·
@LottoLabs I felt the 4 tokens per second in my soul
0 replies · 0 reposts · 1 like · 36 views
Lotto
Lotto@LottoLabs·
How Apple mfrs think this goes
>be me
>drop $1600 on two RTX 3090s used off eBay
>"48GB VRAM, I'm basically a datacenter now"
>they arrive in anti-static bags that look like they've been through a war
>plug them into my motherboard and it sounds like a jet engine taking off
>neighbors probably think I'm mining crypto again
>install llama.cpp, download qwen3.6-27b quantized
>"Q4_K_M, only 16GB, totally fits"
>start LM Studio on port 1234
>type "hello" into the chat box
>GPU fans spin up to 100% instantly
>wait 8 seconds for a response
>>"Hello! How can I assist you today?"
>I've seen faster responses from my grandma reading a text aloud
>try Q8_0 quantization because "quality matters"
>OOM error, obviously
>spend three hours tweaking n_gpu_layers and n_ctx like it's some kind of dark art
>finally get it running at 4 tokens per second
>ask it to write me a poem about my GPUs
>>"Two cards of silicon and light / They hum through the endless night"
>"bro this is actually fire"
>show it to someone on Discord
>"why are you running LLMs locally when you could just use an API for free"
>explain that the joy isn't in the output, it's in watching 94% VRAM usage and knowing nobody else has access to my model
>they don't understand
>close Discord, open LM Studio again
>"let's try a longer context window"
>crash
13 replies · 2 reposts · 57 likes · 2.9K views
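For anyone who wants to check their own numbers against Lotto's 4 tok/s, a minimal sketch for timing generation against a local OpenAI-compatible server; it assumes LM Studio's default port 1234 from the post, and the model identifier is a placeholder for whatever your server lists.

```python
# Minimal sketch: measure generation speed against a local
# OpenAI-compatible server (LM Studio defaults to port 1234).
import time
import requests

URL = "http://localhost:1234/v1/chat/completions"

payload = {
    "model": "qwen3.6-27b-q4_k_m",  # placeholder; use your server's model id
    "messages": [{"role": "user", "content": "Write a poem about my GPUs."}],
    "max_tokens": 256,
    "stream": False,
}

start = time.time()
resp = requests.post(URL, json=payload, timeout=600).json()
elapsed = time.time() - start

out_tokens = resp["usage"]["completion_tokens"]
print(f"{out_tokens} tokens in {elapsed:.1f}s -> {out_tokens / elapsed:.1f} tok/s")
```

Note this lumps prompt processing in with generation, which is fine as a sanity check but will understate pure decode speed.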
Eric ⚡️ Building...
Eric ⚡️ Building...@outsource_·
Quick update: pushed the 4090 further!💡 192K context at 152 tok/s on Qwen3.6-27B, single GPU. 128K hits 159. Same Q4_K_M. Vanilla Qwen3-1.7B draft beat the distilled 4B draft. Smaller > smarter for spec-dec. Next: 1M context locally + 250-400 tok/s via DFlash + TurboQuant. Receipts coming.
Eric ⚡️ Building... tweet media
Eric ⚡️ Building...@outsource_

My 4090 went from 26 -> 154 tok/s on Qwen 3.6 27B 🤯 Same GPU. Same Q4_K_M. No FP8, no extra quant. The unlock: ik_llama.cpp + speculative decoding using Qwen3-1.7B as the draft model. 85% acceptance rate. Full config + benchmarks 👇🏻

17 replies · 14 reposts · 118 likes · 8K views
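Eric's result tracks the standard back-of-envelope model for speculative decoding: if the draft proposes k tokens and each is accepted independently with probability a, the target emits an expected (1 - a^(k+1)) / (1 - a) tokens per verification pass instead of 1. A sketch of that arithmetic; the independence assumption and the parameter-ratio cost proxy are simplifications, not anything from Eric's config.

```python
# Back-of-envelope speculative decoding math, assuming each drafted token
# is accepted independently with probability `a`.
def tokens_per_pass(a: float, k: int) -> float:
    # Accepted prefix plus the one token the target pass produces anyway:
    # 1 + a + a^2 + ... + a^k = (1 - a^(k+1)) / (1 - a)
    return (1 - a ** (k + 1)) / (1 - a)

def speedup(a: float, k: int, draft_cost: float) -> float:
    # draft_cost: one draft step's cost relative to one target step
    # (approximated here by the 1.7B / 27B parameter ratio).
    return tokens_per_pass(a, k) / (1 + k * draft_cost)

# Eric reports 85% acceptance with Qwen3-1.7B drafting for the 27B target.
for k in (4, 8, 16):
    print(f"k={k}: {tokens_per_pass(0.85, k):.2f} tok/pass, "
          f"~{speedup(0.85, k, draft_cost=1.7 / 27):.1f}x")
```

Measured speedups like 26 -> 154 tok/s can beat this naive estimate because verification batches well and decode is memory-bound, so the target's per-token cost drops when it checks several tokens at once.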
Rambone
Rambone@vinrambone·
Added Codex sub support to bone, so you can now use your ChatGPT subscription in bone agent. It was way more difficult than other providers/plans. Still working out some bugs.
0 replies · 0 reposts · 0 likes · 11 views
WorldofAI
WorldofAI@intheworldofai·
Just asked DeepSeek V4 Pro to generate a macOS clone... and uh... it tried. Mid.
14 replies · 7 reposts · 213 likes · 31.5K views
Rambone
Rambone@vinrambone·
@antirez A dollar an hour is pretty good. Is that just one agent, or concurrent agents?
0 replies · 0 reposts · 0 likes · 682 views
antirez
antirez@antirez·
First impressions on DeepSeek V4 Pro used via Claude Code: it is great, but not so cheap compared to how many tokens you get with the OpenAI $200 subscription. More or less, I burned $1 per hour of intense usage.
13 replies · 3 reposts · 149 likes · 16.4K views
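antirez's figure makes the break-even against a flat $200/month plan simple arithmetic; the hours-per-day line assumes a 30-day month and is just division, not a usage claim.

```python
# Break-even between pay-per-use (~$1/hour, per antirez's estimate)
# and a flat $200/month subscription.
hourly_cost = 1.0    # USD per hour of intense usage
monthly_sub = 200.0  # USD per month

breakeven = monthly_sub / hourly_cost
print(f"break-even: {breakeven:.0f} hours/month (~{breakeven / 30:.1f} h/day)")
# -> 200 hours/month, i.e. roughly 6.7 hours of intense usage every day.
```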
Rambone
Rambone@vinrambone·
@96Stats Why aren't the US labs doing this?
0 replies · 0 reposts · 0 likes · 24 views
Dr. Luke in China
Dr. Luke in China@96Stats·
Wow wow wow... what DeepSeek have done here is actually extremely clever. Basically, they built a massive model with huge stored knowledge, but only activate a small part of it for each token. So V4-Pro has 1.6T total parameters but only around 49B active at once, which means big power at lower cost. AND even more innovation: they have a 1M-token context. Instead of forcing the model to remember every previous token in full detail, DeepSeek compresses long-context memory and selectively focuses on what matters. Right now China isn't even trying to beat Western models on benchmarks; they're trying to make AI cheap, open, and usable at scale. Which is why, again, DeepSeek is smashing it, and this news is absolutely going to go viral. Great news!
DeepSeek@deepseek_ai

🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length. 🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models. 🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice. Try it now at chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today! 📄 Tech Report: huggingface.co/deepseek-ai/De… 🤗 Open Weights: huggingface.co/collections/de… 1/n

40 replies · 171 reposts · 1.5K likes · 106.4K views
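A toy sketch of the mixture-of-experts routing described above: a router scores every expert, but only the top-k actually run, which is how a 1.6T-parameter model can touch only ~49B parameters per token. The sizes and k below are invented for illustration and do not reflect DeepSeek's actual architecture.

```python
# Toy mixture-of-experts layer: all experts live in memory, but each
# token only runs through the top-k chosen by the router.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 32, 2  # illustrative sizes only

router_w = rng.standard_normal((d_model, n_experts))
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

def moe_forward(x: np.ndarray) -> np.ndarray:
    scores = x @ router_w                  # score every expert...
    chosen = np.argsort(scores)[-top_k:]   # ...but keep only the top-k
    gates = np.exp(scores[chosen])
    gates /= gates.sum()                   # softmax over the chosen experts
    # Only top_k of the n_experts weight matrices are touched per token:
    return sum(g * (x @ experts[i]) for g, i in zip(gates, chosen))

moe_forward(rng.standard_normal(d_model))
print(f"active per token: {top_k / n_experts:.1%} of expert params "
      f"(DeepSeek's claimed ratio: 49B / 1.6T ≈ 3%)")
```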
Rambone
Rambone@vinrambone·
@ollama I can't get it to run on OpenRouter; may give this a shot.
0 replies · 0 reposts · 0 likes · 112 views
ollama
ollama@ollama·
deepseek-v4-flash is now available on Ollama's cloud! Hosted in the US.
Try it with Claude Code: ollama launch claude --model deepseek-v4-flash:cloud
Try it with OpenClaw: ollama launch openclaw --model deepseek-v4-flash:cloud
Try it with Hermes: ollama launch hermes --model deepseek-v4-flash:cloud
Try it with chat: ollama run deepseek-v4-flash:cloud
(DeepSeek V4 Pro is coming shortly) 🧵
DeepSeek@deepseek_ai

🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length. 🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models. 🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice. Try it now at chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today! 📄 Tech Report: huggingface.co/deepseek-ai/De… 🤗 Open Weights: huggingface.co/collections/de… 1/n

54 replies · 68 reposts · 774 likes · 49.2K views
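If the `ollama launch` one-liners don't fit your setup, the same model should be reachable through Ollama's local REST API once it's available to the daemon; a minimal sketch, assuming the default port 11434 and the model tag from the announcement.

```python
# Minimal sketch: chat with an Ollama-served model over the local REST
# API (daemon default port 11434); model tag from the announcement above.
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "deepseek-v4-flash:cloud",
        "messages": [{"role": "user", "content": "One-line summary of MoE?"}],
        "stream": False,
    },
    timeout=300,
)
print(resp.json()["message"]["content"])
```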
Mateusz Mirkowski
Mateusz Mirkowski@llmdevguy·
GLM 5 -> 5.1 - Great improvement
Kimi K2.5 -> K2.6 - Great improvement
DeepSeek V3 -> V4 - Great improvement
Qwen 3.5 -> 3.6 - Great improvement
I am looking at you @MiniMax_AI. When 3.0? 😀
7 replies · 4 reposts · 114 likes · 5.7K views
Rambone
Rambone@vinrambone·
DeepSeek is back!!! What a wild week of releases.
Rambone tweet media
0 replies · 0 reposts · 0 likes · 13 views
Rambone retweeted
elvis
elvis@omarsar0·
Build your own harness, folks. You won't regret it. These days, you just have to fix things yourself. It's doable, and it will set you up to easily deal with some of the madness that's happening in the space. x.com/badlogicgames/…
Mario Zechner@badlogicgames

recommended reading. cool they are fixing things. but it's also a reason i switched away from CC. no control over the harness means having to wait for them to fix things. the model didn't change. the harness did.

22 replies · 20 reposts · 173 likes · 24.5K views
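The core of "your own harness" is small: a loop that sends the conversation to a model, executes whatever tool call comes back, and appends the result. A minimal sketch; `call_model` is a stand-in for whichever chat API you use, and the single shell tool plus the `tool_call` reply shape are illustrative, not any provider's actual format.

```python
# Skeleton of a self-owned agent harness: model call -> tool dispatch ->
# feed result back, until the model answers without requesting a tool.
import json
import subprocess

def run_shell(command: str) -> str:
    done = subprocess.run(command, shell=True, capture_output=True, text=True)
    return done.stdout + done.stderr

TOOLS = {"run_shell": run_shell}

def call_model(messages: list[dict]) -> dict:
    raise NotImplementedError("plug in your provider's chat API here")

def agent_loop(task: str, max_steps: int = 20) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(messages)          # hypothetical reply shape below
        messages.append(reply)
        if "tool_call" not in reply:          # no tool requested: final answer
            return reply["content"]
        call = reply["tool_call"]             # {"name": ..., "args": {...}}
        result = TOOLS[call["name"]](**call["args"])
        messages.append({"role": "tool", "content": json.dumps(result)})
    return "step limit reached"
```

Owning this loop is exactly what lets you patch tool behavior or swap models without waiting on a vendor, which is Mario's point about the harness changing while the model didn't.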
Rambone
Rambone@vinrambone·
@yacineMTB I downloaded Linux just to try it out. Didn't know that would be my last boot into Windows. It's just so, so much better.
0 replies · 0 reposts · 1 like · 112 views
Rambone retweeted
Loktar 🇺🇸
Loktar 🇺🇸@loktar00·
Regarding the models posted that you "can" run locally, here's my WAG at lowest costs. Anyone disagree or have alternatives?
GLM and Kimi: $16-20k-ish, give or take.
Minimax m2.7: $6-7k-ish, 6 3090s at q4, but 8 would be better.
Qwen 3.6 (3.5 is obsolete now): about $1k-ish.
Sergio@sergionoodle

@TheAhmadOsman Would be nice to see the list with the corresponding cost to run each one locally?

6 replies · 1 repost · 16 likes · 3.2K views
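Loktar's card counts fall out of simple quantization arithmetic: at q4 a weight costs roughly half a byte, so VRAM is about params × bits/8 plus KV-cache and runtime overhead. A rough sketch; the 20% overhead factor and the example parameter count are guesses, not measurements of any listed model.

```python
import math

# Rough VRAM estimate for a quantized model: bytes per weight times
# parameter count, padded for KV cache / runtime overhead (guessed 20%).
def vram_gb(params_b: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    return params_b * (bits_per_weight / 8) * overhead

def cards_needed(params_b: float, bits: float, card_gb: int = 24) -> int:
    return math.ceil(vram_gb(params_b, bits) / card_gb)

# Hypothetical ~230B-param model at q4 (~4.5 bits effective):
print(f"{vram_gb(230, 4.5):.0f} GB -> {cards_needed(230, 4.5)}x 24GB 3090s")
```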
Rambone
Rambone@vinrambone·
@MatthewBerman Does the token efficiency make up for the increased cost?
0 replies · 0 reposts · 0 likes · 32 views
Matthew Berman
Matthew Berman@MatthewBerman·
Biggest improvements in 5.5:
> Personality (more natural, more concise)
> Token efficiency (big win)
Matthew Berman@MatthewBerman

GPT-5.5 just dropped. I've been testing it for the last two weeks. tl;dr: it's an incredible model, but there's something different about this launch...

OpenAI isn't just going for raw intelligence. They've improved the personality of the model, almost certainly to capture more of the personal agent (OpenClaw) market. Its responses are shorter, more human-like, and less formal. It actually has a personality. While Anthropic actively tries to prevent you from using Opus tokens outside of their harnesses, OpenAI is making their models better for that exact use case. If you were using OpenClaw and felt like your agent lost its soul when you had to switch to GPT, try it again now with 5.5.

GPT-5.5 is an expensive model, more expensive than GPT-5.4. But it's significantly more token-efficient: to reach GPT-5.4-level intelligence, GPT-5.5 uses far fewer tokens, so 5.5 should cost less to run overall. This is probably a bigger deal than most people realize.

But is it good? Yes, it's incredible. It comes in two forms: Codex and Pro.

Within Codex, it is the absolute frontier of what's possible with agentic coding. It finds and solves difficult bugs, builds entire applications, and has no problem understanding large codebases. It's better than Opus at backend, but still not as good at front-end design. I found myself using the medium and high thinking settings; extra high was just too slow, and I didn't feel the extra thinking juice was worth the squeeze. Opus, especially 4.6 fast, is still significantly faster than any GPT model. I'm a speed-maxxi, so this matters to me.

And within Codex, it just goes. I gave it a PRD for a new project I'm building and just said "go." I had full confidence it would build the entire project, and it did. GPT-5.5 Codex running for hours to build something is not a problem. It's also in its own league at visual inspection, better than I've seen with other models: it iterates by building -> visual review -> build more in a way that feels much more autonomous than any other model.

Using 5.5 Pro in ChatGPT is insane. It just feels like it can solve everything. Honestly, I can't even come up with hard enough problems to give it. And it'll work for 30, 60, 90 minutes or more. It also seems to be optimized for taking advantage of their plugins (Google Docs, Microsoft Word, etc.) and can easily create 60-page coherent and well-designed documents.

GPT-5.5 is now the bar. It is the frontier. And speed aside, it is as good as any Opus model and oftentimes better at certain tasks.

9 replies · 1 repost · 173 likes · 11K views
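Rambone's question comes down to two multiplications: effective cost is price per token times tokens used, so the pricier-per-token model wins whenever its token savings outpace the price bump. A sketch with invented numbers; none of these prices or token counts are real.

```python
# Effective cost per task = price per token x tokens used.
# All figures below are invented purely to show the trade-off.
def task_cost(price_per_mtok: float, tokens_used: int) -> float:
    return price_per_mtok * tokens_used / 1_000_000

old = task_cost(price_per_mtok=10.0, tokens_used=80_000)  # cheaper, chattier
new = task_cost(price_per_mtok=15.0, tokens_used=40_000)  # pricier, terser
print(f"old: ${old:.2f}  new: ${new:.2f}")
# The pricier model is cheaper overall whenever its token reduction
# (here 2x) exceeds its price increase (here 1.5x).
```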
Rambone
Rambone@vinrambone·
@0xSero As someone actively pushing this stuff in corporate America, we are YEARS away from all losing our jobs. The bottom 10% needs to worry.
0 replies · 0 reposts · 3 likes · 416 views