Rambone

159 posts

Rambone

@vinrambone

Bone Agent, Open Source - Software and agent enthusiast

Joined April 2026

49 Following · 6 Followers
Rambone
Rambone@vinrambone·
@RayFernando1337 On one hand you have Google teasing 10T models, and on the other Qwen is releasing 27B models that hold up against current frontier models. What are we doing, Google?
0
0
0
1
Rambone
Rambone@vinrambone·
@bitcloud Where did this 65k number come from?
0
0
0
2
Lachlan Phillips exo/acc 👾
Opus 4.7 is likely unstable because it's not predicting the next token based on your prompt, but based on the estimated 65,000+ token system prompt Amanda Askell and Dario Amodei inject into your prompt every single time you make a request.
teo@teodorio

opus 4.7 is unusable and I am saying this with a heavy heart; I continue to only use 4.6. 4.7 xhigh had to build some quick demo apps for me and simply forgot in the middle to change any of the schemas between apps, reusing as much as possible as if it didn't want to work

39
42
1.4K
92.5K
Rambone
Rambone@vinrambone·
@Hesamation For the price it's lagging pretty far behind GLM and Kimi. It should be priced closer to MiniMax.
0
0
0
16
ℏεsam
ℏεsam@Hesamation·
IT TOOK 16 MONTHS FOR DEEPSEEK TO COOK. DeepSeek-V4 is now the largest open model: 1.6T total params, 49B active, trained on 32T tokens, and using a new attention architecture, everything publicly available. It's especially optimized for AGENTIC TASKS.
ℏεsam tweet media
DeepSeek@deepseek_ai

🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.
🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.
🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice.
Try it now at chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today!
📄 Tech Report: huggingface.co/deepseek-ai/De…
🤗 Open Weights: huggingface.co/collections/de…
1/n

7
5
61
3.5K
Rambone
Rambone@vinrambone·
Going to test usage of the GPT plan on Codex vs bone on GPT 5.5 later today. I can't tell if the cache is working properly right now, as tokens aren't being reported.
0
0
0
26
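One way to check whether the cache is working when the harness itself isn't reporting tokens is to hit the endpoint directly and read the usage block. A minimal Python sketch against an OpenAI-compatible API; the cached-token field name follows OpenAI's prompt-caching docs (other providers may omit it), and the model name is simply the one from the post above:

# Sketch: read token usage, including cached prompt tokens, straight from
# an OpenAI-compatible chat completion. Assumes OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-5.5",  # model name as in the post; substitute what your plan serves
    messages=[{"role": "user", "content": "ping"}],
)
u = resp.usage
print("prompt tokens:", u.prompt_tokens)
print("completion tokens:", u.completion_tokens)
# Repeat a call with the same long prefix: cached_tokens should climb above 0
# if caching is working. If usage itself is empty, the provider isn't reporting.
details = getattr(u, "prompt_tokens_details", None)
print("cached tokens:", getattr(details, "cached_tokens", "not reported"))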
Rambone
Rambone@vinrambone·
@bridgebench I haven't gotten a chance to try this, but it's making me sad. I had such high hopes.
0
0
0
794
Bridgebench
Bridgebench@bridgebench·
DeepSeek V4 Pro just ranked dead last on BridgeBench. Quality score: 11.2, #20 of 20, 29 points behind the second-worst model. Security: 3.3. Debugging: 25.5. Refactoring: 48.7. This is the worst frontier model I've ever tested. Remember when DeepSeek was the story of 2025?
Bridgebench tweet media
41
6
112
9.7K
Rambone
Rambone@vinrambone·
@LottoLabs I felt the 4 tokens per second in my soul
0
0
1
179
Lotto
Lotto@LottoLabs·
How Apple mfrs think this goes
>be me
>drop $1600 on two RTX 3090s used off eBay
>"48GB VRAM, I'm basically a datacenter now"
>they arrive in anti-static bags that look like they've been through a war
>plug them into my motherboard and it sounds like a jet engine taking off
>neighbors probably think I'm mining crypto again
>install llama.cpp, download qwen3.6-27b quantized
>"Q4_K_M, only 16GB, totally fits"
>start LM Studio on port 1234
>type "hello" into the chat box
>GPU fans spin up to 100% instantly
>wait 8 seconds for a response
>>"Hello! How can I assist you today?"
>I've seen faster responses from my grandma reading a text aloud
>try Q8_0 quantization because "quality matters"
>OOM error, obviously
>spend three hours tweaking n_gpu_layers and n_ctx like it's some kind of dark art
>finally get it running at 4 tokens per second
>ask it to write me a poem about my GPUs
>>"Two cards of silicon and light / They hum through the endless night"
>"bro this is actually fire"
>show it to someone on Discord
>"why are you running LLMs locally when you could just use an API for free"
>explain that the joy isn't in the output, it's in watching 94% VRAM usage and knowing nobody else has access to my model
>they don't understand
>close Discord, open LM Studio again
>"let's try a longer context window"
>crash
22
9
228
14.5K
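For anyone reproducing the setup above: LM Studio serves an OpenAI-compatible API on port 1234 (llama.cpp's llama-server does the same, with the n_gpu_layers and n_ctx knobs exposed as -ngl and -c). A minimal stdlib-only Python sketch; the model name is whatever the local server reports, "qwen3.6-27b" here is illustrative:

# Sketch: send one chat turn to a local LM Studio / llama-server instance
# on port 1234, as in the greentext above. No external dependencies.
import json
import urllib.request

payload = {
    "model": "qwen3.6-27b",  # illustrative; use the id your server lists
    "messages": [{"role": "user", "content": "hello"}],
    "max_tokens": 64,
}
req = urllib.request.Request(
    "http://localhost:1234/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as r:
    print(json.load(r)["choices"][0]["message"]["content"])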
Eric ⚡️ Building...
Quick update: pushed the 4090 further! 💡 192K context at 152 tok/s on Qwen3.6-27B, single GPU. 128K hits 159 tok/s. Same Q4_K_M. Vanilla Qwen3-1.7B draft beat the distilled 4B draft. Smaller > smarter for spec-dec. Next: 1M context locally + 250-400 tok/s via DFlash + TurboQuant. Receipts coming.
Eric ⚡️ Building... tweet media
Eric ⚡️ Building...@outsource_

My 4090 went from 26 -> 154 tok/s on Qwen 3.6 27B 🤯 Same GPU. Same Q4_K_M. No FP8, no extra quant. The unlock: ik_llama.cpp + speculative decoding using Qwen3-1.7B as the draft model. 85% acceptance rate. Full config + benchmarks 👇🏻

27
20
194
14.7K
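The trick behind those numbers, speculative decoding, has a simple control flow: a cheap draft model proposes a run of tokens, the expensive target model verifies them in one pass, and the longest agreeing prefix is kept, plus one token from the target so every step advances. A toy Python sketch of that loop; both "models" here are stand-ins (real stacks like llama.cpp's draft-model mode compare actual token distributions), and the 85% acceptance figure is Eric's, not something this toy reproduces:

# Toy speculative decoding: draft proposes k tokens, target verifies,
# keep the matching prefix plus one target-chosen token.
import random

rng = random.Random(0)
VOCAB = "abcdefgh"

def target_model(ctx):
    # slow, authoritative next token (toy: a function of context length)
    return VOCAB[(len(ctx) * 31) % len(VOCAB)]

def draft_model(ctx):
    # fast draft that agrees with the target most of the time (toy)
    return target_model(ctx) if rng.random() < 0.85 else rng.choice(VOCAB)

def speculative_step(ctx, k=4):
    proposal, d_ctx = [], list(ctx)
    for _ in range(k):                      # draft runs k cheap steps
        t = draft_model(d_ctx)
        proposal.append(t)
        d_ctx.append(t)
    accepted, v_ctx = [], list(ctx)
    for t in proposal:                      # target checks the whole run
        if target_model(v_ctx) != t:
            break
        accepted.append(t)
        v_ctx.append(t)
    n_ok = len(accepted)
    accepted.append(target_model(v_ctx))    # fallback/bonus token: always progress
    return accepted, n_ok

ctx, ok, proposed = list("a"), 0, 0
for _ in range(50):
    out, n_ok = speculative_step(ctx)
    ctx += out
    ok += n_ok
    proposed += 4
print(f"draft tokens accepted: {ok}/{proposed} ({ok / proposed:.0%})")

The speedup comes from the target verifying k tokens for roughly the price of one sequential step; the higher the acceptance rate, the more of the draft's cheap work survives.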
Rambone
Rambone@vinrambone·
Added Codex sub support to bone, so you can now use your ChatGPT subscription in bone agent. It was way more difficult than other providers/plans. Still working out some bugs.
0
0
0
11
WorldofAI
WorldofAI@intheworldofai·
Just asked DeepSeek V4 Pro to generate a macOS clone... and uh... it tried. Mid.
17
8
233
35.2K
Rambone
Rambone@vinrambone·
@antirez A dollar an hour is pretty good. Just one agent, or concurrent?
0
0
0
863
antirez
antirez@antirez·
First impressions on DeepSeek v4 pro used via Claude Code: it is great, but not so cheap compared to how many tokens you get with the OpenAI $200 subscription. More or less I burned $1 per hour of intense usage.
15
3
167
19.8K
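Back-of-envelope on those numbers: at $1 per hour of intense use, pay-as-you-go only reaches the $200/month subscription price at about 200 such hours a month, roughly 6.5 hours every day. Lighter users come out ahead on the API; the subscription wins for sustained heavy agent use.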
Rambone
Rambone@vinrambone·
@96Stats Why aren't the US labs doing this?
0
0
0
40
Dr. Luke in China
Dr. Luke in China@96Stats·
Wow wow wow.. what DeepSeek have done here is actually extremely clever: basically they built a massive model with huge stored knowledge, but only activate a small part of it for each token. So V4-Pro has 1.6T total parameters but only around 49B active at once, which means big power, lower cost.

AND even more innovation: they have a 1M-token context. Instead of forcing the model to remember every previous token in full detail, DeepSeek compresses long-context memory and selectively focuses on what matters.

Right now China isn't even trying to beat Western models on benchmarks; they're trying to make AI cheap, open, and usable at scale. Which is why DeepSeek is smashing it again, and this news is absolutely going to go viral. Great news!
DeepSeek@deepseek_ai

🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.
🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.
🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice.
Try it now at chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today!
📄 Tech Report: huggingface.co/deepseek-ai/De…
🤗 Open Weights: huggingface.co/collections/de…
1/n

48
189
1.8K
124.4K
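Mechanically, "1.6T total / 49B active" is mixture-of-experts routing: a gate scores the experts for each token and only the top few run, so a forward pass touches roughly 49/1600 ≈ 3% of the weights. A toy Python sketch with made-up sizes; DeepSeek's actual router, expert counts, and the new attention design live in their tech report and are not reproduced here:

# Toy mixture-of-experts layer: the gate picks top-k experts per token,
# so compute scales with k, not with the total expert count.
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, k = 64, 32, 2

W_gate = rng.normal(size=(d, n_experts))
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]

def moe_forward(x):
    scores = x @ W_gate                   # one score per expert
    top = np.argsort(scores)[-k:]         # indices of the k best experts
    w = np.exp(scores[top])
    w /= w.sum()                          # softmax over the chosen experts only
    # only k of the n_experts weight matrices are ever touched for this token
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top))

y = moe_forward(rng.normal(size=d))
print(f"active experts per token: {k}/{n_experts} ({k / n_experts:.0%})")
# DeepSeek's claimed ratio: 49B active / 1.6T total ≈ 3.1% of weights per token.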
Rambone
Rambone@vinrambone·
@ollama I can't get it to run on OpenRouter; may give this a shot.
0
0
0
161
ollama
ollama@ollama·
deepseek-v4-flash is now available on Ollama's cloud! Hosted in the US.
Try it with Claude Code: ollama launch claude --model deepseek-v4-flash:cloud
Try it with OpenClaw: ollama launch openclaw --model deepseek-v4-flash:cloud
Try it with Hermes: ollama launch hermes --model deepseek-v4-flash:cloud
Try it with chat: ollama run deepseek-v4-flash:cloud
(DeepSeek V4 Pro is coming shortly) 🧵
DeepSeek@deepseek_ai

🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.
🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.
🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice.
Try it now at chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today!
📄 Tech Report: huggingface.co/deepseek-ai/De…
🤗 Open Weights: huggingface.co/collections/de…
1/n

57
80
913
60K
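For wiring this up programmatically rather than through the CLI: the local Ollama daemon serves a chat API on port 11434, and cloud-tagged models are proxied through it. A minimal Python sketch, assuming a running Ollama that is signed in for cloud access; the model tag is the one from the post:

# Sketch: one chat turn against the local Ollama daemon's native API.
import json
import urllib.request

payload = {
    "model": "deepseek-v4-flash:cloud",   # tag from the post above
    "messages": [{"role": "user", "content": "hello"}],
    "stream": False,                      # get a single JSON response back
}
req = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as r:
    print(json.load(r)["message"]["content"])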
Mateusz Mirkowski
Mateusz Mirkowski@llmdevguy·
GLM 5 -> 5.1 - Great improvement
Kimi K2.5 -> K2.6 - Great improvement
DeepSeek V3 -> V4 - Great improvement
Qwen 3.5 -> 3.6 - Great improvement
I am looking at you @MiniMax_AI. When 3.0? 😀
19
8
258
13.7K
Rambone
Rambone@vinrambone·
DeepSeek is back!!! What a wild week of releases.
Rambone tweet media
0
0
0
14
elvis
elvis@omarsar0·
Build your own harness, folks. You won't regret it. These days, you just have to fix things yourself. It's doable, and it will set you up to easily deal with some of the madness that's happening in the space. x.com/badlogicgames/…
Mario Zechner@badlogicgames

recommended reading. cool they are fixing things. but it's also a reason i switched away from CC. no control over the harness means having to wait for them to fix things. the model didn't change. the harness did.

22
22
178
25.5K
Rambone
Rambone@vinrambone·
@yacineMTB I downloaded Linux just to try it out. Didn't know that would be my last boot into Windows. It's just so, so much better.
0
0
1
116