Rambone

159 posts

Rambone

@vinrambone

Bone Agent, Open Source - Software and agent enthusiast

Joined April 2026

49 Following · 6 Followers
Rambone
Rambone@vinrambone·
@RayFernando1337 On one hand you have Google teasing 10T models, and on the other Qwen is releasing 27B models that hold up against current frontier models. What are we doing, Google?
0
0
0
1
Rambone
Rambone@vinrambone·
@bitcloud Where did this 65k number come from?
0
0
0
2
Lachlan Phillips exo/acc 👾
Opus 4.7 is likely unstable because it's not predicting the next token based on your prompt, but based on the estimated 65,000+ token system prompt Amanda Askell and Dario Amodei inject into your prompt every single time you make a request.
teo@teodorio

opus 4.7 is unusable and I am saying this with a heavy heart; I continue to only use 4.6. 4.7 xhigh had to build some quick demo apps for me and simply forgot in the middle to change any of the schemas between apps, reusing as much as possible as if it didn't want to work

39
42
1.4K
92.5K
Rambone
Rambone@vinrambone·
@Hesamation For the price it's lagging pretty far behind GLM and Kimi. It should be priced closer to MiniMax.
0
0
0
16
ℏεsam
ℏεsam@Hesamation·
IT TOOK 16 MONTHS FOR DEEPSEEK TO COOK. DeepSeek-V4 is now the largest open model: 1.6T total params, 49B active, trained on 32T tokens, and using a new attention architecture, everything publicly available. It's especially optimized for AGENTIC TASKS.
ℏεsam tweet media
DeepSeek@deepseek_ai

🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.
🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.
🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice.
Try it now at chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today!
📄 Tech Report: huggingface.co/deepseek-ai/De…
🤗 Open Weights: huggingface.co/collections/de…
1/n

7
5
61
3.5K
Rambone
Rambone@vinrambone·
Going to test usage of the GPT plan on Codex vs bone on GPT 5.5 later today. I can't tell if the cache is working properly right now, as tokens aren't being reported.
0
0
0
26
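One way to check whether the cache is working when the harness itself isn't reporting tokens is to hit the endpoint directly and read the usage block. A minimal Python sketch against an OpenAI-compatible API; the cached-token field name follows OpenAI's prompt-caching docs (other providers may omit it), and the model name is simply the one from the post above:

# Sketch: read token usage, including cached prompt tokens, straight from
# an OpenAI-compatible chat completion. Assumes OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-5.5",  # model name as in the post; substitute what your plan serves
    messages=[{"role": "user", "content": "ping"}],
)
u = resp.usage
print("prompt tokens:", u.prompt_tokens)
print("completion tokens:", u.completion_tokens)
# Repeat a call with the same long prefix: cached_tokens should climb above 0
# if caching is working. If usage itself is empty, the provider isn't reporting.
details = getattr(u, "prompt_tokens_details", None)
print("cached tokens:", getattr(details, "cached_tokens", "not reported"))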
Rambone
Rambone@vinrambone·
@bridgebench I haven't gotten a chance to try this, but it's making me sad. I had such high hopes.
0
0
0
794
Bridgebench
Bridgebench@bridgebench·
DeepSeek V4 Pro just ranked dead last on BridgeBench. Quality score: 11.2, #20 of 20, 29 points behind the second-worst model. Security: 3.3. Debugging: 25.5. Refactoring: 48.7. This is the worst frontier model I've ever tested. Remember when DeepSeek was the story of 2025?
Bridgebench tweet media
41
6
112
9.7K
Rambone
Rambone@vinrambone·
@LottoLabs I felt the 4 tokens per second in my soul
0
0
1
179
Lotto
Lotto@LottoLabs·
How Apple mfrs think this goes
>be me
>drop $1600 on two RTX 3090s used off eBay
>"48GB VRAM, I'm basically a datacenter now"
>they arrive in anti-static bags that look like they've been through a war
>plug them into my motherboard and it sounds like a jet engine taking off
>neighbors probably think I'm mining crypto again
>install llama.cpp, download qwen3.6-27b quantized
>"Q4_K_M, only 16GB, totally fits"
>start LM Studio on port 1234
>type "hello" into the chat box
>GPU fans spin up to 100% instantly
>wait 8 seconds for a response
>>"Hello! How can I assist you today?"
>I've seen faster responses from my grandma reading a text aloud
>try Q8_0 quantization because "quality matters"
>OOM error, obviously
>spend three hours tweaking n_gpu_layers and n_ctx like it's some kind of dark art
>finally get it running at 4 tokens per second
>ask it to write me a poem about my GPUs
>>"Two cards of silicon and light / They hum through the endless night"
>"bro this is actually fire"
>show it to someone on Discord
>"why are you running LLMs locally when you could just use an API for free"
>explain that the joy isn't in the output, it's in watching 94% VRAM usage and knowing nobody else has access to my model
>they don't understand
>close Discord, open LM Studio again
>"let's try a longer context window"
>crash
22
9
228
14.5K
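For anyone reproducing the setup above: LM Studio serves an OpenAI-compatible API on port 1234 (llama.cpp's llama-server does the same, with the n_gpu_layers and n_ctx knobs exposed as -ngl and -c). A minimal stdlib-only Python sketch; the model name is whatever the local server reports, "qwen3.6-27b" here is illustrative:

# Sketch: send one chat turn to a local LM Studio / llama-server instance
# on port 1234, as in the greentext above. No external dependencies.
import json
import urllib.request

payload = {
    "model": "qwen3.6-27b",  # illustrative; use the id your server lists
    "messages": [{"role": "user", "content": "hello"}],
    "max_tokens": 64,
}
req = urllib.request.Request(
    "http://localhost:1234/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as r:
    print(json.load(r)["choices"][0]["message"]["content"])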
Eric ⚡️ Building...
Quick update: pushed the 4090 further! 💡 192K context at 152 tok/s on Qwen3.6-27B, single GPU. 128K hits 159 tok/s. Same Q4_K_M. Vanilla Qwen3-1.7B draft beat the distilled 4B draft. Smaller > smarter for spec-dec. Next: 1M context locally + 250-400 tok/s via DFlash + TurboQuant. Receipts coming.
Eric ⚡️ Building... tweet media
Eric ⚡️ Building...@outsource_

My 4090 went from 26 -> 154 tok/s on Qwen 3.6 27B 🤯 Same GPU. Same Q4_K_M. No FP8, no extra quant. The unlock: ik_llama.cpp + speculative decoding using Qwen3-1.7B as the draft model. 85% acceptance rate. Full config + benchmarks 👇🏻

27
20
194
14.7K
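The trick behind those numbers, speculative decoding, has a simple control flow: a cheap draft model proposes a run of tokens, the expensive target model verifies them in one pass, and the longest agreeing prefix is kept, plus one token from the target so every step advances. A toy Python sketch of that loop; both "models" here are stand-ins (real stacks like llama.cpp's draft-model mode compare actual token distributions), and the 85% acceptance figure is Eric's, not something this toy reproduces:

# Toy speculative decoding: draft proposes k tokens, target verifies,
# keep the matching prefix plus one target-chosen token.
import random

rng = random.Random(0)
VOCAB = "abcdefgh"

def target_model(ctx):
    # slow, authoritative next token (toy: a function of context length)
    return VOCAB[(len(ctx) * 31) % len(VOCAB)]

def draft_model(ctx):
    # fast draft that agrees with the target most of the time (toy)
    return target_model(ctx) if rng.random() < 0.85 else rng.choice(VOCAB)

def speculative_step(ctx, k=4):
    proposal, d_ctx = [], list(ctx)
    for _ in range(k):                      # draft runs k cheap steps
        t = draft_model(d_ctx)
        proposal.append(t)
        d_ctx.append(t)
    accepted, v_ctx = [], list(ctx)
    for t in proposal:                      # target checks the whole run
        if target_model(v_ctx) != t:
            break
        accepted.append(t)
        v_ctx.append(t)
    n_ok = len(accepted)
    accepted.append(target_model(v_ctx))    # fallback/bonus token: always progress
    return accepted, n_ok

ctx, ok, proposed = list("a"), 0, 0
for _ in range(50):
    out, n_ok = speculative_step(ctx)
    ctx += out
    ok += n_ok
    proposed += 4
print(f"draft tokens accepted: {ok}/{proposed} ({ok / proposed:.0%})")

The speedup comes from the target verifying k tokens for roughly the price of one sequential step; the higher the acceptance rate, the more of the draft's cheap work survives.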
Rambone
Rambone@vinrambone·
Added Codex sub support to bone, so you can now use your ChatGPT subscription in bone agent. It was way more difficult than other providers/plans. Still working out some bugs.
0
0
0
11
WorldofAI
WorldofAI@intheworldofai·
Just asked DeepSeek V4 Pro to generate a macOS clone... and uh... it tried. Mid.
17
8
233
35.2K
Rambone
Rambone@vinrambone·
@antirez A dollar an hour is pretty good. Just one agent, or concurrent?
0
0
0
863
antirez
antirez@antirez·
First impressions on DeepSeek v4 pro used via Claude Code: it is great, but not so cheap compared to how many tokens you get with the OpenAI $200 subscription. More or less I burned $1 per hour of intense usage.
15
3
167
19.8K
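Back-of-envelope on those numbers: at $1 per hour of intense use, pay-as-you-go only reaches the $200/month subscription price at about 200 such hours a month, roughly 6.5 hours every day. Lighter users come out ahead on the API; the subscription wins for sustained heavy agent use.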
Rambone
Rambone@vinrambone·
@96Stats Why aren't the US labs doing this?
0
0
0
40
Dr. Luke in China
Dr. Luke in China@96Stats·
Wow wow wow.. what DeepSeek have done here is actually extremely clever: basically they built a massive model with huge stored knowledge, but only activate a small part of it for each token. So V4-Pro has 1.6T total parameters but only around 49B active at once, which means big power, lower cost.

AND even more innovation: they have a 1M-token context. Instead of forcing the model to remember every previous token in full detail, DeepSeek compresses long-context memory and selectively focuses on what matters.

Right now China isn't even trying to beat Western models on benchmarks; they're trying to make AI cheap, open, and usable at scale. Which is why DeepSeek is smashing it again, and this news is absolutely going to go viral. Great news!
DeepSeek@deepseek_ai

🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.
🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.
🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice.
Try it now at chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today!
📄 Tech Report: huggingface.co/deepseek-ai/De…
🤗 Open Weights: huggingface.co/collections/de…
1/n

48
189
1.8K
124.4K
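Mechanically, "1.6T total / 49B active" is mixture-of-experts routing: a gate scores the experts for each token and only the top few run, so a forward pass touches roughly 49/1600 ≈ 3% of the weights. A toy Python sketch with made-up sizes; DeepSeek's actual router, expert counts, and the new attention design live in their tech report and are not reproduced here:

# Toy mixture-of-experts layer: the gate picks top-k experts per token,
# so compute scales with k, not with the total expert count.
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, k = 64, 32, 2

W_gate = rng.normal(size=(d, n_experts))
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]

def moe_forward(x):
    scores = x @ W_gate                   # one score per expert
    top = np.argsort(scores)[-k:]         # indices of the k best experts
    w = np.exp(scores[top])
    w /= w.sum()                          # softmax over the chosen experts only
    # only k of the n_experts weight matrices are ever touched for this token
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top))

y = moe_forward(rng.normal(size=d))
print(f"active experts per token: {k}/{n_experts} ({k / n_experts:.0%})")
# DeepSeek's claimed ratio: 49B active / 1.6T total ≈ 3.1% of weights per token.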
Rambone
Rambone@vinrambone·
@ollama I can't get it to run on OpenRouter; may give this a shot.
0
0
0
161
ollama
ollama@ollama·
deepseek-v4-flash is now available on Ollama's cloud! Hosted in the US.
Try it with Claude Code: ollama launch claude --model deepseek-v4-flash:cloud
Try it with OpenClaw: ollama launch openclaw --model deepseek-v4-flash:cloud
Try it with Hermes: ollama launch hermes --model deepseek-v4-flash:cloud
Try it with chat: ollama run deepseek-v4-flash:cloud
(DeepSeek V4 Pro is coming shortly) 🧵
DeepSeek@deepseek_ai

🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.
🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.
🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice.
Try it now at chat.deepseek.com via Expert Mode / Instant Mode. API is updated & available today!
📄 Tech Report: huggingface.co/deepseek-ai/De…
🤗 Open Weights: huggingface.co/collections/de…
1/n

57
80
913
60K
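For wiring this up programmatically rather than through the CLI: the local Ollama daemon serves a chat API on port 11434, and cloud-tagged models are proxied through it. A minimal Python sketch, assuming a running Ollama that is signed in for cloud access; the model tag is the one from the post:

# Sketch: one chat turn against the local Ollama daemon's native API.
import json
import urllib.request

payload = {
    "model": "deepseek-v4-flash:cloud",   # tag from the post above
    "messages": [{"role": "user", "content": "hello"}],
    "stream": False,                      # get a single JSON response back
}
req = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as r:
    print(json.load(r)["message"]["content"])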
Mateusz Mirkowski
Mateusz Mirkowski@llmdevguy·
GLM 5 -> 5.1 - Great improvement
Kimi K2.5 -> K2.6 - Great improvement
DeepSeek V3 -> V4 - Great improvement
Qwen 3.5 -> 3.6 - Great improvement
I am looking at you @MiniMax_AI. When 3.0? 😀
19
8
258
13.7K
Rambone
Rambone@vinrambone·
DeepSeek is back!!! What a wild week of releases.
Rambone tweet media
0
0
0
14
elvis
elvis@omarsar0·
Build your own harness, folks. You won't regret it. These days, you just have to fix things yourself. It's doable, and it will set you up to easily deal with some of the madness that's happening in the space. x.com/badlogicgames/…
Mario Zechner@badlogicgames

recommended reading. cool they are fixing things. but it's also a reason i switched away from CC. no control over the harness means having to wait for them to fix things. the model didn't change. the harness did.

22
22
178
25.5K
Rambone
Rambone@vinrambone·
@yacineMTB I downloaded Linux just to try it out. Didn't know that would be my last boot into Windows. It's just so, so much better.
0
0
1
116