Steppe Cyber Vedic Futurist🕉️☸️ 🥩🌲
2.7K posts

@steppebuddha
join #kaliacc on urbit at: ~lavfun-fiplyr/kaliacc
Lhasa, Tibet · Joined January 2011
2.4K Following · 1.9K Followers

@Railway deleted my railway account ffs

Google Cloud has blocked our account, making some Railway services unavailable. We have escalated this directly with Google. The Railway Platform team has since confirmed access to Google Cloud and is working on restoring access to all workloads.
We have access to some of our Google Cloud–hosted infrastructure and are working to restore the rest of the service. We apologize for the disruption.
Railway @Railway
The Railway dashboard is currently unavailable, and all running Railway services are down. We're working with our upstream provider to restore service. Updates: status.railway.com

@BicameralGrind @AJA_Cortes show evidence

One of the oldest Christian communities in the world is in India
The St. Thomas Christians of Kerala trace their founding to 52 AD, when the Apostle Thomas is said to have landed on the Malabar Coast
When the Portuguese showed up in 1498, they found a Church already 1,400 years old, worshipping in Aramaic
Ramin Nasibov @RaminNasibov
What historical fact sounds fake but is true?

@dealignai this vs one from @antirez what's the difference?

DSV4 FLASH FINALLY FIXED WITH ALL CACHING STACK - huggingface.co/OsaurusAI/Deep…

@antirez thank you, this is working great, for faster tokens/second what path would you recommend to explore?

@jun_song i'd buy it if it's over 128GB

Very excited to release Terminal-Bench 2.1!
Coding agents are among the most economically consequential deployments of LLMs to date. As agents improve, benchmark reliability matters more.
We audited TB2.0 and found and corrected issues in 28/89 tasks. 30% of the benchmark!
The rankings survived, but absolute scores moved by up to 12pp!


@MrAhmadAwais @nightkingog do y'all store or use LLM conversation data for training or sharing?

@nightkingog CommandCode.ai/pricing
This is how serious I am: it's live.

@thsottiaux need ios remote, 5.5 spark and goals in the gui. implement plan in a new context

@ajambrosino control codex 100% from an ios app, one that's running in my computer

what should we do in the next 3 months?
Andrew Ambrosino @ajambrosino
the Codex app turns 3 (months old) today. they grow up so fast

@DesignGears @nahcrof dude they are providing inference running an open source model how would deepseek get the data, lmao!
English

@sudoingX please share the dsv4 flash results
English

a week with the dgx spark, here is what is on it and what i have measured so far. nobody is really talking about this machine and it is quietly becoming the workhorse of my whole stack.
hardware: nvidia gb10 sm_121, 124 gb unified lpddr5x at 273 gb/s, cuda 13.0
models on disk (305 gb total, 9 ggufs):
> qwen 3.6 27b q4_k_m / q5_k_m / q8_0 / ud-q4_k_xl
> nemotron 3 omni 30b-a3b q4_k_m / q8_0 / ud-q6_k / ud-q6_k_xl
> deepseek v4-flash 158b q4_k_m (112 gb, flagship 128gb-tier test)
terminal + shell environment:
> zsh + oh-my-zsh + powerlevel10k theme
> modern cli stack: bat, eza, ripgrep, fd, git-delta, tldr, neovim, fzf, autojump
> 6 tmux sessions actively running for parallel agent work
ml + agent stack:
> llama.cpp built sm_121 against cuda 13
> uv + venv ml stack with pytorch 2.11.0+cu130 (aarch64) + transformers + diffusers + accelerate
> hermes agent v0.11 with codex auth bridge
> opencode for free-model overnight research
> telegram gateway routing to nemotron q8 right now
speeds verified so far:
- nemotron 30b-a3b q8: 56 tok/s gen, 1,300 tok/s prefill, 96% gpu, 33gb in unified
- qwen 27b dense q4: 40 tok/s consistent
90+ gb of unified memory still free. deepseek v4-flash 158b loading next as the real flagship test, multimodal omni testing once mmproj pulls, comfyui install in flight for the diffusion lane.
honestly curious what the actual limit is on this box, i have not hit it yet.


@DBredvick @nicdunz how to set thinking to none in codex?

@antirez @mitsuhiko I was able to run your deepseek v4 gguf file. It started at ~20 t/s and is now up to ~25 t/s after 5.5 xhigh autoresearch runs on my m4 max 128GB
great work @antirez amazing times!

DeepSeek v4 Flash is at the limit of being usable, with 21 t/s and prefill at 130 t/s (but it is 2x or even better on the m3 Ultra), but... it has certain things that make it so well suited for local inference. For one: it acts like a frontier model, and thinks for the right amount of time. Second: the KV cache is crazy compact, so it is really possible to do KV checkpointing on disk.

Look at this. Also opencode uses freaking 11k tokens of system prompt. Even at decent pre-fill of ~130 t/s it means waiting 84 seconds to start a session. What's the point? :D The pi agent is a lot saner here.
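The 84 seconds follows directly from dividing the system prompt length by the prefill rate. A minimal sketch of that arithmetic, where the function name is hypothetical and the inputs are the rounded figures quoted above (11k tokens, ~130 t/s):

```python
def prefill_wait_seconds(prompt_tokens: int, prefill_tokens_per_second: float) -> float:
    """Time spent processing the prompt before the first generated token."""
    if prefill_tokens_per_second <= 0:
        raise ValueError("prefill rate must be positive")
    return prompt_tokens / prefill_tokens_per_second


# Figures from the post: ~11k-token system prompt, ~130 t/s prefill.
wait = prefill_wait_seconds(11_000, 130.0)
print(f"{wait:.1f} s before the session can start")  # 84.6 s
```

A shorter system prompt shrinks the wait linearly, which is presumably why a leaner agent prompt feels so different at this prefill speed.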
Moreover, one could say, let's cache on disk very long common KV cache chunks, no? Hash it with all the parameters and put a sensible TTL if not used. But also: only cache it if you see it repeated N times across different sessions.
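The caching idea could look roughly like the sketch below. It is only an illustration of the policy described (hash the chunk together with all parameters, admit it only after N distinct sessions have produced it, drop it after a TTL of disuse), not any agent's or inference engine's real implementation. All names are hypothetical, and an in-memory dict stands in for the on-disk store.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Sequence, Set


def chunk_key(token_ids: Sequence[int], params: dict) -> str:
    """Hash the token chunk together with every parameter that affects the KV state."""
    payload = json.dumps({"tokens": list(token_ids), "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class KVChunkCache:
    min_sessions: int = 3                  # "N": distinct sessions required before caching
    ttl_seconds: float = 7 * 24 * 3600     # assumed TTL; entries unused this long are dropped
    clock: Callable[[], float] = time.time
    _seen: Dict[str, Set[str]] = field(default_factory=dict)
    _store: Dict[str, bytes] = field(default_factory=dict)   # stand-in for files on disk
    _last_used: Dict[str, float] = field(default_factory=dict)

    def offer(self, key: str, session_id: str, kv_blob: bytes) -> bool:
        """Record a sighting; store the blob once enough distinct sessions repeat it."""
        sessions = self._seen.setdefault(key, set())
        sessions.add(session_id)
        if key in self._store:
            self._last_used[key] = self.clock()
            return True
        if len(sessions) < self.min_sessions:
            return False
        self._store[key] = kv_blob
        self._last_used[key] = self.clock()
        return True

    def get(self, key: str) -> Optional[bytes]:
        """Return the cached blob, refreshing its TTL, or None on a miss."""
        self.evict_expired()
        blob = self._store.get(key)
        if blob is not None:
            self._last_used[key] = self.clock()
        return blob

    def evict_expired(self) -> int:
        """Drop entries that have not been used within the TTL."""
        now = self.clock()
        stale = [k for k, t in self._last_used.items() if now - t > self.ttl_seconds]
        for k in stale:
            self._store.pop(k, None)
            self._last_used.pop(k, None)
        return len(stale)
```

Counting distinct session IDs rather than raw sightings keeps a single long session from filling the disk with chunks nobody else will reuse.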
