Peanut Butter Runner

98 posts

@p_b_runner

I love running and peanut butter

Joined January 2022

21 Following, 0 Followers

Peanut Butter Runner@p_b_runner·9h

@Voxyz_ai This vs codegraph?

Vox@Voxyz_ai·1d

stop telling Claude Code/Codex "read this file".
stop telling Claude Code/Codex "now read that one too".
stop telling Claude Code/Codex "grep the whole repo".
install codebase-memory. it indexes the Linux kernel, 28M lines, in 𝟯 𝗺𝗶𝗻𝘂𝘁𝗲𝘀. your repo takes seconds.
index once and the whole repo becomes 𝗼𝗻𝗲 𝗴𝗿𝗮𝗽𝗵 of every function, file and dependency. one query replaces dozens of grep and read cycles.
benchmarked across 31 real repos:
→ 10x fewer tokens on structural queries
→ 83% answer quality on complex tasks
→ 2.1x fewer tool calls
two prompts. send them straight to your agent 👇
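To make the "repo as one graph" idea concrete, here is a minimal sketch in Python. It is not how codebase-memory works internally (the post does not say); it only shows the general technique of indexing source once into a call graph and then answering a structural question with a lookup instead of repeated grep and read cycles. The names `build_call_graph`, `callers_of`, and `SAMPLE` are hypothetical.

```python
import ast
from collections import defaultdict

def build_call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the plain function names it calls."""
    tree = ast.parse(source)
    graph: dict[str, set[str]] = defaultdict(set)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            graph[node.name]  # register the function even if it calls nothing
            for child in ast.walk(node):
                # Only direct calls like foo(...); method calls are ignored here.
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    graph[node.name].add(child.func.id)
    return dict(graph)

def callers_of(graph: dict[str, set[str]], target: str) -> list[str]:
    """Answer 'who calls target?' with one lookup over the prebuilt index."""
    return sorted(fn for fn, callees in graph.items() if target in callees)

# Hypothetical sample source standing in for a repository.
SAMPLE = '''
def load(path):
    return open(path).read()

def parse(text):
    return text.split()

def main(path):
    return parse(load(path))
'''

graph = build_call_graph(SAMPLE)
print(callers_of(graph, "parse"))  # ['main']
```

A real indexer would also need to handle methods, imports, and multiple files and languages; this sketch covers only top-level Python functions.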

127

356

402.3K

Peanut Butter Runner@p_b_runner·12h

@tonbistudio Yooooo, rare unfollowed tonbi in the feed. Gotta fix that. Hit follow ASAP!

tonbi@tonbistudio·1d

Nous Research just dropped MOA (Mixture of Agents) presets inside Hermes Agent. I made a quick video showing how to set it up and create your own MOA. The idea: mix multiple models to get capabilities beyond any single model you can use right now. How it works: Normally Hermes sends your conversation + tools to one model. With MOA you get several reference models plus one aggregator. The references read the conversation and offer thoughts and suggestions, but they get no tool access and never reply to you directly. The aggregator is the one that actually acts. It sees the normal conversation plus the private advice from the references, then makes the tool calls and writes the final response. From Hermes's side, the aggregator's output IS the model's response, so you can use /goal or anything else like that. Cool idea, curious to see how it really performs!

Nous Research@NousResearch

The strongest models are gated and access is granted only to a select few. Hermes Agent now exposes MoA presets as virtual models, giving you capabilities beyond the publicly available frontier: 8% higher than Opus 4.8 and 11% higher than GPT 5.5 on our upcoming benchmark.
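The reference-plus-aggregator flow described above can be sketched in a few lines of Python. This is an illustration of the pattern only, not Hermes Agent's implementation: the models are stub functions, and `mixture_of_agents`, `ref_a`, `ref_b`, and `aggregator` are hypothetical names. In a real setup each would be a model API call, and only the aggregator would have tool access.

```python
from typing import Callable

# A "model" here is just a function from prompt text to response text.
Model = Callable[[str], str]

def mixture_of_agents(conversation: str, references: list[Model], aggregator: Model) -> str:
    """References advise privately; the aggregator alone produces the reply."""
    advice = [ref(conversation) for ref in references]
    notes = "\n".join(f"[advisor {i + 1}] {text}" for i, text in enumerate(advice))
    # The aggregator sees the normal conversation plus the private advice.
    prompt = f"{conversation}\n\nPrivate advice:\n{notes}"
    return aggregator(prompt)

# Stub models standing in for real LLM calls (assumptions for illustration).
def ref_a(conversation: str) -> str:
    return "check edge cases"

def ref_b(conversation: str) -> str:
    return "prefer the simpler fix"

def aggregator(prompt: str) -> str:
    return f"FINAL({prompt.count('[advisor')} advisors consulted)"

result = mixture_of_agents("user: fix the bug", [ref_a, ref_b], aggregator)
print(result)  # FINAL(2 advisors consulted)
```

Because the caller only ever receives the aggregator's return value, the whole ensemble looks like a single model from the outside, which matches the "virtual model" framing in the post.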

135

1.4K

101.6K

Peanut Butter Runner@p_b_runner·23h

@BeatsPlanet @ornith_ Shots fired

DJ Naydee@BeatsPlanet·1d

@ornith_ The only bench that matters was left out, deepswe. I wonder why

Ornith@ornith_·2d

Aloha! 🌺 Meet Ornith-1.0, a family of open-source LLMs specialized for agentic coding. Ornith-1.0 spans the full range of parameter sizes, including 9B Dense, 31B Dense, 35B MoE, and 397B MoE. It achieves state-of-the-art performance among open-source models of comparable size on coding benchmarks including:
✅ Terminal-Bench 2.1 (77.5)
✅ SWE-Bench (82.4 on Verified, 62.2 on Pro, 78.9 on Multilingual)
✅ NL2Repo (48.2)
✅ SWE Atlas (41.2 on QnA, 42.6 RF, 39.1 TW)
✅ ClawEval (77.1)
Post-trained on top of gemma4 and qwen3.5, Ornith-1.0 employs a novel self-improving training strategy in which reinforcement learning is used to generate not only solution rollouts, but also the task-specific scaffolds that drive those rollouts. By jointly optimizing the scaffold and the resulting solution, the model generates higher-quality solutions in agentic coding. 😎
All models are released under the MIT license, enabling full commercial and research use.
📖 Tech Blog: deep-reinforce.com/ornith_1_0.html
🤗 Huggingface: huggingface.co/collections/de…

468

977

6.5K

5.1M

Peanut Butter Runner@p_b_runner·23h

@vildavedo @ornith_ No way, that kind of performance on local consumer hardware?? That’s nuts!!! But what tps? And what’s your context window? Does the performance degrade as context fills? And you can only use it to run a single agent at a time, right?

Vilda@vildavedo·2d

@ornith_ I am running Ornith-1.0-35B-GGUF:BF16 on an RX 6900 XT + Threadripper 3970X with 128 GB RAM. It's very impressive. From my testing in Zed editor over Ollama-Vulkan, it's between GPT-5.3 xHigh and 5.4 High. I'm ready to donate to the dev process, no limits, local data, that's it.

Peanut Butter Runner@p_b_runner·2d

@mattpocockuk Less than 24 hours later: x.com/mattpocockuk/s…

Matt Pocock@mattpocockuk

Did some more writing today so improved my writing-* skills:
- writing-fragments: generate fragments of text to be shaped
- writing-beats: generate a first draft by finding a path through the fragments
- writing-shape: final shaping, headings and organization
Still rocks

Matt Pocock@mattpocockuk·3d

2023: Tutorial Hell
2026: Skill Hell

940

58.8K

Peanut Butter Runner@p_b_runner·2d

@vibeonX69 @themishra4402 My M1 works great

kritika@vibeonX69·2d

@themishra4402 I am still on M1

4.6K

Rahul 🥷@themishra4402·2d

If MacBooks are so powerful... why do most developers still use Windows?

146

410

62.3K

Peanut Butter Runner@p_b_runner·3d

@rankintweets I spend a lot of time talking at my computer now. Still need my keyboard though

Adam Rankin@rankintweets·4d

Lead engineer just got rid of his keyboard…???

601

386

13.5K

1.5M

Peanut Butter Runner@p_b_runner·3d

@twtayaan No fucking way. No more Mac Docker performance penalty??? Sign me up

Ayaan 🐧@twtayaan·4d

Apple just made Docker Desktop optional on Mac. And it is completely free.
This is apple/container. 26.5k stars on GitHub.
You can now run Linux containers natively on your Mac without installing Docker Desktop, without a background daemon hogging your RAM, and without paying $21 a month per developer for a commercial license.
Here is what it does:
→ Runs Linux containers as lightweight VMs directly on Apple Silicon using macOS 26 virtualization
→ Fully OCI compatible. Pull any image from Docker Hub, GitHub Container Registry or anywhere else
→ Written in Swift and optimised specifically for Apple Silicon. Faster and lighter than anything Docker Desktop does on Mac
→ Standard container CLI syntax. If you know Docker commands you already know how to use this
→ Push images you build to any standard container registry and run them anywhere
Docker Desktop charges $21 per developer per month for commercial use. Apple's version costs nothing and ships as open source under Apache-2.0.
Microsoft made Docker Desktop optional on Windows with WSL Containers last month. Apple just did the same on Mac.
Docker is not going anywhere. But the era of paying for a GUI wrapper around containers on your own machine is quietly ending.
Repo here: github.com/apple/container

201

10.3K

932.6K

Peanut Butter Runner@p_b_runner·3d

@0xtreysync @thdxr Yep so many times it just does stuff wrong and then tries another method instead of deleting the broken code and replacing it with the correct code

1.1K

treysync@0xtreysync·3d

@thdxr “Fallback” is also a great slop indicator

216

29.9K

dax@thdxr·3d

i found a really good way to measure how much a codebase is suffering from vibe coding
rg -o 'isRecord' . | wc -l
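For anyone without ripgrep installed, the same count can be approximated in Python. This is a rough sketch, not a drop-in replacement: unlike `rg`, it does not honor `.gitignore` or skip hidden files, and it simply skips files that fail to decode as UTF-8. The function name `count_matches` and the demo files are hypothetical.

```python
import re
import tempfile
from pathlib import Path

def count_matches(root: str, pattern: str) -> int:
    """Count every regex match in every readable text file under root."""
    regex = re.compile(pattern)
    total = 0
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        # One count per match, like `rg -o ... | wc -l`.
        total += len(regex.findall(text))
    return total

# Small demo on a throwaway directory (file names are made up).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.ts").write_text("isRecord(x) && isRecord(y)\n", encoding="utf-8")
    (Path(tmp) / "b.ts").write_text("const ok = isRecord(z)\n", encoding="utf-8")
    demo_count = count_matches(tmp, "isRecord")

print(demo_count)  # 3
```

Calling `count_matches(".", "isRecord")` from a repository root gives the figure the post is joking about, though the number will include ignored and vendored files that `rg` would skip.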

2.8K

422.1K

Peanut Butter Runner@p_b_runner·4d

@roenelteck @DaveShapi Yeah, I’m guessing it will be Google or Tesla. Tesla has the infra. Google has the historical business foundation and userbase.

Roenel@roenelteck·4d

@DaveShapi I’m betting on Gemini. I don’t think Google has even released their frontier model yet. They have just given us a preview.

David Shapiro (L/0)@DaveShapi·4d

Okay, Gemini is by far the kindest and most patient and emotionally intelligent model. It beats grok, which is woke by comparison.

284

17.6K

Peanut Butter Runner@p_b_runner·19 Jun

@jietang How does the cost efficiency compare to DeepSeek V4 Pro?

112

jietang@jietang·17 Jun

We're introducing GLM-5.2, our latest flagship model for long-horizon tasks. It marks a substantial leap in long-horizon task capability over its predecessor GLM-5.1 and, for the first time, delivers that capability on a solid 1M-token context.
GLM-5.2's new capabilities include:
- Solid 1M Context: A solid 1M-token context that stably sustains long-horizon work
- Advanced Coding with Flexible Effort: Stronger coding capabilities with multiple thinking effort levels to balance performance and latency
- Improved Architecture: We propose IndexShare, which reuses the same indexer across every four sparse attention layers, reducing per-token FLOPs by 2.9× at a 1M context length. We also improve GLM-5.2’s MTP layer for speculative decoding, increasing the acceptance length by up to 20%
- Pure Open: An MIT open-source license, with no regional limits and technical access without borders
Supporting long-horizon tasks starts with making long context engineering-usable: the model must maintain quality across long, messy coding-agent trajectories, not just accept more tokens. A 1M context is easy to claim, but much harder to keep reliable under real engineering pressure. To this end, we substantially expanded 1M-context training for coding-agent scenarios, covering large-scale implementation, automated research, performance optimization, and complex debugging. The result is a long-context system that is not only wide in scope, but solid in execution: a practical substrate for sustained engineering work.
This capability is reflected in GLM-5.2's performance on three long-horizon coding benchmarks.
FrontierSWE measures whether an agent can complete open-ended technical projects at the scale of hours to tens of hours, spanning systems optimization, large-scale code construction, and applied ML research. On this benchmark, GLM-5.2 trails Opus 4.8 by only 1%, while edging out GPT-5.5 by 1% and Opus 4.7 by 11%.
On PostTrainBench, where each agent is given an H100 GPU and evaluated by how much it can improve small models through post-training, GLM-5.2 outperforms both Opus 4.7 and GPT-5.5, ranking second only to Opus 4.8.
On SWE-Marathon, an ultra-long-horizon software engineering benchmark covering tasks such as building compilers, optimizing kernels, and developing production-grade services, GLM-5.2 still has room to grow, trailing Opus 4.8 by 13% while remaining second only to the Opus series.
Across all three benchmarks, GLM-5.2 is the highest-ranked open-source model, showing that its 1M context has translated into practical long-horizon delivery capability.

181

302

3.7K

360.6K

Peanut Butter Runner@p_b_runner·19 Jun

@jgilbertson47 @nutlope How does Kimi compare to GLM 5.2 regarding cost?

Jason Gilbertson@jgilbertson47·17 Jun

@nutlope Any chance you've tested Kimi 2.7 as well? I'm debating that vs. GLM 5.2 as my backup UI buddy when Claude credits run out.

9.4K

Hassan@nutlope·17 Jun

This model is insane at design. I asked GLM 5.2 (left) and Opus 4.8 (right) to build me a landing page and you can't even tell the difference. GLM cost $0.06 while Opus cost $0.49. More than 6x cheaper while being faster + more token efficient. Another win for open source AI.

Z.ai@Zai_org

Introducing GLM-5.2: Frontier Intelligence, Open Weights
- Significant improvements in coding and agentic tasks
- Strong long-horizon capabilities with a 1M context window
- Two levels of reasoning effort: GLM-5.2 (max) pushes the limits, while GLM-5.2 (high) strikes a strong balance between performance and token efficiency
- MIT-licensed open weights
- Same API pricing as GLM-5.1
Tech Blog: z.ai/blog/glm-5.2
Weights: huggingface.co/zai-org/GLM-5.2
API: docs.z.ai/guides/llm/glm…
Coding Plan: z.ai/subscribe
Chat: chat.z.ai
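As a quick check on the "more than 6x cheaper" claim, the two per-run costs quoted in the post ($0.06 for GLM 5.2 and $0.49 for Opus 4.8) can be compared directly. The figures come from a single landing-page run as reported, so this is arithmetic on one data point, not a pricing benchmark. Variable names are illustrative.

```python
# Costs as quoted in the post for one landing-page generation.
glm_cost = 0.06
opus_cost = 0.49

ratio = opus_cost / glm_cost                 # how many times more Opus cost
savings_pct = (1 - glm_cost / opus_cost) * 100  # percent saved by using GLM

print(f"{ratio:.1f}x")        # 8.2x
print(f"{savings_pct:.0f}%")  # 88%
```

On these numbers the gap is closer to 8x than 6x, so the post's "more than 6x" is a conservative statement of its own figures.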

352

564

8.3K

1.6M

Peanut Butter Runner@p_b_runner·19 Jun

@heyvishal_ @nutlope How does Kimi 2.7 compare on cost?

Vishal@heyvishal_·17 Jun

@nutlope They both look quite similar tbh. I think Kimi 2.7 might be better than these 2

10.9K

Peanut Butter Runner@p_b_runner·17 Jun

@ggerganov I'm thinking about putting together an audio processing pipeline to handle something like a podcast, including speaker diarization. People will be doing multiple voices and impersonations, which I've heard makes things more complicated. Any recommendations?

Peanut Butter Runner@p_b_runner·14 Jun

@notjazii Did your prompt include testing or did it figure that out itself? I doubt the agent could get it right on the first try without testing of some kind.

J A Z I I@notjazii·13 Jun

GLM 5.2 vs Kimi K2.7
both were tested on same settings with same prompt
> GLM one shotted everything in 35 mins
> Kimi needed extra prompts to fix movements and took 30 mins
super surprised by how good GLM 5.2 is, while being so cheap
which one do you think won?

J A Z I I@notjazii

Chinese AI labs are shipping like crazy
> Kimi K2.7 was released yesterday
> GLM 5.2 was released a few hours ago
and both are actually good
open models are catching up way faster than most people expected
already tested K2.7
waiting for GLM to be available in opencode so I can run the same tests
drop a coding task you want me to test on both 👇

460

63.7K

Peanut Butter Runner@p_b_runner·14 Jun

Has anyone tried DeepSeek V4 with Reasonix AND Headroom? I'm wondering if the caching optimizations from Headroom conflict with the ones from Reasonix, and it'd be better to run Reasonix without Headroom.

Peanut Butter Runner@p_b_runner·14 Jun

@teortaxesTex @rk625dev Would it make sense to use Reasonix with github.com/chopratejas/he…?

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex·5 Jun

@rk625dev This is Reasonix

268

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex·4 Jun

Stop, I can only become Chinese so fast

15.4K

Peanut Butter Runner@p_b_runner·14 Jun

@filicroval would it make sense to use it with headroom? github.com/chopratejas/he…

English

filipe@filicroval·25 May

DeepSeek just got its own native coding agent, and it’s really clean it's called Reasonix: a terminal (TUI) + desktop AI coding agent built exclusively around DeepSeek’s prefix cache. result? 94%+ cache hit rates in long sessions → you can literally leave it running for hours and the bill is basically zero. npx reasonix code → sandboxed edits, MCP tools, skills, replay logs, pending-approval workflow. so clean

1.7K

Peanut Butter Runner@p_b_runner·31 May

@thoughtson_tech @moritzthuening @HannesStaerk @tenstorrent What’s the point of all this? Faster lab results? Cheaper drugs? Better medicine?

Thoughts on Healthcare Markets and Tech@thoughtson_tech·31 May

The binding affinity prediction piece is real progress, but the Profluent-Lilly deal points to where the actual value accumulation is happening, which is upstream of prediction entirely. Prediction tells you how well a molecule fits a known pocket. Generation writes molecules that no pocket has ever seen, which is a different operation on a different search space, and that distinction is what a $2.25B platform commitment is actually pricing in. And the hardware angle here connects to something I've been turning over since writing about Profluent's closed-loop pipeline at onhealthcare.tech/p/profluents-2… , which is that inference economics at the wet-lab interface may matter more than inference economics on the model side. Tenstorrent running BoltzGen efficiently is a genuine unlock for prediction throughput, but the bottleneck in a generative design loop is not compute cost per forward pass. It is synthesis and testing cycle time. Profluent raised roughly $150M before closing that deal, and the compounding moat they are building comes from how fast the design-synthesize-test-retrain loop closes, not from how cheaply you can score a binding pose. But cheaper inference does shift the economics of who can run these pipelines at scale, which is its own structural question. If prediction becomes commodity infrastructure through hardware like this, the differentiation moves further toward the generative layer and the proprietary training data that comes from running closed-loop experiments at volume.

289

Moritz Thüning@moritzthuening·30 May

Everyone can now run BoltzGen, the state-of-the-art system for drug design created by the genius @HannesStaerk, on @tenstorrent hardware with incredible performance, and the same accuracy as GPUs. While Boltz-2 can predict how strongly a potential drug binds to a target, BoltzGen generates potential drugs (binders) from scratch given a target. Think: Boltz-2 is the evaluator, BoltzGen is the generator. BoltzGen is integrated into TT-Boltz now and runs on Wormhole and Blackhole at any scale: single card, QuietBox, Galaxy servers, Galaxy clusters, anything. The video below shows BoltzGen designing binders, fully parallelized across the 4 cards of Tenstorrent QuietBox. Tenstorrent is the perfect fit for the @boltz_bio models and infinitely scalable. I want to run those models at unprecedented scales. A big step towards a drug designed on Tenstorrent hardware.

149

9.4K

Peanut Butter Runner@p_b_runner·31 May

@moritzthuening @HannesStaerk @tenstorrent I don’t understand. Custom drugs? Bespoke medicine? For… something very specific? I’m waiting for those healing pods from the Alien movie

123
