Peanut Butter Runner

98 posts

@p_b_runner

I love running and peanut butter

Joined January 2022
21 Following · 0 Followers

Vox @Voxyz_ai
stop telling Claude Code/Codex "read this file". stop telling Claude Code/Codex "now read that one too". stop telling Claude Code/Codex "grep the whole repo".

install codebase-memory. it indexes the Linux kernel, 28M lines, in 𝟯 𝗺𝗶𝗻𝘂𝘁𝗲𝘀. your repo takes seconds.

index once and the whole repo becomes 𝗼𝗻𝗲 𝗴𝗿𝗮𝗽𝗵 of every function, file and dependency. one query replaces dozens of grep and read cycles.

benchmarked across 31 real repos:
→ 10x fewer tokens on structural queries
→ 83% answer quality on complex tasks
→ 2.1x fewer tool calls

two prompts. send them straight to your agent 👇
[Image]
127 replies · 356 reposts · 4K likes · 402.3K views
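
The post does not say how codebase-memory builds its graph. As a rough illustration of the idea (index once, then answer structural questions from the index instead of repeated grep/read cycles), here is a minimal sketch that uses Python's standard `ast` module on a Python-only repo. It is not codebase-memory's indexer; `build_call_graph` and `callers_of` are hypothetical names, and the sketch records only direct calls of the form `name(...)`.

```python
import ast


def build_call_graph(sources: dict[str, str]) -> dict[str, set[str]]:
    """Index a set of Python files once into caller -> callees edges.

    `sources` maps a file path to its source text. Each function becomes a
    node named "path:function"; an edge is recorded for every direct call of
    the form `name(...)`. Method calls (`obj.name(...)`) are skipped to keep
    the sketch short, and calls inside a nested function are also credited
    to the enclosing function.
    """
    graph: dict[str, set[str]] = {}
    for path, text in sources.items():
        tree = ast.parse(text, filename=path)
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                callees = graph.setdefault(f"{path}:{node.name}", set())
                for inner in ast.walk(node):
                    if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                        callees.add(inner.func.id)
    return graph


def callers_of(graph: dict[str, set[str]], name: str) -> list[str]:
    """Answer "who calls `name`?" from the index instead of grepping every file."""
    return sorted(caller for caller, callees in graph.items() if name in callees)


if __name__ == "__main__":
    repo = {
        "a.py": "def helper():\n    return 1\n\ndef main():\n    return helper() + len([])\n",
        "b.py": "from a import helper\n\ndef other():\n    helper()\n",
    }
    graph = build_call_graph(repo)
    print(callers_of(graph, "helper"))  # ['a.py:main', 'b.py:other']
```

Indexing is the one-time cost; each later "who calls X?" question is a scan of an in-memory dictionary rather than a fresh pass over every file.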

tonbi @tonbistudio
Nous Research just dropped MOA (Mixture of Agents) presets inside Hermes Agent. I made a quick video showing how to set it up and create your own MOA. The idea: mix multiple models to get capabilities beyond any single model you can use right now.

How it works: Normally Hermes sends your conversation + tools to one model. With MOA you get several reference models plus one aggregator. The references read the conversation and offer thoughts and suggestions, but they get no tool access and never reply to you directly.

The aggregator is the one that actually acts. It sees the normal conversation plus the private advice from the references, then makes the tool calls and writes the final response. From Hermes's side, the aggregator's output IS the model's response, so you can use /goal or anything else like that.

Cool idea, curious to see how it really performs!
Quoted post: Nous Research @NousResearch

The strongest models are gated and access is granted only to a select few. Hermes Agent now exposes MoA presets as virtual models, giving you capabilities beyond the publicly available frontier: 8% higher than Opus 4.8 and 11% higher than GPT 5.5 on our upcoming benchmark.

50 replies · 135 reposts · 1.4K likes · 101.6K views
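
The reference/aggregator split described in tonbi's post can be sketched in a few lines. This is a minimal sketch under stated assumptions, not Hermes Agent's implementation: models are plain Python callables standing in for LLM endpoints, the advice reaches the aggregator as an appended text block (the post does not describe the actual prompt format Hermes uses), and tool handling is omitted. `MixtureOfAgents` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Callable

# Stand-in type: a "model" is any function from prompt text to reply text.
# A real setup would call LLM endpoints here instead.
Model = Callable[[str], str]


@dataclass
class MixtureOfAgents:
    references: list[Model]  # advisers: read the conversation, no tools, never reply to the user
    aggregator: Model        # the only model that acts and whose output is returned

    def respond(self, conversation: str) -> str:
        # 1. Every reference model reads the same conversation and offers advice.
        advice = [ref(conversation) for ref in self.references]
        # 2. The aggregator sees the normal conversation plus that private advice.
        notes = "\n".join(f"- {a}" for a in advice)
        prompt = f"{conversation}\n\n[Private advice from reference models]\n{notes}"
        # 3. Whatever the aggregator produces is treated as the model's response.
        #    In an agent harness this is also where tool calls would be made.
        return self.aggregator(prompt)


def echo_aggregator(prompt: str) -> str:
    """Toy aggregator that just shows what it was given."""
    return f"[aggregator saw]\n{prompt}"


if __name__ == "__main__":
    moa = MixtureOfAgents(
        references=[lambda c: "check the edge cases first", lambda c: "keep the diff small"],
        aggregator=echo_aggregator,
    )
    print(moa.respond("Fix the failing test in utils.py"))
```

Because callers only ever see the return value of `respond`, the mixture can sit behind the same interface as a single model, which matches the post's point that Hermes treats the aggregator's output as the model's response.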

DJ Naydee @BeatsPlanet
@ornith_ The only bench that matters was left out, deepswe. I wonder why
1 reply · 0 reposts · 1 like · 76 views

Ornith @ornith_
Aloha! 🌺 Meet Ornith-1.0, a family of open-source LLMs specialized for agentic coding. Ornith-1.0 spans a full range of parameter sizes: 9B Dense, 31B Dense, 35B MoE, and 397B MoE. It achieves state-of-the-art performance among open-source models of comparable size on coding benchmarks including:
✅ Terminal-Bench 2.1 (77.5)
✅ SWE-Bench (82.4 on Verified, 62.2 on Pro, 78.9 on Multilingual)
✅ NL2Repo (48.2)
✅ SWE Atlas (41.2 on QnA, 42.6 RF, 39.1 TW)
✅ ClawEval (77.1)
Post-trained on top of gemma4 and qwen3.5, Ornith-1.0 employs a novel self-improving training strategy in which reinforcement learning is used to generate not only solution rollouts but also the task-specific scaffolds that drive those rollouts. By jointly optimizing the scaffold and the resulting solution, the model generates higher-quality solutions in agentic coding. 😎 All models are released under the MIT license, enabling full commercial and research use.
📖 Tech Blog: deep-reinforce.com/ornith_1_0.html
🤗 Huggingface: huggingface.co/collections/de…
[Image]
468 replies · 977 reposts · 6.5K likes · 5.1M views

Peanut Butter Runner @p_b_runner
@vildavedo @ornith_ No way, that kind of performance on local consumer hardware?? That's nuts!!! But what tps? And what's your context window? Does the performance degrade as the context fills? And you can only use it to run a single agent at a time, right?
1 reply · 0 reposts · 1 like · 28 views

Vilda @vildavedo
@ornith_ I am running Ornith-1.0-35B-GGUF:BF16 on an RX 6900 XT + Threadripper 3970X with 128 GB RAM. It's very impressive. From my testing in the Zed editor over Ollama-Vulkan, it's between GPT-5.3 xHigh and 5.4 High. I'm ready to donate to the development effort. No limits, local data, that's it.
4 replies · 1 repost · 51 likes · 7K views
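
For the "what tps?" question: when Ollama's `/api/generate` endpoint is called with `"stream": false`, the final JSON includes `eval_count` (generated tokens) and `eval_duration` (nanoseconds), so decode speed is `eval_count / eval_duration * 1e9`. The sketch below uses only the standard library and assumes a local Ollama server on its default port. The model tag `ornith-1.0-35b` used in the usage note is hypothetical, so substitute the name `ollama list` shows. Running it with progressively longer prompts is one rough way to see whether speed drops as the context fills.

```python
import json
import urllib.request

# Ollama's default local endpoint; change it if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(response: dict) -> float:
    """Generation speed from a non-streamed /api/generate response.

    eval_count is the number of generated tokens and eval_duration is the
    time spent generating them, in nanoseconds.
    """
    return response["eval_count"] * 1e9 / response["eval_duration"]


def run_once(model: str, prompt: str, num_ctx: int = 8192) -> dict:
    """Send one prompt and return Ollama's final JSON (timing fields included)."""
    body = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,                  # one JSON object instead of a stream
        "options": {"num_ctx": num_ctx},  # context window to allocate
    }).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


def benchmark(model: str, sizes=(1, 50, 200), num_ctx: int = 65536) -> list[tuple[int, float]]:
    """(prompt length in characters, tokens/sec) for progressively longer prompts."""
    results = []
    for size in sizes:
        prompt = "Summarize this text. " + "lorem ipsum " * (100 * size)
        reply = run_once(model, prompt, num_ctx=num_ctx)
        results.append((len(prompt), tokens_per_second(reply)))
    return results
```

For example, `benchmark("ornith-1.0-35b")` returns one speed per prompt length. `num_ctx` sets how large a context window Ollama allocates for the request; larger values use more memory.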

Matt Pocock @mattpocockuk
2023: Tutorial Hell
2026: Skill Hell
57 replies · 58 reposts · 940 likes · 58.8K views

Rahul 🥷 @themishra4402
If MacBooks are so powerful... why do most developers still use Windows?
[2 images]
146 replies · 9 reposts · 410 likes · 62.3K views

Adam Rankin @rankintweets
Lead engineer just got rid of his keyboard…???
[Image]
601 replies · 386 reposts · 13.5K likes · 1.5M views

Ayaan 🐧 @twtayaan
Apple just made Docker Desktop optional on Mac. And it is completely free.

This is apple/container. 26.5k stars on GitHub.

You can now run Linux containers natively on your Mac without installing Docker Desktop, without a background daemon hogging your RAM, and without paying $21 a month per developer for a commercial license.

Here is what it does:
→ Runs Linux containers as lightweight VMs directly on Apple Silicon using macOS 26 virtualization
→ Fully OCI compatible. Pull any image from Docker Hub, GitHub Container Registry or anywhere else
→ Written in Swift and optimised specifically for Apple Silicon. Faster and lighter than anything Docker Desktop does on Mac
→ Standard container CLI syntax. If you know Docker commands you already know how to use this
→ Push images you build to any standard container registry and run them anywhere

Docker Desktop charges $21 per developer per month for commercial use. Apple's version costs nothing and ships as open source under Apache-2.0.

Microsoft made Docker Desktop optional on Windows with WSL Containers last month. Apple just did the same on Mac.

Docker is not going anywhere. But the era of paying for a GUI wrapper around containers on your own machine is quietly ending.

Repo here: github.com/apple/container
[Image]
201 replies · 1K reposts · 10.3K likes · 932.6K views

Peanut Butter Runner @p_b_runner
@0xtreysync @thdxr Yep, so many times it just does stuff wrong and then tries another method instead of deleting the broken code and replacing it with the correct code.
0 replies · 0 reposts · 9 likes · 1.1K views

treysync @0xtreysync
@thdxr “Fallback” is also a great slop indicator
7 replies · 0 reposts · 216 likes · 29.9K views

dax @thdxr
i found a really good way to measure how much a codebase is suffering from vibe coding

rg -o 'isRecord' . | wc -l
91 replies · 46 reposts · 2.8K likes · 422.1K views
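
dax's one-liner counts every `isRecord` match in the tree: `rg -o` prints one match per line, so `wc -l` counts matches rather than matching lines. A rough Python equivalent, which also tallies "fallback" as @0xtreysync suggests, might look like this. It is an approximation: it skips hidden files, hidden directories and non-UTF-8 files, but it does not read `.gitignore` the way ripgrep does, and matching is case-sensitive like `rg` without `-i`. `count_matches` is a hypothetical helper name.

```python
import os


def count_matches(root: str, patterns: list[str]) -> dict[str, int]:
    """Count every literal occurrence of each pattern in text files under `root`.

    Roughly `rg -o PATTERN . | wc -l` for a literal pattern. Unlike ripgrep,
    this does not read .gitignore; it only skips hidden files and directories
    (such as .git) and files that cannot be decoded as UTF-8.
    """
    counts = {p: 0 for p in patterns}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            if name.startswith("."):
                continue
            try:
                with open(os.path.join(dirpath, name), encoding="utf-8") as f:
                    text = f.read()
            except (UnicodeDecodeError, OSError):
                continue  # binary or unreadable file
            for p in patterns:
                counts[p] += text.count(p)  # non-overlapping matches, like rg -o
    return counts
```

For example, `count_matches(".", ["isRecord", "fallback"])` returns a dictionary with one total per pattern.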

Roenel @roenelteck
@DaveShapi I’m betting on Gemini. I don’t think Google has even released their frontier model yet. They have just given us a preview.
1 reply · 0 reposts · 1 like · 24 views

David Shapiro (L/0) @DaveShapi
Okay, Gemini is by far the kindest and most patient and emotionally intelligent model. It beats Grok, which is woke by comparison.
66 replies · 13 reposts · 284 likes · 17.6K views

jietang @jietang
We're introducing GLM-5.2, our latest flagship model for long-horizon tasks. It marks a substantial leap in long-horizon task capability over its predecessor, GLM-5.1, and, for the first time, delivers that capability on a solid 1M-token context.

GLM-5.2's new capabilities include:
Solid 1M Context: A solid 1M-token context that stably sustains long-horizon work.
Advanced Coding with Flexible Effort: Stronger coding capabilities with multiple thinking-effort levels to balance performance and latency.
Improved Architecture: We propose IndexShare, which reuses the same indexer across every four sparse attention layers, reducing per-token FLOPs by 2.9× at a 1M context length. We also improve GLM-5.2’s MTP layer for speculative decoding, increasing the acceptance length by up to 20%.
Pure Open: An MIT open-source license, with no regional limits and technical access without borders.

Supporting long-horizon tasks starts with making long context engineering-usable: the model must maintain quality across long, messy coding-agent trajectories, not just accept more tokens. A 1M context is easy to claim, but much harder to keep reliable under real engineering pressure. To this end, we substantially expanded 1M-context training for coding-agent scenarios, covering large-scale implementation, automated research, performance optimization, and complex debugging. The result is a long-context system that is not only wide in scope but solid in execution: a practical substrate for sustained engineering work.

This capability is reflected in GLM-5.2's performance on three long-horizon coding benchmarks.

FrontierSWE measures whether an agent can complete open-ended technical projects at the scale of hours to tens of hours, spanning systems optimization, large-scale code construction, and applied ML research. On this benchmark, GLM-5.2 trails Opus 4.8 by only 1%, while edging out GPT-5.5 by 1% and Opus 4.7 by 11%.

On PostTrainBench, where each agent is given an H100 GPU and evaluated by how much it can improve small models through post-training, GLM-5.2 outperforms both Opus 4.7 and GPT-5.5, ranking second only to Opus 4.8.

On SWE-Marathon, an ultra-long-horizon software engineering benchmark covering tasks such as building compilers, optimizing kernels, and developing production-grade services, GLM-5.2 still has room to grow, trailing Opus 4.8 by 13% while remaining second only to the Opus series.

Across all three benchmarks, GLM-5.2 is the highest-ranked open-source model, showing that its 1M context has translated into practical long-horizon delivery capability.
[2 images]
181 replies · 302 reposts · 3.7K likes · 360.6K views
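
jietang's post names two architecture changes without showing code. The toy sketch below illustrates the ideas in pure Python under stated assumptions; it is not Z.ai's implementation, and every name in it is hypothetical. It assumes each sparse attention layer attends only to the top-k past tokens chosen by a lightweight indexer, so sharing one indexer across each group of four layers means the selection is computed once per group. It counts indexer runs only and does not reproduce the reported 2.9× FLOP reduction, which is Z.ai's figure for the whole model at 1M context. The last function shows what "acceptance length" measures in greedy speculative decoding: how many drafted tokens match the main model's own choices before the first mismatch (conventions differ, and some reports also count the extra token the main model emits at each step).

```python
def top_k(scores: list[float], k: int) -> list[int]:
    """Positions of the k highest indexer scores: the past tokens a query attends to."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def select_per_layer(indexer, num_layers: int, k: int, share_every: int = 4):
    """Pick the attended token positions for each sparse attention layer.

    `indexer(layer)` returns one score per past token. With share_every=1 every
    layer runs its own indexer; with share_every=4 only the first layer of each
    group of four runs it and the next three reuse that selection.
    Returns (selections, number_of_indexer_runs).
    """
    selections, runs, shared = [], 0, []
    for layer in range(num_layers):
        if layer % share_every == 0:
            shared = top_k(indexer(layer), k)
            runs += 1
        selections.append(shared)
    return selections, runs


def accepted_length(draft: list[int], verified: list[int]) -> int:
    """Greedy speculative decoding: how many drafted tokens survive verification.

    The main model checks the draft (here assumed to come from an MTP head) and
    keeps tokens up to the first position where its own greedy choice differs.
    """
    n = 0
    for d, v in zip(draft, verified):
        if d != v:
            break
        n += 1
    return n


def toy_indexer(layer: int) -> list[float]:
    """Hypothetical indexer that scores four past tokens the same way at every layer."""
    return [0.1, 0.9, 0.5, 0.7]


if __name__ == "__main__":
    print(select_per_layer(toy_indexer, num_layers=8, k=2)[1])                 # 2 indexer runs
    print(select_per_layer(toy_indexer, num_layers=8, k=2, share_every=1)[1])  # 8 indexer runs
    print(accepted_length([5, 7, 9, 2], [5, 7, 8, 2]))                         # 2
```

A longer average accepted length means more tokens per expensive forward pass of the main model, which is why an improvement in acceptance length translates into faster decoding.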

Jason Gilbertson @jgilbertson47
@nutlope Any chance you've tested Kimi 2.7 as well? I'm debating that vs. GLM 5.2 as my backup UI buddy when Claude credits run out.
2 replies · 0 reposts · 4 likes · 9.4K views

Hassan @nutlope
This model is insane at design. I asked GLM 5.2 (left) and Opus 4.8 (right) to build me a landing page and you can't even tell the difference. GLM cost $0.06 while opus cost $0.49. More than 6x cheaper while being faster + more token efficient. Another win for open source AI.
Quoted post: Z.ai @Zai_org

Introducing GLM-5.2: Frontier Intelligence, Open Weights
- Significant improvements in coding and agentic tasks
- Strong long-horizon capabilities with a 1M context window
- Two levels of reasoning effort: GLM-5.2 (max) pushes the limits, while GLM-5.2 (high) strikes a strong balance between performance and token efficiency
- MIT-licensed open weights
- Same API pricing as GLM-5.1
Tech Blog: z.ai/blog/glm-5.2
Weights: huggingface.co/zai-org/GLM-5.2
API: docs.z.ai/guides/llm/glm…
Coding Plan: z.ai/subscribe
Chat: chat.z.ai

352 replies · 564 reposts · 8.3K likes · 1.6M views

Vishal @heyvishal_
@nutlope They both look quite similar tbh. I think kimi 2.7 might be better than these 2
3 replies · 0 reposts · 17 likes · 10.9K views

Peanut Butter Runner @p_b_runner
@ggerganov I'm thinking about putting together an audio processing pipeline to handle something like a podcast, along with diarization. People will be doing multiple voices and impersonations, which I've heard makes things more complicated. Any recommendations?
0 replies · 0 reposts · 0 likes · 4 views
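
One common way to structure a podcast pipeline like this is to run speech-to-text and diarization as separate passes and then merge them by timestamp. The sketch below shows only that merge step, assuming both tools emit start and end times in seconds; the `Segment` type and the `SPEAKER_00`-style labels are hypothetical. It does not solve the multiple-voices problem: if the diarization pass labels an impersonation as a different speaker, the merged transcript inherits that label.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float      # seconds
    end: float        # seconds
    text: str = ""
    speaker: str = ""


def overlap(a: Segment, b: Segment) -> float:
    """Seconds of time shared by two segments (0 if they do not overlap)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def assign_speakers(transcript: list[Segment], turns: list[Segment]) -> list[Segment]:
    """Label each transcribed segment with the speaker turn it overlaps most."""
    labelled = []
    for seg in transcript:
        best = max(turns, key=lambda turn: overlap(seg, turn), default=None)
        if best is not None and overlap(seg, best) > 0:
            speaker = best.speaker
        else:
            speaker = "UNKNOWN"  # no diarization turn covers this stretch of audio
        labelled.append(Segment(seg.start, seg.end, seg.text, speaker))
    return labelled


if __name__ == "__main__":
    transcript = [Segment(0.0, 2.0, "Welcome back to the show."),
                  Segment(2.0, 5.0, "Thanks for having me.")]
    turns = [Segment(0.0, 2.3, speaker="SPEAKER_00"),
             Segment(2.3, 6.0, speaker="SPEAKER_01")]
    for s in assign_speakers(transcript, turns):
        print(f"[{s.start:5.1f}-{s.end:5.1f}] {s.speaker}: {s.text}")
```

Majority overlap is a simple rule; segments that straddle a speaker change get the speaker who talks longest within them, so shorter transcription segments make the merge more precise.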

Peanut Butter Runner @p_b_runner
@notjazii Did your prompt include testing or did it figure that out itself? I doubt the agent could get it right on the first try without testing of some kind.
0 replies · 0 reposts · 0 likes · 39 views

J A Z I I @notjazii
GLM 5.2 vs Kimi K2.7, both tested with the same settings and the same prompt
> GLM one-shotted everything in 35 mins
> Kimi needed extra prompts to fix movements and took 30 mins
super surprised by how good GLM 5.2 is, while being so cheap
which one do you think won?
Quoted post: J A Z I I @notjazii

Chinese AI labs are shipping like crazy
> Kimi K2.7 was released yesterday
> GLM 5.2 was released a few hours ago
and both are actually good
open models are catching up way faster than most people expected
already tested K2.7, waiting for GLM to be available in opencode so I can run the same tests
drop a coding task you want me to test on both 👇

54 replies · 25 reposts · 460 likes · 63.7K views

Peanut Butter Runner @p_b_runner
Has anyone tried DeepSeek V4 with Reasonix AND Headroom? I'm wondering whether the caching optimizations from Headroom conflict with the ones from Reasonix, and whether it'd be better to run Reasonix without Headroom.
0 replies · 0 reposts · 0 likes · 20 views

filipe @filicroval
DeepSeek just got its own native coding agent, and it’s really clean.

it's called Reasonix: a terminal (TUI) + desktop AI coding agent built exclusively around DeepSeek’s prefix cache.

result? 94%+ cache hit rates in long sessions → you can literally leave it running for hours and the bill is basically zero.

npx reasonix code
→ sandboxed edits, MCP tools, skills, replay logs, pending-approval workflow. so clean
3 replies · 4 reposts · 11 likes · 1.7K views
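
Why long sessions can reach very high prefix-cache hit rates is easier to see with a toy model. This sketch assumes a request's input tokens count as cache hits only up to the point where they stop matching the previous request. Real prefix caches typically keep many earlier prefixes and match in fixed-size chunks, so treat this as an illustration rather than a description of how DeepSeek or Reasonix behave. `session_cache_hit_rate` and `common_prefix_len` are hypothetical names, and tokens are plain strings.

```python
def common_prefix_len(a: list[str], b: list[str]) -> int:
    """Number of leading tokens two requests share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def session_cache_hit_rate(requests: list[list[str]]) -> float:
    """Fraction of input tokens that could be served from a prefix cache.

    Simplified model: a request's tokens are hits while they match the start
    of the previous request; everything after the first mismatch is a miss.
    """
    hits = total = 0
    previous: list[str] = []
    for tokens in requests:
        hits += common_prefix_len(previous, tokens)
        total += len(tokens)
        previous = tokens
    return hits / total if total else 0.0


if __name__ == "__main__":
    system = ["sys"] * 2000  # long, unchanging system prompt and tool schemas
    history: list[str] = []
    append_only = []
    for turn in range(20):
        history += [f"user{turn}", f"reply{turn}"]
        append_only.append(system + history)  # each request only appends to the last
    # Same session, but something new (e.g. a timestamp) is injected at the top each time.
    rewritten = [[f"time{i}"] + req for i, req in enumerate(append_only)]
    print(round(session_cache_hit_rate(append_only), 3))
    print(session_cache_hit_rate(rewritten))  # 0.0
```

In the append-only session the shared prefix keeps growing while each turn adds only a few new tokens, so the hit rate climbs toward 1; changing anything at the top of every request resets the shared prefix and the rate drops to 0.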

Thoughts on Healthcare Markets and Tech
The binding affinity prediction piece is real progress, but the Profluent-Lilly deal points to where the actual value accumulation is happening, which is upstream of prediction entirely. Prediction tells you how well a molecule fits a known pocket. Generation writes molecules that no pocket has ever seen, which is a different operation on a different search space, and that distinction is what a $2.25B platform commitment is actually pricing in.

And the hardware angle here connects to something I've been turning over since writing about Profluent's closed-loop pipeline at onhealthcare.tech/p/profluents-2…, which is that inference economics at the wet-lab interface may matter more than inference economics on the model side. Tenstorrent running BoltzGen efficiently is a genuine unlock for prediction throughput, but the bottleneck in a generative design loop is not compute cost per forward pass. It is synthesis and testing cycle time.

Profluent raised roughly $150M before closing that deal, and the compounding moat they are building comes from how fast the design-synthesize-test-retrain loop closes, not from how cheaply you can score a binding pose.

But cheaper inference does shift the economics of who can run these pipelines at scale, which is its own structural question. If prediction becomes commodity infrastructure through hardware like this, the differentiation moves further toward the generative layer and the proprietary training data that comes from running closed-loop experiments at volume.
1 reply · 0 reposts · 0 likes · 289 views

Moritz Thüning @moritzthuening
Everyone can now run BoltzGen, the state-of-the-art system for drug design created by the genius @HannesStaerk, on @tenstorrent hardware with incredible performance and the same accuracy as GPUs.

While Boltz-2 can predict how strongly a potential drug binds to a target, BoltzGen generates potential drugs (binders) from scratch given a target. Think: Boltz-2 is the evaluator, BoltzGen is the generator.

BoltzGen is integrated into TT-Boltz now and runs on Wormhole and Blackhole at any scale: single card, QuietBox, Galaxy servers, Galaxy clusters, anything. The video below shows BoltzGen designing binders, fully parallelized across the 4 cards of the Tenstorrent QuietBox.

Tenstorrent is the perfect fit for the @boltz_bio models and infinitely scalable. I want to run those models at unprecedented scales. A big step towards a drug designed on Tenstorrent hardware.
5 replies · 24 reposts · 149 likes · 9.4K views
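
The evaluator/generator split in Moritz's post can be sketched as a generate-then-score loop. This is a hypothetical outline, not the TT-Boltz, BoltzGen or Boltz-2 API: `generate` and `score` are stand-in callables (in practice they would wrap a generative model such as BoltzGen and a predictor such as Boltz-2), higher scores are assumed to be better, and threads stand in for the four QuietBox cards.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, TypeVar

T = TypeVar("T")


def design_round(
    generate: Callable[[int], list[T]],
    score: Callable[[T], float],
    n_candidates: int,
    keep: int,
    n_devices: int = 4,
) -> list[tuple[float, T]]:
    """One generate-then-evaluate round.

    `generate` proposes candidate binders (the generator's role) and `score`
    rates one candidate (the evaluator's role). Candidates are split into one
    shard per device and scored concurrently; threads stand in for devices.
    """
    candidates = generate(n_candidates)
    shards = [candidates[i::n_devices] for i in range(n_devices)]
    with ThreadPoolExecutor(max_workers=n_devices) as pool:
        scored_shards = list(pool.map(lambda shard: [(score(c), c) for c in shard], shards))
    scored = [pair for shard in scored_shards for pair in shard]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # best score first
    return scored[:keep]
```

For example, with a toy generator that returns `["binder0", ..., "binder9"]` and a scorer that reads the trailing number, `design_round(gen, sc, 10, 3)` keeps the three highest-scoring names. Sorting on the score alone means candidates never need to be comparable to each other.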