Shawn Thuris

2.8K posts


@Thuris

IT and web consultant @thurisandco. Podcast: https://t.co/iy8vGv0zTn. Data analytics MBA. Sometime recitalist and opera tenor. ~hodrun-solmud on the urbs

East Bay · Joined July 2008
185 Following · 288 Followers
rahul
rahul@0interestrates·
it's easy to approve zuck's diff, but do you have the courage to request changes on zuck's diff?
46
62
3K
546.5K
Shawn Thuris
Shawn Thuris@Thuris·
Felt that earthquake very clearly in Hayward, slow shaking for about 5 seconds
1
0
1
805
Shawn Thuris
Shawn Thuris@Thuris·
Today's the 20th anniversary of OMG PONIES. Twen-tieth...
0
0
0
26
Shawn Thuris
Shawn Thuris@Thuris·
@pierceboggan @JoeMayo I do rounds of clarifying questions then set GPT 5.4 xhigh or Opus 4.6 high + fast loose on it with Autopilot, then go back and clean up as needed.
0
0
3
98
Joe Mayo
Joe Mayo@JoeMayo·
OH: Copilot is a copilot, not an autopilot
2
1
9
1.6K
sucks
sucks@powerbottomdad1·
been on reta 4 days: they are going to sell 10 trillion dollars of this thing
101
32
2.7K
1.3M
Thetic
Thetic@TheticThrone·
@wokal_distance That’s not a girl. It's a dude face-swapping. Still a cool person visiting places on a motorcycle
3
0
141
91.5K
Shawn Thuris
Shawn Thuris@Thuris·
@BHolmesDev This reflects my experience pretty much exactly. I hate talking to GPT 5.4, but I love watching it grind through something until it actually works. And I like talking to Opus 4.6, and I dislike having to follow it around and make sure it did everything and did it right.
0
0
0
160
Ben Holmes
Ben Holmes@BHolmesDev·
I’ve used Opus 4.6 and GPT 5.4 on a mix of projects since release, and want to break down where I think they uniquely excel. It’s more nuanced than you’d think!

Rigor of code - GPT 5.4. It goes the distance validating its work without asking. Opus needs explicit instruction to do this, and even then it misses more edge cases.

Clarity of code - Opus 4.6. Claude is a better communicator, which carries into the code. Variable names are clearer and less mechanical, which improves reviewability. This is very important, since code review is the bottleneck for most engineering teams. It also adds the right amount of doc comments. GPT simply never comments or explains its work; it’s like working with an obtuse engineer who wants the solution to speak for itself. Sometimes it does, other times not.

Similarly, rigor of plans goes to GPT 5.4, while clarity of plans goes to Opus 4.6. An interesting point, though: GPT performs better talking through a strategy without a plan, while Opus needs planning mode to put in any rigor. I find myself forgetting plan mode altogether using GPT 5.4.

Quality of research - toss-up. Opus spends longer researching with web search, but GPT spends longer studying the existing codebase. You may think codebase research matters more, but researching how others solve the same problem can be just as important. Maybe more important for greenfield.

Quality of conversation - Opus 4.6. It’s just better to talk to, which matters when using these things every day. GPT 5.4 was clearly trained to challenge the user more, which results in a tendency to *always* say you are wrong. I’ve had bizarre interactions where GPT claims something is “not quite right,” then restates exactly what we decided on in the last turn. On a personal level, it’s annoying. On a practical level, it makes iteration on a plan slower. THAT SAID, it takes sufficient pushing for Opus to challenge your thinking in this way. Simply say “I’m impartial” and ask questions to avoid that, as you would with a person.

Overall winner - Opus to make it work, GPT to make it good. I don’t have a good system for when to switch tools, but on average I prefer Opus early on and GPT for optimization and discussing architectural decisions. Opus is also better for any design-related tasks (but state management in frontend apps is better handled by GPT).
140
92
1.5K
201.8K
Shawn Thuris
Shawn Thuris@Thuris·
I work as a grocery store night manager because solo dev/IT work by itself was too unpredictable. On my lunch I get out my laptop. When I get home at 1am I'm up until 3 or 4 doing agentic coding. I earned an MBA a while ago on the foolish assumption it would get me just a toe in the door somewhere. I earn enough to survive, but I could be adding a lot more value to the world than I am.
0
0
0
28
Kiri
Kiri@Kyrannio·
The hiring process of old seems hilariously broken. I have so many incredible and talented friends looking for work, some who are even working corporate jobs currently and seeking to go even more all in on AI. If you're seeking to hire someone or else job searching, maybe comment below, or if we can all brainstorm some ideas for improvement, that would be great. For those outside of our X bubble especially it seems very rough when it comes to the basic application and interview process as a whole.
22
7
52
2.1K
Shawn Thuris
Shawn Thuris@Thuris·
@liuqian16 Same thing happening to me right now in Copilot CLI and in VS Code
1
0
0
28
小安
小安@liuqian16·
The sky is falling!!! Copilot is on strike!!!
1
0
0
35
Shawn Thuris
Shawn Thuris@Thuris·
I hit Copilot rate limiting tonight for the first time. I'd been using Opus 4.6 high in VS Code, probably 8 or 9 turns during an hour, nothing that big. Wouldn't even let me switch to something lighter. I used Gemini 3.1 Pro direct from Google to help me finish up. If I'd been in Copilot CLI trying to troubleshoot a server and this happened though...?
0
0
0
67
Shawn Thuris reposted
DEJAN
DEJAN@dejanseo·
Implemented Google's TurboQuant paper on Gemma 3 4B with a custom Triton kernel for fused quantized attention. It's real.

Results on RTX 4090: 2-bit FUSED is character-for-character identical to the fp16 baseline. On every prompt. At 16x theoretical compression.

The Triton kernel reads uint8 key indices directly and never materializes fp16 keys. Pre-rotate the query once (R is orthogonal, so ⟨q, Rᵀ·centroids[idx]⟩ = ⟨R·q, centroids[idx]⟩); per-position work is then just a table lookup + dot.

Speed (avg tok/s across 3 prompts):
→ fp16 baseline: 17.7
→ 4-bit fused: 16.5 (-7%)
→ 2-bit fused: 17.7 (0%, matches baseline)

VRAM (KV cache delta):
→ fp16: 26 MB
→ 4-bit fused: 4 MB
→ 2-bit fused: 7 MB

The paper's theoretical guarantees hold up completely in practice. Zero accuracy loss, zero speed loss, a fraction of the memory.

Paper: arxiv.org/abs/2504.19874
Google Research@GoogleResearch

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI

31
82
1.1K
130.9K
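The rotation identity the thread relies on can be checked numerically. A minimal NumPy sketch, with illustrative sizes and names (nothing here is taken from the actual Triton kernel): because R is orthogonal, scoring the codebook against a pre-rotated query gives the same result as materializing the de-rotated keys first.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_centroids, seq_len = 64, 256, 128   # illustrative sizes

# Random orthogonal rotation R (the Q factor of a QR decomposition).
R, _ = np.linalg.qr(rng.standard_normal((d, d)))

centroids = rng.standard_normal((n_centroids, d))               # quantizer codebook
key_idx = rng.integers(0, n_centroids, size=seq_len).astype(np.uint8)
q = rng.standard_normal(d)

# Naive path: materialize de-rotated keys k_i = Rᵀ·centroids[idx_i], then dot with q.
keys = centroids[key_idx] @ R            # row i is (Rᵀ·c_i)ᵀ = c_iᵀ·R
scores_naive = keys @ q

# Fused path: rotate the query once, then each position is a lookup + dot.
q_rot = R @ q
scores_fused = centroids[key_idx] @ q_rot

assert np.allclose(scores_naive, scores_fused)
```

The fused path never builds the `keys` matrix, which is the point: only uint8 indices and the small codebook live in memory.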
Shawn Thuris reposted
Mitko Vasilev
Mitko Vasilev@iotcoi·
I just implemented Google’s TurboQuant for vLLM. My USB-charger-sized HP ZGX now fits 4,083,072 KV-cache tokens on GB10. This may be the biggest open inference breakthrough of 2026 so far. Training is the flex. Inference is the forever bill.
Mitko Vasilev tweet media
69
237
3K
206.9K
Shawn Thuris reposted
Wes Bos
Wes Bos@wesbos·
if a CEO of a company is posting an absolute statement about ai and the future, they are ramping up to launch a feature that does exactly that next week
26
17
367
25.6K
Shawn Thuris reposted
Google Research
Google Research@GoogleResearch·
Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: goo.gle/4bsq2qI
1K
5.8K
39K
19.2M
Shawn Thuris
Shawn Thuris@Thuris·
@witcheer I've got my Copilot subscription connected and using it for all code-related stuff (any personal stuff I still do through OpenRouter).
0
0
1
142
witcheer ☯︎
witcheer ☯︎@witcheer·
Hermes agent v0.4.0. I run this thing 24/7. Here's what just changed under my feet.

/1/ You can now expose Hermes as an OpenAI-compatible API endpoint: /v1/chat/completions. Your agent becomes a model. Anything that can call an OpenAI API can now talk to your Hermes instance like it's a hosted LLM, except it has tools, memory, skills, and cron jobs behind it. There's also a /api/jobs REST endpoint for managing cron jobs programmatically. I have 15 crons; being able to create and modify them through an API instead of through chat changes my automation surface completely.

/2/ Six new messaging adapters in one release: Signal, DingTalk, SMS via Twilio, Mattermost, Matrix, and a generic webhook adapter. That's on top of the Telegram, Discord, Slack, and WhatsApp adapters that already existed. Ten platforms total now.

/3/ @file and @url context injection with tab completion. Type @ and start typing a filename, tab-complete it, and the file's contents get injected into your message. Same for URLs. Claude Code has this; now Hermes does too.

/4/ Context compression got rebuilt from scratch: structured summaries with iterative updates instead of the "summarise everything and throw it away" approach from before. There's token-budget tail protection so the most recent turns survive compression.

/5/ Four new providers: GitHub Copilot (full OAuth), Alibaba Cloud / DashScope, Kilo Code, and OpenCode Zen/Go. I'm on Z.AI/GLM-5, so this doesn't change my setup directly, but Copilot at 400k context is interesting for anyone with a GitHub subscription looking for a cost-effective agent brain.

/6/ /queue lets you stack prompts while the agent is still working. Instead of waiting for it to finish, you type your next instruction and it gets queued. In my workflow I'll read a cron output and want to follow up on three things; I used to have to wait between each one.

Still feels early. Still finding edges. But the foundation is getting solid fast.
Teknium (e/λ)@Teknium

Hermes Agent v0.4.0 — 300 merged PRs this week. Biggest release we've done. Background self-improvement, OpenAI Responses API endpoint for your agent, new messaging platforms, new providers, MCP server management, and a lot more.

13
9
181
13.9K
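An OpenAI-compatible /v1/chat/completions endpoint means any OpenAI-style client can target a local Hermes instance. A minimal stdlib-only sketch; the base URL, port, and model name are assumptions for illustration, not documented Hermes defaults.

```python
import json
import urllib.request

def build_chat_request(prompt, base_url="http://localhost:8080"):
    """Build the URL and JSON body for an OpenAI-style chat completion call."""
    url = f"{base_url}/v1/chat/completions"
    body = json.dumps({
        "model": "hermes",  # illustrative model name
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, body

def chat(prompt, base_url="http://localhost:8080"):
    """Send the request and return the assistant's reply text."""
    url, body = build_chat_request(prompt, base_url)
    req = urllib.request.Request(
        url,
        data=body.encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Pointing an existing OpenAI SDK at the same base URL would work the same way, which is what makes the agent usable as a drop-in "model."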
Shawn Thuris
Shawn Thuris@Thuris·
Sonnet has better things to do than review a stupid bash script (5.4 dutifully came in and waded through it)
0
0
0
19
Shawn Thuris
Shawn Thuris@Thuris·
@Teknium @danielrmay In Telegram this would be annoying... Could the same thing be accomplished by limiting /model to sessions with no history, ie after a /new?
1
0
1
57
Teknium (e/λ)
Teknium (e/λ)@Teknium·
Hermes was built originally around OpenRouter and originally only accepted OpenRouter. Seems like a vestigial bug, but it will be addressed ASAP. /model mid-convo has historically been buggy and may be removed so people use the proper `hermes model` command if we can't get this thing right
3
0
29
1.6K
Daniel May
Daniel May@danielrmay·
i was excited to use hermes until i ran into an unfortunate bug where it silently ships data to openrouter instead of your chosen local model github.com/NousResearch/h… ??? watching very closely to see how quickly this critical issue is resolved
3
0
11
1.6K
Kiri
Kiri@Kyrannio·
Why do I find this so genuinely hilarious
4
0
9
588