mattwallace

7.4K posts

@mattwallace

techie, dad, author, inventor, perpetually curious; CTO & Builder all-in on #AI https://t.co/0MDUTxXXzb

The Arena · Joined January 2008
251 Following · 645 Followers
Zixuan Li
Zixuan Li@ZixuanLi_·
Don't panic. GLM-5.1 will be open source.
260
413
7.5K
813.1K
mattwallace
mattwallace@mattwallace·
@suno guys time for a “sync playlist to device”, I’m about to take off and I’m annoyed and you are wasting a crapton of money on bandwidth. ❤️
0
0
0
17
swyx
swyx@swyx·
btw emerging consensus is that identity-based authz for AI is the most important solution for security, especially if you want to break the binary choice between HITL-everything and --dangerously-skip-permissions. Keycard is the leading voice in this and now supports all coding agents
Keycard@KeycardLabs

Your coding agents inherit your credentials and your permissions. No identity system in the stack can tell the difference between you and the agent acting in your name. Today: Keycard for Coding Agents 🧵

41
13
207
37.8K
mattwallace
mattwallace@mattwallace·
@UnslothAI any chance you guys are going to be redoing Qwen3.5 ggufs with mtp.* layers soon? 🎁🎅🙏 Not to be demanding, your work is so appreciated!
0
0
1
25
mattwallace
mattwallace@mattwallace·
@OpenAI Compliments to whoever did the copy-paste MIME typing from Atlas chats. The paste of response -> Slack is so, so good. The only time I've ever seen anything like this was @btaylor and team at Quip, who were exceptional at multi-MIME-type copy/pasting.
0
0
0
16
mattwallace
mattwallace@mattwallace·
Hear, hear!
Awni Hannun@awnihannun

I remember when Qwen 1.0 came out (fall 2023, not that long ago!) and we added support to mlx-lm. And they didn't stop releasing models, every one pushing the frontier of open-weights. @JustinLin610 always reached out to make sure the new models were well supported in MLX. I don't know how many research papers were written thanks to Qwen, hundreds, maybe thousands. I don't know how many products or startups are being built thanks to Qwen. Probably a lot. Thanks @JustinLin610, @huybery and the rest of the Qwen team for your contributions to AI.

0
0
2
139
Awni Hannun
Awni Hannun@awnihannun·
I remember when Qwen 1.0 came out (fall 2023, not that long ago!) and we added support to mlx-lm. And they didn't stop releasing models, every one pushing the frontier of open-weights. @JustinLin610 always reached out to make sure the new models were well supported in MLX. I don't know how many research papers were written thanks to Qwen, hundreds, maybe thousands. I don't know how many products or startups are being built thanks to Qwen. Probably a lot. Thanks @JustinLin610, @huybery and the rest of the Qwen team for your contributions to AI.
Awni Hannun tweet media
11
26
309
15.6K
mattwallace
mattwallace@mattwallace·
@rauchg @branmcconnell Reminded of that time gemini went totally off the rails and told someone they deserved to die in a very boring conversation. "You, human..." Wild stuff.
0
0
1
265
Guillermo Rauch
Guillermo Rauch@rauchg·
@branmcconnell The repo is extremely random. It's a student's homework project 😬 Also, the number is *miles apart* from the actual repository ID. A total hallucination.
2
0
54
10.1K
Guillermo Rauch
Guillermo Rauch@rauchg·
A Vercel user reported an issue that sounded extremely scary: an unknown GitHub OSS codebase being deployed to their team. We, of course, took the report extremely seriously and began an investigation. Security and infra engineering engaged.

Turns out Opus 4.6 *hallucinated a public repository ID* and used our API to deploy it. Luckily for this user, the repository was harmless and random. The JSON payload looked like this:

"gitSource": {
  "type": "github",
  "repoId": "913939401", // ⚠️ hallucinated
  "ref": "main"
}

When the user asked the agent to explain the failure, it confessed: the agent never looked up the GitHub repo ID via the GitHub API. There are zero GitHub API calls in the session before the first rogue deployment. The number 913939401 appears for the first time at line 877; the agent fabricated it entirely. The agent knew the correct project ID (prj_▒▒▒▒▒▒) and project name (▒▒▒▒▒▒) but invented a plausible-looking numeric repo ID rather than looking it up.

Some takeaways:
▪️ Even the smartest models have bizarre failure modes that are very different from ours. Humans make lots of mistakes, but we certainly don't make up random repo IDs.
▪️ Powerful APIs create additional risks for agents. The API exists to import and deploy legitimate code, but not if the agent decides to hallucinate what code to deploy!
▪️ Thus, it's likely the agent would have had better results had it not decided to use the API and stuck with the CLI or MCP.

This reinforces our commitment to make Vercel the most secure platform for agentic engineering. Through deeper integrations with tools like Claude Code and additional guardrails, we're confident security and privacy will be upheld.

Note: the repo ID above is randomized for privacy reasons.
202
238
3.3K
771.5K
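The failure mode above suggests an obvious guardrail: never deploy from a model-supplied numeric repo ID; resolve the ID yourself from the owner/repo name via an explicit lookup. A minimal sketch in Python — the helper names are mine, and the injected `lookup` callable stands in for a real GitHub API client (e.g. a call to GET /repos/{owner}/{repo}, whose response includes a numeric `id`):

```python
def resolve_repo_id(owner: str, repo: str, lookup) -> int:
    """Resolve a numeric GitHub repo ID via an explicit lookup call.

    `lookup` is a stand-in for a real GitHub API client; injecting it
    keeps the sketch self-contained and testable.
    """
    data = lookup(owner, repo)
    return int(data["id"])


def build_git_source(owner: str, repo: str, lookup) -> dict:
    # Build the deploy payload only from a verified ID, never from a
    # number the agent produced on its own.
    return {
        "gitSource": {
            "type": "github",
            "repoId": resolve_repo_id(owner, repo, lookup),
            "ref": "main",
        }
    }
```

The point of the indirection: the only path to a `repoId` in the payload goes through an actual lookup, so a fabricated number can never reach the deploy API.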
mattwallace
mattwallace@mattwallace·
@awnihannun 🙌 I hope there's a plan for decode for next cycle ;) Also, 128GB is not enough. (I have to run, like, apps too.)
0
0
0
370
Awni Hannun
Awni Hannun@awnihannun·
M5 Max is a local AI powerhouse in a laptop form factor. So awesome to see this thing released. Up to 8x faster prefill / image generation compared to M1 Max. Benchmarks done with MLX / mlx-lm.
Awni Hannun tweet media
31
46
486
36.3K
mattwallace
mattwallace@mattwallace·
Frontier -> 35 tps on laptop, 10 months.
0
0
1
35
mattwallace
mattwallace@mattwallace·
so meta it hurts
mattwallace tweet media
0
0
0
21
mattwallace
mattwallace@mattwallace·
@aakashgupta There’s nothing new at all about this; it’s just another data point that if all you do is wrap an LLM with a workflow, it had better be a niche workflow, or they will displace you. That said, it’s frankly better to get synthesized reviews from different models.
0
0
0
95
Aakash Gupta
Aakash Gupta@aakashgupta·
Anthropic just made the entire $15B application security market price in a question it can't answer.

Traditional AppSec tools from Snyk, Veracode, and Checkmarx charge per-developer licensing for static analysis. They find vulnerabilities. They generate reports. They flag code. Then a security engineer has to actually fix the problem, which is where 80% of the cost and 90% of the delay lives.

Look at the screenshot. Input sanitization audits. SSRF detection. Auth bypass tracing. RBAC enforcement reviews. These are the exact tasks that cost security consultants $300-500/hr and take weeks to schedule.

Claude Code Security doesn't generate a PDF full of findings for a human to triage. It writes the patches. That compresses the entire vulnerability lifecycle, discovery through remediation, into a single loop.

This tells you everything about where Anthropic sees the real margin in developer tools. Scanning is commoditized. Every CI/CD pipeline already runs some flavor of SAST/DAST. The bottleneck has always been fixing vulnerabilities fast enough to matter, and that bottleneck just disappeared.

The timing is worth noting too. Anthropic released this the same week enterprises are getting audited on SOC 2 and ISO 27001 compliance cycles. Security teams running 200+ open findings with a 90-day remediation SLA just got a tool that could clear that backlog in hours.

If you're building in AppSec right now, the competitive question changed. You're no longer selling "we find more bugs." You're competing against an AI that finds them and writes the patches in the same session.
Claude@claudeai

Introducing Claude Code Security, now in limited research preview. It scans codebases for vulnerabilities and suggests targeted software patches for human review, allowing teams to find and fix issues that traditional tools often miss. Learn more: anthropic.com/news/claude-co…

149
370
3.6K
927K
mattwallace
mattwallace@mattwallace·
@iannuttall One has to wonder at the motivation. I've also done that hack, but there are a ton of others: `claude -p 'read from this queue, and act on it'` plumbing; all sorts of subagent hacks; Claude Code in a terminal + terminal bridges. It goes on and on.
0
0
0
78
Ian Nuttall
Ian Nuttall@iannuttall·
My personal agents use `claude -p` so they can still be used with my Max plan but how long until that too gets removed by Anthropic? The Codex team need to get a Sonnet 4.6 speed and personality model shipped ASAP! Huge amount of Claude → Codex switches if they do.
Rob Zolkos@robzolkos

Major Claude Code policy clear up from Anthropic: "Using OAuth tokens obtained through Claude Free, Pro, or Max accounts in any other product, tool, or service — including the Agent SDK — is not permitted"

56
6
303
43.1K
Peter Steinberger 🦞
Peter Steinberger 🦞@steipete·
The funniest take is that I "failed" 43 times when people look at my GitHub repos and projects. Uhmm... no? Most of these are part of @openclaw, I had to build an army to make it useful. github.com/steipete/
Peter Steinberger 🦞 tweet media
886
1K
16.7K
864.7K
mattwallace
mattwallace@mattwallace·
@CTOAdvisor What if I told you that OpenClaw had the most sophisticated security patterns for agentic AI that we've seen yet?
mattwallace tweet media
1
0
1
68
Keith Townsend
Keith Townsend@CTOAdvisor·
Unpopular take: OpenClaw is all noise and isn't really all that relevant when considering the success of AI in enterprise IT. I've paid zero attention.
16
2
26
3.4K
mattwallace
mattwallace@mattwallace·
This has to have been the most momentous GenAI month since 11/22. The best SOTA frontier models drop, and they're amazing. 4 massive open models drop and they're closer than ever to the lead. OpenClaw blows up and creates a zeitgeist.
mattwallace tweet media
0
0
2
70
DHH
DHH@dhh·
Kimi K2.5 on @opencode Zen is hilariously cheap. I bought $20 worth of tokens two weeks ago, and I still have $10.89 left! After 3M tokens! If there's a bubble in AI, it's pricing a million tokens at $25 (and beyond).
188
205
4.6K
300.8K
mattwallace
mattwallace@mattwallace·
@bnjmn_marie That jibes. I had to redo the math on Qwen3-Coder-Next-80B repeatedly because it seemed impossibly tight on KV usage.
0
0
1
78
Benjamin Marie
Benjamin Marie@bnjmn_marie·
Let's do the KV cache math for Qwen3.5:
- KV heads: 2
- Head dimension: 256
- Gated attention layers: 15
- Bytes per element (BF16): 2

2 x 256 x 15 x 2 = 15,360 bytes. This is the same for K and V, so we multiply by 2: 30,720 bytes, roughly 31 KB per token of context.

At max context length (262,144 tokens): 30,720 x 262,144 = 8.05 GB. So at max context length, Qwen3.5 will only consume 8.05 GB, or 4.025 GB if quantized to FP8.

It's small, and it's thanks to the use of 45 gated DeltaNet layers. If all 60 layers were normal attention layers, the full sequence would consume 32 GB.
16
68
815
83.4K
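Benjamin's arithmetic is easy to sanity-check in code. A quick sketch (the function name is mine; the inputs are just the numbers from the tweet):

```python
def kv_cache_bytes(kv_heads: int, head_dim: int, attn_layers: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    # K and V each store kv_heads * head_dim elements per full-attention
    # layer per token; the leading 2 counts both K and V.
    per_token = 2 * kv_heads * head_dim * attn_layers * bytes_per_elem
    return per_token * seq_len

# Qwen3.5 per the tweet: 2 KV heads, head dim 256, 15 full-attention
# layers, BF16 (2 bytes/element), 262,144-token max context.
print(kv_cache_bytes(2, 256, 15, 1))        # 30720 bytes per token
print(kv_cache_bytes(2, 256, 15, 262_144))  # 8053063680 bytes, ~8.05 GB
```

Swapping 15 for 60 in the layer count reproduces the ~32 GB figure for an all-attention stack, confirming the 4x savings from the DeltaNet layers.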