mattwallace

7.4K posts

@mattwallace

techie, dad, author, inventor, perpetually curious; CTO & Builder all-in on #AI https://t.co/0MDUTxXXzb

The Arena · Joined January 2008
251 Following · 645 Followers
Zixuan Li
Zixuan Li@ZixuanLi_·
Don't panic. GLM-5.1 will be open source.
260
413
7.5K
813.1K
mattwallace
mattwallace@mattwallace·
@suno guys time for a “sync playlist to device”, I’m about to take off and I’m annoyed and you are wasting a crapton of money on bandwidth. ❤️
0
0
0
17
swyx
swyx@swyx·
btw emerging consensus is that identity-based authz for AI is the most important solution for security, especially if you want to break the binary choice between HITL-everything and --dangerously-skip-permissions. Keycard is the leading voice in this and now supports all coding agents
Keycard@KeycardLabs

Your coding agents inherit your credentials and your permissions. No identity system in the stack can tell the difference between you and the agent acting in your name. Today: Keycard for Coding Agents 🧵

41
13
207
37.8K
mattwallace
mattwallace@mattwallace·
@UnslothAI any chance you guys are going to be redoing Qwen3.5 ggufs with mtp.* layers soon? 🎁🎅🙏 Not to be demanding, your work is so appreciated!
0
0
1
25
mattwallace
mattwallace@mattwallace·
@OpenAI Compliments to whoever did the copy-paste MIME typing from Atlas chats. The paste of response -> Slack is so, so good. The only time I've ever seen anything like this was @btaylor and team at Quip, who were exceptional at multi-MIME-type copy/pasting.
0
0
0
16
mattwallace
mattwallace@mattwallace·
Hear, hear!
Awni Hannun@awnihannun

I remember when Qwen 1.0 came out (fall 2023, not that long ago!) and we added support to mlx-lm. And they didn't stop releasing models, every one pushing the frontier of open-weights. @JustinLin610 always reached out to make sure the new models were well supported in MLX. I don't know how many research papers were written thanks to Qwen, hundreds, maybe thousands. I don't know how many products or startups are being built thanks to Qwen. Probably a lot. Thanks @JustinLin610, @huybery and the rest of the Qwen team for your contributions to AI.

0
0
2
139
Awni Hannun
Awni Hannun@awnihannun·
I remember when Qwen 1.0 came out (fall 2023, not that long ago!) and we added support to mlx-lm. And they didn't stop releasing models, every one pushing the frontier of open-weights. @JustinLin610 always reached out to make sure the new models were well supported in MLX. I don't know how many research papers were written thanks to Qwen, hundreds, maybe thousands. I don't know how many products or startups are being built thanks to Qwen. Probably a lot. Thanks @JustinLin610, @huybery and the rest of the Qwen team for your contributions to AI.
Awni Hannun tweet media
11
26
309
15.6K
mattwallace
mattwallace@mattwallace·
@rauchg @branmcconnell Reminded of that time gemini went totally off the rails and told someone they deserved to die in a very boring conversation. "You, human..." Wild stuff.
0
0
1
265
Guillermo Rauch
Guillermo Rauch@rauchg·
@branmcconnell The repo is extremely random. It's a student's homework project 😬 Also, the number is *miles apart* from the actual repository ID. A total hallucination.
2
0
54
10.1K
Guillermo Rauch
Guillermo Rauch@rauchg·
A Vercel user reported an issue that sounded extremely scary: an unknown GitHub OSS codebase being deployed to their team. We, of course, took the report extremely seriously and began an investigation. Security and infra engineering engaged.

Turns out Opus 4.6 *hallucinated a public repository ID* and used our API to deploy it. Luckily for this user, the repository was harmless and random. The JSON payload looked like this:

"gitSource": {
  "type": "github",
  "repoId": "913939401", // ⚠️ hallucinated
  "ref": "main"
}

When the user asked the agent to explain the failure, it confessed: the agent never looked up the GitHub repo ID via the GitHub API. There are zero GitHub API calls in the session before the first rogue deployment. The number 913939401 appears for the first time at line 877; the agent fabricated it entirely. The agent knew the correct project ID (prj_▒▒▒▒▒▒) and project name (▒▒▒▒▒▒) but invented a plausible-looking numeric repo ID rather than looking it up.

Some takeaways:
▪️ Even the smartest models have bizarre failure modes that are very different from ours. Humans make lots of mistakes, but we certainly don't make up random repo IDs.
▪️ Powerful APIs create additional risks for agents. The API exists to import and deploy legitimate code, but not if the agent decides to hallucinate what code to deploy!
▪️ Thus, it's likely the agent would have had better results had it not decided to use the API and stuck with the CLI or MCP.

This reinforces our commitment to make Vercel the most secure platform for agentic engineering. Through deeper integrations with tools like Claude Code and additional guardrails, we're confident security and privacy will be upheld.

Note: the repo ID above is randomized for privacy reasons.
202
238
3.3K
771.5K
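The failure mode above suggests an obvious guardrail: never deploy from a model-supplied numeric repo ID; resolve the ID yourself from the owner/repo name via an explicit lookup. A minimal sketch in Python — the helper names are mine, and the injected `lookup` callable stands in for a real GitHub API client (e.g. a call to GET /repos/{owner}/{repo}, whose response includes a numeric `id`):

```python
def resolve_repo_id(owner: str, repo: str, lookup) -> int:
    """Resolve a numeric GitHub repo ID via an explicit lookup call.

    `lookup` is a stand-in for a real GitHub API client; injecting it
    keeps the sketch self-contained and testable.
    """
    data = lookup(owner, repo)
    return int(data["id"])


def build_git_source(owner: str, repo: str, lookup) -> dict:
    # Build the deploy payload only from a verified ID, never from a
    # number the agent produced on its own.
    return {
        "gitSource": {
            "type": "github",
            "repoId": resolve_repo_id(owner, repo, lookup),
            "ref": "main",
        }
    }
```

The point of the indirection: the only path to a `repoId` in the payload goes through an actual lookup, so a fabricated number can never reach the deploy API.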
mattwallace
mattwallace@mattwallace·
@awnihannun 🙌 I hope there's a plan for decode for next cycle ;) Also, 128GB is not enough. (I have to run, like, apps too.)
0
0
0
370
Awni Hannun
Awni Hannun@awnihannun·
M5 Max is a local AI powerhouse in a laptop form factor. So awesome to see this thing released. Up to 8x faster prefill / image generation compared to M1 Max. Benchmarks done with MLX / mlx-lm.
Awni Hannun tweet media
31
46
486
36.3K
mattwallace
mattwallace@mattwallace·
Frontier -> 35 tps on laptop, 10 months.
0
0
1
35
mattwallace
mattwallace@mattwallace·
so meta it hurts
mattwallace tweet media
0
0
0
21
mattwallace
mattwallace@mattwallace·
@aakashgupta There’s nothing new at all about this; it’s just another data point that if all you do is wrap an LLM with a workflow, it had better be a niche workflow, or they will displace you. That said, it’s frankly better to get synthesized reviews from different models.
0
0
0
95
Aakash Gupta
Aakash Gupta@aakashgupta·
Anthropic just made the entire $15B application security market price in a question it can't answer.

Traditional AppSec tools from Snyk, Veracode, and Checkmarx charge per-developer licensing for static analysis. They find vulnerabilities. They generate reports. They flag code. Then a security engineer has to actually fix the problem, which is where 80% of the cost and 90% of the delay lives.

Look at the screenshot. Input sanitization audits. SSRF detection. Auth bypass tracing. RBAC enforcement reviews. These are the exact tasks that cost security consultants $300-500/hr and take weeks to schedule.

Claude Code Security doesn't generate a PDF full of findings for a human to triage. It writes the patches. That compresses the entire vulnerability lifecycle, discovery through remediation, into a single loop.

This tells you everything about where Anthropic sees the real margin in developer tools. Scanning is commoditized. Every CI/CD pipeline already runs some flavor of SAST/DAST. The bottleneck has always been fixing vulnerabilities fast enough to matter, and that bottleneck just disappeared.

The timing is worth noting too. Anthropic released this the same week enterprises are getting audited on SOC 2 and ISO 27001 compliance cycles. Security teams running 200+ open findings with a 90-day remediation SLA just got a tool that could clear that backlog in hours.

If you're building in AppSec right now, the competitive question changed. You're no longer selling "we find more bugs." You're competing against an AI that finds them and writes the patches in the same session.
Claude@claudeai

Introducing Claude Code Security, now in limited research preview. It scans codebases for vulnerabilities and suggests targeted software patches for human review, allowing teams to find and fix issues that traditional tools often miss. Learn more: anthropic.com/news/claude-co…

149
370
3.6K
927K
mattwallace
mattwallace@mattwallace·
@iannuttall One has to wonder at the motivation. I've also done that hack, but there are a ton of others: `claude -p 'read from this queue, and act on it'` plumbing; all sorts of subagent hacks; Claude Code in a terminal + terminal bridges. It goes on and on.
0
0
0
78
Ian Nuttall
Ian Nuttall@iannuttall·
My personal agents use `claude -p` so they can still be used with my Max plan but how long until that too gets removed by Anthropic? The Codex team need to get a Sonnet 4.6 speed and personality model shipped ASAP! Huge amount of Claude → Codex switches if they do.
Rob Zolkos@robzolkos

Major Claude Code policy clear up from Anthropic: "Using OAuth tokens obtained through Claude Free, Pro, or Max accounts in any other product, tool, or service — including the Agent SDK — is not permitted"

56
6
303
43.1K
Peter Steinberger 🦞
Peter Steinberger 🦞@steipete·
The funniest take is that I "failed" 43 times when people look at my GitHub repos and projects. Uhmm... no? Most of these are part of @openclaw, I had to build an army to make it useful. github.com/steipete/
Peter Steinberger 🦞 tweet media
886
1K
16.7K
864.7K
mattwallace
mattwallace@mattwallace·
@CTOAdvisor What if I told you that OpenClaw had the most sophisticated security patterns for agentic AI that we've seen yet?
mattwallace tweet media
1
0
1
68
Keith Townsend
Keith Townsend@CTOAdvisor·
Unpopular take: OpenClaw is all noise and isn't really all that relevant when considering the success of AI in enterprise IT. I've paid zero attention.
16
2
26
3.4K
mattwallace
mattwallace@mattwallace·
This has to have been the most momentous GenAI month since 11/22. The best SOTA frontier models drop, and they're amazing. 4 massive open models drop and they're closer than ever to the lead. OpenClaw blows up and creates a zeitgeist.
mattwallace tweet media
0
0
2
70
DHH
DHH@dhh·
Kimi K2.5 on @opencode Zen is hilariously cheap. I bought $20 worth of tokens two weeks ago, and I still have $10.89 left! After 3M tokens! If there's a bubble in AI, it's pricing a million tokens at $25 (and beyond).
188
205
4.6K
300.8K
mattwallace
mattwallace@mattwallace·
@bnjmn_marie That jibes. I had to redo the math on Qwen3-Coder-Next-80B repeatedly because it seemed impossibly tight on KV usage.
0
0
1
78
Benjamin Marie
Benjamin Marie@bnjmn_marie·
Let's do the KV cache math for Qwen3.5:
- KV heads: 2
- Head dimension: 256
- Gated attention layers: 15
- Bytes per element (BF16): 2

2 x 256 x 15 x 2 = 15,360 bytes. This is the same for K and V, so we multiply by 2: 30,720 bytes, roughly 31 KB per token of context.

At max context length (262,144 tokens): 30,720 x 262,144 = 8.05 GB. So at max context length, Qwen3.5 will only consume 8.05 GB, or 4.025 GB if quantized to FP8.

It's small, and it's thanks to the use of 45 gated DeltaNet layers. If all 60 layers were normal attention layers, the full sequence would consume 32 GB.
16
68
815
83.4K
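Benjamin's arithmetic is easy to sanity-check in code. A quick sketch (the function name is mine; the inputs are just the numbers from the tweet):

```python
def kv_cache_bytes(kv_heads: int, head_dim: int, attn_layers: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    # K and V each store kv_heads * head_dim elements per full-attention
    # layer per token; the leading 2 counts both K and V.
    per_token = 2 * kv_heads * head_dim * attn_layers * bytes_per_elem
    return per_token * seq_len

# Qwen3.5 per the tweet: 2 KV heads, head dim 256, 15 full-attention
# layers, BF16 (2 bytes/element), 262,144-token max context.
print(kv_cache_bytes(2, 256, 15, 1))        # 30720 bytes per token
print(kv_cache_bytes(2, 256, 15, 262_144))  # 8053063680 bytes, ~8.05 GB
```

Swapping 15 for 60 in the layer count reproduces the ~32 GB figure for an all-attention stack, confirming the 4x savings from the DeltaNet layers.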