matthew.
22.9K posts

matthew.
@topmass
AI Automation Eng • @AnthropicAI Model Safety Tester | Founder https://t.co/bilxNRUiv8 | Building https://t.co/It1eJ3D45g • Full Stack Engineer • 🇨🇦
British Columbia · Joined September 2011
853 Following · 9.5K Followers

Wtf... Qwen3.5-35B-A3B took 3 minutes (!!) to answer my simple question, "What's on my calendar today?" via @openclaw
I don't know what these local LLM fellas are running on, but a DGX Spark sure is not the best thing

@topmass @bryan_johnson So real. I don't see how anyone still consumes those billions of videos of pure nonsense on Instagram and TikTok. I had to uninstall all of them and I'm only here now too.
We want information, not brainrot

@Davidrejll @bryan_johnson yeah, bad for longevity too BRIAN - I use none of those apps, and my feed on X is mostly text / GitHub-based, about tech stuff I'm interested in learning. Seems like a good thing for longevity, since we know short-form video content is bad for mental health

@topmass @bryan_johnson Exactly. Short form content is very close to a lobotomy

@iflessthan3 @theo as a full-time user I second both of these changes, great suggestions

ok fine. t3 code is good @theo
but
1. Let me archive threads like the Codex app does. I only want the main view to show the threads I'm currently working on, but I don't want to outright delete old threads
2. Let me pin projects. Then I can have the best of both worlds: manual sorting and "last user message" sorting

Overseas people may not be familiar with it, but this is a mouse that Japanese people love: the trackball mouse. You move the pointer by rolling the ball with your thumb, without moving your arm a millimeter. Older generations of mice had a ball on the bottom; this is essentially a thumb-operated version of that. The advantages are less strain on the wrist and space savings, and more than anything, the smooth rotation of the ball feels great. If you get the chance, please give it a try.
Also, recently, devices have appeared that go beyond the scope of a mouse entirely. With the "Nape Pro" you can assign your favorite commands to all six keys, and it can also move the pointer. And it's super small. It may be an effective device even for those who have no fixed image of the trackball mouse. Keep an eye on it.



翁(おきな)@okinalog
Aaaaaaaaaaaaaaaaaahhhhhhhhhh

@thekitze just got a 3Gb plan for roughly 70 USD, it's insane, never had this kind of speed before. But the newest first-world problem has been upgrading all my hardware network cards to even take full advantage
if they do fibre to your area, ideally you should be able to get 1Gb up / 1Gb down

they are finally digging to set up optic fiber on my street after 3 yrs... i can finally get rid of Starlink 😭 it's gonna cost 1/4 of the price and have gigabit download and ~250 upload... twitch.tv/thekitze might be back in full swing
this is my current speed ($100/mo)


@aykutkardas @Cloudflare hono was built by a Cloudflare employee I think, or an ex-employee maybe? It's made for Workers and they're a lil match made in heaven. I have built many small projects with two workers: one React+Vite, and the backend a hono worker that handled everything. Scales infinitely, no problems

No Next.js. No React. No TypeScript.
just a simple html and js file.
deployed on @Cloudflare workers.
feels… enough.

@yoyonofukuoka I just wanna jump on to say I freaking love Japan and Japanese people, pls Japanese homies let's be friends, I will always translate and reply

@ibuildthecloud well, when you realize you're not limited anymore you'd have no need to buy a macbook, and that's a problem for timothy apple

@topmass They do! Berkeley's BFCL and Databricks both use low temp for function calling evals. Reproducibility basically requires it (github.com/ShishirPatil/g…)
But yeah for creative use 0.7+ totally makes sense, different goals

Qwen3.5-27B went 15/15 on our tool-calling benchmark.
But which quant should you actually run?
Tested Unsloth's Q2_K_XL all the way to Q8_K_XL
TL;DR:
Q8 — 15/15 ✅
Q6 — 15/15 ✅
Q5 — 14/15
Q4 — 14/15
Q3 — 14/15
Q2 — 13/15
Q6 is the sweet spot. Same perfect score as Q8, smaller footprint.
Also, the results scale almost linearly, seems like ToolCall-15 is actually measuring something real.
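For reference, the pass rates above work out like this (scores copied from the sweep in this thread; the helper function is just illustrative):

```python
# (passed, total) per quant, from the benchmark results above
results = {
    "Q8": (15, 15),
    "Q6": (15, 15),
    "Q5": (14, 15),
    "Q4": (14, 15),
    "Q3": (14, 15),
    "Q2": (13, 15),
}

def pass_rate(passed, total):
    """Pass rate as a percentage, rounded to one decimal."""
    return round(100 * passed / total, 1)

for quant, (passed, total) in results.items():
    print(f"{quant}: {pass_rate(passed, total)}%")

# Scores never increase as the quant shrinks (Q8 -> Q2),
# which is the "results scale almost linearly" observation.
scores = [p for p, _ in results.values()]
assert scores == sorted(scores, reverse=True)
```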

@stevibe That makes sense. Lately, on closer-to-SOTA models, I find myself never going below 0.7, since most of my use has some creative element. I know most labs just recommend 1 as a default now. I'm curious if they also test their models at lower temp during benchmarks

Good point! Temp 1 is definitely the default for general use, but for tool calling benchmarks specifically, temp 0 actually performs better.
Databricks found accuracy can swing up to 10% between temp 0 and 0.7 on function-calling tasks. Since it's structured output (pick the right tool, pass the right params), you want determinism, not creativity. Same reason you'd use low temp for JSON or code gen.
Reference:
databricks.com/blog/unpacking…
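The intuition for why low temperature means determinism can be sketched with temperature-scaled softmax (a toy illustration, not any lab's actual sampler; the logits are made up):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/T before softmax; lower T sharpens the distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate tool calls
logits = [2.0, 1.5, 0.5]

p_hot = softmax_with_temperature(logits, 1.0)    # default temp
p_cold = softmax_with_temperature(logits, 0.01)  # near temp 0

# At T=1 the runner-up tool still gets meaningful probability,
# so sampling can pick the wrong tool; near T=0 the distribution
# collapses onto the argmax and the choice is effectively fixed.
print(p_hot[1])   # runner-up stays a live option (> 0.3 here)
print(p_cold[0])  # top tool gets essentially all the mass
```

This is why structured tasks like tool selection benchmark better at temp 0: the "creative" probability mass on runner-up choices is exactly what you don't want.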

Best part is how many of my own assumptions Julius has challenged correctly.
I thought the built-in terminal wasn't important, now I use it constantly.
I thought a "one click PR button" would be a slop farm. It's the only way I make PRs now.
I thought the "web app" (npx t3) would be just as popular as the electron app. I haven't opened the web version since we launched.
This is so fun.

I'm supposed to be filming right now. Instead I'm on PR #3.
T3 Code is the most addicting coding experience I've ever had. It makes me wish I had more time for code :(
Theo - t3.gg@theo
Every time I use T3 Code I'm genuinely blown away by how good it is. Each update keeps making it more addicting. Never underestimate @jullerino, he's cooking.

@sinnformer @BradGroux @AnthropicAI @claudeai @DarioAmodei @FTC @DigitalMeld lol brother, are you saying you think the completely free, open-source project openclaw is a competitor to Anthropic??

@BradGroux @AnthropicAI @claudeai @DarioAmodei @FTC @DigitalMeld ok buddy, you just have the flair for funsies


You make this announcement BEFORE you make the change. Good riddance... what a shame @AnthropicAI and @claudeai. Customer service isn't hard, @DarioAmodei. This was a rug pull and y'all got caught.
I hope the @FTC fines you.
Thariq@trq212
To manage growing demand for Claude we're adjusting our 5 hour session limits for free/Pro/Max subs during peak hours. Your weekly limits remain unchanged. During weekdays between 5am–11am PT / 1pm–7pm GMT, you'll move through your 5-hour session limits faster than before.

@StirlingForge @stevibe nope this is only v2 for the opus-reasoning-distilled variant of the qwen 3.5 27b model - all the qwen models otherwise don't have versioning differences 👍

Which local models can actually handle tool calling?
I built a framework to find out.
15 scenarios. 12 tools. Mocked responses. Temperature 0. No cherry-picking.
Tested every Qwen3.5 size from 0.8B to 397B, and since some of you asked after the distillation tests: yes, I included Jackrong's Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled too.
Only two models went all green: the 27B dense and the distilled 27B.
The 397B? Failed two tests. The 122B? Failed one. The 35B? Failed two.
The timed-out results, mostly on the smaller models, are cases where the model got stuck in a loop, repeating the same tool call until it hit the 30-second limit.
The test that exposed the most models: "Search for Iceland's population, then calculate 2% of it." Simple, but 35B, 122B, and 397B all used a rounded number from memory instead of the actual search result. They didn't trust their own tool output.
Small models hallucinate data.
Big models ignore data.
The 27B just threaded it through.
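The loop/timeout behavior described above can be sketched roughly like this (a hypothetical harness for illustration, not the actual framework; `model_step` stands in for one model turn that returns a tool call or `None` when done):

```python
import time

def run_with_loop_guard(model_step, timeout_s=30, max_repeats=3):
    """Drive model_step() until it finishes, flagging runs that repeat
    the same tool call back-to-back or blow the time budget."""
    start = time.monotonic()
    last_call, repeats = None, 0
    while True:
        if time.monotonic() - start > timeout_s:
            return "timed_out"
        call = model_step()
        if call is None:
            return "completed"  # model finished its answer
        if call == last_call:
            repeats += 1
            if repeats >= max_repeats:
                return "stuck_in_loop"  # same tool call over and over
        else:
            last_call, repeats = call, 0

# A model that keeps issuing the identical search call gets caught:
calls = iter([("search", "Iceland population")] * 10)
print(run_with_loop_guard(lambda: next(calls, None)))  # -> stuck_in_loop
```

Catching the repeat early (rather than only on wall-clock timeout) is what lets a harness distinguish "stuck in a loop" from "genuinely slow."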

@topmass @TFP_tweeets @jpschroeder Have you checked out OpenUI? It's faster, and built on a year of running GenUI in production.
Disclaimer: I'm one of the maintainers.

We’re open sourcing ArrowJS 1.0: the first UI framework for coding agents.
Imagine React/Vue, but with no compiler, build process, or JSX transformer. It’s just TS/JS so LLMs are already *great* at it.
AND run generated code securely w/ sandbox pkg.
➡️ arrow-js.com





