matthew.

22.9K posts

matthew.

@topmass

AI Automation Eng • @AnthropicAI Model Safety Tester | Founder https://t.co/bilxNRUiv8 | Building https://t.co/It1eJ3D45g • Full Stack Engineer • 🇨🇦

British Columbia · Joined September 2011
853 Following · 9.5K Followers
Niels Rogge
Niels Rogge@NielsRogge·
Wtf... Qwen3.5-35B-A3B took 3 minutes (!!) to answer my simple question, "What's on my calendar today?" via @openclaw I don't know what these local LLM fellas are running on, but a DGX Spark sure is not the best thing
English
64
3
124
26.5K
David Rejl
David Rejl@Davidrejll·
@topmass @bryan_johnson So real. I don't see how anyone still consumes those billions of videos of pure nonsense on Instagram and TikTok. I had to uninstall all of them and I'm only here now too. We want information, not brainrot
English
1
0
1
21
Bryan Johnson
Bryan Johnson@bryan_johnson·
I create content for X, IG, YouTube, TikTok, and email. The video below will do 1-3 million views on IG, YouTube and TikTok but will perform horribly on X (maybe 20-50k views). The video also has a longer total life on the other platforms, whereas dead on X within 24 hours.
English
269
65
2.6K
297.1K
matthew.
matthew.@topmass·
@Davidrejll @bryan_johnson yeah, bad for longevity too BRYAN - i use none of those apps, and my feed on X is mostly text / GitHub-based, about tech stuff I'm interested in learning about. seems like a good thing for longevity, since we know short-form video content is bad for mental health
English
1
0
2
84
Luis
Luis@iflessthan3·
ok fine. t3 code is good @theo but

1. Let me archive threads like the Codex app. I only care to keep the threads I'm currently working on in the main view, but I don't want to outright delete old threads.

2. Let me pin projects. Then I can have the best of both worlds from manual sorting and "last user message" sorting.
English
4
0
44
5.2K
matthew.
matthew.@topmass·
@okinalog thank you Japanese people for showing me - i love japan :) I will buy one!
English
1
0
1
89
翁(おきな)
翁(おきな)@okinalog·
Overseas folks may not be familiar with it, but this is a "trackball mouse", which Japanese people love very much. You move the pointer by rolling the ball with your thumb, without moving your arm a millimeter. Old-generation mice had a ball on the bottom; my impression is that this is a thumb version of that. The advantages are that it reduces strain on the wrist and saves desk space. More than anything, the smooth rotation of this ball feels great. If you get the chance, please give it a try. Recently, curious devices beyond the scope of a mouse have also appeared, like the "Nape Pro": you can assign your favorite commands to all 6 keys, and it can also move the pointer. And it's super small. It may be an effective device for those who have no preconceptions about trackball mice. Keep an eye on it.
翁(おきな)@okinalog

Aaaaaaaaaaaaaaaaaaah

English
66
133
1.8K
238.2K
matthew.
matthew.@topmass·
@thekitze just got a 3Gb plan for roughly $70 US - it is insane, never had this kind of speed before. but the newest first-world problem has been upgrading all my hardware network cards to even take full advantage. if they do fibre to your area, ideally you should be able to get 1Gb up / 1Gb down
English
0
0
1
37
kitze 🛠️ tinkerer.club
they are finally digging to set up optic fiber on my street after 3 yrs... i can finally get rid of starlink 😭 it's gonna cost 1/4 of the price and have gigabit download and ~250 upload... twitch.tv/thekitze might be back in full swing. this is my current speed ($100/mo)
English
14
0
26
3.2K
matthew.
matthew.@topmass·
@aykutkardas @Cloudflare hono was built by a cloudflare employee I think, or ex maybe? it's made for workers, and they're a lil match made in heaven - I have built many small projects with two workers: one react+vite, and the backend a hono worker that handled everything. scales infinitely, no problems
English
1
0
2
114
Aykut
Aykut@aykutkardas·
No Next.js. No React. No TypeScript. just a simple html and js file. deployed on @Cloudflare workers. feels… enough.
English
33
31
828
127.5K
matthew.
matthew.@topmass·
@yoyonofukuoka I just wanna jump on to say I freaking love Japan and Japanese people. pls japanese homies, let's be friends - I will translate and reply always
English
0
0
1
213
kouji 🇯🇵
kouji 🇯🇵@yoyonofukuoka·
Once every language supports automatic translation and people all over the world can communicate seamlessly, the long-standing nation-versus-nation framing will collapse, and it will be replaced by a common-sense-versus-nonsense one.
Japanese
2.9K
5.1K
62.3K
55.5M
matthew.
matthew.@topmass·
@ibuildthecloud well when you realize that you're not limited anymore you would have no need to buy a macbook, and that's a problem for timothy apple
English
0
0
0
25
Darren Shepherd
Darren Shepherd@ibuildthecloud·
Wait I've never paid attention to iPads. Why is this not just macos? It's like Mac OS, but you can't run a terminal. If you could put Brew on this and launch VMs this would be the perfect device for me.
English
90
3
161
27.3K
matthew.
matthew.@topmass·
i have solved the "claude rate limits and which is better" problem I had: just absolutely dive in head first with the pro plan for codex and max 200 for claude, put my head down, and build. good bye
English
0
0
0
235
matthew.
matthew.@topmass·
@stevibe sweet, thanks for passing the info along. I'll keep in mind that low temps are still used a lot for deterministic stuff and experiment on the models I run as well 👍
English
0
0
1
44
stevibe
stevibe@stevibe·
@topmass They do! Berkeley's BFCL and Databricks both use low temp for function calling evals. Reproducibility basically requires it (github.com/ShishirPatil/g…) But yeah for creative use 0.7+ totally makes sense, different goals
English
1
1
2
87
stevibe
stevibe@stevibe·
Qwen3.5-27B went 15/15 on our tool-calling benchmark. But which quant should you actually run? Tested Unsloth's Q2_K_XL all the way to Q8_K_XL.

TL;DR:
Q8 — 15/15 ✅
Q6 — 15/15 ✅
Q5 — 14/15
Q4 — 14/15
Q3 — 14/15
Q2 — 13/15

Q6 is the sweet spot. Same perfect score as Q8, smaller footprint. Also, the results scale almost linearly, so it seems like ToolCall-15 is actually measuring something real.
English
52
78
907
59.9K
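The "sweet spot" pick in that thread is just "smallest quant that ties the best score". A minimal sketch under one assumption (that file size grows with the quant number, which is how these GGUF quants normally behave), with the scores copied from the tweet:

```python
# Tests passed out of 15, as reported in the thread.
scores = {"Q2_K_XL": 13, "Q3": 14, "Q4": 14, "Q5": 14, "Q6": 15, "Q8_K_XL": 15}

# Assumed footprint order, smallest first (lower quant bits = smaller file).
order = ["Q2_K_XL", "Q3", "Q4", "Q5", "Q6", "Q8_K_XL"]

best = max(scores.values())
# First quant in size order that matches the best score.
sweet_spot = next(q for q in order if scores[q] == best)
print(sweet_spot)  # Q6
```

Same idea works for any score table: take the cheapest configuration that is not measurably worse than the most expensive one.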
matthew.
matthew.@topmass·
@stevibe That makes sense. Lately, on closer-to-SOTA models, I generally find myself never going below 0.7, since most of my use has some creative element. I know most labs just recommend 1 as a default now. I'm curious whether they also test their models at lower temps during benchmarks
English
1
0
1
99
stevibe
stevibe@stevibe·
Good point! Temp 1 is definitely the default for general use, but for tool-calling benchmarks specifically, temp 0 actually performs better. Databricks found accuracy can swing up to 10% between temp 0 and 0.7 on function-calling tasks. Since it's structured output (pick the right tool, pass the right params), you want determinism, not creativity. Same reason you'd use low temp for JSON or code gen. Reference: databricks.com/blog/unpacking…
English
1
0
6
1K
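The determinism point above can be shown with a toy sketch, assuming standard softmax sampling (made-up logits, no real model involved): at temperature 0 decoding collapses to argmax, so the same input always produces the same "tool choice", while higher temperatures spread picks across tokens.

```python
import math
import random

def sample(logits, temperature, rng):
    """Pick a token index from logits with temperature scaling.

    Temperature 0 is greedy decoding (argmax), hence deterministic."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract max for numeric stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    r = rng.random()
    acc = 0.0
    for i, e in enumerate(exps):
        acc += e / total
        if r < acc:
            return i
    return len(exps) - 1

logits = [1.0, 3.0, 2.0]  # toy scores for three candidate "tools"
greedy = {sample(logits, 0, random.Random(s)) for s in range(100)}
sampled = {sample(logits, 1.5, random.Random(s)) for s in range(100)}
print(greedy)   # {1} - temp 0 picks the same tool every time
print(sampled)  # several indices - high temp varies across seeds
```

That variance is exactly what you don't want when the benchmark grades "did it pick the right tool with the right params".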
matthew.
matthew.@topmass·
@theo as someone with boomer eyesight and a huge monitor, I found out I could do Ctrl & +/= to zoom in, and it scales great - so I recommend adding buttons / a tooltip for this so people know they can easily scale the PWA to their liking
English
0
0
4
1.2K
Theo - t3.gg
Theo - t3.gg@theo·
Best part is how many of my own assumptions Julius has challenged correctly. I thought the built-in terminal wasn't important, now I use it constantly. I thought a "one click PR button" would be a slop farm. It's the only way I make PRs now. I thought the "web app" (npx t3) would be just as popular as the electron app. I haven't opened the web version since we launched. This is so fun.
English
11
2
237
25.6K
Brad Groux
Brad Groux@BradGroux·
You make this announcement BEFORE you make the change. Good riddance... what a shame @AnthropicAI and @claudeai. Customer service isn't hard, @DarioAmodei. This was a rug pull and y'all got caught. I hope the @FTC fines you.
Thariq@trq212

To manage growing demand for Claude we're adjusting our 5 hour session limits for free/Pro/Max subs during peak hours. Your weekly limits remain unchanged. During weekdays between 5am–11am PT / 1pm–7pm GMT, you'll move through your 5-hour session limits faster than before.

English
48
42
1.1K
75.6K
matthew.
matthew.@topmass·
@StirlingForge @stevibe nope this is only v2 for the opus-reasoning-distilled variant of the qwen 3.5 27b model - all the qwen models otherwise don't have versioning differences 👍
English
0
0
1
98
stevibe
stevibe@stevibe·
Which local models can actually handle tool calling? I built a framework to find out. 15 scenarios. 12 tools. Mocked responses. Temperature 0. No cherry-picking.

Tested every Qwen3.5 size from 0.8B to 397B, and since some of you asked after the distillation tests: yes, I included Jackrong's Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled too.

Only two models went all green: the 27B dense and the distilled 27B. The 397B? Failed two tests. The 122B? Failed one. The 35B? Failed two.

The timed-out results, mostly on the smaller models, are cases where the model got stuck in a loop, repeating the same tool call until it hit the 30-second limit.

The test that exposed the most models: "Search for Iceland's population, then calculate 2% of it." Simple, but 35B, 122B, and 397B all used a rounded number from memory instead of the actual search result. They didn't trust their own tool output.

Small models hallucinate data. Big models ignore data. The 27B just threaded it through.
English
108
233
1.9K
381.3K
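A minimal sketch of how a harness like that can work, with hypothetical tool names, a made-up population figure, and a repeat-call check standing in for the real 30-second timeout (none of this is the benchmark's actual code): the model is driven against canned tool outputs, and the stuck-in-a-loop failure mode is flagged when the same call is issued twice in a row.

```python
# Mocked tools: fixed responses, no network. Names and values are invented.
MOCK_TOOLS = {
    "search": lambda query: {"iceland_population": 389450},
    "calculator": lambda expression: {"result": eval(expression, {"__builtins__": {}})},
}

def run_scenario(model_step, max_calls=10):
    """Drive a model policy against mocked tools.

    model_step(history) returns (tool_name, args) or ("final", answer).
    Aborts with status "loop" when the model repeats its last call."""
    history, last_call = [], None
    for _ in range(max_calls):
        call = model_step(history)
        if call[0] == "final":
            return {"status": "ok", "answer": call[1], "calls": len(history)}
        if call == last_call:
            return {"status": "loop", "calls": len(history)}
        last_call = call
        name, args = call
        history.append((name, args, MOCK_TOOLS[name](**args)))
    return {"status": "timeout", "calls": len(history)}

# Well-behaved policy: search first, then compute 2% of the *returned* number
# instead of a rounded figure from memory - the exact failure the thread found.
def good_model(history):
    if not history:
        return ("search", {"query": "Iceland population"})
    if len(history) == 1:
        pop = history[0][2]["iceland_population"]
        return ("calculator", {"expression": f"{pop} * 0.02"})
    return ("final", history[1][2]["result"])

# Degenerate policy: repeats the same call forever, like the timed-out models.
def stuck_model(history):
    return ("search", {"query": "Iceland population"})

print(run_scenario(good_model))   # {'status': 'ok', 'answer': 7789.0, 'calls': 2}
print(run_scenario(stuck_model))  # {'status': 'loop', 'calls': 1}
```

Grading is then just comparing the final answer (and the call trace) against the scenario's expected values.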
Justin Schroeder
Justin Schroeder@jpschroeder·
We’re open sourcing ArrowJS 1.0: the first UI framework for coding agents. Imagine React/Vue, but with no compiler, build process, or JSX transformer. It’s just TS/JS so LLMs are already *great* at it. AND run generated code securely w/ sandbox pkg. ➡️ arrow-js.com
English
82
141
1.7K
231K
Uncle Bob Martin
Uncle Bob Martin@unclebobmartin·
I gave the new Claude a try, but Codex is still better IMHO.
English
48
18
328
34K