matthew.
22.9K posts

matthew.
@topmass
AI Automation Eng • @AnthropicAI Model Safety Tester | Founder https://t.co/bilxNRUiv8 | Building https://t.co/It1eJ3D45g • Full Stack Engineer • 🇨🇦
British Columbia · Joined September 2011
853 Following · 9.5K Followers

Wtf... Qwen3.5-35B-A3B took 3 minutes (!!) to answer my simple question, "What's on my calendar today?" via @openclaw
I don't know what these local LLM fellas are running on, but a DGX Spark sure is not the best thing

@topmass @bryan_johnson So real. I don't see how anyone still consumes those billions of videos of pure nonsense on Instagram and TikTok. I had to uninstall all of them and I'm only here now too.
We want information, not brainrot

@Davidrejll @bryan_johnson yeah, bad for longevity too BRIAN - I use none of those apps, and my feed on X is mostly text / GitHub-based, about tech stuff I'm interested in learning. Seems like a good thing for longevity, since we know short-form video content is bad for mental health

@topmass @bryan_johnson Exactly. Short form content is very close to a lobotomy

@iflessthan3 @theo as a full-time user I second both of these changes, great suggestions

ok fine. t3 code is good @theo
but
1. Let me archive threads like the Codex app does. I only want the main view to show the threads I'm currently working on, but I don't want to outright delete old threads
2. Let me pin projects. Then I can have the best of both worlds: manual sorting and "last user message" sorting

Overseas people may not be familiar with it, but this is a mouse that Japanese people love: the trackball mouse. You move the pointer by rolling the ball with your thumb, without moving your arm a millimeter. Older generations of mice had a ball on the bottom; this is essentially a thumb-operated version of that. The advantages are less strain on the wrist and space savings, and more than anything, the smooth rotation of the ball feels great. If you get the chance, please give it a try.
Also, recently, devices have appeared that go beyond the scope of a mouse entirely. With the "Nape Pro" you can assign your favorite commands to all six keys, and it can also move the pointer. And it's super small. It may be an effective device even for those who have no fixed image of the trackball mouse. Keep an eye on it.



翁(おきな)@okinalog
Aaaaaaaaaaaaaaaaaahhhhhhhhhh

@thekitze just got a 3Gb plan for roughly 70 USD, it's insane, never had this kind of speed before. But the newest first-world problem has been upgrading all my hardware network cards to even take full advantage
if they do fibre to your area, ideally you should be able to get 1Gb up / 1Gb down

they are finally digging to set up optic fiber on my street after 3 yrs... i can finally get rid of Starlink 😭 it's gonna cost 1/4 of the price and have gigabit download and ~250 upload... twitch.tv/thekitze might be back in full swing
this is my current speed ($100/mo)


@aykutkardas @Cloudflare hono was built by a Cloudflare employee I think, or an ex-employee maybe? It's made for Workers and they're a lil match made in heaven. I have built many small projects with two workers: one React+Vite, and the backend a hono worker that handled everything. Scales infinitely, no problems

No Next.js. No React. No TypeScript.
just a simple html and js file.
deployed on @Cloudflare workers.
feels… enough.

@yoyonofukuoka I just wanna jump on to say I freaking love Japan and Japanese people, pls Japanese homies let's be friends, I will always translate and reply

@ibuildthecloud well, when you realize you're not limited anymore you'd have no need to buy a macbook, and that's a problem for timothy apple

@topmass They do! Berkeley's BFCL and Databricks both use low temp for function calling evals. Reproducibility basically requires it (github.com/ShishirPatil/g…)
But yeah for creative use 0.7+ totally makes sense, different goals

Qwen3.5-27B went 15/15 on our tool-calling benchmark.
But which quant should you actually run?
Tested Unsloth's Q2_K_XL all the way to Q8_K_XL
TL;DR:
Q8 — 15/15 ✅
Q6 — 15/15 ✅
Q5 — 14/15
Q4 — 14/15
Q3 — 14/15
Q2 — 13/15
Q6 is the sweet spot. Same perfect score as Q8, smaller footprint.
Also, the results scale almost linearly, seems like ToolCall-15 is actually measuring something real.
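For reference, the pass rates above work out like this (scores copied from the sweep in this thread; the helper function is just illustrative):

```python
# (passed, total) per quant, from the benchmark results above
results = {
    "Q8": (15, 15),
    "Q6": (15, 15),
    "Q5": (14, 15),
    "Q4": (14, 15),
    "Q3": (14, 15),
    "Q2": (13, 15),
}

def pass_rate(passed, total):
    """Pass rate as a percentage, rounded to one decimal."""
    return round(100 * passed / total, 1)

for quant, (passed, total) in results.items():
    print(f"{quant}: {pass_rate(passed, total)}%")

# Scores never increase as the quant shrinks (Q8 -> Q2),
# which is the "results scale almost linearly" observation.
scores = [p for p, _ in results.values()]
assert scores == sorted(scores, reverse=True)
```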

@stevibe That makes sense. Lately, on closer-to-SOTA models, I find myself never going below 0.7, since most of my use has some creative element. I know most labs just recommend 1 as a default now. I'm curious if they also test their models at lower temp during benchmarks

Good point! Temp 1 is definitely the default for general use, but for tool calling benchmarks specifically, temp 0 actually performs better.
Databricks found accuracy can swing up to 10% between temp 0 and 0.7 on function-calling tasks. Since it's structured output (pick the right tool, pass the right params), you want determinism, not creativity. Same reason you'd use low temp for JSON or code gen.
Reference:
databricks.com/blog/unpacking…
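The intuition for why low temperature means determinism can be sketched with temperature-scaled softmax (a toy illustration, not any lab's actual sampler; the logits are made up):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/T before softmax; lower T sharpens the distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate tool calls
logits = [2.0, 1.5, 0.5]

p_hot = softmax_with_temperature(logits, 1.0)    # default temp
p_cold = softmax_with_temperature(logits, 0.01)  # near temp 0

# At T=1 the runner-up tool still gets meaningful probability,
# so sampling can pick the wrong tool; near T=0 the distribution
# collapses onto the argmax and the choice is effectively fixed.
print(p_hot[1])   # runner-up stays a live option (> 0.3 here)
print(p_cold[0])  # top tool gets essentially all the mass
```

This is why structured tasks like tool selection benchmark better at temp 0: the "creative" probability mass on runner-up choices is exactly what you don't want.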

Best part is how many of my own assumptions Julius has challenged correctly.
I thought the built-in terminal wasn't important, now I use it constantly.
I thought a "one click PR button" would be a slop farm. It's the only way I make PRs now.
I thought the "web app" (npx t3) would be just as popular as the electron app. I haven't opened the web version since we launched.
This is so fun.

I'm supposed to be filming right now. Instead I'm on PR #3.
T3 Code is the most addicting coding experience I've ever had. It makes me wish I had more time for code :(
Theo - t3.gg@theo
Every time I use T3 Code I'm genuinely blown away by how good it is. Each update keeps making it more addicting. Never underestimate @jullerino, he's cooking.

@sinnformer @BradGroux @AnthropicAI @claudeai @DarioAmodei @FTC @DigitalMeld lol brother, are you saying you think the completely free, open-source project openclaw is a competitor to Anthropic??

@BradGroux @AnthropicAI @claudeai @DarioAmodei @FTC @DigitalMeld ok buddy, you just have the flair for funsies


You make this announcement BEFORE you make the change. Good riddance... what a shame @AnthropicAI and @claudeai. Customer service isn't hard, @DarioAmodei. This was a rug pull and y'all got caught.
I hope the @FTC fines you.
Thariq@trq212
To manage growing demand for Claude we're adjusting our 5 hour session limits for free/Pro/Max subs during peak hours. Your weekly limits remain unchanged. During weekdays between 5am–11am PT / 1pm–7pm GMT, you'll move through your 5-hour session limits faster than before.

@StirlingForge @stevibe nope this is only v2 for the opus-reasoning-distilled variant of the qwen 3.5 27b model - all the qwen models otherwise don't have versioning differences 👍

Which local models can actually handle tool calling?
I built a framework to find out.
15 scenarios. 12 tools. Mocked responses. Temperature 0. No cherry-picking.
Tested every Qwen3.5 size from 0.8B to 397B, and since some of you asked after the distillation tests: yes, I included Jackrong's Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled too.
Only two models went all green: the 27B dense and the distilled 27B.
The 397B? Failed two tests. The 122B? Failed one. The 35B? Failed two.
The timed-out results, mostly on the smaller models, are cases where the model got stuck in a loop, repeating the same tool call until it hit the 30-second limit.
The test that exposed the most models: "Search for Iceland's population, then calculate 2% of it." Simple, but 35B, 122B, and 397B all used a rounded number from memory instead of the actual search result. They didn't trust their own tool output.
Small models hallucinate data.
Big models ignore data.
The 27B just threaded it through.
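The loop/timeout behavior described above can be sketched roughly like this (a hypothetical harness for illustration, not the actual framework; `model_step` stands in for one model turn that returns a tool call or `None` when done):

```python
import time

def run_with_loop_guard(model_step, timeout_s=30, max_repeats=3):
    """Drive model_step() until it finishes, flagging runs that repeat
    the same tool call back-to-back or blow the time budget."""
    start = time.monotonic()
    last_call, repeats = None, 0
    while True:
        if time.monotonic() - start > timeout_s:
            return "timed_out"
        call = model_step()
        if call is None:
            return "completed"  # model finished its answer
        if call == last_call:
            repeats += 1
            if repeats >= max_repeats:
                return "stuck_in_loop"  # same tool call over and over
        else:
            last_call, repeats = call, 0

# A model that keeps issuing the identical search call gets caught:
calls = iter([("search", "Iceland population")] * 10)
print(run_with_loop_guard(lambda: next(calls, None)))  # -> stuck_in_loop
```

Catching the repeat early (rather than only on wall-clock timeout) is what lets a harness distinguish "stuck in a loop" from "genuinely slow."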

@topmass @TFP_tweeets @jpschroeder Have you checked out OpenUI? It's faster, and built on a year of running GenUI in production.
Disclaimer: I'm one of the maintainers.

We’re open sourcing ArrowJS 1.0: the first UI framework for coding agents.
Imagine React/Vue, but with no compiler, build process, or JSX transformer. It’s just TS/JS so LLMs are already *great* at it.
AND run generated code securely w/ sandbox pkg.
➡️ arrow-js.com





