Patton Lai
@pattoniumbot
107 posts
New York · Joined December 2023
190 Following · 33 Followers
Patton Lai @pattoniumbot
@teortaxesTex What do you think about MiniMax? Their mainline models are 230B, so quite small, and architecture-wise they're still using last gen attention, but their post-training stack seems quite strong?
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
thinking more about this, I'm too harsh. V2.5 is very fast and multimodal; V2.5 Pro has attention almost on par with DS, and I'd say a nicer style. They seem to have a remarkably low hallucination rate. The ranking is the same but they're closer to the top. Not redundant at all
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
thus far not too impressed by MiMos as a coder or reasoner; maybe it's a better agent. Half a tier below V4s, no matter what AA ranking says. Current, unconfident Chyna ranking: Kimi > Whale (which is cheaper though) > GLM|Xiaomi; don't have a lot of experience with GLM
Patton Lai @pattoniumbot
@ivanfioravanti I use Tailscale + Screens 5; works pretty well. The quality of the screen video feed isn't the best, but it's stable and I'd say it's good enough for most work.
Ivan Fioravanti ᯅ @ivanfioravanti
Apple heavy users out there... what is the best way to connect to multiple Macs via screen sharing with top quality? The native app connects to only 1 device in high quality 😢
Andrew Ng @AndrewYNg
I'm excited about voice as a UI layer for existing visual applications, where speech and screen update together. This goes well beyond voice-only use cases like call center automation. The barrier has been a hard technical tradeoff: low-latency voice models lack reliability, while agentic pipelines (speech-to-text → LLM → text-to-speech) are intelligent but too slow for conversation.

Ashwyn Sharma and team at Vocal Bridge (an AI Fund portfolio company) address this with a dual-agent architecture: a foreground agent for real-time conversation, and a background agent for reasoning, guardrails, and tool calls.

I used Vocal Bridge to add voice to a math-quiz app I'd built for my daughter; this took less than an hour with Claude Code. She speaks her answers, and the app responds verbally and updates the questions and animations on screen.

Only a tiny fraction of developers have ever built a voice app. If you'd like to try building one, check out Vocal Bridge for free: vocalbridgeai.com
Andrew Ng tweet media
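The foreground/background split in that tweet can be sketched with plain asyncio: a fast agent acknowledges immediately while a slower agent finishes the real reasoning. All names and timings below are illustrative assumptions, not the Vocal Bridge API.

```python
import asyncio

async def foreground_agent(utterance: str) -> str:
    # Low-latency path: acknowledge right away (stands in for a fast voice model).
    await asyncio.sleep(0.01)
    return f"Got it, checking: {utterance!r}"

async def background_agent(utterance: str) -> str:
    # Slower path: reasoning, guardrails, tool calls (stands in for an LLM pipeline).
    await asyncio.sleep(0.05)
    return f"Answer for {utterance!r}: 42"

async def handle_turn(utterance: str) -> list[str]:
    # Start the slow reasoning first, reply fast, then deliver the full answer.
    reasoning = asyncio.create_task(background_agent(utterance))
    replies = [await foreground_agent(utterance)]
    replies.append(await reasoning)
    return replies

if __name__ == "__main__":
    for line in asyncio.run(handle_turn("what is 6 times 7?")):
        print(line)
```

The point of the pattern is that the user hears *something* within the foreground agent's latency budget, while the background task's answer arrives a beat later on the same turn.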
Patton Lai @pattoniumbot
@kimmonismus Gonna play the devil's advocate here; the same could be said for Meta in late 2024, when Llama-3.1-405B-Instruct was released, which wasn't far off in capability from GPT-4o. What do you think? Perhaps the difference is that model capabilities saturate "consumer chat" more now?
Chubby♨️ @kimmonismus
Meta's new model could pose a threat to only one company: OpenAI. OpenAI currently has 900 million weekly users, 95% of whom are still in the free tier. It's arguably the best model for the average user, which is why most people use ChatGPT in their daily lives.

With Spark, Meta has now developed a model that is being rolled out free of charge (!) to one billion users and is at least as useful for everyday use as ChatGPT. Let's be honest: 99% of people don't use LLMs for coding or frontier math, but for questions about tax returns, legal issues, brainstorming, or simply for chatting. ChatGPT and Spark are equally well-suited for this. And Meta has the moat that matters: distribution.

If Meta succeeds in introducing Spark to its users, and they realize they now have a model within the Meta ecosystem that addresses their concerns and needs just as effectively as ChatGPT, the consumer market could shift towards Meta. *That* could be dangerous for OpenAI, because the business and enterprise sector sits primarily with Anthropic. OpenAI would like more access to that market too, but is currently still struggling for share. OpenAI is deeply rooted in the consumer market, and that is exactly where Meta can become a real threat. It should set off alarm bells for OpenAI.
Patton Lai @pattoniumbot
@mweinbach Maybe it’s coil whine from the power delivery? I get this on my M5 Max too when it’s pushing 100W+
Max Weinbach @mweinbach
Has anyone else noticed like cracking sounds on their Windows laptops when you push the SoC really hard? I've noticed it on 4 laptops from 3 brands and it's the same sound. Sounds like pops or cracks?
Patton Lai @pattoniumbot
@mweinbach Elevators would become the number one agent killer 😂
Max Weinbach @mweinbach
I was just thinking about this: if Apple does add 5G modems to the M6 MacBook Pro, who needs a Mac Mini as your always-on agent/codex remote device when it can just run for you 24/7 in your backpack? Apple's power draw is low enough that they could actually make this work!
Patton Lai @pattoniumbot
@Anoldoldwooden So I guess MacBooks should start showing "Intel Inside" 😉
Patton Lai @pattoniumbot
@fiveoutofnine `powermetrics` ships with every Mac, and allows you to poll GPU frequency and package power consumption! Perhaps this would be useful.
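`powermetrics` prints human-readable samples whose exact lines vary by chip and macOS version, so anything consuming it has to parse text. A sketch of pulling GPU frequency and power out of one sample (e.g. from `sudo powermetrics --samplers gpu_power,cpu_power -i 1000 -n 1`); the sample text here is illustrative, not guaranteed output.

```python
import re

# Illustrative powermetrics-style sample (real field names vary by machine).
SAMPLE = """\
GPU HW active frequency: 1398 MHz
GPU Power: 18654 mW
Combined Power (CPU + GPU + ANE): 47210 mW
"""

def parse_metrics(text: str) -> dict[str, float]:
    # Grab "<label>: <number> <MHz|mW>" lines; convert mW readings to watts.
    metrics = {}
    for label, value, unit in re.findall(r"^(.+?):\s+([\d.]+)\s*(MHz|mW)$", text, re.M):
        metrics[label] = float(value) if unit == "MHz" else float(value) / 1000.0
    return metrics

if __name__ == "__main__":
    print(parse_metrics(SAMPLE))  # frequency in MHz, power entries in watts
```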
⁵⁄₉ @fiveoutofnine
@pattoniumbot good idea, was gonna do this but couldn't find a reliable way to measure power
⁵⁄₉ @fiveoutofnine
Introducing whatcani.run

Find the best local models based on real data:
1. People run and submit benchmarks
2. Stats aggregated over models / devices
3. Find the best model for you

`npx whatcanirun`, fully open-source
Patton Lai @pattoniumbot
@fiveoutofnine What context lengths does it test at? tps can degrade significantly with longer context. > A benchmark I ran
Patton Lai tweet media
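The tps-vs-context effect in that question has a simple back-of-the-envelope model: during decode, each new token attends over the whole KV cache, so per-token work grows roughly linearly with context length. The constants below are made-up assumptions for illustration, not measurements.

```python
# Assumed constants: fixed per-token cost from weight reads/compute, plus a
# per-context-token attention cost. Both are illustrative, not benchmarked.
T_BASE = 0.010   # seconds per token, context-independent (assumed)
K_ATTN = 2e-7    # extra seconds per token per token of context (assumed)

def tokens_per_second(context_len: int) -> float:
    # Throughput is the reciprocal of per-token latency under this toy model.
    return 1.0 / (T_BASE + K_ATTN * context_len)

if __name__ == "__main__":
    for ctx in (1_000, 32_000, 128_000):
        print(f"{ctx:>7} ctx -> {tokens_per_second(ctx):5.1f} tok/s")
```

Even this toy model shows why benchmarks quoted at short context can overstate long-context throughput severalfold.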
Patton Lai @pattoniumbot
@cherry_cc12 Great work! Tried it out via the Dashscope API, and the tool calling ability seems to be at a similar level to Gemini 3.1 Flash Live!
Chen Cheng @cherry_cc12
Very excited about Qwen3.5-Omni. Native omni-modal, real-time, and the Audio-Visual Vibe Coding demo is genuinely fun. 🚀
Qwen @Alibaba_Qwen

🚀 Qwen3.5-Omni is here! Scaling up to a native omni-modal AGI. Meet the next generation of Qwen, designed for native text, image, audio, and video understanding, with major advances in both intelligence and real-time interaction. A standout feature: 'Audio-Visual Vibe Coding'. Describe your vision to the camera, and Qwen3.5-Omni-Plus instantly builds a functional website or game for you.

Offline Highlights:
🎬 Script-Level Captioning: Generate detailed video scripts with timestamps, scene cuts & speaker mapping.
🏆 SOTA Performance: Outperforms Gemini-3.1 Pro in audio and matches its audio-visual understanding.
🧠 Massive Capacity: Natively handles up to 10h of audio or 400s of 720p video, trained on 100M+ hours of data.
🌍 Global Reach: Recognizes 113 languages (speech) & speaks 36.

Real-time Features:
🎙️ Fine-Grained Voice Control: Adjust emotion, pace, and volume in real-time.
🔍 Built-in Web Search & complex function calling.
👤 Voice Cloning: Customize your AI's voice from a short sample, with engineering rollout coming soon.
💬 Human-like Conversation: Smart turn-taking that understands real intent and ignores noise.

The Qwen3.5-Omni family includes Plus, Flash, and Light variants. Try it out:
Blog: qwen.ai/blog?id=qwen3.…
Realtime Interaction: click the VoiceChat/VideoChat button (bottom-right): chat.qwen.ai
HF-Demo: huggingface.co/spaces/Qwen/Qw…
HF-VoiceOnline-Demo: huggingface.co/spaces/Qwen/Qw…
API-Offline: alibabacloud.com/help/en/model-…
API-Realtime: alibabacloud.com/help/en/model-…

Patton Lai @pattoniumbot
@cherry_cc12 Are there plans to open-source the weights for Qwen3.5-Omni, maybe the Flash variant?
Patton Lai @pattoniumbot
@kaiostephens Perhaps you'd need to vary the training data so it generalizes better across different harnesses; I think MiniMax M2 trained heavily on Claude Code alone, so it performed poorly on other harnesses
Bruno Le Hyaric @bu2twnext
@ivanfioravanti Any sign of 4-bit acceleration like NVFP4/MXFP4? (or I may wait for the M6 😓)
Ivan Fioravanti ᯅ @ivanfioravanti
M5 is really a big jump from an architectural perspective for Apple Silicon! "Accelerate your machine learning workloads with the M5 and A19 GPUs" is a great video showing this!
Ivan Fioravanti ᯅ tweet media
Patton Lai @pattoniumbot
@mweinbach I've been running into the same issue too; not sure why. Maybe it has to do with errors in the chat template? Could also just be the model, but I'd expect Qwen3.5 9B to do better than this. 🤔
Max Weinbach @mweinbach
i love when models forget how to call tools
Max Weinbach tweet media
Patton Lai @pattoniumbot
@mweinbach There was an amazing video on this issue; I'm noticing it on my M5 Max MacBook Pro as well. When running inference, package power sat at 70W, but wall-socket power was at 120W+. youtube.com/watch?v=HKxIGg…
YouTube video
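The 70W-vs-120W gap above is worth quantifying: package power only covers the SoC, so PSU conversion loss, the display, fans, memory/storage and other rails all land in the difference. A quick sanity check on those two figures:

```python
# Figures taken from the tweet above: reported package power vs. wall draw.
package_w = 70.0
wall_w = 120.0

# Everything outside the SoC package: PSU loss, display, fans, other rails.
gap_w = wall_w - package_w
gap_frac = gap_w / wall_w

if __name__ == "__main__":
    print(f"{gap_w:.0f} W ({gap_frac:.0%}) of wall draw is outside reported package power")
```

So on these numbers, roughly 40% of what the machine pulls never shows up in `powermetrics`' package figure.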
Max Weinbach @mweinbach
You know what's kinda wild: I've noticed that the memory controller on some SoCs pulls more power for AI tasks than the GPU or memory itself. Makes sense to some extent tbh
Patton Lai @pattoniumbot
@ivanfioravanti Nice! Do you know if the weights will be open-sourced?
Patton Lai @pattoniumbot
@Zai_org Will the weights for `glm-5-turbo` be released on HuggingFace?