乌鱼小子

1.7K posts

@mintisan

Step out of your own comfort zone, then wander into other people's comfort zones for a look around.

Shenzhen, China · Joined March 2014
1.5K Following · 128 Followers
Dan Woods
Dan Woods@danveloper·
We’re gonna see someone running a 1T model at 100tok/s on a $2500 laptop by like a month from now. Long Apple, this is the best AI inference platform.
English
16
11
225
15.4K
Mayank Pratap Singh
Mayank Pratap Singh@Mayank_022·
I coded a Speech-to-Text model from scratch. Here is the blog for the same: blogs.mayankpratapsingh.in/chapters/speec…

No APIs. No pre-trained models. Just PyTorch, an A100 GPU, and hours of debugging.

This started months ago. I wanted to understand how machines hear. Not surface-level understanding. I wanted to build the whole thing myself. So I built it piece by piece: autoencoders, VAEs, VQ-VAEs, Residual Vector Quantization, and CTC loss. Each one took days to get right.

Trained for 3 hours on 13,100 audio clips. Got complete garbage. Changed the tokenizer from BPE to character-level. Rechecked everything. Asked @neural_avb, who has built STT models before. His answer: these models are tricky to train and need days of compute, not hours.

Cut the dataset to 200 clips. After 2 hours, actual words appeared. Overfitted? Absolutely. But watching noise turn into recognizable English was satisfying.

I wrote a blog about this so you can follow the process:
- Audio fundamentals and waveform representation
- Why attention breaks on raw audio
- Convolutional downsampling
- Transformer encoder with positional encoding
- Vector Quantization, straight-through estimator, and RVQ
- CTC loss and greedy decoding
- Full training loop with VQ loss warmup
- What went wrong and what finally worked

Resources:
- Blog: blogs.mayankpratapsingh.in/chapters/speec…
- Code: github.com/Mayankpratapsi…

More resources:
- CTC loss: distill.pub/2017/ctc/
- @neural_avb videos: youtube.com/@avb_fj
- SoundStream paper: arxiv.org/abs/2107.03312
- LJ Speech dataset: keithito.com/LJ-Speech-Data…
- wav2vec paper: arxiv.org/abs/2006.11477
- RVQ blog: drscotthawley.github.io/blog/posts/202…

Next up: I've already trained two TTS architectures from scratch. A video post about those is coming soon. But first, I'm dropping a visual breakdown of Vision Transformers, covering how they work and how to fine-tune them.

Follow me @Mayank_022 if you're into audio deep learning. Repost so others can find this.
English
21
37
447
18.8K
全麦面包🍞
全麦面包🍞@im_magneto·
My view used to be: neural networks are not only slow but heavy on load, and simple tasks should be handled by traditional algorithms. Then, over the past few months, I reworked the model's design from the ground up. Now my classification model runs on a single-core MCU weaker than a Cortex-A53 and finishes a recognition task in under 1 ms. By comparison, my traditional algorithm takes 2 ms for a simple color-detection task, and it isn't even accurate...
Chinese
33
5
252
33.3K
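Sub-millisecond inference on a weak MCU core typically comes from quantizing the network to integer-only arithmetic. As a rough illustration (hypothetical sizes and scales, not the author's model), here is the core kernel such a classifier runs: an int8 dense layer with an int32 accumulator and fixed-point requantization.

```python
# Illustrative int8 inference for one dense layer, the kind of kernel a
# quantized classifier executes on an MCU. All numbers are made up.

def int8_dense(x_q, w_q, bias_q, m_num, m_den):
    """y = requantize(W @ x + b) using integer-only arithmetic.

    x_q: int8 inputs; w_q: rows of int8 weights; bias_q: int32 biases;
    m_num/m_den: fixed-point multiplier approximating the float scale.
    """
    out = []
    for row, b in zip(w_q, bias_q):
        acc = b + sum(wi * xi for wi, xi in zip(row, x_q))  # int32 accumulate
        y = (acc * m_num) // m_den                          # requantize
        out.append(max(-128, min(127, y)))                  # saturate to int8
    return out

x = [12, -3, 40, 7]
w = [[2, -1, 0, 3], [-4, 5, 1, -2]]
b = [10, -20]
print(int8_dense(x, w, b, 1, 8))  # prints "[7, -8]"
```

On real hardware this loop compiles down to multiply-accumulate instructions with no floating point at all, which is how a small network can undercut a hand-written float-based traditional algorithm.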
李继刚
李继刚@lijigang·
An institution whose words and deeds don't match breeds performance.
Chinese
13
40
380
30.1K
花果山大圣
花果山大圣@shengxj1·
The term 脱脂牛马 ("fat-free workhorse") is just too funny. Working out after getting off work, an act of pure self-discipline, suddenly sounds so bitter.
Chinese
36
39
595
55K
STRRL.gpt
STRRL.gpt@strrlthedev·
Come to think of it, a single M-chip MacBook / Mac mini can in theory serve as a GitHub Actions runner for macos arm64, linux arm64, and linux amd64 all at the same time. Meanwhile the GitHub Actions quota on personal Pro and Team plans is really tight, so we hacked together a tool that registers your Mac as a runner with one click 😡 It's almost done; we'll open-source it once it's finished.
Chinese
19
5
125
15.4K
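For context on what such a tool automates: registering a machine as a self-hosted runner goes through GitHub's actions-runner package. A rough sketch of the manual steps (OWNER/REPO, TOKEN, and the release version are placeholders; the Linux arm64/amd64 variants would run the same steps inside Docker containers on the Mac):

```shell
# Register one macOS arm64 self-hosted runner (placeholder values throughout).
mkdir actions-runner && cd actions-runner
curl -o actions-runner.tar.gz -L \
  https://github.com/actions/runner/releases/download/v2.321.0/actions-runner-osx-arm64-2.321.0.tar.gz
tar xzf actions-runner.tar.gz
./config.sh --url https://github.com/OWNER/REPO --token TOKEN \
  --labels self-hosted,macOS,ARM64 --unattended
./run.sh   # runner now polls GitHub for jobs matching its labels
```

The per-runner registration token and the repetition across three architectures are exactly the friction a one-click registration tool would hide.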
Ivan Fioravanti ᯅ
Ivan Fioravanti ᯅ@ivanfioravanti·
MiniMax M2.7 first benchmarks look great, confirming the good vibes! Thanks @v2fffvxhyz for sharing this!
Ivan Fioravanti ᯅ tweet media
English
8
12
171
11.7K
yazin
yazin@yazins·
Introducing: OpenGranola 🔥 I built an open source meeting copilot for macOS. It transcribes both sides of your call on-device, searches your own notes in real time, and hands you talking points right when the conversation needs them. No audio leaves your Mac. Point it at a folder of markdown files, pick any LLM through OpenRouter (Claude, GPT-4o, Gemini, Llama), and it just works. It's invisible to screen share too — nobody knows you have it. The whole thing is open source. Link below
English
161
107
2.3K
283.1K
0xSero
0xSero@0xSero·
I got access to MiniMax-M2.7. It successfully set up the Hermes agent on my Mac mini (SSHing into it from my MacBook), got tg set up, and ported my Droid config over. Excellent. Send your prompts and I'll pick a few to show off in a video. Any prompt you send will go straight to MiniMax.
0xSero tweet media
English
43
6
308
19.6K
Belen Alastruey
Belen Alastruey@b_alastruey·
Happy to share 🌍Omnilingual Machine Translation🌍 In this work @AIatMeta we explore translation systems supporting 1,600+ languages. We show how our models (1B to 8B) can outperform baselines of up to 70B while having much larger language coverage. 📄:ai.meta.com/research/publi…
Belen Alastruey tweet media
English
11
43
187
22K
Unsloth AI
Unsloth AI@UnslothAI·
Introducing Unsloth Studio ✨ A new open-source web UI to train and run LLMs. • Run models locally on Mac, Windows, Linux • Train 500+ models 2x faster with 70% less VRAM • Supports GGUF, vision, audio, embedding models • Auto-create datasets from PDF, CSV, DOCX • Self-healing tool calling and code execution • Compare models side by side + export to GGUF GitHub: github.com/unslothai/unsl… Blog and Guide: unsloth.ai/docs/new/studio Available now on Hugging Face, NVIDIA, Docker and Colab.
English
214
822
5K
1.4M
乌鱼小子
乌鱼小子@mintisan·
@hwwaanng It's a gacha pull by nature; roll a few more times and the success rate comes up 😁
Chinese
0
0
0
882
Hwang
Hwang@hwwaanng·
Coding these days is just "too much water, add flour; too much flour, add water." If Claude doesn't work, switch to Codex; if Codex doesn't work, try the quack doctor Gemini, which might just work wonders.
Chinese
55
29
439
55.3K
Mr Panda
Mr Panda@PandaTalk8·
A question for the gurus on here: if you had to pick the top three Chinese-made models, which would you choose?
Chinese
130
0
60
67.9K
乌鱼小子
乌鱼小子@mintisan·
@onevcat It would be even better if it could record all the inputs/outputs given to the LLM between one commit and the next.
Chinese
0
0
0
582
onevcat
onevcat@onevcat·
I've been using jj github.com/jj-vcs/jj for a while now and wrote a post recommending it. Git is the standard for collaboration, but for working locally alongside AI agents, jj's mental model turns out to be a noticeably better fit. Simple, efficient, no interruptions: the prompts you give an agent can finally talk about the work itself instead of the workflow. The post includes real-world comparisons and a companion agent skill you're welcome to use. onevcat.com/2026/03/jj-for…
onevcat tweet media
Chinese
12
14
160
32.7K
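A taste of the mental-model difference described above, as a hypothetical session (see the linked post for real comparisons): in jj there is no staging area and no "dirty working copy" to manage, because the working copy is itself a commit that absorbs edits as they happen, which is what makes agent-driven editing low-friction.

```shell
# Minimal jj session (commands from the jj CLI; the repo URL is a placeholder).
jj git clone https://github.com/OWNER/REPO && cd REPO
jj new -m "agent: refactor parser"   # start a new change; file edits amend it automatically
# ...let the agent edit files; no `git add` step, the working copy *is* the commit...
jj describe -m "refactor parser into a separate module"   # reword the change afterwards
jj log                               # inspect the change graph
```

Since every state is already a commit, an agent never has to be prompted about staging, stashing, or committing, only about the code change itself.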
Matthew Berman
Matthew Berman@MatthewBerman·
.@nvidia hand delivered a pre-production unit of the @Dell Pro Max with GB300 to my house. 100lbs beast with 750GB+ of unified memory to power the best open-source models in the world. What should I test first?
English
302
103
1.9K
251.4K