Bytefer

157 posts

Bytefer

@TBytefer

Building Vidpai, a local‑first AI content creation studio for Mac that runs open‑source models on your machine. https://t.co/7IN7nJTOl8

参加日 Ağustos 2022

140 フォロー中303 フォロワー

Bytefer@TBytefer·8 Mar

@Prince_Canuma Thanks, I'll test it out.

English

Prince Canuma@Prince_Canuma·8 Mar

@TBytefer My pleasure! Yes, it does output progress if you stream

English

100

Prince Canuma@Prince_Canuma·7 Mar

mlx-audio v0.4.0 is here 🚀 What's new: → Qwen3-TTS: fastest generation on Apple silicon and first batch support. > Sequential (<80 ms TTFB at 2.75x realtime) > Batch support (<210 ms TTFB at 4.12x for batch of 4-8) → Audio separation UI & server → nvfp4, mxfp4, mxfp8 quantization → Streaming /v1/audio/speech endpoint → Realtime STT streaming toggle New models: → Echo TTS → Voxtral Mini 4B, → MingOmni TTS (MoE + Dense) → KittenTTS → Parakeet v3 → MedASR → Spoken language identification (MMS-LID) → Sortformer diarization + Smart Turn v3 semantic (VAD) Plus fixes for Kokoro Chinese TTS, Pocket TTS, Whisper, Qwen3-ASR, and more. Thank you very much to @lllucas, @beshkenadze, @KarnikShreyas, @andimarafioti, @mnoukhov and welcome the 13 new contributors 🙌🏽 Get started today: > pip install -U mlx-audio Leave us a star ⭐ github.com/Blaizzy/mlx-au…

English

269

21.5K

Bytefer@TBytefer·6 Mar

Chatterbox Turbo beats ElevenLabs Turbo v2.5 & Cartesia Sonic 3 is unbelievable. 1. Paralinguistic tags 2. <150ms time-to-first-sound 3. Instant voice cloning from just 5 seconds of audio. @Prince_Canuma Thanks to mlx-audio, you can easily use it in Vidpai.

English

Bytefer@TBytefer·6 Mar

npx skills add vercel-labs/agent-browser --skill dogfood dogfood — Systematic exploratory testing. Navigates an app like a real user, finds bugs and UX issues, and produces a structured report with screenshots and repro videos. agent-browser.dev/skills

English

Bytefer@TBytefer·6 Mar

My new QA agent is here 🚀 🧭 Explored like a real user (pages, forms, buttons) 📸 Caught bugs with numbered repro steps + screenshots 🎥 Recorded videos for interactive issues 📊 Classified by severity 📝 Generated a markdown report (ready for the team)

English

Bytefer@TBytefer·4 Mar

ZXX

Bytefer@TBytefer·4 Mar

Got a great video that dies in silence because there are no captions? Use local AI to generate perfectly synced subtitles, then style them into eye‑catching captions — ready for TikTok, Reels, Shorts, and X.

English

Bytefer@TBytefer·4 Mar

@DnuLkjkjh You can build upon the FluidAudio library and utilize the NVIDIA Parakeet model for transcription, which offers both high speed and greater accuracy. For purely English scenarios, use the V2 model; for multilingual scenarios, use the V3 model.

English

dnu@DnuLkjkjh·4 Mar

mlx-audio is seriously underrated. I've been running whisper models through MLX for on-device transcription on M-series Macs — the performance jump from M4 to M5 should make real-time transcription of 2+ hour recordings completely smooth. right now M4 handles it but you can feel the memory pressure on long sessions with the larger models

English

Bytefer@TBytefer·4 Mar

🔥 M5, M5 Pro, M5 Max are finally here! The real game-changer? On-device AI processing: 🎧 Blaizzy/mlx-audio ⚡️ FluidInference/FluidAudio Two open-source projects that turn your Mac into an AI-powered Swiss Army knife for audio & video. #M5 #AI #OpenSource #Mac

English

130

Bytefer@TBytefer·4 Mar

@Prince_Canuma The new era of MLX has arrived, and mlx-audio is my favorite open-source project. Thanks to the MLX King.

English

Prince Canuma@Prince_Canuma·3 Mar

M5 Pro & M5 Max just dropped! Here are the specs: > M5 Pro: 18-core CPU, 20-core GPU, 307 GB/s bandwidth > M5 Max: 18-core CPU, 40-core GPU, 614 GB/s bandwidth > Neural Accelerators in every GPU core. 4x AI performance jump. Up to 128GB unified memory. Running LLMs locally on Apple Silicon using MLX has never looked this good.

English

143

10.9K

Bytefer@TBytefer·4 Mar

@lmstudio This is awesome! LM Studio is my best partner. Running local MLX models is so convenient.

English

134

LM Studio@lmstudio·3 Mar

LM Studio featured in Apple's new M5 announcement! 🥳🚀

English

1.1K

60.1K

Bytefer@TBytefer·3 Mar

👉 Try Vidpai and start your local AI studio today. vidpai.com/en

English

Bytefer@TBytefer·3 Mar

On a Mac but still feel locked out of “real” AI tools? Most of them are like: ❌ No NVIDIA, no AI studio for you That’s why I built Vidpai. ✅ Run OpenAI / NVIDIA / Alibaba open‑source models locally on your Mac ✅ TTS, STT, voice cloning, creating AI videos

English

110

Bytefer@TBytefer·21 Kas

@ludrex35 It might be a network issue during initialization, you could try refreshing the page.

English

Ludrex@ludrex35·21 Kas

@TBytefer I'd love to use this, but it's getting stuck at 95%

English

Bytefer@TBytefer·21 Kas

It's not just fast. It's >12,000 characters per second fast. 🚀 Supertonic TTS performs 100% on your device: ⚡ High-end GPU: 12,000+ chars/sec 💻 Your Laptop: ~2,500 chars/sec How fast is YOUR machine? Benchmark it now! tts.batchtool.com #Gemini3 #TTS #AI

English

352

Bytefer@TBytefer·5 Haz

@paoloanzn VIDEO

English

4nzn@paoloanzn·4 Haz

the creator who's banking $47K monthly from TikTok asked me not to share this automation... these brain-rot style videos are pulling 10M+ views consistently on TikTok and YouTube Shorts faceless channels are banking $15K-$50K monthly from this exact format here's what my n8n workflow does automatically: - generates viral Peter Griffin & Stewie conversations - clones their actual voices using AI - overlays them on Minecraft parkour gameplay - adds perfectly timed subtitles - renders complete videos ready to upload these videos exploit dopamine triggers the background gameplay keeps viewers glued while the AI-generated dialogue delivers the actual content it's psychological manipulation at its finest I spent 40+ hours perfecting this: - gathering rare audio samples for voice cloning - training custom AI voice models for both characters - building the complete automation workflow - testing render quality and timing the results speak for themselves channels using this format are going from 0 to 100K followers in weeks some are hitting 50M+ monthly views with zero effort after setup the workflow handles everything: -content ideation -script generation -voice synthesis - video composition - final rendering you literally press one button and get viral-ready content while everyone's still manually editing videos, smart creators are scaling with automation like this the opportunity window is MASSIVE right now these formats are exploding across all platforms Comment "VIDEO" + repost this + follow me (MUST DO ALL) and I'll send you the complete JSON file plus setup guide a lot of people will ingore this and last yet another occasion to finally make it don't sleep on it.

English

1.3K

4.6K

758.7K

Bytefer@TBytefer·29 Kas

@tombielecki @xenovacom @deepseek_ai Download the model on first use, then use the Cache API to save the downloaded model: developer.mozilla.org/en-US/docs/Web….

English

Tom Bielecki@tombielecki·28 Kas

@xenovacom @deepseek_ai Are the models somehow cached locally? Or do users have to wait for the model to download each time they open the page?

English

207

Xenova@xenovacom·28 Kas

We just released Transformers.js v3.1 and you're not going to believe what's now possible in the browser w/ WebGPU! 🤯 Let's take a look: 🔀 Janus from @deepseek_ai for unified multimodal understanding and generation (Text-to-Image and Image-Text-to-Text) 👁️ Qwen2-VL from @alibaba_qwen for dynamic-resolution image understanding 🔢 JinaCLIP from @JinaAI_ for general-purpose multilingual multimodal embeddings 🌋 LLaVA-OneVision from @ByteDanceOSS for Image-Text-to-Text generation 🤸‍♀️ ViTPose for pose estimation 📄 MGP-STR for optical character recognition (OCR) 📈 PatchTST & PatchTSMixer for time series forecasting That's right, everything running 100% locally in your browser (no data sent to a server)! 🔥 Huge for privacy! Check out the release notes for more information. 👇

English

199

20.3K

Bytefer@TBytefer·29 Kas

@xenovacom @deepseek_ai It supports mathematical formulas and has the capability to understand images with dynamic resolution, which is truly impressive.

English

107

Bytefer@TBytefer·29 Kas

@ollama 👏，Ollama is so powerful, with it I was able to develop ollama-ocr: github.com/bytefer/ollama…

English

2.7K

ollama@ollama·28 Kas

We've made tool calls work with streaming responses in Ollama 0.4.6! Happy Thanksgiving! github.com/ollama/ollama/…

English

556

27.5K

Bytefer@TBytefer·29 Kas

@TheJSDev That's so cool, thanks for the recommendation.

English

The JavaScript Dev@TheJSDev·29 Kas

Discover how to leverage Llama 3.2-Vision with Ollama-OCR for precise optical character recognition, preserving text structure and format. { author: @TBytefer } #DEVCommunity #JavaScript dev.to/bytefer/ollama…

English

582

ディスカバー

@Prince_Canuma @lllucas @beshkenadze @KarnikShreyas @andimarafioti @mnoukhov @DnuLkjkjh @lmstudio