Kam Moriss

135 posts

@KamMoriss

Grind

Joined May 2024

12 Following · 2 Followers

Kam Moriss retweeted

Nav Toor@heynavtoor·2d

🚨 ElevenLabs charges $5 to $99/month for AI voice cloning. Their Business plan costs $1,320/month.

Someone open sourced a voice AI that clones any voice from a short clip. 30 languages. Studio quality. Free.

It's called VoxCPM2.

Give it a short clip of anyone's voice. It clones their accent, emotion, tone, and pacing. Then generates any speech you want in their exact voice. 48kHz studio quality.

Type "A young woman, gentle and sweet voice" and it creates that voice from scratch. No reference audio. No voice actor. No recording. You describe a voice in words. It builds it.

2 billion parameters. Trained on 2 million hours of speech. 30 languages.

One command to install: pip install voxcpm

Here's what VoxCPM2 does:

→ Voice Design: describe any voice in words. Gender, age, tone, emotion, pace. AI creates it from nothing. No reference audio needed.
→ Voice Cloning: upload a short audio clip. AI clones the voice perfectly. Timbre, accent, rhythm, pacing.
→ Controllable Cloning: clone a voice AND control the emotion. "Slightly faster, cheerful tone." Done.
→ Ultimate Cloning: provide audio + transcript. Every vocal nuance faithfully reproduced.
→ 30 languages. Arabic, Chinese, English, French, German, Hindi, Japanese, Korean, Spanish, and 21 more. No language tags needed.
→ Context-aware. It reads the text and adjusts emotion and rhythm automatically. News sounds like news. Stories sound like stories.
→ Real-time streaming. RTF as low as 0.13 on an RTX 4090. Faster than playback speed.
→ Runs on 8GB of VRAM.
→ Fine-tune with 5 to 10 minutes of your own audio using LoRA. Build a custom voice model.
→ 48kHz output. Studio quality. No external upsampler needed.

Here's the wildest part:

On the Minimax-MLS voice similarity benchmark:

→ English: VoxCPM2 scores 85.4%. ElevenLabs scores 61.3%.
→ Chinese: VoxCPM2 scores 82.5%. ElevenLabs scores 67.7%.
→ Arabic: VoxCPM2 scores 79.1%. ElevenLabs scores 70.6%.

A free, open source model is producing more realistic voice clones than a service that charges up to $1,320/month.

Professional voice actors charge $250 to $1,000+ per project. AI voice platforms charge $5 to $100/month. Recording studios charge $200/hour.

This runs on your GPU. Locally. No API costs. No per-character pricing. No subscription. Free forever.

Already hit #1 on GitHub Trending.

Built by OpenBMB and Tsinghua University. 2 billion parameters. Apache 2.0 License. Free for commercial use.

100% Open Source.
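For readers unfamiliar with the "RTF as low as 0.13" figure in the post: real-time factor is usually defined as synthesis time divided by the duration of the audio produced, so anything below 1.0 is faster than playback. The sketch below only works through that arithmetic. The function names are made up for illustration and are not part of the voxcpm package, and the 0.13 value is taken from the post as quoted, not measured here.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Seconds of compute spent per second of audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_seconds_for(audio_seconds: float, rtf: float) -> float:
    """Compute time needed to generate a clip of the given length at a given RTF."""
    return audio_seconds * rtf


# Figure quoted in the post for an RTX 4090 (an assumption here, not a measurement).
QUOTED_RTF = 0.13

one_minute = synthesis_seconds_for(60.0, QUOTED_RTF)
speedup = 1.0 / QUOTED_RTF

print(f"60 s of audio takes about {one_minute:.1f} s to generate")
print(f"that is roughly {speedup:.1f}x faster than playback")
```

An RTF below 1.0 is what makes streaming possible in principle, since audio can be produced faster than it is consumed. The best-case figure says nothing about latency to the first chunk or about slower GPUs.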

100

603

4.5K

429.4K

Kam Moriss retweeted

Om Patel@om_patel5·2d

THIS GUY BUILT A TOOL THAT LETS CLAUDE CODE AUTONOMOUSLY TEST YOUR ENTIRE iOS APP

you point it at a simulator and say "test everything"

Claude navigates the whole app on its own through the accessibility tree and screenshots. it figures out the UI by itself.

it taps buttons, fills forms, opens every screen, tests every feature, and checks every flow

in 8 minutes it found every bug the developer missed

then it checked the debug logs for errors and gave a structured summary of everything it found

no XCUITest scripts, no test maintenance, and no more writing confusing, complicated test cases

one prompt and that's it

165

268.8K

Kam Moriss retweeted

Rimsha Bhardwaj@heyrimsha·3d

SOMEONE GOT TIRED OF PAYING HIGGSFIELD AI'S SUBSCRIPTION SO HE REBUILT THE WHOLE THING AND OPEN-SOURCED IT

200+ models. text-to-image, image-to-image, text-to-video, image-to-video all in one interface

you configure a virtual camera in the Cinema Studio. pick the body, the lens, the focal length, the aperture and it writes the optimized cinematic prompt for you. completely in the background

you never touch the camera keywords. you just set up the shot like a real cinematographer would

Kling v3, Sora 2, Veo 3, Flux Dev, Midjourney v7, GPT-4o, Seedream 5.0, Runway Gen-3 all in there

self-hosted. MIT licensed. runs on your machine. your data stays local

the only thing you pay for is the model API calls themselves

someone built this so you never have to pay Higgsfield AI again

github.com/Anil-matcha/Op…

135

1.2K

94.8K

Kam Moriss retweeted

Daniel Bernal@afterxleep·2d

Here's Claude tapping, scrolling, finding bugs and fixing them. Testing the entire app experience on iOS. No Xcode. No manual steps. FlowDeck gives your agent eyes on the simulator. Post + Video: flowdeck.studio/blog/2026/04/0…

1.7K

170.5K

Kam Moriss retweeted

Sharbel@sharbel·3d

🚨 Hermes Agent is one of the most powerful open source AI agent frameworks alive. But... nobody knows how to use it. This just changed. Someone mapped the entire ecosystem. 568 GitHub stars in 48 hours. It's called hermes-agent-orange-book. A complete guide from zero to production. 80+ tools, skills, plugins, and integrations documented in one place. Open the repo. Follow the Orange Book chapter by chapter and deploy a fully autonomous agent in an afternoon. Every major AI lab has internal guides like this. Now you have one too.

205

1.7K

117.2K

Kam Moriss retweeted

Kanika@KanikaBK·4d

🚨 JUST IN: MICROSOFT just open sourced a VOICE AI THAT TRANSCRIBES 60 MINUTES OF AUDIO in a single pass. 100% FREE.

It knows who spoke. It knows when they spoke. It knows exactly what they said. All in one shot. No chunking. No context loss.

It's called VibeVoice.

Not a transcription tool. Not a basic speech to text wrapper. A frontier voice AI family with ASR, TTS, and real time streaming. All open source. All free.

Here's what it actually does 👇

VibeVoice ASR - Speech Recognition:
→ Processes 60 minutes of continuous audio in a single pass
→ Never slices audio into chunks so global context is never lost
→ Identifies WHO spoke, WHEN they spoke and WHAT they said simultaneously
→ Supports customized hotwords for domain specific accuracy
→ Works in 50+ languages natively
→ Already adopted by Hugging Face Transformers library
→ Already being built on by the open source community

BY PEOPLE WHO HAD NO IDEA THIS LEVEL OF ACCURACY WAS ALREADY FREE.

VibeVoice TTS - Text to Speech:
→ Generates up to 90 minutes of speech in a single pass
→ Supports up to 4 distinct speakers in one conversation
→ Natural turn taking and speaker consistency throughout
→ Expressive speech that captures emotional nuances
→ Supports English, Chinese and multiple other languages

VibeVoice Realtime - Streaming TTS:
→ Only 300 millisecond first audible latency
→ Streams text input in real time
→ 0.5B parameters so it actually deploys anywhere
→ Robust long form generation up to 10 minutes
→ Lightweight enough for production use today

The core innovation nobody is talking about:

Most voice AI models slice long audio into short chunks. Every time they slice, they lose context. Speaker tracking breaks. Semantic coherence breaks. Accuracy drops.

VibeVoice uses continuous speech tokenizers running at an ultra low frame rate of 7.5 Hz. This preserves audio fidelity while dramatically boosting computational efficiency.

The entire 60 minutes stays in context. Nothing gets lost. Nobody gets misidentified.

The numbers:
→ VibeVoice ASR 7B - available now on Hugging Face
→ VibeVoice Realtime 0.5B - try it on Colab right now
→ 50+ supported languages
→ 11 distinct English voice styles
→ 9 multilingual speaker voices
→ Already integrated into Hugging Face Transformers
→ Finetuning code now available

The wildest part?

A voice powered input method called Vibing just built itself on top of VibeVoice ASR. Available on macOS and Windows right now. The open source community is already shipping products on top of this.

100% Open Source. Free to use. Free to fine tune. Free to build on.

🔖 Save this before your competitors find it first. 👇
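A rough back-of-the-envelope sketch of why the 7.5 Hz frame rate matters for keeping a full hour in context: the number of tokenizer frames grows linearly with the frame rate, so a low rate keeps long audio within a manageable sequence length. The 7.5 Hz, 60 minute and 90 minute figures come from the post. The 50 Hz comparison rate is an assumed value chosen only for contrast, and the helper name is hypothetical. The counts are frames, which need not match the model's actual token count.

```python
FRAME_RATE_HZ = 7.5        # tokenizer frame rate quoted in the post
COMPARISON_RATE_HZ = 50.0  # assumed higher rate, for contrast only


def frame_count(minutes: float, frame_rate_hz: float) -> int:
    """Number of tokenizer frames needed to cover `minutes` of audio."""
    return round(minutes * 60 * frame_rate_hz)


asr_frames = frame_count(60, FRAME_RATE_HZ)             # 60 min ASR input
tts_frames = frame_count(90, FRAME_RATE_HZ)             # 90 min TTS output
comparison_frames = frame_count(60, COMPARISON_RATE_HZ)

print(f"60 min at 7.5 Hz: {asr_frames} frames")
print(f"90 min at 7.5 Hz: {tts_frames} frames")
print(f"60 min at 50 Hz (assumed): {comparison_frames} frames")
```

At the quoted rate an hour of audio is 27,000 frames, against 180,000 at the assumed 50 Hz, which is the kind of gap that decides whether a single-pass approach is practical.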

370

2.6K

214.7K

Kam Moriss retweeted

Matt Ronge@mronge·4d

Running AI agents on a headless Mac Mini? You still need to see what's happening. Check logs, restart stuck tasks, monitor outputs. So we built Astropad Workbench. High-performance remote desktop for Mac, works on iPad and iPhone. Built for the AI era. astropad.com/product/workbe…

445

84.9K

Kam Moriss@KamMoriss·3d

Oracle is cutting 20,000–30,000 employees. To redirect $8–10 billion toward AI infrastructure. That's not a layoff announcement. That's a budget reallocation memo.

Kam Moriss@KamMoriss·3d

Closed models vs open models is the real war in AI right now. OpenAI and Anthropic bet on closed. Meta bets on open. Which strategy wins long-term? Reply below.

Kam Moriss@KamMoriss·3d

And they're open-sourcing it. Meta's strategy: give it away, make it the default, own the infrastructure layer. OpenAI and Anthropic charge for access. Meta wants to make AI free and everywhere.

Kam Moriss@KamMoriss·3d

Meta just launched Muse Spark — first model out of Meta Superintelligence Labs. Stock jumped 8%+ on launch day. Here's what it actually does and why it matters:

Kam Moriss@KamMoriss·3d

OpenAI had 50% of enterprise AI spend in 2023. Now: 27%. Anthropic just hit 40%. That's not competition. That's a takeover.

Kam Moriss retweeted

Romain Torres@rom1trs·3d

I built an AI influencer automation in Arcads that automatically creates UGC videos while you sleep. Comment “UGC” and I’ll send you the full workflow 👀

3.5K

518

5.9K

538.4K

Kam Moriss retweeted

Suryansh Tiwari@Suryanshti777·5d

Holy shit... someone just gave Claude a real browser.

Not screenshots. Not brittle selectors. Not slow MCP loops.

Real Playwright code, inside a sandbox.

It’s called dev-browser, and it lets AI agents control Chrome like developers do.

Here’s why this is different:

Instead of inventing new “agent syntax”, dev-browser just lets AI write actual browser code.

goto
click
fill
evaluate
scrape
screenshot

Everything.

And it runs in a QuickJS sandbox, so the agent gets full browser control without touching your system.

That means:
• Real browser automation
• Zero host access risk
• Persistent tabs
• Multi-script workflows
• Connect to existing Chrome
• Full Playwright API

The key idea is simple: The fastest way for an AI to use a browser is to let it write browser code itself.

So an agent can literally:

Open X
Scroll
Extract tweets
Return JSON

All in one run.

No plugins. No extensions. No orchestration layer. No MCP complexity.

Just: install → tell Claude “use dev-browser” → done.

Even better, scripts run against persistent pages. So agents can:

login once
navigate once
reuse context
continue workflows

Now you get things like:
• autonomous research agents
• AI QA testing websites
• scraping without MCP overhead
• multi-step browser workflows
• AI that actually uses web apps
• Claude operating real dashboards

And the security model is clean:

Playwright power
QuickJS sandbox
No filesystem access
No host execution

So agents are powerful, but contained.

Benchmarks are wild too:

Dev Browser: 3m 53s, $0.88, 29 turns, 100% success

Faster and cheaper than typical setups like:
• Playwright MCP
• Chrome extensions
• browser skills

We’re moving from: AI that looks at the web → AI that operates the web

That’s a big shift. Because once AI can control browsers reliably, it can use any software with a UI. No API needed. No integration required. Just open the page and work.

AI coworkers just got hands.

337

38K

Kam Moriss retweeted

Farza 🇵🇰🇺🇸@FarzaTV·4d

Hey, I'm open-sourcing Clicky. Go forth into the wild and build the future of education and the future of AI interfaces, my friends. I'm happy to have given a spark. Enjoy! github.com/farzaa/clicky

Farza 🇵🇰🇺🇸@FarzaTV

I built this thing called Clicky. It's an AI teacher that lives as a buddy next to your cursor. It can see your screen, talk to you, and even point at stuff, kinda like having a real teacher next to you. I've been using it the past few days to learn DaVinci Resolve, 10/10.

279

415

5.8K

514.9K

Kam Moriss@KamMoriss·4d

Anthropic built a model so powerful they won't release it publicly. Mythos: bigger than Opus, deployed only to 12 security firms to patch vulnerabilities before it can be weaponized. An AI lab holding back its best model on purpose. That's new.

Kam Moriss@KamMoriss·4d

Anthropic just quietly dropped a new model. It's called Mythos. It's more powerful than Opus. It wasn't trained for cybersecurity — but that's exactly how they're deploying it first. 12 partner orgs. Defensive security only. For now.

Kam Moriss@KamMoriss·4d

Everyone's watching OpenAI. Anthropic just quietly 3x'd its revenue in 90 days. $9B → $30B run rate. 1,000 enterprise clients spending $1M+/year. Each. The "safety-focused" AI lab is also the fastest-growing one right now.
